Below is a revamped proposal for the type_specificity signal.
Type Specificity (type_specificity)
The type_specificity signal evaluates how strongly the API models its data, instead of treating everything as loosely typed strings.
Formula:
type_specificity = ( Σ field_specificity ) / total_fields
Where:
- total_fields is the total number of schema properties considered.
- For each schema property, an implementation MUST compute a field_specificity score in the range [0, 1].
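As a minimal sketch of the formula above, the signal is the arithmetic mean of the per-field scores. The function name, the range check, and the handling of an empty field set are assumptions of this sketch; the proposal does not define the result when total_fields is 0.

```python
from typing import Sequence


def type_specificity(field_scores: Sequence[float]) -> float:
    """Return (sum of field_specificity) / total_fields."""
    total_fields = len(field_scores)
    if total_fields == 0:
        # Assumption: the proposal does not define the zero-field case,
        # so this sketch refuses to return a score.
        raise ValueError("type_specificity is undefined when total_fields is 0")
    for score in field_scores:
        # Each field_specificity score must lie in [0, 1].
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"field_specificity out of range [0, 1]: {score}")
    return sum(field_scores) / total_fields
```

For example, four fields scored 1.0, 0.9, 0.7, and 0.4 give a type_specificity of about 0.75.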
Total Fields (total_fields)
The implementation MUST first derive a set of candidate fields from the OpenAPI Description. Conceptually, total_fields is the count of all leaf schema fields after reference resolution, flattening objects into their properties, and arrays into their item schemas.
Fields SHOULD include:
- JSON Schema properties in components.schemas.*.properties.*.
- Inline schema.properties.* under request/response bodies.
- Parameter schemas (path, query, header, cookie).
- Header schemas (response headers).
- Request and response body root schemas (if primitives).
- Webhook/request-like schemas where applicable.
Rules:
- $ref MUST be resolved before evaluation.
- type: object fields are NOT counted directly; instead, their child properties are evaluated.
- type: array fields MUST be classified based on their items schema.
- Fields that are purely structural markers (for example, empty objects used as containers) MAY be excluded.
- total_fields MUST be the count of all such leaf fields after flattening objects/arrays.
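The flattening rules above could be sketched roughly as follows. The sketch assumes that $ref has already been resolved, that the resolved schema is acyclic, and that each schema is a plain dict; the name iter_leaf_fields and the sample pet schema are hypothetical. Skipping arrays that have no items schema is also an assumption, since the proposal does not cover that case.

```python
from typing import Any, Dict, Iterator

Schema = Dict[str, Any]


def iter_leaf_fields(schema: Any) -> Iterator[Schema]:
    """Yield the leaf schemas of an already $ref-resolved schema."""
    if not isinstance(schema, dict):
        # Boolean schemas (true/false) carry no type information; skip them.
        return
    schema_type = schema.get("type")
    if schema_type == "object" or "properties" in schema:
        # Objects are not counted directly; their child properties are.
        # An empty object yields nothing, which applies the MAY-exclude rule.
        for child in schema.get("properties", {}).values():
            yield from iter_leaf_fields(child)
    elif schema_type == "array":
        # Arrays are classified by their items schema.
        # Assumption: an array without an items schema is skipped.
        yield from iter_leaf_fields(schema.get("items"))
    else:
        yield schema


# Hypothetical schema used only to illustrate the counting.
pet = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "owner": {"type": "object", "properties": {"age": {"type": "integer"}}},
        "meta": {"type": "object"},  # empty container, excluded
    },
}
total_fields = len(list(iter_leaf_fields(pet)))
```

Here total_fields is 3: id, the item schema of tags, and owner.age. The empty meta object contributes nothing.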
Field Specificity (field_specificity)
field_specificity is a score in the range [0, 1], assigned according to the following rules:
Categorical Fields (field_specificity = 1.0)
Any field, regardless of its type, that defines an enum or a const.
Semantic Strings (field_specificity = 1.0)
Strings that convey strong semantics via a registered format or a structural pattern:
- type: string with a format whose JSON Data Type is string in the OpenAPI Format Registry (for example: uuid, email, ipv6, date)
- type: string with a non-trivial pattern defined (that is, not .* or an equivalent match-anything expression)
This allows string identifiers like UUIDs, emails, IPs, and other registry-backed formats to achieve the maximum specificity score.
Strongly Modelled Scalars (field_specificity = 1.0)
Numeric fields that are both typed with a recognised format and constrained. Such a field:
- is type: integer or type: number with a numeric format whose JSON Data Type is number in the OpenAPI Format Registry (for example: int32, int64, float, double, decimal)
- has at least one numeric constraint: minimum, maximum, exclusiveMinimum, exclusiveMaximum, or multipleOf
Moderately Modelled Scalars (field_specificity = 0.9)
Scalars that are meaningfully modelled, but missing either format or bounds:
- integer / number with any numeric constraint but no recognised format
- integer / number with recognised format but no numeric constraints
- type: string with boundaries defined (e.g., minLength / maxLength), but no recognised string format and no pattern
Basic Primitives (field_specificity = 0.7)
Primitive types with no extra modelling but still inherently clearer than raw strings:
- type: integer, type: number, or type: boolean without enum, const, format, or constraints
Weak Strings (field_specificity = 0.4)
Unconstrained strings that carry little machine-usable structure:
- type: string with no enum, no const, no recognised string format, no pattern, no minLength, no maxLength
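The scoring tiers above can be read as a single decision procedure. The following is a rough sketch, not a reference implementation. The format sets contain only the examples named in this proposal and stand in for a lookup against the OpenAPI Format Registry. The list of trivial patterns, the assumption that type is a single string, and the fallback score for fields the rules do not cover are all assumptions of this sketch.

```python
from typing import Any, Dict

# Illustrative subsets only; a real implementation would consult the
# OpenAPI Format Registry instead of these hard-coded examples.
STRING_FORMATS = {"uuid", "email", "ipv6", "date"}
NUMERIC_FORMATS = {"int32", "int64", "float", "double", "decimal"}
NUMERIC_CONSTRAINTS = (
    "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum", "multipleOf",
)
# Assumption: patterns treated as "match anything" by this sketch.
TRIVIAL_PATTERNS = {"", ".*", "^.*", ".*$", "^.*$"}


def field_specificity(field: Dict[str, Any]) -> float:
    """Score one leaf schema; assumes "type" is a single string."""
    if "enum" in field or "const" in field:
        return 1.0  # Categorical Fields
    field_type = field.get("type")
    if field_type == "string":
        pattern = field.get("pattern")
        if field.get("format") in STRING_FORMATS:
            return 1.0  # Semantic Strings (registered format)
        if isinstance(pattern, str) and pattern not in TRIVIAL_PATTERNS:
            return 1.0  # Semantic Strings (non-trivial pattern)
        if "minLength" in field or "maxLength" in field:
            return 0.9  # Moderately Modelled Scalars (bounded string)
        return 0.4  # Weak Strings
    if field_type in ("integer", "number"):
        has_format = field.get("format") in NUMERIC_FORMATS
        has_constraint = any(key in field for key in NUMERIC_CONSTRAINTS)
        if has_format and has_constraint:
            return 1.0  # Strongly Modelled Scalars
        if has_format or has_constraint:
            return 0.9  # Moderately Modelled Scalars
        # Assumption: an unrecognised format is treated like no format.
        return 0.7  # Basic Primitives
    if field_type == "boolean":
        return 0.7  # Basic Primitives
    # Assumption: the proposal does not score untyped or unlisted types;
    # this sketch falls back to the weakest tier.
    return 0.4
```

The categorical check runs first so that an enum or const on any type scores 1.0 before the type-specific tiers are considered.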