
Review and improve clarity and formula for type_specificity signal #21

@frankkilcommins

Below is a revamped proposal for the type_specificity signal.

Type Specificity (type_specificity)

The type_specificity signal evaluates how strongly the API models its data, instead of treating everything as loosely typed strings.

Formula:

type_specificity = ( Σ field_specificity ) / total_fields

Where:

  • total_fields is the number of leaf schema fields considered, as defined under Total Fields below.
  • for each such field, an implementation MUST compute a field_specificity score in the range [0, 1].
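Assuming the per-field scores have already been computed, the formula is a plain arithmetic mean, as in the minimal sketch below. The function name type_specificity mirrors the signal name. This proposal does not define the result for an API with zero fields, so the sketch raises an error in that case. That choice is an assumption.

```python
import math
from typing import Iterable


def type_specificity(field_scores: Iterable[float]) -> float:
    """Mean of per-field field_specificity scores, each expected to lie in [0, 1]."""
    scores = list(field_scores)
    if not scores:
        # Assumption: the proposal does not define the total_fields == 0 case.
        raise ValueError("type_specificity is undefined when total_fields is 0")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("every field_specificity MUST be in [0, 1]")
    # ( sum of field_specificity ) / total_fields
    return math.fsum(scores) / len(scores)
```

For example, four fields scored 1.0, 0.9, 0.7 and 0.4 give type_specificity([1.0, 0.9, 0.7, 0.4]) == 0.75.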
Total Fields (total_fields)

The implementation MUST first derive a set of candidate fields from the OpenAPI Description. Conceptually, total_fields is the count of all leaf schema fields after resolving references and flattening objects into their properties and arrays into their item schemas.

Fields SHOULD include:

  • JSON Schema properties in components.schemas.*.properties.*.
  • Inline schema.properties.* under request/response bodies.
  • Parameter schemas (path, query, header, cookie).
  • Header schemas (response headers).
  • Request and response body root schemas, when they are primitive types.
  • Webhook/request-like schemas where applicable.

Rules:

  • $ref MUST be resolved before evaluation.
  • type: object fields are NOT counted directly; instead, their child properties are evaluated.
  • type: array fields MUST be classified based on their items schema.

Fields that are purely structural markers (for example, empty objects used as containers) MAY be excluded.

total_fields MUST be the count of all such leaf fields after flattening objects/arrays.
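The flattening rules above could be implemented roughly as follows. This is a sketch with several assumptions:

- It resolves only same-document $ref values, meaning JSON Pointers that start with "#/".
- It cuts off a recursive reference the second time that reference appears on a path.
- It treats an object without properties as a structural container and skips it.

The helper names resolve_ref and leaf_fields are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Iterator, Tuple

Schema = Dict[str, Any]


def resolve_ref(ref: str, root: Schema) -> Schema:
    """Follow a same-document JSON Pointer such as '#/components/schemas/Pet'."""
    if not ref.startswith("#/"):
        raise ValueError(f"external $ref not handled in this sketch: {ref}")
    node: Any = root
    for token in ref[2:].split("/"):
        # RFC 6901 unescaping: '~1' -> '/', then '~0' -> '~'
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def leaf_fields(schema: Schema, root: Schema, path: str = "$",
                seen: FrozenSet[str] = frozenset()) -> Iterator[Tuple[str, Schema]]:
    """Yield (path, schema) for every leaf field after $ref resolution and flattening."""
    if "$ref" in schema:
        ref = schema["$ref"]
        if ref in seen:
            return  # recursive reference: stop instead of descending forever
        yield from leaf_fields(resolve_ref(ref, root), root, path, seen | {ref})
    elif schema.get("type") == "object" or "properties" in schema:
        # Objects are not counted themselves; their properties are.
        # An object with no properties yields nothing (a structural container).
        for name, child in schema.get("properties", {}).items():
            yield from leaf_fields(child, root, f"{path}.{name}", seen)
    elif schema.get("type") == "array":
        # Arrays are classified by their items schema.
        items = schema.get("items")
        if isinstance(items, dict):
            yield from leaf_fields(items, root, f"{path}[]", seen)
    else:
        yield path, schema
```

Take a Pet schema with an id, an array of string tags, and an owner that references an Owner schema. Owner has a name and also points back to Pet. For that input, the function yields $.id, $.tags[] and $.owner.name. The back-reference from Owner to Pet is cut off.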

Field Specificity (field_specificity)

field_specificity is a score in the range [0, 1], assigned according to the following rules:

Categorical Fields (field_specificity = 1.0)

Any field that defines an enum or a const.
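A minimal check for the categorical tier might look like the sketch below. The helper name is_categorical is hypothetical. The sketch tests whether the const keyword is present rather than inspecting its value, because const: null is still a valid constraint.

```python
from typing import Any, Dict


def is_categorical(schema: Dict[str, Any]) -> bool:
    """True when the field declares a non-empty enum or any const."""
    enum = schema.get("enum")
    has_enum = isinstance(enum, list) and len(enum) > 0
    has_const = "const" in schema  # key presence: const: null still constrains the value
    return has_enum or has_const
```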

Semantic Strings (field_specificity = 1.0)

Strings that convey strong semantics via a registered format or a structural pattern:

  • type: string with a format whose JSON Data Type is string in the OpenAPI Format Registry (for example: uuid, email, ipv6, date)
  • type: string with a non-trivial pattern defined (i.e., anything other than .* or an equivalent match-all pattern)

This allows string identifiers like UUIDs, emails, IPs, and other registry-backed formats to achieve the maximum specificity score.
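A sketch of the semantic-string check follows, with two assumptions:

- The format set is a hypothetical, hard-coded subset of registry entries whose JSON Data Type is string. A real implementation would presumably load the full OpenAPI Format Registry instead.
- The list of trivial patterns is also an assumption. It catches only a few common match-all spellings, not every equivalent regex.

```python
from typing import Any, Dict

# Hypothetical subset of OpenAPI Format Registry entries whose JSON Data Type is string.
STRING_FORMATS = {"date", "date-time", "time", "email", "hostname",
                  "ipv4", "ipv6", "uri", "uuid"}
# Assumption: these common spellings of a match-all regex count as "trivial".
TRIVIAL_PATTERNS = {"", ".*", "^.*$", "^.*", ".*$"}


def is_semantic_string(schema: Dict[str, Any]) -> bool:
    """True for type: string with a registry string format or a non-trivial pattern."""
    if schema.get("type") != "string":
        return False
    if schema.get("format") in STRING_FORMATS:
        return True
    pattern = schema.get("pattern")
    return isinstance(pattern, str) and pattern not in TRIVIAL_PATTERNS
```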

Strongly Modelled Scalars (field_specificity = 1.0)

Numeric fields that are both typed with a recognised format and constrained:

  • type: integer or type: number, with a numeric format whose JSON Data Type is number in the OpenAPI Format Registry (for example: int32, int64, float, double, decimal)
  • and at least one numeric constraint: minimum, maximum, exclusiveMinimum, exclusiveMaximum, or multipleOf
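The strongly modelled scalar rule requires both conditions to hold at once. In this sketch, the numeric format set contains only the examples listed above, which is a hypothetical subset of the registry. A constraint counts whenever its keyword is present. That assumes the OpenAPI 3.1 / JSON Schema form, where exclusiveMinimum and exclusiveMaximum are numbers rather than booleans.

```python
from typing import Any, Dict

# Hypothetical subset of numeric registry formats, taken from the examples above.
NUMERIC_FORMATS = {"int32", "int64", "float", "double", "decimal"}
NUMERIC_CONSTRAINTS = ("minimum", "maximum", "exclusiveMinimum",
                       "exclusiveMaximum", "multipleOf")


def is_strongly_modelled_scalar(schema: Dict[str, Any]) -> bool:
    """True for integer/number fields with a recognised format AND a numeric constraint."""
    if schema.get("type") not in ("integer", "number"):
        return False
    has_format = schema.get("format") in NUMERIC_FORMATS
    # Assumption: keyword presence counts (OAS 3.1 numeric exclusiveMinimum/Maximum).
    has_constraint = any(k in schema for k in NUMERIC_CONSTRAINTS)
    return has_format and has_constraint
```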
Moderately Modelled Scalars (field_specificity = 0.9)

Scalars that are meaningfully modelled, but missing either format or bounds:

  • integer / number with any numeric constraint but no recognised format
  • integer / number with recognised format but no numeric constraints
  • type: string with length bounds (minLength and/or maxLength) but no recognised string format and no pattern
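For numeric fields, the moderate tier means exactly one of "recognised format" and "numeric constraint" is present, which the sketch below expresses as an exclusive or. For strings, the sketch assumes the semantic-string tier has already been ruled out, so it only looks for length bounds. The helper name and the hard-coded format subset are hypothetical.

```python
from typing import Any, Dict

NUMERIC_FORMATS = {"int32", "int64", "float", "double", "decimal"}  # hypothetical subset
NUMERIC_CONSTRAINTS = ("minimum", "maximum", "exclusiveMinimum",
                       "exclusiveMaximum", "multipleOf")


def is_moderately_modelled_scalar(schema: Dict[str, Any]) -> bool:
    """0.9 tier check.

    Numeric fields: exactly one of (recognised format, numeric constraint).
    Strings: assumes no recognised format and no non-trivial pattern (checked earlier).
    """
    t = schema.get("type")
    if t in ("integer", "number"):
        has_format = schema.get("format") in NUMERIC_FORMATS
        has_constraint = any(k in schema for k in NUMERIC_CONSTRAINTS)
        return has_format != has_constraint  # both -> 1.0 tier, neither -> 0.7 tier
    if t == "string":
        return "minLength" in schema or "maxLength" in schema
    return False
```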
Basic Primitives (field_specificity = 0.7)

Primitive types with no extra modelling but still inherently clearer than raw strings:

  • type: integer, type: number, or type: boolean without enum, const, format, or constraints
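The basic-primitive tier can be checked directly from its definition. The keyword list below is an assumption about what counts as "constraints" for these types. It reuses the numeric bounds listed under Strongly Modelled Scalars. The helper name is hypothetical.

```python
from typing import Any, Dict

BASIC_TYPES = ("integer", "number", "boolean")
# Keywords whose presence lifts a field out of this tier (assumed from the tiers above).
MODELLING_KEYWORDS = ("enum", "const", "format", "minimum", "maximum",
                      "exclusiveMinimum", "exclusiveMaximum", "multipleOf")


def is_basic_primitive(schema: Dict[str, Any]) -> bool:
    """True for integer/number/boolean fields with no enum, const, format or constraints."""
    return (schema.get("type") in BASIC_TYPES
            and not any(k in schema for k in MODELLING_KEYWORDS))
```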
Weak Strings (field_specificity = 0.4)

Unconstrained strings that carry little machine-usable structure:

  • type: string with no enum, no const, no recognised string format, no pattern, no minLength, no maxLength
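The sketch below puts all the tiers together into a single field_specificity classifier. It checks the tiers from most to least specific and returns the first match, and it repeats the hypothetical registry subsets so that it stands alone. Its assumptions are:

- type is a single string. OpenAPI 3.1 type arrays such as ["string", "null"] are not handled.
- Trivial patterns such as .* are treated as if no pattern were set.
- An unrecognised numeric format is treated as no format.
- Schemas outside the listed tiers, such as untyped or null schemas, return None, because the proposal does not score them.

```python
from typing import Any, Dict, Optional

# Hypothetical registry subsets; a real implementation would load the OpenAPI Format Registry.
STRING_FORMATS = {"date", "date-time", "time", "email", "hostname",
                  "ipv4", "ipv6", "uri", "uuid"}
NUMERIC_FORMATS = {"int32", "int64", "float", "double", "decimal"}
NUMERIC_CONSTRAINTS = ("minimum", "maximum", "exclusiveMinimum",
                       "exclusiveMaximum", "multipleOf")
# Assumption: these spellings count as trivial (match-all) patterns.
TRIVIAL_PATTERNS = {"", ".*", "^.*$", "^.*", ".*$"}


def field_specificity(schema: Dict[str, Any]) -> Optional[float]:
    """Score one resolved leaf schema; tiers are checked top-down and the first match wins."""
    t = schema.get("type")
    fmt = schema.get("format")
    pattern = schema.get("pattern")
    # Assumption: a trivial pattern such as ".*" is treated as if no pattern were set.
    has_pattern = isinstance(pattern, str) and pattern not in TRIVIAL_PATTERNS

    # Categorical fields: a non-empty enum, or any const (even const: null).
    if schema.get("enum") or "const" in schema:
        return 1.0

    if t == "string":
        if fmt in STRING_FORMATS or has_pattern:
            return 1.0  # semantic string
        if "minLength" in schema or "maxLength" in schema:
            return 0.9  # bounded string
        return 0.4      # weak string (an unrecognised format does not help)

    if t in ("integer", "number"):
        has_format = fmt in NUMERIC_FORMATS  # unrecognised format behaves like no format
        has_constraint = any(k in schema for k in NUMERIC_CONSTRAINTS)
        if has_format and has_constraint:
            return 1.0  # strongly modelled scalar
        if has_format or has_constraint:
            return 0.9  # moderately modelled scalar
        return 0.7      # basic primitive

    if t == "boolean":
        return 0.7      # basic primitive

    # Not covered by the proposal (e.g. untyped or null schemas).
    return None
```

Each leaf schema produced by the flattening step would be passed through this function. The resulting scores would then be averaged to obtain type_specificity.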
