[Feature] Make adapters mandatory for generated data submissions #145

tommasocerruti · 2026-05-22T15:22:04Z

tommasocerruti
May 22, 2026
Maintainer

Context: Related to issue #144.

Making adapters mandatory by design provides many benefits for data integrity and reproducibility (see #144). However, it creates a significant problem: authors who only have access to aggregated results won't be able to contribute.

I opened this thread to brainstorm how we can best navigate the tradeoff between reproducibility/data integrity and contribution friction.

tommasocerruti · 2026-05-22T15:27:13Z

tommasocerruti
May 22, 2026
Maintainer Author

We can handle this tradeoff by encoding the rule directly in the JSON schema.

By adding an ingestion_method field, we can use JSON Schema's conditional keywords (if/then, available since draft-07) to require adapter_path only when the data is generated via an adapter. For authors uploading aggregated results directly, adapter_path stays optional.
Here is how we could patch the root level of our current v0.2.2 schema to support this:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "version": "0.2.2",
    "type": "object",
    "description": "Schema for storing and validating LLMs evaluation data...",
    "required": [
        "schema_version",
        "evaluation_id",
        "retrieved_timestamp",
        "source_metadata",
        "model_info",
        "eval_library",
        "evaluation_results",
        "ingestion_method" 
    ],
    "additionalProperties": false,
    "properties": {
        // ... (existing properties) ...
        
        "ingestion_method": {
            "type": "string",
            "enum": ["adapter_generated", "manual_aggregation"],
            "description": "Indicates whether the data was parsed via an adapter or manually provided as aggregated results."
        },
        "adapter_path": {
            "type": "string",
            "description": "Path or identifier for the adapter script used. Required if ingestion_method is adapter_generated."
        }
    },
    "if": {
        "properties": {
            "ingestion_method": { "const": "adapter_generated" }
        }
    },
    "then": {
        "required": ["adapter_path"]
    }
}

This ensures we maintain strict reproducibility and data-integrity checks for standard pipelines, while explicitly accommodating the edge case of authors contributing personal aggregated results.
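To make the intended behaviour concrete, here is a minimal stdlib-only sketch of how the proposed fields would accept or reject a record. It is not a JSON Schema validator and not part of the project: check_ingestion and the sample path adapters/example.py are hypothetical names used only for illustration, and the other required fields are ignored.

```python
# Stdlib-only sketch of the proposed rule; NOT a full JSON Schema validator.
# check_ingestion is a hypothetical helper, not an existing project function.

ALLOWED_METHODS = ("adapter_generated", "manual_aggregation")


def check_ingestion(record: dict) -> list:
    """Return a list of problems with the two proposed fields (empty if none)."""
    errors = []
    method = record.get("ingestion_method")

    # Root-level "required": ingestion_method must always be present.
    if "ingestion_method" not in record:
        errors.append("ingestion_method is required")
    # "enum": only the two proposed values are accepted.
    elif method not in ALLOWED_METHODS:
        errors.append(f"ingestion_method must be one of {list(ALLOWED_METHODS)}")

    # "if"/"then": the "if" subschema only constrains ingestion_method when the
    # key is present, so it also matches records that omit the key entirely.
    if_matches = "ingestion_method" not in record or method == "adapter_generated"
    if if_matches and "adapter_path" not in record:
        errors.append("adapter_path is required")

    # "type": adapter_path, when given, must be a string.
    if "adapter_path" in record and not isinstance(record["adapter_path"], str):
        errors.append("adapter_path must be a string")

    return errors


# Hypothetical sample records (only the two new fields are shown).
manual = {"ingestion_method": "manual_aggregation"}
adapter_missing_path = {"ingestion_method": "adapter_generated"}
adapter_ok = {"ingestion_method": "adapter_generated",
              "adapter_path": "adapters/example.py"}

for sample in (manual, adapter_missing_path, adapter_ok):
    print(sample, "->", check_ingestion(sample) or "valid")
```

One detail worth noting: because the "if" block matches when ingestion_method is absent, a record that omits both fields fails twice, once for the missing ingestion_method and once for the missing adapter_path. Adding "required": ["ingestion_method"] inside the "if" block would avoid the second error if we want cleaner messages.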
