[Feature] Make adapters mandatory for generated data submissions #145
tommasocerruti
started this conversation in
Ideas
Replies: 1 comment
-
|
We can handle this tradeoff cleanly by enforcing it directly in the JSON schema itself. By adding an {
"$schema": "http://json-schema.org/draft-07/schema#",
"version": "0.2.2",
"type": "object",
"description": "Schema for storing and validating LLMs evaluation data...",
"required": [
"schema_version",
"evaluation_id",
"retrieved_timestamp",
"source_metadata",
"model_info",
"eval_library",
"evaluation_results",
"ingestion_method"
],
"additionalProperties": false,
"properties": {
// ... (existing properties) ...
"ingestion_method": {
"type": "string",
"enum": ["adapter_generated", "manual_aggregation"],
"description": "Indicates whether the data was parsed via an adapter or manually provided as aggregated results."
},
"adapter_path": {
"type": "string",
"description": "Path or identifier for the adapter script used. Required if ingestion_method is adapter_generated."
}
},
"if": {
"properties": {
"ingestion_method": { "const": "adapter_generated" }
}
},
"then": {
"required": ["adapter_path"]
}
}This ensures we maintain strict reproducibility and data-integrity checks for standard pipelines, while explicitly accommodating the edge case of authors contributing personal aggregated results. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Context: Related to issue #144.
Making adapters mandatory by design provides many benefits for data integrity and reproducibility (see #144). However, it creates a significant problem: authors who only have access to aggregated results won't be able to contribute.
I opened this thread to brainstorm how we can best navigate the tradeoff between reproducibility/data integrity and contribution friction.
Beta Was this translation helpful? Give feedback.
All reactions