fix(llmobs): handle None metadata in experiment dataset records #17729

Open
asaxena2019 wants to merge 1 commit into main from anushka/fix-llmobs-experiment-none-metadata

@asaxena2019

Summary

  • Handles dataset records whose metadata field is explicitly None in LLMObs experiments (previously crashed with TypeError: 'NoneType' object is not a mapping)
  • Root cause: dict.get("metadata", {}) returns the stored None when the key is present, so the subsequent {**record_metadata, ...} spread crashes
  • Fix: record.get("metadata") or {} at every read site in ddtrace/llmobs/_experiment.py (four sites)
  • Adds a regression test that constructs an in-memory dataset with metadata=None and runs the full task → evaluator → summary-evaluator flow
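
The root cause can be reproduced outside ddtrace in a few lines. This is a minimal sketch, not the code in `_experiment.py`: the record shape, the `spread` helper, and the config value are hypothetical stand-ins for the spread shown in the stack trace.

```python
# Hypothetical record shape: the "metadata" key is present, but its value is None.
record = {"input_data": {"question": "hi"}, "metadata": None}

# The default passed to dict.get is only used when the key is absent,
# so the stored None comes back unchanged.
buggy = record.get("metadata", {})

# `or {}` falls back whenever the stored value is falsy, which includes None.
fixed = record.get("metadata") or {}


def spread(metadata, config):
    """Hypothetical stand-in for the dict spread at the crash site."""
    return {**metadata, "experiment_config": config}


error_message = ""
try:
    spread(buggy, {"model": "demo"})
except TypeError as exc:
    # Unpacking None with ** raises TypeError.
    error_message = str(exc)

merged = spread(fixed, {"model": "demo"})
```

With the `or {}` read, `merged` contains only the `experiment_config` key, which is the same shape a record without a `metadata` key would produce.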

Context

Several ddeval projects across dd-source currently fail during ddeval run with:

❌ Evaluation failed: 'NoneType' object is not a mapping

The failure reproduces against a bare ddeval run (no custom tooling) on multi-claim-example, single-claim-example, synthetics-critical-endpoint-selector, and others. The full stack trace lands at ddtrace/llmobs/_experiment.py:2004:

File "ddtrace/llmobs/_experiment.py", line 2004, in _prepare_summary_evaluator_data
    metadata_list.append({**record_metadata, "experiment_config": self._config})
TypeError: 'NoneType' object is not a mapping

This surfaced after ddeval 0.0.109272488 (released 2026-04-23), which JSON-decodes pulled dataset records in place. Before that release, the error was 'str' object is not a mapping (metadata was the string "null"). After it, the decoded value is Python None, which dict.get("metadata", {}) returns unchanged, and unpacking None with ** then raises the TypeError.

The upstream ddeval fix was necessary but exposed a latent None-safety bug on this side of the boundary.
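
The before/after behaviour described above can be illustrated with the standard `json` module alone. This is a sketch under assumptions: `spread_error` is a hypothetical helper, and the real decoding happens inside ddeval, not in this snippet.

```python
import json

# JSON null as it would appear in a serialized record.
raw_metadata = "null"


def spread_error(metadata):
    """Return the TypeError message from spreading `metadata`, or None on success."""
    try:
        {**metadata, "experiment_config": {}}
    except TypeError as exc:
        return str(exc)
    return None


# Before decoding: the metadata is still the string "null".
error_before_decode = spread_error(raw_metadata)

# After decoding: json.loads turns JSON null into Python None.
error_after_decode = spread_error(json.loads(raw_metadata))

# With the `or {}` coercion applied to the decoded value, the spread succeeds.
error_with_fix = spread_error(json.loads(raw_metadata) or {})
```

The first two calls fail with different type names in the message (`str` and then `NoneType`), which matches the change in error text reported across the ddeval release.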

Change details

Every record.get("metadata", {}) in _experiment.py becomes record.get("metadata") or {}:

Line Site
1652 Dataset.as_dataframe() — previously guarded by isinstance(metadata, dict), now consistent
2003 _prepare_summary_evaluator_data (the crash site)
2240 Per-record task argument plumbing (user-visible dict passed to tasks)
2361 Per-record evaluator context (combined_metadata spread)
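
For reviewers weighing the `or {}` pattern against an `isinstance` guard, the sketch below compares the two on a handful of record shapes. Both helpers are hypothetical and written for illustration only; they are not the functions in `_experiment.py`, and the exact form of the earlier `isinstance` guard in `Dataset.as_dataframe()` is an assumption.

```python
def read_with_or(record):
    """Pattern used by this PR: fall back to {} for missing or falsy metadata."""
    return record.get("metadata") or {}


def read_with_isinstance(record):
    """Assumed shape of an isinstance guard: fall back to {} for any non-dict."""
    metadata = record.get("metadata")
    return metadata if isinstance(metadata, dict) else {}


# Hypothetical record shapes, keyed by a short description.
cases = {
    "missing": {},
    "none": {"metadata": None},
    "populated": {"metadata": {"k": "v"}},
    "undecoded_string": {"metadata": "null"},
}

# Map each case to (result of `or {}`, result of isinstance guard).
results = {
    name: (read_with_or(record), read_with_isinstance(record))
    for name, record in cases.items()
}
```

The two reads agree for missing, None, and populated metadata. They differ only for a truthy non-dict value such as an undecoded string, which `or {}` passes through unchanged.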

Test plan

  • New regression test test_experiment_run_summary_evaluators_handles_none_metadata in tests/llmobs/test_experiments.py
    • Constructs an in-memory Dataset with metadata=None (no backend fixture)
    • Runs _run_task_run_evaluators_run_summary_evaluators with raise_errors=True
    • Asserts no error surfaces from the summary evaluator
  • Verified the test fails on main at _experiment.py:2004 with the exact production error, and passes with the fix applied
  • Backwards-compatible: records with metadata=None, missing metadata, and populated metadata all coerce to the same shape downstream
  • Full LLMObs experiment suite (backend-fixture tests skipped locally; CI will cover)
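
The backwards-compatibility claim can be checked in isolation with a test of roughly this shape. It is a sketch, not the regression test added in this PR: `prepare_summary_metadata` is a hypothetical stand-in modeled on the spread in the stack trace, and the record fields and config are invented for illustration.

```python
def prepare_summary_metadata(records, config):
    """Hypothetical stand-in for the per-record metadata spread."""
    metadata_list = []
    for record in records:
        # Coerce missing or None metadata to an empty dict before spreading.
        record_metadata = record.get("metadata") or {}
        metadata_list.append({**record_metadata, "experiment_config": config})
    return metadata_list


def test_none_missing_and_populated_metadata():
    config = {"temperature": 0}
    records = [
        {"input_data": "a", "metadata": None},  # explicit None
        {"input_data": "b"},  # key absent
        {"input_data": "c", "metadata": {"source": "unit"}},  # populated
    ]

    result = prepare_summary_metadata(records, config)

    # None and missing metadata produce the same downstream shape.
    assert result[0] == {"experiment_config": config}
    assert result[1] == {"experiment_config": config}
    # Populated metadata is preserved alongside the config.
    assert result[2] == {"source": "unit", "experiment_config": config}
```

The function follows pytest naming conventions, so it can be collected by pytest or called directly.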

Release note

Added at releasenotes/notes/fix-llmobs-experiment-none-metadata-*.yaml.

Dataset records whose `metadata` field is `None` (a common shape for records
serialized from JSON with an explicit `null`) caused
`TypeError: 'NoneType' object is not a mapping` in the LLMObs experiment
pipeline. `dict.get("metadata", {})` returns the stored `None` rather than
the default `{}` when the key is present, so the subsequent
`{**record_metadata, "experiment_config": self._config}` spread crashed.

Fix every `record.get("metadata", {})` read in `_experiment.py` to use
`record.get("metadata") or {}`, which coerces missing and explicitly-None
values to an empty dict. Four sites were affected:
- Dataset.as_dataframe() flattening (previously guarded by `isinstance`, now
  consistent with the others)
- _prepare_summary_evaluator_data (the main crash site reported by multiple
  ddeval projects)
- per-record task argument plumbing
- per-record evaluator context building

Adds a regression test (`test_experiment_run_summary_evaluators_handles_none_metadata`)
that constructs an in-memory Dataset with `metadata=None`, runs the full
task → evaluator → summary-evaluator flow, and asserts no crash. The test
fails on main at `_experiment.py:2004` without this fix and passes with it.

Impact: unblocks ddeval projects across dd-source (multi-claim-example,
single-claim-example, synthetics-critical-endpoint-selector, and others)
that currently fail with `'NoneType' object is not a mapping` when pulling
datasets via LLMObs' `pull_dataset` since metadata is JSON-decoded to Python
None at that layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@asaxena2019 asaxena2019 requested review from a team as code owners April 24, 2026 20:22
@cit-pr-commenter-54b7da

Codeowners resolved as

ddtrace/llmobs/_experiment.py                                           @DataDog/ml-observability
releasenotes/notes/fix-llmobs-experiment-none-metadata-7d4e2f9a8c1b3e5f.yaml  @DataDog/apm-python
tests/llmobs/test_experiments.py                                        @DataDog/ml-observability
