fix(llmobs): handle None metadata in experiment dataset records by asaxena2019 · Pull Request #17729 · DataDog/dd-trace-py

asaxena2019 · 2026-04-24T20:22:41Z

Summary

- Handles dataset records whose metadata field is explicitly None in LLMObs experiments. Previously these crashed with TypeError: 'NoneType' object is not a mapping.
- Root cause: dict.get("metadata", {}) returns the stored None when the key is present, so the subsequent {**record_metadata, ...} spread raises.
- Fix: use record.get("metadata") or {} at every read site in ddtrace/llmobs/_experiment.py (four sites).
- Adds a regression test that constructs an in-memory dataset with metadata=None and runs the full task → evaluator → summary-evaluator flow.
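
The root cause is easy to see in isolation. The following is a minimal sketch, not code from `_experiment.py`: the record shapes and the helper names `read_with_default` and `read_with_or` are made up for illustration. It compares the two read patterns on a record with no `metadata` key, one with an explicit `None`, and one with a populated dict.

```python
# Hypothetical record shapes; only the "metadata" key matters here.
records = {
    "missing": {"input_data": {"q": "hi"}},
    "explicit_none": {"input_data": {"q": "hi"}, "metadata": None},
    "populated": {"input_data": {"q": "hi"}, "metadata": {"source": "unit"}},
}


def read_with_default(record):
    # Old pattern: the default applies only when the key is absent.
    return record.get("metadata", {})


def read_with_or(record):
    # New pattern: any falsy stored value (None, {}) falls back to {}.
    return record.get("metadata") or {}


old = {name: read_with_default(r) for name, r in records.items()}
new = {name: read_with_or(r) for name, r in records.items()}

for name in records:
    print(f"{name:14} default={old[name]!r:22} or={new[name]!r}")
```

Only the `explicit_none` row differs between the two patterns, which is exactly the case that reaches the `{**record_metadata, ...}` spread as `None`.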

Context

Several ddeval projects across dd-source currently fail during ddeval run with:

❌ Evaluation failed: 'NoneType' object is not a mapping

The failure reproduces against a bare ddeval run (no custom tooling) on multi-claim-example, single-claim-example, synthetics-critical-endpoint-selector, and others. The full stack trace lands at ddtrace/llmobs/_experiment.py:2004:

File "ddtrace/llmobs/_experiment.py", line 2004, in _prepare_summary_evaluator_data
    metadata_list.append({**record_metadata, "experiment_config": self._config})
TypeError: 'NoneType' object is not a mapping

This surfaced after ddeval 0.0.109272488 (released 2026-04-23), which JSON-decodes pulled dataset records in place. Before that release, the error was 'str' object is not a mapping, because metadata was the string "null". After it, the decoded value is Python None, which dict.get("metadata", {}) returns unchanged, and the **None spread then raises.
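
Both error messages can be reproduced with the standard library alone. This is a rough sketch under assumptions: the raw record string and the helper `spread_error` are hypothetical, and the empty `experiment_config` stands in for `self._config`. It shows the undecoded string case, the decoded `None` case, and the `or {}` read.

```python
import json


def spread_error(metadata):
    """Return the TypeError message raised by the spread, or None if it succeeds."""
    try:
        {**metadata, "experiment_config": {}}
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical pulled record with an explicit JSON null for metadata.
raw_record = '{"input_data": {"q": "hi"}, "metadata": null}'
decoded = json.loads(raw_record)

before_message = spread_error("null")              # metadata left as JSON text
after_message = spread_error(decoded["metadata"])  # JSON null decoded to None
fixed_message = spread_error(decoded.get("metadata") or {})

print(before_message)
print(after_message)
print(fixed_message)
```

`json.loads` maps JSON `null` to Python `None`, so decoding the record changes the failing type from `str` to `NoneType` without removing the failure; only the read-side coercion does that.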

The upstream ddeval fix was necessary but exposed a latent None-safety bug on this side of the boundary.

Change details

Every record.get("metadata", {}) in _experiment.py becomes record.get("metadata") or {}:

| Line | Site |
|------|------|
| 1652 | `Dataset.as_dataframe()`: previously guarded by `isinstance(metadata, dict)`, now consistent |
| 2003 | `_prepare_summary_evaluator_data`: the crash site |
| 2240 | Per-record task argument plumbing (user-visible dict passed to tasks) |
| 2361 | Per-record evaluator context (`combined_metadata` spread) |
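
For reviewers weighing the `as_dataframe()` change, the two guard styles are not fully equivalent. The sketch below is an assumption-laden comparison: the exact form of the old `isinstance` guard is not shown in this PR, and both helper names are hypothetical. It illustrates that `or {}` only replaces falsy values, while an `isinstance` check also replaces truthy non-dict values such as an undecoded `"null"` string.

```python
def metadata_via_isinstance(record):
    # Guard style: anything that is not a dict becomes {}.
    metadata = record.get("metadata")
    return metadata if isinstance(metadata, dict) else {}


def metadata_via_or(record):
    # PR style: only falsy values (None, {}, "") become {}.
    return record.get("metadata") or {}


samples = [
    {},                            # key missing
    {"metadata": None},            # explicit None
    {"metadata": {"tag": "x"}},    # populated dict
    {"metadata": "null"},          # undecoded JSON text (truthy, not a dict)
]

for record in samples:
    print(record, metadata_via_isinstance(record), metadata_via_or(record))
```

The two agree on the missing, `None`, and populated cases, which are the ones this PR targets. They differ only on the string case, which the ddeval release described in the Context section is said to have eliminated upstream.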

Test plan

New regression test test_experiment_run_summary_evaluators_handles_none_metadata in tests/llmobs/test_experiments.py
- Constructs an in-memory Dataset with metadata=None (no backend fixture)
- Runs _run_task → _run_evaluators → _run_summary_evaluators with raise_errors=True
- Asserts no error surfaces from the summary evaluator
Verified the test fails on main at _experiment.py:2004 with the exact production error, and passes with the fix applied
Backwards-compatible: records with metadata=None, missing metadata, and populated metadata all coerce to the same shape downstream
Full LLMObs experiment suite (backend-fixture tests skipped locally; CI will cover)
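
The shape of the regression check can be sketched without the LLMObs classes. This is not the test added in `tests/llmobs/test_experiments.py`: `prepare_summary_metadata` is a hypothetical stand-in for the crash site, and the record fields and config are invented for illustration. It covers the three record shapes listed under the backwards-compatibility item.

```python
def prepare_summary_metadata(records, config):
    """Stand-in for the crash site: builds one metadata dict per record."""
    metadata_list = []
    for record in records:
        record_metadata = record.get("metadata") or {}
        metadata_list.append({**record_metadata, "experiment_config": config})
    return metadata_list


def test_handles_none_metadata():
    records = [
        {"input_data": {"q": "a"}, "expected_output": "A", "metadata": None},
        {"input_data": {"q": "b"}, "expected_output": "B"},
        {"input_data": {"q": "c"}, "expected_output": "C", "metadata": {"tag": "x"}},
    ]
    config = {"model": "stub"}
    result = prepare_summary_metadata(records, config)
    # None and missing metadata collapse to the same shape as an empty dict.
    assert result == [
        {"experiment_config": config},
        {"experiment_config": config},
        {"tag": "x", "experiment_config": config},
    ]


test_handles_none_metadata()
```

Swapping the read back to `record.get("metadata", {})` makes the first record raise `TypeError`, mirroring the failure the PR reports on main.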

Release note

Added at releasenotes/notes/fix-llmobs-experiment-none-metadata-*.yaml.

Dataset records whose `metadata` field is `None` (a common shape for records serialized from JSON with an explicit `null`) caused `TypeError: 'NoneType' object is not a mapping` in the LLMObs experiment pipeline. `dict.get("metadata", {})` returns the stored `None` rather than the default `{}` when the key is present, so the subsequent `{**record_metadata, "experiment_config": self._config}` spread crashed.

Fix every `record.get("metadata", {})` read in `_experiment.py` to use `record.get("metadata") or {}`, which coerces both missing and explicitly-None values to an empty dict. Four sites were affected:

- Dataset.as_dataframe() flattening (previously guarded by `isinstance`, now consistent with the others)
- _prepare_summary_evaluator_data (the main crash site reported by multiple ddeval projects)
- per-record task argument plumbing
- per-record evaluator context building

Adds a regression test (`test_experiment_run_summary_evaluators_handles_none_metadata`) that constructs an in-memory Dataset with `metadata=None`, runs the full task → evaluator → summary-evaluator flow, and asserts no crash. The test fails on main at `_experiment.py:2004` without this fix and passes with it.

Impact: unblocks ddeval projects across dd-source (multi-claim-example, single-claim-example, synthetics-critical-endpoint-selector, and others) that currently fail with `'NoneType' object is not a mapping` when pulling datasets via LLMObs' `pull_dataset`, since metadata is JSON-decoded to Python None at that layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cit-pr-commenter-54b7da · 2026-04-24T20:23:29Z

Codeowners resolved as

ddtrace/llmobs/_experiment.py                                           @DataDog/ml-observability
releasenotes/notes/fix-llmobs-experiment-none-metadata-7d4e2f9a8c1b3e5f.yaml  @DataDog/apm-python
tests/llmobs/test_experiments.py                                        @DataDog/ml-observability

asaxena2019 requested review from a team as code owners April 24, 2026 20:22

asaxena2019 requested review from juanjux and sabrenner April 24, 2026 20:22

juanjux approved these changes Apr 28, 2026

asaxena2019 wants to merge 1 commit into main from anushka/fix-llmobs-experiment-none-metadata
