[Evaluation] Fix App Insights emission silently dropping all events when evaluator definition has no "metrics"#47175

Draft
slister1001 wants to merge 1 commit into Azure:main from slister1001:fix/rubric-empty-metrics-emit

@slister1001
Member

Summary

Fixes a silent App Insights emission bug where rubric evaluators registered without metric metadata produced zero gen_ai.evaluation.result events for the entire run, even though the eval itself completed successfully.

Root cause

In _build_internal_log_attributes (_evaluate.py:1138):

if evaluator_definition := testing_criteria_config.get("_evaluator_definition"):
    metric_config_detail = evaluator_definition.get("metrics").get(metric_name)

RAISvc's RubricBasedEvaluatorDefinition.Metrics defaults to empty and is not required by Validate(), so a rubric registered without metric metadata is sent as {"type": "rubric"} — no metrics key at all. evaluator_definition.get("metrics") returns None, then None.get(...) raises:

AttributeError: 'NoneType' object has no attribute 'get'
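
The failure can be reproduced outside the SDK with a plain dict. This is a minimal sketch that assumes only the payload shape described above; the metric name `coherence` is a hypothetical placeholder, not a value taken from the failing run.

```python
# Payload shape for a rubric registered without metric metadata.
evaluator_definition = {"type": "rubric"}
metric_name = "coherence"  # hypothetical metric name, for illustration only

try:
    # Same chained lookup as the original code: .get("metrics") returns None,
    # so the second .get() is called on None.
    evaluator_definition.get("metrics").get(metric_name)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```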

That exception is caught by the per-event try/except at _log_events_to_app_insights line 1287 and silently swallowed before event_logger.emit() is reached. Because every event in the run hits the same code path with the same evaluator definition, every event is dropped — App Insights receives nothing.
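
The "every event is dropped" behavior can be illustrated with a simplified stand-in for the emission loop. The names `build_attributes`, `log_events`, and `emitted` are hypothetical and are not the SDK's functions; the sketch only assumes a per-event `try/except` that does not re-raise, as described above.

```python
emitted = []  # stands in for events that reached event_logger.emit()


def build_attributes(definition, metric_name):
    # Mirrors the unguarded lookup: raises AttributeError when "metrics" is absent.
    return {"detail": definition.get("metrics").get(metric_name)}


def log_events(events, definition):
    for event in events:
        try:
            attributes = build_attributes(definition, event["metric"])
            emitted.append((event, attributes))  # never reached when the build fails
        except Exception:
            # Swallowed: nothing is re-raised, so the caller sees no failure.
            pass


# Every event shares the same definition, so every event takes the failing path.
log_events([{"metric": "coherence"}, {"metric": "fluency"}], {"type": "rubric"})
print(len(emitted))
```

Because the definition is shared across the run, a single malformed definition is enough to leave `emitted` empty.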

Observed in production: rubric eval rubric-manual-260526043804-e45a09 in the westus2 bug bash project had zero AppInsights events while other rubric evals in the same workspace (e.g., aprilk-repro) that had populated metric metadata emitted normally.

Fix

Guard the metrics lookup with isinstance(metrics_section, dict) so a single misshapen definition does not abort emission for the entire run. The event is still emitted; only the optional min_value / max_value / desirable_direction / type attributes are absent for evaluators that did not register them.
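
A guard of this kind might look roughly like the sketch below. `lookup_metric_detail` is a hypothetical standalone helper written for illustration; the actual change lives inside `_build_internal_log_attributes` and may be structured differently.

```python
from typing import Any, Dict, Optional


def lookup_metric_detail(
    evaluator_definition: Optional[Dict[str, Any]], metric_name: str
) -> Optional[Any]:
    """Return the metric metadata if present, otherwise None (hypothetical helper)."""
    if not evaluator_definition:
        return None
    metrics_section = evaluator_definition.get("metrics")
    # Missing key, None, or any non-dict value: treat as "no metadata registered".
    if not isinstance(metrics_section, dict):
        return None
    return metrics_section.get(metric_name)


missing = lookup_metric_detail({"type": "rubric"}, "coherence")
populated = lookup_metric_detail(
    {"type": "rubric", "metrics": {"coherence": {"min_value": 1, "max_value": 5}}},
    "coherence",
)
```

Every malformed shape falls through to `None`, so the caller can skip the optional attributes and still emit the event.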

Tests

Added TestBuildInternalLogAttributesEvaluatorDefinition with 5 regression cases:

  • test_definition_without_metrics_key_does_not_raise — the exact failing payload ({"type": "rubric"})
  • test_definition_with_metrics_none_does_not_raise
  • test_definition_with_metrics_empty_dict_does_not_raise
  • test_definition_with_metrics_list_does_not_raise — defensive against future malformed types
  • test_definition_with_metric_metadata_still_populates_attributes — happy path unchanged
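
The shapes covered by these cases can be sketched as a table-driven test. This is not the test class added in the PR: `guarded_lookup` is a hypothetical stand-in for the guarded lookup, and the sketch uses the standard-library `unittest` so that it runs without the SDK installed.

```python
import unittest


def guarded_lookup(definition, metric_name):
    # Hypothetical stand-in for the guarded lookup, not the SDK function.
    metrics_section = definition.get("metrics")
    return metrics_section.get(metric_name) if isinstance(metrics_section, dict) else None


class GuardedLookupShapes(unittest.TestCase):
    def test_malformed_shapes_do_not_raise(self):
        shapes = [
            {"type": "rubric"},                           # no metrics key
            {"type": "rubric", "metrics": None},          # metrics: None
            {"type": "rubric", "metrics": {}},            # metrics: {}
            {"type": "rubric", "metrics": ["coherence"]}, # malformed type
        ]
        for definition in shapes:
            with self.subTest(definition=definition):
                self.assertIsNone(guarded_lookup(definition, "coherence"))

    def test_metadata_is_preserved(self):
        detail = {"min_value": 1, "max_value": 5}
        definition = {"type": "rubric", "metrics": {"coherence": detail}}
        self.assertEqual(guarded_lookup(definition, "coherence"), detail)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GuardedLookupShapes)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```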

All 10 tests in TestBuildInternalLogAttributesThreshold + TestBuildInternalLogAttributesEvaluatorDefinition pass.

Validation

  • tox run -e black -c ../../../eng/tox/tox.ini --root . — clean
  • pytest tests/unittests/test_evaluate.py::TestBuildInternalLogAttributesEvaluatorDefinition tests/unittests/test_evaluate.py::TestBuildInternalLogAttributesThreshold -v — 10 passed
  • Reproduced the original crash with a 2-line repro before the fix; confirmed the fix prevents it.

github-actions bot added the Evaluation (Issues related to the client library for Azure AI Evaluation) label on May 27, 2026
…inition has no `metrics`

Rubric evaluators registered without metric metadata produce an evaluator
definition payload of `{"type": "rubric"}` (RAISvc's `RubricBasedEvaluatorDefinition.Metrics`
defaults to empty and validation does not require it). In
`_build_internal_log_attributes` the helper called
`evaluator_definition.get("metrics").get(metric_name)` which raises
`AttributeError: 'NoneType' object has no attribute 'get'` when `metrics`
is missing or set to None. That exception is caught by the per-event
`try/except` in `_log_events_to_app_insights` and silently swallowed, so
`event_logger.emit()` is never called for any event in the run. The net
effect is that App Insights receives zero `gen_ai.evaluation.result` events
for the entire eval — observed in production for `rubric-manual-260526043804-e45a09`
in the westus2 bug bash project, while other rubric evaluators in the same
workspace that had populated metric metadata continued to emit normally.

Guard the metrics lookup so a single misshapen definition does not abort
emission for the whole run.

Added regression tests for `_build_internal_log_attributes` covering:
missing `metrics` key, `metrics: None`, `metrics: {}`, `metrics: [...]`
(malformed type), and the happy path where metric metadata is preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
slister1001 force-pushed the fix/rubric-empty-metrics-emit branch from ea539db to 31160ea on May 27, 2026 19:27
Labels

Evaluation Issues related to the client library for Azure AI Evaluation
