Commit 6b06163
[Evaluation] Fix AOAI evaluation to preserve list values instead of stringifying them (#45574)
* Fix AOAI evaluation to preserve list values instead of stringifying them
The _convert_value helper in _get_data_source was converting list values
to strings via str(), turning [] into '[]'. The AOAI API then rejected
these with 'is not of type array' errors.
Move list from the stringify branch to the pass-through branch alongside
dict, since both are structured JSON types that should be preserved as
native objects for proper serialization.
Update existing test assertions and add a new test for list/dict value
preservation including empty collections.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
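A minimal sketch of the before/after behaviour described in this bullet. It assumes `_convert_value` is a simple type dispatch. The function names `convert_value_before` and `convert_value_after` are hypothetical, and the assumption that every remaining value is stringified comes only from the message's mention of a "stringify branch", not from the real code.

```python
import json
from typing import Any


def convert_value_before(value: Any) -> Any:
    """Hypothetical model of the old behaviour: only dict passed through."""
    if isinstance(value, dict):
        return value
    # Lists fell into the stringify branch, so [] became the string "[]".
    return str(value)


def convert_value_after(value: Any) -> Any:
    """Hypothetical model of the fix: list joins dict in the pass-through branch."""
    if isinstance(value, (dict, list)):
        # Structured JSON types stay native so they serialize as arrays/objects.
        return value
    # Assumption: other values are still stringified, as the message implies.
    return str(value)


row = {"tags": []}
print(json.dumps({k: convert_value_before(v) for k, v in row.items()}))  # {"tags": "[]"}
print(json.dumps({k: convert_value_after(v) for k, v in row.items()}))   # {"tags": []}
```

The first payload carries a JSON string where an array is expected, which matches the "is not of type array" rejection described above.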
* Infer array/object schema types for list/dict columns in flat mode
The flat schema generator in _generate_data_source_config now samples
the first row to emit the correct JSON Schema type (array, object, or
string) instead of defaulting everything to string. This ensures the
schema aligns with the data produced by _convert_value.
Add test for schema type inference and an integration test verifying
schema-data alignment for list/dict columns including empty collections.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
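A sketch of first-row type inference for a flat schema, assuming the input is a pandas DataFrame and that each column maps to one JSON Schema property. `infer_json_type` and `build_flat_schema` are hypothetical names, not the SDK's `_generate_data_source_config`. The exact schema layout the SDK emits is also an assumption.

```python
import pandas as pd


def infer_json_type(sample) -> str:
    """Map a sample value to the JSON Schema type named in the commit message."""
    if isinstance(sample, list):
        return "array"
    if isinstance(sample, dict):
        return "object"
    # Everything else keeps the previous default.
    return "string"


def build_flat_schema(df: pd.DataFrame) -> dict:
    """Hypothetical flat schema: one property per column, typed from row 0."""
    properties = {}
    for col in df.columns:
        sample = df[col].iloc[0] if len(df) else None
        properties[col] = {"type": infer_json_type(sample)}
    return {"type": "object", "properties": properties}


df = pd.DataFrame({"query": ["hi"], "tags": [["a", "b"]], "meta": [{"k": 1}]})
print(build_flat_schema(df))
```

Sampling only `iloc[0]` is the behaviour this bullet describes. A later bullet in this commit replaces it with a scan for the first non-null value.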
* Fix pass_threshold propagation and zero-threshold logging
- Use 'is not None' instead of truthiness check in
_build_internal_log_attributes so threshold=0 is not silently dropped.
- Propagate _pass_threshold from evaluator_config into
testing_criteria_metadata in _extract_testing_criteria_metadata.
- Inject pass_threshold into metric results in _process_criteria_metrics
when the evaluator (e.g. PythonGrader) does not emit one, without
overwriting evaluator-provided thresholds.
- Add 12 unit tests covering all three changes including zero-value
edge cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
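A sketch of the two threshold rules in this bullet: `is not None` instead of truthiness, and injection that never overwrites an evaluator-provided value. The helper names and the `"threshold"` key are hypothetical stand-ins for `_build_internal_log_attributes` and `_process_criteria_metrics`, whose actual signatures are not shown here.

```python
from typing import Any, Dict, Optional


def build_log_attributes(threshold: Optional[float]) -> Dict[str, Any]:
    """Hypothetical sketch of the threshold handling in log attributes."""
    attrs: Dict[str, Any] = {}
    # `if threshold:` would silently drop 0; `is not None` keeps it.
    if threshold is not None:
        attrs["threshold"] = threshold
    return attrs


def inject_pass_threshold(metric: Dict[str, Any], pass_threshold: Optional[float]) -> Dict[str, Any]:
    """Add pass_threshold only when the evaluator did not emit a threshold."""
    result = dict(metric)  # copy so the caller's dict is not mutated
    # Hypothetical key name; an existing value (even 0) is left untouched.
    if pass_threshold is not None and result.get("threshold") is None:
        result["threshold"] = pass_threshold
    return result


print(build_log_attributes(0))                                         # {'threshold': 0}
print(inject_pass_threshold({"score": 1.0}, 0.5))                      # threshold injected
print(inject_pass_threshold({"score": 1.0, "threshold": 0.8}, 0.5))    # evaluator value kept
```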
* Skip None/NaN rows when inferring schema types
The flat schema generator now scans past None and NaN values to find
the first non-null sample for type inference, instead of only checking
iloc[0]. This avoids schema-data mismatches when the first row has
missing values but later rows contain lists or dicts.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
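A sketch of scanning past null sentinels before inferring a type. It uses the `pd.isna` check with a list/dict guard that the review-comment bullet below adopts. The guard is needed because `pd.isna` on a list returns an array rather than a single boolean. `first_non_null` and `infer_json_type` are hypothetical names.

```python
import pandas as pd


def first_non_null(series: pd.Series):
    """Return the first value that is not None/NaN/pd.NA/NaT, else None."""
    for value in series:
        # Guard: lists and dicts are real data; pd.isna would return an array for them.
        if isinstance(value, (list, dict)):
            return value
        if not pd.isna(value):
            return value
    return None


def infer_json_type(sample) -> str:
    if isinstance(sample, list):
        return "array"
    if isinstance(sample, dict):
        return "object"
    return "string"


col = pd.Series([None, float("nan"), pd.NA, ["a"]], dtype=object)
print(infer_json_type(first_non_null(col)))  # array
```

Checking only `iloc[0]` would have returned `None` here and typed the column as `"string"`, which is the mismatch described above.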
* Address PR review comments
- Use _is_none_or_nan for threshold injection check so NaN thresholds
are also replaced by pass_threshold from config.
- Use pd.isna, guarded so list/dict values are never passed to it, when
skipping null sentinels (handles pd.NA, NaT, etc. in addition to None
and float NaN).
- Infer leaf types in nested schema via leaf_type_map parameter on
_build_schema_tree_from_paths so nested paths with list/dict data
get array/object schema types instead of always defaulting to string.
- Add tests for leaf_type_map, nested schema type inference, pd.NA
handling, and NaN threshold injection.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
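A sketch of how a `leaf_type_map` parameter could type the leaves of a nested schema built from paths. It assumes dot-separated paths and a `{path: json_type}` mapping. `build_schema_tree` is a hypothetical stand-in for `_build_schema_tree_from_paths`, whose real signature and path format are not shown in this commit.

```python
from typing import Dict, Iterable, Optional


def build_schema_tree(paths: Iterable[str], leaf_type_map: Optional[Dict[str, str]] = None) -> dict:
    """Hypothetical nested-schema builder; leaves default to "string"."""
    leaf_type_map = leaf_type_map or {}
    root: dict = {"type": "object", "properties": {}}
    for path in paths:
        parts = path.split(".")  # assumption: "." separates nesting levels
        node = root
        # Create (or reuse) an object node for each intermediate segment.
        for part in parts[:-1]:
            node = node["properties"].setdefault(part, {"type": "object", "properties": {}})
        # Use the inferred leaf type when one is known for this full path.
        node["properties"][parts[-1]] = {"type": leaf_type_map.get(path, "string")}
    return root


schema = build_schema_tree(
    ["item.query", "item.context.tags"],
    leaf_type_map={"item.context.tags": "array"},
)
print(schema)
```

Without the map, every leaf falls back to `"string"`, which is the pre-fix behaviour this bullet describes for nested paths.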
* Apply black formatting to pass CI checks
Use line-length=120 from eng/black-pyproject.toml config.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 233f129 commit 6b06163
4 files changed
Lines changed: 416 additions & 13 deletions
File tree
- sdk/evaluation/azure-ai-evaluation
- azure/ai/evaluation/_evaluate
- tests/unittests
Lines changed: 14 additions & 1 deletion
Diff hunks near original lines 1103, 2030 and 2503 (diff content not captured).
Lines changed: 50 additions & 9 deletions
Diff hunks near original lines 590, 629, 715, 754 and 816 (diff content not captured).
Lines changed: 178 additions & 3 deletions
Diff hunks near original lines 171, 297, 437, 464, 485, 504 and 600 (diff content not captured).