Commit a31a7f3
Update Tool Call Accuracy to output unified format (#4930)
* Unify the output of Tool Call Accuracy
* Add status to prompty
* Update Tool Call Accuracy Output Format
* Update documentation to state that the 'gpt_' prefix is deprecated
Co-authored-by: Copilot <copilot@github.com>
* Rename not_applicable to pass in tool_call_accuracy result key and update tests (#4964)
Agent-Logs-Url: https://github.com/Azure/azureml-assets/sessions/ba5b2838-661b-419e-9645-b960cc227d25
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
* Use response-specific tool definitions in function_call/mcp_approval tests (#4971)
Agent-Logs-Url: https://github.com/Azure/azureml-assets/sessions/0d2db933-6e9b-4b8d-b1a9-789026ec14c8
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
* Bump tool_call_accuracy evaluator version to 9
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
1 parent 5020434 commit a31a7f3
8 files changed
Lines changed: 158 additions & 39 deletions
File tree
- assets/evaluators
- builtin/tool_call_accuracy
- evaluator
- tests
- common
- test_evaluators_behavior
- test_evaluators_quality
Lines changed: 37 additions & 26 deletions
Lines changed: 17 additions & 4 deletions
Lines changed: 3 additions & 3 deletions
Lines changed: 13 additions & 2 deletions
Lines changed: 5 additions & 4 deletions
Lines changed: 53 additions & 0 deletions
Lines changed: 15 additions & 0 deletions
Lines changed: 15 additions & 0 deletions