
Commit 46be034

feat(llm-obs): added report evaluator integration from pydantic (#17032)
## Description

Add an integration for Pydantic AI report evaluators that return scalars (i.e. return a `ScalarResult` object) so they can be submitted as summary evaluators in the experiments SDK.

## Testing

Pydantic summary report evaluators work in experiments. I used this [test script](https://github.com/DataDog/llm-obs/pull/245/changes#diff-ab2adacd550f0dd442d7bcdbf2a73ed6bb5064ad6f5ad5ddbe7a19ec45875df8), which generated [this experiment](https://dd.datad0g.com/llm/experiments/0a70a7d8-215b-44a5-9b12-cbd045d99d6b?project=jenn-test-pydantic-evaluators).

Unit tests were also added, confirming that both sync and async Pydantic summary evaluators work and that unsupported types (i.e. non-`ScalarResult` types) fail as expected.

## Risks

None

## Additional Notes

Co-authored-by: jennifer.mickel <jennifer.mickel@datadoghq.com>
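The description states the contract: a report evaluator (sync or async) whose result is a scalar becomes a summary evaluation, and any other result type is rejected. This sketch illustrates that contract only; it is not the ddtrace implementation. `StubScalarResult`, `run_report_evaluator_as_summary`, the list-of-scores "report" shape, and the example evaluators are all hypothetical, and the real pydantic-evals `ScalarResult` and ddtrace experiments APIs may look different.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Union

Number = Union[int, float]


@dataclass
class StubScalarResult:
    """Hypothetical stand-in for pydantic-evals' ScalarResult; the real class's fields may differ."""

    title: str
    value: Number


def run_report_evaluator_as_summary(evaluator: Callable[[Any], Any], report: Any) -> Number:
    """Hypothetical adapter: run a sync or async report evaluator and return its scalar value."""
    result = evaluator(report)
    # Async evaluators return a coroutine; drive it to completion.
    # Note: asyncio.run() raises RuntimeError if an event loop is already running in this thread.
    if inspect.iscoroutine(result):
        result = asyncio.run(result)
    # Only scalar results map onto a single summary-evaluation value; reject anything else.
    if not isinstance(result, StubScalarResult):
        raise TypeError(f"unsupported report evaluator result: {type(result).__name__}")
    return result.value


# Example evaluators over a "report" that is simply a list of per-row scores (hypothetical shape).
def mean_score(report: list) -> StubScalarResult:
    return StubScalarResult(title="mean_score", value=sum(report) / len(report))


async def max_score(report: list) -> StubScalarResult:
    return StubScalarResult(title="max_score", value=max(report))


def score_table(report: list) -> dict:
    # Not a scalar: the adapter refuses this.
    return {"rows": report}


if __name__ == "__main__":
    scores = [0.5, 1.0, 0.75]
    print(run_report_evaluator_as_summary(mean_score, scores))  # 0.75
    print(run_report_evaluator_as_summary(max_score, scores))   # 1.0
    try:
        run_report_evaluator_as_summary(score_table, scores)
    except TypeError as exc:
        print(exc)  # unsupported report evaluator result: dict
```

Raising `TypeError` for non-scalar results mirrors the failure case the unit tests are described as checking; the commit does not say which exception type the actual integration uses.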
1 parent ee88ac2 commit 46be034

17 files changed: 808 additions, 974 deletions

.riot/requirements/151ddb8.txt

Lines changed: 0 additions & 126 deletions
This file was deleted.
Lines changed: 37 additions & 54 deletions
@@ -2,59 +2,48 @@
 # This file is autogenerated by pip-compile with Python 3.11
 # by the following command:
 #
-# pip-compile --allow-unsafe --no-annotate .riot/requirements/aed405b.in
+# pip-compile --allow-unsafe --no-annotate .riot/requirements/17d690f.in
 #
 aiohappyeyeballs==2.6.1
-aiohttp==3.13.3
+aiohttp==3.13.5
 aiosignal==1.4.0
 annotated-doc==0.0.4
 annotated-types==0.7.0
 anyio==4.13.0
-appdirs==1.4.4
 attrs==26.1.0
 backoff==2.2.1
-boto3==1.42.77
-botocore==1.42.77
+boto3==1.42.80
+botocore==1.42.80
 certifi==2026.2.25
 cffi==2.0.0
 charset-normalizer==3.4.6
 click==8.3.1
-colorama==0.4.6
 coverage[toml]==7.13.5
 cryptography==46.0.6
-dataclasses-json==0.6.7
-datasets==4.8.4
-deepeval==3.9.3
-dill==0.4.1
+deepeval==3.9.5
 distro==1.9.0
 docstring-parser==0.17.0
 execnet==2.1.2
-filelock==3.25.2
 frozenlist==1.8.0
-fsspec[http]==2026.2.0
 genai-prices==0.0.56
-google-api-core[grpc]==2.30.0
+google-api-core[grpc]==2.30.1
 google-auth[requests]==2.49.1
-google-cloud-aiplatform==1.143.0
-google-cloud-bigquery==3.40.1
-google-cloud-core==2.5.0
+google-cloud-aiplatform==1.144.0
+google-cloud-bigquery==3.41.0
+google-cloud-core==2.5.1
 google-cloud-resource-manager==1.17.0
 google-cloud-storage==3.10.1
 google-crc32c==1.8.0
-google-genai==1.68.0
-google-resumable-media==2.8.0
+google-genai==1.70.0
+google-resumable-media==2.8.2
 googleapis-common-protos[grpc]==1.73.1
-griffe==2.0.2
-griffecli==2.0.2
 griffelib==2.0.2
-grpc-google-iam-v1==0.14.3
-grpcio==1.78.0
-grpcio-status==1.78.0
+grpc-google-iam-v1==0.14.4
+grpcio==1.80.0
+grpcio-status==1.80.0
 h11==0.16.0
-hf-xet==1.4.2
 httpcore==1.0.9
 httpx==0.28.1
-huggingface-hub==1.8.0
 hypothesis==6.45.0
 idna==3.11
 importlib-metadata==8.7.1
@@ -64,50 +53,47 @@ jiter==0.13.0
 jmespath==1.1.0
 jsonpatch==1.33
 jsonpointer==3.1.1
-langchain==0.2.17
-langchain-community==0.2.19
-langchain-core==0.2.43
-langchain-openai==0.1.25
-langchain-text-splitters==0.2.4
-langsmith==0.1.147
-logfire-api==4.30.0
+langchain==1.2.14
+langchain-core==1.2.23
+langgraph==1.1.4
+langgraph-checkpoint==4.0.1
+langgraph-prebuilt==1.0.8
+langgraph-sdk==0.3.12
+langsmith==0.7.23
+logfire-api==4.31.0
 markdown-it-py==4.0.0
 markupsafe==3.0.3
-marshmallow==3.26.2
 mdurl==0.1.2
 mock==5.2.0
 multidict==6.7.1
-multiprocess==0.70.19
-mypy-extensions==1.1.0
 nest-asyncio==1.6.0
-numpy==1.26.4
-openai==1.109.1
+numpy==2.4.4
+openai==2.30.0
 opentelemetry-api==1.40.0
 opentelemetry-sdk==1.40.0
 opentelemetry-semantic-conventions==0.61b0
 opentracing==2.4.0
-orjson==3.11.7
-packaging==24.2
-pandas==3.0.1
+orjson==3.11.8
+ormsgpack==1.12.2
+packaging==26.0
+pandas==3.0.2
 pluggy==1.6.0
 portalocker==3.2.0
 posthog==5.4.0
 propcache==0.4.1
 proto-plus==1.27.2
 protobuf==6.33.6
-pyarrow==23.0.1
 pyasn1==0.6.3
 pyasn1-modules==0.4.2
 pycparser==3.0
 pydantic==2.12.5
-pydantic-ai-slim==1.30.1
+pydantic-ai-slim==1.75.0
 pydantic-core==2.41.5
-pydantic-evals==1.30.1
-pydantic-graph==1.30.1
+pydantic-evals==1.75.0
+pydantic-graph==1.75.0
 pydantic-settings==2.13.1
 pyfiglet==1.0.4
-pygments==2.19.2
-pysbd==0.3.4
+pygments==2.20.0
 pytest==9.0.2
 pytest-asyncio==0.21.1
 pytest-cov==7.1.0
@@ -118,34 +104,31 @@ pytest-xdist==3.8.0
 python-dateutil==2.9.0.post0
 python-dotenv==1.2.2
 pyyaml==6.0.3
-ragas==0.1.21
-regex==2026.2.28
-requests==2.33.0
+requests==2.33.1
 requests-toolbelt==1.0.0
 rich==14.3.3
 s3transfer==0.16.0
-sentry-sdk==2.56.0
+sentry-sdk==2.57.0
 shellingham==1.5.4
 six==1.17.0
 sniffio==1.3.1
 sortedcontainers==2.4.0
-sqlalchemy==2.0.48
 tabulate==0.9.0
-tenacity==8.5.0
-tiktoken==0.12.0
+tenacity==9.1.4
 tqdm==4.67.3
 typer==0.24.1
 typing-extensions==4.15.0
-typing-inspect==0.9.0
 typing-inspection==0.4.2
 urllib3==2.6.3
+uuid-utils==0.14.1
 vcrpy==8.1.1
 websockets==16.0
 wheel==0.46.3
 wrapt==2.1.2
 xxhash==3.6.0
 yarl==1.23.0
 zipp==3.23.0
+zstandard==0.25.0

 # The following packages are considered to be unsafe in a requirements file:
 setuptools==82.0.1
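A lockfile diff like this one mixes routine patch bumps with larger changes (for example langchain moving from 0.2.x to 1.2.x and numpy from 1.x to 2.x, with ragas removed and langgraph added). A small script can classify pins as added, removed, or re-pinned and flag leading-version changes. This is a hypothetical helper, not part of pip-tools or riot: it only understands exact `name==version` lines, lowercases names without full PEP 503 normalization, and treats a change in the first version component as a rough proxy for a breaking upgrade.

```python
from typing import Dict, Iterable, List, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Map package name (lowercased, extras kept) to version for exact 'name==version' pins."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and pip-compile comment headers.
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if sep:  # ignore anything that is not an exact pin
            pins[name.strip().lower()] = version.strip()
    return pins


def classify_changes(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Return (added, removed, repinned); repinned holds (name, old_version, new_version)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    repinned = sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )
    return added, removed, repinned


def major_bumps(repinned: List[Tuple[str, str, str]]) -> List[str]:
    """Names whose leading version component changed (a rough heuristic, not semver-aware)."""
    return [name for name, before, after in repinned if before.split(".")[0] != after.split(".")[0]]


if __name__ == "__main__":
    # A few pins taken from the diff in this commit.
    before = ["attrs==26.1.0", "langchain==0.2.17", "numpy==1.26.4", "pydantic-evals==1.30.1", "ragas==0.1.21"]
    after = ["attrs==26.1.0", "langchain==1.2.14", "langgraph==1.1.4", "numpy==2.4.4", "pydantic-evals==1.75.0"]
    added, removed, repinned = classify_changes(parse_pins(before), parse_pins(after))
    print("added:", added)                        # ['langgraph']
    print("removed:", removed)                    # ['ragas']
    print("major bumps:", major_bumps(repinned))  # ['langchain', 'numpy']
```

Under this heuristic, the pydantic-evals move from 1.30.1 to 1.75.0 is not flagged even though it is the bump this commit depends on, which is why the output is best read as a starting point for review rather than a compatibility verdict.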
