
Commit 0ca57bd

coreyjadams authored and kashif committed
Testmon db env (#1655)
* Enable a lock file to ensure clean dep resolution of ci specific deps.
* make
* Update with real ci-requirements.lock
* Update to strip out machinery to create the lock file
* Remove errant git ignore update.

Signed-off-by: Kashif Rasul <kashif.rasul@gmail.com>
1 parent 6fe67e8 commit 0ca57bd

6 files changed

Lines changed: 553 additions & 5 deletions


.github/CACHE_CONTRACT.md

Lines changed: 54 additions & 0 deletions
@@ -82,6 +82,60 @@ recorded. Switching to a `-latest` mutable slot via `replace-cache`
 fixes the save bug, and the embedded verify step turns any future
 silent save failure into a hard job failure.
 
+#### Why a separate `ci-requirements.lock`
+
+The cache fix above only addresses *saving* the DB; the DB is still
+worthless to PRs if testmon's environment fingerprint at PR time
+differs from the fingerprint stored at nightly time. Testmon
+computes that fingerprint from `importlib.metadata.distributions()`
+over the active venv -- i.e. *everything* in
+`.venv/lib/python3.12/site-packages`, not just the lockfile-pinned
+closure.
+
+`setup-uv-env` builds the venv in two layered steps:
+
+1. `uv sync --frozen --group dev --extra <EXTRAS_TAG>` -- deterministic
+against `uv.lock`.
+2. `uv pip install -r .github/ci-requirements.txt` -- adds CI-only
+test deps that have no home in pyproject extras (moto,
+scikit-image, numpy-stl, shapely, multi-storage-client, tensorstore,
+plus the PyG CUDA wheel swap).
+
+Step 2 is *not* covered by `uv.lock`. Several of the direct pins in
+`ci-requirements.txt` are absent from `uv.lock` entirely, so their
+transitive closure (`responses`, `xmltodict`, `jsonpath-ng`,
+`lazy-loader`, `tifffile`, `pywavelets`, `imageio`,
+`antlr4-python3-runtime`, ...) gets re-resolved fresh against PyPI on
+every job. A single transitive minor bump between the nightly that
+publishes the testmon DB and the PR that consumes it changes the
+sorted `name version` string testmon hashes, trips its
+"packages installed have been changed" guard, and re-runs the entire
+suite.
+
+[`.github/ci-requirements.lock`](ci-requirements.lock) is a fully
+pinned closure of `ci-requirements.txt` (direct + transitive), passed
+to the layered install via `--constraint`. It is generated by
+[`.github/regen-ci-deps-lock.sh`](regen-ci-deps-lock.sh), and must be
+regenerated and committed whenever a `==` pin in
+`ci-requirements.txt` changes.
+
+Two ways to run the regen:
+
+1. **Standalone [`Regen CI-deps Lock`](workflows/regen-ci-deps-lock.yml)
+workflow** (workflow_dispatch). Runs the regen on a CPU runner
+in ~5 min and uploads `.github/ci-requirements.lock` as an
+artifact for the maintainer to download and commit. Requires
+the workflow file to be on the default branch (GitHub refuses
+workflow_dispatch on files that exist only on feature branches --
+both the UI dropdown and `gh workflow run --ref` enforce this).
+
+2. **Local docker.** See the header of
+[`.github/regen-ci-deps-lock.sh`](regen-ci-deps-lock.sh) for the
+`docker run …` invocation. Useful when iterating on the script
+itself or when the standalone workflow is unavailable (e.g. a
+feature branch where the workflow file has not yet landed on
+the default branch).
+
 ### Coverage baseline cache (`.coverage*`)
 
 | Property | Value |

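The fingerprint behaviour described in the CACHE_CONTRACT.md hunk can be illustrated with a short Python sketch. It is not testmon's code: it only assumes, as the text says, that the fingerprint is a hash over the sorted `name version` strings of every distribution reported by `importlib.metadata.distributions()`. The function names `installed_packages` and `env_fingerprint`, the SHA-256 choice, and the `, ` separator are illustrative assumptions.

```python
import hashlib
from importlib.metadata import distributions


def installed_packages():
    """(name, version) pairs for every distribution visible in the active environment."""
    pairs = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a Name field
            pairs.add((name.lower(), dist.version))
    return pairs


def env_fingerprint(packages):
    """Hypothetical fingerprint: SHA-256 of the sorted 'name version' strings.

    testmon's real serialization and hash may differ; only the idea that
    every installed package contributes is taken from CACHE_CONTRACT.md.
    """
    joined = ", ".join(sorted(f"{name} {version}" for name, version in packages))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Nightly vs PR environments that differ only in one transitive pin
# (0.26.1 is a made-up version used purely for illustration).
nightly = {("moto", "5.2.1"), ("responses", "0.26.0"), ("xmltodict", "1.0.4")}
pr = {("moto", "5.2.1"), ("responses", "0.26.1"), ("xmltodict", "1.0.4")}
print(env_fingerprint(nightly) == env_fingerprint(pr))  # False
print(env_fingerprint(installed_packages())[:12])  # fingerprint prefix of the current venv
```

Because every installed distribution feeds the hash, a bump to a transitive such as `responses` changes the fingerprint just as a bump to a direct pin would, which is why `==`-pinning only the direct deps was not enough.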
.github/actions/setup-uv-env/action.yml

Lines changed: 13 additions & 1 deletion
@@ -172,7 +172,8 @@ runs:
 # uv.lock remains the single source of truth for the synced
 # environment; this step layers on top without invalidating that
 # guarantee. Pin list and per-pin rationale live in
-# .github/ci-requirements.txt.
+# .github/ci-requirements.txt; the pinned transitive closure lives in
+# .github/ci-requirements.lock (see .github/regen-ci-deps-lock.sh).
 - name: Install CI-only test dependencies
 shell: bash
 env:
@@ -190,11 +191,22 @@ runs:
 # currently latest on PyPI and breaking ABI compatibility with
 # the lockfile-pinned GPU stack (cudf / cuml / pylibcudf were
 # built against the synced pyarrow ABI, etc.).
+#
+# `--constraint .github/ci-requirements.lock` pins the transitive
+# closure of the CI-only deps so testmon's `system_packages`
+# fingerprint matches between the nightly that builds the testmon
+# DB and the PR that consumes it. Without this, unpinned
+# transitives of moto / scikit-image / multi-storage-client /
+# tensorstore / etc. resolve to whatever's latest on PyPI at job
+# time and invalidate the entire testmon cache. Regenerate the
+# lock via .github/regen-ci-deps-lock.sh whenever a `==` pin in
+# .github/ci-requirements.txt changes.
 uv pip install --python .venv/bin/python \
 --reinstall-package torch_scatter \
 --reinstall-package torch_sparse \
 --reinstall-package torch_cluster \
 --reinstall-package pyg_lib \
+--constraint .github/ci-requirements.lock \
 -r .github/ci-requirements.txt
 echo "::endgroup::"
 

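A cheap way to confirm that `--constraint` actually held is to compare the venv against the lock once the layered install finishes. Nothing in this commit adds such a check; the sketch below is a hypothetical helper that assumes the lock contains only `name==version` lines plus `#` comments (as `.github/ci-requirements.lock` in this commit does) and compares version strings literally rather than with full PEP 440 semantics.

```python
import re
from importlib.metadata import distributions

# A pin line looks like `name==version`; uv's `# via` annotations start with '#'.
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==(\S+)")


def normalize(name):
    """PEP 503 normalization: runs of '-', '_' and '.' collapse to '-', lower-cased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def read_lock_pins(text):
    """Map normalized name -> pinned version, skipping comment and blank lines."""
    pins = {}
    for line in text.splitlines():
        match = _PIN.match(line.strip())
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def installed_versions():
    """Map normalized name -> version for every distribution in the active environment."""
    versions = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:
            versions[normalize(name)] = dist.version
    return versions


def find_drift(pins, installed):
    """Return {name: (pinned, installed_or_None)} for every pin the environment does not match."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


def main(lock_path=".github/ci-requirements.lock"):
    """Hypothetical CI step: print mismatches and return a non-zero exit code on drift."""
    with open(lock_path, encoding="utf-8") as handle:
        drift = find_drift(read_lock_pins(handle.read()), installed_versions())
    for name, (pinned, actual) in sorted(drift.items()):
        print(f"{name}: lock={pinned} installed={actual}")
    return 1 if drift else 0
```

Run with `.venv/bin/python` from the repository root (for example via `sys.exit(main())`), a non-empty report would mean the venv no longer matches the lock, so the testmon fingerprint is unlikely to match the nightly's.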
.github/ci-requirements.lock

Lines changed: 263 additions & 0 deletions
@@ -0,0 +1,263 @@
+# This file was autogenerated by uv via the following command:
+#    uv pip compile --python .venv/bin/python --constraint /tmp/tmp.PZXhTSZc0F --output-file .github/ci-requirements.lock .github/ci-requirements.txt
+attrs==26.1.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   jsonschema
+    #   referencing
+boto3==1.43.11
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto
+    #   multi-storage-client
+botocore==1.43.11
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   boto3
+    #   moto
+    #   s3transfer
+bracex==2.6
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   wcmatch
+certifi==2026.2.25
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   requests
+cffi==2.0.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   cryptography
+    #   xattr
+charset-normalizer==3.4.7
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   requests
+cryptography==46.0.7
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto
+filelock==3.29.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+idna==3.11
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   requests
+imageio==2.37.3
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   scikit-image
+jmespath==1.1.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   boto3
+    #   botocore
+    #   multi-storage-client
+jsonschema==4.26.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+jsonschema-specifications==2025.9.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   jsonschema
+lark==1.3.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+lazy-loader==0.5
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   scikit-image
+markupsafe==3.0.3
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   werkzeug
+ml-dtypes==0.5.4
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   tensorstore
+moto==5.2.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+multi-storage-client==0.48.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+networkx==3.6.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   scikit-image
+numpy==2.2.6
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+    #   imageio
+    #   ml-dtypes
+    #   numpy-stl
+    #   scikit-image
+    #   scipy
+    #   shapely
+    #   tensorstore
+    #   tifffile
+numpy-stl==3.2.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+opentelemetry-api==1.42.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+packaging==25.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   lazy-loader
+    #   scikit-image
+pillow==12.2.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   imageio
+    #   scikit-image
+prettytable==3.17.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+psutil==7.2.2
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+py-partiql-parser==0.6.3
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto
+pyarrow==24.0.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+pycparser==3.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   cffi
+pyg-lib==0.6.0+pt211cu128
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+python-dateutil==2.9.0.post0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   botocore
+    #   multi-storage-client
+python-utils==3.9.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   numpy-stl
+pyyaml==6.0.3
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto
+    #   multi-storage-client
+    #   responses
+referencing==0.37.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   jsonschema
+    #   jsonschema-specifications
+requests==2.33.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto
+    #   responses
+responses==0.26.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto
+rpds-py==0.30.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   jsonschema
+    #   referencing
+s3transfer==0.17.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   boto3
+scikit-image==0.26.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+scipy==1.17.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   scikit-image
+    #   torch-cluster
+    #   torch-sparse
+shapely==2.1.2
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+six==1.17.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   python-dateutil
+tensorstore==0.1.83
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+tifffile==2026.5.15
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   scikit-image
+torch-cluster==1.6.3+pt211cu128
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+torch-scatter==2.1.2+pt211cu128
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+torch-sparse==0.6.18+pt211cu128
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   -r .github/ci-requirements.txt
+tqdm==4.67.3
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+typing-extensions==4.15.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   opentelemetry-api
+    #   python-utils
+    #   referencing
+tzdata==2025.3
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+urllib3==2.6.3
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   botocore
+    #   requests
+    #   responses
+wcmatch==10.1
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+wcwidth==0.6.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   prettytable
+werkzeug==3.1.8
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto
+xattr==1.3.0
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   multi-storage-client
+xmltodict==1.0.4
+    # via
+    #   -c /tmp/tmp.PZXhTSZc0F
+    #   moto

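Each entry's `# via` block records why uv pulled it in: `-r .github/ci-requirements.txt` marks a direct pin, bare package names are the packages that require it, and the `-c /tmp/...` line appears on every entry, presumably because each package was also listed in the temporary constraint file given to `uv pip compile` (see the header command). A small parser over those annotations can answer which direct pin is responsible for a drifting transitive. The sketch below is hypothetical, not a repository tool, and assumes uv's split annotation layout as it appears in this file.

```python
import re
from collections import defaultdict

_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==")


def parse_via(text):
    """Return (direct, parents) from a uv-compiled lock.

    direct:  packages annotated with `-r <requirements file>`
    parents: package -> set of packages listed under its `# via`
    """
    direct, parents, current = set(), defaultdict(set), None
    for raw in text.splitlines():
        line = raw.strip()
        pin = _PIN.match(line)
        if pin:
            current = pin.group(1).lower()
            continue
        if current is None or not line.startswith("#"):
            continue  # file header or blank line
        entry = line.lstrip("#").strip()
        if entry == "via":
            continue  # header of a split `# via` block
        if entry.startswith("via "):
            entry = entry[4:].strip()  # single-line `# via pkg` form
        if entry.startswith("-c "):
            continue  # constraint-file annotation, not a dependency edge
        if entry.startswith("-r "):
            direct.add(current)
        elif entry:
            parents[current].add(entry.lower())
    return direct, parents


def direct_roots(name, direct, parents):
    """Walk `via` edges upward and collect the direct pins that pull `name` in."""
    roots, stack, seen = set(), [name.lower()], set()
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        if pkg in direct:
            roots.add(pkg)
        stack.extend(parents.get(pkg, ()))
    return roots
```

Applied to the lock above, `direct_roots("urllib3", ...)` should report `moto` and `multi-storage-client`: `urllib3` is required by `botocore`, `requests` and `responses`, and all three lead back to those two direct pins.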
.github/ci-requirements.txt

Lines changed: 16 additions & 4 deletions
@@ -9,10 +9,22 @@
 # upstream release between the nightly that builds the testmon DB and
 # the PR that consumes it would invalidate the entire cache. Bumping a
 # pin is a deliberate change: the next nightly repopulates the DB
-# against the new fingerprint and PRs start hitting again. Pinning
-# here only stabilises *direct* deps; transitive churn (e.g. boto3 ->
-# urllib3) can still trigger invalidations, in which case the next
-# escalation is a constraints.txt passed via `uv pip install -c ...`.
+# against the new fingerprint and PRs start hitting again.
+#
+# Why a companion `.github/ci-requirements.lock`:
+# `==`-pinning the direct deps stabilises only the lines listed here.
+# Several of them (moto, scikit-image, numpy-stl, shapely,
+# multi-storage-client, tensorstore) are NOT in uv.lock, so their
+# transitive closure (responses, xmltodict, jsonpath-ng, lazy-loader,
+# tifffile, pywavelets, imageio, antlr4-python3-runtime, ...) gets
+# resolved fresh against PyPI on every job and drifts between the
+# nightly that builds the testmon DB and the PR that consumes it. The
+# companion `.github/ci-requirements.lock` (passed via `-c` in
+# setup-uv-env) pins that closure so the fingerprint matches.
+# Regenerate it whenever any `==` pin below changes; the three regen
+# paths (nightly artifact, standalone workflow, local docker) are
+# documented in .github/CACHE_CONTRACT.md under
+# "Why a separate ci-requirements.lock".
 #
 # Coupling notes (bump in lockstep):
 # * Dockerfile lines 240-243 still use `>=` lower bounds for the

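Keeping the two files in step is a manual rule ("regenerate it whenever any `==` pin below changes"). A hypothetical guard, not part of this commit, could flag direct pins whose version the lock does not carry. The sketch assumes plain `name==version` or `name[extra]==version` lines, skips option lines such as `-f` or `--extra-index-url`, and compares strings literally; PEP 440 treats `==2.1.2` as matching `2.1.2+pt211cu128`, so a fuller check would use `packaging.specifiers.SpecifierSet` instead.

```python
import re

# `name==version` or `name[extra]==version`, optionally followed by `; marker`.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def normalize(name):
    """PEP 503 normalization so `torch_scatter` and `torch-scatter` compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def exact_pins(text):
    """Collect `==` pins, skipping comments and option lines (-r, -c, -f, --extra-index-url, ...)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "-")):
            continue
        match = _REQ.match(line)
        if match:  # `>=`, `~=` and unpinned lines fall through unmatched
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def stale_lock_entries(requirements_text, lock_text):
    """Return {name: (requirements_pin, lock_pin_or_None)} for pins the lock does not carry."""
    lock = exact_pins(lock_text)
    return {
        name: (version, lock.get(name))
        for name, version in exact_pins(requirements_text).items()
        if lock.get(name) != version
    }
```

Fed the contents of `.github/ci-requirements.txt` and `.github/ci-requirements.lock`, an empty result would mean the lock still agrees with every direct pin; any entry signals that the lock needs regenerating via `.github/regen-ci-deps-lock.sh`.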