ci: add .venv caching step to speed up dep install#4516

Draft
isaac-chung wants to merge 7 commits into main from cache-deps-in-ci

@isaac-chung
Collaborator

@isaac-chung isaac-chung commented Apr 26, 2026

Currently, a simple job like linting takes 14 minutes to run, of which 13 minutes are spent installing dependencies, e.g. https://github.com/embeddings-benchmark/mteb/actions/runs/24953209689/job/73066896371?pr=4515

Fix: use hashFiles('uv.lock') and other params to form a cache key to cache .venv for applicable workflows.

The first CI run will take the same amount of time as before. The second run (given no changes to uv.lock) should be much faster with cache read.
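As a rough illustration of why the second run can hit the cache, here is a minimal Python sketch of a key that depends on the lockfile content. It only mimics the idea behind `hashFiles('uv.lock')`; the exact hashing scheme GitHub uses is not reproduced, and the function name and the `-venv-` key layout are assumptions for illustration, not the workflow's actual key format.

```python
import hashlib


def venv_cache_key(lock_bytes: bytes, runner_os: str, suffix: str) -> str:
    """Build a cache key that changes whenever the lockfile content changes.

    Hypothetical layout: <os>-venv-<suffix>-<sha256 of uv.lock>.
    """
    digest = hashlib.sha256(lock_bytes).hexdigest()
    return f"{runner_os}-venv-{suffix}-{digest}"


if __name__ == "__main__":
    from pathlib import Path

    lock = Path("uv.lock")
    if lock.exists():
        # Same uv.lock content gives the same key, so a later run can restore .venv.
        print(venv_cache_key(lock.read_bytes(), "Linux", "py3.10-install"))
```

Any edit to `uv.lock` produces a different digest and therefore a cache miss, which is the behaviour the description relies on.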

Details

| Workflow | Cache key suffix | Notes |
| --- | --- | --- |
| dataset_loading | py3.11-install | make install |
| documentation | py3.10-docs | uv sync --group docs |
| lint | py3.10-install | make install |
| model_loading | py3.10-model-loading | uv sync --extra pylate --group dev |
| test | py${{ matrix.python-version }}-install-for-tests | Matrix across 3.10-3.14 + Windows |
| typechecking | py3.10-install-for-tests | Shares cache with test py3.10/ubuntu |
| reference_models | py3.11-install-for-tests | Shares cache with test py3.11/ubuntu |
| leaderboard_build | py3.10-leaderboard | uv run --group test --extra leaderboard |
| update_leaderboard_models | py3.10-sync | Bare uv sync |

Skipped: test_lowest (uses --resolution lowest-direct, which deliberately ignores the lockfile, so caching would defeat its purpose), release, leaderboard_docker, leaderboard_refresh, leaderboard_healthcheck, hf_space_docker, stale_pr (no Python deps installed on runner).

@isaac-chung isaac-chung changed the title from "add .venv caching step to speed up dep install" to "ci: add .venv caching step to speed up dep install" Apr 26, 2026
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

We would probably need a test to check that uv.lock is up to date then (which I don't think it currently is).
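One possible shape for such a check, sketched in Python: it shells out to `uv lock --check`, which exits non-zero when `uv.lock` does not match `pyproject.toml`. The function names and the injectable `run` parameter are hypothetical, added only so the logic can be exercised without uv installed; this is not a check that exists in the repository.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def lock_is_up_to_date(run: Callable[[Sequence[str]], int] = _run) -> bool:
    """Return True when `uv lock --check` reports the lockfile is current."""
    return run(["uv", "lock", "--check"]) == 0


if __name__ == "__main__":
    # In CI this could fail the job early when the lockfile is stale.
    raise SystemExit(0 if lock_is_up_to_date() else 1)
```

Running this before restoring a `.venv` cache keyed on `uv.lock` would catch the case where the lockfile lags behind `pyproject.toml`.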

@KennethEnevoldsen
Contributor

I think it might be better to use the uv cache: https://docs.astral.sh/uv/guides/integration/github/#caching

It just caches the dependencies and points to them during setup. So you never use an incorrect environment, but you have to "build" your environment every time (which is just setting pointers in uv, so almost instant).

@Samoed
Member

Samoed commented Apr 26, 2026

> We would probably need a test to check that uv.lock is up to date then (which I don't think it currently is)

We can run it after each new mteb version, but this solves the problem only partly.

@isaac-chung
Collaborator Author

Each .venv cache is ~3.8 GB. GitHub Actions has a 10 GB cache limit per repository. With multiple jobs each creating separate venv cache keys (lint, tests, docs, model-loading, etc.), they evict each other almost immediately — only ~2-3 can exist at once.
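The eviction problem follows from simple arithmetic, sketched below using the two figures quoted above (about 3.8 GB per cache, 10 GB per repository). The count of seven distinct keys is an assumption read off the workflow table, before the test matrix is expanded, and the function names are hypothetical.

```python
import math


def max_resident_caches(limit_gb: float, cache_gb: float) -> int:
    """How many caches of a given size fit under the repository limit."""
    return math.floor(limit_gb / cache_gb)


def total_demand_gb(num_keys: int, cache_gb: float) -> float:
    """Total storage needed if every cache key were kept at once."""
    return num_keys * cache_gb


if __name__ == "__main__":
    print(max_resident_caches(10, 3.8))  # 2
    # Assumed: at least 7 distinct keys, each roughly 3.8 GB.
    print(round(total_demand_gb(7, 3.8), 1))
```

With only two caches fitting at a time and several times the limit in demand, most jobs would restore nothing and then overwrite another job's entry.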

@Samoed
Member

Samoed commented Apr 26, 2026

Pip merged a standard py.lock implementation. You might be interested in it: pypa/pip#13334

@isaac-chung isaac-chung marked this pull request as draft April 26, 2026 19:08
@isaac-chung
Collaborator Author

    .venv/lib/python3.13/site-packages/sentence_transformers/base/modality.py:14: in <module>
        from sentence_transformers.base.modality_types import (
    .venv/lib/python3.13/site-packages/sentence_transformers/base/modality_types.py:16: in <module>
        from torchcodec.decoders import AudioDecoder, VideoDecoder
    .venv/lib/python3.13/site-packages/torchcodec/__init__.py:12: in <module>
        from . import decoders, encoders, samplers, transforms  # noqa
    .venv/lib/python3.13/site-packages/torchcodec/decoders/__init__.py:7: in <module>

Seems like sentence-transformers needs torchcodec now?

@Samoed
Member

Samoed commented Apr 26, 2026

No, it has a fallback: https://github.com/huggingface/sentence-transformers/blob/01cd5f5bf33ad7b6435cf08b1d7984ab59875c5c/sentence_transformers/base/modality_types.py#L15-L19

If torchcodec is installed incorrectly, it fails on every import. In CI, an incompatible combination of torch and torchcodec versions is installed:

 + torch==2.10.0
 + torchaudio==2.11.0
 + torchcodec==0.11.1

Torch 2.10 is compatible with torchcodec 0.10, not 0.11
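A mismatch like this could be flagged before the test suite runs. The sketch below is hypothetical: the only pairing it knows is the one stated in this comment (torch 2.10 with torchcodec 0.10), and the table and function names are made up for illustration rather than taken from either library.

```python
from typing import Optional

# Assumption: the single entry comes from the comment above; extend as needed.
KNOWN_PAIRS = {"2.10": "0.10"}


def minor_series(version: str) -> str:
    """Reduce '2.10.0' or '2.10.0+cu126' to its 'major.minor' series."""
    major, minor, *_ = version.split(".")
    return f"{major}.{minor}"


def torchcodec_matches_torch(torch_version: str, torchcodec_version: str) -> Optional[bool]:
    """Return True/False for known torch series, or None when the series is unknown."""
    expected = KNOWN_PAIRS.get(minor_series(torch_version))
    if expected is None:
        return None
    return minor_series(torchcodec_version) == expected


if __name__ == "__main__":
    # The combination reported from CI in this thread.
    print(torchcodec_matches_torch("2.10.0", "0.11.1"))  # False
```

In practice the installed versions could be read with `importlib.metadata.version("torch")` and `importlib.metadata.version("torchcodec")` and passed to this function.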

@Samoed
Member

Samoed commented Apr 30, 2026

Hm, after the changes here it seems all our caches were reset and now our dependencies are taking more than 20 minutes to install.

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Apr 30, 2026

> Hm, after the changes here it seems all our caches were reset and now our dependencies are taking more than 20 minutes to install.

On resolution? (We could just force it to use the lock file and then add a test ensuring that the lock is up to date.)

@Samoed
Member

Samoed commented Apr 30, 2026

I don't know why. In this branch everything seems normal, but our test CI jobs started taking 20 minutes to install dependencies (https://github.com/embeddings-benchmark/mteb/actions/runs/25146122299/job/73706301456). Maybe this is caused by the cache changes here, I don't know.

@Samoed
Member

Samoed commented Apr 30, 2026

In 2a73572, CI took ~15 min (https://github.com/embeddings-benchmark/mteb/actions/runs/24936417753/job/73022655878), but in f625341 it took 30 min (https://github.com/embeddings-benchmark/mteb/actions/runs/24954390462/job/73070030880).

Neither the CI config nor pyproject changed between them: 2a73572...f625341

3 participants