Commit 321dc7e

chore: add an AGENTS file (#227)
* chore: add an AGENTS file

Assisted-by: ClaudeCode:claude-opus-4.8
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>

* Apply suggestions from code review, ensuring HDF5 files are also known from now on

Co-authored-by: Eduardo Rodrigues <eduardo.rodrigues@cern.ch>

* Apply suggestions from code review

Co-authored-by: Eduardo Rodrigues <eduardo.rodrigues@cern.ch>

---------

Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Co-authored-by: Eduardo Rodrigues <eduardo.rodrigues@cern.ch>
1 parent d1b5117 commit 321dc7e

2 files changed

Lines changed: 56 additions & 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -117,3 +117,7 @@ ENV/
 dev/make-root/*.root
 dev/make-root/*.so
 dev/make-root/*.d
+
+# Symlink to AGENTS.md
+CLAUDE.md
+.claude/

AGENTS.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# AGENTS.md

`scikit-hep-testdata` distributes example HEP files (mostly ROOT and LHE) for testing downstream packages like `uproot` and `pylhe`. It is primarily a *data* package with a thin Python helper layer that resolves a filename to an absolute path, downloading and caching files on demand when they aren't present locally.

## Commands

```bash
pip install -e .[test]  # editable install pulls in test deps; also keeps data files local
pytest  # run the suite (config in pyproject.toml [tool.pytest])
pytest tests/test_local_files.py::test_data_path  # single test
prek -a --quiet  # lint (ruff + black + mypy), preferred over pre-commit run -a
uv run pytest  # if working inside a uv-managed env
```

Tests run with `filterwarnings = ["error"]`, so any warning fails the suite. mypy runs in strict mode over `src` only.

## How file resolution works

`data_path(filename, raise_missing=True, cache_dir=None)` in `local_files.py` is the entry point. The resolution order:

1. **Remote files** (`remote_files.is_known_remote`): scoped names like `cms_hep_2012_tutorial/data.root`, defined in `remote_datasets.yml`. Downloaded (and tar-extracted) on first access into the cache dir. See "Remote datasets" below.
2. **Local files**: names in `known_files`. If the file isn't physically present (sdist/wheel install strips data), it's downloaded from the `main` branch on GitHub raw and cached.
3. Otherwise raise `FileNotFoundError` (unless `raise_missing=False`).

Cache directory defaults to `~/.local/skhepdata` (`data.cache_path`), overridable via `cache_dir` / the CLI `--dir` flag.

`known_files` is loaded from `src/skhep_testdata/data/file_list.txt`, which is **generated by `setup.py` at build time** by scanning `data/` for `.root/.lhe/.gz/.json/.hdf5`. Don't hand-edit `file_list.txt`.
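
The resolution order described in this section can be modelled without touching the network or the package itself. The following is a minimal sketch, not the code in `local_files.py`: the function `resolve`, its set arguments, and the returned labels are hypothetical stand-ins, and the value returned when `raise_missing=False` is an assumption, since the text does not say what the real function returns in that case.

```python
def resolve(filename, remote_names, local_names, present_locally, raise_missing=True):
    """Return a (label, filename) pair describing how a name would be resolved.

    All arguments are plain sets of names; this only models the decision order.
    """
    # 1. Scoped remote names are checked first.
    if filename in remote_names:
        return ("remote-dataset", filename)
    # 2. Known local names: use the on-disk copy if present, else fetch and cache.
    if filename in local_names:
        if filename in present_locally:
            return ("local-file", filename)
        return ("download-and-cache", filename)
    # 3. Unknown names raise unless the caller opted out.
    if raise_missing:
        raise FileNotFoundError(filename)
    return ("unknown", filename)  # assumed placeholder for the opt-out case


# Hypothetical inventories, for illustration only.
remote_names = {"cms_hep_2012_tutorial/data.root"}
local_names = {"example.root", "stripped.lhe"}
present_locally = {"example.root"}

for name in ("cms_hep_2012_tutorial/data.root", "example.root", "stripped.lhe"):
    print(resolve(name, remote_names, local_names, present_locally))
```

The ordering matters: a scoped remote name is matched before the local list is consulted, and a known local name never raises, even when the file has been stripped from the installed package.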

## The data-stripping build (setup.py)

The package data files are large, so they are **excluded from the sdist/wheel by default**. `setup.py`:
- Generates `file_list.txt` from the contents of `data/`.
- Uses a custom `SDist` command and `exclude_package_data` to strip the actual data files unless `SKHEP_DATA=1` is set in the environment, or you do an editable install.

This is why an end user installing from PyPI gets the helper code + `file_list.txt` but downloads actual files lazily. Keep this dual local/remote behavior in mind when changing install or path-resolution logic.
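
The two build-time decisions above (which names go into `file_list.txt`, and whether data files ship) can be illustrated as follows. This is a sketch under stated assumptions rather than the contents of `setup.py`: `build_file_list` and `ship_data_files` are hypothetical names, the scan is assumed to be non-recursive and sorted, and the environment check is assumed to compare against the string `"1"`.

```python
from pathlib import Path

# Suffixes named in the section above.
DATA_SUFFIXES = (".root", ".lhe", ".gz", ".json", ".hdf5")


def build_file_list(data_dir):
    """Return the sorted names of files in data_dir that end in a data suffix."""
    return sorted(
        p.name
        for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.endswith(DATA_SUFFIXES)
    )


def ship_data_files(environ, editable=False):
    """Decide whether data files stay in the build (assumed rule, see lead-in)."""
    return editable or environ.get("SKHEP_DATA") == "1"
```

Under this model, a default PyPI build has the full name list but `ship_data_files({})` is false, which matches the lazy-download behavior described above.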

## Remote datasets

`remote_datasets.yml` maps a dataset name → `{url, files}`. `files` is either a list or a `{output_name: path_inside_tar}` map. `RemoteDatasetList` (a classmethod-only singleton with a `_all_files` class cache) flattens these into scoped `dataset/filename` keys. Tests use a separate `tests/test_remote_datasets.yml` loaded explicitly via `load_remote_configs(path)`.
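
To make the flattening concrete, here is a small sketch that works on an already-parsed config (a plain dict, so no YAML library is needed). It is not the `RemoteDatasetList` implementation: `flatten_datasets`, the dataset names, and the URLs are hypothetical, and list entries are assumed to use the same name inside and outside the archive.

```python
def flatten_datasets(config):
    """Map each scoped 'dataset/filename' key to a (url, path_inside_tar) pair."""
    flat = {}
    for dataset, spec in config.items():
        files = spec["files"]
        if isinstance(files, dict):
            # {output_name: path_inside_tar} form
            pairs = list(files.items())
        else:
            # list form: assume the output name equals the archive member name
            pairs = [(name, name) for name in files]
        for output_name, member in pairs:
            flat[f"{dataset}/{output_name}"] = (spec["url"], member)
    return flat


# Hypothetical config mirroring the two shapes `files` can take.
config = {
    "dataset_a": {
        "url": "https://example.org/a.tar.gz",
        "files": ["one.root", "two.root"],
    },
    "dataset_b": {
        "url": "https://example.org/b.tar.gz",
        "files": {"data.root": "inner/dir/raw.root"},
    },
}

for key in sorted(flatten_datasets(config)):
    print(key)
```

Keys built this way always contain the dataset name as a prefix, which is what lets a resolver tell a scoped remote name apart from a bare local filename.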

## Adding files

- Drop the file into `src/skhep_testdata/data/`. It becomes a "local" file automatically (extension must be `.root/.lhe/.gz/.json/.hdf5` to be picked up by the build). No code change needed.
- Large files (>~25 MB) should go to an external open-access repo and be wired in through `remote_datasets.yml` instead.
- Scripts/notes used to *generate* ROOT test files live in `dev/make-root/` (ROOT C++ macros and a few Python scripts).
- The `check-added-large-files` pre-commit hook will flag oversized additions.
- It is good practice to add a `.readme` file for files taken from elsewhere, using the same name(s) as the file(s) being added.

## Packaging notes

- Version is managed by `setuptools_scm` and written to `src/skhep_testdata/version.py` (generated; not committed/edited).
- Two console scripts (`scikit-hep-testdata`, `skhep-testdata`) both point at `skhep_testdata.__main__:main`, equivalent to `python -m skhep_testdata`.
