drivendataorg · chrisjkuch · Mar 1, 2026 · Mar 3, 2026
diff --git a/README.md b/README.md
@@ -44,6 +44,7 @@ The directory structure of your new project will look something like this (depen
 ├── LICENSE            <- Open-source license if one is chosen
 ├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
 ├── README.md          <- The top-level README for developers using this project.
+├── AGENTS.md          <- The top-level AGENTS file for AI coding agents.
 ├── data
 │   ├── external       <- Data from third party sources.
 │   ├── interim        <- Intermediate data that has been transformed.

diff --git a/docs/docs/index.md b/docs/docs/index.md
@@ -93,6 +93,7 @@ The directory structure of your new project will look something like this (depen
 ├── LICENSE            <- Open-source license if one is chosen
 ├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
 ├── README.md          <- The top-level README for developers using this project.
+├── AGENTS.md          <- The top-level AGENTS file for AI coding agents.
 ├── data
 │   ├── external       <- Data from third party sources.
 │   ├── interim        <- Intermediate data that has been transformed.

diff --git a/docs/docs/using-the-template.md b/docs/docs/using-the-template.md
@@ -155,6 +155,10 @@ Now you'll be able to [create a Pull Request in GitHub](https://docs.github.com/
 
 There's no magic in the `Makefile`. We often add project-specific commands or update the existing ones over the course of a project. For example, we've added scripts to generate reports with pandoc, build and serve documentation, publish static sites from assets, package code for distribution, and more.
 
+## Changing the `AGENTS.md` file
+
+There's no magic in the `AGENTS.md` file either (apart from the ✨magic✨ of AI). It holds instructions for any AI coding agents you use in your project. We often add project-specific guidance on how agents should work with the project's codebase and data. Best practices for instructing agents are evolving rapidly, so we recommend updating this file as you learn what works best for your project.
+
 ## Installing Make on Windows
 
 Unfortunately, GNU Make is not typically pre-installed on Windows. Here are a few different options for getting Make:

diff --git a/tests/test_creation.py b/tests/test_creation.py
@@ -41,6 +41,62 @@ def no_curlies(filepath):
     return not any(template_strings_in_file)
 
 
+def verify_agents_md(root, config):
+    """Test that AGENTS.md is correctly rendered for the given config."""
+    agents_md = (root / "AGENTS.md").read_text()
+
+    # Project name and module name are always rendered
+    assert config["project_name"] in agents_md
+    assert config["module_name"] in agents_md
+
+    # No unrendered Jinja2 template strings
+    assert no_curlies(root / "AGENTS.md")
+
+    # Code scaffold section conditionally included in project structure
+    if config["include_code_scaffold"] == "Yes":
+        assert "dataset.py" in agents_md
+        assert "features.py" in agents_md
+        assert "modeling/" in agents_md
+    else:
+        assert "dataset.py" not in agents_md
+        assert "features.py" not in agents_md
+
+    # Dataset storage conditionals
+    has_storage = "none" not in config["dataset_storage"]
+    if has_storage:
+        assert "sync_data_down" in agents_md
+        assert "sync_data_up" in agents_md
+    else:
+        assert "sync_data_down" not in agents_md
+        assert "sync_data_up" not in agents_md
+
+    # Environment manager
+    env_manager = config["environment_manager"]
+    if env_manager == "conda":
+        assert "conda activate" in agents_md
+    elif env_manager == "virtualenv":
+        assert "workon" in agents_md
+    elif env_manager == "pipenv":
+        assert "pipenv shell" in agents_md
+    elif env_manager == "uv":
+        assert "source .venv/bin/activate" in agents_md
+    elif env_manager == "pixi":
+        assert "pixi shell" in agents_md
+    elif env_manager == "poetry":
+        assert "poetry env activate" in agents_md
+
+    # Dependency file
+    assert f"`{config['dependency_file']}`" in agents_md
+
+    # Linting and formatting
+    if config["linting_and_formatting"] == "ruff":
+        assert "Uses ruff" in agents_md
+        assert "flake8" not in agents_md
+    elif config["linting_and_formatting"] == "flake8+black+isort":
+        assert "flake8, black, and isort" in agents_md
+        assert "Uses ruff" not in agents_md
+
+
 def test_baking_configs(config, fast):
     """For every generated config in the config_generator, run all
     of the tests.
@@ -49,6 +105,7 @@ def test_baking_configs(config, fast):
     with bake_project(config) as project_directory:
         verify_folders(project_directory, config)
         verify_files(project_directory, config)
+        verify_agents_md(project_directory, config)
 
         if fast < 2:
             verify_makefile_commands(project_directory, config)
@@ -96,6 +153,7 @@ def verify_folders(root, config):
 def verify_files(root, config):
     """Test that expected files and only expected files exist."""
     expected_files = [
+        "AGENTS.md",
         "Makefile",
         "README.md",
         "pyproject.toml",
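
Reviewer note on the `has_storage` check in `verify_agents_md`: the expression `"none" not in config["dataset_storage"]` is a membership test, so its meaning depends on the type of the config value. The sketch below assumes each storage choice is a single-key dict such as `{"none": "none"}` or `{"s3": {...}}`. Those shapes, the helper name `has_storage`, and the sample values are illustrative assumptions, not taken from this diff.

```python
# Sketch only: the option shapes below are assumptions, not copied from the PR.
def has_storage(dataset_storage: dict) -> bool:
    """Mirror the test's check: `in` on a dict looks at its keys."""
    return "none" not in dataset_storage


# Hypothetical config values for the two cases the test distinguishes.
no_storage = {"none": "none"}
s3_storage = {"s3": {"bucket": "bucket-name"}}

print(has_storage(no_storage))  # False: the "none" key is present
print(has_storage(s3_storage))  # True: no "none" key, so sync commands are expected
```

Under that assumption the test agrees with the template's `{% if not cookiecutter.dataset_storage.none %}` guard, which likewise depends on whether a `none` entry is present.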

diff --git a/{{ cookiecutter.repo_name }}/AGENTS.md b/{{ cookiecutter.repo_name }}/AGENTS.md
@@ -0,0 +1,153 @@
+# AGENTS.md
+
+* **Project name:** {{ cookiecutter.project_name }}
+* **Description:** {{ cookiecutter.description }}
+
+This project was generated from the [Cookiecutter Data Science](https://cookiecutter-data-science.drivendata.org/) template. Follow the conventions below when working in this codebase.
+
+## Key Commands
+
+Run tasks through `make`. Key recipes:
+
+* `make` — List all available commands
+* `make requirements` — Install/update dependencies
+* `make data` — Run the data processing pipeline (assumes data is present in `data/raw/`){% if cookiecutter.linting_and_formatting != 'none' %}
+* `make lint` — Check code style
+* `make format` — Auto-format code{% endif %}{% if cookiecutter.testing_framework != 'none' %}
+* `make test` — Run the test suite{% endif %}{% if not cookiecutter.dataset_storage.none %}
+* `make sync_data_down` — Pull data from cloud storage
+* `make sync_data_up` — Push data to cloud storage{% endif %}
+* `make create_environment` — Set up the Python environment
+* `make clean` — Remove compiled Python files
+
+Add project-specific recipes to the `Makefile` for commands that are run frequently or require multiple steps.
+
+## Project Directory Structure
+
+* {{ cookiecutter.repo_name }}/
+  * data/           <- Data files (gitignored)
+    * raw/          <- Original, immutable data. NEVER modify.
+    * external/     <- Third-party data sources.
+    * interim/      <- Intermediate transformed data.
+    * processed/    <- Final, canonical datasets.
+  * models/         <- Trained models, predictions, summaries (gitignored)
+  * notebooks/      <- Jupyter notebooks for exploration
+  * references/     <- Data dictionaries, manuals, documentation
+  * reports/        <- Generated analysis outputs
+    * figures/      <- Generated figures
+  * {{ cookiecutter.module_name }}/  <- Source code for this project{%- if cookiecutter.include_code_scaffold == 'Yes' %}
+    * config.py     <- Project configuration and path definitions
+    * dataset.py    <- Data loading and generation
+    * features.py   <- Feature engineering code
+    * plots.py      <- Visualization code
+    * modeling/
+      * train.py    <- Model training.
+      * predict.py  <- Model inference{% endif %}
+  * tests/          <- Test suite
+
+## Core Principles
+
+Reproducibility is the most critical component of any data science project. The following principles should be followed to ensure that the project is reproducible and maintainable.
+
+### Data analysis is a directed acyclic graph
+
+Treat the data pipeline as a directed acyclic graph. Each step takes inputs and produces outputs with no circular dependencies. Anyone must be able to reproduce final outputs from code and raw data alone.
+
+### Raw data is immutable
+
+Never edit, overwrite, or manually modify files in `data/raw/`. Data flows in one direction:
+
+`data/raw/` → `data/interim/` → `data/processed/`
+
+Cache intermediate outputs in `data/interim/`. Analysis-ready datasets, meaning those that are end products in themselves or need no further preprocessing or feature engineering, go in `data/processed/`.
+
+### Data is not in source control
+
+The `data/` and `models/` directories are gitignored. Do not commit data files, trained models, or `.env` files to git.{% if not cookiecutter.dataset_storage.none %} Use `make sync_data_down` / `make sync_data_up` to sync data with cloud storage.{% endif %}
+
+### Use Make as the task runner
+
+Run all tasks through `make` — see the Key Commands section above for the full list of available recipes.
+
+### Notebooks are for exploration; source code is for repetition
+
+Use `notebooks/` for exploratory analysis notebooks. When code is reused across notebooks, refactor it into the `{{ cookiecutter.module_name }}/` package. The project is installed as a local package, so you can import with:
+
+```python
+from {{ cookiecutter.module_name }}.dataset import main
+```
+
+Notebook naming convention: `<step>.<order>-<identifier>-<description>.ipynb` (e.g., `0.3-bull-visualize-distributions.ipynb`). Step numbers: 0=exploration, 1=cleaning/features, 2=visualizations, 3=modeling, 4=publication. Order number defines execution order within each step.
+
+### Secrets
+
+**NEVER** read `.env` or any secrets files directly. Code should load secrets with `python-dotenv`. Use `{{ cookiecutter.module_name }}/config.py` for project paths and configuration. Never hardcode credentials, print secrets in logs, or add them to source control.
+
+## Development Workflow
+
+* **Python version:** {{ cookiecutter.python_version_number }}
+{% if cookiecutter.environment_manager == 'conda' %}
+* **Environment:** conda. Activate with `conda activate {{ cookiecutter.repo_name }}`.
+{%- elif cookiecutter.environment_manager == 'virtualenv' %}
+* **Environment:** virtualenv. Activate with `workon {{ cookiecutter.repo_name }}`.
+{%- elif cookiecutter.environment_manager == 'pipenv' %}
+* **Environment:** pipenv. Activate with `pipenv shell`.
+{%- elif cookiecutter.environment_manager == 'uv' %}
+* **Environment:** uv. Activate with `source .venv/bin/activate`.
+{%- elif cookiecutter.environment_manager == 'pixi' %}
+* **Environment:** pixi. Activate with `pixi shell`.
+{%- elif cookiecutter.environment_manager == 'poetry' %}
+* **Environment:** poetry. Activate with `eval $(poetry env activate)` or prefix commands with `poetry run`.
+{%- endif %}
+{% if cookiecutter.dependency_file == 'requirements.txt' %}
+* **Dependencies:** Defined in `requirements.txt`. Install with `make requirements`.
+{%- elif cookiecutter.dependency_file == 'pyproject.toml' %}
+* **Dependencies:** Defined in `pyproject.toml`. Install with `make requirements`.
+{%- elif cookiecutter.dependency_file == 'environment.yml' %}
+* **Dependencies:** Defined in `environment.yml`. Install with `make requirements`.
+{%- elif cookiecutter.dependency_file == 'Pipfile' %}
+* **Dependencies:** Defined in `Pipfile`. Install with `make requirements`.
+{%- elif cookiecutter.dependency_file == 'pixi.toml' %}
+* **Dependencies:** Defined in `pixi.toml`. Install with `make requirements`.
+{%- endif %}
+{% if cookiecutter.linting_and_formatting == 'ruff' %}
+* **Linting/Formatting:** Uses ruff. Run `make lint` to check, `make format` to fix.
+{%- elif cookiecutter.linting_and_formatting == 'flake8+black+isort' %}
+* **Linting/Formatting:** Uses flake8, black, and isort. Run `make lint` to check, `make format` to fix.
+{%- endif %}
+{% if cookiecutter.testing_framework == 'pytest' %}
+* **Testing:** Uses pytest. Run `make test`.
+{%- elif cookiecutter.testing_framework == 'unittest' %}
+* **Testing:** Uses unittest. Run `make test`.
+{%- endif %}
+
+Linting and tests should pass before you commit work or end a session. If linting fails, run `make format` to fix formatting issues.
+
+## Version Control
+
+* Do not push to a remote repository without asking first.
+* Write concise commit messages that describe the change and why it was made.{% if cookiecutter.linting_and_formatting != 'none' %}
+* Run `make lint` and `make format` before committing changes.{% endif %}{% if cookiecutter.testing_framework != 'none' %}
+* Run `make test` before committing changes. Fix any failures — do not skip or disable tests.{% endif %}
+
+## Boundaries
+
+### Always do
+
+* Run all Python code within the project environment. {% if cookiecutter.environment_manager == 'conda' %}Use `conda run -n {{ cookiecutter.repo_name }}` to prefix commands, or activate with `conda activate {{ cookiecutter.repo_name }}` first.{% elif cookiecutter.environment_manager == 'uv' %}Use `uv run` to prefix commands, or activate with `source .venv/bin/activate` first.{% elif cookiecutter.environment_manager == 'pipenv' %}Use `pipenv run` to prefix commands, or activate with `pipenv shell` first.{% elif cookiecutter.environment_manager == 'pixi' %}Use `pixi run` to prefix commands, or activate with `pixi shell` first.{% elif cookiecutter.environment_manager == 'poetry' %}Use `poetry run` to prefix commands.{% elif cookiecutter.environment_manager == 'virtualenv' %}Activate with `workon {{ cookiecutter.repo_name }}` first.{% endif %}
+* Update the dependency file when installing new packages
+* Refactor reusable notebook code into the `{{ cookiecutter.module_name }}/` package
+
+### Ask first
+
+* Before deleting or overwriting files in `data/processed/`
+* Before adding new dependencies to the project
+* Before modifying the `Makefile`
+* Before creating new notebooks
+
+### Never do
+
+* Delete, edit, or overwrite files in `data/raw/`
+* Commit data files, trained models, or `.env` to version control
+* Hardcode credentials or print secrets in logs
+* Skip or disable lint rules or tests to get around failures
diff --git a/{{ cookiecutter.repo_name }}/README.md b/{{ cookiecutter.repo_name }}/README.md
@@ -12,6 +12,7 @@
 ├── LICENSE            <- Open-source license if one is chosen
 ├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
 ├── README.md          <- The top-level README for developers using this project.
+├── AGENTS.md          <- The top-level AGENTS file for AI coding agents.
 ├── data
 │   ├── external       <- Data from third party sources.
 │   ├── interim        <- Intermediate data that has been transformed.