chore: broaden .gitignore to stop committing runtime DB artifacts #1313

Open
kulcsarrudolf wants to merge 1 commit into ed-donner:main from kulcsarrudolf:fix/reduce-repo-size

kulcsarrudolf commented Apr 18, 2026

First, a big thank you to Ed for The Complete Agentic AI Engineering Course (2025). I learned a lot from it. I already built my own agents, and I am still working through the material. I also plan to check out the other courses on your list next, because the way you explain things (clear, practical, and hands-on) really works for me. Thanks for putting this together.

With that said, here is a small contribution back. When I first cloned the repo, it was noticeably slow. I looked into why, and found a lot of runtime-generated files adding to the size. That is what motivated this PR.

Problem

Cloning the repo pulls down ~97 MB of git data (working tree ~211 MB). A big chunk of that comes from runtime-generated database and binary files that contributors committed over time:

  • ChromaDB vector stores: chroma.sqlite3, data_level0.bin, header.bin, length.bin, link_lists.bin
  • SQLite runtime databases: *.db, *.db-wal, *.db-shm, *.sqlite
  • Generated audio: *.wav

The existing .gitignore only matched a narrow set of exact names (memory.db, memory.db-wal, memory.db-shm, 6_mcp/accounts.db, 6_mcp/memory/*.db), so contributors kept committing their local runtime state without realizing it.
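To get a rough sense of how much of a checkout these artifacts account for, something like the following sketch can help. It assumes the patterns listed above can be treated as simple filename globs (Python's `fnmatch`), which only approximates real `.gitignore` matching, and it measures files on disk in the working tree, not the size of git history. The names `ARTIFACT_PATTERNS`, `is_artifact`, and `artifact_bytes` are illustrative and not part of this PR.

```python
import fnmatch
import os

# Filename patterns copied from the list above. Treating them as fnmatch
# globs on the basename is an assumption, not exact .gitignore semantics.
ARTIFACT_PATTERNS = [
    "chroma.sqlite3", "data_level0.bin", "header.bin",
    "length.bin", "link_lists.bin",
    "*.db", "*.db-wal", "*.db-shm", "*.sqlite",
    "*.wav",
]


def is_artifact(filename: str) -> bool:
    """Return True if the bare filename matches any artifact pattern."""
    return any(fnmatch.fnmatchcase(filename, p) for p in ARTIFACT_PATTERNS)


def artifact_bytes(root: str) -> int:
    """Sum the on-disk size of artifact files under root, skipping .git."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            if is_artifact(name):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Calling `artifact_bytes(".")` from the repo root returns a byte count that can be compared against the overall working tree size.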

Changes

This PR is intentionally non-destructive:

  • Broaden .gitignore to catch the patterns above going forward.
  • Leave already-tracked files alone. No git rm --cached, no file deletions. Existing forks, clones, and open PRs stay unaffected. Nothing disappears from anyone's working tree on pull.

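Effect: New contributions stop adding untracked DB/vector-store artifacts, so the repo stops growing from this class of mistake. Files that are already tracked are not affected by .gitignore, so changes to them can still be committed until they are untracked.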

Why not also untrack existing files?

A git rm --cached sweep would:

  • Delete the files from every contributor's working tree on their next pull (potentially wiping locally-populated Chroma stores and agent memory).
  • Cause merge conflicts in open PRs and forks that touch any of these paths.
  • Still not shrink clone size (blobs remain in history).

Given that the course has a large contributor base with active forks, the right sequencing is:

  1. This PR: Stop the bleeding via .gitignore (safe, no one notices).
  2. Future, coordinated PR: Selectively untrack the most egregious tracked artifacts (e.g. the ~8 MB ChromaDB folders), announced in advance so contributors can save local state.
  3. Optional, far-future: History rewrite (git filter-repo / BFG) to actually shrink clones. This is disruptive for all forks and needs coordination.

Test plan

  • .gitignore patterns verified against representative tracked paths (3_crew/stock_picker/memory/chroma.sqlite3, 4_langgraph/memory.db, *.wav under NLP_Agent_Dinesh_Uthayakumar).
  • git status on a clean checkout shows no unintended deletions.
  • Confirm CI still passes. No code paths change.
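The first check in the test plan could be approximated with a small script like the one below. This is a sketch under assumptions: it matches each pattern against the file's basename with `fnmatch`, which mirrors how a slash-free `.gitignore` pattern applies at any directory level but does not cover the full `.gitignore` syntax (negation, directory-only patterns, anchoring). The `.wav` filename is hypothetical, since the PR only says there are `*.wav` files under `NLP_Agent_Dinesh_Uthayakumar`, and `IGNORE_PATTERNS` restates the patterns from the Problem section, not the literal contents of the PR's `.gitignore`.

```python
import fnmatch
import posixpath

# Patterns restated from the Problem section (assumed, not the literal diff).
IGNORE_PATTERNS = [
    "chroma.sqlite3", "data_level0.bin", "header.bin",
    "length.bin", "link_lists.bin",
    "*.db", "*.db-wal", "*.db-shm", "*.sqlite",
    "*.wav",
]

# Representative paths from the test plan; "sample.wav" is a hypothetical name.
REPRESENTATIVE_PATHS = [
    "3_crew/stock_picker/memory/chroma.sqlite3",
    "4_langgraph/memory.db",
    "NLP_Agent_Dinesh_Uthayakumar/sample.wav",
]


def would_be_ignored(path: str) -> bool:
    """Approximate a slash-free .gitignore match using the path's basename."""
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, p) for p in IGNORE_PATTERNS)


for path in REPRESENTATIVE_PATHS:
    status = "ignored" if would_be_ignored(path) else "NOT ignored"
    print(f"{path}: {status}")
```

For an authoritative answer, `git check-ignore -v --no-index <path>` reports which `.gitignore` rule matches a path, including paths that are already tracked.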
