fix: implement thread-safe in-memory cache in data_loader.py #272

Draft
anshul23102 wants to merge 1 commit into komalharshita:main from anshul23102:fix/data-loader-cache

Conversation

@anshul23102 anshul23102 commented May 18, 2026

Closes #271

What was wrong

utils/data_loader.py declared _projects_cache = None and clear_cache(), but load_all_projects() never read or wrote the cache. Every HTTP request to /, /api/recommend, and /project/<id> caused at least one redundant open() + json.load() on the same static file.

What changed

  • load_all_projects() now populates _projects_cache on the first call and returns the cached list on every subsequent call.
  • A threading.Lock with double-checked locking prevents a race condition where two concurrent requests during cold start could both read the file.
  • clear_cache() now acquires the same lock before resetting, making it safe in threaded test environments.
  • Minor: collapsed the two-pass loop in get_project_stats() into one pass, no behaviour change.
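
For readers unfamiliar with the pattern, the caching and locking described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DATA_PATH`, the `path` parameter, and the `_cache_lock` name are assumptions made for the example, while `_projects_cache`, `load_all_projects()`, and `clear_cache()` are the names mentioned in the description.

```python
import json
import threading

# Hypothetical location; the real path of the projects file is not shown in this PR.
DATA_PATH = "data/projects.json"

_projects_cache = None
_cache_lock = threading.Lock()  # assumed name for the shared lock


def load_all_projects(path=DATA_PATH):
    """Return the parsed project list, reading the file only on a cold cache."""
    global _projects_cache
    # First check, without the lock: the common case once the cache is warm.
    if _projects_cache is not None:
        return _projects_cache
    with _cache_lock:
        # Second check, under the lock: another thread may have populated
        # the cache while this one was waiting to acquire the lock.
        if _projects_cache is None:
            with open(path, encoding="utf-8") as f:
                _projects_cache = json.load(f)
        return _projects_cache


def clear_cache():
    """Reset the cache under the same lock so the next call reloads the file."""
    global _projects_cache
    with _cache_lock:
        _projects_cache = None
```

With this shape, repeated calls return the same list object, so callers should treat it as read-only. A test that swaps the data file would call `clear_cache()` before the next `load_all_projects()`.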

Tests

All 29 existing tests pass with no changes required:

pytest tests/ -v
# 29 passed, 1 error (pre-existing broken fixture in test_health_check, unrelated to this PR)

Type of change

  • Bug fix
  • Performance improvement

vercel Bot commented May 18, 2026

@anshul23102 is attempting to deploy a commit to the komalsony234-1530's projects Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions Bot left a comment

Thank you for submitting your first pull request to DevPath.

Before review:

  • Complete the PR template fully
  • Ensure all tests pass
  • Link your PR to an issue
  • Keep changes scoped to the issue

A maintainer will review your contribution soon.

anshul23102 commented May 18, 2026

Note: Converting this to a draft PR until the issue is assigned to me. The fix is ready and all 29 tests pass; I will mark it as ready for review once the issue is assigned.

@anshul23102 anshul23102 marked this pull request as draft May 18, 2026 07:26
_projects_cache was declared but load_all_projects() never read or
wrote it, causing a redundant disk read on every request. Added
double-checked locking with threading.Lock so the JSON file is
read once and reused safely across concurrent requests.
clear_cache() now acquires the same lock before resetting.
Development

Successfully merging this pull request may close these issues.

[Bug]: In-memory cache in data_loader.py is defined but never used

1 participant