
Context embedding based suggestions #289

Open
VaibhavA123 wants to merge 2 commits into `komalharshita:main` from `VaibhavA123:context_embedding_based_suggestions`

@VaibhavA123

  • The offline preprocessing script successfully vectorized and serialized every profile in the dataset.
  • Dot-product similarity calculations, verified locally, return matching indices in under 15 ms.
  • A graceful fallback keeps search working if the external API hits rate limits.
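The PR does not state what the fallback does once the API is rate-limited. One minimal sketch, assuming a simple keyword-overlap fallback: `EmbeddingUnavailable`, `recommend`, and the other names are hypothetical, and a real integration would catch the specific rate-limit exception raised by the embedding client instead.

```python
import math
from typing import Callable, Dict, List, Sequence


class EmbeddingUnavailable(Exception):
    """Hypothetical error for a failed or rate-limited embedding API call."""


def semantic_scores(
    query_vec: Sequence[float], vectors: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Cosine scores, assuming the stored project vectors are already unit length."""
    norm = math.sqrt(sum(x * x for x in query_vec)) or 1.0
    q = [x / norm for x in query_vec]
    return {pid: sum(a * b for a, b in zip(q, v)) for pid, v in vectors.items()}


def keyword_scores(query: str, texts: Dict[str, str]) -> Dict[str, float]:
    """Fallback: number of query words that also appear in each project's text."""
    words = set(query.lower().split())
    return {pid: float(len(words & set(t.lower().split()))) for pid, t in texts.items()}


def recommend(
    query: str,
    texts: Dict[str, str],
    vectors: Dict[str, Sequence[float]],
    embed: Callable[[str], List[float]],
    top_k: int = 3,
) -> List[str]:
    """Rank projects semantically, degrading to keyword overlap if embedding fails."""
    try:
        scores = semantic_scores(embed(query), vectors)
    except EmbeddingUnavailable:
        scores = keyword_scores(query, texts)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def rate_limited_embed(text: str) -> List[float]:
    # Simulates the external API rejecting the request.
    raise EmbeddingUnavailable("quota exceeded")


texts = {"weather": "weather dashboard using a public api", "chat": "real-time chat app"}
vectors = {"weather": [1.0, 0.0], "chat": [0.0, 1.0]}
print(recommend("chat app", texts, vectors, rate_limited_embed, top_k=1))  # ['chat']
```

Keeping a keyword path means a quota error costs ranking quality instead of failing the request outright.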

## Summary [required]

This PR upgrades the discovery engine from literal keyword matching to full-context semantic embeddings, removing the "vocabulary mismatch" limitation. Both the user's input and each project's full details are converted into vectors, so matches reflect conceptual meaning rather than exact word overlap. Because every project receives a similarity score, the ranking always returns results, and "zero-match" scenarios no longer occur. The change also keeps to DevPath's lightweight, "no-database" design: the pre-computed project vectors are stored in a static serialized file inside the repository.
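A minimal sketch of the ranking idea, assuming embeddings are plain Python lists of floats. The names `normalize`, `dot`, and `rank_projects` are hypothetical, not the contents of `utils/embedding_helpers.py`. Normalizing both vectors to unit length makes their dot product equal to cosine similarity, and since every project gets a score, a non-empty project set always yields results:

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_projects(
    query_vec: Sequence[float],
    project_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (project, score) pairs by cosine similarity to the query.

    Every project is scored, so the result is never empty when project_vecs
    is non-empty, even if the query shares no words with any project.
    """
    q = normalize(query_vec)
    scored = [(name, dot(q, normalize(v))) for name, v in project_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors for illustration only; real embeddings are much longer.
projects = {
    "weather-dashboard": [0.9, 0.1, 0.0],
    "chat-app": [0.1, 0.9, 0.2],
    "ml-classifier": [0.0, 0.2, 0.9],
}
top = rank_projects([0.8, 0.2, 0.1], projects, top_k=2)
# weather-dashboard ranks first, then chat-app
```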

## Related Issue [required]

Closes #249

## Type of Change [required]

- [ ] Bug fix — resolves a broken behaviour
- [x] Feature — adds new functionality
- [ ] Data — adds new projects to `data/projects.json`
- [ ] Documentation — updates docs, README, or code comments only
- [ ] Style — CSS or visual changes only, no logic change
- [ ] Refactor — restructures code without changing behaviour
- [ ] Test — adds or updates tests

## What Was Changed [required]

| File | Change made |
|------|-------------|
| `utils/embedding_helpers.py` | New helper module containing the cosine-similarity logic and the external API call that generates embedding vectors. |
| `scripts/generate_embeddings.py` | Offline script that concatenates each project's full set of fields into one text, generates its embedding vector, and serializes the results. |
| `data/project_embeddings.pkl` | Static serialized (pickle) file holding the pre-computed embedding vectors for every curated project path. |
| `routes/main_routes.py` | Updated the recommendation route to take the user's form input, embed it on the fly, and rank projects by dot product against the pre-computed vectors in the static file. |
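The offline pipeline described for `scripts/generate_embeddings.py` can be sketched as follows, assuming hypothetical project fields (`id`, `title`, `description`, `skills`, `level`) and a stand-in `fake_embed` in place of the external API call. Vectors are normalized once at build time, so the route only needs a dot product at query time:

```python
import math
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical field names; the real schema is defined by data/projects.json.
TEXT_FIELDS = ("title", "description", "skills", "level")


def project_text(project: dict) -> str:
    """Concatenate a project's descriptive fields into one string to embed."""
    parts = []
    for key in TEXT_FIELDS:
        value = project.get(key, "")
        if isinstance(value, list):
            value = " ".join(str(v) for v in value)
        if value:
            parts.append(str(value))
    return " ".join(parts)


def build_embeddings(
    projects: List[dict], embed: Callable[[str], List[float]]
) -> Dict[str, List[float]]:
    """Embed each project once and store a unit-length vector keyed by its id."""
    vectors = {}
    for project in projects:
        vec = embed(project_text(project))
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against zero vectors
        vectors[project["id"]] = [x / norm for x in vec]
    return vectors


def save_embeddings(vectors: Dict[str, List[float]], path: Path) -> None:
    with path.open("wb") as fh:
        pickle.dump(vectors, fh)


def load_embeddings(path: Path) -> Dict[str, List[float]]:
    # Only unpickle files you trust, such as one committed to the repository.
    with path.open("rb") as fh:
        return pickle.load(fh)


def fake_embed(text: str) -> List[float]:
    """Stand-in for the external embedding API: vowel counts, for testing only."""
    return [float(text.lower().count(c)) for c in "aeiou"]


sample = [
    {"id": "weather", "title": "Weather dashboard", "skills": ["python", "api"]},
    {"id": "chat", "title": "Chat app", "description": "real-time messaging"},
]
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "project_embeddings.pkl"
    save_embeddings(build_embeddings(sample, fake_embed), target)
    loaded = load_embeddings(target)
```

Pickle is convenient for a file that ships inside the repository, but `pickle.load` can execute arbitrary code from an untrusted file; a JSON file of float lists would be a text-based alternative if that ever matters.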

## How to Test This PR [required]

1. Check out this PR's branch (`VaibhavA123:context_embedding_based_suggestions`): `git checkout context_embedding_based_suggestions`
2. Set your Gemini API key as an environment variable:
   ```bash
   export GEMINI_API_KEY="your_api_key_here"
   ```

@vercel

vercel Bot commented May 18, 2026

@VaibhavA123 is attempting to deploy a commit to the komalsony234-1530's projects Team on Vercel.

A member of the Team first needs to authorize it.
