feat: upgrade recommender.py scoring to ML-based cosine similarity us… #136
Yogesh23-03 wants to merge 1 commit into
@Yogesh23-03 is attempting to deploy a commit to the komalsony234-1530's projects Team on Vercel. A member of the Team first needs to authorize it.
Hi @komalharshita! I noticed there is a merge conflict in
komalharshita
left a comment
Thank you for working on a more advanced improvement to the recommendation engine. This is one of the more technically ambitious PRs submitted so far and the effort is appreciated.
The TF-IDF + cosine similarity implementation is logically correct, the code is readable, and CI passes successfully. However, there are several concerns that need to be addressed before this can be merged.
Main concerns:
- The repository is currently very lightweight, and adding scikit-learn introduces a large dependency for a relatively small recommendation dataset. The complexity increase may not be justified for the current scale of the project.
- TF-IDF cosine similarity is being described as “ML-based”, but this implementation is closer to vector similarity / information retrieval than to machine learning. The terminology should be adjusted for accuracy.
- The new scoring system becomes difficult to interpret and maintain:
  final_score = (skill_score * 10) + bonus_score
  The scaling factor appears arbitrary and there is no calibration or explanation for why 10 was selected.
- The PR claims recommendation quality improvements, but no comparison examples or benchmarking against the existing algorithm were provided. Please include:
- before vs after recommendation examples,
- edge-case comparisons,
- and reasoning showing why the new approach improves recommendation relevance.
- Since the dataset is still relatively small, consider whether a lighter-weight approach (improved weighted matching, fuzzy matching, synonym expansion, etc.) may achieve similar benefits without introducing heavy ML dependencies.
This is a strong attempt at a meaningful backend improvement, but additional justification and refinement are needed before merge.
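To make the lighter-weight suggestion above concrete, here is a minimal sketch of synonym expansion combined with set overlap, using only the standard library. The `SYNONYMS` table and the `normalize` and `skill_overlap` helpers are hypothetical names for illustration and are not taken from the repository.

```python
from __future__ import annotations

# Hypothetical alias table; the entries are illustrative, not from the repo.
SYNONYMS = {
    "js": "javascript",
    "py": "python",
    "reactjs": "react",
}


def normalize(skill: str) -> str:
    """Lowercase, trim, and map a known alias to its canonical name."""
    key = skill.strip().lower()
    return SYNONYMS.get(key, key)


def skill_overlap(user_skills: list[str], project_skills: list[str]) -> float:
    """Return the fraction of project skills the user covers (0.0 to 1.0)."""
    user = {normalize(s) for s in user_skills if s.strip()}
    project = {normalize(s) for s in project_skills if s.strip()}
    if not project:
        return 0.0
    return len(user & project) / len(project)


# Example: "JS" and " Python " match two of the three project skills.
ratio = skill_overlap(["JS", " Python "], ["javascript", "python", "flask"])
```

Whether this is sufficient depends on how noisy the skill strings in the dataset are; it is meant as a baseline to compare against, not as a replacement recommendation.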
Thank you for the detailed review! I've addressed all four concerns below.
so skill match weight is comparable to bonus_score (max 5 points)
final_score = (skill_score * SIMILARITY_SCALE) + bonus_score
komalharshita
left a comment
Thank you for the detailed follow-up and for addressing the earlier review concerns thoroughly.
The terminology corrections, scaling explanation, and before-vs-after recommendation comparisons significantly improved the clarity and justification of this implementation.
The new cosine similarity helper is modular and the implementation is readable overall. The additional reasoning around why the scaling factor exists also makes the scoring logic much easier to understand and maintain.
At this point, the main remaining blocker is that the branch still has unresolved merge conflicts in utils/recommender.py.
Please rebase/merge the latest main branch and resolve the conflicts cleanly. Once conflicts are resolved and CI passes again, this PR should be in a good state for merge.
5f2058c to 59f3e2a
Deployment failed with the following error:
Hi @komalharshita! Conflict resolved and all issues addressed.
Ready for final review! 🙏




Summary [required]
This PR upgrades the recommendation engine in utils/recommender.py
from a fixed point-based scoring system to ML-based cosine similarity
using scikit-learn's TfidfVectorizer and cosine_similarity.
This makes project recommendations smarter and more accurate by
computing actual vector similarity between user skills and project
skills instead of simple point counting.
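For readers unfamiliar with the technique, the sketch below shows what a TF-IDF cosine similarity between a user's skills and each project's skills computes. It is a pure-Python stand-in, not the code in this PR: the PR uses scikit-learn's TfidfVectorizer and cosine_similarity, whose tokenization and normalization may differ from this simplified version. The `tokenize`, `tfidf_vectors`, and `cosine` helpers and the sample skill strings are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a skills string on commas and whitespace, lowercased."""
    return text.lower().replace(",", " ").split()


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch): terms that appear in
        # every document still keep a non-zero weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative data only: first entry is the user, the rest are projects.
user = "python, flask, sql"
projects = ["python flask sql", "java spring", "python pandas"]
vectors = tfidf_vectors([user] + projects)
scores = [cosine(vectors[0], v) for v in vectors[1:]]
```

A project with identical skills scores close to 1, a project with no shared terms scores 0, and partial overlap falls in between, which is the gradation that fixed point counting does not provide.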
Related Issue [required]
Closes #135
Type of Change [required]
data/projects.json
What Was Changed [required]
utils/recommender.py
tests/test_basic.py
requirements.txt
How to Test This PR [required]
git checkout feature/ml-cosine-similarity
pip install -r requirements.txt
python app.py
python tests/test_basic.py
Expected test output:
27 passed, 0 failed out of 27 tests
Test Results [required]
PASS test_projects_json_loads
PASS test_each_project_has_required_fields
PASS test_find_project_by_id_found
PASS test_find_project_by_id_missing
PASS test_parse_skills_basic
PASS test_parse_skills_empty_string
PASS test_parse_skills_single_entry
PASS test_score_single_project_full_match
PASS test_score_single_project_no_match
PASS test_get_recommendations_returns_results
PASS test_get_recommendations_max_three
PASS test_get_recommendations_no_match_returns_empty
PASS test_get_recommendations_result_format
PASS test_validate_all_valid
PASS test_validate_missing_skills
PASS test_validate_missing_level
PASS test_validate_missing_interest
PASS test_validate_missing_time
PASS test_validate_all_missing
PASS test_home_route
PASS test_recommend_api_valid
PASS test_recommend_api_missing_field
PASS test_recommend_api_empty_body
PASS test_project_detail_found
PASS test_project_detail_not_found
PASS test_view_code_found
PASS test_download_code_found
27 passed, 0 failed out of 27 tests
Screenshots (if UI change)
No UI changes in this PR.
Self-Review Checklist [required]
feat/, fix/, docs/, data/, style/, test/
python tests/test_basic.py and all 27 tests pass
flake8 . locally and there are no errors
print() or console.log() debug statements
Notes for Reviewer
The test expected value was updated from 8 to 15 because the scoring
engine was upgraded from fixed points to ML-based cosine similarity.
The old value (8) reflected simple point counting. The new value (15)
reflects cosine similarity score scaled to 10 plus bonus points for
level, interest and time match. All 27 tests pass successfully.
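The arithmetic behind the new expected value can be sketched as follows. This assumes SIMILARITY_SCALE is 10 and the bonus is capped at 5 points, as stated in the conversation; the `final_score` function and the clamping are illustrative choices, and how the bonus is split across level, interest, and time is not shown here.

```python
SIMILARITY_SCALE = 10  # assumed value, per the "scaled to 10" note above
MAX_BONUS = 5          # assumed cap, per "bonus_score (max 5 points)"


def final_score(skill_score: float, bonus_score: float) -> float:
    """Combine a 0-1 cosine similarity with bonus points into one score."""
    # Clamp inputs so floating-point drift cannot push the score out of range.
    skill_score = min(max(skill_score, 0.0), 1.0)
    bonus_score = min(max(bonus_score, 0.0), MAX_BONUS)
    return skill_score * SIMILARITY_SCALE + bonus_score


# A full skill match (similarity 1.0) with the maximum bonus:
print(final_score(1.0, 5))  # 15.0
```

Under these assumptions the maximum possible score is 15, which matches the updated test expectation.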