feat: fetch and store repo license via licensee IN-1105 #4095
Pull request overview
This PR adds repository license detection to the git integration pipeline and persists the detected SPDX identifier into the main public.repositories table, enabling downstream consumers to query repository license metadata.
Changes:
- Adds a `license` column to `public.repositories` (with rollback migration).
- Extends the git integration Docker image to install the `licensee` gem and its libgit2 build/runtime dependencies.
- Introduces `LicenseService` (invokes `licensee detect --json`) and wires it into the repository worker's first-batch processing, persisting results via a new CRUD helper.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| backend/src/database/migrations/V1778154987__addLicenseToRepositories.sql | Adds license column to public.repositories. |
| backend/src/database/migrations/U1778154987__addLicenseToRepositories.sql | Drops license column on rollback. |
| scripts/services/docker/Dockerfile.git_integration | Installs Ruby + licensee and required libgit2/toolchain deps in the git integration image. |
| services/apps/git_integration/src/crowdgit/services/license/license_service.py | New async service to execute licensee and parse SPDX from JSON output. |
| services/apps/git_integration/src/crowdgit/services/license/__init__.py | Exports LicenseService from the license service module. |
| services/apps/git_integration/src/crowdgit/services/__init__.py | Re-exports LicenseService at the services package level. |
| services/apps/git_integration/src/crowdgit/worker/repository_worker.py | Runs license detection on first clone batch and writes the result to DB. |
| services/apps/git_integration/src/crowdgit/database/crud.py | Adds update_repository_license helper to persist SPDX ID. |
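
The parsing step described for `license_service.py` can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `extract_spdx_id` is a hypothetical helper, and the JSON shape (a top-level `licenses` list whose entries carry an `spdx_id` field, with `NOASSERTION` or `NONE` as placeholders for unrecognized or absent licenses) is an assumption about the output of `licensee detect --json` that should be checked against the installed gem version.

```python
import json
from typing import Optional

# Assumption: placeholder values licensee emits when it cannot name a license.
PLACEHOLDER_IDS = {"NOASSERTION", "NONE"}


def extract_spdx_id(raw_json: str) -> Optional[str]:
    """Return the first usable SPDX identifier from licensee-style JSON, else None."""
    data = json.loads(raw_json)
    # Assumed shape: {"licenses": [{"key": "mit", "spdx_id": "MIT", ...}], ...}
    for entry in data.get("licenses") or []:
        spdx_id = entry.get("spdx_id")
        if spdx_id and spdx_id not in PLACEHOLDER_IDS:
            return spdx_id
    return None
```

Malformed output makes `json.loads` raise `json.JSONDecodeError`, so a caller can tell "parse failure" apart from "no license found" instead of folding both into `None`.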
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
…N-1105 Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
b02ba60 to 58d4968
    async def detect(self, repo_path: str) -> str | None:
        """Run licensee against repo_path and return the SPDX identifier, or None."""
        try:
            output = await run_shell_command(
                ["licensee", "detect", "--json", repo_path], timeout=60
            )
        except CommandExecutionError:
            self.logger.info(f"licensee found no license in {repo_path}")
            return None

    )
    await self.maintainer_service.process_maintainers(repository, batch_info)
    license_spdx = await self.license_service.detect(batch_info.repo_path)
    await update_repository_license(repository.id, license_spdx)

    async def detect(self, repo_path: str) -> str | None:
        """Run licensee against repo_path and return the SPDX identifier, or None."""
        try:
            output = await run_shell_command(
                ["licensee", "detect", "--json", repo_path], timeout=60
            )

Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit ef58578.
            return None
        except Exception as e:
            self.logger.warning(f"licensee failed: {repr(e)}")
            return None

Transient detection errors silently clear existing license data
Medium Severity
LicenseService.detect() returns None for both "no license exists" and all error conditions (timeout, binary not found, parse failure, etc.). The caller in repository_worker.py unconditionally passes this result to update_repository_license, which will overwrite a previously valid license (e.g. "MIT") with NULL when the tool fails transiently. The IS DISTINCT FROM guard won't help because 'MIT' IS DISTINCT FROM NULL evaluates to TRUE. A persistent issue like a missing licensee binary would gradually erase all stored license data.
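
One way to address this is to make the detection outcome explicit, so that "no license" and "detection failed" stop sharing `None`. The sketch below is only an illustration under assumptions: `DetectionStatus`, `DetectionResult`, and `plan_license_write` are hypothetical names that do not appear in the PR, and whether a clean "not found" result should clear the column is a policy choice assumed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class DetectionStatus(Enum):
    FOUND = "found"          # the tool ran and identified a license
    NOT_FOUND = "not_found"  # the tool ran cleanly and found nothing
    FAILED = "failed"        # timeout, missing binary, parse error, ...


@dataclass(frozen=True)
class DetectionResult:
    status: DetectionStatus
    spdx_id: Optional[str] = None


def plan_license_write(result: DetectionResult) -> Tuple[bool, Optional[str]]:
    """Return (should_write, value) for the repositories.license column."""
    if result.status is DetectionStatus.FAILED:
        # Keep whatever is stored; a transient error is not evidence of "no license".
        return (False, None)
    if result.status is DetectionStatus.FOUND:
        return (True, result.spdx_id)
    # NOT_FOUND: the tool succeeded, so writing NULL is assumed to be intentional.
    return (True, None)
```

With this shape, the worker would call `update_repository_license` only when the first element is true, so a missing `licensee` binary could no longer erase stored values one repository at a time.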


Summary
- Adds a `license` column (VARCHAR(255)) to `public.repositories` via a new migration
- Installs the `licensee` Ruby gem (v9.15.3, the last version compatible with Ruby 2.7 on Debian Bullseye) in the git integration Docker image, along with `libgit2` build and runtime deps required by the `rugged` gem
- Adds `LicenseService`, which runs `licensee detect --json <repo_path>` and extracts the SPDX identifier from the JSON output
- Persists the detected SPDX identifier (e.g. `MIT`, `Apache-2.0`, `BSD-3-Clause`) to `public.repositories.license` via a new `update_repository_license` CRUD helper

Changes
- `backend/src/database/migrations/V1778154987__addLicenseToRepositories.sql`: add `license` column
- `backend/src/database/migrations/U1778154987__addLicenseToRepositories.sql`: undo migration
- `scripts/services/docker/Dockerfile.git_integration`: install licensee v9.15.3 + libgit2 deps
- `services/apps/git_integration/src/crowdgit/services/license/license_service.py`: new service
- `services/apps/git_integration/src/crowdgit/services/license/__init__.py`: module init
- `services/apps/git_integration/src/crowdgit/services/__init__.py`: export LicenseService
- `services/apps/git_integration/src/crowdgit/worker/repository_worker.py`: wire service
- `services/apps/git_integration/src/crowdgit/database/crud.py`: add update_repository_license

Note
Medium Risk
Adds a new `license` column and a new external dependency (the `licensee` Ruby gem) executed during repository processing, which may impact worker runtime and database migrations if detection is slow or the tool behaves unexpectedly.

Overview
Adds a new `public.repositories.license` (VARCHAR(255)) column with forward/undo migrations and updates the data-access layer to select this field.

Extends the git-integration worker to run a new `LicenseService` on the first clone batch, using the `licensee` gem to detect an SPDX ID and persist it via a new `update_repository_license` DB helper.

Updates the git-integration Docker image to install Ruby + `licensee` (and required `libgit2` build/runtime deps) so license detection can run in production.
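
The runtime concern in the risk note is usually handled by putting a hard time budget on the external command. The PR does this through its own `run_shell_command(..., timeout=60)` helper, whose internals are not shown here; the sketch below is a generic asyncio illustration of the same idea. `run_bounded` and `DetectionTimeout` are hypothetical names, not part of the codebase.

```python
import asyncio
from typing import List


class DetectionTimeout(Exception):
    """Raised when the external command exceeds its time budget."""


async def run_bounded(argv: List[str], timeout: float) -> str:
    """Run argv, returning stdout as text, or raise if it is slow or fails."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Kill and reap the child so a slow run does not leave a stray process.
        proc.kill()
        await proc.wait()
        raise DetectionTimeout(f"{argv[0]} exceeded {timeout}s") from None
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with code {proc.returncode}")
    return stdout.decode()
```

Raising distinct exceptions for timeouts and non-zero exits, rather than returning `None`, lets the caller decide separately how each case should affect stored data.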