Pipeline for Maven#1953

Open
chinyeungli wants to merge 21 commits into main from 1763_pipeline_for_maven

@chinyeungli chinyeungli commented Nov 13, 2025

Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
- Update format

Signed-off-by: Chin Yeung Li <tli@nexb.com>
@chinyeungli chinyeungli requested a review from tdruez November 13, 2025 10:40
Comment thread scanpipe/pipes/resolve.py Fixed
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Comment thread scanpipe/pipes/resolve.py Fixed
…1763

- Update package's license if missing while the same package has license detected in RESOURCES

Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>

@tdruez tdruez left a comment

  • Create a new maven pipe module instead of using resolve.
  • Opening and loading a large file several times, in various steps, just to make edits is not great.
  • To be discussed: Do we need a dedicated pipeline for just one extra step? Shouldn't the original scan_single_package detect that it's a Maven package and apply the necessary steps? Is there any reason to keep this new logic separate?

Comment on lines +57 to +74
with open(self.scan_output_location) as file:
    data = json.load(file)
# Return and do nothing if data has pom.xml
for file in data["files"]:
    if "pom.xml" in file["path"]:
        return
packages = data.get("packages", [])

pom_url_list = get_pom_url_list(self.project.input_sources[0], packages)
pom_file_list = download_pom_files(pom_url_list)
scanned_pom_packages, scanned_dependencies = scan_pom_files(pom_file_list)

updated_packages = packages + scanned_pom_packages
# Replace/Update the package and dependencies section
data["packages"] = updated_packages
data["dependencies"] = scanned_dependencies
with open(self.scan_output_location, "w") as file:
    json.dump(data, file, indent=2)

Code logic should not live in the pipeline itself but in dedicated and easily testable pipe functions.
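
As a rough illustration of this suggestion, the step body could be split into small pure functions that take and return plain data, so they can be unit-tested without a project or a file on disk. The function names `has_pom_file` and `merge_pom_results` are hypothetical and not taken from the PR; only the dictionary keys (`files`, `path`, `packages`, `dependencies`) come from the quoted snippet.

```python
def has_pom_file(scan_data):
    """Return True if any scanned file path contains "pom.xml"."""
    return any(
        "pom.xml" in entry.get("path", "")
        for entry in scan_data.get("files", [])
    )


def merge_pom_results(scan_data, pom_packages, pom_dependencies):
    """
    Return a shallow copy of ``scan_data`` with the POM packages appended
    and the dependencies section replaced. The input is left untouched.
    """
    merged = dict(scan_data)
    merged["packages"] = list(scan_data.get("packages", [])) + list(pom_packages)
    merged["dependencies"] = list(pom_dependencies)
    return merged
```

With helpers along these lines, the pipeline step would only load the scan output once, call the helpers, and write the result back.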

cls.extract_input_to_codebase_directory,
cls.extract_archives,
cls.run_scan,
cls.update_package_license_from_resource_if_missing,

This may have quite an impact on the default ScanSinglePackage results. We should probably handle this one separately from the Maven context.

if not packages or not resources:
    return

updated_packages = update_package_license_from_resource_if_missing(

We should use database queries instead of manipulating complex dictionaries.

Comment thread scanpipe/pipes/resolve.py Outdated
    return pom_file_list


def scan_pom_files(pom_file_list):

This is too complex; it needs to be refactored into smaller functions.

Comment thread scanpipe/pipes/resolve.py Outdated
    return scanned_pom_packages, scanned_pom_deps


def update_package_license_from_resource_if_missing(packages, resources):

This should be query-based.
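
To make the "query-based" idea concrete, here is a minimal sketch that uses Python's standard `sqlite3` module in place of the project's own ORM. The `package` and `resource` tables, their columns, and the `fill_missing_licenses` helper are all assumptions made for illustration; the real models and field names are not shown in this thread. The point is only that a single UPDATE statement can fill in missing licenses without loading and walking dictionaries in Python.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE package (
            id INTEGER PRIMARY KEY, name TEXT, license TEXT NOT NULL DEFAULT ''
        );
        CREATE TABLE resource (
            id INTEGER PRIMARY KEY, package_id INTEGER, path TEXT,
            license TEXT NOT NULL DEFAULT ''
        );
        INSERT INTO package (id, name, license)
            VALUES (1, 'a', ''), (2, 'b', 'mit'), (3, 'c', '');
        INSERT INTO resource (package_id, path, license)
            VALUES (1, 'a/LICENSE', 'apache-2.0'),
                   (2, 'b/LICENSE', 'gpl-2.0'),
                   (3, 'c/README', '');
        """
    )
    return conn


def fill_missing_licenses(conn):
    """
    Copy a license from one of a package's resources when the package has
    none. Packages that already have a license are not modified.
    Returns the number of packages updated.
    """
    cursor = conn.execute(
        """
        UPDATE package
        SET license = (
            SELECT r.license FROM resource AS r
            WHERE r.package_id = package.id AND r.license != ''
            ORDER BY r.id LIMIT 1
        )
        WHERE license = ''
          AND EXISTS (
            SELECT 1 FROM resource AS r
            WHERE r.package_id = package.id AND r.license != ''
          )
        """
    )
    conn.commit()
    return cursor.rowcount
```

In the demo data, only package `a` qualifies: `b` already has a license and `c` has no resource with a detected license.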

Signed-off-by: Chin Yeung Li <tli@nexb.com>
- Create a new maven pipe module
- Use database queries for update_package_license_from_resource_if_missing()
- Add tests

Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
@chinyeungli

@tdruez I’ve updated the code to include support for the "D2D" option.

Screenshot 2025-12-29 113240

The "deploy_to_devel" option is equivalent to the "map_deploy_to_develop" pipeline, which runs on Java, JavaScript, Kotlin, and Scala as these are the languages commonly found in Maven projects.

Signed-off-by: Chin Yeung Li <tli@nexb.com>
Signed-off-by: Chin Yeung Li <tli@nexb.com>
@chinyeungli chinyeungli requested a review from tdruez January 28, 2026 03:31
@TG1999 TG1999 self-requested a review May 1, 2026 09:08
Some projects encountered a unique constraint violation when a resource
was already mapped:
```
duplicate key value violates unique constraint "scanpipe_codebaserelation_unique_relation"
DETAIL:  Key (from_resource_id, to_resource_id, map_type)=(1512780, 1512790, jar_to_source) already exists.
```

Signed-off-by: Chin Yeung Li <tli@nexb.com>
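
The commit message above does not show how the violation was resolved. One common way to avoid this class of error is to make relation creation idempotent, so that inserting an already mapped pair is a no-op. The sketch below demonstrates that idea with the standard `sqlite3` module rather than the project's database layer. The column names are taken from the error message, while the simplified `codebaserelation` table definition and the `add_relation` helper are assumptions for illustration.

```python
import sqlite3


def create_relation_table(conn):
    """Create a simplified relation table with the same unique constraint."""
    conn.execute(
        """
        CREATE TABLE codebaserelation (
            id INTEGER PRIMARY KEY,
            from_resource_id INTEGER NOT NULL,
            to_resource_id INTEGER NOT NULL,
            map_type TEXT NOT NULL,
            UNIQUE (from_resource_id, to_resource_id, map_type)
        )
        """
    )


def add_relation(conn, from_id, to_id, map_type):
    """
    Insert the relation unless an identical one already exists.
    Returns True if a row was inserted, False if it was skipped.
    """
    cursor = conn.execute(
        "INSERT OR IGNORE INTO codebaserelation "
        "(from_resource_id, to_resource_id, map_type) VALUES (?, ?, ?)",
        (from_id, to_id, map_type),
    )
    return cursor.rowcount == 1
```

Calling `add_relation` twice with the same arguments leaves a single row instead of raising an integrity error.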