Pipeline for Maven by chinyeungli · Pull Request #1953 · aboutcode-org/scancode.io

chinyeungli · 2025-11-13T10:40:59Z

Issue: maven-heaven: Design and implement Scancode.io pipeline for a single Maven package #1763
Create a pipeline for Maven package

Signed-off-by: Chin Yeung Li <tli@nexb.com>


tdruez

Create a new maven pipe module instead of using resolve.
Opening and loading a large file multiple times to edit it in various steps is not great.
To be discussed: Do we need a dedicated pipeline for just one extra step? Shouldn't the original scan_single_package detect that it's a Maven package and apply the necessary steps? Is there any reason to keep this new logic separate?

tdruez · 2025-11-19T04:28:19Z

+        with open(self.scan_output_location) as file:
+            data = json.load(file)
+            # Return and do nothing if data has pom.xml
+            for file in data["files"]:
+                if "pom.xml" in file["path"]:
+                    return
+            packages = data.get("packages", [])
+
+        pom_url_list = get_pom_url_list(self.project.input_sources[0], packages)
+        pom_file_list = download_pom_files(pom_url_list)
+        scanned_pom_packages, scanned_dependencies = scan_pom_files(pom_file_list)
+
+        updated_packages = packages + scanned_pom_packages
+        # Replace/Update the package and dependencies section
+        data["packages"] = updated_packages
+        data["dependencies"] = scanned_dependencies
+        with open(self.scan_output_location, "w") as file:
+            json.dump(data, file, indent=2)


Code logic should not be on the pipeline itself but in dedicated and easily testable pipe functions.
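
One way to read this suggestion, as a minimal sketch: the JSON handling moves out of the pipeline step into small pure functions that take and return plain data, so they can be unit tested without a project or files on disk. The names `has_pom_file` and `merge_pom_scan_results` are hypothetical and not part of the PR; the `files`, `packages`, and `dependencies` keys follow the snippet above. Note that the basename comparison here is stricter than the substring check in the diff.

```python
def has_pom_file(scan_data):
    """Return True if any scanned file is named exactly pom.xml.

    Assumption: this compares the basename, which is stricter than the
    substring test ("pom.xml" in path) used in the reviewed diff.
    """
    return any(
        entry.get("path", "").rsplit("/", 1)[-1] == "pom.xml"
        for entry in scan_data.get("files", [])
    )


def merge_pom_scan_results(scan_data, pom_packages, pom_dependencies):
    """Return a new dict with POM packages appended and dependencies replaced.

    The input dict is not modified, which keeps the function easy to test.
    """
    merged = dict(scan_data)
    merged["packages"] = list(scan_data.get("packages", [])) + list(pom_packages)
    merged["dependencies"] = list(pom_dependencies)
    return merged
```

With helpers like these, the pipeline step would only read the scan output once, call the functions, and write the result back.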

tdruez · 2025-11-19T04:31:36Z

            cls.extract_input_to_codebase_directory,
            cls.extract_archives,
            cls.run_scan,
+            cls.update_package_license_from_resource_if_missing,


This may have quite an impact on the default ScanSinglePackage results. We should probably handle this one separately from the Maven context.

tdruez · 2025-11-19T04:33:36Z

+            if not packages or not resources:
+                return
+
+        updated_packages = update_package_license_from_resource_if_missing(


We should use database queries instead of manipulating complex dictionaries.

tdruez · 2025-11-19T04:37:55Z

+    return pom_file_list
+
+
+def scan_pom_files(pom_file_list):


This is too complex; it needs to be refactored into smaller functions.

tdruez · 2025-11-19T04:39:04Z

+    return scanned_pom_packages, scanned_pom_deps
+
+
+def update_package_license_from_resource_if_missing(packages, resources):


This should be query-based.
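
A rough illustration of what "query-based" could mean here, using the standard-library `sqlite3` module rather than the project's actual models. The table and column names (`package`, `resource`, `type`, `declared_license`, `detected_license`, `package_id`) are assumptions made up for this sketch, and taking the license from the resource with the lowest id is an arbitrary choice.

```python
import sqlite3


def create_demo_schema(conn):
    """Create two hypothetical tables used only for this illustration."""
    conn.executescript(
        """
        CREATE TABLE package (
            id INTEGER PRIMARY KEY,
            type TEXT NOT NULL,
            declared_license TEXT NOT NULL DEFAULT ''
        );
        CREATE TABLE resource (
            id INTEGER PRIMARY KEY,
            package_id INTEGER REFERENCES package(id),
            detected_license TEXT NOT NULL DEFAULT ''
        );
        """
    )


def update_missing_package_licenses(conn):
    """Fill empty maven package licenses from a resource of the same package.

    Returns the number of packages updated. The work happens in a single
    UPDATE statement instead of looping over dictionaries in Python.
    """
    cursor = conn.execute(
        """
        UPDATE package
        SET declared_license = (
            SELECT r.detected_license
            FROM resource AS r
            WHERE r.package_id = package.id AND r.detected_license != ''
            ORDER BY r.id
            LIMIT 1
        )
        WHERE package.type = 'maven'
          AND package.declared_license = ''
          AND EXISTS (
              SELECT 1 FROM resource AS r
              WHERE r.package_id = package.id AND r.detected_license != ''
          )
        """
    )
    conn.commit()
    return cursor.rowcount
```

Packages that already have a license, and packages of other types, are left untouched by the `WHERE` clause.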


chinyeungli · 2025-12-29T03:37:00Z

@tdruez I’ve updated the code to include support for the "D2D" option.

The "deploy_to_devel" option is equivalent to the "map_deploy_to_develop" pipeline, which runs on Java, JavaScript, Kotlin, and Scala as these are the languages commonly found in Maven projects.

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Some projects encountered a unique constraint violation when a resource was already mapped:

```
duplicate key value violates unique constraint "scanpipe_codebaserelation_unique_relation"
DETAIL: Key (from_resource_id, to_resource_id, map_type)=(1512780, 1512790, jar_to_source) already exists.
```

Signed-off-by: Chin Yeung Li <tli@nexb.com>
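
The constraint violation quoted above comes from inserting a relation that already exists. A minimal sketch of one way to make such an insert idempotent, using `sqlite3` and a made-up `codebaserelation` table that mirrors only the three columns named in the error message. The helper names are hypothetical, and this is not necessarily how the commit resolves the problem.

```python
import sqlite3


def create_relation_table(conn):
    """Create a hypothetical relation table with the same unique triple."""
    conn.execute(
        """
        CREATE TABLE codebaserelation (
            from_resource_id INTEGER NOT NULL,
            to_resource_id INTEGER NOT NULL,
            map_type TEXT NOT NULL,
            UNIQUE (from_resource_id, to_resource_id, map_type)
        )
        """
    )


def add_relation_once(conn, from_id, to_id, map_type):
    """Insert the relation unless the same triple already exists.

    Returns True if a row was inserted and False if it was skipped.
    """
    cursor = conn.execute(
        "INSERT OR IGNORE INTO codebaserelation "
        "(from_resource_id, to_resource_id, map_type) VALUES (?, ?, ?)",
        (from_id, to_id, map_type),
    )
    return cursor.rowcount == 1
```

Calling `add_relation_once` twice with the same arguments leaves a single row instead of raising an integrity error.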

chinyeungli added 5 commits November 13, 2025 07:52

Add "scan_maven_package" pipeline (Working-in-progress) #1763

9e93317

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Use pom_url as the datafile_path for the dependenies #1763

928f742

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Use empty string for datafile_path in dependencies #1763

d63a1e5

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Removed dup code that's already present in ScanSinglePackage #1763

9812129

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Update the matching regex for parse_maven_filename and added test #1763

114eb75

- Update format Signed-off-by: Chin Yeung Li <tli@nexb.com>

chinyeungli requested a review from tdruez November 13, 2025 10:40

github-advanced-security AI found potential problems Nov 13, 2025


Comment thread scanpipe/pipes/resolve.py Fixed

Refactor code and add tests #1763

13e8e88

Signed-off-by: Chin Yeung Li <tli@nexb.com>

github-advanced-security AI found potential problems Nov 14, 2025


Comment thread scanpipe/pipes/resolve.py Fixed

chinyeungli added 5 commits November 17, 2025 18:08

Implement "update_package_license_from_resource_if_missing" function #…

cb623c1

…1763 - Update package's license if missing while the same package has license detected in RESOURCES Signed-off-by: Chin Yeung Li <tli@nexb.com>

Only work with "maven" package type #1763

b937cff

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Update docstring description #1763

0bbe979

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Add test for the "scan_maven_package" pipeline #1763

2b3c9bb

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Fix the Incomplete URL substring sanitization #1763

9a11dd8

Signed-off-by: Chin Yeung Li <tli@nexb.com>

chinyeungli mentioned this pull request Nov 18, 2025

maven-heaven: Design and implement Scancode.io pipeline for a single Maven package #1763

Open

tdruez requested changes Nov 19, 2025


chinyeungli added 7 commits November 19, 2025 13:28

Add "maven.google.com" as a maven_hosts and accept .aar file #1763

f303430

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Reorganize code structure and code enhancement #1763

ab71751

- Create a new maven pipe module - Use database queries for update_package_license_from_resource_if_missing() - Add tests Signed-off-by: Chin Yeung Li <tli@nexb.com>

Update build-in-pipelines.rst #1763

5579db8

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Merge remote-tracking branch 'origin/main' into 1763_pipeline_for_maven

9ed4c2d

Added steps to handle D2D in the maven pipeline #1763

12728e4

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Update "exclude_from_diff" list #1763

3242b16

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Ruff reformatted #1763

97a109b

Signed-off-by: Chin Yeung Li <tli@nexb.com>

chinyeungli added 2 commits December 29, 2025 11:45

Update class description #1763

9992d15

Signed-off-by: Chin Yeung Li <tli@nexb.com>

Error handling if pom url not found #1763

4f1b4d3

Signed-off-by: Chin Yeung Li <tli@nexb.com>

chinyeungli requested a review from tdruez January 28, 2026 03:31

pombredanne requested a review from ppkarwasz February 20, 2026 10:48

TG1999 self-requested a review May 1, 2026 09:08

Pipeline for Maven #1953
chinyeungli wants to merge 21 commits into main from 1763_pipeline_for_maven

chinyeungli commented Nov 13, 2025 (edited by pombredanne)

3 participants
