
Commit af0cff9

zhenghao104 and Copilot authored
Promote MavenWithFallback detector replacing MvnCli (#1756)
* Promote MavenWithFallback detector replacing MvnCli
* Fix and add test
* Small nit
* Update maven markdown
* Update docs/detectors/maven.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Mask URL
* Nit
* Nit

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 8f1a8b6 commit af0cff9

12 files changed

+3000
-4331
lines changed

docs/detectors/maven.md

Lines changed: 171 additions & 8 deletions
@@ -4,22 +4,185 @@
44

55
Maven detection depends on the following to successfully run:
66

7-
- Maven CLI as part of your PATH. mvn should be runnable from a given command line.
8-
- Maven Dependency Plugin (installed with Maven).
97
- One or more `pom.xml` files.
8+
- Maven CLI (`mvn`) available in PATH **and** the Maven Dependency Plugin installed — required for full graph detection. If unavailable, the detector automatically falls back to static `pom.xml` parsing (see [Detection Strategy](#detection-strategy) below).
109

1110
## Detection strategy
1211

13-
Maven detection is performed by running `mvn dependency:tree -f {pom.xml}` for each pom file and parsing down the results.
12+
The detector (`MvnCliComponentDetector`, ID: `MvnCli`) uses a **two-path strategy**: Maven CLI for full dependency graph resolution, with automatic fallback to static `pom.xml` parsing when CLI is unavailable or fails. Both paths are handled in the same detector class.
1413

15-
Components tagged as a test dependency are marked as development dependencies.
14+
### High-level lifecycle
1615

17-
Full dependency graph generation is supported.
16+
```
17+
OnPrepareDetectionAsync (Phase 1 — runs once before any file is processed)
18+
19+
├─ [CLI disabled or unavailable] → return pom.xml stream as-is → OnFileFoundAsync (static path)
20+
21+
└─ [CLI available]
22+
23+
├─ Collect all pom.xml ProcessRequests via observable
24+
├─ Sort by directory depth (shallowest first) → filter to root-level pom.xml only
25+
├─ For each root pom.xml (sequentially):
26+
│ ├─ Run `mvn dependency:tree` → writes bcde.mvndeps next to pom.xml
27+
│ ├─ [success] record directory as succeeded
28+
│ └─ [failure] record directory as failed, capture error output
29+
├─ Scan entire source tree for all bcde.mvndeps files
30+
│ └─ Read each into MemoryStream (file handle released immediately)
31+
│ └─ Emit as ProcessRequests → OnFileFoundAsync (CLI path)
32+
└─ For each failed directory → re-emit original pom.xml ProcessRequests
33+
└─ → OnFileFoundAsync (static path)
34+
35+
OnFileFoundAsync (Phase 2 — called once per file emitted from Phase 1)
36+
37+
├─ [bcde.mvndeps] → ParseDependenciesFile → register full graph with scopes
38+
│ └─ [CleanupCreatedFiles=true] → delete bcde.mvndeps from disk
39+
40+
└─ [pom.xml] → static XML parsing (3-pass approach, see below)
41+
42+
OnDetectionFinishedAsync (Phase 3 — runs once after all files are processed)
43+
├─ Pass 2: resolve deferred Maven parent relationships
44+
└─ Pass 3: resolve pending components with hierarchy-aware variable substitution
45+
```
46+
47+
---
48+
49+
### Phase 1 — Prepare: CLI detection
50+
51+
#### Step 1.1 — Skip check
52+
53+
If the environment variable `CD_MAVEN_DISABLE_CLI` is set to `true`, Maven CLI is skipped entirely. All `pom.xml` files are passed through unchanged to Phase 2 for static parsing. `FallbackReason` is recorded as `MvnCliDisabledByUser`.
54+
55+
#### Step 1.2 — CLI availability check
56+
57+
`mvn --version` is executed (on Windows, `mvn.cmd` is also tried). If the executable cannot be located, all `pom.xml` files fall through to static parsing. `FallbackReason` is recorded as `MavenCliNotAvailable`.
58+
59+
#### Step 1.3 — Root pom.xml identification
60+
61+
All discovered `pom.xml` ProcessRequests are buffered and sorted by directory path length (shallowest first). The detector then walks each file's ancestors: if any ancestor directory already contains a `pom.xml`, the current file is **nested** and excluded from direct CLI invocation. This ensures Maven CLI is only run on the outermost project root in any given directory tree, which is how Maven itself works (parent POMs aggregate submodules).
62+
63+
For each root pom.xml and all its nested children, a mapping is recorded in `parentPomDictionary` (keyed by root directory). This mapping is used for fallback: if CLI fails for a root, all its nested children are re-emitted for static parsing.
64+
65+
> **Why `.ToList()` instead of streaming?** The nesting check requires knowledge of all discovered paths before any can be classified as root or nested. A streaming approach would risk emitting a file as a root before its true parent has been seen. Sorting by depth first guarantees correctness. The `ProcessRequest` objects at this stage hold a `LazyComponentStream` that does not open the file until `.Stream` is first accessed, so no file handles are held during the buffer.
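The root/nested classification in Step 1.3 can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the detector's C# code: `classify_poms` and `parent_pom_map` are hypothetical names, and sorting by the number of path segments is an assumption standing in for the "shallowest first" ordering described above.

```python
from pathlib import PurePosixPath


def classify_poms(pom_paths):
    """Group pom.xml paths by their outermost (root) project directory."""
    # Shallowest first, so a parent directory is always seen before its children.
    ordered = sorted(pom_paths, key=lambda p: len(PurePosixPath(p).parts))
    parent_pom_map = {}  # root directory -> [root pom, nested poms...]
    for pom in ordered:
        directory = PurePosixPath(pom).parent
        # Walk up the ancestors looking for a directory already recorded as a root.
        root = next((a for a in directory.parents if a in parent_pom_map), None)
        if root is None:
            parent_pom_map[directory] = [pom]  # no ancestor has a pom.xml: this is a root
        else:
            parent_pom_map[root].append(pom)   # nested: kept only for fallback
    return parent_pom_map
```

Only the first entry of each list would be handed to Maven CLI; the remaining entries are the nested files that would be re-emitted for static parsing if that root fails.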
66+
67+
#### Step 1.4 — Sequential Maven CLI invocation
68+
69+
For each root `pom.xml`, Maven CLI is invoked **sequentially** (not in parallel) to avoid Maven local repository lock contention and reduce JVM memory pressure:
70+
71+
```
72+
mvn dependency:tree -B -DoutputFile=bcde.mvndeps -DoutputType=text -f{pom.xml}
73+
```
74+
75+
- **`-B`** — batch mode (no interactive prompts).
76+
- **`-DoutputFile=bcde.mvndeps`** — writes the dependency tree next to the `pom.xml`.
77+
- **`-DoutputType=text`** — text format parseable by `MavenStyleDependencyGraphParser`.
78+
79+
If the `MvnCLIFileLevelTimeoutSeconds` environment variable is set, a per-file cancellation timeout is applied via a linked `CancellationTokenSource`.
80+
81+
On success, the existence of `bcde.mvndeps` is verified (CLI can exit 0 but skip the file in edge cases). On failure, error output is captured for later authentication error analysis.
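A rough sketch of Step 1.4, written in Python rather than the detector's C#. The function names are hypothetical, and treating a missing, non-numeric, or non-positive timeout value as "unbounded" is an assumption; only the command-line arguments and the post-run existence check come from the text above.

```python
import os
import subprocess

DEPS_FILE = "bcde.mvndeps"


def build_command(pom_path, executable="mvn"):
    """Arguments mirroring the documented `mvn dependency:tree` invocation."""
    return [executable, "dependency:tree", "-B",
            f"-DoutputFile={DEPS_FILE}", "-DoutputType=text", f"-f{pom_path}"]


def read_timeout(env):
    """Per-file timeout in seconds, or None when unbounded (assumed handling)."""
    try:
        seconds = int(env.get("MvnCLIFileLevelTimeoutSeconds", "-1"))
    except ValueError:
        seconds = -1
    return seconds if seconds > 0 else None


def generate_deps_file(pom_path, env=os.environ):
    """Run Maven for one root pom.xml; return (success, captured output)."""
    try:
        result = subprocess.run(build_command(pom_path), capture_output=True,
                                text=True, timeout=read_timeout(env))
    except (subprocess.TimeoutExpired, FileNotFoundError) as exc:
        return False, str(exc)
    deps_path = os.path.join(os.path.dirname(pom_path), DEPS_FILE)
    # Exit code 0 is not enough: the output file must actually exist.
    success = result.returncode == 0 and os.path.exists(deps_path)
    return success, result.stdout + result.stderr
```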
82+
83+
#### Step 1.5 — Dependency file discovery
84+
85+
After all CLI invocations complete, the entire source directory is re-scanned for `bcde.mvndeps` files (this catches submodule output files generated by the parent POM run). Each file is:
86+
1. Read fully into a `MemoryStream` — releasing the underlying file handle immediately.
87+
2. Wrapped in a new `ProcessRequest` with a `SingleFileComponentRecorder` keyed to the corresponding `pom.xml` path in the same directory.
88+
89+
#### Step 1.6 — Failure analysis and fallback assembly
90+
91+
If any CLI invocations failed, error output is scanned for authentication patterns (`401`, `403`, `Unauthorized`, `Access denied`). If found, `FallbackReason` is set to `AuthenticationFailure` and any matching repository URLs are extracted and logged as guidance. Otherwise, `FallbackReason` is set to `OtherMvnCliFailure`.
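The failure classification could look roughly like the Python sketch below. The four patterns are taken from the paragraph above; the case-insensitive substring match and the URL regular expression are assumptions, and `classify_failure` is a hypothetical name.

```python
import re

AUTH_PATTERNS = ("401", "403", "Unauthorized", "Access denied")
# Assumed URL shape; the detector's actual extraction rule may differ.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def classify_failure(error_output):
    """Return (fallback_reason, repository_urls) for captured Maven output."""
    lowered = error_output.lower()
    if any(pattern.lower() in lowered for pattern in AUTH_PATTERNS):
        urls = sorted(set(URL_PATTERN.findall(error_output)))
        return "AuthenticationFailure", urls
    return "OtherMvnCliFailure", []
```

Plain substring matching on `401` or `403` can also match unrelated numbers in the output, so a production implementation would likely need tighter patterns.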
92+
93+
For each failed root directory, all `pom.xml` ProcessRequests from `parentPomDictionary` (the root itself plus all nested children) are emitted in depth-first order (parent before child) for static parsing.
94+
95+
The final observable returned to the framework is the concatenation of:
96+
- All `bcde.mvndeps` ProcessRequests (CLI successes)
97+
- All `pom.xml` ProcessRequests from failed directories (static fallback)
98+
99+
---
100+
101+
### Phase 2 — File processing: `OnFileFoundAsync`
102+
103+
Each `ProcessRequest` emitted in Phase 1 is dispatched here. The file type is distinguished by its `Pattern` field.
104+
105+
#### CLI path: `bcde.mvndeps`
106+
107+
The file is passed to `MavenStyleDependencyGraphParser` via `MavenCommandService.ParseDependenciesFile`. The parser reads the text-format dependency tree line-by-line:
108+
109+
1. **First non-blank line** — the root artifact (`groupId:artifactId:packaging:version`). Registered as a direct dependency.
110+
2. **Subsequent lines** — each is a tree node prefixed with `+-` (direct child) or `\-` (last child) at an indented position. The indentation depth (character offset of the splitter) is used to maintain a parse stack, from which parent-child edges are derived and registered.
111+
112+
Component string format:
113+
```
114+
groupId:artifactId:packaging:version:scope
115+
```
116+
Scope is mapped to `DependencyScope` (`MavenCompile`, `MavenTest`, `MavenProvided`, `MavenRuntime`, `MavenSystem`). `test`-scoped dependencies are also marked as `isDevelopmentDependency=true`.
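The stack-based parse described above can be illustrated with a short Python sketch. It is not `MavenStyleDependencyGraphParser` itself: the function names are hypothetical, and reading the scope as the last colon-separated field is an assumption that also tolerates coordinates carrying a classifier.

```python
def parse_dependency_tree(text):
    """Return (root, edges) from `mvn dependency:tree` text output."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    root = lines[0].strip()
    edges = []
    stack = [(-1, root)]  # (splitter offset, component); the root is never popped
    for line in lines[1:]:
        # Offset of the "+- " or "\- " splitter marks the node's depth.
        offset = max(line.find("+- "), line.find("\\- "))
        if offset < 0:
            continue  # not a tree node
        component = line[offset + 3:]
        # Pop siblings and deeper nodes until the parent is on top.
        while stack[-1][0] >= offset:
            stack.pop()
        edges.append((stack[-1][1], component))
        stack.append((offset, component))
    return root, edges


def is_development_dependency(component):
    """True for test-scoped components (scope assumed to be the last field)."""
    return component.rsplit(":", 1)[-1] == "test"
```

Every edge whose parent is the root artifact corresponds to a direct dependency; deeper edges are transitive.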
117+
118+
If `CleanupCreatedFiles` is set on the scan request, `bcde.mvndeps` is deleted from disk after parsing (wrapped in a try/catch so failures are non-fatal).
119+
120+
#### Static fallback path: `pom.xml`
121+
122+
Static parsing operates in **three passes** spread across Phase 2 and Phase 3, designed to handle Maven's property inheritance correctly.
123+
124+
**Pass 1 (during `OnFileFoundAsync`):**
125+
126+
The `pom.xml` XML document is parsed once. For each file, the detector:
127+
128+
1. **Tracks project coordinates** — reads `groupId` and `artifactId` (falling back to the `<parent>` `groupId` if the project's own is absent) and stores the project in `processedMavenProjects` under both `artifactId` and `groupId:artifactId` keys. This enables coordinate-based parent lookup.
129+
130+
2. **Parses Maven parent relationship** — reads `<parent><groupId>` and `<parent><artifactId>`. If the parent pom.xml has already been processed, the `child → parent` relationship is stored immediately in `mavenParentChildRelationships`. Otherwise, the relationship is queued in `unresolvedParentRelationships` for Pass 2.
131+
132+
3. **Collects variables** — all `<properties>` sections are read (supports multiple `<properties>` blocks for malformed XML). `project.version`, `project.groupId`, `project.artifactId`, `version`, `groupId`, `artifactId` are also collected. Variables are stored in `collectedVariables` keyed as `filePath::variableName` to scope them to their source file for hierarchy-aware resolution.
133+
134+
4. **Registers dependencies:**
135+
- **Literal version** (e.g., `1.2.3`) → registered immediately.
136+
- **Variable version resolved locally** (e.g., `${revision}` defined in this same file's `<properties>`) → resolved and registered immediately.
137+
- **Variable version unresolvable locally** (e.g., `${revision}` from a parent POM) → added to `pendingComponents` queue with the raw template for Pass 3.
138+
- **Range version** (contains `,`) or **missing version** → skipped with a debug log.
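The four version cases in step 4 amount to a small decision function. The Python sketch below is illustrative only: `classify_version` and its return labels are hypothetical, and the `${...}` regular expression is an assumed form of the variable syntax.

```python
import re

VARIABLE = re.compile(r"\$\{([^}]+)\}")


def classify_version(version, local_properties):
    """Return (action, value) for a dependency's raw <version> text."""
    if not version or "," in version:
        return "skipped", None          # missing version or version range
    missing = [name for name in VARIABLE.findall(version)
               if name not in local_properties]
    if missing:
        return "pending", version       # keep the raw template for Pass 3
    # Literal versions pass through unchanged; local variables are substituted.
    resolved = VARIABLE.sub(lambda m: local_properties[m.group(1)], version)
    return "registered", resolved
```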
139+
140+
---
141+
142+
### Phase 3 — Finish: `OnDetectionFinishedAsync`
143+
144+
#### Pass 2 — Deferred parent relationship resolution
145+
146+
The `unresolvedParentRelationships` queue is drained. For each entry, the cache entry is cleared and `processedMavenProjects` is queried again (now fully populated). Lookup tries `groupId:artifactId` first, then `artifactId` alone. Resolved relationships are written to `mavenParentChildRelationships`.
147+
148+
#### Pass 3 — Hierarchy-aware variable resolution
149+
150+
All entries in `pendingComponents` are drained. For each component with an unresolved version template (e.g., `${myVersion}`):
151+
152+
1. Starting from the component's own `pom.xml`, the detector walks up `mavenParentChildRelationships` (child → parent → grandparent).
153+
2. At each level, `collectedVariables[filePath::variableName]` is checked.
154+
3. The **first match wins** — this implements Maven's child-overrides-parent property precedence.
155+
4. Circular parent references are detected via a `visitedFiles` HashSet and broken safely.
156+
5. If the variable is still unresolved after exhausting the hierarchy (e.g., defined in an external parent POM not on disk), the component is skipped and `UnresolvedVariableCount` is incremented in telemetry.
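The walk described in Pass 3 can be sketched in Python as below. The dictionary shapes mirror the `filePath::variableName` keys and the child-to-parent map named above, but `resolve_variable` and its parameter names are hypothetical.

```python
def resolve_variable(name, pom_path, collected_variables, parent_of):
    """Resolve a variable by walking child -> parent -> grandparent."""
    visited = set()  # guards against circular parent references
    current = pom_path
    while current is not None and current not in visited:
        visited.add(current)
        key = f"{current}::{name}"
        if key in collected_variables:
            return collected_variables[key]  # first match wins: child overrides parent
        current = parent_of.get(current)
    return None  # unresolved: the caller skips the component
```

Returning `None` corresponds to the case where the component is skipped and `UnresolvedVariableCount` is incremented.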
157+
158+
---
159+
160+
### Detection method tracking
161+
162+
At completion, the `DetectionMethod` telemetry field records one of:
163+
164+
| Value | Meaning |
165+
|---|---|
166+
| `MvnCliOnly` | All root pom.xml files were processed by Maven CLI successfully |
167+
| `StaticParserOnly` | CLI was disabled or unavailable; all components from static parsing |
168+
| `Mixed` | Maven CLI was attempted; at least one root fell back to static parsing (possibly all) |
169+
| `None` | No pom.xml files were found |
170+
171+
`FallbackReason` records why static parsing was triggered: `None`, `MvnCliDisabledByUser`, `MavenCliNotAvailable`, `AuthenticationFailure`, or `OtherMvnCliFailure`.
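Read as a decision rule, the `DetectionMethod` table above reduces to a few checks. This Python sketch is an interpretation of the table, not the detector's code; the function and parameter names are hypothetical.

```python
def detection_method(root_count, cli_used, failed_root_count):
    """Map scan outcomes to the DetectionMethod values in the table."""
    if root_count == 0:
        return "None"              # no pom.xml files were found
    if not cli_used:
        return "StaticParserOnly"  # CLI disabled or unavailable
    if failed_root_count > 0:
        return "Mixed"             # at least one root fell back (possibly all)
    return "MvnCliOnly"
```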
172+
173+
---
18174

19175
## Known limitations
20176

21-
Maven detection will not run if `mvn` is unavailable.
177+
- Static fallback parsing does **not** resolve variables defined in external parent POMs that are not present on disk (e.g., published to a remote Maven repository). Affected components are skipped.
178+
- Static parsing does **not** produce a dependency graph (no parent-child edges between components) — it produces a flat component list only. Full graph with transitive dependencies requires Maven CLI.
179+
- Version ranges (e.g., `[1.0,2.0)`) are not supported by static parsing and are skipped.
180+
- Maven CLI invocations run sequentially. On repositories with many independent root `pom.xml` files, this can be slow. Set `MvnCLIFileLevelTimeoutSeconds` to bound per-file execution time.
181+
- If Maven CLI exits successfully but the `bcde.mvndeps` file is not created (edge case with certain POM configurations), the file falls back to static parsing.
22182

23-
## Environment Variables
183+
## Environment variables
24184

25-
The environment variable `MvnCLIFileLevelTimeoutSeconds` is used to control the max execution time Mvn CLI is allowed to take per each `pom.xml` file. Default value, unbounded. This will restrict any spikes in scanning time caused by Mvn CLI during package restore. We suggest to restore the Maven packages beforehand, so that no network calls happen when executing "mvn dependency:tree" and the graph is captured quickly.
185+
| Variable | Default | Description |
186+
|---|---|---|
187+
| `MvnCLIFileLevelTimeoutSeconds` | Unbounded | Maximum seconds Maven CLI may spend on a single `pom.xml`. Pre-restoring packages eliminates network calls and makes this limit more predictable. |
188+
| `CD_MAVEN_DISABLE_CLI` | `false` | Set to `true` to skip Maven CLI entirely and use only static `pom.xml` parsing. |

src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs

Lines changed: 0 additions & 64 deletions
@@ -2,7 +2,6 @@
22
namespace Microsoft.ComponentDetection.Detectors.Maven;
33

44
using System;
5-
using System.Collections.Concurrent;
65
using System.IO;
76
using System.Threading;
87
using System.Threading.Tasks;
@@ -20,19 +19,6 @@ internal class MavenCommandService : IMavenCommandService
2019

2120
internal static readonly string[] AdditionalValidCommands = ["mvn.cmd"];
2221

23-
/// <summary>
24-
/// Per-location semaphores to prevent concurrent Maven CLI executions for the same pom.xml.
25-
/// This allows multiple detectors (e.g., MvnCliComponentDetector and MavenWithFallbackDetector)
26-
/// to safely share the same output file without race conditions.
27-
/// </summary>
28-
private readonly ConcurrentDictionary<string, SemaphoreSlim> locationLocks = new();
29-
30-
/// <summary>
31-
/// Tracks locations where dependency generation has completed successfully.
32-
/// Used to skip duplicate executions when multiple detectors process the same pom.xml.
33-
/// </summary>
34-
private readonly ConcurrentDictionary<string, MavenCliResult> completedLocations = new();
35-
3622
private readonly ICommandLineInvocationService commandLineInvocationService;
3723
private readonly IMavenStyleDependencyGraphParserService parserService;
3824
private readonly IEnvironmentVariableService envVarService;
@@ -58,56 +44,6 @@ public async Task<bool> MavenCLIExistsAsync()
5844
}
5945

6046
public async Task<MavenCliResult> GenerateDependenciesFileAsync(ProcessRequest processRequest, CancellationToken cancellationToken = default)
61-
{
62-
var pomFile = processRequest.ComponentStream;
63-
var pomDir = Path.GetDirectoryName(pomFile.Location);
64-
var depsFilePath = Path.Combine(pomDir, this.BcdeMvnDependencyFileName);
65-
66-
// Check the cache before acquiring the semaphore to allow fast-path returns
67-
// even when cancellation has been requested.
68-
if (this.completedLocations.TryGetValue(pomFile.Location, out var cachedResult)
69-
&& cachedResult.Success
70-
&& File.Exists(depsFilePath))
71-
{
72-
this.logger.LogDebug("{DetectorPrefix}: Skipping duplicate \"dependency:tree\" for {PomFileLocation}, already generated", DetectorLogPrefix, pomFile.Location);
73-
return cachedResult;
74-
}
75-
76-
// Use semaphore to prevent concurrent Maven CLI executions for the same pom.xml.
77-
// This allows MvnCliComponentDetector and MavenWithFallbackDetector to safely share the output file.
78-
var semaphore = this.locationLocks.GetOrAdd(pomFile.Location, _ => new SemaphoreSlim(1, 1));
79-
80-
await semaphore.WaitAsync(cancellationToken);
81-
try
82-
{
83-
// Re-check the cache after acquiring the semaphore in case another caller
84-
// completed while we were waiting.
85-
if (this.completedLocations.TryGetValue(pomFile.Location, out cachedResult)
86-
&& cachedResult.Success
87-
&& File.Exists(depsFilePath))
88-
{
89-
this.logger.LogDebug("{DetectorPrefix}: Skipping duplicate \"dependency:tree\" for {PomFileLocation}, already generated", DetectorLogPrefix, pomFile.Location);
90-
return cachedResult;
91-
}
92-
93-
var result = await this.GenerateDependenciesFileCoreAsync(processRequest, cancellationToken);
94-
95-
// Only cache successful results. Failed results should allow retries for transient failures,
96-
// and caching them would waste memory since the cache check requires Success == true anyway.
97-
if (result.Success)
98-
{
99-
this.completedLocations[pomFile.Location] = result;
100-
}
101-
102-
return result;
103-
}
104-
finally
105-
{
106-
semaphore.Release();
107-
}
108-
}
109-
110-
private async Task<MavenCliResult> GenerateDependenciesFileCoreAsync(ProcessRequest processRequest, CancellationToken cancellationToken)
11147
{
11248
var cliFileTimeout = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
11349
var timeoutSeconds = -1;

src/Microsoft.ComponentDetection.Detectors/maven/MavenConstants.cs

Lines changed: 0 additions & 5 deletions
@@ -15,9 +15,4 @@ public static class MavenConstants
1515
/// Detector ID for MvnCliComponentDetector.
1616
/// </summary>
1717
public const string MvnCliDetectorId = "MvnCli";
18-
19-
/// <summary>
20-
/// Detector ID for MavenWithFallbackDetector.
21-
/// </summary>
22-
public const string MavenWithFallbackDetectorId = "MavenWithFallback";
2318
}
