Merge pull request #159 from rostilos/1.5.3-rc

rostilos · web-flow · commit 3c465a58c46e · 2026-03-03T04:13:15.000+02:00
feat: Enhance branch issue reconciliation and file snapshot retrieval…
diff --git a/README.md b/README.md
@@ -4,25 +4,110 @@
 
 ## Capabilities by Platform
 
-CodeCrow supports multiple version control systems with varying levels of integration. Below is the current feature matrix:
-
-| Feature                | Bitbucket | GitHub | GitLab |
-| :--------------------- | :-------: | :----: | :----: |
-| PR Analysis            |     +     |   +    |   +    |
-| Branch Analysis        |     +     |   +    |   +    |
-| Task Context Retrieval |     -     |   -    |   -    |
-| /ask                   |     +     |   +    |   +    |
-| /analyze               |     +     |   +    |   +    |
-| /summarize             |     +     |   +    |   +    |
-| Continuous Analysis    |     +     |   +    |   +    |
-| RAG Pipeline           |     +     |   +    |   +    |
+CodeCrow supports multiple version control systems. The AI analysis engine is the same across all platforms — the differences are in how results are surfaced in each VCS.
+
+### Analysis & Review
+
+| Feature                  | Bitbucket | GitHub | GitLab |
+| :----------------------- | :-------: | :----: | :----: |
+| PR / MR Analysis         |    ✅     |   ✅   |   ✅   |
+| Branch Analysis (push)   |    ✅     |   ✅   |   ✅   |
+| Continuous Analysis      |    ✅     |   ✅   |   ✅   |
+| Incremental / Delta Diff |    ✅     |   ✅   |   ✅   |
+| RAG-Augmented Review     |    ✅     |   ✅   |   ✅   |
+
+### PR / MR Comment Integration
+
+| Feature                            |     Bitbucket     | GitHub | GitLab |
+| :--------------------------------- | :---------------: | :----: | :----: |
+| PR Summary Comment                 |        ✅         |   ✅   |   ✅   |
+| Inline Diff Comments               | via Code Insights |   ✅   |   ✅   |
+| Code Insights Report + Annotations |        ✅         |   —    |   —    |
+| Check Runs                         |         —         |   ✅   |   —    |
+| Threaded Comment Replies           |        ✅         |   —    |   ✅   |
+| Placeholder While Analyzing        |        ✅         |   ✅   |   ✅   |
+
+### Slash Commands (in PR comments)
+
+| Command           | Bitbucket | GitHub | GitLab |
+| :---------------- | :-------: | :----: | :----: |
+| `/ask <question>` |    ✅     |   ✅   |   ✅   |
+| `/analyze`        |    ✅     |   ✅   |   ✅   |
+| `/summarize`      |    ✅     |   ✅   |   ✅   |
+
+### Dashboard & Issue Management
+
+These features are platform-independent and available through the CodeCrow web UI.
+
+| Feature                     | Description                                                                    |
+| :-------------------------- | :----------------------------------------------------------------------------- |
+| Issue Tracker               | Per-branch and per-PR issue lists with severity, category, and status filters  |
+| Issue Lifecycle             | Automatic resolution tracking across analyses; manual resolve/reopen           |
+| Source Context Viewer       | Full source code browser with inline issue annotations for every analyzed file |
+| Git Graph                   | Visual commit history with per-commit analysis status and branch health        |
+| Quality Gates               | Configurable pass/fail thresholds per workspace                                |
+| Custom Rules                | Per-project enforce/suppress rules with glob-based file patterns               |
+| Project Analytics           | Aggregated severity breakdown, analysis history, and branch health             |
+| AI Model Selection          | Choose your LLM provider and model (OpenRouter, Anthropic, Google, OpenAI)     |
+| Workspace & Team Management | Roles (Owner, Admin, Member, Viewer), member invites, ownership transfer       |
+| Two-Factor Authentication   | TOTP-based 2FA for sensitive operations                                        |
+
+### Setup Methods
+
+| Method             |  Bitbucket   |     GitHub      | GitLab |
+| :----------------- | :----------: | :-------------: | :----: |
+| Native App Install | ✅ (Connect) | ✅ (GitHub App) |   —    |
+| Manual Webhook     |      ✅      |       ✅        |   ✅   |
+| CI Pipeline Action |      ✅      |        —        |   —    |
+
+---
+
+## Supported Languages
+
+CodeCrow's AI review is **language-agnostic** — it analyzes any language or framework the underlying LLM can understand. No special configuration is required.
+
+The RAG pipeline (codebase indexing for context-aware reviews) provides enhanced support for languages with dedicated AST parsers. All other text-based files are indexed using a generic chunker.
+
+| Language                 | AI Review | RAG (AST) | Notes                                             |
+| :----------------------- | :-------: | :-------: | :------------------------------------------------ |
+| Java                     |    ✅     |    ✅     | incl. Spring, Jakarta EE, Android                 |
+| Kotlin                   |    ✅     |    ✅     | incl. Android, Ktor                               |
+| Python                   |    ✅     |    ✅     | incl. Django, Flask, FastAPI                      |
+| JavaScript               |    ✅     |    ✅     | incl. React, Vue, Svelte, Node.js                 |
+| TypeScript               |    ✅     |    ✅     | incl. Angular, Next.js, Deno                      |
+| Go                       |    ✅     |    ✅     |                                                   |
+| Rust                     |    ✅     |    ✅     |                                                   |
+| C                        |    ✅     |    ✅     |                                                   |
+| C++                      |    ✅     |    ✅     |                                                   |
+| C#                       |    ✅     |    ✅     | incl. .NET, ASP.NET, Unity                        |
+| PHP                      |    ✅     |    ✅     | incl. Laravel, Symfony                            |
+| Ruby                     |    ✅     |    ✅     | incl. Rails                                       |
+| Swift                    |    ✅     |    ✅     | incl. iOS / macOS                                 |
+| Scala                    |    ✅     |    ✅     |                                                   |
+| Lua                      |    ✅     |    ✅     |                                                   |
+| Perl                     |    ✅     |    ✅     |                                                   |
+| Haskell                  |    ✅     |    ✅     |                                                   |
+| COBOL                    |    ✅     |    ✅     |                                                   |
+| Objective-C              |    ✅     |     —     |                                                   |
+| Bash / Shell             |    ✅     |     —     |                                                   |
+| SQL                      |    ✅     |     —     |                                                   |
+| R                        |    ✅     |     —     |                                                   |
+| HTML / CSS / SCSS        |    ✅     |     —     |                                                   |
+| Vue / Svelte SFCs        |    ✅     |     —     |                                                   |
+| YAML / TOML / JSON / XML |    ✅     |     —     | config files, IaC                                 |
+| Markdown / RST           |    ✅     |     —     | documentation                                     |
+| _Any other language_     |    ✅     |  generic  | LLM-dependent; no AST, uses text chunking for RAG |
+
+> **Framework-specific?** The review quality scales with the LLM's knowledge of the framework. Popular frameworks (React, Spring Boot, Django, Rails, Laravel, .NET, etc.) get high-quality, idiomatic feedback out of the box. Niche frameworks work too — the LLM simply has less training data to draw on.
 
 ## Key Features
 
 - **Context-Aware Reviews**: Powered by a custom RAG (Retrieval-Augmented Generation) pipeline using Qdrant vector storage.
 - **Incremental Analysis**: Only scans changed code to keep feedback fast and cost-efficient.
 - **Multi-Tenant Architecture**: Securely manage multiple teams and projects from a single dashboard.
 - **Interactive Commands**: Command CodeCrow directly from PR comments using `/ask`, `/analyze`, and `/summarize`.
+- **Issue Lifecycle**: Automatic tracking of resolved vs. open issues across analyses with deterministic and AI-based reconciliation.
+- **Bring Your Own Model**: Connect your preferred LLM provider — OpenRouter, Anthropic, Google, or OpenAI.
 
 ## Documentation
 
diff --git a/java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueReconciliationService.java b/java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueReconciliationService.java
@@ -279,6 +279,11 @@ public int sweepDeterministicResolutions(
 
         int resolvedCount = 0;
 
+        // Collect file contents fetched during sweep for branch-level snapshot backfill.
+        // This progressively fills the Source Context tab with all files that have issues,
+        // even if those files were never in a diff scope.
+        Map<String, String> fetchedFileContents = new LinkedHashMap<>();
+
         for (Map.Entry<String, List<BranchIssue>> entry : issuesByFile.entrySet()) {
             String filePath = entry.getKey();
             List<BranchIssue> fileIssues = entry.getValue();
@@ -303,6 +308,7 @@ public int sweepDeterministicResolutions(
                     continue;
                 }
                 currentHashes = LineHashSequence.from(fileContent);
+                fetchedFileContents.put(filePath, fileContent);
             } catch (Exception e) {
                 log.debug("Sweep: skipping file {} (fetch failed: {})", filePath, e.getMessage());
                 continue; // Don't resolve on error — leave for next run
@@ -357,6 +363,24 @@ public int sweepDeterministicResolutions(
             }
         }
 
+        // ── Backfill branch-level snapshots for non-diff files ────────────
+        // The sweep already fetched content for these files; persisting them
+        // as branch-level snapshots ensures they appear in the Source Context
+        // tab alongside files from the normal diff-based analysis scope.
+        if (!fetchedFileContents.isEmpty()) {
+            try {
+                int backfilled = fileSnapshotService.persistSnapshotsForBranch(
+                        branch, fetchedFileContents, request.getCommitHash());
+                if (backfilled > 0) {
+                    log.info("Backfilled {} branch-level file snapshots from sweep-fetched content (Branch: {})",
+                            backfilled, request.getTargetBranchName());
+                }
+            } catch (Exception e) {
+                log.warn("Failed to backfill branch snapshots from sweep (non-critical): {}",
+                        e.getMessage());
+            }
+        }
+
         if (resolvedCount > 0) {
             log.info("Deterministic sweep resolved {} stale issues across {} non-diff files (Branch: {})",
                     resolvedCount, issuesByFile.size(), request.getTargetBranchName());
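
The sweep change above follows a collect-then-persist pattern: content fetched successfully inside the loop is remembered, and a single best-effort write happens after the loop. The sketch below isolates that pattern under stated assumptions. `SnapshotStore`, `sweepAndBackfill`, and the `Function`-based fetcher are hypothetical stand-ins, not CodeCrow APIs, and the null check only approximates the skip condition that the hunk does not show.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class SweepBackfillSketch {

    /** Hypothetical stand-in for the snapshot persistence call; returns rows written. */
    interface SnapshotStore {
        int persist(Map<String, String> contentsByPath);
    }

    /**
     * Fetches each path once, remembers every successful fetch, then persists
     * the collected contents in one best-effort call.
     * Returns the number of snapshots written, or 0 if nothing could be written.
     */
    static int sweepAndBackfill(List<String> paths,
                                Function<String, String> fetcher,
                                SnapshotStore store) {
        // LinkedHashMap keeps the files in the order the sweep visited them.
        Map<String, String> fetched = new LinkedHashMap<>();
        for (String path : paths) {
            try {
                String content = fetcher.apply(path);
                if (content == null) {
                    continue; // assumed skip condition: nothing to snapshot
                }
                // The real sweep would compare line hashes against stored issues here.
                fetched.put(path, content);
            } catch (RuntimeException e) {
                continue; // fetch failed: leave the file for the next run
            }
        }
        if (fetched.isEmpty()) {
            return 0;
        }
        try {
            return store.persist(fetched);
        } catch (RuntimeException e) {
            return 0; // backfill is non-critical, so the failure is swallowed
        }
    }

    public static void main(String[] args) {
        int written = sweepAndBackfill(
                List.of("src/A.java", "src/B.java"),
                path -> "content of " + path,
                Map::size);
        System.out.println("backfilled " + written + " snapshots");
    }
}
```

Persisting once after the loop means a storage failure cannot interrupt issue resolution, which matches the "non-critical" handling in the diff.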
diff --git a/java-ecosystem/libs/file-content/src/main/java/org/rostilos/codecrow/filecontent/service/FileSnapshotService.java b/java-ecosystem/libs/file-content/src/main/java/org/rostilos/codecrow/filecontent/service/FileSnapshotService.java
@@ -16,6 +16,8 @@
 import java.nio.charset.StandardCharsets;
 import java.security.MessageDigest;
 import java.security.NoSuchAlgorithmException;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -468,21 +470,38 @@ public List<AnalyzedFileSnapshot> getSnapshotsForPr(Long pullRequestId) {
     // ── Branch-level aggregated retrieval ────────────────────────────────
 
     /**
-     * Get the latest file snapshots for a branch. Tries the direct branch_id FK first;
-     * falls back to the legacy DISTINCT ON aggregation across analyses.
+     * Get the latest file snapshots for a branch.
+     * <p>
+     * Merges two snapshot sources so that every file that has ever been analysed stays visible:
+     * <ol>
+     *   <li><b>Branch-level snapshots</b> (direct branch_id FK) — created by
+     *       {@link #persistSnapshotsForBranch}, both for files in an analysis run's diff scope
+     *       and for non-diff files backfilled by the deterministic sweep.</li>
+     *   <li><b>Legacy analysis-level snapshots</b> (via analysis_id + DISTINCT ON) — cover
+     *       all files from prior analyses that used the older code path.</li>
+     * </ol>
+     * Branch-level snapshots take precedence when both exist for the same file path.
      * Returns metadata only (no content loaded).
      */
     public List<AnalyzedFileSnapshot> getSnapshotsForBranch(Long projectId, String branchName) {
-        // Try direct FK first
+        Map<String, AnalyzedFileSnapshot> snapshotsByPath = new LinkedHashMap<>();
+
+        // 1. Branch-level snapshots (highest priority — latest content)
         Optional<Branch> branchOpt = branchRepository.findByProjectIdAndBranchName(projectId, branchName);
         if (branchOpt.isPresent()) {
             List<AnalyzedFileSnapshot> direct = snapshotRepository.findByBranchId(branchOpt.get().getId());
-            if (!direct.isEmpty()) {
-                return direct;
+            for (AnalyzedFileSnapshot s : direct) {
+                snapshotsByPath.put(s.getFilePath(), s);
             }
         }
-        // Legacy fallback
-        return snapshotRepository.findLatestSnapshotsByBranch(projectId, branchName);
+
+        // 2. Legacy analysis-level snapshots (fill gaps for files not yet in branch FK)
+        List<AnalyzedFileSnapshot> legacy = snapshotRepository.findLatestSnapshotsByBranch(projectId, branchName);
+        for (AnalyzedFileSnapshot s : legacy) {
+            snapshotsByPath.putIfAbsent(s.getFilePath(), s);
+        }
+
+        return new ArrayList<>(snapshotsByPath.values());
     }
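
The precedence rule in `getSnapshotsForBranch` comes down to two map operations: `put` for branch-level snapshots, then `putIfAbsent` for legacy ones. The old code returned early when any branch-level snapshot existed, so a file that had only a legacy snapshot was dropped from the result. The following is a minimal standalone sketch of the new merge, assuming Java 16+ for records. The `Snapshot` record and its `source` label are hypothetical simplifications of `AnalyzedFileSnapshot`, used only to make the outcome visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SnapshotMergeSketch {

    /** Hypothetical, simplified snapshot: a file path plus a label naming its source. */
    record Snapshot(String filePath, String source) {}

    /** Branch-level entries win; legacy entries only fill paths that are still missing. */
    static List<Snapshot> merge(List<Snapshot> branchLevel, List<Snapshot> legacy) {
        Map<String, Snapshot> byPath = new LinkedHashMap<>();
        for (Snapshot s : branchLevel) {
            byPath.put(s.filePath(), s);
        }
        for (Snapshot s : legacy) {
            byPath.putIfAbsent(s.filePath(), s); // never overwrites a branch-level entry
        }
        return new ArrayList<>(byPath.values());
    }

    public static void main(String[] args) {
        List<Snapshot> merged = merge(
                List.of(new Snapshot("src/A.java", "branch")),
                List.of(new Snapshot("src/A.java", "legacy"),
                        new Snapshot("src/B.java", "legacy")));
        merged.forEach(s -> System.out.println(s.filePath() + " <- " + s.source()));
        // prints:
        // src/A.java <- branch
        // src/B.java <- legacy
    }
}
```

Because `LinkedHashMap` preserves insertion order, branch-level files come first in the returned list and legacy-only files follow.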
 
     /**