
fix: better whitebox wiki #418

Open
bearsyankees wants to merge 10 commits into main from whitebox-wiki


@bearsyankees
Collaborator

No description provided.

@greptile-apps
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR is a substantial feature release that evolves Strix into a source-aware whitebox analysis platform. The changes introduce:

  • Diff-scope resolution (--scope-mode, --diff-base): detects PR context automatically in CI, resolves a merge-base, filters changed files, and prepends a scoped instruction block to the agent task.
  • Wiki note persistence: notes in the wiki category are now written to per-run .md files and a JSON index on disk, enabling cross-agent shared memory that survives in-memory resets.
  • get_note tool: exposes individual note full-content retrieval so agents can use the more efficient metadata-first list_notes + targeted get_note pattern.
  • Source-aware skills (source_aware_whitebox, source_aware_sast): new skill files injected for whitebox runs, providing concrete SAST playbooks with semgrep, ast-grep, gitleaks, trufflehog, and trivy.
  • Dockerfile additions: @ast-grep/cli, tree-sitter-cli, all major grammar repos pre-cloned, and gitleaks installed from GitHub Releases.
  • is_whitebox flag propagation: threaded through LLMConfig → LLM._get_skills_to_load → child agent creation, so source-aware skills and wiki context injection happen automatically for all whitebox scans.

Two minor style observations noted; no blocking issues found.

Confidence Score: 5/5

Safe to merge — no P0 or P1 issues found; all remaining observations are P2 style/consistency suggestions.

Core logic (diff resolution, wiki persistence, whitebox flag propagation) is correct and comprehensively tested. The two flagged items are minor code-quality observations that do not affect correctness or runtime behavior.

strix/interface/utils.py — minor inconsistency in _classify_diff_entries (added_files deduplication) and implicit ordering dependency between build_diff_scope_instruction and scope.to_metadata().

Important Files Changed

| Filename | Overview |
|---|---|
| strix/interface/utils.py | Adds 600+ lines of diff-scope resolution logic; two minor style issues (added_files deduplication, ordering side-effect). |
| strix/tools/notes/notes_actions.py | Large refactor adding wiki persistence, get_note tool, append_note_content, RLock wrapping; well-structured and tested. |
| strix/tools/agents_graph/agents_graph_actions.py | Adds whitebox wiki context injection, wiki delta on agent_finish, and is_whitebox propagation to child agents. |
| containers/Dockerfile | Adds ast-grep, tree-sitter, gitleaks; jq confirmed available from earlier apt-get block. |
| strix/interface/main.py | Adds --scope-mode/--diff-base flags and wires diff_scope into scan config cleanly. |
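The wiki persistence and `get_note` changes describe a metadata-first pattern: each note lives in its own `.md` file, a JSON index holds metadata, `list_notes` returns only that metadata, and `get_note` fetches full content on demand. Below is a minimal sketch of the pattern, assuming every note in the store is a wiki-category note, a flat per-run directory, and a single re-entrant lock guarding both the in-memory index and disk writes. `WikiNoteStore` is hypothetical, and although the method names echo those in the PR summary, their signatures here are assumptions, not the notes_actions.py API.

```python
import json
import threading
import uuid
from pathlib import Path


class WikiNoteStore:
    """Hypothetical per-run wiki store: one .md file per note plus index.json."""

    def __init__(self, run_dir: Path) -> None:
        self.run_dir = run_dir
        self.run_dir.mkdir(parents=True, exist_ok=True)
        self.index_path = run_dir / "index.json"
        # RLock so a locked method can call another locked method safely.
        self._lock = threading.RLock()
        self._index: dict[str, dict] = {}
        if self.index_path.exists():
            # Reload state written earlier in the run, so it survives in-memory resets.
            self._index = json.loads(self.index_path.read_text(encoding="utf-8"))

    def _flush_index(self) -> None:
        self.index_path.write_text(json.dumps(self._index, indent=2), encoding="utf-8")

    def create_note(self, title: str, content: str) -> str:
        with self._lock:
            note_id = uuid.uuid4().hex[:8]
            (self.run_dir / f"{note_id}.md").write_text(content, encoding="utf-8")
            self._index[note_id] = {"title": title, "chars": len(content)}
            self._flush_index()
            return note_id

    def append_note_content(self, note_id: str, extra: str) -> None:
        with self._lock:
            body = self.get_note(note_id) + extra  # re-enters the lock via get_note
            (self.run_dir / f"{note_id}.md").write_text(body, encoding="utf-8")
            self._index[note_id]["chars"] = len(body)
            self._flush_index()

    def list_notes(self) -> list[dict]:
        # Metadata only, so listing stays cheap regardless of note size.
        with self._lock:
            return [{"id": k, **v} for k, v in self._index.items()]

    def get_note(self, note_id: str) -> str:
        with self._lock:
            return (self.run_dir / f"{note_id}.md").read_text(encoding="utf-8")
```

An agent would call `list_notes()` first, pick the relevant ids from titles and sizes, and only then call `get_note()` on those, which keeps large note bodies out of the context until they are needed.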
Prompt To Fix All With AI
This is a comment left during a code review.
Path: strix/interface/utils.py
Line: 759-784

Comment:
**`added_files` list lacks deduplication**

`added_files` appends directly without a seen-set, while `modified_files` uses `_append_unique` + `modified_seen`. In theory `git diff --name-status -z` cannot emit the same path twice as `A`, but the inconsistency means `added_files_count` in `to_metadata()` could over-count in any edge case where a duplicate slips through the parser.

Consider introducing `added_seen: set[str] = set()` and switching the `A` branch to use `_append_unique(added_files, added_seen, path)` to match the pattern used for all other file categories.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/interface/utils.py
Line: 1011-1025

Comment:
**Implicit side-effect ordering dependency between `build_diff_scope_instruction` and `scope.to_metadata()`**

`build_diff_scope_instruction` mutates `scope.truncated_sections` as a side effect. The caller in `resolve_diff_scope_context` relies on a specific ordering: `build_diff_scope_instruction` must be called before `scope.to_metadata()` so truncation state is populated. This is currently correct but invisible to future readers.

Consider documenting this with a comment:
```python
# NOTE: build_diff_scope_instruction populates scope.truncated_sections as a
# side effect; to_metadata() must be called after it.
instruction_block = build_diff_scope_instruction(repo_scopes)
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "whitebox follow up: better wiki"

Comment on lines +759 to +784

```python
def _classify_diff_entries(entries: list[DiffEntry]) -> dict[str, Any]:
    added_files: list[str] = []
    modified_files: list[str] = []
    deleted_files: list[str] = []
    renamed_files: list[dict[str, Any]] = []
    analyzable_files: list[str] = []
    analyzable_seen: set[str] = set()
    modified_seen: set[str] = set()

    for entry in entries:
        path = entry.path
        if not path:
            continue

        if entry.status == "D":
            deleted_files.append(path)
            continue

        if entry.status == "A":
            added_files.append(path)
            _append_unique(analyzable_files, analyzable_seen, path)
            continue

        if entry.status == "M":
            _append_unique(modified_files, modified_seen, path)
```

P2 `added_files` list lacks deduplication

`added_files` appends directly without a seen-set, while `modified_files` uses `_append_unique` + `modified_seen`. In theory `git diff --name-status -z` cannot emit the same path twice as `A`, but the inconsistency means `added_files_count` in `to_metadata()` could over-count in any edge case where a duplicate slips through the parser.

Consider introducing `added_seen: set[str] = set()` and switching the `A` branch to use `_append_unique(added_files, added_seen, path)` to match the pattern used for all other file categories.
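A minimal sketch of the suggested change follows. The body of `_append_unique` is not shown in the excerpt, so the helper below is a reconstruction that assumes it appends a value only when it is not already in the companion seen-set. `DiffEntry` is stubbed, rename handling is omitted, and because the excerpt stops inside the `M` branch, adding modified paths to `analyzable_files` is also an assumption. The function name and returned keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DiffEntry:
    # Stub carrying only the fields the excerpt reads.
    status: str
    path: str


def _append_unique(items: list[str], seen: set[str], value: str) -> None:
    # Assumed behavior: keep first-seen order and skip repeats.
    if value not in seen:
        seen.add(value)
        items.append(value)


def classify_diff_entries_sketch(entries: list[DiffEntry]) -> dict[str, Any]:
    added_files: list[str] = []
    modified_files: list[str] = []
    deleted_files: list[str] = []
    analyzable_files: list[str] = []
    added_seen: set[str] = set()  # the suggested addition
    modified_seen: set[str] = set()
    analyzable_seen: set[str] = set()

    for entry in entries:
        path = entry.path
        if not path:
            continue
        if entry.status == "D":
            deleted_files.append(path)
        elif entry.status == "A":
            # Previously added_files.append(path); now deduplicated like the other lists.
            _append_unique(added_files, added_seen, path)
            _append_unique(analyzable_files, analyzable_seen, path)
        elif entry.status == "M":
            _append_unique(modified_files, modified_seen, path)
            _append_unique(analyzable_files, analyzable_seen, path)  # assumed

    return {
        "added_files": added_files,
        "modified_files": modified_files,
        "deleted_files": deleted_files,
        "analyzable_files": analyzable_files,
    }
```

With the seen-set in place, a count derived from `len(added_files)` cannot exceed the number of distinct added paths, whatever the parser emits.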


Comment on lines +1011 to +1025
```python
            active=False,
            mode=scope_mode,
            metadata={"active": False, "mode": scope_mode},
        )

    if not local_sources:
        raise ValueError("Diff-scope is active, but no local repository targets were provided.")

    repo_scopes: list[RepoDiffScope] = []
    skipped_non_git: list[str] = []
    skipped_diff_scope: list[str] = []
    for source in local_sources:
        source_path = source.get("source_path")
        if not source_path:
            continue
```

P2 Implicit side-effect ordering dependency between `build_diff_scope_instruction` and `scope.to_metadata()`

`build_diff_scope_instruction` mutates `scope.truncated_sections` as a side effect. The caller in `resolve_diff_scope_context` relies on a specific ordering: `build_diff_scope_instruction` must be called before `scope.to_metadata()` so truncation state is populated. This is currently correct but invisible to future readers.

Consider documenting this with a comment:

```python
# NOTE: build_diff_scope_instruction populates scope.truncated_sections as a
# side effect; to_metadata() must be called after it.
instruction_block = build_diff_scope_instruction(repo_scopes)
```
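Beyond a comment, one way to make the dependency explicit is to have the builder return truncation information instead of mutating the scopes, so the caller has to hand it forward before building metadata. The sketch below assumes a trimmed-down `RepoDiffScope` with a file list and a `truncated_sections` list, plus an arbitrary per-repo file cap; `build_instruction`, `max_files`, and the `to_metadata` shape are hypothetical, not Strix's actual signatures.

```python
from dataclasses import dataclass, field


@dataclass
class RepoDiffScope:
    # Hypothetical stand-in for the real class.
    repo: str
    files: list[str]
    truncated_sections: list[str] = field(default_factory=list)

    def to_metadata(self) -> dict:
        return {
            "repo": self.repo,
            "files_count": len(self.files),
            "truncated_sections": list(self.truncated_sections),
        }


def build_instruction(
    scopes: list[RepoDiffScope], max_files: int = 2
) -> tuple[str, dict[str, list[str]]]:
    """Return the instruction text plus truncation info, without mutating scopes."""
    lines: list[str] = []
    truncated: dict[str, list[str]] = {}
    for scope in scopes:
        lines.append(f"{scope.repo}: " + ", ".join(scope.files[:max_files]))
        if len(scope.files) > max_files:
            truncated.setdefault(scope.repo, []).append("files")
    return "\n".join(lines), truncated


scopes = [RepoDiffScope("api", ["a.py", "b.py", "c.py"]), RepoDiffScope("web", ["x.ts"])]
instruction, truncated = build_instruction(scopes)
for scope in scopes:
    # Explicit hand-off: truncation state is applied here, not as a hidden side effect.
    scope.truncated_sections = truncated.get(scope.repo, [])
metadata = [scope.to_metadata() for scope in scopes]
```

The trade-off is a slightly more verbose call site in exchange for an ordering that is enforced by data flow: `to_metadata()` cannot see truncation state that the caller has not yet assigned.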
