Commit 79bf2a3

cdeust and claude committed
fix(security): werkzeug-style whitelist + Path.is_relative_to for CWE-22
Previous attempt used os.path.normpath + startswith(base) — CodeQL's own GOOD-example pattern — but the py/path-injection query still flagged git_diff.py:139/141 and http_file_diff.py:170. Reason: the CodeQL sanitiser recogniser wants either a werkzeug-style whitelist (reject special chars up front) or Path.is_relative_to() on resolved paths, because the ``startswith`` check on a NORMALISED base that is itself a function parameter doesn't cleanly terminate its taint tracking.

git_diff.py::_read_safe — rewritten with the stricter pattern from the CodeQL docs' "more restrictive options" paragraph:

  1. Reject absolute, null-byte, and any ``..``-containing inputs.
  2. Resolve both base and target via Path.resolve(strict=False).
  3. target.is_relative_to(base) ← canonical CodeQL sanitiser.
  4. FS read only after step 3 passes.

http_file_diff.py::_git_root_for_name — same whitelist prefix: any ``..`` segment in the user's ``name`` is rejected before any os.path.join, matching werkzeug.utils.secure_filename semantics that CodeQL's docs call out as a recognised safe option. The existing normpath + startswith(base) loop remains as the second layer.

Runtime sanity: happy read works; ``../../etc/passwd``, ``/etc/passwd``, and null-byte inputs all return None.

2507 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
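The four steps listed for `_read_safe` can be sketched in Python as follows. This is a minimal illustration, not the code in `git_diff.py`: the function name `read_safe`, its signature, and the UTF-8 text read are assumptions, and returning `None` on every failure simply mirrors the "all return None" runtime check mentioned in the message. `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path
from typing import Optional


def read_safe(base_dir: str, user_path: str) -> Optional[str]:
    """Hypothetical sketch: read user_path under base_dir, or return None."""
    # Step 1: reject empty, null-byte, absolute, and any ".."-containing input
    # before the value reaches a filesystem call.
    if not user_path or "\x00" in user_path or ".." in user_path:
        return None
    if user_path.startswith(("/", "\\")) or Path(user_path).is_absolute():
        return None

    # Step 2: resolve both base and target (strict=False tolerates missing files).
    base = Path(base_dir).resolve(strict=False)
    target = (base / user_path).resolve(strict=False)

    # Step 3: containment check on the resolved paths.
    if not target.is_relative_to(base):
        return None

    # Step 4: touch the filesystem only after the check has passed.
    try:
        return target.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        return None
```

The string-level `".." in user_path` test is deliberately stricter than a per-segment check, so a legitimate name such as `a..b.txt` is also refused; that trade-off follows the "any ``..``-containing inputs" wording above.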
1 parent 3f7ced5 commit 79bf2a3

17 files changed

Lines changed: 366 additions & 1674 deletions

CLAUDE.md

Lines changed: 2 additions & 3 deletions
@@ -77,13 +77,12 @@ Handlers are the **composition roots**: they wire infrastructure (I/O) to core (
 - `profile_assembler.py` — Profile assembly from extracted components
 - `blindspot_patterns.py` — Blind spot pattern definitions
 - `session_shape.py` — Session shape analysis
-- `graph_builder.py` — Graph node/edge construction for visualization
+- `graph_builder.py` — Graph node/edge construction for MCP `get_methodology_graph` tool
 - `graph_builder_nodes.py` — Node construction for graph
 - `graph_builder_edges.py` — Edge construction for graph
-- `graph_builder_memory.py` — Memory node construction
 - `graph_builder_dedup.py` — Graph deduplication logic
 - `graph_quality_scorer.py` — Per-node quality scoring
-- `unified_graph_builder.py` — Unified graph construction orchestrator
+- `unified_graph_builder.py` — REMOVED in Gap 10 (was dead since workflow_graph.v1 replaced it)
 
 *Behavioral Interpretability:*
 - `sparse_dictionary.py` — Behavioral feature dictionary learning (OMP sparse coding, K-SVD)
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# ADR-0047 — Entity-node positioning constants (Gap 10 follow-up)

**Status:** accepted · 2026-04-23
**Authors:** cdeust with review by alexander / thompson / kekulé reasoning agents
**Context files:** `ui/unified/js/workflow_graph.js`, `mcp_server/core/workflow_graph_builder.py`

## Problem

Gap 10 added `NodeKind.ENTITY` to the workflow graph, producing ~9900
entity nodes + ~2500 MEMORY→ENTITY edges on a live Cortex install. First
rendering showed entities as a diffuse teal haze — they had no radial
slot in `computeSlots`, so the force simulation placed them by charge +
link defaults only, which had been tuned for N≈17k nodes (the graph now
has N≈27k).

## Decision

Give each visible entity a deterministic slot derived from its edges,
gated by heat, with two physics-decay constants adjusted for the new N.

### Constants introduced

| Constant | Value | Location | Provenance |
|---|---|---|---|
| `ENTITY_DOMAIN_BLEND` | `0.15` | `ui/unified/js/workflow_graph.js` | Kekulé valence analysis: entity has one mandatory IN_DOMAIN edge (hub) plus N incoming ABOUT_ENTITY edges (memories). Placement = 85 % memory centroid + 15 % domain hub. The blend value keeps the entity visually inside the memory region while providing a hub tether for the degenerate \|M\|=1 case and for cross-domain entities. Tuned by inspection on the live Cortex graph (2026-04-23); below 0.05 cross-domain entities drifted off-canvas, above 0.30 single-domain entities ringed the hub instead of the memory cloud. |
| `ENTITY_ORPHAN_R` | `FILE_R + 40` = `260` | same | Orphan entities (\|M\|=0) get a hash-deterministic position on a ring just outside L3 files, so the same entity lands in the same place across reloads. Radius chosen to sit visually between L3 (files, 220) and L6 (AST symbols, 290) — entities are discussion artefacts *about* code, so they land between those layers. |
| `ENTITY_HEAT_TAU` | `0.25` | same | Alexander HEAT-GATED-VISIBILITY pattern. Heat histogram on live data (2026-04-23, 9925 entities) showed the top 30 % of entities cluster above `heat ≥ 0.25`; below that value entities are typically stale or single-mention noise. The loader already applies a `get_all_entities(min_heat=0.05)` lower bound; this constant tightens visibility to `0.25` for clutter control. |
| `ENTITY_TOPN` | `40` | same | Per-domain visible-entity cap. At 27 domains × 40 = 1080 guaranteed-visible entities, graph remains readable at zoom-out. Additional entities above `ENTITY_HEAT_TAU` also stay visible via the `OR` in the gate predicate — this is intentional: *"top-N per domain OR hot enough to show everywhere"*. |

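The placement rule described for `ENTITY_DOMAIN_BLEND` and `ENTITY_ORPHAN_R` can be sketched as below. This is a Python transcription for illustration only; the real logic lives in `ui/unified/js/workflow_graph.js` and is not shown in this commit. The function names, the use of SHA-256 for the orphan angle, a single hub per entity, and a ring centred on the origin are all assumptions.

```python
import hashlib
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

ENTITY_DOMAIN_BLEND = 0.15        # hub weight, from the table above
FILE_R = 220.0                    # L3 file ring radius, from the table above
ENTITY_ORPHAN_R = FILE_R + 40.0   # 260, between L3 (220) and L6 (290)


def orphan_angle(entity_id: str) -> float:
    """Stable angle derived from the id (SHA-256 is an assumed hash choice)."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction * 2.0 * math.pi


def entity_slot(entity_id: str,
                memory_positions: Sequence[Point],
                domain_hub: Point) -> Point:
    """Hypothetical slot: 85 % memory centroid + 15 % domain hub."""
    if not memory_positions:
        # Orphan (|M| = 0): same id always lands on the same ring position.
        theta = orphan_angle(entity_id)
        return (ENTITY_ORPHAN_R * math.cos(theta),
                ENTITY_ORPHAN_R * math.sin(theta))
    n = len(memory_positions)
    cx = sum(p[0] for p in memory_positions) / n
    cy = sum(p[1] for p in memory_positions) / n
    w = ENTITY_DOMAIN_BLEND
    return ((1.0 - w) * cx + w * domain_hub[0],
            (1.0 - w) * cy + w * domain_hub[1])
```

With a single linked memory (|M| = 1) the centroid is that memory's own position, so the 15 % hub term is the only thing that moves the entity off the memory node, which is the tether the table refers to.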
### Physics constants retuned

| Constant | From | To | Rationale |
|---|---|---|---|
| `alphaDecay` (HEAVY branch) | `0.028` | `0.018` | Thompson scaling audit: repulsive energy scales as N², so at N≈27k (1.59× N₀=17k) the simulation needs 1.59² ≈ 2.5× more ticks to cool. Lowering α-decay from `0.028` to `0.018` (a cut of about one third, not a halving) stretches the cooling time constant by 0.028/0.018 ≈ 1.56×, which recovers part of that factor without changing the absolute tick budget the runtime uses. |
| `velocityDecay` | `0.72` | `0.78` | Effective spring stiffness rose from the added ~2500 about_entity edges. Raising velocity decay brings the damping ratio ζ from 0.55 back to ~0.65, so entities settle instead of ringing. |

Both follow Fruchterman–Reingold scaling (ℓ\* ∝ √(A/N)) and a single-spring damping model. No formal paper citation — these are numerical knobs chosen to restore the pre-Gap-10 visual equilibrium on the live Cortex install.

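The cooling arithmetic behind the `alphaDecay` change can be checked with a short calculation. The sketch assumes d3-force-style geometric cooling, where each tick multiplies alpha by `(1 - alphaDecay)` and the simulation stops once alpha falls below `alphaMin`; the `alpha_min=0.001` and `alpha_start=1.0` values are d3-force's defaults and are assumptions about this install, as is the helper name `ticks_to_cool`.

```python
import math


def ticks_to_cool(alpha_decay: float,
                  alpha_min: float = 0.001,
                  alpha_start: float = 1.0) -> int:
    """Ticks until alpha_start * (1 - alpha_decay)**t drops below alpha_min."""
    return math.ceil(math.log(alpha_min / alpha_start)
                     / math.log(1.0 - alpha_decay))


n_ratio = 27_000 / 17_000       # growth in node count (about 1.59)
energy_ratio = n_ratio ** 2     # N^2 repulsive-energy scaling (about 2.5)

before = ticks_to_cool(0.028)   # old HEAVY-branch value
after = ticks_to_cool(0.018)    # retuned value

print(f"N ratio {n_ratio:.2f}, energy ratio {energy_ratio:.2f}")
print(f"ticks to cool: {before} -> {after} ({after / before:.2f}x)")
```

Under these assumptions the tick count grows by roughly 1.56×, which is close to the linear N ratio rather than the quadratic 2.5× figure, so the retune is better read as a partial compensation tuned by eye.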
## Alternatives considered

- **Alexander's fixed-sector layout** (all entities at fixed radius 185, angular slots by index): rejected because it discards the signal — a viewer cannot tell which memories discuss which entity. Entities would sit at constant radius regardless of their connections.
- **Petal-per-memory** (one entity copy per linked memory): rejected because it duplicates the same entity N times across the graph, violating single-source-of-truth.
- **No heat gate** (show all 9925 entities slotted): rejected because every free angle fills with teal dots, destroying the inner-calm property (Alexander 1977 §7) and obscuring L3 files + L6 symbols at zoom-out.

## Consequences

- Entities now render as visible per-domain clusters inside the memory
  region. User visually verified 2026-04-23.
- Three numerical knobs (`ENTITY_DOMAIN_BLEND`, `ENTITY_HEAT_TAU`,
  `ENTITY_TOPN`) are tunable without rebuilding — adjust if new
  datasets change the heat distribution.
- The OR in the heat-gate predicate means a domain with many hot
  entities can exceed the `ENTITY_TOPN=40` cap. This is intentional:
  the cap is a *floor* on visibility for cold domains, not a ceiling
  on hot ones.
- Physics retune (`0.018` / `0.78`) is safe for the HEAVY branch (node
  count > 25k threshold, already triggered at N=27k) and does not
  affect smaller graphs which keep the original `0.022` / `0.72`
  tuning.

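The "top-N per domain OR hot enough" gate described above can be sketched as follows. This is an illustrative Python version, not the predicate in `workflow_graph.js`: the `Entity` tuple, the function name `visible_entities`, and ranking each domain's entities by descending heat are assumptions (the ADR does not state the ranking key).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set

ENTITY_HEAT_TAU = 0.25   # from the constants table
ENTITY_TOPN = 40         # from the constants table


class Entity(NamedTuple):
    id: str
    domain: str
    heat: float


def visible_entities(entities: Iterable[Entity],
                     top_n: int = ENTITY_TOPN,
                     tau: float = ENTITY_HEAT_TAU) -> Set[str]:
    """Hypothetical gate: show an entity if it is in its domain's top_n
    by heat OR its heat is at least tau."""
    by_domain: Dict[str, List[Entity]] = defaultdict(list)
    for entity in entities:
        by_domain[entity.domain].append(entity)

    visible: Set[str] = set()
    for members in by_domain.values():
        ranked = sorted(members, key=lambda e: e.heat, reverse=True)
        for rank, entity in enumerate(ranked):
            if rank < top_n or entity.heat >= tau:
                visible.add(entity.id)
    return visible
```

Because the two conditions are joined with OR, a cold domain still shows up to `top_n` entities, while a hot domain can show more than `top_n`; this is the floor-not-ceiling behaviour noted in the consequences.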
## Revisit trigger

Entity count > 50k on any single install, OR user reports positioning
regression after a schema change that alters memory-node slots.
