Core Improvement 9: In-File Hierarchy Intelligence

Current Status

PLANNED / FORWARD-LOOKING - Not implemented yet.

This proposal extends recur from:

file-level hierarchy (recur files, recur tree, recur stats)

to:

in-file hierarchy understanding (structured IDs, refs, contracts, tasks inside file content)

Overview

recur is already strong at selecting the right files. IMPROVEMENT9 proposes adding a second stage on top of that selection, giving a two-step workflow:

Select files with existing recur commands.
Run an in-file hierarchy command on that exact set.

This gives precise, composable analysis for both humans and LLMs.

Core Idea

Introduce an in-file command family (working name: recur in), designed to consume file sets from stdin or scope selection.

Examples (proposed):

# Stage 1: select files by file hierarchy
recur files "main.command.**.readme" -d docs/ \
  | recur in id "main.command.files.**" --stdin

# Stage 1 with Rust underscore naming
recur files "main_command_*_impl" -d src/ --sep _ \
  | recur in id "main.command.files.**" --stdin

# Analyze TODO chains only in selected command docs
recur files "main.command.**.todo*" -d docs/ \
  | recur in refs "todo.**" --stdin

This model keeps Unix composability: file filtering and in-file semantics stay separate but chain cleanly.
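The proposal uses `*` and `**` in patterns without defining them. Below is a minimal matcher built on three assumptions:

- `**` matches zero or more whole segments.
- A bare `*` matches exactly one segment.
- A `*` inside a segment (as in `todo*`) matches any run of characters within that segment.

`matches_id` is a hypothetical name. `sep` plays the role of the `--sep` value.

```rust
/// Glob-match one segment: `*` matches any run of characters (possibly empty).
fn glob_segment(pat: &str, text: &str) -> bool {
    let p: Vec<char> = pat.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // Position of the last `*` seen and the text index it currently covers.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*`s match the empty remainder.
    p[pi..].iter().all(|&c| c == '*')
}

/// Match pattern segments against ID segments; `**` spans zero or more segments.
fn match_segments(pat: &[&str], id: &[&str]) -> bool {
    let Some((first, rest)) = pat.split_first() else {
        return id.is_empty();
    };
    if *first == "**" {
        return (0..=id.len()).any(|k| match_segments(rest, &id[k..]));
    }
    match id.split_first() {
        Some((seg, id_rest)) => glob_segment(first, seg) && match_segments(rest, id_rest),
        None => false,
    }
}

/// Hypothetical entry point: does `id` match `pattern` under separator `sep`?
pub fn matches_id(pattern: &str, id: &str, sep: char) -> bool {
    let pat: Vec<&str> = pattern.split(sep).collect();
    let segs: Vec<&str> = id.split(sep).collect();
    match_segments(&pat, &segs)
}
```

Under these rules, `main.command.files.**` also matches `main.command.files` itself. `main_command_*_impl` with `_` as the separator matches `main_command_files_impl` but not a five-segment name.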

Design Thesis Update: Dual-Layer Hierarchy

IMPROVEMENT9 should formalize two layers that can be switched and chained at will:

File layer (recur files/tree/stats)

Answers: "Which files matter?"
Uses folder-appropriate separators (--sep _ for Rust modules, . for docs/tests).

In-file layer (recur in *)

Answers: "Which semantic IDs, refs, tasks, or recurring triggers matter inside those files?"
Reads selected files from stdin and/or a simple semantic-name list file.
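Reading the file set from stdin is the glue for every pipeline in this proposal. Below is a minimal sketch. It assumes one path per line, trims surrounding whitespace, skips blank lines, and drops duplicates while keeping first-seen order. `read_file_set` is a hypothetical name.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Read a newline-separated file list (e.g. piped from `recur files`)
/// and return the paths in first-seen order without blanks or duplicates.
pub fn read_file_set<R: BufRead>(input: R) -> io::Result<Vec<String>> {
    let mut seen = HashSet::new();
    let mut files = Vec::new();
    for line in input.lines() {
        let line = line?;
        let path = line.trim();
        if path.is_empty() {
            continue;
        }
        // `insert` returns false when the path was already kept.
        if seen.insert(path.to_string()) {
            files.push(path.to_string());
        }
    }
    Ok(files)
}
```

In a binary, this would be called as `read_file_set(std::io::stdin().lock())`. Sources such as `git diff --name-only` also list deleted files, so a real implementation would likely skip paths that no longer exist before reading them.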

The power is the composition:

# Select implementation modules with source separator
recur files "main_command_*_impl" -d src/ --sep _ \
  | recur in id "main.command.**" --stdin

This keeps separator policy local to the file selection phase while in-file semantics stay canonical.
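Keeping separator policy local to file selection implies a conversion between lane names and canonical dot IDs. Below is a minimal sketch with hypothetical helper names.

The conversion is lossy when a segment already contains the lane separator. For example, `release.v2_3.rc1.checklist` (from Scope Expansion below) does not survive a trip through the `_` lane. `round_trips` detects this case.

```rust
/// Rewrite a lane-specific name (e.g. a Rust module stem using `_`)
/// as a canonical dot ID.
pub fn to_canonical(name: &str, sep: char) -> String {
    name.replace(sep, ".")
}

/// Render a canonical dot ID with a lane's separator.
pub fn to_lane(id: &str, sep: char) -> String {
    id.replace('.', &sep.to_string())
}

/// True when `to_lane` followed by `to_canonical` gives back `id`,
/// i.e. no dot segment already contains the lane separator.
pub fn round_trips(id: &str, sep: char) -> bool {
    !id.split('.').any(|seg| seg.contains(sep))
}
```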

Legacy Codebase Adoption: Simple Semantic Name List First

For existing repositories, do not require immediate file renaming or leaf-file proliferation. Start with one plain-text semantic name list:

docs/main.semantic.names.txt

Format rules:

one semantic ID per line
canonical dot IDs only (prefix.base.suffix[.qualifier])
dot notation is the default for hierarchical files and semantic IDs
optional blank lines and # comments
no embedded metadata schema required
use .todo.tracking when an item is centrally tracked in the list

Example file:

main.command.tree.todo.current
main.command.tree.todo.trigger.event
main.command.checkpoint.todo.current
main.command.checkpoint.todo.trigger.event
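The format rules above are simple enough to parse in a few lines. Below is a minimal sketch, resting on two assumptions:

- `#` comments occupy whole lines.
- A canonical ID has at least three non-empty dot segments, each made of ASCII letters, digits, or `_`. This accepts `release.v2_3.rc1.checklist` and `main.improvement.9.todo.tracking`.

`parse_names`, `is_canonical_id`, and `NameError` are hypothetical names, not existing recur APIs.

```rust
/// One invalid line found while parsing a names file (1-based line number).
#[derive(Debug, PartialEq)]
pub struct NameError {
    pub line: usize,
    pub text: String,
}

/// Hypothetical validator: >= 3 non-empty dot segments of [A-Za-z0-9_].
pub fn is_canonical_id(id: &str) -> bool {
    let segs: Vec<&str> = id.split('.').collect();
    segs.len() >= 3
        && segs.iter().all(|s| {
            !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        })
}

/// Parse a names file: one ID per line; blank lines and whole-line `#`
/// comments are ignored. Invalid lines are reported rather than dropped.
pub fn parse_names(text: &str) -> (Vec<String>, Vec<NameError>) {
    let mut ids = Vec::new();
    let mut errors = Vec::new();
    for (i, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if is_canonical_id(line) {
            ids.push(line.to_string());
        } else {
            errors.push(NameError { line: i + 1, text: line.to_string() });
        }
    }
    (ids, errors)
}
```

An underscore-lane name such as `main_command_tree_impl` is reported as an error, which matches the "canonical dot IDs only" rule.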

Resolution model:

Analyze/select candidate IDs quickly from the text list (human first, then LLM).
Resolve selected IDs against the file layer to retrieve concrete context.
Run in-file extraction only on resolved files.

In this model, interest lives in semantic IDs while actual working context is retrieved from matched files at the file layer.

Context retrieval examples:

# docs/tests lane (dot separator)
recur files "main.command.checkpoint.todo.current" -d docs/

# src lane (underscore separator)
recur files "main_command_checkpoint_todo_current" -d src/ --sep _

Recommended command additions:

recur in id|refs|trace|gaps --names-file docs/main.semantic.names.txt
recur in sync --names-file docs/main.semantic.names.txt (refresh list from repo)
recur in drift --names-file docs/main.semantic.names.txt (list IDs with no matching files)
recur in lane current|set --names-file ... (single active cursor management)

This gives immediate structure to legacy repos with minimal disruption.
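`recur in drift` can be sketched as a set difference between the names list and the file stems found in each lane. The sketch makes two hypothetical assumptions not defined by this proposal:

- Each file path comes paired with its lane separator.
- A file matches an ID when its stem (the name minus its final extension), with that separator rewritten to dots, equals the ID exactly.

`drift` is a hypothetical function name.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Return IDs from the names list that no file in any lane matches.
/// `files` pairs each path with its lane separator ('.' docs, '_' src).
pub fn drift<'a>(names: &'a [String], files: &[(&str, char)]) -> Vec<&'a str> {
    // Canonical stems of every selected file.
    let stems: HashSet<String> = files
        .iter()
        .filter_map(|(path, sep)| {
            let stem = Path::new(path).file_stem()?.to_str()?;
            Some(stem.replace(*sep, "."))
        })
        .collect();
    names
        .iter()
        .map(String::as_str)
        .filter(|id| !stems.contains(*id))
        .collect()
}
```

Under this rule, `docs/main.command.tree.todo.current.md` and `src/main_command_tree_todo_current.rs` resolve to the same ID. Prefix or glob matching would need its own policy.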

Tracking example:

main.improvement.9.todo.tracking

*.todo.tracking is intended for fast, centralized queueing in one file instead of scattered per-item metadata files.

Why This Matters

Human Value

Faster impact analysis: not just which files changed, but which in-file IDs/contracts changed.
Better reviews: reviewers can inspect ref chains and unresolved identifiers quickly.
Less drift: docs/tests/notes can share the same ID taxonomy as code references.

LLM Value

Deterministic context narrowing: the LLM can query exact files first, then exact in-file IDs.
Better planning loops: detect missing IDs/references and propose concrete next files.
Lower hallucination risk: the LLM can query real semantic IDs and resolved files instead of inferring from prose.

Human + LLM Combined Value

Shared operational state: both humans and LLMs consume the same IDs, refs, statuses, and recurring trigger logs.
Faster handoffs: "current lane" is queryable, not hidden in chat memory.
Better prioritization: trigger and dependency data can rank what to do next.
Lower cognitive load: operators ask the system for "next valid action" instead of manually stitching context.

Scope Expansion: Track More Than Code

The same hierarchy model can represent all work categories:

engineering: main.command.tree.impl
testing: main.command.tree.test.case.stdin
docs: main.command.tree.readme
incident response: ops.incident.auth.outage.2026_02_08
release: release.v2_3.rc1.checklist
experiments: research.llm.context.windowing.sep_policy
product tasks: product.search.ux.todo.priority

This lets a single query language cover engineering, operations, and planning.

Recurring Workflow Triggers Only

Avoid broad one-off event modeling in the seed list. Keep only recurring workflow triggers that are repeatedly useful:

*.todo.trigger.event

Default checklist to run each time a recurring trigger completes:

update docs/history for the command
create a Git commit
push the branch
rotate *.todo.current to the next lane

Example:

main.command.checkpoint.todo.trigger.event

This keeps trigger behavior auditable while avoiding unnecessary event complexity.
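Because the checklist is short and fixed, a trigger run can be audited against a recorded log of steps. Below is a minimal sketch. It assumes the steps must happen in the listed order and tolerates extra or repeated log entries. `Step`, `CHECKLIST`, and `first_missing` are hypothetical names.

```rust
/// Steps of the default completion checklist, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    UpdateDocs,
    Commit,
    Push,
    RotateCurrent,
}

pub const CHECKLIST: [Step; 4] = [Step::UpdateDocs, Step::Commit, Step::Push, Step::RotateCurrent];

/// Audit a recorded run of a `*.todo.trigger.event`: return the first
/// checklist step that is missing or out of order, or `None` if complete.
pub fn first_missing(log: &[Step]) -> Option<Step> {
    let mut next = 0;
    for step in log {
        // Advance only when the expected step appears; others are ignored.
        if next < CHECKLIST.len() && *step == CHECKLIST[next] {
            next += 1;
        }
    }
    CHECKLIST.get(next).copied()
}
```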

Productivity and Interest Model

"Interest" here means what deserves attention now. Use an explicit scoring model over the semantic ID list plus discovered refs:

urgency (blocked, failing, near deadline)
impact (number of downstream refs)
freshness (stale TODOs / old checkpoints)
confidence (parser certainty for extracted IDs)

Then expose the ranking through a proposed command:

recur in focus --names-file ... --top 20

which emits:

ranked work queue for humans
deterministic context pack for LLM sessions

Result:

humans get a prioritized worklist
LLMs get high-signal context windows
both operate on the same evidence base
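One way to turn the four signals into a ranked queue is a weighted sum, with confidence applied as a multiplier. This sketch rests on stated assumptions. The proposal names the signals but not their scale or weights, so the 0.0 to 1.0 normalization and the `WEIGHTS` values are hypothetical, as are `Signals`, `score`, and `focus`.

Ties break on ID so that the same inputs always produce the same order. That ordering is what makes the LLM context pack deterministic.

```rust
/// Per-ID signals, each assumed normalized to 0.0..=1.0 by the caller.
#[derive(Debug, Clone)]
pub struct Signals {
    pub id: String,
    pub urgency: f64,
    pub impact: f64,
    pub freshness: f64, // higher = more stale, i.e. more in need of attention
    pub confidence: f64,
}

/// Hypothetical weights for urgency, impact, freshness.
pub const WEIGHTS: [f64; 3] = [0.5, 0.3, 0.2];

/// Weighted score; confidence scales the whole score so low-certainty
/// extractions cannot dominate the queue.
pub fn score(s: &Signals) -> f64 {
    let base = WEIGHTS[0] * s.urgency + WEIGHTS[1] * s.impact + WEIGHTS[2] * s.freshness;
    base * s.confidence
}

/// Rank by descending score, breaking ties by ID, and keep the top `top`.
pub fn focus(mut items: Vec<Signals>, top: usize) -> Vec<String> {
    items.sort_by(|a, b| score(b).total_cmp(&score(a)).then_with(|| a.id.cmp(&b.id)));
    items.into_iter().take(top).map(|s| s.id).collect()
}
```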

Proposed Command Surface

1) `recur in id`

Find in-file hierarchical identifiers matching a pattern.

recur in id <PATTERN> [--stdin] [-d DIR] [--ext LIST] [--sep CHAR] [--json]

Example:

recur in id "main.command.files.**" -d docs/
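Phase 1 calls for plain-text extraction. Below is a minimal sketch of that extractor. It treats as an ID any run of `[A-Za-z0-9_.]` that, once stray leading and trailing dots are trimmed, starts with a letter and has at least three non-empty dot segments. `extract_ids` and `Hit` are hypothetical names.

Applying the `<PATTERN>` filter to the hits would be a separate step.

```rust
/// A candidate ID found in file content, with its 1-based line number.
#[derive(Debug, PartialEq)]
pub struct Hit {
    pub line: usize,
    pub id: String,
}

fn is_id_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '.'
}

/// Plain-text extraction: every maximal run of ID characters that starts
/// with a letter (skipping version strings like `1.2.3`) and has >= 3
/// non-empty segments after trimming sentence-ending dots.
pub fn extract_ids(text: &str) -> Vec<Hit> {
    let mut hits = Vec::new();
    for (i, line) in text.lines().enumerate() {
        for token in line.split(|c: char| !is_id_char(c)) {
            let token = token.trim_matches('.');
            let starts_alpha = token.chars().next().map_or(false, |c| c.is_ascii_alphabetic());
            let segs: Vec<&str> = token.split('.').collect();
            if starts_alpha && segs.len() >= 3 && segs.iter().all(|s| !s.is_empty()) {
                hits.push(Hit { line: i + 1, id: token.to_string() });
            }
        }
    }
    hits
}
```

This heuristic also accepts tokens such as `www.example.com`, which is the false-positive risk listed under Risks and Controls.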

2) `recur in refs`

Find references between in-file IDs (edge view).

recur in refs <PATTERN> [--stdin] [-d DIR] [--json] [--count]

Example:

recur in refs "main.command.files.todo.**" -d docs/
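Edges need a rule for which ID a mention belongs to, and the proposal does not define one yet. The sketch below assumes a line containing an `id: X` marker declares X as the current source ID. Every other ID-shaped token on later lines then becomes an edge from X, emitted in the `(from_id -> to_id, file, line)` shape listed under Phase 3.

`Edge`, `ids_in`, and `extract_edges` are hypothetical names. The `id:` substring check is deliberately naive: it would also fire on `valid:`.

```rust
/// One reference edge: `from` mentions `to` at `file:line` (1-based).
#[derive(Debug, PartialEq)]
pub struct Edge {
    pub from: String,
    pub to: String,
    pub file: String,
    pub line: usize,
}

/// ID-shaped tokens: runs of `[A-Za-z0-9_.]`, stray dots trimmed, starting
/// with a letter and having at least three non-empty dot segments.
fn ids_in(line: &str) -> Vec<&str> {
    line.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '.'))
        .map(|t| t.trim_matches('.'))
        .filter(|t| {
            t.starts_with(|c: char| c.is_ascii_alphabetic())
                && t.split('.').count() >= 3
                && t.split('.').all(|s| !s.is_empty())
        })
        .collect()
}

/// Hypothetical convention: a line containing `id: X` makes X the current
/// source; ID tokens on later lines become edges from X until the next marker.
pub fn extract_edges(file: &str, text: &str) -> Vec<Edge> {
    let mut edges = Vec::new();
    let mut current: Option<String> = None;
    for (i, line) in text.lines().enumerate() {
        if let Some(pos) = line.find("id:") {
            if let Some(decl) = ids_in(&line[pos + 3..]).first() {
                current = Some(decl.to_string());
                continue;
            }
        }
        if let Some(from) = &current {
            for to in ids_in(line) {
                // Self-mentions are not edges.
                if to != from.as_str() {
                    edges.push(Edge {
                        from: from.clone(),
                        to: to.to_string(),
                        file: file.to_string(),
                        line: i + 1,
                    });
                }
            }
        }
    }
    edges
}
```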

3) `recur in trace`

Trace in-file ID references transitively (like a function call trace, but over the ID reference graph).

recur in trace <ID> [--stdin] [-d DIR] [--depth N] [--direction callers|callees|both] [--json]

Example:

recur in trace "main.command.files.todo.priority" -d docs/ --depth 2
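`--depth` and `--direction` map naturally onto a breadth-first search over the edge list. The sketch below assumes that the "callees" of an ID are the IDs it references and its "callers" are the IDs that reference it. `trace` and `Direction` are hypothetical names. Edges are plain `(from, to)` pairs, with file and line dropped for brevity.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

#[derive(Debug, Clone, Copy)]
pub enum Direction {
    Callers,
    Callees,
    Both,
}

/// Breadth-first trace up to `depth` hops from `start`. Returns every reached
/// ID (excluding `start`) with its hop count, sorted by hops, then by ID.
pub fn trace(edges: &[(&str, &str)], start: &str, depth: usize, dir: Direction) -> Vec<(String, usize)> {
    // Forward (callee) and reverse (caller) adjacency lists.
    let mut fwd: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    let mut rev: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        fwd.entry(from).or_default().push(to);
        rev.entry(to).or_default().push(from);
    }
    let mut seen = BTreeSet::from([start]);
    let mut queue = VecDeque::from([(start, 0usize)]);
    let mut out = Vec::new();
    while let Some((id, hops)) = queue.pop_front() {
        if hops == depth {
            continue; // depth limit reached; do not expand further
        }
        let mut next: Vec<&str> = Vec::new();
        if matches!(dir, Direction::Callees | Direction::Both) {
            next.extend(fwd.get(id).into_iter().flatten());
        }
        if matches!(dir, Direction::Callers | Direction::Both) {
            next.extend(rev.get(id).into_iter().flatten());
        }
        for n in next {
            if seen.insert(n) {
                out.push((n.to_string(), hops + 1));
                queue.push_back((n, hops + 1));
            }
        }
    }
    out.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```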

4) `recur in gaps`

Detect missing required suffix branches (for example readme, todo, todo.priority) under a base ID inside the selected files.

recur in gaps <BASE> --require readme,test,todo [--stdin] [--json]

Example:

recur files "main.command.**" -d docs/ \
  | recur in gaps "main.command.files" --require readme,todo,todo.priority --stdin
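Below is a minimal sketch of the gap check. It assumes a required suffix counts as present when an extracted ID equals `base.suffix` or extends it with more segments. Under that rule, `test` is satisfied by `main.command.files.test.case.stdin.empty`, and `todo` by `main.command.files.todo.priority`.

Whether `todo` should instead require an exact match is a policy choice. `gaps` is a hypothetical function name.

```rust
/// Report each required `base.suffix` that no extracted ID satisfies.
/// `required` is the comma-separated `--require` value.
pub fn gaps(base: &str, required: &str, found: &[&str]) -> Vec<String> {
    required
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|suffix| format!("{base}.{suffix}"))
        .filter(|want| {
            // Satisfied by an exact match or by a deeper ID under `want`.
            let prefix = format!("{want}.");
            !found.iter().any(|id| *id == want.as_str() || id.starts_with(prefix.as_str()))
        })
        .collect()
}
```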

Data Model (Proposed)

In-file IDs should follow the same contract as filenames:

main.<area>.<unit>.<artifact>[.<qualifier>]

Examples inside file content:

main.command.files.contract.v1
main.command.files.todo.priority
main.command.files.test.case.stdin.empty

Reference formats (examples):

Markdown link-style tags
comment tags (// id: main.command.files.contract.v1)
YAML/JSON key-value markers

Parser strategy:

start with regex-based extractors per file type
allow language-specific extractors later
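The per-file-type strategy can be expressed as a trait with a plain-text fallback, so language-specific extractors can slot in later without touching callers. The sketch uses hypothetical names (`Extractor`, `MarkerExtractor`, `PlainExtractor`, `extractor_for`).

It also assumes Rust sources default to the explicit `id:` marker mode. Plain-text scanning of code would pick up field chains such as `self.items.len` as three-segment IDs.

```rust
use std::path::Path;

/// Per-file-type extractor returning (1-based line, ID) pairs.
pub trait Extractor {
    fn extract(&self, text: &str) -> Vec<(usize, String)>;
}

/// High-confidence mode: only IDs declared with an explicit `id:` marker.
pub struct MarkerExtractor;

/// Fallback: any whitespace-delimited token shaped like a dotted ID.
pub struct PlainExtractor;

fn looks_like_id(t: &str) -> bool {
    let segs: Vec<&str> = t.split('.').collect();
    segs.len() >= 3
        && t.starts_with(|c: char| c.is_ascii_alphabetic())
        && segs.iter().all(|s| !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_'))
}

impl Extractor for MarkerExtractor {
    fn extract(&self, text: &str) -> Vec<(usize, String)> {
        text.lines()
            .enumerate()
            .filter_map(|(i, line)| {
                // Take the first token after the marker, e.g. `// id: a.b.c`.
                let rest = line.split_once("id:")?.1;
                let token = rest.split_whitespace().next()?;
                looks_like_id(token).then(|| (i + 1, token.to_string()))
            })
            .collect()
    }
}

impl Extractor for PlainExtractor {
    fn extract(&self, text: &str) -> Vec<(usize, String)> {
        let mut out = Vec::new();
        for (i, line) in text.lines().enumerate() {
            for token in line.split_whitespace() {
                // Strip punctuation such as trailing dots or parentheses.
                let token = token.trim_matches(|c: char| !(c.is_ascii_alphanumeric() || c == '_'));
                if looks_like_id(token) {
                    out.push((i + 1, token.to_string()));
                }
            }
        }
        out
    }
}

/// Choose an extractor by file extension (the mapping is an assumption).
pub fn extractor_for(path: &Path) -> Box<dyn Extractor> {
    match path.extension().and_then(|e| e.to_str()) {
        Some("rs") => Box::new(MarkerExtractor),
        _ => Box::new(PlainExtractor),
    }
}
```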

Immediate Workflows Enabled

A) Changed-file semantic impact

git diff --name-only \
  | recur in id "main.command.**" --stdin --json

B) Docs-to-tests consistency check

recur files "main.command.**.readme" -d docs/ \
  | recur in refs "main.command.**.test" --stdin --count

C) Priority audit

recur files "main.command.**.todo*" -d docs/ \
  | recur in gaps "main.command" --require todo,todo.priority --stdin

Separation of Concerns (Important)

recur files/tree/stats: filesystem hierarchy truth.
recur in *: content hierarchy truth.
optional --names-file: coordination entry point (simple semantic ID seed list).

Do not merge them into one monolithic command. Composable stages are easier to reason about, test, and automate.

Implementation Plan (Suggested)

Phase 1: Minimal Viable In-File

Add recur in id with plain-text extraction.
Support --stdin, --ext, --sep, --json.
Reuse existing search option plumbing.

Phase 2: Legacy-Friendly Semantic Name Overlay

Add --names-file read path for recur in id.
Add recur in sync and recur in drift.
Add "single current lane" helpers (recur in lane).

Phase 3: Reference Graph

Add recur in refs.
Emit (from_id -> to_id, file, line) edges.

Phase 4: Trace + Gaps + Focus

Add recur in trace.
Add recur in gaps with required suffix policy.
Add recur in focus ranking from trigger/dependency signals.

Phase 5: Language Extractors

Markdown extractor.
Rust comment/doc extractor.
JSON/YAML structured key extractor.

Testing Strategy (Julia + Rust)

Julia Integration

Add julia-tests/main.command.in.id.test.jl
Add julia-tests/main.command.in.refs.test.jl
Add julia-tests/main.command.in.trace.test.jl
Add julia-tests/main.command.in.gaps.test.jl

Test goals:

respects stdin-selected file sets
honors separator choice and precedence
consistent JSON contracts
stable exit codes for no-match scenarios
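A stable exit-code contract for no-match runs could follow grep's convention: 0 when something matched, 1 when nothing matched, and 2 on error. Adopting that convention is an assumption. `Outcome`, `from_count`, and `exit_code` are hypothetical names.

```rust
/// Result of a `recur in` run.
#[derive(Debug)]
pub enum Outcome {
    Matches(usize),
    NoMatch,
    Error(String),
}

impl Outcome {
    /// Zero matches is reported as `NoMatch`, never as `Matches(0)`.
    pub fn from_count(n: usize) -> Outcome {
        if n == 0 { Outcome::NoMatch } else { Outcome::Matches(n) }
    }
}

/// grep-style mapping: 0 = matches found, 1 = no match, 2 = error.
pub fn exit_code(outcome: &Outcome) -> i32 {
    match outcome {
        Outcome::Matches(_) => 0,
        Outcome::NoMatch => 1,
        Outcome::Error(_) => 2,
    }
}
```

A binary would end with `std::process::exit(exit_code(&outcome))`, and the Julia integration tests could assert on these three values.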

Rust Unit Tests

parser/extractor tests by file type
ID normalization tests
edge extraction tests
gap policy tests

Risks and Controls

Risk: False positives from naive regex

Control:

explicit marker prefixes for high-confidence mode
language extractor adapters

Risk: ID taxonomy drift

Control:

central naming guide (docs/main.dogfooding.readme.md)
CI checks using recur in gaps

Risk: Performance on large repos

Control:

always support stdin-scoped execution
optional caching as future optimization (not required for the text-list model)

Success Criteria

Can chain file selection + in-file graph queries in one pipeline.
Humans can answer "what changed semantically?" in minutes, not hours.
LLM workflows become deterministic:
- select files
- extract IDs
- trace references
- report gaps
Legacy repos can adopt with a single semantic-name text file before any large rename campaign.
Recurring-trigger lanes are queryable and executable from data, not tribal memory.

Example End-to-End Session (Target UX)

# 1) Select all docs for the files command and list their in-file IDs
recur files "main.command.files.**" -d docs/ \
  | recur in id "main.command.files.**" --stdin

# 2) Trace todo priority dependencies
recur files "main.command.files.todo*" -d docs/ \
  | recur in trace "main.command.files.todo.priority" --stdin --depth 2

# 3) Check missing required branches
recur files "main.command.files.**" -d docs/ \
  | recur in gaps "main.command.files" --require readme,todo,todo.priority --stdin

If this works reliably, recur becomes not only a file hierarchy tool, but a semantic coordination layer for humans + LLMs.
