Skip to content

[recipes] Thought enrichment pipeline#192

Open
alanshurafa wants to merge 1 commit into
NateBJones-Projects:mainfrom
alanshurafa:contrib/alanshurafa/thought-enrichment
Open

[recipes] Thought enrichment pipeline#192
alanshurafa wants to merge 1 commit into
NateBJones-Projects:mainfrom
alanshurafa:contrib/alanshurafa/thought-enrichment

Conversation

@alanshurafa
Copy link
Copy Markdown
Collaborator

Summary

LLM-powered recipe that retroactively classifies existing thoughts with type, importance, quality score, sensitivity tier, and structured metadata (topics, tags, people, action items).

  • Supports OpenRouter and Anthropic providers
  • Batching, retry, and checkpoint/resume for large backfills
  • Sensitivity-tier classification with pattern-based pre-filter
  • Separate backfill-type and backfill-sensitivity scripts for targeted re-runs
  • Pure Node.js ESM, no global installs — config via .env.local

Why

Most Open Brain sources hand you raw content. Without enrichment, the type, importance, quality_score, and sensitivity columns from the enhanced-thoughts schema stay null and the thoughts remain hard to retrieve or filter. This recipe is the turn-key way to populate those columns for anyone who's already running Open Brain and has accumulated untyped thoughts.

Requires the schemas/enhanced-thoughts/ columns from #191. The README calls that out in Prerequisites.

Part 2 of 12 in the OB1 Alpha Milestone consolidation.

Test plan

  • Apply schemas/enhanced-thoughts/schema.sql first (dependency)
  • Configure .env.local with Supabase + one LLM provider key
  • Run enrich-thoughts.mjs --limit 10 — verify 10 thoughts get classified
  • Re-run with same limit — verify idempotency (already-enriched thoughts skipped)
  • Kill mid-run and resume — verify checkpoint picks up cleanly
  • Run backfill-sensitivity.mjs --dry-run — verify pattern matches print but no writes
  • Verify metadata.json passes the gate

@github-actions github-actions Bot added the recipe Contribution: step-by-step recipe label Apr 18, 2026
@alanshurafa
Copy link
Copy Markdown
Collaborator Author

5 of my open PRs have merge conflicts and may also be overlapped by recent upstream work. Before I spend time rebasing them, I'd like a quick keep/close signal on each. I know 25 open PRs from one contributor is a lot; happy to close anything that's not useful.

Possibly reshaped by recent upstream direction (#278, #280, #283):

Possibly superseded by adjacent upstream PRs:

Possibly too specialized for core OB1:

No detailed review needed; a keep/close direction per item is enough.

@alanshurafa alanshurafa added area: recipes Review area: recipes review: needs-refresh Branch is stale, conflicted, or needs rebase before review alan-reviewed Reviewed by Alan Shurafa in Community Reviewer role labels May 20, 2026
@alanshurafa
Copy link
Copy Markdown
Collaborator Author

This branch conflicts with main. I'm using #192 as the design discussion for the enrichment-pipeline set; #201, #202, #219, and #221 are the related PRs. I'm holding rebases on all five until there's a direction here.

LLM-based thought classification. Hardened from code review: bounded
regex (closes a ReDoS vector), AbortController fetch timeouts,
delimited untrusted content with capped outputs, a --max-calls spend
cap, cursor pagination, and checkpoint resume.
@alanshurafa alanshurafa force-pushed the contrib/alanshurafa/thought-enrichment branch from f8522bf to fe64827 Compare May 20, 2026 16:34
@alanshurafa
Copy link
Copy Markdown
Collaborator Author

Rebased onto main — no conflicts now. This updates the existing recipes/thought-enrichment/ rather than adding it.

For review context: main's copy of this recipe predates its code review. This PR is the reviewed version — bounded regex (removes a ReDoS vector), AbortController fetch timeouts, prompt-injection delimiting with capped outputs, a --max-calls spend cap, cursor pagination, and checkpoint resume. It's an upgrade over what's on main, not a duplicate.

@alanshurafa alanshurafa added review: ready-for-maintainer Community reviewer recommends maintainer review and removed review: needs-refresh Branch is stale, conflicted, or needs rebase before review labels May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

alan-reviewed Reviewed by Alan Shurafa in Community Reviewer role area: recipes Review area: recipes recipe Contribution: step-by-step recipe review: ready-for-maintainer Community reviewer recommends maintainer review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant