Last Updated: 2025-10-09
Main project documentation - Quick start, installation, architecture overview, API reference, and troubleshooting guide.
System architecture - Detailed technical architecture including data flow, components, and design decisions.
Split scraping architecture - Documentation for the two-phase RSS/scraping pipeline that separates fast feed fetching from slow content scraping.
Key Points:
- Phase 1: Fast RSS metadata ingestion
- Phase 2: Async content scraping with worker pool
- Status tracking: pending/in_progress/completed/failed/skipped
- Retry logic and error handling
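The Phase 2 points above (worker pool, status tracking, retries) can be illustrated with a minimal Go sketch. This is not the project's actual `ScraperService`: the `Item` type, `RunWorkers`, `ErrScrape`, the `maxAttempts` parameter, and the rule that an item with an empty URL is marked `skipped` are all assumptions made for illustration. Only the five status names come from the list above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ScrapeStatus mirrors the status values listed above.
type ScrapeStatus string

const (
	StatusPending    ScrapeStatus = "pending"
	StatusInProgress ScrapeStatus = "in_progress"
	StatusCompleted  ScrapeStatus = "completed"
	StatusFailed     ScrapeStatus = "failed"
	StatusSkipped    ScrapeStatus = "skipped"
)

// ErrScrape is a hypothetical error returned by the demo scraper in main.
var ErrScrape = errors.New("scrape failed")

// Item is a hypothetical record that Phase 1 would create from RSS metadata.
type Item struct {
	URL      string
	Status   ScrapeStatus
	Attempts int
}

// RunWorkers plays the role of Phase 2: it scrapes pending items with a
// fixed-size worker pool and tries each item at most maxAttempts times.
func RunWorkers(items []*Item, workers, maxAttempts int, scrape func(url string) error) {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise sends below would block
	}
	jobs := make(chan *Item)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for it := range jobs { // each item is owned by exactly one worker
				it.Status = StatusInProgress
				ok := false
				for !ok && it.Attempts < maxAttempts {
					it.Attempts++
					ok = scrape(it.URL) == nil
				}
				if ok {
					it.Status = StatusCompleted
				} else {
					it.Status = StatusFailed
				}
			}
		}()
	}
	for _, it := range items {
		switch {
		case it.Status != StatusPending:
			// Not pending: leave untouched.
		case it.URL == "":
			it.Status = StatusSkipped // assumption: nothing to scrape
		default:
			jobs <- it
		}
	}
	close(jobs)
	wg.Wait()
}

func main() {
	items := []*Item{
		{URL: "https://example.com/a", Status: StatusPending},
		{URL: "https://example.com/broken", Status: StatusPending},
		{URL: "", Status: StatusPending},
	}
	// Demo scraper: fails for one URL so the retry path is exercised.
	RunWorkers(items, 2, 3, func(url string) error {
		if url == "https://example.com/broken" {
			return ErrScrape
		}
		return nil
	})
	for _, it := range items {
		fmt.Printf("%q status=%s attempts=%d\n", it.URL, it.Status, it.Attempts)
	}
}
```

Because Phase 1 only records metadata and a `pending` status, a slow or failing scrape in this sketch never blocks feed ingestion. A real implementation would persist the status changes rather than mutate in-memory structs.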
Event correlation & novel facts - How the system intelligently merges duplicate events while extracting new information.
Key Points:
- OpenAI-based similarity analysis
- Smart merging of duplicate events
- Novel facts detection and extraction
- Creation of "Additional Details" events
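The merge flow listed above can be sketched in Go as follows. This is a sketch only: the simplified `Event` type, the `NovelFacts` and `Correlate` functions, the `threshold` parameter, and the `"Additional Details: "` title prefix are assumptions, and the similarity score is taken as an input rather than computed. In the real system that score and the novel-fact detection come from the OpenAI-based analysis; here a case-insensitive exact match stands in for it.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical, simplified event: a title plus a list of facts.
type Event struct {
	Title string
	Facts []string
}

// NovelFacts returns the facts in incoming that existing does not already
// contain. The comparison is a case-insensitive exact match, used here as a
// stand-in for model-based detection.
func NovelFacts(existing, incoming Event) []string {
	seen := make(map[string]bool, len(existing.Facts))
	for _, f := range existing.Facts {
		seen[strings.ToLower(strings.TrimSpace(f))] = true
	}
	var novel []string
	for _, f := range incoming.Facts {
		key := strings.ToLower(strings.TrimSpace(f))
		if !seen[key] {
			seen[key] = true
			novel = append(novel, f)
		}
	}
	return novel
}

// Correlate decides what to store for incoming, given a similarity score in
// [0, 1] from an external scorer. threshold is a hypothetical parameter.
// It returns nil when incoming is a pure duplicate of existing.
func Correlate(existing, incoming Event, similarity, threshold float64) *Event {
	if similarity < threshold {
		return &incoming // unrelated: keep as its own event
	}
	novel := NovelFacts(existing, incoming)
	if len(novel) == 0 {
		return nil // duplicate with nothing new to add
	}
	// Duplicate that carries new information: emit a follow-up event.
	return &Event{Title: "Additional Details: " + existing.Title, Facts: novel}
}

func main() {
	base := Event{Title: "Port fire", Facts: []string{"Fire at port", "Two injured"}}
	update := Event{Title: "Fire update", Facts: []string{"Two injured", "Port closed"}}
	if ev := Correlate(base, update, 0.9, 0.8); ev != nil {
		fmt.Println(ev.Title, ev.Facts)
	}
}
```

Returning `nil` for a pure duplicate keeps the caller's logic simple: it stores whatever comes back and drops the rest.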
Test suite analysis - Analysis of integration test results, including AI-based correlation tests.
Test Coverage:
- 22 of 24 tests passing (91.7%)
- Deduplication: 100% passing
- Correlation: 75% passing (AI-based, some subjectivity)
- Confidence: 100% passing
- Magnitude: 100% passing
Admin dashboard specification - Design spec for the brutalist cyberpunk admin interface.
Covers:
- Authentication system
- Source management UI
- Event moderation dashboard
- System monitoring views
- Configuration management
UI design system - Complete design specification for the brutalist cyberpunk aesthetic.
Includes:
- Color palette and typography
- Component library
- Animation effects (glitch, scan lines)
- Responsive design patterns
OSINT data sources research - Research on available data sources, APIs, rate limits, and terms of service.
Sources Covered:
- Twitter/X API
- Telegram
- 4chan
- Government RSS feeds
- News APIs
MCP function design - Design for Model Context Protocol (MCP) integration.
Note: MCP integration is planned but not currently the primary focus. The system provides a REST API instead.
Google Cloud deployment guide - Complete deployment architecture and cost estimates for Google Cloud Platform.
Covers:
- Cloud Run configuration
- Cloud SQL (PostgreSQL) setup
- Secret Manager integration
- Cloud Logging and Monitoring
- CI/CD with Cloud Build
- Cost estimates and scaling
Data models package - Documentation for core data structures.
Models:
- Event - Processed intelligence events
- Source - Raw OSINT data
- Entity - Extracted named entities
- EventQuery - Query/filter parameters
Ingestion pipeline - RSS fetching, scraping, and deduplication.
Components:
- RSSConnector - Feed fetching
- PlaywrightScraper - Content extraction
- ScraperService - Async scraping worker pool
- Repositories - Data storage interfaces
AI enrichment system - OpenAI integration for event analysis.
Components:
- OpenAIClient - GPT-4 integration
- EventCorrelator - Similarity analysis
- PromptTemplates - OSINT-optimized prompts
- MockEnricher - Testing implementation
Database layer - PostgreSQL repositories and migrations.
Repositories:
- PostgresSourceRepository
- PostgresEventRepository
- TrackedAccountRepository
- IngestionErrorRepository
- ThresholdRepository
Caching layer - Redis integration stub, currently unused.
Status: Not currently implemented. Database queries are fast enough without caching.
Historical session notes and status reports have been moved to archive/session-docs/. These are kept for reference but are no longer actively maintained.
Getting Started:
- Read README.md for installation and quick start
- Review ARCHITECTURE.md for system understanding
- Check feature docs for specific implementations
Developing:
- Module READMEs in internal/*/README.md for package details
- SCRAPING_SPLIT_IMPLEMENTATION.md for pipeline understanding
- NOVEL_FACTS_IMPLEMENTATION.md for correlation logic
Deploying:
- docs/GOOGLE_CLOUD_DEPLOYMENT.md for GCP deployment
- README.md for configuration and environment setup
Design Reference:
- docs/FRONTEND_DESIGN.md for UI patterns
- docs/ADMIN_PANEL_SPEC.md for admin features
- docs/DATA_SOURCES.md for OSINT sources
When adding new features:
- Update relevant module README
- Add feature documentation in the repository root (like SCRAPING_SPLIT_IMPLEMENTATION.md)
- Update this index
- Update README.md if it affects installation/usage
- Feature Docs: Markdown with code examples, architecture diagrams (ASCII art), and usage examples
- Module Docs: Package-level README with API reference and examples
- Session Notes: Move to archive/session-docs/ when done
- Specs: Keep in docs/ folder for reference
- Installation Issues: See README.md troubleshooting section
- API Reference: See README.md API endpoints section
- Architecture Questions: See ARCHITECTURE.md
- Specific Features: See feature documentation files