When you release, tag the commit (e.g. `v0.1.0`), run `uv pip install -e .` (or a build) so `core/_version.py` is regenerated by setuptools-scm, update this file under `[Unreleased]`, then move notes into a dated `## [x.y.z]` section. With no matching Git tag, the version falls back to `fallback_version` in `pyproject.toml` (`[tool.setuptools_scm]`).
All notable changes to this project are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `reddit_activity_tracker`: new collector for r/cpp. OAuth fetcher (client credentials or bearer/session cookie), incremental resume from DB `created_utc`, comment-first discovery with full comment trees, DB upserts and per-record workspace JSON, `RedditUser` profile in `cppa_user_tracker`, and daily Celery schedule (`run_reddit_activity_tracker` at 17:00 UTC).
- `core.collectors` / `TrackerResult`: `AbstractCollector.collect()` must return `TrackerResult`; `run()` validates the result, backfills `duration_seconds`, and exposes `last_result` after a successful `post_collect()`. Shared DTOs (`GenericTrackerResult`, `GenericIncrementalState`) and optional incremental checkpoint hooks (`load_incremental_state` / `persist_incremental_state`). All in-monorepo collectors migrated to frozen `protocol_impl` DTOs; `BaseCollectorCommand` logs structured `result_repr` / `result_json` when the result is a `TrackerResultDataclass` subclass.
- `core.collectors` lifecycle hooks: optional `pre_collect()`, `post_collect()`, and `on_error(exc)` on `AbstractCollector` (`_CollectorLifecycleMixin`); `run()` orchestrates the hook sequence before re-raising on failure.
- Protocol DTO serialization: `core.protocol_dto` base dataclasses (`TrackerResultDataclass`, `IncrementalStateDataclass`, `ActivityRecordDataclass`) provide canonical `asdict()`, `to_json()`, `from_dict()`, and truncated `__repr__` on all tracker `protocol_impl` frozen dataclasses; `core.collectors.GenericActivityRecord` added for the default `ActivityRecord` implementation.
- `core.adapters`: stable adapter protocols and implementations for Pinecone (`PineconeAdapter`), Slack Web API (`SlackWebApiAdapter`), and GitHub REST/GraphQL (`GitHubApiAdapter`). The `pinecone` SDK is imported only from `core/adapters/pinecone.py`; `cppa_pinecone_sync.ingestion` uses `PineconeClientProtocol` with injectable fakes for tests.
- `core.errors`: `AuthenticationError` mapped to `CollectorFailureCategory.AUTH` in `classify_failure()` for credential-rejection observability (WG21 and other collectors).
- API contract and idempotency tests: recorded JSON fixtures in `tests/fixtures/api_contracts/` exercise GitHub/Slack Pydantic boundary parsers; `tests/test_idempotency_under_retry.py` asserts duplicate-safe `get_or_create_*` service calls under `transaction.atomic()`.
- Stability policy (`STABILITY.md`): documents stable vs evolving vs unstable interfaces for production and contributors; README links to it.
- Pyright coverage for `cppa_user_tracker` and `cppa_pinecone_sync` (added to `pyrightconfig.json`; both apps pass at configured `basic` strictness).
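
The collector contract described in the entries above (`collect()` returning a `TrackerResult`, with `run()` driving the hooks) can be pictured roughly as follows. This is a minimal sketch and not the code in `core.collectors`: the `TrackerResult` fields other than `duration_seconds`, the `EchoCollector` class, and the exact validation and error-handling order are assumptions made for illustration.

```python
from __future__ import annotations

import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrackerResult:
    """Hypothetical stand-in for the project's result DTO."""

    success: bool
    items: int = 0
    duration_seconds: float | None = None


class AbstractCollector(ABC):
    """Simplified sketch of the lifecycle; not the real base class."""

    def __init__(self) -> None:
        self.last_result: TrackerResult | None = None

    # Optional hooks: subclasses override only the ones they need.
    def pre_collect(self) -> None: ...

    def post_collect(self) -> None: ...

    def on_error(self, exc: BaseException) -> None: ...

    @abstractmethod
    def collect(self) -> TrackerResult: ...

    def run(self) -> TrackerResult:
        started = time.monotonic()
        try:
            self.pre_collect()
            result = self.collect()
            # Validate the return type before anything downstream sees it.
            if not isinstance(result, TrackerResult):
                raise TypeError("collect() must return TrackerResult")
            # Backfill the duration if the collector did not set it.
            if result.duration_seconds is None:
                elapsed = time.monotonic() - started
                result = replace(result, duration_seconds=elapsed)
            self.post_collect()
        except Exception as exc:
            self.on_error(exc)
            raise
        # Only exposed once post_collect() has completed without raising.
        self.last_result = result
        return result


class EchoCollector(AbstractCollector):
    """Toy collector that records which hooks were called."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def pre_collect(self) -> None:
        self.events.append("pre_collect")

    def collect(self) -> TrackerResult:
        self.events.append("collect")
        return TrackerResult(success=True, items=3)

    def post_collect(self) -> None:
        self.events.append("post_collect")
```

In this sketch a collector that returns anything other than a `TrackerResult` triggers `on_error()` and the exception is re-raised, so `last_result` stays `None`.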
- `discord_activity_tracker`: per-channel incremental lower bounds (each channel resumes from its own latest message, not guild-wide max); per-channel per-UTC-day DiscordChatExporter runs; daily raw JSON archives merge by message id under `raw/discord_activity_tracker/<server_id>/<channel_id>/`; `backfill_discord_activity_tracker` reports per-file import failures on `DiscordCollectionTrackerResult` (`success=False`, `errors`, `failed_files` count) instead of always returning `success=True`.
- `slack_event_handler`: replaces Selenium-based xoxc/xoxd flow with Chrome-profile token extraction (`plyvel`, `browser-cookie3`) persisted in `workspace/slack_event_handler/slack_internal_tokens.json`; optional Compose profile `slack-session` (`slack-chromium` + noVNC). Workspace data moved from `cppa_slack_transcript_tracker/` to `workspace/slack_event_handler/` (`scripts/migrate_slack_workspace_paths.sh`). `plyvel` omitted on Windows native venv (LevelDB read skipped; cookie/SQLite paths unchanged).
- `core.protocols` / `ActivityRecord`: `occurred_at` is timezone-aware UTC `datetime | None`; `source_system` is `SourceSystem` (`StrEnum`); `activity_type` is branded `ActivityType`; `actor_external_id` is `ActorExternalId` (`NewType`). Legacy string payloads use `core.activity_types.migrate_legacy_activity_fields` and `activity_record_to_legacy_dict` on GitHub/Discord `protocol_impl` dataclasses.
- `boost_mailing_list_tracker`: pilot identity-hub decoupling. `MailingListMessage.sender` ORM FK replaced with soft `sender_profile_id` (`BigIntegerField`); profiles resolved via `cppa_user_tracker.services` at boundaries (see `docs/adr/identity-hub-decoupling.md`).
- Concurrency refactor: module-level locks/semaphores replaced with dedicated state classes across GitHub ops, Slack ops, `github_activity_tracker`, and `slack_event_handler` job queue; topology documented in `docs/CONCURRENCY.md` (behavior unchanged).
- Celery schedule: `discord` group in `config/boost_collector_schedule.yaml` (`run_discord_activity_tracker` daily at 16:40 UTC).
- Resolved five cross-app import tech-debt edges: Pinecone via `cppa_pinecone_sync.sync_api`, dashboard model shim removed, CSV owner lookup via `cppa_user_tracker.services`, clang imports via `github_activity_tracker.sync_api`.
- Added import-linter contracts and pre-commit hook to prevent regressions.
- Enforced service-layer-only ORM writes with `scripts/check_service_layer_writes.py` and pre-commit; moved remaining direct writes (repo metadata sync, star bulk-update, GitHub file backfill, BoostVersion import, commit file-change backfill) into `github_activity_tracker.services` / `boost_library_tracker.services`. Allowlist `.service-layer-write-allowlist.json` is empty by default for new debt only.
- Pydantic boundary schemas at GitHub, Slack, and Discord ingestion (`api_schemas.py` per app; Discord ChatExporter uses `staging_schema.py`); fetchers validate with `model_validate()`; services accept typed payloads; `classify_failure` maps validation errors to `VALIDATION`.
- CI / Deploy: Security audit workflow runs pip-audit on every PR against `requirements.lock` and `requirements-dev.lock`; Deploy workflow adds `workflow_dispatch` with `ref` input (`develop` | `main`) for cross-repo triggers.
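
To illustrate why the Discord entry above moved from a guild-wide maximum to per-channel lower bounds, here is a small sketch. The function names and the `(channel_id, created_at)` tuple shape are hypothetical and do not come from `discord_activity_tracker`; only the idea of one resume point per channel is taken from the entry.

```python
from __future__ import annotations

from datetime import datetime, timezone


def per_channel_lower_bounds(
    stored: list[tuple[str, datetime]],
) -> dict[str, datetime]:
    """Map each channel id to the newest timestamp already stored for it."""
    bounds: dict[str, datetime] = {}
    for channel_id, created_at in stored:
        current = bounds.get(channel_id)
        if current is None or created_at > current:
            bounds[channel_id] = created_at
    return bounds


def guild_wide_lower_bound(stored: list[tuple[str, datetime]]) -> datetime:
    """Older approach for contrast: one resume point for the whole guild."""
    return max(created_at for _, created_at in stored)


def utc(hour: int) -> datetime:
    """Helper for the example: a timezone-aware UTC time on a fixed day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# "quiet" was last synced at 03:00, "general" at 12:00.
STORED = [("general", utc(9)), ("general", utc(12)), ("quiet", utc(3))]
```

With `STORED`, the guild-wide bound is 12:00, so a message posted in `quiet` at 05:00 would be skipped on the next run. The per-channel map keeps 03:00 for `quiet`, so that message is still fetched.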
- Deprecated `CollectorBase` and `DjangoCommandCollector`; the supported collector contract is `AbstractCollector` + `BaseCollectorCommand` (see docs).
- `slack_event_handler`: concurrent `enqueue_job` / worker dequeue could overwrite per-team queue state and silently drop jobs; fixed with per-team advisory file locking (`modify_state`, `state_file_lock`).
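
The lost-update race described above comes from two processes doing an unlocked read-modify-write on the same state file. The sketch below shows one way advisory file locking can serialize that cycle. The names `modify_state` and `state_file_lock` are borrowed from the entry, but the bodies, the JSON layout, the sidecar `.lock` file, and the use of `fcntl.flock` (POSIX only) are assumptions, not the `slack_event_handler` implementation.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def state_file_lock(state_path):
    """Hold an exclusive advisory lock on a sidecar lock file."""
    with open(state_path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def modify_state(state_path, mutate):
    """Run a read-modify-write cycle on the JSON state under the lock."""
    with state_file_lock(state_path):
        try:
            with open(state_path) as fh:
                state = json.load(fh)
        except FileNotFoundError:
            state = {"queue": []}  # assumed default layout
        mutate(state)
        # Write to a temp file and swap it in so readers never see a
        # half-written state file.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, state_path)
        return state


def enqueue_job(state_path, job):
    """Append a job to the queue stored at state_path (one path per team)."""
    return modify_state(state_path, lambda state: state["queue"].append(job))
```

Using one state path per team keeps the lock scope narrow, so activity for one team does not block another.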
- pip-audit in CI on every pull request; bumped idna (`>=3.15`, CVE-2026-45409) and pytest (`>=9.0.3`, CVE-2025-71176) in lockfiles.
- `docs/Tutorial_building_a_collector.md`: end-to-end collector tutorial (`startcollector`, `AbstractCollector` hooks, pytest, YAML/Celery scheduling, deployment); linked from CONTRIBUTING, README, Onboarding, and how-to docs.
- `docs/Architecture_overview.md`: system-design entry point (domain apps, persistence, coupling); onboarding 1:1 runbooks under `docs/onboarding/`; bus-factor deliverables in `docs/BUS_FACTOR_DELIVERABLES.md`.
- ADRs: `docs/adr/paradigm-unification.md` (batch vs event-driven collection paradigms) and `docs/adr/identity-hub-decoupling.md`; index at `docs/adr/README.md`.
- Document `pandoc` as a system dependency (not installed by `pip` / `pypandoc` alone); centralize install steps for macOS, Debian/Ubuntu, and Windows in the root README and link from contributor setup docs.
- `.github/pull_request_template.md` for consistent PR bodies; branch-protection and CODEOWNERS runbook updates.
- `core`: shared utilities, collector base classes, and cross-cutting operations (e.g. GitHub, Slack, files, markdown).
- `boost_collector_runner`: YAML-driven schedules, Celery tasks, and `run_scheduled_collectors` management command.
- `github_activity_tracker`: GitHub repos, commits, issues, and related activity.
- `boost_library_tracker`: Boost libraries, versions, dependencies, and maintainer/author roles.
- `boost_library_docs_tracker`: Boost library documentation scraping and doc content metadata.
- `boost_library_usage_dashboard`: usage dashboard collection and analysis.
- `boost_usage_tracker`: Boost usage and repository search tracking.
- `boost_mailing_list_tracker`: Boost mailing list message ingestion and formatting.
- `clang_github_tracker`: Clang-related GitHub activity and workspace sync.
- `cppa_slack_tracker`: Slack workspace sync and message tracking (CPPA).
- `cppa_pinecone_sync`: Pinecone vector index sync for collected content.
- `cppa_user_tracker`: CPPA user records and tracking.
- `cppa_youtube_script_tracker`: YouTube script / transcript tracking.
- `discord_activity_tracker`: Discord messages and activity sync.
- `wg21_paper_tracker`: WG21 paper metadata and pipeline.
- `slack_event_handler`: Slack events, jobs, and GitHub/PR integration helpers.