92 changes: 5 additions & 87 deletions .env.example
@@ -116,10 +116,12 @@ DATABASE_URL=postgres://user:password@localhost:5432/boost_dashboard
# CELERY_BROKER_URL=redis://localhost:6379/0
# CELERY_RESULT_BACKEND=redis://localhost:6379/0
# Worker: celery -A config worker -l info
# Optional override: set true to require config/boost_collector_schedule.yaml at startup
# Optional override: set true to require the collector schedule YAML at startup
# when DEBUG=True (e.g. CI / staging). Unset or false for typical local dev; production
# already enforces this when DEBUG=False.
# BOOST_COLLECTOR_SCHEDULE_STRICT=false
# Path to schedule YAML (relative to repo root or absolute). Default: config/boost_collector_schedule.yaml
# BOOST_COLLECTOR_SCHEDULE_YAML=config/boost_collector_schedule.yaml

# ==============================================================================
# Workspace
@@ -204,16 +206,9 @@ DATABASE_URL=postgres://user:password@localhost:5432/boost_dashboard
# GIT_AUTHOR_EMAIL=unknown@noreply.github.com

# ==============================================================================
# Slack (slack_event_handler, cppa_slack_tracker)
# Slack (cppa_slack_tracker — public batch collector)
# ==============================================================================
# Used by Socket Mode listeners, huddle transcript collection, and the Slack PR bot.
# See docs/Docker.md (Slack session tokens) and SECURITY.md (secret handling).

# --- Huddle transcripts (GitHub upload target) ---
# GITHUB_SLACK_HUDDLE_REPO_OWNER=your-org
# GITHUB_SLACK_HUDDLE_REPO_NAME=your-repo

# --- Bot tokens (required for API calls and Socket Mode) ---
# --- Bot tokens (required for cppa_slack_tracker API calls) ---
# List every workspace team ID, then set one bot token per team.
# SLACK_TEAM_IDS=T01234ABCD,T05678EFGH
# SLACK_BOT_TOKEN_T01234ABCD=xoxb-your-bot-token
@@ -222,83 +217,6 @@ DATABASE_URL=postgres://user:password@localhost:5432/boost_dashboard
# Optional single-team shorthand when SLACK_TEAM_IDS is unset:
# SLACK_TEAM_ID=T01234ABCD

# --- Socket Mode (App-Level Token; scope: connections:write) ---
# SLACK_APP_TOKEN_T01234ABCD=xapp-your-app-token
# SLACK_APP_TOKEN_T05678EFGH=xapp-your-app-token

# --- Per-team features (slack_event_handler) ---
# SLACK_TEAM_SCOPE_<team_id>: comma-separated scopes; omit or leave empty for both.
# 0 = huddle AI note / transcript pipeline
# 1 = Slack PR comment bot
# SLACK_TEAM_SCOPE_T01234ABCD=0
# SLACK_TEAM_SCOPE_T05678EFGH=1

# --- Slack PR bot (GitHub comments from Slack) ---
# SLACK_PR_BOT_TEAM=your-github-org-or-user
# SLACK_PR_BOT_GITHUB_TOKEN=ghp_your_token
# SLACK_PR_BOT_CHANNEL_NAME=slack-bot
# SLACK_PR_BOT_COMMENT_TEMPLATE=Automated comment from Slack bot.
# SLACK_PR_BOT_COMMENTS_MAX_PER_WINDOW=5
# SLACK_PR_BOT_COMMENTS_WINDOW_SECONDS=3600

# --- Internal session tokens (xoxc/xoxd; compliance-gated) ---
# Do not put xoxc/xoxd in .env. When enabled, tokens live in workspace JSON and are
# loaded at runtime (not at Django startup). Huddle fetch can re-extract from the
# Chrome profile when JSON tokens are stale but the browser session is still valid.
# ALLOW_INTERNAL_SLACK_TOKENS=false
# SLACK_INTERNAL_TOKENS_JSON=
# Default path: workspace/slack_event_handler/slack_internal_tokens.json
#
# Chrome user-data directory (logged-in Slack session on disk):
# CHROME_PROFILE_PATH=
# Default: workspace/slack_event_handler/chrome_profile

# ==============================================================================
# Discord (discord_activity_tracker)
# ==============================================================================
# Preferred: bot token.
# DISCORD_TOKEN=your.bot.token
#
# User token violates Discord ToS; use only if the bot path is impossible.
# See docs/operations/discord_chat_exporter.md (Tyrrrz upstream: Token and IDs, CLI guide).
# DISCORD_USER_TOKEN=your.user.token
#
# --- Internal Discord user token (compliance-gated) ---
# Do not put user token in .env when using workspace JSON. When enabled, tokens live in
# workspace JSON and are loaded at runtime (not at Django startup). Export can re-extract
# from the Chrome profile when JSON tokens are stale but the browser session is still valid.
# ALLOW_INTERNAL_DISCORD_TOKENS=false
# DISCORD_INTERNAL_TOKENS_JSON=
# Default path: workspace/discord_activity_tracker/discord_internal_tokens.json
#
# Chrome user-data directory (logged-in Discord session on disk):
# DISCORD_CHROME_PROFILE_PATH=
# Default: workspace/discord_activity_tracker/chrome_profile
#
# DISCORD_SERVER_ID=987654321098765432
# DISCORD_CONTEXT_REPO_PATH=/absolute/path/to/discord-cplusplus-together-context
# DISCORD_CONTEXT_AUTO_COMMIT=false
#
# DiscordChatExporter CLI:
# https://github.com/Tyrrrz/DiscordChatExporter/releases
# DISCORD_CHAT_EXPORTER_CLI=/path/to/DiscordChatExporter.Cli
# macOS: system dotnet + DLL (avoids quarantined bundled runtime on some disks):
# DISCORD_CHAT_EXPORTER_DOTNET_DLL=/path/to/DiscordChatExporter.Cli.dll
# DISCORD_CHAT_EXPORTER_DOTNET=/usr/local/share/dotnet/dotnet
# DISCORD_CHAT_EXPORTER_MACOS_CLEAR_QUARANTINE=false
# DISCORD_CHAT_EXPORTER_PARALLEL=1
# DISCORD_CHAT_EXPORTER_INCLUDE_VC=false
# DISCORD_CHAT_EXPORTER_SEQUENTIAL_EXPORT=true
#
# Injected into the exporter subprocess unless overridden (macOS memory pressure):
# DOTNET_GCConserveMemory=9
# DOTNET_GCHighMemPercent=50
# DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
#
# DISCORD_CHANNEL_IDS=851121440425639956,123456789012345678
# PINECONE_DISCORD_APP_TYPE=discord
# PINECONE_DISCORD_NAMESPACE=discord-messages

# ==============================================================================
# Reddit (reddit_activity_tracker)
# ==============================================================================
2 changes: 0 additions & 2 deletions .github/CODEOWNERS
@@ -17,10 +17,8 @@ boost_mailing_list_tracker/ @jonathanMLDev @wpak-ai
cppa_pinecone_sync/ @jonathanMLDev @wpak-ai
clang_github_tracker/ @snowfox1003 @wpak-ai
cppa_slack_tracker/ @snowfox1003 @wpak-ai
discord_activity_tracker/ @snowfox1003 @wpak-ai
wg21_paper_tracker/ @snowfox1003 @wpak-ai
cppa_youtube_script_tracker/ @jonathanMLDev @wpak-ai
slack_event_handler/ @snowfox1003 @wpak-ai

core/ @snowfox1003 @jonathanMLDev @wpak-ai
.github/workflows/ @snowfox1003 @wpak-ai
3 changes: 1 addition & 2 deletions .gitignore
@@ -19,6 +19,7 @@ db.sqlite3
staticfiles/
media/
.test_artifacts/
config/boost_collector_schedule.local.yaml
# Ephemeral probe app + pyrightconfig for scripts/validate_collector_scaffold.py (entire tree ignored above).

# Testing / coverage
@@ -46,8 +47,6 @@ celerybeat.pid
*.swo
.cursor/

# Optional legacy CLI folder under the Django app (default CLI lives in workspace/.../script/)
discord_activity_tracker/tools/
# macOS
.DS_Store
._*
2 changes: 0 additions & 2 deletions .importlinter
@@ -11,9 +11,7 @@ root_packages =
cppa_slack_tracker
cppa_user_tracker
cppa_youtube_script_tracker
discord_activity_tracker
github_activity_tracker
slack_event_handler
wg21_paper_tracker

[importlinter:contract:forbid-tech-debt-pinecone]
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Removed

- **`discord_activity_tracker`** and **`slack_event_handler`** apps, workspace layouts, service API docs, and related operations guides from this repository.
- **`DiscordProfile`** from `cppa_user_tracker` Django state (`0010_remove_discordprofile`); physical table `cppa_user_tracker_discordprofile` is unchanged.

## [0.2.0] - 2026-06-12

### Added
1 change: 0 additions & 1 deletion CONTRIBUTING.md
@@ -64,7 +64,6 @@ Each Django app that has **models** provides a **`services.py`** module. This is
| `boost_library_docs_tracker` | `boost_library_docs_tracker/services.py` | BoostDocContent and BoostLibraryDocumentation (doc scrape and sync status). |
| `boost_usage_tracker` | `boost_usage_tracker/services.py` | External repos, Boost usage, missing-header tmp. |
| `cppa_pinecone_sync` | `cppa_pinecone_sync/services.py` | Pinecone fail list and sync status writes. |
| `discord_activity_tracker` | `discord_activity_tracker/services.py` | Servers, channels, messages, reactions (Discord user profiles in cppa_user_tracker). |
| `cppa_youtube_script_tracker` | `cppa_youtube_script_tracker/services.py` | YouTube channels, videos, tags, transcript state, speaker links. |
| `clang_github_tracker` | `clang_github_tracker/services.py` | Clang/llvm GitHub issue, PR, and commit upserts; fetch watermarks. |
| `boost_mailing_list_tracker` | `boost_mailing_list_tracker/services.py` | Mailing list messages and names. |
73 changes: 0 additions & 73 deletions Makefile
@@ -52,22 +52,6 @@ help:
	@echo " test-fast Run tests, stop on first failure"
	@echo " test-cov Run tests with coverage report"
	@echo ""
	@echo " Slack session (xoxc/xoxd token extraction)"
	@echo " slack-login Start slack-chromium (noVNC http://127.0.0.1:7900)"
	@echo " slack-wait-profile Wait until Slack login wrote Cookies + LevelDB"
	@echo " slack-login-stop Stop slack-chromium before extract"
	@echo " extract-slack-tokens Extract tokens to workspace JSON (one-shot)"
	@echo " slack-tokens-reextract Stop chromium → extract JSON"
	@echo " slack-tokens-refresh Login (noVNC) → wait → extract JSON"
	@echo ""
	@echo " Discord session (user token extraction)"
	@echo " discord-login Start discord-chromium (noVNC http://127.0.0.1:7901)"
	@echo " discord-wait-profile Wait until Discord login wrote Cookies + LevelDB"
	@echo " discord-login-stop Stop discord-chromium before extract"
	@echo " extract-discord-tokens Extract token to workspace JSON (one-shot)"
	@echo " discord-tokens-reextract Stop chromium → extract JSON"
	@echo " discord-tokens-refresh Login (noVNC) → wait → extract JSON"
	@echo ""
	@echo " Utilities"
	@echo " clean-mac Remove macOS ._* resource-fork files"
	@echo " clean-pyc Remove compiled Python files"
@@ -184,63 +168,6 @@ test-fast:
test-cov:
	python -m pytest --tb=short --cov=. --cov-report=term-missing

# ── Slack session ─────────────────────────────────────────────────────────────

.PHONY: slack-login slack-wait-profile slack-login-stop extract-slack-tokens \
slack-tokens-reextract slack-tokens-refresh

slack-login:
	@mkdir -p workspace/slack_event_handler/chrome_profile
	$(COMPOSE) --profile slack-session up -d --force-recreate slack-chromium
	@echo "Open http://127.0.0.1:7900 and sign in at https://app.slack.com (wait until Slack is fully loaded)"
	@command -v open >/dev/null 2>&1 && open "http://127.0.0.1:7900" || true

slack-wait-profile:
	@chmod +x scripts/wait_slack_chrome_profile.sh
	@./scripts/wait_slack_chrome_profile.sh

slack-login-stop:
	$(COMPOSE) --profile slack-session stop slack-chromium

extract-slack-tokens: slack-login-stop
	$(MANAGE) extract_slack_tokens

# Profile already exists (re-extract without opening noVNC again).
slack-tokens-reextract: extract-slack-tokens

# Login in noVNC, wait for profile files, then extract JSON.
slack-tokens-refresh: slack-login slack-wait-profile extract-slack-tokens

# ── Discord session ───────────────────────────────────────────────────────────

.PHONY: discord-login discord-wait-profile discord-login-stop extract-discord-tokens \
discord-tokens-reextract discord-tokens-refresh

discord-login:
	@mkdir -p workspace/discord_activity_tracker/chrome_profile
	@rm -f workspace/discord_activity_tracker/chrome_profile/SingletonLock \
		workspace/discord_activity_tracker/chrome_profile/SingletonCookie \
		workspace/discord_activity_tracker/chrome_profile/SingletonSocket
	$(COMPOSE) --profile discord-session up -d --force-recreate discord-chromium
	@echo "noVNC (password: secret) — Chrome does NOT open automatically:"
	@echo " http://127.0.0.1:7901/?autoconnect=1&resize=scale&password=secret"
	@echo "Right-click desktop → Web Browsing → Google Chrome → https://discord.com"
	@command -v open >/dev/null 2>&1 && open "http://127.0.0.1:7901/?autoconnect=1&resize=scale&password=secret" || true

discord-wait-profile:
	@chmod +x scripts/wait_discord_chrome_profile.sh
	@./scripts/wait_discord_chrome_profile.sh

discord-login-stop:
	$(COMPOSE) --profile discord-session stop discord-chromium

extract-discord-tokens: discord-login-stop
	$(MANAGE) extract_discord_tokens

discord-tokens-reextract: extract-discord-tokens

discord-tokens-refresh: discord-login discord-wait-profile extract-discord-tokens

# ── Utilities ─────────────────────────────────────────────────────────────────

.PHONY: clean-mac
8 changes: 2 additions & 6 deletions README.md
@@ -194,7 +194,7 @@ python -m pytest github_activity_tracker/tests/test_sync_utils.py -v

CI runs pytest with coverage (`--cov`, HTML/XML reports). To match a **local** coverage gate, use **`--cov-fail-under=90`** (see step 5 above). If coverage fails locally or you need a fresh test DB schema after model changes, run once with `python -m pytest --create-db`.

**Pyright (local):** with dev dependencies installed (`uv pip install -r requirements-dev.lock`), run **`uv run pyright`** from the repo root to match the **`pyright`** CI job (`pyrightconfig.json` scopes `core`, `github_activity_tracker`, `discord_activity_tracker`, `cppa_slack_tracker`, `cppa_user_tracker`, and `cppa_pinecone_sync`).
**Pyright (local):** with dev dependencies installed (`uv pip install -r requirements-dev.lock`), run **`uv run pyright`** from the repo root to match the **`pyright`** CI job (`pyrightconfig.json` scopes `core`, `github_activity_tracker`, `cppa_slack_tracker`, `cppa_user_tracker`, and `cppa_pinecone_sync`).

See [docs/Development_guideline.md](docs/Development_guideline.md#testing-workflow) for when to run tests during development.

@@ -231,7 +231,7 @@ Typical top-level layout after clone (folder name is usually **`boost-data-colle
│ ├── shared/
│ ├── scripts/
│ ├── github_activity_tracker/
│ └── … # e.g. boost_library_tracker/, discord_activity_tracker/, …
│ └── … # e.g. boost_library_tracker/, cppa_slack_tracker/, …
├── scripts/ # Repo maintenance and codegen helpers
├── core/ # Shared collectors + operations (GitHub, Slack, markdown, files)
├── boost_collector_runner/ # YAML schedule → run_scheduled_collectors
@@ -245,9 +245,7 @@ Typical top-level layout after clone (folder name is usually **`boost-data-colle
├── cppa_slack_tracker/
├── cppa_user_tracker/
├── cppa_youtube_script_tracker/
├── discord_activity_tracker/
├── github_activity_tracker/
├── slack_event_handler/
└── wg21_paper_tracker/
```

@@ -274,8 +272,6 @@ Some Django apps include a **README.md** at the app package root when that helps
| [`boost_library_usage_dashboard/`](boost_library_usage_dashboard/README.md) | Library usage data for dashboards. |
| [`cppa_slack_tracker/`](cppa_slack_tracker/README.md) | CPPA Slack workspace collection. |
| [`cppa_user_tracker/`](cppa_user_tracker/README.md) | CPPA users and GitHub account linkage. |
| [`discord_activity_tracker/`](discord_activity_tracker/README.md) | Discord activity ingestion (exporter + workspace). |
| [`slack_event_handler/`](slack_event_handler/README.md) | Slack Socket Mode listener (dev `runserver` integration). |

## How it works

4 changes: 2 additions & 2 deletions SECURITY.md
@@ -83,9 +83,9 @@ If you operate a deployment and suspect a leak or breach, **rotate** at least th

| Category | Examples / environment variables |
| --- | --- |
| **GitHub** | `GITHUB_TOKEN`, `GITHUB_TOKENS_SCRAPING` (multi-token pool), `GITHUB_TOKEN_WRITE`; PAT-style tokens used by integrations (for example `SLACK_PR_BOT_GITHUB_TOKEN` if it is a PAT) |
| **GitHub** | `GITHUB_TOKEN`, `GITHUB_TOKENS_SCRAPING` (multi-token pool), `GITHUB_TOKEN_WRITE` |
| **Slack** | `SLACK_BOT_TOKEN_<team_id>`, `SLACK_APP_TOKEN_<team_id>` |
| **Discord** | `DISCORD_TOKEN` |
| **Notifications** | `DISCORD_WEBHOOK_URL`, `SLACK_WEBHOOK_URL` (optional error alerting) |
| **Pinecone** | `PINECONE_API_KEY`, `PINECONE_PRIVATE_API_KEY`, and any host/index settings that grant write access |
| **YouTube** | `YOUTUBE_API_KEY` |

3 changes: 1 addition & 2 deletions STABILITY.md
@@ -61,7 +61,6 @@ These per-collector command names appear in [config/boost_collector_schedule.yam
- `collect_boost_libraries`
- `run_wg21_paper_tracker`
- `run_cppa_slack_tracker`
- `run_discord_activity_tracker`
- `run_boost_mailing_list_tracker`

Other `manage.py` commands exist for manual runs, backfills, and development; only commands **listed in your deployed schedule YAML** (plus **`run_scheduled_collectors`**) are Tier A for that deployment.
@@ -149,7 +148,7 @@ No compatibility promise. May change in any release without deprecation.
- Imports of tracker internals bypassing `sync_api` (e.g. `github_activity_tracker.fetcher`, `cppa_pinecone_sync.sync` from apps covered by import-linter).
- Workspace directory layouts under `WORKSPACE_DIR`, except paths explicitly documented in [`.env.example`](.env.example) and [docs/Workspace.md](docs/Workspace.md). **Per-app JSON schemas** under `workspace/` are not stable.
- Docker Compose service names (`web`, `celery_worker`, `celery_beat`) and host ports are not Tier A unless documented here in a future release.
- `slack_event_handler` internals, management commands not in your schedule, scripts under `scripts/`, tests, and Django admin customization.
- Optional apps registered via `config/local_settings.py`, management commands not in your schedule, scripts under `scripts/`, tests, and Django admin customization.

## Deprecation

21 changes: 21 additions & 0 deletions boost_collector_runner/schedule_config.py
@@ -74,6 +74,27 @@
)


def resolve_schedule_yaml_path(
    *,
    base_dir: Path,
    env_path: str = "",
) -> Path:
    """
    Resolve the collector schedule YAML path.

    Precedence: ``env_path`` (from parent ``BOOST_COLLECTOR_SCHEDULE_YAML`` in ``.env``),
    then the default ``config/boost_collector_schedule.yaml`` under ``base_dir``.
    Relative paths are resolved under ``base_dir``.
    """
    raw = (env_path or "").strip()
    if not raw:
        return (base_dir / "config" / "boost_collector_schedule.yaml").resolve()
    path = Path(raw)
    if not path.is_absolute():
        path = base_dir / path
    return path.resolve()


class ScheduleConfigurationError(ImproperlyConfigured):
    """Raised when the collector schedule YAML is missing or invalid in strict mode."""

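Reviewer sketch (not part of the diff): the precedence described in the `resolve_schedule_yaml_path` docstring can be checked in isolation. The function body below is copied from the hunk so the snippet runs on its own. The temporary directory standing in for the repo root and the `elsewhere/schedule.yaml` path are assumptions made for illustration. The `.local.yaml` name comes from the `.gitignore` change in this PR.

```python
import tempfile
from pathlib import Path


def resolve_schedule_yaml_path(*, base_dir: Path, env_path: str = "") -> Path:
    # Copied from the hunk above so this sketch is self-contained.
    raw = (env_path or "").strip()
    if not raw:
        return (base_dir / "config" / "boost_collector_schedule.yaml").resolve()
    path = Path(raw)
    if not path.is_absolute():
        path = base_dir / path
    return path.resolve()


# Stand-in for the repository root (assumption for this sketch).
base = Path(tempfile.mkdtemp()).resolve()

# Empty or whitespace-only override falls back to the default file.
default_path = resolve_schedule_yaml_path(base_dir=base)
blank_path = resolve_schedule_yaml_path(base_dir=base, env_path="   ")

# A relative override is resolved under base_dir.
relative_path = resolve_schedule_yaml_path(
    base_dir=base,
    env_path="config/boost_collector_schedule.local.yaml",
)

# An absolute override is used as given (hypothetical location).
absolute_path = resolve_schedule_yaml_path(
    base_dir=base,
    env_path=str(base / "elsewhere" / "schedule.yaml"),
)
```

Note that the function only resolves the path and does not check that the file exists. Judging by the `.env.example` comments, any existence check would be the job of the strict-mode handling, not of this helper.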