Operations guide for the code indexing pipeline: dispatch sources, task processing, dead letter queue, and recovery.
Code indexing is event-driven, not periodic. Two dispatch sources feed code indexing task requests into the GKG_INDEXER stream:
Push events (CDC):
Siphon -> p_knowledge_graph_code_indexing_tasks -> NATS siphon_stream_main_db
-> SiphonCodeIndexingTaskDispatcher (DispatchIndexing mode)
-> GKG_INDEXER: code.task.indexing.requested.<project>.<branch>
Namespace backfill:
Siphon -> knowledge_graph_enabled_namespaces -> NATS siphon_stream_main_db
-> NamespaceCodeBackfillDispatcher (DispatchIndexing mode)
-> GKG_INDEXER: code.task.indexing.requested.<project>._
The CodeIndexingTaskHandler (Indexer mode) consumes from code.task.indexing.requested.*.*, fetches the repository, parses code, builds a property graph, and writes to ClickHouse.
| Subject | Source | Purpose |
|---|---|---|
code.task.indexing.requested.<project_id>.<base64_branch> |
Push dispatcher | Index a specific branch after a push |
code.task.indexing.requested.<project_id>._ |
Backfill dispatcher | Index all branches for a project (namespace enable) |
Branch names are base64-encoded in the subject. _ means no specific branch (backfill).
Consumer names are derived from the configured consumer_name (default in production: gkg-indexer) and the subscription subject, with dots replaced by hyphens and wildcards spelled out.
| Consumer | Stream | Role |
|---|---|---|
gkg-indexer-code-task-indexing-requested-wildcard-wildcard |
GKG_INDEXER |
Indexer handler (processes code indexing tasks) |
dispatch-code-task-indexing-requested-wildcard-wildcard |
GKG_INDEXER |
Code task dispatcher |
With ephemeral consumers (consumer_name: None, the local dev default), NATS assigns random names that don't survive restarts.
To inspect a specific consumer:
nats consumer info GKG_INDEXER gkg-indexer-code-task-indexing-requested-wildcard-wildcard| Setting | Default |
|---|---|
max_attempts |
5 |
retry_interval_secs |
60 |
dead_letter_on_exhaustion |
true |
concurrency_group |
code (4 workers by default) |
Unlike SDLC handlers, code indexing handlers retry via NATS because tasks are event-driven and won't be re-dispatched automatically.
Consumes CDC events from siphon_stream_main_db for the p_knowledge_graph_code_indexing_tasks table. Each event represents a Git push.
The dispatcher:
- Fetches a batch of pending messages (default batch size: 100)
- Decodes Siphon replication events, extracting
project_id,ref,commit_sha,traversal_path - Deduplicates by
(project_id, branch), keeping the highesttask_id - Publishes
CodeIndexingTaskRequesttoGKG_INDEXER - Acknowledges the batch
Consumes CDC events for knowledge_graph_enabled_namespaces. When a namespace is newly enabled:
- Looks up the namespace's traversal path
- Queries all projects under that namespace from the graph DB
- Publishes a backfill request per project (
task_id: 0, no branch, no commit)
When the CodeIndexingTaskHandler receives a message:
- Compare
task_idagainst stored checkpoint. Skip iftask_id <= last_task_id. - Acquire a NATS KV lock on
project.{project_id}.{base64_branch}(TTL: 60 seconds). Skip if lock is held by another worker. - Download the repository archive from Rails API (or use incremental fetch / cache).
- Run tree-sitter + swc parsers across supported languages, build an in-memory property graph.
- Convert graph to Arrow batches and insert into graph tables.
- Record checkpoint:
(traversal_path, project_id, branch, last_task_id, last_commit, indexed_at).
Individual file parse failures are logged but do not fail the task. The pipeline writes whatever it parsed successfully.
Failed code indexing tasks are sent to the GKG_DEAD_LETTERS stream after exhausting all 5 attempts.
dlq.GKG_INDEXER.code.task.indexing.requested.<project_id>.<base64_branch>
Each dead letter contains:
| Field | Description |
|---|---|
original_subject |
The original NATS subject |
original_stream |
GKG_INDEXER |
original_payload |
The CodeIndexingTaskRequest JSON |
original_message_id |
NATS message ID |
original_timestamp |
When the message was first published |
failed_at |
When the message entered the DLQ |
attempts |
Number of delivery attempts (5) |
last_error |
Error message from the final attempt |
# List all dead-lettered code indexing messages
nats stream info GKG_DEAD_LETTERS
nats consumer create GKG_DEAD_LETTERS dlq-inspector \
--filter='dlq.GKG_INDEXER.code.task.indexing.requested.>' \
--pull --deliver=all --ack=none
nats consumer next GKG_DEAD_LETTERS dlq-inspector --count=10Extract the original_payload and republish to the original subject:
nats pub 'code.task.indexing.requested.<project_id>.<base64_branch>' \
'<original_payload_json>'The message will be processed as a new request (attempt counter resets to 1).
# Purge all code indexing dead letters
nats stream purge GKG_DEAD_LETTERS \
--subject='dlq.GKG_INDEXER.code.task.indexing.requested.>'
# Purge dead letters for a specific project
nats stream purge GKG_DEAD_LETTERS \
--subject='dlq.GKG_INDEXER.code.task.indexing.requested.<project_id>.*'- Code indexing logs show no activity for a project that should be indexing
- NATS consumer has pending messages that are not being processed
- A project's code graph is stale (missing recent commits)
-
Check if a message is stuck in the stream:
nats stream info GKG_INDEXER --subjects | grep 'code.task'
-
Check consumer pending count:
nats consumer info GKG_INDEXER gkg-indexer-code-task-indexing-requested-wildcard-wildcard
-
Check for lock contention in NATS KV:
nats kv ls indexing_locks
-
Check indexer logs for the project:
kubectl logs -n gkg deployment/gkg-indexer -f | grep 'project_id=<id>'
The NATS KV lock has a 60-second TTL and expires automatically. If a worker crashes, the lock releases after at most 60 seconds and the message is redelivered.
If the lock is not expiring (clock skew or NATS bug):
nats kv del indexing_locks 'project.<project_id>.<base64_branch>'If a message keeps failing and hasn't reached max_attempts yet:
- Check the error in logs (repository fetch failure, ClickHouse write error, parse crash)
- Fix the root cause
- The next redelivery (after
retry_interval_secs) will succeed
If you want to skip it immediately:
nats stream purge GKG_INDEXER \
--subject='code.task.indexing.requested.<project_id>.<base64_branch>'If the checkpoint's last_task_id is higher than the incoming task, the handler skips the message.
Check the checkpoint:
SELECT traversal_path, project_id, branch, last_task_id, last_commit, indexed_at
FROM `<gkg-database>`.code_indexing_checkpoint
WHERE project_id = <id>;To force reprocessing, delete the checkpoint row:
ALTER TABLE `<gkg-database>`.code_indexing_checkpoint
DELETE WHERE project_id = <id> AND branch = '<branch>';-
Clear the checkpoint:
ALTER TABLE `<gkg-database>`.code_indexing_checkpoint DELETE WHERE project_id = <id> AND branch = '<branch>';
-
Publish a new indexing request:
# Encode branch name: echo -n 'main' | base64 nats pub 'code.task.indexing.requested.<project_id>.<base64_branch>' \ '{"task_id":0,"project_id":<id>,"branch":"<branch>","commit_sha":null,"traversal_path":"<path>"}'
ALTER TABLE `<gkg-database>`.code_indexing_checkpoint
DELETE WHERE project_id = <id>;Then publish a backfill request (no branch):
nats pub 'code.task.indexing.requested.<project_id>._' \
'{"task_id":0,"project_id":<id>,"branch":null,"commit_sha":null,"traversal_path":"<path>"}'Trigger a namespace backfill by re-enabling the namespace in GitLab (toggle the Knowledge Graph feature flag off and on), or simulate the Siphon event that the NamespaceCodeBackfillDispatcher consumes.
Code indexing retries differently from SDLC indexing because tasks are event-driven:
| Level | Mechanism | Details |
|---|---|---|
| NATS redelivery | max_attempts: 5, retry_interval_secs: 60 |
Handler nacks with 60-second delay on failure |
| Dead letter | GKG_DEAD_LETTERS stream |
After 5 attempts, message moves to DLQ with full error context |
| Ack timeout | 300 seconds | If handler doesn't ack within 5 minutes, NATS redelivers |
| Progress heartbeat | ack_progress() |
Long-running handlers reset the ack timer to prevent premature redelivery |
The retry flow:
Attempt 1 fails -> nack (60s delay)
-> Attempt 2 fails -> nack (60s delay)
-> Attempt 3 fails -> nack (60s delay)
-> Attempt 4 fails -> nack (60s delay)
-> Attempt 5 fails -> publish to GKG_DEAD_LETTERS -> ack original
If DLQ publication itself fails, the original message is nacked for redelivery rather than being dropped.
If code indexing dispatchers are running but no tasks are being processed, the Siphon events stream name may not match between GKG and Siphon.
Symptoms:
- Dispatcher logs show no events consumed
- The
GKG_INDEXERstream has no code indexing messages - Siphon is publishing to a stream that GKG is not subscribed to
Check the configured stream name:
# In the GKG config or Helm values
kubectl -n gkg exec deployment/gkg-dispatcher -- env | grep EVENTS_STREAMCheck what streams exist in NATS:
nats stream lsThe events_stream_name in schedule.tasks.code_indexing_task must match the Siphon stream name exactly. For example, staging Siphon may publish to stg_siphon_event_stream while GKG defaults to siphon_stream_main_db.
Fix via Helm values or environment variable:
# Helm
helm upgrade gkg gkg-helm-charts/gkg \
--set dispatcher.extraEnv.GKG_SCHEDULE__TASKS__CODE_INDEXING_TASK__EVENTS_STREAM_NAME=stg_siphon_event_stream
# Or in values.yaml under schedule.tasks.code_indexing_task.events_stream_nameThe cleanup stage runs after indexing but failures are logged as warnings and do not block the pipeline. Over time, stale rows from deleted files may accumulate.
To clean up manually:
OPTIMIZE TABLE `<gkg-database>`.gl_file FINAL CLEANUP;
OPTIMIZE TABLE `<gkg-database>`.gl_directory FINAL CLEANUP;
OPTIMIZE TABLE `<gkg-database>`.gl_imported_symbol FINAL CLEANUP;
OPTIMIZE TABLE `<gkg-database>`.gl_definition FINAL CLEANUP;| Metric | What it tells you |
|---|---|
Handler outcome: indexed |
Successful indexing completions |
Handler outcome: skipped_checkpoint |
Messages skipped (already processed) |
Handler outcome: skipped_lock |
Messages skipped (another worker holds the lock) |
Handler outcome: error |
Failed processing attempts |
Error stage: repository_fetch |
Repository download failures |
Error stage: checkpoint |
Checkpoint read/write failures |
Error stage: indexing |
Parse or graph build failures |
Error stage: arrow_conversion |
Schema conversion failures |
Error stage: write |
ClickHouse write failures |
nats stream info GKG_INDEXER
nats stream info GKG_DEAD_LETTERS
nats consumer ls GKG_INDEXER