feat(spec): SPEC-WFAIPRO-001 — WorkflowAI Pro Technical Specification by OneFineStarstuff · Pull Request #39 · OneFineStarstuff/OneFineStarstuff.github.io

OneFineStarstuff · 2026-03-20T08:59:16Z

User description

SPEC-WFAIPRO-001 — WorkflowAI Pro Technical Specification

Overview

Implementation-ready XML technical specification for WorkflowAI Pro, an enterprise-grade AI-powered workflow optimisation platform. Tri-model AI architecture: GNN document routing, collaborative filtering bottleneck prediction, active learning UI adaptation.

Document Reference: SPEC-WFAIPRO-001 v1.0.0
File: docs/specifications/workflow-ai-pro.xml (52,955 bytes, 1,323 lines)
Format: XML with CDATA-wrapped Markdown content
Classification: CONFIDENTIAL — CTO Office, VP Engineering, Lead Developers

Required Sections — All 6 Present and Validated

#	Section	Key Content
1	Executive Summary	Tri-model architecture overview, legacy vs WorkflowAI Pro differentiators table
2	System Architecture	Syntax-valid Mermaid.js C4 Container diagram (13 containers, 3 external systems, 27 relationships)
3	AI Components	HeteroGAT GNN (18M params, <200ms P99), NCF + temporal attention (72h lookahead, >91% precision), pool-based AL with BatchBALD (200 labels/day, MC Dropout T=20)
4	Implementation Specs	Deep dive on 3 entities — see below
5	Performance, Security & Compliance	Exactly 3 bullet points: SLAs, GDPR/SOC 2, RBAC
6	18-Month Roadmap & Risks	Exactly 8 bullet points: Q1-Q6 milestones + 2 risk mitigations

Section 4 — Implementation Specs Detail

Document Router

OpenAPI 3.0: 3 endpoints (routeDocument, getRoutingStatus, getGraphHealth)
PostgreSQL: 4 tables with RLS multi-tenancy, 16 hash partitions, monthly audit log partitions
Kafka: 4 topics (doc.ingested, doc.routed, doc.routing.escalated, DLQ), exactly-once semantics

Approval Predictor

OpenAPI 3.0: 2 endpoints (predictBottlenecks, getApproverLoad)
MongoDB: 2 collections (prediction_logs, cf_model_artifacts) with JSON Schema validation
Redis: 4 key patterns (user embeddings, stage embeddings, CF scores, temporal features), TTL policies
Kafka: 3 topics (approval.requested, approval.predicted, approval.completed), 5-tier retry backoff

Adaptive UI Engine

OpenAPI 3.0: 3 endpoints (getAdaptiveLayout, submitUIFeedback, getALStatus)
MongoDB: 2 collections (al_pool, al_experiments) with A/B experiment tracking
Kafka: 4 topics (ui.feedback, al.query, al.label.acquired, model.retrained)

Constraint Compliance

Output is raw XML (no markdown code block wrapper)
CDATA section wraps all Markdown content
No ]]> sequence inside CDATA content
Section 5: exactly 3 bullet points
Section 6: exactly 8 bullet points
Technical density prioritised over high-level explanation

Validation Results

XML Parse: OK (Python xml.etree.ElementTree)
All 6 sections: FOUND
Mermaid C4 diagram: PASS
OpenAPI 3.0: PASS
PostgreSQL schema: PASS
MongoDB schema: PASS
Redis schema: PASS
Kafka config: PASS
GNN detail (HeteroGAT): PASS
NCF detail: PASS
Active Learning (BatchBALD): PASS
CDATA wrapper: PASS
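
The checks listed above could be reproduced with a short script. The following is a minimal sketch, not the script that produced these results: the `specification` and `content` element names, the miniature sample document, and the `## N. Title` heading format are all assumptions made for illustration, since the PR thread does not show the file's internal structure.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature of the spec file; the real element names are not shown in the PR.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<specification id="SPEC-WFAIPRO-001" version="1.0.0">
  <content><![CDATA[
## 1. Executive Summary
## 2. System Architecture
## 3. AI Components
## 4. Implementation Specs
## 5. Performance, Security & Compliance
## 6. 18-Month Roadmap & Risks
]]></content>
</specification>
"""


def validate_spec(xml_text: str) -> dict:
    """Run well-formedness, CDATA, and section-count checks on the raw XML text."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # ElementTree merges CDATA into .text, so the markers are counted on the raw text.
    cdata_opens = xml_text.count("<![CDATA[")
    cdata_closes = xml_text.count("]]>")
    body = "".join(root.itertext())
    sections = re.findall(r"^## (\d+)\. ", body, flags=re.MULTILINE)
    return {
        "well_formed": True,
        "single_cdata": cdata_opens == 1 and cdata_closes == 1,
        "sections_found": [int(n) for n in sections],
    }


result = validate_spec(SAMPLE)
print(result)
```

Counting `]]>` on the raw text is what catches a premature CDATA terminator: a stray `]]>` inside the Markdown would end the section early and raise the count above one.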

Files Changed

docs/specifications/workflow-ai-pro.xml (new — 1,323 lines)

Summary by Sourcery

Add a comprehensive XML technical specification document for the WorkflowAI Pro platform, defining its architecture, AI subsystems, data models, and integration contracts.

Documentation:

Introduce an implementation-ready XML specification for WorkflowAI Pro, covering executive summary, architecture, AI components, implementation specs, performance/compliance, and roadmap.
Document detailed contracts and schemas for the Document Router, Approval Predictor, and Adaptive UI Engine, including APIs, storage models, and Kafka topologies.

Summary by CodeRabbit

Documentation
- Added comprehensive technical specification for the WorkflowAI Pro platform, including platform architecture, system components, data flow configurations, API contracts, database schemas, operational requirements, and product roadmap.

Description

Introduced a detailed XML technical specification for WorkflowAI Pro, outlining its tri-model AI architecture.
Specified implementation details for core components: Document Router, Approval Predictor, and Adaptive UI Engine.
Added OpenAPI 3.0 definitions for key service endpoints, enhancing API documentation.
Included sections on performance metrics, security compliance, and an 18-month development roadmap.

Changes walkthrough 📝

Relevant files

Enhancement

workflow-ai-pro.xml: `Comprehensive XML Specification for WorkflowAI Pro` (docs/specifications/workflow-ai-pro.xml, +1323/-0)
Added comprehensive XML technical specification for WorkflowAI Pro.
Defined tri-model AI architecture and detailed implementation specs.
Included OpenAPI 3.0 endpoints for Document Router and Approval Predictor.
Documented performance, security, compliance, and roadmap sections.


Implementation-ready XML specification for WorkflowAI Pro enterprise workflow optimisation platform. Tri-model AI architecture: GNN document routing, collaborative filtering bottleneck prediction, active learning UI adaptation.

Document: SPEC-WFAIPRO-001 v1.0.0 (52,955 bytes, 1,323 lines)
Format: XML with CDATA-wrapped Markdown content

6 Required Sections, All Present:

1. Executive Summary: tri-model architecture overview, key differentiators table
2. System Architecture: syntax-valid Mermaid.js C4 Container diagram (13 containers, 3 external systems, 27 relationships)
3. AI Components: HeteroGAT GNN (18M params, <200ms P99), NCF with temporal attention (72h lookahead, >91% precision), pool-based AL with BatchBALD (200 labels/day, MC Dropout T=20)
4. Implementation Specs: deep dive on 3 entities
   - Document Router: OpenAPI 3.0 (3 endpoints), PostgreSQL schema (4 tables, RLS multi-tenancy, hash partitioning), Kafka (4 topics, exactly-once)
   - Approval Predictor: OpenAPI 3.0 (2 endpoints), MongoDB schema (2 collections with JSON Schema validation), Redis feature store (4 key patterns, TTL policy), Kafka (3 topics, 5-tier retry backoff)
   - Adaptive UI Engine: OpenAPI 3.0 (3 endpoints), MongoDB schema (2 collections: al_pool, al_experiments), Kafka (4 topics including model.retrained)
5. Performance, Security & Compliance: exactly 3 bullet points covering SLAs, GDPR/SOC 2, RBAC (6 roles, 23 permissions, OPA enforcement)
6. 18-Month Roadmap & Risks: exactly 8 bullet points covering Q1-Q6 milestones + 2 risk mitigations (model drift, Kafka backpressure)

Validation: XML well-formed (Python ET parse), all section/content checks pass.

code-genius-code-coverage · 2026-03-20T08:59:21Z

The files' contents are under analysis for test generation.

semanticdiff-com · 2026-03-20T08:59:22Z

Review changes with

Changed Files

File	Status
docs/specifications/workflow-ai-pro.xml	0% smaller

gitnotebooks · 2026-03-20T08:59:22Z

Review these changes at https://app.gitnotebooks.com/OneFineStarstuff/OneFineStarstuff.github.io/pull/39

vercel · 2026-03-20T08:59:23Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
v0-one-fine-starstuff-github-io	Ready	Preview, Comment, Open in v0	Mar 21, 2026 10:02am

difflens · 2026-03-20T08:59:25Z

View changes in DiffLens

chatgpt-codex-connector · 2026-03-20T08:59:26Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

sourcery-ai · 2026-03-20T08:59:31Z

Reviewer's Guide

Adds a new implementation-ready XML technical specification for WorkflowAI Pro, defining a tri-model AI workflow optimization platform with detailed architecture, AI models, APIs, data schemas, and messaging topologies for the Document Router, Approval Predictor, and Adaptive UI Engine services.

Sequence diagram for document routing and bottleneck prediction

sequenceDiagram
  actor User
  participant API_Gateway
  participant Document_Router_Service
  participant Redis_Feature_Store
  participant GNN_Inference_Engine
  participant PostgreSQL_DB
  participant Kafka_Broker
  participant Approval_Predictor_Service

  User->>API_Gateway: POST /api/v2/documents/route
  API_Gateway->>Document_Router_Service: routeDocument(document_id, tenant_id, content_hash, doc_type, metadata)

  Document_Router_Service->>Redis_Feature_Store: GET entity_embeddings(document_id, tenant_id)
  Redis_Feature_Store-->>Document_Router_Service: embeddings, graph_features

  Document_Router_Service->>GNN_Inference_Engine: gRPC infer_routing_paths(embeddings, graph_context)
  GNN_Inference_Engine-->>Document_Router_Service: top_paths, confidences

  Document_Router_Service->>PostgreSQL_DB: INSERT documents, routing_decisions, routing_paths
  Document_Router_Service->>Kafka_Broker: Produce doc.routed(document_id, routing_id, selected_path)
  Document_Router_Service->>Kafka_Broker: Produce doc.routing.escalated(if confidence < 0.75)

  Document_Router_Service-->>API_Gateway: 200 RoutingDecision or 202 EscalationResponse
  API_Gateway-->>User: Routing decision response

  Kafka_Broker-->>Approval_Predictor_Service: Consume approval.requested(document_id, approval_chain)
  Approval_Predictor_Service->>Redis_Feature_Store: GET user_stage_embeddings, temporal_features
  Redis_Feature_Store-->>Approval_Predictor_Service: feature_vectors

  Approval_Predictor_Service->>Approval_Predictor_Service: NCF_inference_for_chain
  Approval_Predictor_Service->>Kafka_Broker: Produce approval.predicted(document_id, stage_risks)
  Approval_Predictor_Service->>PostgreSQL_DB: UPDATE chain_risk_metadata(optional)

  Kafka_Broker-->>Document_Router_Service: approval.predicted(re_routing_triggers)
  Document_Router_Service->>PostgreSQL_DB: UPDATE routing_decisions_with_suggested_re_routes
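
The escalation branch in the diagram above (a 202 EscalationResponse and a `doc.routing.escalated` event when confidence falls below 0.75) can be sketched as a small pure function. This is one reading of the diagram, not the spec's implementation: the `RoutingOutcome` type and `decide` function are hypothetical names, and the sketch assumes `doc.routed` is produced for every document because the diagram draws it unconditionally.

```python
from dataclasses import dataclass

ESCALATION_THRESHOLD = 0.75  # from the diagram: escalate if confidence < 0.75


@dataclass
class RoutingOutcome:
    http_status: int   # 200 RoutingDecision or 202 EscalationResponse
    topics: list       # Kafka topics the router would produce to


def decide(confidence: float) -> RoutingOutcome:
    """Map the top path's confidence to an HTTP status and the events to emit."""
    topics = ["doc.routed"]  # assumption: emitted for every document, as drawn
    if confidence < ESCALATION_THRESHOLD:
        topics.append("doc.routing.escalated")
        return RoutingOutcome(202, topics)
    return RoutingOutcome(200, topics)


print(decide(0.91))
print(decide(0.60))
```

Keeping the rule free of I/O makes the boundary case (a confidence of exactly 0.75 is not escalated under a strict `<`) easy to pin down in a unit test.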

Sequence diagram for adaptive UI layout resolution and active learning loop

sequenceDiagram
  actor User
  participant API_Gateway
  participant Adaptive_UI_Engine
  participant Active_Learning_Service
  participant Kafka_Broker
  participant MongoDB_AL_Collections

  User->>API_Gateway: POST /api/v2/ui/layout(context)
  API_Gateway->>Adaptive_UI_Engine: getAdaptiveLayout(user_id, tenant_id, role_id, task_type, accessibility_flags)

  Adaptive_UI_Engine->>Active_Learning_Service: Resolve_layout(context_vector)
  Active_Learning_Service->>MongoDB_AL_Collections: INSERT al_pool_sample(context, predicted_layout_id)
  Active_Learning_Service-->>Adaptive_UI_Engine: LayoutConfig(layout_id, components, theme_overrides)

  Adaptive_UI_Engine-->>API_Gateway: 200 LayoutConfig
  API_Gateway-->>User: Rendered adaptive UI

  User->>API_Gateway: POST /api/v2/ui/feedback(session_id, layout_id, feedback)
  API_Gateway->>Adaptive_UI_Engine: submitUIFeedback(payload)
  Adaptive_UI_Engine->>Kafka_Broker: Produce ui.feedback(session_id, layout_id, metrics)

  Kafka_Broker-->>Active_Learning_Service: Consume ui.feedback
  Active_Learning_Service->>MongoDB_AL_Collections: UPDATE al_pool_sample_with_implicit_label

  Active_Learning_Service->>Active_Learning_Service: Periodic_MC_Dropout_uncertainty_estimation
  Active_Learning_Service->>MongoDB_AL_Collections: FIND top_entropy_diverse_samples
  Active_Learning_Service->>Kafka_Broker: Produce al.query(sample_id, context)

  Kafka_Broker-->>Active_Learning_Service: Consume al.label.acquired(sample_id, assigned_layout_id)
  Active_Learning_Service->>MongoDB_AL_Collections: UPDATE annotation_for_sample
  Active_Learning_Service->>Active_Learning_Service: Trigger_model_retrain_when_labels_threshold_reached
  Active_Learning_Service->>Kafka_Broker: Produce model.retrained(model_type=al_layout, version)
  Kafka_Broker-->>Adaptive_UI_Engine: model.retrained(al_layout, version)
  Adaptive_UI_Engine->>Adaptive_UI_Engine: Hot_reload_layout_model
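
The uncertainty-estimation and sample-selection steps in the loop above can be illustrated with a simplified acquisition function. Note the swap: this sketch ranks samples by the predictive entropy of the mean MC Dropout output and takes the top k, which is simpler than the BatchBALD acquisition named in the PR description (BatchBALD scores a batch jointly) and omits the diversity criterion. The function names and the toy pool are hypothetical.

```python
import math


def predictive_entropy(passes):
    """Entropy of the mean class distribution over T stochastic forward passes."""
    t = len(passes)
    n_classes = len(passes[0])
    mean = [sum(p[c] for p in passes) / t for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def select_queries(pool, k):
    """Return the k sample ids with the highest predictive entropy."""
    scored = {sid: predictive_entropy(passes) for sid, passes in pool.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Toy pool: 2 passes per sample instead of T=20, and 2 candidate layouts.
pool = {
    "confident": [[0.95, 0.05], [0.97, 0.03]],
    "uncertain": [[0.55, 0.45], [0.45, 0.55]],
}
print(select_queries(pool, k=1))  # ['uncertain']
```

In the flow above, the selected ids would be the payload of the `al.query` messages.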

Entity relationship diagram for core WorkflowAI Pro data schemas

erDiagram
  DOCUMENTS {
    uuid id
    uuid tenant_id
    char64 content_hash
    varchar32 doc_type
    varchar16 urgency
    text_array compliance_flags
    jsonb metadata
    timestamptz created_at
    timestamptz updated_at
  }

  ROUTING_DECISIONS {
    uuid id
    uuid document_id
    uuid tenant_id
    varchar16 decision
    numeric4_3 confidence
    uuid selected_path_id
    varchar32 model_version
    numeric8_2 inference_latency_ms
    timestamptz created_at
  }

  ROUTING_PATHS {
    uuid id
    uuid routing_decision_id
    smallint path_rank
    numeric6_2 total_predicted_duration_h
    numeric4_3 path_confidence
    jsonb stages
  }

  ROUTING_AUDIT_LOG {
    bigint id
    uuid document_id
    uuid tenant_id
    varchar64 stage_id
    uuid approver_id
    varchar16 action
    timestamptz occurred_at
    jsonb metadata
  }

  PREDICTION_LOGS {
    string prediction_id
    string tenant_id
    string document_id
    string model_version
    date created_at
    double inference_latency_ms
    array stages
    string overall_chain_risk
    bool feedback_received
  }

  CF_MODEL_ARTIFACTS {
    string model_id
    string version
    date created_at
    string status
    object hyperparams
    object metrics
    string artifact_path
    string training_data_snapshot
  }

  AL_POOL {
    string sample_id
    string tenant_id
    object context
    string predicted_layout_id
    double prediction_entropy
    double mc_dropout_variance
    string status
    date created_at
    date selected_at
    date annotated_at
    object annotation
  }

  AL_EXPERIMENTS {
    string experiment_id
    string incumbent_version
    string challenger_version
    string status
    date created_at
    date concluded_at
    double traffic_split
    object metrics
    string decision
  }

  DOCUMENTS ||--o{ ROUTING_DECISIONS : has
  DOCUMENTS ||--o{ ROUTING_AUDIT_LOG : has
  ROUTING_DECISIONS ||--o{ ROUTING_PATHS : includes

  DOCUMENTS ||--o{ PREDICTION_LOGS : has
  CF_MODEL_ARTIFACTS ||--o{ PREDICTION_LOGS : generates

  AL_EXPERIMENTS ||--o{ AL_POOL : evaluates
  CF_MODEL_ARTIFACTS ||--o{ AL_EXPERIMENTS : compared_in

File-Level Changes

Change	Details	Files
Introduce a structured XML specification document with CDATA-wrapped Markdown defining the complete WorkflowAI Pro system architecture and behaviour.	Defines top-level XML metadata for the WorkflowAI Pro specification, including document reference, versioning, classification, and abstract. Embeds a full Markdown technical spec inside a CDATA section, covering executive summary, C4 container architecture with Mermaid, AI component designs, implementation specs, performance/security/compliance, and roadmap and risks. Ensures XML and CDATA constraints are respected (parsable XML, no forbidden CDATA terminators, required section counts, and validation notes).	`docs/specifications/workflow-ai-pro.xml`
Specify implementation details for the Document Router service, including external contracts and storage/event integration.	Defines OpenAPI 3.0 routes for document routing, routing status, and graph/model health with associated schemas and JWT security. Designs a PostgreSQL 16 schema with hash-partitioned multi-tenant tables, monthly-partitioned audit logs, and RLS policies for tenant isolation. Describes Kafka topics and consumer/producer configuration for document ingestion, routing events, escalation, DLQ handling, and exactly-once processing semantics.	`docs/specifications/workflow-ai-pro.xml`
Specify implementation details for the Approval Predictor service, including APIs, persistence, caching, and messaging.	Defines OpenAPI 3.0 endpoints for bottleneck prediction and approver load queries, including payloads, horizons, and response structures. Details MongoDB collections with JSON Schema validation for prediction logs and NCF model artifacts, plus indexing strategy for monitoring and model lifecycle. Describes Redis key patterns and TTL policies for user embeddings, stage embeddings, CF scores, and temporal features, along with Kafka topics and retry/backoff configuration for approval events.	`docs/specifications/workflow-ai-pro.xml`
Specify implementation details for the Adaptive UI Engine and its active learning loop.	Defines OpenAPI 3.0 endpoints for adaptive layout resolution, UI feedback ingestion, and active learning status, with emphasis on accessibility flags and layout config schema. Details MongoDB collections and validators for the active-learning pool and A/B experiment tracking, including status fields and metrics used for promotion decisions. Describes Kafka topics and consumer configs for UI feedback, annotation queries, label acquisition, and model retrain notifications across services.	`docs/specifications/workflow-ai-pro.xml`
Formalise AI model designs, non-functional requirements, and roadmap/risk posture for the platform.	Documents HeteroGAT GNN-based document routing, NCF with temporal attention for bottleneck prediction, and BatchBALD-driven active learning for UI layout selection, including model sizes, features, and training loops. Captures performance, security, and compliance requirements such as latency SLOs, availability targets, GDPR/SOC 2 alignment, and RBAC model structure. Outlines an 18‑month phased roadmap (Q1–Q6) and two key risk areas with mitigation strategies around model drift and Kafka backpressure/partition skew.	`docs/specifications/workflow-ai-pro.xml`

coderabbitai · 2026-03-20T08:59:32Z

📝 Walkthrough

Walkthrough

A new XML technical specification document for WorkflowAI Pro platform has been added, detailing a C4 architecture with three core services (Document Router, Approval Predictor, Adaptive UI Engine), AI component architectures, end-to-end data flows over Kafka/Redis/PostgreSQL/MongoDB, OpenAPI endpoint contracts, schema definitions, and operational requirements.

Changes

Cohort / File(s)	Summary
WorkflowAI Pro Specification `docs/specifications/workflow-ai-pro.xml`	New technical specification (v1.0.0, DRAFT) defining platform architecture, AI component workflows, API contracts, persistence schemas, infrastructure topology, performance/security/compliance requirements, and 18-month roadmap.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A specification born today,
With schemas bright and flows at play,
AI routers dancing through the streams,
Approval dreams and UI schemes,
WorkflowAI Pro takes flight away! 📋✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and specifically references the main change: adding SPEC-WFAIPRO-001, a technical specification document for WorkflowAI Pro, which aligns perfectly with the changeset containing a new XML specification file.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

netlify · 2026-03-20T08:59:37Z

❌ Deploy Preview for onefinestarstuff failed.

Name	Link
🔨 Latest commit	`d6dae5b`
🔍 Latest deploy log	https://app.netlify.com/projects/onefinestarstuff/deploys/69bd0c689fbbe7000849ab60

sourcery-ai

Hey - I've found 2 issues, and left some high-level feedback:

Kafka topic names and semantics are described in multiple places (e.g., high-level bullets vs detailed YAML sections); consider standardizing the exact topic names and DLQ naming across the entire spec to avoid ambiguity during implementation.
There are many timestamp and duration fields across APIs and schemas (PostgreSQL, MongoDB, OpenAPI); explicitly stating a global convention (e.g., all timestamps in ISO 8601 UTC, all durations in hours/ms) near the top of the spec would reduce the risk of subtle cross-service inconsistencies.
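
A global convention of the kind suggested above could be enforced by a pair of shared helpers. This is a sketch under assumptions: the helper names are hypothetical, and the choice of millisecond precision with a trailing `Z` is one possible convention, not something the spec states.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(ts: datetime) -> str:
    """Serialise an aware datetime as ISO 8601 in UTC with a trailing 'Z'."""
    if ts.tzinfo is None:
        raise ValueError("naive datetime: the timezone must be explicit")
    utc = ts.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def hours_to_ms(hours: float) -> int:
    """Convert a duration in hours (e.g. a *_duration_h field) to milliseconds."""
    return round(hours * 3_600_000)


local = datetime(2026, 3, 20, 10, 59, 16, tzinfo=timezone(timedelta(hours=2)))
print(to_iso_utc(local))  # 2026-03-20T08:59:16.000Z
print(hours_to_ms(1.5))   # 5400000
```

Rejecting naive datetimes at the serialisation boundary is the design choice that matters here: it turns a silent cross-service offset bug into an immediate error.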

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Kafka topic names and semantics are described in multiple places (e.g., high-level bullets vs detailed YAML sections); consider standardizing the exact topic names and DLQ naming across the entire spec to avoid ambiguity during implementation.
- There are many timestamp and duration fields across APIs and schemas (PostgreSQL, MongoDB, OpenAPI); explicitly stating a global convention (e.g., all timestamps in ISO 8601 UTC, all durations in hours/ms) near the top of the spec would reduce the risk of subtle cross-service inconsistencies.

## Individual Comments

### Comment 1
<location path="docs/specifications/workflow-ai-pro.xml" line_range="475-484" />
<code_context>
+CREATE TABLE routing_audit_log (
</code_context>
<issue_to_address>
**🚨 issue (security):** Apply consistent tenant isolation to `routing_audit_log` (and potentially `routing_paths`) to align with the multi-tenant security goals.

`routing_audit_log` includes `tenant_id` but is not protected by RLS, unlike `documents` and `routing_decisions`. To maintain strict tenant isolation and long-term audit data safety, enable RLS here and add a `tenant_isolation` policy. Also consider adding `tenant_id` and RLS to `routing_paths` so it remains tenant-scoped even if accessed without joining through `routing_decision_id`.
</issue_to_address>

### Comment 2
<location path="docs/specifications/workflow-ai-pro.xml" line_range="464-471" />
<code_context>
+CREATE INDEX idx_routing_decisions_tenant_created
+    ON routing_decisions (tenant_id, created_at DESC);
+
+CREATE TABLE routing_paths (
+    id                          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    routing_decision_id         UUID NOT NULL
+        REFERENCES routing_decisions(id) ON DELETE CASCADE,
+    path_rank                   SMALLINT NOT NULL,  -- 0=selected, 1-2=alternatives
+    total_predicted_duration_h  NUMERIC(6,2),
+    path_confidence             NUMERIC(4,3),
+    stages                      JSONB NOT NULL
+    -- array of {stage_id, approver_id, predicted_duration_h, bottleneck_prob}
+);
</code_context>
<issue_to_address>
**suggestion (performance):** Add an index on `(routing_decision_id, path_rank)` in `routing_paths` to support common query patterns efficiently.

Given how `RoutingDecision.selected_path` and `alternative_paths` will be used, queries will often filter/order by `routing_decision_id` and `path_rank` (e.g., rank 0 plus a few alternatives). With only a PK on `id`, these will devolve into table scans as data grows. Please add a non-unique index on `(routing_decision_id, path_rank)` to keep lookups efficient, especially under multi-tenant load.
</issue_to_address>

sourcery-ai · 2026-03-20T09:00:55Z

+CREATE TABLE routing_audit_log (
+    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+    document_id UUID NOT NULL,
+    tenant_id   UUID NOT NULL,
+    stage_id    VARCHAR(64) NOT NULL,
+    approver_id UUID,
+    action      VARCHAR(16) NOT NULL,
+    -- approved | rejected | delegated | escalated
+    occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    metadata    JSONB DEFAULT '{}'


🚨 issue (security): Apply consistent tenant isolation to routing_audit_log (and potentially routing_paths) to align with the multi-tenant security goals.

routing_audit_log includes tenant_id but is not protected by RLS, unlike documents and routing_decisions. To maintain strict tenant isolation and long-term audit data safety, enable RLS here and add a tenant_isolation policy. Also consider adding tenant_id and RLS to routing_paths so it remains tenant-scoped even if accessed without joining through routing_decision_id.
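
The statements this comment asks for could look like the output of the helper below. It is a sketch only: the `rls_statements` function and the `app.tenant_id` session setting are hypothetical, and the policy expression used by the existing `documents` and `routing_decisions` tables is not shown in this thread, so the real policy should mirror whatever those tables use.

```python
def rls_statements(table: str, setting: str = "app.tenant_id") -> list:
    """Build PostgreSQL statements that enable RLS and add a tenant_isolation policy."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{setting}')::uuid);",
    ]


for stmt in rls_statements("routing_audit_log"):
    print(stmt)
```

In PostgreSQL the table owner bypasses row-level security unless `FORCE ROW LEVEL SECURITY` is also set, which is worth checking if the service connects as the owning role.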

sourcery-ai · 2026-03-20T09:00:55Z

+CREATE TABLE routing_paths (
+    id                          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    routing_decision_id         UUID NOT NULL
+        REFERENCES routing_decisions(id) ON DELETE CASCADE,
+    path_rank                   SMALLINT NOT NULL,  -- 0=selected, 1-2=alternatives
+    total_predicted_duration_h  NUMERIC(6,2),
+    path_confidence             NUMERIC(4,3),
+    stages                      JSONB NOT NULL


suggestion (performance): Add an index on (routing_decision_id, path_rank) in routing_paths to support common query patterns efficiently.

Given how RoutingDecision.selected_path and alternative_paths will be used, queries will often filter/order by routing_decision_id and path_rank (e.g., rank 0 plus a few alternatives). With only a PK on id, these will devolve into table scans as data grows. Please add a non-unique index on (routing_decision_id, path_rank) to keep lookups efficient, especially under multi-tenant load.
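
The effect of the suggested index can be checked locally with a query plan. The sketch below uses SQLite (from Python's standard library) as a stand-in for PostgreSQL, so the planner output differs from what `EXPLAIN` would print in production. The index name `idx_routing_paths_decision_rank` is hypothetical, and the column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routing_paths (
    id                  TEXT PRIMARY KEY,
    routing_decision_id TEXT NOT NULL,
    path_rank           INTEGER NOT NULL,
    stages              TEXT NOT NULL
);
CREATE INDEX idx_routing_paths_decision_rank
    ON routing_paths (routing_decision_id, path_rank);
""")

# Typical lookup: all paths for one decision, selected path (rank 0) first.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, path_rank FROM routing_paths "
    "WHERE routing_decision_id = ? ORDER BY path_rank",
    ("d1",),
).fetchall()
plan_text = " ".join(row[3] for row in plan)  # column 3 holds the plan detail
print(plan_text)
```

Because `path_rank` is the second index column, the same index also serves the `ORDER BY`, so no separate sort step is needed.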

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (8)

docs/specifications/workflow-ai-pro.xml (8)
550-550: Long max_poll_interval_ms may cause consumer group instability.

The Document Router consumer configuration sets max_poll_interval_ms: 300000 (5 minutes) to accommodate GNN inference latency. However, this very long interval increases the risk of:

Delayed detection of consumer failures

Extended partition ownership during hung/slow consumers

Consumer group rebalancing delays

Recommendation:

Verify that GNN inference P99 latency target (<200ms) is achieved in production

If inference occasionally exceeds 5 minutes, consider processing messages asynchronously (immediately commit offset after queuing message for background processing)

Implement consumer heartbeat monitoring to detect processing delays earlier than the 5-minute timeout
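
When sizing `max_poll_interval_ms`, the quantity to compare against is the time needed to process a whole polled batch, since the consumer is removed from the group if it does not call poll again within that interval. The arithmetic can be sketched as follows. The function name is hypothetical, and 500 is Kafka's default `max.poll.records`, used here because the spec's actual value is not shown in this thread.

```python
def poll_headroom_ms(max_poll_records: int, per_record_ms: int,
                     max_poll_interval_ms: int) -> int:
    """Time left before the poll deadline if a full batch is processed serially."""
    worst_case_batch_ms = max_poll_records * per_record_ms
    return max_poll_interval_ms - worst_case_batch_ms


# 200 ms per record is the GNN P99 target quoted in the PR description.
print(poll_headroom_ms(500, 200, 300_000))  # 200000
print(poll_headroom_ms(500, 200, 60_000))   # -40000: a 60 s interval would be too short
```

A negative result means the interval or the batch size has to change. Committing offsets before processing finishes, as the asynchronous option proposes, also weakens delivery guarantees, so it should be weighed against the exactly-once goal stated for this service.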
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` at line 550, The consumer
configuration sets max_poll_interval_ms: 300000 which is too long and can cause
consumer-group instability; update the consumer behavior by either lowering
max_poll_interval_ms to a safer value (e.g., closer to expected GNN P99 <200ms)
or change processing to be asynchronous: immediately commit offsets after
enqueueing messages for background GNN inference and implement
heartbeat/monitoring to detect slow consumers earlier; locate and modify the
max_poll_interval_ms setting in workflow-ai-pro.xml and ensure any consumer loop
(consumer poll/commit logic and background worker/queue) is changed to enqueue
work and commit promptly while adding heartbeat/monitoring hooks.
735-781: Consider adding TTL indexes to prevent unbounded collection growth.

The prediction_logs collection stores every prediction but lacks a TTL (Time-To-Live) index. Without automatic expiration, this collection will grow indefinitely, potentially impacting performance and storage costs.

Consider adding TTL indexes for time-series data:
// Auto-delete prediction logs older than 180 days
db.prediction_logs.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 15552000 }  // 180 days
);

// Similar for AL pool - expire samples in "expired" status after 30 days
db.al_pool.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 2592000,  // 30 days
    partialFilterExpression: { status: "expired" } }
);
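
The two `expireAfterSeconds` values above can be derived rather than hard-coded. The sketch below only builds the index options as plain dictionaries. The `ttl_seconds` helper and the dictionary layout are illustrative, and with PyMongo the options would be passed as keyword arguments to `create_index`.

```python
from datetime import timedelta


def ttl_seconds(days: int) -> int:
    """Retention period in days expressed as whole seconds."""
    return int(timedelta(days=days).total_seconds())


prediction_logs_ttl = {
    "keys": [("created_at", 1)],
    "expireAfterSeconds": ttl_seconds(180),
}
al_pool_ttl = {
    "keys": [("created_at", 1)],
    "expireAfterSeconds": ttl_seconds(30),
    "partialFilterExpression": {"status": "expired"},
}
print(prediction_logs_ttl["expireAfterSeconds"])  # 15552000
print(al_pool_ttl["expireAfterSeconds"])          # 2592000
```

TTL indexes only expire documents whose indexed field holds a BSON date, so `created_at` must be stored as a date rather than as a string.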
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` around lines 735 - 781, The schema
for the prediction_logs collection lacks TTL indexes so data will grow
unbounded; add a TTL index on prediction_logs.created_at to expire old
prediction documents (e.g., 180 days) by creating an index with
expireAfterSeconds, and also add a TTL index on al_pool.created_at with a
partialFilterExpression for status: "expired" (e.g., 30 days) to auto-remove
expired AL samples; update the migration/schema diff to include
db.prediction_logs.createIndex({ created_at: 1 }, { expireAfterSeconds:
<seconds> }) and db.al_pool.createIndex({ created_at: 1 }, { expireAfterSeconds:
<seconds>, partialFilterExpression: { status: "expired" } }) so retention is
enforced.
558-560: Clarify transactional_id implementation pattern.

Line 560 specifies transactional_id: "doc-router-tx-{instance_id}" for exactly-once semantics, but doesn't explain how {instance_id} should be generated or managed. Each producer instance must have a unique transactional ID that persists across restarts.

Document the implementation approach:

How is instance_id generated? (e.g., pod name, UUID, consumer group member ID)

Is it stable across pod restarts?

How to handle transactional ID exhaustion/cleanup?

Recovery procedure when a transactional producer fails mid-transaction
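
One way to answer the first two questions is to derive `instance_id` from a StatefulSet pod name, whose ordinal suffix survives restarts. This is a sketch of that option only, not the spec's chosen approach: the `transactional_id` function and the `doc-router-N` naming are assumptions.

```python
import re


def transactional_id(hostname: str, prefix: str = "doc-router-tx") -> str:
    """Derive a restart-stable transactional.id from a StatefulSet-style pod name."""
    match = re.fullmatch(r"(?P<name>[a-z0-9-]+)-(?P<ordinal>\d+)", hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} has no ordinal suffix")
    return f"{prefix}-{match['ordinal']}"


print(transactional_id("doc-router-2"))  # doc-router-tx-2
```

In a running pod the hostname could come from `socket.gethostname()`. Reusing a stable ID also covers part of the recovery question: when a restarted producer initialises transactions with the same `transactional.id`, the broker fences the previous instance and aborts its unfinished transaction.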
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` around lines 558 - 560, Clarify how
the transactional_id pattern "doc-router-tx-{instance_id}" must be implemented:
specify generation strategies for instance_id (e.g., use Kubernetes pod name for
stability, or a cluster-assigned persistent UUID stored in a volume/secret),
state whether the chosen approach is stable across restarts, describe
lifecycle/cleanup to avoid transactional ID exhaustion (e.g., reusing stable
IDs, TTL/policy for ephemeral IDs, admin tooling to remove retired IDs), and
document recovery steps for a producer that failed mid-transaction (how to
detect in-doubt transactions, force abort or resume via broker/admin APIs, and
recommended monitoring/alerting). Include references to transactional_id and
instance_id so implementers know where to apply each guidance.
834-846: Optimize embedding storage format for Redis.

Storing 64-dimensional embeddings as JSON strings (e.g., "[0.12,-0.34,...,0.56]") in Redis hash fields is inefficient:

Parsing overhead when reading embeddings

Increased memory footprint compared to binary formats

Slower serialization/deserialization

Consider alternatives:

Use Redis vector data type (Redis Stack with RediSearch) for native vector storage and similarity search

Store as binary-encoded float arrays using MessagePack or Protocol Buffers

Use HSET with separate numeric fields if individual dimensions need independent access
Example: Binary encoding with MessagePack
import msgpack
import numpy as np
import redis

r = redis.Redis()  # client instance; connection settings omitted

# Encode embedding. use_single_float packs each value as a 32-bit float;
# without it, Python floats are packed as 64-bit doubles.
embedding = np.array([0.12, -0.34, ..., 0.56], dtype=np.float32)
packed = msgpack.packb(embedding.tolist(), use_bin_type=True,
                       use_single_float=True)
r.hset(f"feat:{tenant_id}:user_emb:{user_id}", "embedding", packed)

# Decode embedding
packed = r.hget(f"feat:{tenant_id}:user_emb:{user_id}", "embedding")
embedding = np.array(msgpack.unpackb(packed, raw=False), dtype=np.float32)
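
To make the memory argument concrete, the following standard-library sketch compares a compact JSON encoding with a raw little-endian float32 encoding of a 64-dimensional vector. This is a third option alongside MessagePack and Protocol Buffers, not the encoding used in the example above, and the sample values are synthetic.

```python
import json
import struct

DIM = 64  # embedding dimensionality stated in the spec


def encode_json(values):
    """Compact JSON text, comparable to the current string field."""
    return json.dumps(values, separators=(",", ":")).encode()


def encode_f32(values):
    """Raw little-endian float32: always 4 bytes per dimension."""
    return struct.pack(f"<{len(values)}f", *values)


def decode_f32(blob):
    """Inverse of encode_f32."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Synthetic two-decimal values in [-1, 1); real embeddings will differ.
sample = [((i * 37) % 200 - 100) / 100 for i in range(DIM)]

if __name__ == "__main__":
    print("json bytes:", len(encode_json(sample)))
    print("float32 bytes:", len(encode_f32(sample)))
```

The float32 form is a fixed 256 bytes for 64 dimensions, while the JSON size depends on how many digits each value is printed with. Decoded values match the originals only to float32 precision.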
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` around lines 834 - 846, The current
HSET usage storing the 64-dim embedding as a JSON string under the "embedding"
field (key pattern feat:{tenant_id}:user_emb:{user_id}) is inefficient; update
the write/read flows to store the embedding in a binary/vector-native
format—either switch to Redis Vector/RediSearch native vectors for similarity
use, or encode the float32 array with MessagePack/Protobuf before HSET and
decode on read—so modify the code that calls HSET for "embedding" to pack the
float32 array and the corresponding reader to unpack it (or replace HSET with
the Redis vector API), and keep other hash fields (department_id, last_updated,
etc.) unchanged.
831-875: Add Redis Cluster hash tags to key patterns for optimal performance.

The feature store keys lack Redis Cluster hash tags, which can lead to related keys being distributed across different cluster nodes, requiring cross-node multi-key operations.

For example, fetching all features for a user (embedding + temporal features) might require multiple cross-node requests.

Add hash tags to ensure related keys reside on the same hash slot:
-# Key: feat:{tenant_id}:user_emb:{user_id}
+# Key: feat:{tenant_id}:user_emb:{user_id}  -> use {tenant_id} or {user_id} as hash tag

-HSET feat:t-abc123:user_emb:u-789def
+HSET feat:{t-abc123}:user_emb:u-789def

-HSET feat:t-abc123:temporal:u-789def
+HSET feat:{t-abc123}:temporal:u-789def
This ensures all keys with the same {tenant_id} map to the same hash slot, and therefore to the same Redis Cluster node, enabling efficient MGET/pipeline operations.
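
The effect of the hash tag can be checked offline. The sketch below reimplements the Redis Cluster key-to-slot rule (CRC16 of the key, or of the text between the first `{` and the next `}` when that text is non-empty, modulo 16384) with Python's standard library. The function name `hash_slot` is illustrative.

```python
import binascii

HASH_SLOTS = 16384  # fixed number of slots in Redis Cluster


def hash_slot(key: str) -> int:
    """Compute the cluster slot for a key, honouring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is non-empty
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16 (XMODEM) variant Redis uses.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


if __name__ == "__main__":
    for key in (
        "feat:t-abc123:user_emb:u-789def",
        "feat:t-abc123:temporal:u-789def",
        "feat:{t-abc123}:user_emb:u-789def",
        "feat:{t-abc123}:temporal:u-789def",
    ):
        print(key, hash_slot(key))
```

The two tagged keys always share a slot because only `t-abc123` is hashed. The trade-off is that every key for a tenant then lives in one slot, which may concentrate load for very large tenants.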
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` around lines 831 - 875, The keys
must include Redis Cluster hash tags around the tenant identifier so related
keys land on the same hash slot; update all key patterns (e.g.,
feat:{tenant_id}:user_emb:{user_id}, feat:{tenant_id}:stage_emb:{stage_id},
feat:{tenant_id}:cf_score:{user_id}:{stage_id},
feat:{tenant_id}:temporal:{user_id}) to wrap only the tenant_id in braces (e.g.,
feat:{<tenant_id>}:user_emb:<user_id>) in every HSET/SET example and any related
comments so pipelines/MGETs operate on a single node.
1311-1320: Consider documenting additional risks and dependencies.

The roadmap and risk section covers model drift and Kafka backpressure, which are well-mitigated. However, an 18-month platform development with three complex AI systems may benefit from addressing additional risk categories:

Potential additional risks:

Team skill gaps: GNN, NCF, and Active Learning require specialized ML expertise. Mitigation: training plan, consultants, or hiring timeline

Data quality: AI models depend on high-quality training data. Mitigation: data validation pipeline, labeling quality checks

Cold start: New tenants without historical data. Mitigation: covered partially for NCF (line 148) but not for GNN routing

Regulatory changes: GDPR/SOC 2 requirements may evolve. Mitigation: quarterly compliance review

Infrastructure costs: ML infrastructure (GPUs, Redis cluster, Kafka) can be expensive. Mitigation: cost monitoring and optimization plan

Dependency on external systems: SharePoint, S3, IdP availability. Mitigation: graceful degradation, caching strategies

Since this is marked CONFIDENTIAL for CTO/VP Engineering, including a more comprehensive risk register would strengthen the business case and resource planning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` around lines 1311 - 1320, Add a
comprehensive additional-risk subsection alongside the existing "Risk -- Model
Drift / Data Distribution Shift" and "Risk -- Kafka Partition Skew /
Backpressure Cascade" entries that enumerates risks and mitigations for: Team skill
gaps (training plan, consultant/hiring timeline), Data quality (validation
pipelines, labeling QA), Cold-start for GNN routing (seeded priors, transfer
learning, rule-based fallback), Regulatory changes (quarterly compliance
reviews, legal monitoring), Infrastructure costs (cost monitoring, GPU/instance
rightsizing, spot/commitment strategies), and External system dependencies
(graceful degradation, caching, SLA-based failover). Place this new "Risk --
Additional: Team/Data/ColdStart/Compliance/Cost/Dependencies" block in the
18-month roadmap/risk section and ensure each bullet pairs a clear mitigation
with a measurable trigger or owner to match the existing style and tone.
205-207: Add pattern constraint for SHA-256 content hash.

The content_hash field is described as "SHA-256 hash of document content" but lacks a pattern constraint. SHA-256 hashes are exactly 64 hexadecimal characters.

Add validation:
content_hash:
  type: string
  pattern: '^[a-f0-9]{64}$'
  description: SHA-256 hash of document content
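
As a quick check of what the proposed pattern accepts, this sketch applies the same regex to a digest produced by `hashlib`. The helper name `is_valid_content_hash` is illustrative, and whether uppercase digests should be rejected or normalised is a decision for the spec authors.

```python
import hashlib
import re

# Same pattern as proposed for the content_hash property.
SHA256_HEX = re.compile(r"^[a-f0-9]{64}$")


def is_valid_content_hash(value: str) -> bool:
    """True only for exactly 64 lowercase hexadecimal characters."""
    return SHA256_HEX.fullmatch(value) is not None


if __name__ == "__main__":
    digest = hashlib.sha256(b"example document content").hexdigest()
    print(is_valid_content_hash(digest))          # lowercase hex digest
    print(is_valid_content_hash(digest.upper()))  # uppercase is rejected
```

`hexdigest()` returns lowercase hex, so digests produced this way pass. Clients that emit uppercase hex would need to lowercase the value before submitting it.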
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` around lines 205 - 207, The
content_hash schema currently lacks a pattern constraint; update the
content_hash field definition in the schema (the content_hash property) to
include a pattern that enforces exactly 64 lowercase hexadecimal characters (use
the regex ^[a-f0-9]{64}$), keep type: string and the existing description, so
the field validates as a SHA-256 hex digest.
450-450: Document the precision chosen for the PostgreSQL confidence column.

The confidence column is defined as NUMERIC(4,3): four digits in total, three of them after the decimal point. On its own the type accepts values from -9.999 to 9.999; the CHECK constraint is what restricts the column to 0-1:
confidence           NUMERIC(4,3) NOT NULL CHECK (confidence BETWEEN 0 AND 1),
NUMERIC(3,3) is not a suitable replacement, because it tops out at 0.999 and cannot store a confidence of 1.000. The existing type and CHECK are therefore correct; consider documenting why this precision was chosen.
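
The range arithmetic behind this can be sketched as follows: NUMERIC(p, s) stores at most p digits with s of them after the decimal point, so its largest magnitude is (10^p - 1) / 10^s. The helper name `numeric_max` is illustrative.

```python
from decimal import Decimal


def numeric_max(precision: int, scale: int) -> Decimal:
    """Largest absolute value representable in NUMERIC(precision, scale)."""
    return (Decimal(10) ** precision - 1) / (Decimal(10) ** scale)


if __name__ == "__main__":
    print(numeric_max(4, 3))  # upper bound of NUMERIC(4,3)
    print(numeric_max(3, 3))  # upper bound of NUMERIC(3,3)
```

NUMERIC(4,3) reaches 9.999 and so can hold 1.000, whereas NUMERIC(3,3) stops at 0.999.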
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specifications/workflow-ai-pro.xml` at line 450, The confidence column's
precision is ambiguous: keep the type as NUMERIC(4,3) to allow 1.000 (since
NUMERIC(3,3) maxes at 0.999) and retain the existing CHECK (confidence BETWEEN 0
AND 1); update the schema line for the confidence column to include a short
inline comment explaining why NUMERIC(4,3) was chosen (to permit 1.000) and
ensure the CHECK constraint on confidence remains in place.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d128e43-fbcb-4f96-8e6e-1bd1b46c468f

📥 Commits

Reviewing files that changed from the base of the PR and between 1d3de7d and d6dae5b.

📒 Files selected for processing (1)

docs/specifications/workflow-ai-pro.xml

coderabbitai · 2026-03-20T09:05:39Z

+```yaml
+openapi: 3.0.3
+info:
+  title: Document Router Service API
+  version: 2.1.0
+  description: Intelligent document routing powered by heterogeneous GNN inference.
+
+paths:
+  /api/v2/documents/route:
+    post:
+      operationId: routeDocument
+      summary: Submit a document for AI-powered routing
+      tags: [routing]
+      security:
+        - BearerAuth: []
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              type: object
+              required: [document_id, tenant_id, content_hash, doc_type]
+              properties:
+                document_id:
+                  type: string
+                  format: uuid
+                tenant_id:
+                  type: string
+                  format: uuid
+                content_hash:
+                  type: string
+                  description: SHA-256 hash of document content
+                doc_type:
+                  type: string
+                  enum: [contract, invoice, policy, legal_brief, hr_form, engineering_spec, compliance_report]
+                urgency:
+                  type: string
+                  enum: [critical, high, standard, low]
+                  default: standard
+                compliance_flags:
+                  type: array
+                  items:
+                    type: string
+                    enum: [gdpr, sox, hipaa, pci_dss, itar]
+                metadata:
+                  type: object
+                  additionalProperties: true
+      responses:
+        '200':
+          description: Routing decision computed
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/RoutingDecision'
+        '202':
+          description: Low-confidence routing; escalated to human review
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/EscalationResponse'
+        '422':
+          description: Unprocessable document features
+        '429':
+          description: Rate limit exceeded
+
+  /api/v2/documents/{document_id}/routing-status:
+    get:
+      operationId: getRoutingStatus
+      summary: Retrieve current routing state and audit trail
+      tags: [routing]
+      security:
+        - BearerAuth: []
+      parameters:
+        - name: document_id
+          in: path
+          required: true
+          schema:
+            type: string
+            format: uuid
+      responses:
+        '200':
+          description: Routing status with full path trace
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/RoutingStatus'
+        '404':
+          description: Document not found
+
+  /api/v2/routing/graph/health:
+    get:
+      operationId: getGraphHealth
+      summary: GNN model and graph index health check
+      tags: [operations]
+      security:
+        - BearerAuth: []
+      responses:
+        '200':
+          description: Graph and model health metrics
+          content:
+            application/json:
+              schema:
+                type: object
+                properties:
+                  model_version:
+                    type: string
+                  graph_node_count:
+                    type: integer
+                  graph_edge_count:
+                    type: integer
+                  avg_inference_latency_ms:
+                    type: number
+                  p99_inference_latency_ms:
+                    type: number
+                  last_retrain_timestamp:
+                    type: string
+                    format: date-time
+                  feature_store_status:
+                    type: string
+                    enum: [healthy, degraded, unavailable]
+
+components:
+  securitySchemes:
+    BearerAuth:
+      type: http
+      scheme: bearer
+      bearerFormat: JWT
+
+  schemas:
+    RoutingDecision:
+      type: object
+      properties:
+        document_id:
+          type: string
+          format: uuid
+        routing_id:
+          type: string
+          format: uuid
+        decision:
+          type: string
+          enum: [auto_routed, human_review]
+        confidence:
+          type: number
+          minimum: 0
+          maximum: 1
+        selected_path:
+          $ref: '#/components/schemas/RoutingPath'
+        alternative_paths:
+          type: array
+          maxItems: 2
+          items:
+            $ref: '#/components/schemas/RoutingPath'
+        model_version:
+          type: string
+        inference_latency_ms:
+          type: number
+        timestamp:
+          type: string
+          format: date-time
+
+    RoutingPath:
+      type: object
+      properties:
+        path_id:
+          type: string
+          format: uuid
+        stages:
+          type: array
+          items:
+            type: object
+            properties:
+              stage_id:
+                type: string
+              stage_name:
+                type: string
+              assigned_approver_id:
+                type: string
+                format: uuid
+              predicted_duration_hours:
+                type: number
+              bottleneck_probability:
+                type: number
+        total_predicted_duration_hours:
+          type: number
+        path_confidence:
+          type: number
+
+    EscalationResponse:
+      type: object
+      properties:
+        document_id:
+          type: string
+          format: uuid
+        escalation_id:
+          type: string
+          format: uuid
+        reason:
+          type: string
+        top_candidate_paths:
+          type: array
+          items:
+            $ref: '#/components/schemas/RoutingPath'
+        escalated_to:
+          type: string
+          format: uuid
+
+    RoutingStatus:
+      type: object
+      properties:
+        document_id:
+          type: string
+          format: uuid
+        current_stage:
+          type: string
+        overall_status:
+          type: string
+          enum: [in_progress, completed, rejected, escalated, stalled]
+        path_trace:
+          type: array
+          items:
+            type: object
+            properties:
+              stage_id:
+                type: string
+              approver_id:
+                type: string
+                format: uuid
+              entered_at:
+                type: string
+                format: date-time
+              completed_at:
+                type: string
+                format: date-time
+                nullable: true
+              action:
+                type: string
+                enum: [approved, rejected, delegated, pending]
+        sla_status:
+          type: string
+          enum: [on_track, at_risk, breached]
+```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify which API endpoints are actually implemented in the backend

echo "=== Searching for declared endpoints in backend codebase ==="

# Search for Document Router endpoints
echo -e "\n--- Document Router endpoints ---"
rg -n "'/api/v2/documents/route'" --type js
rg -n "'/api/v2/routing/graph/health'" --type js

# Search for Approval Predictor endpoints
echo -e "\n--- Approval Predictor endpoints ---"
rg -n "'/api/v2/predictions/bottlenecks'" --type js
rg -n "'/api/v2/predictions/approver-load'" --type js

# Search for Adaptive UI endpoints
echo -e "\n--- Adaptive UI Engine endpoints ---"
rg -n "'/api/v2/ui/layout'" --type js
rg -n "'/api/v2/ui/feedback'" --type js
rg -n "'/api/v2/ui/al/status'" --type js

# Show what endpoints actually exist
echo -e "\n=== Actual endpoints found in backend ==="
rg -n "app\.(get|post|put|delete|patch)\(" backend/server.js -A 1

Repository: OneFineStarstuff/OneFineStarstuff.github.io

Length of output: 495

Specification-implementation mismatch: declared endpoints do not exist in the backend.

The specification declares 8 OpenAPI endpoints across three services, but the backend (backend/server.js) only implements 2 endpoints:

GET /api/wheel/stages

POST /api/wheel/progress

The specified but unimplemented endpoints are:

Document Router Service:

POST /api/v2/documents/route

GET /api/v2/documents/{document_id}/routing-status

GET /api/v2/routing/graph/health

Approval Predictor Service:

POST /api/v2/predictions/bottlenecks

GET /api/v2/predictions/approver-load/{approver_id}

Adaptive UI Engine:

POST /api/v2/ui/layout

POST /api/v2/ui/feedback

GET /api/v2/ui/al/status

Additionally, no validation schemas exist for the request/response types declared in this specification (RoutingDecision, RoutingPath, EscalationResponse, etc.).

Either implement all declared endpoints and add corresponding validation middleware, or update the specification to match the actual implementation. Leaving this unresolved will block integration and create confusion about system capabilities.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/specifications/workflow-ai-pro.xml` around lines 176 - 416, The OpenAPI spec declares endpoints (operationIds: routeDocument, getRoutingStatus, getGraphHealth and the Approval Predictor/UI endpoints) and schemas (RoutingDecision, RoutingPath, EscalationResponse, RoutingStatus) that are not implemented in backend/server.js; either implement matching Express handlers for POST /api/v2/documents/route, GET /api/v2/documents/:document_id/routing-status, GET /api/v2/routing/graph/health, POST /api/v2/predictions/bottlenecks, GET /api/v2/predictions/approver-load/:approver_id, POST /api/v2/ui/layout, POST /api/v2/ui/feedback, GET /api/v2/ui/al/status in backend/server.js (hooking into your business logic and returning the declared response shapes) and add request/response JSON Schema validation middleware for the schemas RoutingDecision, RoutingPath, EscalationResponse, RoutingStatus (or reuse your existing validation utilities), or alternatively prune/update the OpenAPI document to exactly match the two existing handlers (GET /api/wheel/stages and POST /api/wheel/progress) and remove the unused schema/type declarations so the spec and implementation stay synchronized.

coderabbitai · 2026-03-20T09:05:39Z

+    CONSTRAINT fk_doc FOREIGN KEY (document_id)
+        REFERENCES documents(id) ON DELETE CASCADE
+);


⚠️ Potential issue | 🟠 Major

Consider the impact of CASCADE deletion on routing decisions.

The foreign key constraint uses ON DELETE CASCADE, meaning if a document is deleted, all associated routing decisions, paths, and audit logs will be automatically deleted. This could result in loss of historical routing data that may be valuable for model training or compliance audit trails.

Consider using ON DELETE RESTRICT or ON DELETE SET NULL with a soft-delete pattern for documents to preserve routing history.

🛡️ Alternative approach: soft delete pattern

 CREATE TABLE documents (
     id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     tenant_id         UUID NOT NULL,
     content_hash      CHAR(64) NOT NULL,
     doc_type          VARCHAR(32) NOT NULL,
     urgency           VARCHAR(16) NOT NULL DEFAULT 'standard',
     compliance_flags  TEXT[] DEFAULT '{}',
     metadata          JSONB DEFAULT '{}',
+    deleted_at        TIMESTAMPTZ NULL,
     created_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
     updated_at        TIMESTAMPTZ NOT NULL DEFAULT now()
 ) PARTITION BY HASH (tenant_id);

Then change the foreign key:

 CONSTRAINT fk_doc FOREIGN KEY (document_id)
-    REFERENCES documents(id) ON DELETE CASCADE
+    REFERENCES documents(id) ON DELETE RESTRICT
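
The behaviour of RESTRICT combined with a soft delete can be tried without a PostgreSQL instance. The sketch below uses an in-memory SQLite database purely as a stand-in, with a heavily simplified two-table schema (no partitioning, no tenant column), so it illustrates the pattern rather than the spec's actual DDL. The helper names are illustrative.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a tiny documents/routing_decisions pair with a RESTRICT FK."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE documents (id TEXT PRIMARY KEY, deleted_at TEXT);
        CREATE TABLE routing_decisions (
            id INTEGER PRIMARY KEY,
            document_id TEXT NOT NULL
                REFERENCES documents(id) ON DELETE RESTRICT
        );
        INSERT INTO documents (id) VALUES ('doc-1');
        INSERT INTO routing_decisions (document_id) VALUES ('doc-1');
        """
    )
    return conn


def hard_delete(conn: sqlite3.Connection, doc_id: str) -> bool:
    """Return True if the DELETE succeeded, False if the FK blocked it."""
    try:
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
        return True
    except sqlite3.IntegrityError:
        return False


def soft_delete(conn: sqlite3.Connection, doc_id: str) -> None:
    """Mark the document as deleted while keeping its routing history."""
    conn.execute(
        "UPDATE documents SET deleted_at = datetime('now') WHERE id = ?",
        (doc_id,),
    )


if __name__ == "__main__":
    conn = build_demo()
    print("hard delete allowed:", hard_delete(conn, "doc-1"))
    soft_delete(conn, "doc-1")
    print(conn.execute("SELECT COUNT(*) FROM routing_decisions").fetchone()[0])
```

The hard delete is refused while routing decisions reference the document, and the soft delete leaves the routing row in place. Application queries would then need to filter on `deleted_at IS NULL`.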

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/specifications/workflow-ai-pro.xml` around lines 455 - 457, The FK constraint fk_doc currently uses ON DELETE CASCADE which will remove routing decisions, paths, and audit logs when a documents row is deleted; change the constraint to use ON DELETE RESTRICT or ON DELETE SET NULL and implement a soft-delete pattern on the documents table (e.g., add a deleted_at flag and update application queries in routing logic to filter out soft-deleted documents) and update any code that deletes documents to set the soft-delete flag instead of issuing a hard DELETE; also add a migration to alter the fk_doc constraint and handle existing NULLability if you choose SET NULL.

coderabbitai · 2026-03-20T09:05:39Z

+ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
+CREATE POLICY tenant_isolation_documents ON documents
+    USING (tenant_id = current_setting('app.current_tenant')::UUID);
+
+ALTER TABLE routing_decisions ENABLE ROW LEVEL SECURITY;
+CREATE POLICY tenant_isolation_routing ON routing_decisions
+    USING (tenant_id = current_setting('app.current_tenant')::UUID);


⚠️ Potential issue | 🟠 Major

Document the required session variable for Row-Level Security policies.

The RLS policies rely on current_setting('app.current_tenant')::UUID, but the specification doesn't document how this session variable should be set. The application must execute SET LOCAL app.current_tenant = '<tenant_uuid>' at the beginning of each transaction for RLS to work correctly.

Without proper documentation, developers may encounter authorization failures or inadvertently expose data across tenant boundaries.

Add a comment in the SQL schema or a dedicated section explaining:

How to set the session variable (e.g., via middleware, connection pooler)

When it must be set (per transaction vs per connection)

Error handling when the variable is not set

Example implementation pattern
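
A minimal sketch of such a pattern, assuming a psycopg-style DB-API connection (`%s` placeholders, transaction opened implicitly by the first statement). The name `tenant_transaction` is hypothetical. It uses `set_config(..., true)`, which is the function form of SET LOCAL and accepts a bound parameter, so the tenant ID is never interpolated into SQL text.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Run one transaction with app.current_tenant set for RLS.

    `conn` is assumed to be a DB-API connection such as psycopg's.
    """
    tenant = str(uuid.UUID(tenant_id))  # raises ValueError on malformed IDs
    cur = conn.cursor()
    try:
        # Third argument true = transaction-local, the same scope as SET LOCAL.
        cur.execute(
            "SELECT set_config('app.current_tenant', %s, true)", (tenant,)
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

A request handler would wrap its queries in `with tenant_transaction(conn, tenant_id) as cur:`. Because the setting is transaction-local, it does not leak to the next user of a pooled connection. If the setting is never applied, `current_setting('app.current_tenant')` raises an error rather than returning rows, so the policies fail closed.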

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/specifications/workflow-ai-pro.xml` around lines 492 - 498, Add documentation/comments explaining that the RLS policies tenant_isolation_documents and tenant_isolation_routing rely on current_setting('app.current_tenant')::UUID and that the application must set this via "SET LOCAL app.current_tenant = '<tenant_uuid>'" at the start of each transaction (or via middleware that runs per-transaction); note tradeoffs of per-transaction vs per-connection when using connection poolers, describe error handling if the setting is missing (e.g., detect and abort the transaction with a clear error or raise a custom NOTICE/ERROR), and include an example implementation pattern for middleware/connection wrapper that reads the authenticated tenant ID and issues the SET LOCAL before any DB statements for tables documents and routing_decisions.

coderabbitai · 2026-03-20T09:05:39Z

+  doc.routing.dlq:
+    partitions: 6
+    replication_factor: 3
+    retention_ms: -1              # infinite retention for DLQ
+    cleanup_policy: compact


⚠️ Potential issue | 🟠 Major

Risk: Infinite retention on DLQ topic can cause unbounded storage growth.

The doc.routing.dlq topic uses retention_ms: -1 (infinite retention) combined with cleanup_policy: compact. While this ensures failed messages are never lost, it can lead to unbounded storage growth if DLQ messages are not actively monitored and remediated.

Consider:

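Setting a large but finite retention (e.g., 90 days: 7776000000 ms). Kafka applies retention.ms only when cleanup.policy includes delete, so the policy would need to be delete or compact,delete for the limit to take effect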

Implementing automated DLQ monitoring with alerting when depth exceeds thresholds

Creating a runbook for DLQ message inspection and reprocessing
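The retention figure and the depth alert suggested above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the `warn_at` / `page_at` thresholds are hypothetical, and the per-partition depths are assumed to come from whatever lag metric the team already collects for doc.routing.dlq.

```python
from datetime import timedelta


def retention_ms(days):
    """Convert a retention period in days to the millisecond value Kafka expects."""
    return timedelta(days=days) // timedelta(milliseconds=1)


def dlq_alert_level(depth_by_partition, warn_at=100, page_at=1000):
    """Classify total DLQ depth against two thresholds.

    depth_by_partition maps partition id -> unprocessed record count.
    The thresholds are placeholders, not values from the specification.
    """
    total = sum(depth_by_partition.values())
    if total >= page_at:
        return ("page", total)
    if total >= warn_at:
        return ("warn", total)
    return ("ok", total)


# 90 days expressed in milliseconds, matching the value quoted in the review.
assert retention_ms(90) == 7_776_000_000
```

A scheduled job could call `dlq_alert_level` with the six partition depths and forward anything other than `"ok"` to the on-call channel referenced in the runbook.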

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/specifications/workflow-ai-pro.xml` around lines 534 - 538, The DLQ topic configuration doc.routing.dlq currently uses retention_ms: -1 which risks unbounded storage growth; change retention_ms to a large but finite value (e.g., 7776000000 for 90 days) instead of -1, keep or confirm cleanup_policy: compact as needed, and add operational controls: create automated DLQ monitoring/alerting for topic depth and storage, and add a runbook for inspecting/reprocessing messages (document procedures and thresholds alongside the doc.routing.dlq configuration).

coderabbitai · 2026-03-20T09:05:39Z

+
+# 5. Performance, Security and Compliance
+
+- **SLAs:** Document routing P99 <200ms; bottleneck prediction P95 <500ms; UI layout resolution P95 <100ms; system-wide availability 99.95% (43.8 min/month max downtime); all Kafka consumers maintain <5s end-to-end event processing latency; model retraining completes within 4-hour nightly batch window.


⚠️ Potential issue | 🟠 Major

Conflicting latency requirements: Kafka consumer SLA vs GNN inference timeout.

The performance requirements state "all Kafka consumers maintain <5s end-to-end event processing latency," but the Document Router Kafka consumer configuration (line 550) sets max_poll_interval_ms: 300000 (5 minutes) to accommodate GNN inference.

This creates a conflict:

If GNN inference takes >5 seconds, the <5s end-to-end latency SLA is breached

The 5-minute timeout suggests inference could take much longer than 5 seconds

The stated GNN P99 latency target of <200ms should normally satisfy the 5s SLA, but the large timeout indicates potential outliers

Clarify:

Is the <5s latency requirement for P50, P95, or P99?

How should the system behave when GNN inference exceeds 5 seconds? (e.g., timeout and route to DLQ, escalate to human review)

Update either the max_poll_interval_ms or the latency SLA to be consistent
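One of the behaviours the review asks the spec to define, a hard deadline on inference with escalation on timeout, might be sketched as follows. Everything here is an assumption for illustration: `route_with_deadline` is a hypothetical name, `infer` stands in for the GNN call, and the 5-second default simply mirrors the SLA figure. The topic names doc.routed and doc.routing.escalated are the ones listed in the PR description. Note also that max_poll_interval_ms bounds the time between consumer poll() calls before a rebalance is triggered; it does not itself time out a single inference, which is why the deadline is enforced in the handler.

```python
import concurrent.futures as cf


def route_with_deadline(infer, document, deadline_s=5.0):
    """Run `infer(document)` and pick the destination topic for the outcome.

    Returns ("doc.routed", decision) if inference finishes within the
    deadline, otherwise ("doc.routing.escalated", None).
    """
    executor = cf.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(infer, document)
    try:
        decision = future.result(timeout=deadline_s)
        return ("doc.routed", decision)
    except cf.TimeoutError:
        # cancel() only helps if the call has not started yet; an inference
        # that is already running keeps going in its worker thread.
        future.cancel()
        return ("doc.routing.escalated", None)
    finally:
        # Do not block the consumer loop waiting for a slow worker.
        executor.shutdown(wait=False)
```

With a deadline like this in the handler, the consumer returns to poll() within a bounded time per record, so the poll interval can be sized from the batch size and the deadline rather than from a worst-case inference guess.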

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/specifications/workflow-ai-pro.xml` at line 1305, The doc has conflicting latency expectations: the global Kafka consumer SLA "<5s end-to-end event processing latency" conflicts with Document Router's consumer config max_poll_interval_ms: 300000 and the GNN P99 <200ms target; update the spec to (1) state which percentile the "<5s" SLA refers to (P50/P95/P99), (2) define concrete behavior when GNN inference exceeds 5s for the Document Router Kafka consumer (e.g., enforce an inference timeout of 5s in the Document Router's inference handler, emit the message to DLQ or mark for human review and increment a metric/alert), and (3) reconcile the config by either reducing max_poll_interval_ms to match the chosen SLA (e.g., 5000ms if you require consumer poll intervals to support a 5s E2E SLA) or relaxing the SLA to accept longer tail latency; reference Document Router, max_poll_interval_ms, the GNN inference path and the "<5s end-to-end event processing latency" SLA when making the change.

difflens · 2026-03-20T09:29:21Z

View changes in DiffLens

penify-dev · 2026-03-20T09:29:33Z

PR Review 🔍

⏱️ Estimated effort to review [1-5]	5, because the PR introduces a substantial XML document with detailed technical specifications, including complex architecture, API definitions, and database schemas. The review will require careful examination of the entire document to ensure accuracy and completeness.
🧪 Relevant tests	No
⚡ Possible issues	Possible Bug: The XML structure must be validated against the appropriate XML schema to ensure it adheres to the expected format and standards.
⚡ Possible issues	Documentation Clarity: Some sections may require clearer explanations or examples to ensure that all stakeholders can understand the technical specifications.
🔒 Security concerns	No

penify-dev · 2026-03-20T09:29:38Z

PR Code Suggestions ✨

No code suggestions found for PR.

pull-request-size Bot added the size/XXL label Mar 20, 2026

gstraccini Bot assigned OneFineStarstuff Mar 20, 2026

gstraccini Bot approved these changes Mar 20, 2026

sourcery-ai Bot reviewed Mar 20, 2026

coderabbitai Bot reviewed Mar 20, 2026

penify-dev Bot added the enhancement New feature or request label Mar 20, 2026

penify-dev Bot added the Review effort [1-5]: 5 label Mar 20, 2026

OneFineStarstuff assigned Claude and Codex Mar 20, 2026

OneFineStarstuff merged commit 4ffe209 into main Mar 20, 2026
26 of 95 checks passed

coderabbitai Bot mentioned this pull request Apr 24, 2026

feat(WORKFLOWAI-PRO-WP-033) v1.0.0 — WorkflowAI Pro Enterprise AI Governance Platform Specification (2026-2030) #59

Merged


netlify Bot commented Mar 20, 2026

❌ Deploy Preview for onefinestarstuff failed.
