feat(ui): KG-scoped data source onboarding (k-extract flow)#737
Open
aredenba-rh wants to merge 153 commits into
* chore(skills): add subagent delivery execution protocol
  Add a reusable subagent skill that standardizes issue-based branching, TDD execution, PR structure, and merge/conflict handling into feature/manage-knowledge-graph.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(management): add knowledge graph workspace mode lifecycle
  Implement schema_bootstrap as the default workspace mode and persist irreversible transition state to extraction_operations across domain, repository, API responses, and migration coverage.
  Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
…681) Add a workspace-status API projection with mode, readiness flags, transition eligibility, and session pointers, including service and route authorization coverage for manage workspace rendering. Co-authored-by: Cursor <cursoragent@cursor.com>
…#682) Enforce workspace readiness checks for minimum entity/relationship type coverage and prepopulated type instance presence, and project blocking reasons so validate/transition workflows can render actionable feedback. Co-authored-by: Cursor <cursoragent@cursor.com>
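The readiness check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `check_readiness`, the `WorkspaceReadiness` type, and the minimum thresholds are all assumptions, since the commit message does not state the actual values.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; the commit does not state the real minimums.
MIN_ENTITY_TYPES = 1
MIN_RELATIONSHIP_TYPES = 1


@dataclass
class WorkspaceReadiness:
    ready: bool
    blocking_reasons: list = field(default_factory=list)


def check_readiness(entity_types, relationship_types, prepopulated_counts):
    """Collect every blocking reason instead of failing on the first one."""
    reasons = []
    if len(entity_types) < MIN_ENTITY_TYPES:
        reasons.append(f"need at least {MIN_ENTITY_TYPES} entity type(s)")
    if len(relationship_types) < MIN_RELATIONSHIP_TYPES:
        reasons.append(f"need at least {MIN_RELATIONSHIP_TYPES} relationship type(s)")
    # Sorted so the reported reasons come out in a stable order.
    for type_name, count in sorted(prepopulated_counts.items()):
        if count == 0:
            reasons.append(f"prepopulated type '{type_name}' has no instances")
    return WorkspaceReadiness(ready=not reasons, blocking_reasons=reasons)
```

Returning the full list of reasons, rather than a single boolean, is what would let validate/transition workflows show actionable feedback.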
Expose authorized validate and transition commands for knowledge graph workspaces, persist session pointers, and create an extraction-mode session identifier when moving from bootstrap to extraction operations. Co-authored-by: Cursor <cursoragent@cursor.com>
Add durable run-level mutation metadata storage and lifecycle persistence for session/scope identity, timestamps, token-cost totals, and operation-count summaries linked to each sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
Emit operation-class counts and token/cost totals from mutation-log application results into MutationsApplied payloads so downstream sync lifecycle persistence can finalize run-level metadata. Co-authored-by: Cursor <cursoragent@cursor.com>
#686) Scaffold extraction application/presentation package structure and add pytest-archon rules enforcing DDD layer boundaries plus cross-context isolation so subsequent extraction features stay architecturally clean. Co-authored-by: Cursor <cursoragent@cursor.com>
Implement per-user/per-knowledge-graph/per-mode extraction session lifecycle behaviors with clear-chat reset semantics and archived-session retention backed by repository ports and unit coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve mode-specific extraction skill templates from global defaults and apply deterministic knowledge-graph override merges so session prompts are stable, customizable, and repeatable. Co-authored-by: Cursor <cursoragent@cursor.com>
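One way to picture the deterministic override merge is the sketch below. It assumes templates are keyed by name and that a knowledge-graph override fully replaces the global default of the same name; `merge_skill_templates` is a hypothetical helper, and the real merge granularity is not described in the commit.

```python
def merge_skill_templates(global_defaults, kg_overrides):
    """Merge per-knowledge-graph overrides over global defaults.

    Assumption: an override replaces the whole template with the same name.
    Keys are emitted in sorted order so the result does not depend on the
    insertion order of either input.
    """
    merged = dict(global_defaults)
    merged.update(kg_overrides)
    return {name: merged[name] for name in sorted(merged)}
```

For example, `merge_skill_templates({"b": "default-b", "a": "default-a"}, {"b": "custom-b"})` yields the keys in the order `a`, `b`, with `b` taken from the override.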
Persist clone-head, last-extraction baseline, and tracked-branch head commit references for data sources and expose them in management API responses for downstream ingestion and UI commit-status workflows. Co-authored-by: Cursor <cursoragent@cursor.com>
Prepare Git-backed ingestion context by loading data-source commit references, refreshing tracked branch head, and passing baseline commit plus resolved credentials into the ingestion pipeline before packaging begins. Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
# src/api/ingestion/application/services/ingestion_service.py
# src/api/ingestion/infrastructure/event_handler.py
# src/api/ingestion/ports/services.py
# src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py
Skip heavy extraction when tracked branch head equals the last extraction baseline by emitting a completed lifecycle event and recording an explicit no-change audit log entry on the sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
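The no-change short circuit amounts to a commit comparison before any extraction work starts. The sketch below is illustrative only: `decide_sync` and `SyncDecision` are hypothetical names, and treating a missing baseline as "always extract" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncDecision:
    run_extraction: bool
    audit_message: str


def decide_sync(tracked_head: str, baseline: Optional[str]) -> SyncDecision:
    """Skip extraction when the tracked branch head equals the baseline."""
    # Assumption: a source with no baseline has never been extracted,
    # so it always proceeds to extraction.
    if baseline is not None and tracked_head == baseline:
        return SyncDecision(False, f"no changes since baseline {baseline}; extraction skipped")
    return SyncDecision(True, f"extracting changes up to {tracked_head}")
```

The audit message stands in for the explicit no-change log entry that the commit records on the sync run.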
Expose a data-source diff summary API that compares the last extraction baseline to tracked branch head and returns aggregate counts plus a large-list-safe changed-file preview for maintenance decisions. Co-authored-by: Cursor <cursoragent@cursor.com>
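A large-list-safe summary can be read as "full counts, capped preview". The following sketch assumes each change is a `(status, path)` pair and uses a hypothetical `PREVIEW_LIMIT`; the actual response shape and cap are not given in the commit.

```python
from collections import Counter

PREVIEW_LIMIT = 50  # hypothetical cap, not taken from the PR


def summarize_diff(changes, preview_limit=PREVIEW_LIMIT):
    """Return aggregate counts plus a truncated changed-file preview.

    `changes` is assumed to be a list of (status, path) tuples where
    status is one of "added", "modified", or "deleted".
    """
    counts = Counter(status for status, _ in changes)
    paths = [path for _, path in changes]
    return {
        "added": counts.get("added", 0),
        "modified": counts.get("modified", 0),
        "deleted": counts.get("deleted", 0),
        "total": len(changes),
        "preview": paths[:preview_limit],
        "truncated": len(paths) > preview_limit,
    }
```

The counts always cover the whole diff, so a client can show accurate totals even when the preview list has been cut off.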
Show commit-based diff counts immediately on each data source card and render the changed-file list as collapsed-by-default with explicit expand/collapse controls for large-diff safe browsing. Co-authored-by: Cursor <cursoragent@cursor.com>
…695) Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning. Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery. Co-authored-by: Cursor <cursoragent@cursor.com>
) (#698) Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests. Co-authored-by: Cursor <cursoragent@cursor.com>
…679) (#699) Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration. Co-authored-by: Cursor <cursoragent@cursor.com>
#700) Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
…671) (#701) Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data. Co-authored-by: Cursor <cursoragent@cursor.com>
Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI. Co-authored-by: Cursor <cursoragent@cursor.com>
…704) Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract. Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation. Co-authored-by: Cursor <cursoragent@cursor.com>
Raise parallel extraction worker default from 2 to 20. Enforce per-instance description ownership on save, expose relationship authoring hints in config API, and keep assistant prompts correct on follow-up turns. Kill and Reset Running now stop orphaned extraction containers. Co-authored-by: Cursor <cursoragent@cursor.com>
…lization Re-fetch ingest-only archives when ZIPs are absent on disk so extraction jobs and sticky sessions populate repository-files. Gate readiness on archive presence and inject workload credentials into agentic-ci container env. Co-authored-by: Cursor <cursoragent@cursor.com>
…pare Persist successful extraction jobs as archived with mutation history and surface that in graph management. Validate relationship authoring against ontology and merge token/graph-write metrics from JSONL and agent streams. Use tarball-based GitHub full refresh with auth fallback, and order sync runs newest-first so prepare retries show accurate UI state. Co-authored-by: Cursor <cursoragent@cursor.com>
…aph failures Add one-command dev DB backup and restore, auto-repair corrupt tenant AGE graphs, return HTTP 503 for graph storage errors, and update GMA instructions to smoke-test prepopulation and stop on infrastructure failures. Co-authored-by: Cursor <cursoragent@cursor.com>
…pply chaining Add run_scanner.py to combine scan-to-JSONL in one step, enrich readiness tasks with order/run_command and underscore relationship paths, and return next_action from apply so agents can chain labels without polling readiness every batch. Co-authored-by: Cursor <cursoragent@cursor.com>
…one mutation log Enable CREATE/UPDATE/DELETE in workload validation and tools, accumulate applied JSONL per assistant session, and write one ARCHIVED extraction job when Clear chat ends the session so it appears in Extraction Archive history. Co-authored-by: Cursor <cursoragent@cursor.com>
… usage Teach the Graph Management Assistant that each relationship UI row needs a distinct edge_types label, with read-back verification before claiming saves. Also propagate Claude SDK token/cost metrics into session journals and chat turn handling for operator visibility. Co-authored-by: Cursor <cursoragent@cursor.com>
…rhead Front-load graph_id, property gaps, JSONL examples, and directory-prefix file materialization so enrichment jobs spend less time probing formats and paths. Co-authored-by: Cursor <cursoragent@cursor.com>
Implement GMA one-off mutations with session archiving, rename Mutation logs to Graph Writes History, fix job set labels and cost display, and add a template-driven manual mutation authoring panel with schema instance views. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the session pointers rail item and detail panel from all GMA modes; session history now lives in Graph Writes History when chat is cleared. Co-authored-by: Cursor <cursoragent@cursor.com>
…schema explorer Add bulk instance edit workflow guidance, helpers/sync_instances.py for diff-and-generate JSONL, and clearer list-instances MCP tool docs so agents batch deletes instead of per-slug loops. Replace manage overview type badges with GraphSchemaExplorer and extract reusable entity/relationship type list components. Co-authored-by: Cursor <cursoragent@cursor.com>
…nagement Load 100 instances per type instead of a global cap, merge observed properties into schema display, and add paginated instance APIs with property search plus load-more UI on entity and relationship panels. Co-authored-by: Cursor <cursoragent@cursor.com>
Scope GMA containers and conversations by graph-management UI mode (three parallel sessions per user/KG), add start/end/clear session APIs, terminate containers without auto-restart, expire idle sessions after 1 hour, and archive Graph Writes History only when a closed session has write_ops > 0. Update the manage UI with Start/End session controls and fix archived write count sourcing. Co-authored-by: Cursor <cursoragent@cursor.com>
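The idle-expiry and archive rules in this commit reduce to two small predicates. This is a sketch under assumptions: the `Session` type and function names are hypothetical, and whether expiry triggers at exactly one hour or only after it is not specified (the sketch uses "at or after").

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(hours=1)  # "expire idle sessions after 1 hour"


@dataclass
class Session:
    last_activity: datetime
    write_ops: int
    closed: bool = False


def is_expired(session: Session, now: datetime) -> bool:
    """A session is idle-expired once the timeout has fully elapsed."""
    return now - session.last_activity >= IDLE_TIMEOUT


def should_archive(session: Session) -> bool:
    """Archive to Graph Writes History only for closed sessions with writes."""
    return session.closed and session.write_ops > 0
```

Keeping the archive rule separate from expiry means a read-only session can be closed or expired without adding an empty entry to the history.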
…ckend Secure GMA agent containers with session-bound /v1/turn auth, Docker hardening flags, and per-turn workload tokens instead of long-lived env JWTs. Add OpenShell-backed sticky sessions and extraction jobs with per-mode network policies, dev compose wiring, and prod manifest stubs. Co-authored-by: Cursor <cursoragent@cursor.com>
…and manage UX Move batch extraction to one reusable OpenShell sandbox per worker, route GMA through inference.local with Vertex effort capping, and add maintain/archive workspace improvements plus token-efficient partial UPDATE tooling for jobs. Co-authored-by: Cursor <cursoragent@cursor.com>
…us sync Repair OpenShell extraction start failures, cap workers at 50 without sandbox UI noise, and keep recent job events accurate with status filters including archived. Co-authored-by: Cursor <cursoragent@cursor.com>
…rkers Split job prepare from long OpenShell execution to avoid pool exhaustion with high worker counts, scale up live runs on Start, and add Failed filter to recent job events. Co-authored-by: Cursor <cursoragent@cursor.com>
Finish idle extraction runs and advance last_extraction_baseline_commit when the job queue drains but the run row stayed active. On prepare, seed unset baselines for all prepared sources on the knowledge graph, not only the source that just finished ingest. Co-authored-by: Cursor <cursoragent@cursor.com>
Wire scheduled delta ingest and by-files maintenance jobs through the background scheduler, with API and dev-ui support for commit checks and maintenance runs. Treat data sources as prepared once initial ingestion completes so new commits only surface on the Maintain step. Co-authored-by: Cursor <cursoragent@cursor.com>
…history Record before/after graph instance snapshots as JSONL on extraction jobs and GMA sessions so archived history can show property-level diffs instead of raw mutation logs. Co-authored-by: Cursor <cursoragent@cursor.com>
…er actions Replace maintenance run history with extraction-style live progress, stack recurring schedule below run controls, and align Run maintenance / Run extraction button labels with their actual behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
…ne advances Run maintenance now waits for ingest when needed, materializes jobs, and starts extraction workers in one request instead of relying on the background scheduler. Also stop advancing extraction baselines on idle status polls and improve Maintain UI job counts, worker concurrency persistence, and outcome toasts. Co-authored-by: Cursor <cursoragent@cursor.com>
…agents Write commit-scoped repository-files paths for maintenance jobs, fetch baseline content from GitHub, and document the layout in prompts and sources-index so workers compare last-extraction and branch-tip state. Also persist maintenance run history and commit pending jobs before starting extraction workers. Co-authored-by: Cursor <cursoragent@cursor.com>
Load data-source credentials by source tenant, fail fast when tokens are missing, and translate GitHub 401/403 into actionable maintenance errors with clearer pipeline phase messages. Co-authored-by: Cursor <cursoragent@cursor.com>
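Translating GitHub authentication failures into operator-facing messages might look like the sketch below. The function name, the `phase` parameter, and the message wording are all assumptions for illustration; only the 401/403 distinction comes from the commit.

```python
def translate_github_error(status_code: int, phase: str) -> str:
    """Map a GitHub HTTP status to an actionable maintenance error message."""
    if status_code == 401:
        # 401: the token was rejected outright.
        return f"{phase}: GitHub rejected the credentials; update the data source token"
    if status_code == 403:
        # 403: the token was accepted but is not allowed to do this
        # (for example missing repository access or a rate limit).
        return f"{phase}: GitHub denied access; check token scopes and repository permissions"
    return f"{phase}: GitHub request failed with HTTP {status_code}"
```

Prefixing the pipeline phase keeps the message tied to the step that failed, which matches the commit's goal of clearer phase messages.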
…meouts Add start-ready and regenerate-jobs endpoints so operators can resume workers and refresh pending queues without re-running ingest, mirror those actions in Maintain UI, remove redundant local prepare button, and raise extraction job timeout default to 1800s. Co-authored-by: Cursor <cursoragent@cursor.com>
…nerate Add a baseline notice in New Files to Process, require confirmation before regenerating pending jobs, and skip scheduled runs while failed maintenance jobs remain until operators reset them manually. Co-authored-by: Cursor <cursoragent@cursor.com>
…resh Show only baseline-vs-HEAD on the Maintain table, refresh branch tips on manual check and scheduled runs, and leave ingest prepare to queue/regenerate workflows. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
* /knowledge-graphs/{kgId}/data-sources/new (URLs → configure → sequential initial sync → summary), modeled after k-extract designer/new.
* /knowledge-graphs/{kgId}/data-sources (phase1 equivalent) for sync, commits, diff, and maintenance focus.
* dataSourceCount === 0, otherwise to the operations page.

Closes #736
Test plan
* /data-sources/new
* ?focus=maintain filters to maintenance-ready sources
* /data-sources unchanged

Made with Cursor