feat(ui): KG-scoped data source onboarding (k-extract flow) #737

Open

aredenba-rh wants to merge 153 commits into main from feature/manage-knowledge-graph

@aredenba-rh

Collaborator

Summary

  • Adds full-page data source onboarding at /knowledge-graphs/{kgId}/data-sources/new (URLs → configure → sequential initial sync → summary), modeled after k-extract designer/new.
  • Adds ongoing operations page at /knowledge-graphs/{kgId}/data-sources (phase1 equivalent) for sync, commits, diff, and maintenance focus.
  • KG manage workspace routes Data Sources to onboarding when dataSourceCount === 0, otherwise to the operations page.
  • Post–KG-create toast navigates to the new onboarding route.
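The routing rule in the summary can be sketched as a small pure function. This is an illustration only: the UI in this PR is not written in Python, and `data_sources_route` and its parameters are hypothetical names. Only the paths, the `dataSourceCount === 0` condition, and the `?focus=maintain` query come from the description and test plan.

```python
from typing import Optional
from urllib.parse import urlencode


def data_sources_route(
    kg_id: str, data_source_count: int, focus: Optional[str] = None
) -> str:
    """Return the path the Data Sources entry should open (hypothetical helper)."""
    base = f"/knowledge-graphs/{kg_id}/data-sources"
    # A KG with no data sources goes to the full-page onboarding flow.
    if data_source_count == 0:
        return f"{base}/new"
    # Otherwise open the operations page, optionally with a focus filter.
    if focus:
        return f"{base}?{urlencode({'focus': focus})}"
    return base
```

Keeping the decision in one function would make the "wizard only on first visit" behavior from the test plan easy to unit-test.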

Closes #736

Test plan

  • Create a KG → Manage → Data Sources → lands on /data-sources/new
  • Add GitHub URL(s), configure branch/token, connect → run Start initial sync → see progress and summary
  • Open data sources → operations page with cards, sync history, commit refs
  • Return to manage → Data Sources again → operations page (not wizard)
  • Maintain step → ?focus=maintain filters to maintenance-ready sources
  • Global sidebar /data-sources unchanged

Made with Cursor

aredenba-rh and others added 30 commits May 26, 2026 12:58
* chore(skills): add subagent delivery execution protocol

Add a reusable subagent skill that standardizes issue-based branching,
TDD execution, PR structure, and merge/conflict handling into
feature/manage-knowledge-graph.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(management): add knowledge graph workspace mode lifecycle

Implement schema_bootstrap as the default workspace mode and persist
irreversible transition state to extraction_operations across domain,
repository, API responses, and migration coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…681)

Add a workspace-status API projection with mode, readiness flags,
transition eligibility, and session pointers, including service and
route authorization coverage for manage workspace rendering.

Co-authored-by: Cursor <cursoragent@cursor.com>
…#682)

Enforce workspace readiness checks for minimum entity/relationship type
coverage and prepopulated type instance presence, and project blocking
reasons so validate/transition workflows can render actionable feedback.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose authorized validate and transition commands for knowledge graph
workspaces, persist session pointers, and create an extraction-mode
session identifier when moving from bootstrap to extraction operations.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add durable run-level mutation metadata storage and lifecycle persistence
for session/scope identity, timestamps, token-cost totals, and
operation-count summaries linked to each sync run.

Co-authored-by: Cursor <cursoragent@cursor.com>
Emit operation-class counts and token/cost totals from mutation-log
application results into MutationsApplied payloads so downstream sync
lifecycle persistence can finalize run-level metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>
#686)

Scaffold extraction application/presentation package structure and add
pytest-archon rules enforcing DDD layer boundaries plus cross-context
isolation so subsequent extraction features stay architecturally clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
Implement per-user/per-knowledge-graph/per-mode extraction session
lifecycle behaviors with clear-chat reset semantics and archived-session
retention backed by repository ports and unit coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve mode-specific extraction skill templates from global defaults and
apply deterministic knowledge-graph override merges so session prompts are
stable, customizable, and repeatable.

Co-authored-by: Cursor <cursoragent@cursor.com>
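
A minimal sketch of the deterministic override merge described in the commit above, assuming templates are plain name-to-text mappings keyed by workspace mode. The function name and data shapes are hypothetical; the real resolver may differ.

```python
from typing import Dict, Mapping


def resolve_skill_templates(
    mode: str,
    global_defaults: Mapping[str, Mapping[str, str]],
    kg_overrides: Mapping[str, Mapping[str, str]],
) -> Dict[str, str]:
    """Merge per-mode defaults with knowledge-graph overrides (assumed shapes)."""
    defaults = global_defaults.get(mode, {})
    overrides = kg_overrides.get(mode, {})
    # Overrides win on name collisions.
    merged = {**defaults, **overrides}
    # Sort by template name so the result order never depends on input order.
    return {name: merged[name] for name in sorted(merged)}
```

Sorting the keys is one simple way to make the merged prompt repeatable across requests.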
)

Persist extraction agent sessions and expose scoped APIs for active/list/clear-chat so reset creates a fresh session while preserving archived history and runtime context audit records.

Co-authored-by: Cursor <cursoragent@cursor.com>
Persist clone-head, last-extraction baseline, and tracked-branch head
commit references for data sources and expose them in management API
responses for downstream ingestion and UI commit-status workflows.

Co-authored-by: Cursor <cursoragent@cursor.com>
Prepare Git-backed ingestion context by loading data-source commit references,
refreshing tracked branch head, and passing baseline commit plus resolved
credentials into the ingestion pipeline before packaging begins.

Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
#	src/api/ingestion/application/services/ingestion_service.py
#	src/api/ingestion/infrastructure/event_handler.py
#	src/api/ingestion/ports/services.py
#	src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py
Skip heavy extraction when tracked branch head equals the last extraction
baseline by emitting a completed lifecycle event and recording an explicit
no-change audit log entry on the sync run.

Co-authored-by: Cursor <cursoragent@cursor.com>
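
The no-change short-circuit in the commit above might look roughly like the sketch below. `SyncRun`, `run_sync`, and the audit message format are assumptions made for illustration, not the project's actual types.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SyncRun:
    """Stand-in for a sync run record (hypothetical shape)."""
    status: str = "pending"
    audit_log: List[str] = field(default_factory=list)


def should_skip_extraction(tracked_head: Optional[str], baseline: Optional[str]) -> bool:
    # Skip only when the head is known and identical to the baseline.
    return tracked_head is not None and tracked_head == baseline


def run_sync(
    run: SyncRun,
    tracked_head: Optional[str],
    baseline: Optional[str],
    extract: Callable[[], None],
) -> SyncRun:
    if should_skip_extraction(tracked_head, baseline):
        # Record why nothing ran, then complete without heavy extraction.
        run.audit_log.append(f"no-change: tracked head {tracked_head} equals baseline")
        run.status = "completed"
        return run
    extract()
    run.status = "completed"
    return run
```

An unset baseline never matches, so a first sync always runs extraction under this sketch.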
Expose a data-source diff summary API that compares the last extraction
baseline to tracked branch head and returns aggregate counts plus a
large-list-safe changed-file preview for maintenance decisions.

Co-authored-by: Cursor <cursoragent@cursor.com>
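
One possible shape for a "large-list-safe" diff summary, assuming the changed files arrive as `(status, path)` pairs. The field names and the default preview limit are hypothetical; the commit message does not specify them.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_diff(
    changes: List[Tuple[str, str]], preview_limit: int = 50
) -> Dict[str, Any]:
    """Summarize (status, path) pairs, e.g. ("modified", "src/app.py")."""
    counts = Counter(status for status, _ in changes)
    # Only the first preview_limit paths are returned, however large the diff.
    preview = [path for _, path in changes[:preview_limit]]
    return {
        "total": len(changes),
        "counts": dict(counts),
        "preview": preview,
        "truncated": len(changes) > preview_limit,
    }
```

The `truncated` flag lets a UI show "and N more" without fetching the full list.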
Show commit-based diff counts immediately on each data source card and
render the changed-file list as collapsed-by-default with explicit
expand/collapse controls for large-diff safe browsing.

Co-authored-by: Cursor <cursoragent@cursor.com>
…695)

Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning.

Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery.

Co-authored-by: Cursor <cursoragent@cursor.com>
…678) (#697)

Add structured mode-specific agent configuration (system prompt, hierarchy, guardrails, and skill pack defaults) and wire session initialization to resolve and persist the configuration per knowledge graph scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
) (#698)

Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests.

Co-authored-by: Cursor <cursoragent@cursor.com>
…679) (#699)

Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>
#700)

Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
…671) (#701)

Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses.

Co-authored-by: Cursor <cursoragent@cursor.com>
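
The reset operations named in the commit above suggest a status-based selection. The sketch below assumes each reset targets runs in the matching status and returns them to a "pending" state; that mapping and the status names are guesses, not confirmed by the commit message.

```python
from typing import Dict, FrozenSet, List

# Assumed mapping from reset operation to the run statuses it targets.
RESET_TARGETS: Dict[str, FrozenSet[str]] = {
    "reset_running": frozenset({"running"}),
    "reset_failed": frozenset({"failed"}),
    "reset_completed": frozenset({"completed"}),
    "reset_all": frozenset({"running", "failed", "completed"}),
}


def apply_reset(operation: str, run_statuses: Dict[str, str]) -> List[str]:
    """Set matching runs back to "pending" and return their ids."""
    if operation not in RESET_TARGETS:
        raise ValueError(f"unsupported reset operation: {operation}")
    targets = RESET_TARGETS[operation]
    reset_ids = [rid for rid, status in run_statuses.items() if status in targets]
    for rid in reset_ids:
        run_statuses[rid] = "pending"
    return reset_ids
```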
Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI.

Co-authored-by: Cursor <cursoragent@cursor.com>
…704)

Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

Implement a dedicated manage workspace route that loads workspace status projection, shows readiness and session pointers, and provides Validate and transition-to-extraction controls.

Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation.

Co-authored-by: Cursor <cursoragent@cursor.com>
aredenba-rh and others added 28 commits June 12, 2026 14:15
Raise parallel extraction worker default from 2 to 20. Enforce per-instance
description ownership on save, expose relationship authoring hints in config
API, and keep assistant prompts correct on follow-up turns. Kill and Reset
Running now stop orphaned extraction containers.

Co-authored-by: Cursor <cursoragent@cursor.com>
…lization

Re-fetch ingest-only archives when ZIPs are absent on disk so extraction
jobs and sticky sessions populate repository-files. Gate readiness on archive
presence and inject workload credentials into agentic-ci container env.

Co-authored-by: Cursor <cursoragent@cursor.com>
…pare

Persist successful extraction jobs as archived with mutation history and surface that in graph management. Validate relationship authoring against ontology and merge token/graph-write metrics from JSONL and agent streams. Use tarball-based GitHub full refresh with auth fallback, and order sync runs newest-first so prepare retries show accurate UI state.

Co-authored-by: Cursor <cursoragent@cursor.com>
…aph failures

Add one-command dev DB backup and restore, auto-repair corrupt tenant AGE
graphs, return HTTP 503 for graph storage errors, and update GMA instructions
to smoke-test prepopulation and stop on infrastructure failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
…pply chaining

Add run_scanner.py to combine scan-to-JSONL in one step, enrich readiness tasks
with order/run_command and underscore relationship paths, and return next_action
from apply so agents can chain labels without polling readiness every batch.

Co-authored-by: Cursor <cursoragent@cursor.com>
…one mutation log

Enable CREATE/UPDATE/DELETE in workload validation and tools, accumulate applied
JSONL per assistant session, and write one ARCHIVED extraction job when Clear chat
ends the session so it appears in Extraction Archive history.

Co-authored-by: Cursor <cursoragent@cursor.com>
… usage

Teach the Graph Management Assistant that each relationship UI row needs a
distinct edge_types label, with read-back verification before claiming saves.
Also propagate Claude SDK token/cost metrics into session journals and chat
turn handling for operator visibility.

Co-authored-by: Cursor <cursoragent@cursor.com>
…rhead

Front-load graph_id, property gaps, JSONL examples, and directory-prefix
file materialization so enrichment jobs spend less time probing formats and paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
Implement GMA one-off mutations with session archiving, rename Mutation logs
to Graph Writes History, fix job set labels and cost display, and add a
template-driven manual mutation authoring panel with schema instance views.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the session pointers rail item and detail panel from all GMA modes;
session history now lives in Graph Writes History when chat is cleared.

Co-authored-by: Cursor <cursoragent@cursor.com>
…schema explorer

Add bulk instance edit workflow guidance, helpers/sync_instances.py for diff-and-generate JSONL, and clearer list-instances MCP tool docs so agents batch deletes instead of per-slug loops. Replace manage overview type badges with GraphSchemaExplorer and extract reusable entity/relationship type list components.

Co-authored-by: Cursor <cursoragent@cursor.com>
…nagement

Load 100 instances per type instead of a global cap, merge observed properties into schema display, and add paginated instance APIs with property search plus load-more UI on entity and relationship panels.

Co-authored-by: Cursor <cursoragent@cursor.com>
Scope GMA containers and conversations by graph-management UI mode (three
parallel sessions per user/KG), add start/end/clear session APIs, terminate
containers without auto-restart, expire idle sessions after 1 hour, and archive
Graph Writes History only when a closed session has write_ops > 0. Update the
manage UI with Start/End session controls and fix archived write count sourcing.

Co-authored-by: Cursor <cursoragent@cursor.com>
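
The two session rules in the commit above (idle expiry after 1 hour, archive only when `write_ops > 0`) can be expressed as two predicates. `GmaSession` and the function names are hypothetical, and treating exactly one hour of inactivity as expired is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(hours=1)


@dataclass
class GmaSession:
    """Stand-in for a sticky session record (hypothetical shape)."""
    last_activity: datetime
    write_ops: int = 0


def is_idle_expired(session: GmaSession, now: datetime) -> bool:
    # Expire once the session has been inactive for the full timeout.
    return now - session.last_activity >= IDLE_TIMEOUT


def should_archive_on_close(session: GmaSession) -> bool:
    # Read-only sessions leave no entry in Graph Writes History.
    return session.write_ops > 0
```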
…ckend

Secure GMA agent containers with session-bound /v1/turn auth, Docker
hardening flags, and per-turn workload tokens instead of long-lived env
JWTs. Add OpenShell-backed sticky sessions and extraction jobs with
per-mode network policies, dev compose wiring, and prod manifest stubs.

Co-authored-by: Cursor <cursoragent@cursor.com>
…and manage UX

Move batch extraction to one reusable OpenShell sandbox per worker, route GMA
through inference.local with Vertex effort capping, and add maintain/archive
workspace improvements plus token-efficient partial UPDATE tooling for jobs.

Co-authored-by: Cursor <cursoragent@cursor.com>
…us sync

Repair OpenShell extraction start failures, cap workers at 50 without sandbox
UI noise, and keep recent job events accurate with status filters including archived.

Co-authored-by: Cursor <cursoragent@cursor.com>
…rkers

Split job prepare from long OpenShell execution to avoid pool exhaustion
with high worker counts, scale up live runs on Start, and add Failed filter
to recent job events.

Co-authored-by: Cursor <cursoragent@cursor.com>
Finish idle extraction runs and advance last_extraction_baseline_commit when
the job queue drains but the run row stayed active. On prepare, seed unset
baselines for all prepared sources on the knowledge graph, not only the
source that just finished ingest.

Co-authored-by: Cursor <cursoragent@cursor.com>
Wire scheduled delta ingest and by-files maintenance jobs through the
background scheduler, with API and dev-ui support for commit checks and
maintenance runs. Treat data sources as prepared once initial ingestion
completes so new commits only surface on the Maintain step.

Co-authored-by: Cursor <cursoragent@cursor.com>
…history

Record before/after graph instance snapshots as JSONL on extraction jobs and
GMA sessions so archived history can show property-level diffs instead of raw
mutation logs.

Co-authored-by: Cursor <cursoragent@cursor.com>
…er actions

Replace maintenance run history with extraction-style live progress, stack
recurring schedule below run controls, and align Run maintenance / Run extraction
button labels with their actual behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ne advances

Run maintenance now waits for ingest when needed, materializes jobs, and starts
extraction workers in one request instead of relying on the background scheduler.
Also stop advancing extraction baselines on idle status polls and improve Maintain
UI job counts, worker concurrency persistence, and outcome toasts.

Co-authored-by: Cursor <cursoragent@cursor.com>
…agents

Write commit-scoped repository-files paths for maintenance jobs, fetch
baseline content from GitHub, and document the layout in prompts and
sources-index so workers compare last-extraction and branch-tip state.
Also persist maintenance run history and commit pending jobs before
starting extraction workers.

Co-authored-by: Cursor <cursoragent@cursor.com>
Load data-source credentials by source tenant, fail fast when tokens are
missing, and translate GitHub 401/403 into actionable maintenance errors
with clearer pipeline phase messages.

Co-authored-by: Cursor <cursoragent@cursor.com>
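
A rough sketch of the fail-fast and error-translation behavior described in the commit above. The exception class, function names, and message wording are all illustrative assumptions. Only the 401/403 handling, the missing-token check, and the phase labeling come from the commit message.

```python
from typing import Optional


class MaintenanceError(Exception):
    """Operator-facing error raised during a maintenance run (hypothetical)."""


def require_token(token: Optional[str], source_name: str) -> str:
    # Fail fast before any network call when no credential is stored.
    if not token:
        raise MaintenanceError(
            f"No access token configured for data source '{source_name}'."
        )
    return token


def translate_github_status(status_code: int, phase: str) -> Optional[MaintenanceError]:
    """Map GitHub auth failures to actionable errors; None means no translation."""
    if status_code == 401:
        return MaintenanceError(
            f"[{phase}] GitHub rejected the token (401). Update the data source credentials."
        )
    if status_code == 403:
        return MaintenanceError(
            f"[{phase}] GitHub denied access (403). Check token permissions or rate limits."
        )
    return None
```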
…meouts

Add start-ready and regenerate-jobs endpoints so operators can resume workers
and refresh pending queues without re-running ingest, mirror those actions in
Maintain UI, remove redundant local prepare button, and raise extraction job
timeout default to 1800s.

Co-authored-by: Cursor <cursoragent@cursor.com>
…nerate

Add a baseline notice in New Files to Process, require confirmation before
regenerating pending jobs, and skip scheduled runs while failed maintenance
jobs remain until operators reset them manually.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented Jun 21, 2026

Contributor
PR Preview Action v1.8.1

🚀 View preview at
https://openshift-hyperfleet.github.io/kartograph/pr-preview/pr-737/

Built to branch gh-pages at 2026-06-21 20:11 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

…resh

Show only baseline-vs-HEAD on the Maintain table, refresh branch tips on
manual check and scheduled runs, and leave ingest prepare to queue/regenerate workflows.

Co-authored-by: Cursor <cursoragent@cursor.com>

Development

Successfully merging this pull request may close these issues.

KG-scoped data source onboarding (k-extract-style full-page flow)

2 participants