SDC Agents User Documentation

SDC Agents is an open-source suite of nine purpose-scoped agents built on Google's Agent Development Kit (ADK). The agents transform data from SQL databases, CSV files, JSON sources, and MongoDB collections into validated, multi-format SDC4 artifacts, so users never have to write XML, RDF, or GQL by hand.

For installation and quick start, see the README.


Pipeline Overview

┌──────────────┐     ┌──────────────────┐     ┌──────────────┐
│ Catalog Agent│     │ Introspect Agent │     │              │
│  (5 tools)   │     │    (5 tools)     │     │              │
│              │     │                  │     │              │
│ Discovers    │     │ Examines your    │     │ Mapping Agent│
│ published    │     │ datasources      │     │  (3 tools)   │
│ SDC4 schemas │     │ (read-only)      │     │              │
└──────┬───────┘     └────────┬─────────┘     │ Suggests     │
       │                      │               │ column →     │
       ▼                      ▼               │ component    │
  .sdc-cache/           .sdc-cache/           │ mappings     │
  schemas/              introspections/       └──────┬───────┘
       │                      │                      │
       └──────────┬───────────┘                      │
                  │                                  ▼
                  │                            .sdc-cache/
                  │                            mappings/
                  │                                  │
                  └──────────┬───────────────────────┘
                             ▼
                    ┌──────────────────┐
                    │ Generator Agent  │
                    │   (3 tools)      │
                    │                  │
                    │ Produces SDC4    │
                    │ XML instances    │
                    └────────┬─────────┘
                             │
                             ▼
                       ./output/*.xml
                             │
                             ▼
                    ┌──────────────────┐
                    │ Validation Agent │
                    │   (3 tools)      │
                    │                  │
                    │ Validates & signs│
                    │ via VaaS API     │
                    └────────┬─────────┘
                             │
                             ▼
                    ./output/*.pkg.zip
                             │
                             ▼
                    ┌────────────────────┐
                    │ Distribution Agent │
                    │    (5 tools)       │
                    │                    │
                    │ Routes packages to │
                    │ your destinations  │
                    └────────────────────┘
                             │
                     ┌───────┼───────┐
                     ▼       ▼       ▼
                  Fuseki   Neo4j  Filesystem
                  GraphDB  REST API

┌──────────────────┐     ┌────────────────────┐
│ Knowledge Agent  │     │  Assembly Agent    │
│   (3 tools)      │     │    (4 tools)       │
│                  │     │                    │
│ Ingests customer │     │ Discovers catalog  │
│ context into     │────▶│ components, builds │
│ vector store     │     │ & publishes models │
└──────────────────┘     └────────────────────┘

┌────────────────────────────┐
│ Semantic Discovery Agent   │
│        (1 tool)            │
│        (ADK-only)          │
│                            │
│ Searches Vertex AI Search  │
│ for relevant SDC4 resources│
└────────────────────────────┘

Agents communicate through files on disk (the .sdc-cache/ directory and ./output/) rather than through direct calls, so every handoff is an inspectable, version-controllable artifact.
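
The file-based handoff can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names `write_artifact` and `read_artifact`, the example file name, and the JSON payload shape are all hypothetical. The sketch only shows the general pattern of one step writing an artifact that a later step reads from disk.

```python
import json
import os
import tempfile
from pathlib import Path


def write_artifact(path: Path, payload: dict) -> None:
    """Write a JSON artifact atomically (hypothetical helper, not an SDC Agents API)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it into
    # place, so a reader never observes a half-written artifact.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2, sort_keys=True)
    os.replace(tmp_name, path)


def read_artifact(path: Path) -> dict:
    """Load an artifact that an earlier step left on disk."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Because the artifact is a plain, key-sorted JSON file, it can be inspected in an editor or committed to version control between steps.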


Security Model

  1. No agent has both datasource access and network access. The Introspect Agent reads your data but has no network access. The Catalog and Validation Agents access the network but never touch your datasources. The Semantic Discovery Agent accesses GCP Vertex AI Search but never touches your datasources.
  2. Read-only datasource access. SQL queries are restricted to SELECT. CSV and JSON files are read, never modified. MongoDB access uses find() only.
  3. Append-only audit log. Every tool call is logged to .sdc-cache/audit.jsonl with agent name, tool name, inputs, outputs, timestamp, and duration. Credentials are automatically redacted.
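
As an illustration of the third point, the sketch below appends redacted records to a JSON Lines file and reads them back. It is an assumption-laden example, not the project's logging code: the record keys (`agent`, `tool`, `inputs`), the `SENSITIVE_KEYS` set, and the redaction placeholder are hypothetical, and the real audit log may use different field names and redaction rules.

```python
import json
from pathlib import Path

# Assumption: key names treated as credentials. The real rules are not documented here.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret"}


def redact(value):
    """Recursively replace values stored under sensitive-looking keys."""
    if isinstance(value, dict):
        return {
            key: "***REDACTED***" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def append_audit_record(log_path: Path, record: dict) -> None:
    """Append one redacted record as a single JSON line; existing lines are never rewritten."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(redact(record), sort_keys=True) + "\n")


def read_audit_records(log_path: Path) -> list:
    """Parse every non-empty line of the log into a dict."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Opening the file in append mode keeps earlier entries untouched, and the one-record-per-line layout means the log can be filtered with ordinary line-oriented tools.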

Cache Directory Structure

.sdc-cache/
├── audit.jsonl              # Append-only audit log
├── schemas/
│   └── dm-{ct_id}.json      # Cached schema details (immutable)
├── ontologies/
│   ├── *.rdf                # Downloaded ontology files
│   └── *.ttl
├── introspections/          # Introspection results
├── mappings/
│   └── {name}.json          # Confirmed column-to-component mappings
├── knowledge/
│   └── {source_name}.json   # Knowledge source metadata (Knowledge Agent)
├── skeletons/
│   └── dm-{ct_id}.xml       # Downloaded XML skeleton templates
└── field_mappings/
    └── dm-{ct_id}.json      # Skeleton field → placeholder mappings

The cache root defaults to .sdc-cache but is configurable via the cache.root setting.
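
The layout above can be expressed as a small path helper. This is a sketch for scripts that need to locate cached artifacts: the `CacheLayout` class and its method names are hypothetical and not part of SDC Agents, and the `root` argument is assumed to hold whatever value the cache.root setting resolves to.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CacheLayout:
    """Hypothetical helper mirroring the documented cache directory tree."""

    # Documented default; pass a different root to match a custom cache.root.
    root: Path = Path(".sdc-cache")

    @property
    def audit_log(self) -> Path:
        return self.root / "audit.jsonl"

    def schema(self, ct_id: str) -> Path:
        return self.root / "schemas" / f"dm-{ct_id}.json"

    def mapping(self, name: str) -> Path:
        return self.root / "mappings" / f"{name}.json"

    def knowledge(self, source_name: str) -> Path:
        return self.root / "knowledge" / f"{source_name}.json"

    def skeleton(self, ct_id: str) -> Path:
        return self.root / "skeletons" / f"dm-{ct_id}.xml"

    def field_mapping(self, ct_id: str) -> Path:
        return self.root / "field_mappings" / f"dm-{ct_id}.json"
```

For example, `CacheLayout().schema("abc123")` resolves to `.sdc-cache/schemas/dm-abc123.json`, where `abc123` is a placeholder identifier.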


Documentation Contents

Document                 | Description
-------------------------|------------------------------------------------------------
Configuration Reference  | All config fields, annotated YAML, environment variable substitution, working examples
Agent & Tool Reference   | All 32 tools across 9 agents: parameters, return shapes, access scopes
MCP Integration          | Serve agents as MCP servers for Claude Desktop, Cursor, and generic stdio clients
Common Workflows         | Step-by-step guides: CSV to validated XML, audit troubleshooting, triplestore bootstrap

External References