Purpose-scoped ADK agents for producing SDC4-compliant data artifacts from existing datastores.
SDC Agents is an open-source suite of nine purpose-scoped agents built on Google's Agent Development Kit (ADK) that transform data from SQL databases, CSV files, and JSON sources into validated, multi-format SDC4 artifacts — without requiring the user to write XML, RDF, or GQL by hand.
Each agent is an ADK LlmAgent with a narrowly scoped BaseToolset, auditable activity, and enforced isolation boundaries. No single agent can reach across scope boundaries — a compromised or misbehaving agent has blast radius limited to its purpose.
MCP compatibility: Each toolset can also be exported as an MCP server for framework-agnostic integration with non-ADK clients.
| Agent | Purpose | Network | Datasource Access |
|---|---|---|---|
| Catalog Agent | Discover published SDC4 schemas and download artifacts from SDCStudio | HTTPS (optional token auth) | None |
| Introspect Agent | Examine customer datasources and extract structure (read-only) | None | Read-only |
| Mapping Agent | Suggest and manage column-to-component mappings | None | None |
| Generator Agent | Produce SDC4 XML instances from mapped data | None | Read-only |
| Validation Agent | Validate and sign XML instances via VaaS API | HTTPS (token auth) | None |
| Distribution Agent | Route artifact packages to customer-local destinations | Customer-local only | None |
| Knowledge Agent | Ingest customer context (CSV, JSON, TTL, Markdown, PDF, DOCX) into vector store | None | Read-only (files) |
| Assembly Agent | Discover components, propose hierarchies, assemble published models | HTTPS (Assembly API) | None |
| Semantic Discovery Agent | Search Vertex AI Search for SDC4 resources (ADK-only) | GCP (Vertex AI Search) | None |
- No agent has both datasource access and network access
- Read-only datasource access — no agent can write to customer data
- Tools are declarative Python functions — ADK derives schemas from type hints and docstrings
- Structured audit log — every tool call logged with agent, tool, inputs, outputs, timestamp
- No credential sharing — each BaseToolset receives only its own credential scope
- Fail closed — errors are returned, never retried with escalated privileges
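To illustrate what "declarative Python functions" means in practice, here is a minimal sketch of how a schema-like description can be derived from a plain function's type hints and docstring using only the standard library. It is not ADK's actual derivation logic; `list_schemas_example` and `describe_tool` are hypothetical names introduced for this illustration.

```python
import inspect
from typing import get_type_hints


def list_schemas_example(query: str, limit: int = 20) -> list:
    """List schemas whose title contains the query string."""
    # Hypothetical tool body; a real tool would call the Catalog API.
    return []


def describe_tool(func) -> dict:
    """Derive a schema-like description from a signature and docstring."""
    hints = get_type_hints(func)
    parameters = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name)
        parameters[name] = {
            "type": getattr(hint, "__name__", str(hint)),
            # A parameter without a default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": parameters,
    }
```

Because the function signature is the single source of truth, a tool's advertised interface cannot drift from its implementation.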
SDC Agents is designed to be consistent with the IEEE 7000-2021 Value-based Engineering principles for ethical autonomous system design:
- Transparency — append-only structured audit log records every tool invocation with agent, tool, inputs, outputs, timestamp, and duration
- Traceability — all inter-agent handoffs are inspectable files on disk (.sdc-cache/), not opaque in-memory calls
- Harm minimization — purpose-scoped isolation ensures no single agent can access both customer datasources and external networks; blast radius is confined to each agent's scope
- Stakeholder value preservation — SDC4's curated, constraint-based semantic model (xsd:restriction only, immutable schemas) encodes data integrity and endurance as system-level guarantees, not optional features
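The transparency property can be pictured with a small sketch of an append-only JSONL audit writer that records the fields listed above and masks credentials before writing. This is an illustration under assumptions, not the project's logger: the function names, the `duration_ms` field name, and the set of sensitive key names are all hypothetical.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

# Assumed key names that should never reach the log in clear text.
SENSITIVE_KEYS = {"api_key", "token", "password", "auth"}


def redact(value):
    """Recursively mask values stored under sensitive-looking keys."""
    if isinstance(value, dict):
        return {
            key: "***" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def append_audit_record(path, agent, tool, inputs, outputs, started):
    """Append one JSON line; `started` is a time.monotonic() reading."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool": tool,
        "inputs": redact(inputs),
        "outputs": redact(outputs),
        "duration_ms": round((time.monotonic() - started) * 1000, 3),
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds lines; existing records are never rewritten.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

One JSON object per line keeps the log easy to filter with line-oriented tools.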
Agents communicate through files on disk, not direct calls. Every handoff is an inspectable, version-controllable artifact:
Catalog Agent → .sdc-cache/schemas/ ─┐
Introspect Agent → .sdc-cache/introspections/ ─┤
▼
Mapping Agent → .sdc-cache/mappings/
▼
Generator Agent → ./sdc-output/*.xml
▼
Validation Agent → ./sdc-output/*.pkg.zip
▼
Distribution Agent → customer destinations
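The file-based handoff shown in the diagram can be sketched as a pair of helpers: one stage writes a JSON file under the cache root, and the next stage reads it back. The helper names, the `.json` file naming, and the payload layout are assumptions made for illustration; the real cache format is defined by the project.

```python
import json
from pathlib import Path


def write_handoff(cache_root, stage, name, payload):
    """Write one stage's output as a JSON file the next stage can read."""
    target = Path(cache_root) / stage / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and indentation keep diffs small under version control.
    target.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return target


def read_handoff(cache_root, stage, name):
    """Load a previously written handoff file."""
    source = Path(cache_root) / stage / f"{name}.json"
    return json.loads(source.read_text(encoding="utf-8"))
```

For example, a mapping stage could call `write_handoff(".sdc-cache", "mappings", "patients", {...})`, and a generator stage could later call `read_handoff(".sdc-cache", "mappings", "patients")` without the two stages ever sharing memory.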
SDC Agents consumes two sets of endpoints from SDCStudio:
- Catalog API (public, optional token auth) — schema discovery, component trees, skeleton templates, schema-level RDF, reference ontologies
- VaaS API (token auth) — XML validation, signing, artifact package generation
Authenticated Catalog Lookups: When an API key is provided, catalog search results are filtered according to the Modeler's project preferences configured in SDCStudio. If the Modeler's prj_filter setting is enabled (the default), results are scoped to their default project. Without an API key, the catalog returns all published public schemas. This means the same catalog_list_schemas tool returns personalized results for authenticated users and broad results for anonymous browsing, with no change to the tool interface.
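The "same interface, optional auth" behavior can be sketched as a request builder that adds an authorization header only when a key is present. Both the URL path and the `Token <key>` header format below are assumptions for illustration; the actual contract is in the PRD referenced in this README. No request is sent.

```python
from typing import Optional
from urllib.request import Request


def build_catalog_request(base_url: str, api_key: Optional[str] = None) -> Request:
    """Build (but do not send) a catalog listing request."""
    # Hypothetical endpoint path, used only to make the sketch concrete.
    url = base_url.rstrip("/") + "/api/catalog/schemas/"
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumed header scheme; anonymous callers simply omit it.
        headers["Authorization"] = f"Token {api_key}"
    return Request(url, headers=headers)
```

The caller's code path is identical in both cases; only the presence of the header changes what the server returns.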
See docs/dev/SDC_AGENTS_PRD.md for the full API contract and agent specifications.
- Python 3.11+
- Google ADK 1.25+ (pip install google-adk)
pip install -e ".[dev]"

Copy sdc-agents.example.yaml to sdc-agents.yaml and fill in values:
sdcstudio:
  base_url: "https://sdcstudio.example.com"
  api_key: "${SDC_API_KEY}"  # Token auth (Catalog preferences + VaaS validation)
cache:
  root: ".sdc-cache"
  ttl_hours: 24
audit:
  path: ".sdc-cache/audit.jsonl"
  log_level: "standard"  # "standard" summarizes outputs; "verbose" logs full payloads
datasources:
  my_database:
    type: sql
    connection_string: "${DB_CONNECTION}"  # env var substitution
  my_csv:
    type: csv
    path: "/data/exports/records.csv"
output:
  directory: "./output"
  formats:
    - "xml"
destinations:
  triplestore:
    type: fuseki
    endpoint: "${FUSEKI_URL}"
    auth: "${FUSEKI_AUTH}"
  graph_database:
    type: neo4j
    endpoint: "${NEO4J_URL}"
    database: "sdc4"
  archive:
    type: filesystem
    path: "./archive/{ct_id}/{instance_id}/"
    create_directories: true

Environment variables use ${VAR} syntax. Missing variables cause an immediate KeyError (fail closed).
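The fail-closed substitution rule can be sketched in a few lines: every `${VAR}` reference is resolved against the environment, and a missing variable raises `KeyError` instead of falling back to an empty string. This is an illustrative sketch, not the project's config loader; `substitute_env` is a hypothetical name.

```python
import os
import re

_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, env=None):
    """Recursively replace ${VAR} references in strings, dicts, and lists."""
    env = os.environ if env is None else env

    def replace(match):
        # Plain indexing (not .get) so a missing variable raises KeyError.
        return env[match.group(1)]

    if isinstance(value, str):
        return _VAR_PATTERN.sub(replace, value)
    if isinstance(value, dict):
        return {key: substitute_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_env(item, env) for item in value]
    return value
```

Failing at load time means a misconfigured deployment stops before any agent runs with a blank credential or connection string.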
from sdc_agents.common.config import load_config
from sdc_agents.agents.catalog import create_catalog_agent
from sdc_agents.agents.introspect import create_introspect_agent
from sdc_agents.agents.mapping import create_mapping_agent
from sdc_agents.agents.generator import create_generator_agent
from sdc_agents.agents.validation import create_validation_agent
from sdc_agents.agents.distribution import create_distribution_agent
config = load_config("sdc-agents.yaml")
# Each factory returns an LlmAgent with its scoped BaseToolset
catalog_agent = create_catalog_agent(config)
introspect_agent = create_introspect_agent(config)
mapping_agent = create_mapping_agent(config)
generator_agent = create_generator_agent(config)
validation_agent = create_validation_agent(config)
distribution_agent = create_distribution_agent(config)

Or construct agents directly with toolsets:
from sdc_agents.common.config import load_config
from sdc_agents.toolsets.catalog import CatalogToolset
from google.adk.agents import LlmAgent
config = load_config("sdc-agents.yaml")
catalog_agent = LlmAgent(
    name="catalog",
    model="gemini-2.0-flash",
    description="Discovers SDC4 schemas from SDCStudio Catalog API.",
    instruction="Discover published SDC4 schemas and download artifacts.",
    tools=[CatalogToolset(config=config)],
)

Each agent can be served as an MCP stdio server for non-ADK clients:
# Start the Catalog Agent as an MCP server
sdc-agents serve --mcp catalog
# Start the Introspect Agent as an MCP server
sdc-agents serve --mcp introspect
# Any of the 8 MCP agents: assembly, catalog, distribution, generator, introspect, knowledge, mapping, validation
sdc-agents serve --mcp validation

# Show configuration summary and agent inventory
sdc-agents info
sdc-agents info --config path/to/sdc-agents.yaml
# Validate a config file (useful in CI)
sdc-agents validate-config
sdc-agents validate-config --config path/to/sdc-agents.yaml
# Inspect the audit log
sdc-agents audit show # last 50 records
sdc-agents audit show --agent catalog # filter by agent
sdc-agents audit show --last 24h --limit 20 # recent records
sdc-agents audit show --audit-path ./logs/audit.jsonl # custom path

A single image serves all 8 MCP-servable agents. Select the agent at runtime with SDC_AGENT:
# Serve a single agent as an MCP server
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  -e SDC_AGENT=catalog \
  ghcr.io/semanticdatacharter/sdc-agents
# Run any CLI command
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents info
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents validate-config

Build locally:
docker build -t sdc-agents .
docker run sdc-agents # prints usage hint

- CI (.github/workflows/ci.yml): Runs on push to dev and PRs to main. Lints with ruff, checks formatting with black, runs pytest with coverage across Python 3.11/3.12/3.13.
- Docker (.github/workflows/docker.yml): Builds and pushes to GHCR on push to main and v* tags.
- PyPI (.github/workflows/release.yml): Publishes to PyPI on v* tags via OIDC trusted publisher (no API tokens).
One-time setup (maintainer):
- Configure a PyPI trusted publisher (owner: SemanticDataCharter, repo: SDC_Agents, workflow: release.yml, environment: pypi)
- Create a pypi environment in GitHub repo settings (Settings > Environments)
# Run all tests
pytest
# Run with coverage
pytest --cov=sdc_agents
# Run specific test modules
pytest tests/toolsets/test_catalog.py
pytest tests/security/

- User Documentation — configuration, tool reference, MCP integration, workflow guides
- Product Requirements — full agent specifications, tools, security model, type mapping tables
- Contributing — development setup, coding standards, PR workflow
- Security Policy — vulnerability reporting, agent isolation model
- Changelog — release history
| Phase | Goal | Status |
|---|---|---|
| Phase 1 | Catalog, Introspect, and Mapping agents with shared infra | Complete |
| Phase 2 | Generator and Validation agents, Introspect extensions | Complete |
| Phase 3 | Distribution Agent with multi-destination delivery | Complete |
| Phase 4 | Production hardening: CLI, Docker, CI/CD, MCP export, documentation | Complete |
| Phase 5 | Knowledge Agent + Component Assembly Agent | Complete |
| Phase 5.5 | PDF/DOCX Knowledge Sources + Semantic Discovery Agent | Complete |
| Phase 6 | ADK ecosystem contributions (adk-sparql-tools, Integration Page) | Complete |
Common infrastructure:
- Pydantic config with ${VAR} substitution (fail closed), append-only JSONL audit logger with credential redaction, cache manager with path helpers
CatalogToolset (5 tools): catalog_list_schemas, catalog_get_schema, catalog_download_schema_rdf, catalog_download_skeleton, catalog_download_ontologies — httpx async, cache-first for immutable schemas, optional token auth for Modeler-scoped results
IntrospectToolset (5 tools): introspect_sql (SELECT-only enforcement), introspect_csv (type inference for 10 types), introspect_json (JSONPath extraction), introspect_mongodb (BSON-to-SDC4 type mapping), introspect_bigquery (BigQuery schema extraction via asyncio.to_thread)
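As an illustration of what SELECT-only enforcement might look like, the sketch below applies a conservative keyword check before a query is allowed to run. It is a heuristic written for this README, not the toolset's actual implementation: `is_select_only` is a hypothetical name, and the check can reject a legitimate query whose string literals contain a blocked keyword. A read-only database role remains the stronger control.

```python
import re

# Statement keywords that indicate a write or a schema change.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|grant|revoke|replace)\b",
    re.IGNORECASE,
)


def is_select_only(sql: str) -> bool:
    """Return True only for a single SELECT (or WITH ... SELECT) statement."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement in the string
    if not re.match(r"(?i)(select|with)\b", statement):
        return False
    # Reject write keywords anywhere, including inside CTEs.
    return _FORBIDDEN.search(statement) is None
```

Word boundaries keep column names such as `created` or tables such as `update_log` from triggering a false rejection.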
MappingToolset (3 tools): mapping_suggest (type compatibility + name similarity), mapping_confirm, mapping_list
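The combination of type compatibility and name similarity can be sketched as follows: incompatible types are filtered out first, then the remaining candidates are ranked by string similarity. The compatibility table, the generic type names, the 0.6 threshold, and the function names are all assumptions for illustration; the real `mapping_suggest` scoring may differ.

```python
from difflib import SequenceMatcher

# Hypothetical table: inferred column type -> acceptable component types.
COMPATIBLE = {
    "string": {"string"},
    "integer": {"integer", "decimal"},
    "decimal": {"decimal"},
    "date": {"date"},
}


def normalize(name: str) -> str:
    """Lowercase and drop separators so 'birth_date' matches 'Birth Date'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_mappings(columns: dict, components: dict, threshold: float = 0.6) -> list:
    """Suggest the best type-compatible component for each column."""
    suggestions = []
    for col_name, col_type in columns.items():
        best = None
        for comp_name, comp_type in components.items():
            if comp_type not in COMPATIBLE.get(col_type, set()):
                continue  # type gate comes before any name scoring
            score = SequenceMatcher(
                None, normalize(col_name), normalize(comp_name)
            ).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (comp_name, score)
        if best is not None:
            suggestions.append(
                {"column": col_name, "component": best[0], "score": round(best[1], 2)}
            )
    return suggestions
```

Suggestions produced this way are only proposals; the separate mapping_confirm step keeps a human or a supervising agent in the loop.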
GeneratorToolset (3 tools): generate_instance, generate_batch, generate_preview — skeleton-based XML generation with placeholder substitution and optional element pruning
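Skeleton-based generation with placeholder substitution and optional element pruning can be sketched with the standard library XML parser. The `{{name}}` placeholder syntax, the `optional` parameter, and the function name are assumptions for illustration; actual SDC4 skeletons and the generator's rules are defined by the project.

```python
import xml.etree.ElementTree as ET


def fill_skeleton(skeleton_xml: str, values: dict, optional=()) -> str:
    """Fill {{key}} placeholders; prune optional elements with no value."""
    root = ET.fromstring(skeleton_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for element in list(root.iter()):
        text = (element.text or "").strip()
        if not (text.startswith("{{") and text.endswith("}}")):
            continue
        key = text[2:-2].strip()
        if key in values:
            element.text = str(values[key])
        elif key in optional and element in parents:
            parents[element].remove(element)  # optional element pruning
        else:
            raise KeyError(key)  # required value missing: fail closed
    return ET.tostring(root, encoding="unicode")
```

Setting text through the parser rather than string formatting means special characters in source data are escaped on serialization.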
ValidationToolset (3 tools): validate_instance, sign_instance, validate_batch — VaaS API with path confinement, token auth, artifact package (.pkg.zip) support
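Path confinement can be sketched as resolving a requested path against an allowed base directory and rejecting anything that lands outside it. This is a minimal illustration, not the toolset's code; `confine` is a hypothetical name, and `Path.is_relative_to` requires Python 3.9 or later (within the project's stated 3.11+ requirement).

```python
from pathlib import Path


def confine(base_dir, candidate) -> Path:
    """Resolve candidate under base_dir, refusing paths that escape it."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # candidate replaces base entirely and is then caught by the check.
    target = (base / candidate).resolve()
    if not target.is_relative_to(base):
        raise PermissionError(f"{candidate!r} is outside {str(base)!r}")
    return target
```

Checking the resolved path rather than the raw string is what defeats `../` traversal and symlink tricks.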
DistributionToolset (5 tools): inspect_package, list_destinations, distribute_package, distribute_batch, bootstrap_triplestore — httpx-only connectors for SPARQL Graph Store, Neo4j HTTP, REST API, and filesystem
Agent factories: create_catalog_agent(), create_introspect_agent(), create_mapping_agent(), create_generator_agent(), create_validation_agent(), create_distribution_agent()
176+ tests, 82% coverage — 9 toolsets with 32 disjoint tools, security isolation tests (SQL write rejection, datasource name enforcement, path confinement, credential redaction, no cross-scope tool leakage)
Consumer-first: all tests use httpx.MockTransport — zero live SDCStudio, Fuseki, or Neo4j dependency
- SDCStudio — SDC4 data model creation and management platform (provides Catalog and VaaS APIs)
- SDCRM — SDC4 Reference Model specification
- Form2SDCTemplate — PDF/DOCX to SDC template conversion
- Google ADK — Agent Development Kit (agent framework)
Copyright 2025-2026 Axius SDC, Inc.
Licensed under the Apache License 2.0 — see LICENSE for details.
SDC Agents is controlled and maintained by Axius SDC, Inc. The SemanticDataCharter GitHub organization hosts the open-source SDC4 ecosystem on behalf of Axius SDC, Inc.

