SDC Agents

Purpose-scoped ADK agents for producing SDC4-compliant data artifacts from existing datastores.

What is SDC Agents?

SDC Agents is an open-source suite of nine purpose-scoped agents built on Google's Agent Development Kit (ADK) that transform data from SQL databases, CSV files, and JSON sources into validated, multi-format SDC4 artifacts — without requiring the user to write XML, RDF, or GQL by hand.

Each agent is an ADK LlmAgent with a narrowly scoped BaseToolset, auditable activity, and enforced isolation boundaries. No single agent can reach across scope boundaries, so a compromised or misbehaving agent has a blast radius limited to its purpose.

MCP compatibility: Each toolset can also be exported as an MCP server for framework-agnostic integration with non-ADK clients.

Architecture: Nine Agents

| Agent | Purpose | Network | Datasource Access |
| --- | --- | --- | --- |
| Catalog Agent | Discover published SDC4 schemas and download artifacts from SDCStudio | HTTPS (optional token auth) | None |
| Introspect Agent | Examine customer datasources and extract structure (read-only) | None | Read-only |
| Mapping Agent | Suggest and manage column-to-component mappings | None | None |
| Generator Agent | Produce SDC4 XML instances from mapped data | None | Read-only |
| Validation Agent | Validate and sign XML instances via VaaS API | HTTPS (token auth) | None |
| Distribution Agent | Route artifact packages to customer-local destinations | Customer-local only | None |
| Knowledge Agent | Ingest customer context (CSV, JSON, TTL, Markdown, PDF, DOCX) into vector store | None | Read-only (files) |
| Assembly Agent | Discover components, propose hierarchies, assemble published models | HTTPS (Assembly API) | None |
| Semantic Discovery Agent | Search Vertex AI Search for SDC4 resources (ADK-only) | GCP (Vertex AI Search) | None |

Security Principles

No agent has both datasource access and network access
Read-only datasource access — no agent can write to customer data
Tools are declarative Python functions — ADK derives schemas from type hints and docstrings
Structured audit log — every tool call logged with agent, tool, inputs, outputs, timestamp
No credential sharing — each BaseToolset receives only its own credential scope
Fail closed — errors are returned, never retried with escalated privileges
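
The first two principles can be read as invariants over the architecture table. The sketch below is illustrative only: the `AgentScope` type, the lowercase agent names, and the scope labels are hypothetical and are not taken from the project's code. It simply encodes the table and checks that no agent combines network and datasource access, and that no agent has more than read-only datasource access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Access declared for one agent, copied from the architecture table."""
    name: str
    network: str      # "none" or a label for the allowed network scope
    datasource: str   # "none" or "read-only"


# Hypothetical encoding of the nine rows in the architecture table.
SCOPES = [
    AgentScope("catalog", "https", "none"),
    AgentScope("introspect", "none", "read-only"),
    AgentScope("mapping", "none", "none"),
    AgentScope("generator", "none", "read-only"),
    AgentScope("validation", "https", "none"),
    AgentScope("distribution", "customer-local", "none"),
    AgentScope("knowledge", "none", "read-only"),
    AgentScope("assembly", "https", "none"),
    AgentScope("semantic_discovery", "gcp", "none"),
]


def isolation_violations(scopes):
    """Names of agents that hold both network and datasource access."""
    return [s.name for s in scopes if s.network != "none" and s.datasource != "none"]


def write_capable(scopes):
    """Names of agents whose datasource access is anything beyond read-only."""
    return [s.name for s in scopes if s.datasource not in ("none", "read-only")]


print(isolation_violations(SCOPES), write_capable(SCOPES))
```

A check of this shape could run in CI so that adding a tenth agent with mixed access fails a test rather than relying on review.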

IEEE 7000-2021 Alignment

SDC Agents is designed to be consistent with IEEE 7000-2021 Value-based Engineering principles for ethical autonomous system design:

Transparency — append-only structured audit log records every tool invocation with agent, tool, inputs, outputs, timestamp, and duration
Traceability — all inter-agent handoffs are inspectable files on disk (.sdc-cache/), not opaque in-memory calls
Harm minimization — purpose-scoped isolation ensures no single agent can access both customer datasources and external networks; blast radius is confined to each agent's scope
Stakeholder value preservation — SDC4's curated, constraint-based semantic model (xsd:restriction only, immutable schemas) encodes data integrity and endurance as system-level guarantees, not optional features
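
To make the transparency point concrete, here is a minimal sketch of an append-only JSONL audit record carrying the fields listed above. It is not the project's logger: the function names, the `duration_ms` field name, the redaction marker, and the list of sensitive keys are all assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Assumed list of keys whose values should never reach the log.
SENSITIVE_KEYS = {"api_key", "token", "password", "auth", "connection_string"}


def redact(value):
    """Recursively replace values stored under sensitive keys with a marker."""
    if isinstance(value, dict):
        return {
            k: "***REDACTED***" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def append_audit_record(path, agent, tool, inputs, outputs, duration_ms):
    """Append one JSON line; mode 'a' never rewrites earlier records."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool": tool,
        "inputs": redact(inputs),
        "outputs": redact(outputs),
        "duration_ms": duration_ms,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Demo against a throwaway directory; the input values are made up.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "audit.jsonl"
    append_audit_record(log_path, "catalog", "catalog_list_schemas",
                        {"api_key": "secret", "query": "lab"}, {"count": 3}, 12.5)
    append_audit_record(log_path, "mapping", "mapping_list", {}, {"count": 0}, 1.0)
    lines = log_path.read_text(encoding="utf-8").splitlines()

first = json.loads(lines[0])
print(len(lines), first["inputs"])
```

One JSON object per line keeps the log greppable and lets a reader tail it while agents are still running.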

Data Flow

Agents communicate through files on disk, not direct calls. Every handoff is an inspectable, version-controllable artifact:

Catalog Agent → .sdc-cache/schemas/     ─┐
Introspect Agent → .sdc-cache/introspections/ ─┤
                                               ▼
                                    Mapping Agent → .sdc-cache/mappings/
                                               ▼
                                    Generator Agent → ./sdc-output/*.xml
                                               ▼
                                    Validation Agent → ./sdc-output/*.pkg.zip
                                               ▼
                                    Distribution Agent → customer destinations
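
The handoff pattern in the diagram can be sketched as plain file writes and reads. The example below is a hedged illustration, not the project's cache manager: the helper names, the JSON file format, the file naming, and the payload fields are all assumptions. Only the directory names come from the diagram.

```python
import json
import tempfile
from pathlib import Path


def write_handoff(cache_root, stage, name, payload):
    """Write one stage's output as a JSON file under <cache_root>/<stage>/."""
    target = Path(cache_root) / stage / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return target


def read_handoff(cache_root, stage, name):
    """Load an earlier stage's output; a missing file raises FileNotFoundError."""
    source = Path(cache_root) / stage / f"{name}.json"
    return json.loads(source.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / ".sdc-cache"
    # The introspect stage records the structure it found (fields are illustrative).
    write_handoff(cache, "introspections", "my_csv",
                  {"columns": [{"name": "age", "type": "integer"}]})
    # The mapping stage reads that file instead of calling another agent.
    found = read_handoff(cache, "introspections", "my_csv")
    mapping_file = write_handoff(cache, "mappings", "my_csv",
                                 {"age": "placeholder-component-id"})
    relative = mapping_file.relative_to(tmp).as_posix()

print(relative)
```

Because every intermediate result is a file, a reviewer can diff two runs or commit a mapping to version control before generation starts.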

SDCStudio API Dependencies

SDC Agents consumes two sets of endpoints from SDCStudio:

Catalog API (public, optional token auth) — schema discovery, component trees, skeleton templates, schema-level RDF, reference ontologies
VaaS API (token auth) — XML validation, signing, artifact package generation

Authenticated Catalog Lookups: When an API key is provided, catalog search results are filtered according to the Modeler's project preferences configured in SDCStudio. If the Modeler's prj_filter setting is enabled (the default), results are scoped to their default project. Without an API key, the catalog returns all published public schemas. This means the same catalog_list_schemas tool returns personalized results for authenticated users and broad results for anonymous browsing, with no change to the tool interface.
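
The lookup rules in the paragraph above can be modeled as a small decision function. This is a toy model of the described behavior, not SDCStudio's server code: the `Schema` and `Modeler` types are hypothetical, and the result for an authenticated Modeler with prj_filter disabled is an assumption, since the text does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Schema:
    title: str
    project: str


@dataclass(frozen=True)
class Modeler:
    default_project: str
    prj_filter: bool = True  # enabled by default, per the description


def visible_schemas(published: List[Schema], modeler: Optional[Modeler]) -> List[Schema]:
    """Model which published public schemas a catalog listing returns."""
    if modeler is None:
        # No API key: anonymous browsing sees every published public schema.
        return list(published)
    if modeler.prj_filter:
        # API key with prj_filter on: scope results to the default project.
        return [s for s in published if s.project == modeler.default_project]
    # Assumption: with prj_filter off, results are not narrowed by project.
    return list(published)


published = [Schema("Lab Result", "clinic"), Schema("Invoice", "billing")]
anonymous = visible_schemas(published, None)
scoped = visible_schemas(published, Modeler("clinic"))
print([s.title for s in anonymous], [s.title for s in scoped])
```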

See docs/dev/SDC_AGENTS_PRD.md for the full API contract and agent specifications.

Quick Start

Prerequisites

Python 3.11+
Google ADK 1.25+ (pip install google-adk)

Installation

pip install -e ".[dev]"

Configuration

Copy sdc-agents.example.yaml to sdc-agents.yaml and fill in values:

sdcstudio:
  base_url: "https://sdcstudio.example.com"
  api_key: "${SDC_API_KEY}"          # Token auth (Catalog preferences + VaaS validation)

cache:
  root: ".sdc-cache"
  ttl_hours: 24

audit:
  path: ".sdc-cache/audit.jsonl"
  log_level: "standard"    # "standard" summarizes outputs; "verbose" logs full payloads

datasources:
  my_database:
    type: sql
    connection_string: "${DB_CONNECTION}"   # env var substitution
  my_csv:
    type: csv
    path: "/data/exports/records.csv"

output:
  directory: "./output"
  formats:
    - "xml"

destinations:
  triplestore:
    type: fuseki
    endpoint: "${FUSEKI_URL}"
    auth: "${FUSEKI_AUTH}"
  graph_database:
    type: neo4j
    endpoint: "${NEO4J_URL}"
    database: "sdc4"
  archive:
    type: filesystem
    path: "./archive/{ct_id}/{instance_id}/"
    create_directories: true

Environment variables use ${VAR} syntax. Missing variables cause an immediate KeyError (fail closed).
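
A minimal sketch of fail-closed `${VAR}` substitution is shown below. It is not the project's Pydantic config loader: the function name, the regular expression, and the demo URL are assumptions. It only illustrates how a missing variable can surface as a KeyError instead of being replaced by an empty string.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, env=None):
    """Replace every ${VAR} in value; a missing variable raises KeyError."""
    env = os.environ if env is None else env
    # Indexing (not .get) is what makes this fail closed.
    return _VAR_PATTERN.sub(lambda m: env[m.group(1)], value)


# Demo with an explicit mapping so the example does not depend on the shell.
demo_env = {"FUSEKI_URL": "http://localhost:3030/sdc4"}
resolved = substitute_env("${FUSEKI_URL}", demo_env)

try:
    substitute_env("${FUSEKI_AUTH}", demo_env)
    missing = None
except KeyError as exc:
    missing = exc.args[0]

print(resolved, missing)
```

Failing at load time means a misconfigured deployment stops before any agent runs with a blank credential.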

Usage (ADK — Primary)

from sdc_agents.common.config import load_config
from sdc_agents.agents.catalog import create_catalog_agent
from sdc_agents.agents.introspect import create_introspect_agent
from sdc_agents.agents.mapping import create_mapping_agent
from sdc_agents.agents.generator import create_generator_agent
from sdc_agents.agents.validation import create_validation_agent
from sdc_agents.agents.distribution import create_distribution_agent

config = load_config("sdc-agents.yaml")

# Each factory returns an LlmAgent with its scoped BaseToolset
catalog_agent = create_catalog_agent(config)
introspect_agent = create_introspect_agent(config)
mapping_agent = create_mapping_agent(config)
generator_agent = create_generator_agent(config)
validation_agent = create_validation_agent(config)
distribution_agent = create_distribution_agent(config)

Or construct agents directly with toolsets:

from sdc_agents.common.config import load_config
from sdc_agents.toolsets.catalog import CatalogToolset
from google.adk.agents import LlmAgent

config = load_config("sdc-agents.yaml")

catalog_agent = LlmAgent(
    name="catalog",
    model="gemini-2.0-flash",
    description="Discovers SDC4 schemas from SDCStudio Catalog API.",
    instruction="Discover published SDC4 schemas and download artifacts.",
    tools=[CatalogToolset(config=config)],
)

Usage (MCP — Secondary)

Each agent can be served as an MCP stdio server for non-ADK clients:

# Start the Catalog Agent as an MCP server
sdc-agents serve --mcp catalog

# Start the Introspect Agent as an MCP server
sdc-agents serve --mcp introspect

# Any of the 8 MCP agents: assembly, catalog, distribution, generator, introspect, knowledge, mapping, validation
sdc-agents serve --mcp validation

CLI Commands

# Show configuration summary and agent inventory
sdc-agents info
sdc-agents info --config path/to/sdc-agents.yaml

# Validate a config file (useful in CI)
sdc-agents validate-config
sdc-agents validate-config --config path/to/sdc-agents.yaml

# Inspect the audit log
sdc-agents audit show                        # last 50 records
sdc-agents audit show --agent catalog        # filter by agent
sdc-agents audit show --last 24h --limit 20  # recent records
sdc-agents audit show --audit-path ./logs/audit.jsonl  # custom path
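
Because the audit log is JSONL, the same filters can be approximated in a few lines of Python when the CLI is not available. The sketch below is an assumption-laden illustration rather than the CLI's implementation: the `agent` and `timestamp` field names, the supported window units, and the helper names are hypothetical.

```python
import json
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_window(text):
    """Turn a string such as '24h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text)
    if match is None:
        raise ValueError(f"unsupported window: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def filter_records(lines, agent=None, last=None, limit=50, now=None):
    """Return up to `limit` of the most recent matching records, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - parse_window(last) if last else None
    kept = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if agent and record.get("agent") != agent:
            continue
        if cutoff and datetime.fromisoformat(record["timestamp"]) < cutoff:
            continue
        kept.append(record)
    return kept[-limit:] if limit > 0 else []


# Demo with made-up records and a fixed "now" so the result is reproducible.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
sample = [
    json.dumps({"agent": "catalog", "tool": "catalog_list_schemas",
                "timestamp": "2025-01-01T00:00:00+00:00"}),
    json.dumps({"agent": "catalog", "tool": "catalog_get_schema",
                "timestamp": "2025-01-02T09:00:00+00:00"}),
    json.dumps({"agent": "mapping", "tool": "mapping_list",
                "timestamp": "2025-01-02T10:00:00+00:00"}),
]
recent = filter_records(sample, last="24h", limit=20, now=now)
catalog_only = filter_records(sample, agent="catalog", now=now)
print([r["tool"] for r in recent], len(catalog_only))
```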

Docker

A single image serves all 8 MCP-servable agents. Select the agent at runtime with SDC_AGENT:

# Serve a single agent as an MCP server
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  -e SDC_AGENT=catalog \
  ghcr.io/semanticdatacharter/sdc-agents

# Run any CLI command
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents info

docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents validate-config

Build locally:

docker build -t sdc-agents .
docker run sdc-agents  # prints usage hint

CI/CD

CI (.github/workflows/ci.yml): Runs on push to dev and PRs to main. Lints with ruff, checks formatting with black, runs pytest with coverage across Python 3.11/3.12/3.13.
Docker (.github/workflows/docker.yml): Builds and pushes to GHCR on push to main and v* tags.
PyPI (.github/workflows/release.yml): Publishes to PyPI on v* tags via OIDC trusted publisher (no API tokens).

One-time setup (maintainer):

Configure PyPI trusted publisher — owner: SemanticDataCharter, repo: SDC_Agents, workflow: release.yml, environment: pypi
Create a pypi environment in GitHub repo settings (Settings > Environments)

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=sdc_agents

# Run specific test modules
pytest tests/toolsets/test_catalog.py
pytest tests/security/

Documentation

User Documentation — configuration, tool reference, MCP integration, workflow guides
Product Requirements — full agent specifications, tools, security model, type mapping tables
Contributing — development setup, coding standards, PR workflow
Security Policy — vulnerability reporting, agent isolation model
Changelog — release history

Implementation Phases

| Phase | Goal | Status |
| --- | --- | --- |
| Phase 1 | Catalog, Introspect, and Mapping agents with shared infra | Complete |
| Phase 2 | Generator and Validation agents, Introspect extensions | Complete |
| Phase 3 | Distribution Agent with multi-destination delivery | Complete |
| Phase 4 | Production hardening: CLI, Docker, CI/CD, MCP export, documentation | Complete |
| Phase 5 | Knowledge Agent + Component Assembly Agent | Complete |
| Phase 5.5 | PDF/DOCX Knowledge Sources + Semantic Discovery Agent | Complete |
| Phase 6 | ADK ecosystem contributions (`adk-sparql-tools`, Integration Page) | Complete |

What's Implemented (Phases 1–3)

Common infrastructure:

Pydantic config with ${VAR} substitution (fail closed), append-only JSONL audit logger with credential redaction, cache manager with path helpers

CatalogToolset (5 tools): catalog_list_schemas, catalog_get_schema, catalog_download_schema_rdf, catalog_download_skeleton, catalog_download_ontologies — httpx async, cache-first for immutable schemas, optional token auth for Modeler-scoped results

IntrospectToolset (5 tools): introspect_sql (SELECT-only enforcement), introspect_csv (type inference for 10 types), introspect_json (JSONPath extraction), introspect_mongodb (BSON-to-SDC4 type mapping), introspect_bigquery (BigQuery schema extraction via asyncio.to_thread)

MappingToolset (3 tools): mapping_suggest (type compatibility + name similarity), mapping_confirm, mapping_list
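
As a rough illustration of combining type compatibility with name similarity, the sketch below scores candidate components with the standard library's `difflib.SequenceMatcher`. It is not the mapping_suggest implementation: the compatibility table, the generic type names, the 0.6 threshold, and the function names are all assumptions.

```python
from difflib import SequenceMatcher

# Assumed compatibility table: source column type -> acceptable component types.
COMPATIBLE = {
    "integer": {"integer", "decimal"},
    "string": {"string"},
    "date": {"date"},
}


def _normalize(name):
    """Lowercase and drop separators so 'patient_age' matches 'Patient Age'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest(columns, components, threshold=0.6):
    """For each column, pick the type-compatible component with the closest name."""
    suggestions = {}
    for col_name, col_type in columns.items():
        best_name, best_score = None, threshold
        for comp_name, comp_type in components.items():
            # Type compatibility is a hard filter; similarity only ranks survivors.
            if comp_type not in COMPATIBLE.get(col_type, set()):
                continue
            score = SequenceMatcher(
                None, _normalize(col_name), _normalize(comp_name)
            ).ratio()
            if score >= best_score:
                best_name, best_score = comp_name, score
        suggestions[col_name] = best_name
    return suggestions


columns = {"patient_age": "integer", "notes": "string"}
components = {"Patient Age": "integer", "Admission Date": "date"}
result = suggest(columns, components)
print(result)
```

Columns with no compatible candidate map to `None`, which leaves the decision to a human confirmation step instead of guessing.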

GeneratorToolset (3 tools): generate_instance, generate_batch, generate_preview — skeleton-based XML generation with placeholder substitution and optional element pruning

ValidationToolset (3 tools): validate_instance, sign_instance, validate_batch — VaaS API with path confinement, token auth, artifact package (.pkg.zip) support

DistributionToolset (5 tools): inspect_package, list_destinations, distribute_package, distribute_batch, bootstrap_triplestore — httpx-only connectors for SPARQL Graph Store, Neo4j HTTP, REST API, and filesystem

Agent factories: create_catalog_agent(), create_introspect_agent(), create_mapping_agent(), create_generator_agent(), create_validation_agent(), create_distribution_agent()

176+ tests, 82% coverage — 9 toolsets with 32 disjoint tools, security isolation tests (SQL write rejection, datasource name enforcement, path confinement, credential redaction, no cross-scope tool leakage)
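
The SQL write rejection named above could be approximated with a conservative keyword check like the one below. This is a sketch under stated assumptions, not the IntrospectToolset code: the function names and the forbidden keyword list are hypothetical, and the check deliberately rejects some harmless queries (for example a CTE starting with WITH, or a string literal containing the word "update").

```python
import re

# Assumed deny-list of keywords that indicate a write or schema change.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|grant|revoke|replace)\b",
    re.IGNORECASE,
)


def assert_select_only(sql):
    """Raise ValueError unless sql is a single SELECT with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)select\b", statement):
        raise ValueError("only SELECT statements are allowed")
    if _FORBIDDEN.search(statement):
        raise ValueError("write keyword found in statement")
    return statement


def is_allowed(sql):
    """Boolean wrapper used by the demo below."""
    try:
        assert_select_only(sql)
        return True
    except ValueError:
        return False


checks = {
    "plain select": is_allowed("SELECT id, name FROM patients;"),
    "subquery": is_allowed("select * from t where x in (select y from u)"),
    "drop": is_allowed("DROP TABLE patients"),
    "stacked": is_allowed("SELECT 1; DELETE FROM patients"),
}
print(checks)
```

A text check like this is best treated as one layer; connecting with a database role that only has read privileges gives a guarantee that does not depend on parsing.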

Consumer-first: all tests use httpx.MockTransport — zero live SDCStudio, Fuseki, or Neo4j dependency
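
Path confinement, listed above for ValidationToolset and among the security isolation tests, can be illustrated with `pathlib`. The sketch below is not the project's implementation: the function names and sample paths are hypothetical. It resolves the candidate path and rejects anything that ends up outside the base directory.

```python
import tempfile
from pathlib import Path


def confine(base_dir, candidate):
    """Resolve candidate under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    # An absolute candidate replaces base here, so it is checked like any other.
    target = (base / candidate).resolve()
    if target != base and base not in target.parents:
        raise PermissionError(f"{candidate!r} escapes {base}")
    return target


def is_confined(base_dir, candidate):
    """Boolean wrapper used by the demo below."""
    try:
        confine(base_dir, candidate)
        return True
    except PermissionError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    checks = {
        "nested": is_confined(tmp, "batch/instance-001.xml"),
        "traversal": is_confined(tmp, "../outside.xml"),
        "absolute": is_confined(tmp, "/etc/passwd"),
    }

print(checks)
```

Resolving before comparing matters: a plain string prefix check would accept `output/../../secret` and would be confused by symlinks.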

Related Projects

SDCStudio — SDC4 data model creation and management platform (provides Catalog and VaaS APIs)
SDCRM — SDC4 Reference Model specification
Form2SDCTemplate — PDF/DOCX to SDC template conversion
Google ADK — Agent Development Kit (agent framework)

License & Ownership

Licensed under the Apache License 2.0 — see LICENSE for details.

SDC Agents is controlled and maintained by Axius SDC, Inc. The SemanticDataCharter GitHub organization hosts the open-source SDC4 ecosystem on behalf of Axius SDC, Inc.
