Generate a data pipeline from a natural language description. Decomposes the description into extraction, transformation, loading, testing, and deployment tasks. Uses LLMCodeGenerator for code generation with budget tracking and template fallback. Cross-agent calls register the pipeline asset in the catalog, create quality tests, and check schema compatibility.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `description` | string | Yes | Natural language description of the desired pipeline |
| | | | Whether to persist the pipeline spec to the relational store and Git. Pro tier only; Community tier returns specs in memory only (default: false) |
| `principal` | object | No | Caller identity for audit and tier gating. Contains userId, tenantId, and tier (community/pro/enterprise) |
validate_pipeline
Validate a pipeline specification using real sandbox execution. SandboxRunner performs Python AST parsing, SQL syntax checking, and YAML schema validation. Semantic layer validation is available when the catalog agent is reachable via message bus.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `pipelineSpec` | object | Yes | The pipeline specification to validate |
| `customerId` | string | Yes | Customer ID for tenant context |
| `validateSemanticLayer` | boolean | No | Whether to validate against the semantic layer (default: true) |
| `sandboxExecution` | boolean | No | Whether to run in sandbox (default: true) |
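As an illustration of the parameters above, the following sketch assembles an arguments object for validate_pipeline and applies the documented defaults. The helper name `build_validate_args` and the contents of the example spec are hypothetical; how the arguments are actually sent to the tool depends on your MCP client and is not shown.

```python
import json


def build_validate_args(pipeline_spec, customer_id,
                        validate_semantic_layer=True, sandbox_execution=True):
    """Assemble validate_pipeline arguments (hypothetical helper).

    Both optional flags default to True, matching the documented defaults.
    """
    if not isinstance(pipeline_spec, dict):
        raise TypeError("pipelineSpec must be an object")
    if not customer_id:
        raise ValueError("customerId is required")
    return {
        "pipelineSpec": pipeline_spec,
        "customerId": customer_id,
        "validateSemanticLayer": validate_semantic_layer,
        "sandboxExecution": sandbox_execution,
    }


# The spec below is a placeholder, not a real pipeline specification.
args = build_validate_args({"name": "orders_daily"}, "cust-123")
print(json.dumps(args, indent=2))
```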
deploy_pipeline (Pro)
Deploy a validated pipeline to the target orchestrator. When Airflow is configured (AIRFLOW_DAG_PATH, AIRFLOW_API_URL), AirflowDeployer writes DAG files via filesystem, S3, or git-sync and verifies deployment through the Airflow REST API. Optionally commits the pipeline specification as versioned YAML to Git with real SHA-1 commit hashes when GIT_REPO_PATH is set.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `pipelineSpec` | object | Yes | The validated pipeline specification to deploy |
| `customerId` | string | Yes | Customer ID for tenant context |
| `environment` | string | Yes | Deployment target: staging or production |
| `gitCommit` | boolean | No | Whether to commit the spec to Git (default: true) |
| `gitBranch` | string | No | Git branch for the commit (default: main) |
list_pipeline_templates
List available pipeline templates for common data engineering patterns. Templates can be used as starting points for pipeline generation.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `category` | string | No | Filter by category: etl, elt, cdc, streaming, reverse-etl, data-quality |
| `orchestrator` | string | No | Filter by orchestrator: airflow, dagster, prefect |
dw-context-catalog
search_datasets
Search the data catalog using natural language. Returns matching datasets with relevance scores. Supports filtering by platform, type, tags, and quality score.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | Yes | Natural language search query |
| `customerId` | string | Yes | Customer ID |
| `platform` | string | No | Filter by platform (snowflake, bigquery, etc.) |
| `type` | string | No | Filter by type: table, view, model, pipeline, dashboard, metric |
| `tags` | string[] | No | Filter by tags |
| `limit` | number | No | Max results (default: 20) |
get_lineage
Get the lineage graph for a data asset. Returns upstream sources and downstream consumers with column-level lineage when available.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `assetId` | string | Yes | Asset ID or fully qualified name |
| `customerId` | string | Yes | Customer ID |
| `direction` | string | No | Traversal direction: upstream, downstream, both (default: both) |
| `maxDepth` | number | No | Max traversal depth (default: 5) |
| `includeColumnLineage` | boolean | No | Include column-level lineage (default: true) |
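To make the `direction` and `maxDepth` parameters concrete, here is a minimal sketch of a depth-limited breadth-first traversal over a lineage graph. It assumes the graph is available as a list of `(upstream, downstream)` edge pairs; the function names, the edge representation, and the asset names are illustrative assumptions, not the catalog agent's actual implementation.

```python
from collections import deque


def _walk(adjacency, start, max_depth):
    """Breadth-first walk returning {asset: hop count} within max_depth hops."""
    depths = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                depths[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return depths


def traverse_lineage(edges, asset_id, direction="both", max_depth=5):
    """Collect upstream sources and/or downstream consumers of asset_id."""
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError("direction must be upstream, downstream, or both")
    down, up = {}, {}
    for src, dst in edges:
        down.setdefault(src, []).append(dst)
        up.setdefault(dst, []).append(src)
    result = {"upstream": {}, "downstream": {}}
    if direction in ("upstream", "both"):
        result["upstream"] = _walk(up, asset_id, max_depth)
    if direction in ("downstream", "both"):
        result["downstream"] = _walk(down, asset_id, max_depth)
    return result


# Hypothetical asset names used only for illustration.
EDGES = [
    ("raw.orders", "stg.orders"),
    ("stg.orders", "mart.revenue"),
    ("mart.revenue", "dash.kpis"),
]
lineage = traverse_lineage(EDGES, "stg.orders", max_depth=1)
print(lineage)
```

Walking upstream and downstream separately, rather than treating the graph as undirected, keeps sibling assets (other consumers of the same source) out of the result.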
get_context
Get complete context for a data asset in a single call. Returns schema, lineage, quality, freshness, trust score, documentation, and related metrics.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `assetId` | string | Yes | Asset ID or name |
| `customerId` | string | Yes | Customer ID |
check_freshness
Check the freshness of a data asset. Returns freshness score (0-100), last-updated timestamp, SLA compliance status, and staleness alerts.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `assetId` | string | Yes | Asset ID or name |
| `customerId` | string | Yes | Customer ID |
| `slaTargetMs` | number | No | SLA target in milliseconds (default: 86400000 / 24h) |
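The relationship between the last-updated timestamp, `slaTargetMs`, and SLA compliance can be sketched as follows. The compliance test (age no greater than the target) is the natural reading of the parameter. The scoring formula, a linear decay from 100 at age zero to 0 at twice the SLA target, is purely an assumption for illustration; the source does not state how the 0-100 freshness score is computed.

```python
DEFAULT_SLA_TARGET_MS = 86_400_000  # 24 hours, the documented default


def check_sla(last_updated_ms, now_ms, sla_target_ms=DEFAULT_SLA_TARGET_MS):
    """Return age, SLA compliance, and an illustrative freshness score."""
    age_ms = max(0, now_ms - last_updated_ms)
    compliant = age_ms <= sla_target_ms
    # ASSUMED scoring: linear decay, reaching 0 at twice the SLA target.
    score = max(0.0, 100.0 * (1 - age_ms / (2 * sla_target_ms)))
    return {
        "ageMs": age_ms,
        "slaCompliant": compliant,
        "freshnessScore": round(score, 1),
    }


HOUR_MS = 3_600_000
fresh = check_sla(last_updated_ms=0, now_ms=12 * HOUR_MS)
stale = check_sla(last_updated_ms=0, now_ms=36 * HOUR_MS)
print(fresh)
print(stale)
```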
assess_impact
Assess the downstream impact of changing a data asset. Returns blast radius, affected dashboards/models/pipelines, severity classification, and recommendations.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `assetId` | string | Yes | Asset ID or name to assess |
| `customerId` | string | Yes | Customer ID |
| `maxDepth` | number | No | Maximum depth for impact traversal (default: 5) |
get_documentation
Get auto-generated documentation for a data asset including description, column details, lineage summary, usage stats, and quality score.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `assetId` | string | Yes | Asset ID or name |
| `customerId` | string | Yes | Customer ID |
list_semantic_definitions
List all semantic layer definitions (metrics, dimensions, entities). Supports filtering by domain, type, and source.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `customerId` | string | Yes | Customer ID |
| `domain` | string | No | Filter by domain (finance, product, marketing) |
| `type` | string | No | Filter: metric, dimension, entity |
| `source` | string | No | Filter: dbt, looker, cube, custom |
| `limit` | number | No | Max results (default: 50) |
resolve_metric
Resolve an ambiguous metric name to its canonical semantic layer definition. If multiple definitions match, returns all candidates for disambiguation.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `metricName` | string | Yes | Metric name to resolve (e.g., "revenue", "MRR") |
| `customerId` | string | Yes | Customer ID |
| `domain` | string | No | Optional domain filter |
dw-schema
detect_schema_change
Detect schema changes on a data asset. Monitors INFORMATION_SCHEMA, schema registries, and Git webhooks for real-time modifications. Classifies changes as breaking or non-breaking.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `source` | string | Yes | Data source (snowflake, bigquery, postgres, etc.) |
| `customerId` | string | Yes | Customer ID |
| `database` | string | No | Database name |
| `schema` | string | No | Schema name |
| `table` | string | No | Table to check (omit to scan all tables) |
generate_migration
Generate backward-compatible migration scripts for a schema change. Includes forward SQL, rollback SQL, and affected system updates. Validates via sqlglot.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `change` | object | Yes | The schema change to migrate |
| `customerId` | string | Yes | Customer ID |
| `targetSystems` | string[] | No | Systems to generate migrations for (sql, dbt, api) |
apply_migration
Apply a validated migration using blue/green deployment strategy. Includes automatic rollback capability and downstream agent notification.
If true, validate without executing (default: false)
assess_impact
Assess downstream impact of a schema change via lineage graph traversal. Identifies all affected pipelines, views, dashboards, ML models, and APIs.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `change` | object | Yes | The schema change to assess |
| `customerId` | string | Yes | Customer ID |
| `maxDepth` | number | No | Max lineage traversal depth (default: 5) |
dw-quality
run_quality_check
Execute data quality profiling on a dataset. Checks null rates, uniqueness, distributions, referential integrity, freshness, and volume. Returns quality score and detected anomalies.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `datasetId` | string | Yes | Dataset or table to check |
| `customerId` | string | Yes | Customer ID |
| `metrics` | string[] | No | Specific metrics to check (omit for all) |
| `columns` | string[] | No | Specific columns (omit for all) |
get_quality_score
Retrieve the real-time data quality score (0-100) for a dataset. Includes breakdown by dimension (completeness, accuracy, consistency, freshness, uniqueness) and trend.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `datasetId` | string | Yes | Dataset ID |
| `customerId` | string | Yes | Customer ID |
get_anomalies
List detected data quality anomalies with classification. Supports filtering by severity, dataset, and time range. Anomalies are deduplicated (for example, 50-100 raw anomalies reduced to 5-10 actionable ones).
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `customerId` | string | Yes | Customer ID |
| `datasetId` | string | No | Filter by dataset |
| `severity` | string | No | Filter: critical, warning, info |
| `fromTimestamp` | number | No | Start of time range |
| `limit` | number | No | Max results (default: 20) |
| `deduplicatedOnly` | boolean | No | Only show deduplicated/actionable (default: true) |
set_sla
Define data quality SLAs for a dataset. SLA rules specify metric thresholds with severity levels. Violations trigger alerts within 60 seconds.
Diagnose a data incident from anomaly signals. Classifies into one of 6 types (schema_change, source_delay, resource_exhaustion, code_regression, infrastructure, quality_degradation), determines severity, and suggests remediation actions.
Perform root cause analysis for a diagnosed incident. Traverses the lineage graph up to 5+ hops upstream, queries execution logs, cross-references incident history, and returns a causal chain with confidence scores.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incidentId` | string | Yes | ID of the diagnosed incident |
| `incidentType` | string | Yes | Incident type (schema_change, source_delay, etc.) |
| `affectedResources` | string[] | Yes | Resources affected by the incident |
| `customerId` | string | Yes | Customer ID |
| `maxDepth` | number | No | Max lineage traversal depth (default: 5) |
remediate
Execute auto-remediation for a diagnosed incident. For known patterns with >95% confidence, executes a remediation playbook automatically. For novel incidents, generates a diagnosis report and routes to human approval.
Specific playbook: restart_task, scale_compute, apply_schema_migration, switch_backup_source, backfill_data, custom
dryRun
boolean
No
Simulate remediation without executing (default: false)
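The routing rule described for remediate (automatic playbook execution for known patterns above 95% confidence, human approval otherwise) can be sketched as a small decision function. The function name, the returned labels, and the way `dryRun` maps to a simulated run are assumptions made for illustration.

```python
AUTO_REMEDIATE_CONFIDENCE = 0.95  # documented ">95% confidence" threshold


def route_remediation(known_pattern, confidence, dry_run=False):
    """Decide how a diagnosed incident is handled (illustrative labels)."""
    if known_pattern and confidence > AUTO_REMEDIATE_CONFIDENCE:
        # Known pattern with high confidence: run (or simulate) the playbook.
        return "simulate_playbook" if dry_run else "execute_playbook"
    # Novel incident or low confidence: route to a human.
    return "human_approval"


print(route_remediation(known_pattern=True, confidence=0.97))
print(route_remediation(known_pattern=True, confidence=0.97, dry_run=True))
print(route_remediation(known_pattern=False, confidence=0.99))
```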
get_incident_history
Query past incidents for pattern matching and learning. Supports filtering by type, severity, time range, and similarity to a current incident. Uses vector similarity for finding related historical incidents.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `customerId` | string | Yes | Customer ID |
| `type` | string | No | Filter by incident type |
| `severity` | string | No | Filter by severity |
| `fromTimestamp` | number | No | Start of time range (epoch ms) |
| `toTimestamp` | number | No | End of time range (epoch ms) |
| `limit` | number | No | Max results (default: 20) |
| `similarTo` | string | No | Incident description to find similar incidents for |
dw-governance
check_policy
Validate an action against active governance policies. Uses OPA/Rego policy engine with <100ms evaluation. Returns allow/deny/review decision with matched rules.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `action` | string | Yes | Action to validate (read, write, delete, deploy, etc.) |
| `resource` | string | Yes | Resource being accessed |
| `agentId` | string | Yes | Agent requesting the action |
| `customerId` | string | Yes | Customer ID |
| `context` | object | No | Additional context (user, environment, data classification) |
enforce_rbac
Apply role-based access control to a resource. Supports column-level permissions and role hierarchy.
Process an access request with least-privilege enforcement. Supports natural language requests. Applies column-level permissions and a 90-day auto-expiration.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `userId` | string | Yes | User ID |
| `resource` | string | Yes | Resource identifier |
| `accessLevel` | string | Yes | Access level: read, write, admin |
| `justification` | string | Yes | Natural language justification for access |
| `customerId` | string | Yes | Customer ID |
| `durationDays` | number | No | Access duration in days (default: 90) |
scan_pii
Scan a dataset for PII (Personally Identifiable Information). Uses three-pass detection: (1) column-name heuristic, (2) regex on cell values, (3) LLM stub. Targets >95% precision.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `datasetId` | string | Yes | Dataset ID |
| `customerId` | string | Yes | Customer ID |
| `columns` | string[] | No | Specific columns (omit for all) |
| `sampleSize` | number | No | Rows to sample (default: 100) |
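The first two detection passes described for scan_pii (column-name heuristic, then regex on sampled cell values) can be sketched as below. The hint list, the two regular expressions, and the output shape are assumptions chosen for illustration; the third pass (the LLM stub) is omitted, and the agent's real patterns are not documented here.

```python
import re

# ASSUMED hints and patterns, for illustration only.
PII_NAME_HINTS = ("email", "phone", "ssn", "address")
PII_VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def scan_columns(rows, sample_size=100):
    """Return {column: sorted reasons} for columns that look like PII."""
    sample = rows[:sample_size]
    columns = list(sample[0].keys()) if sample else []
    findings = {}
    for col in columns:
        reasons = set()
        # Pass 1: column-name heuristic.
        lowered = col.lower()
        for hint in PII_NAME_HINTS:
            if hint in lowered:
                reasons.add(f"name:{hint}")
        # Pass 2: regex match on each sampled cell value.
        for row in sample:
            value = str(row.get(col, ""))
            for label, pattern in PII_VALUE_PATTERNS.items():
                if pattern.match(value):
                    reasons.add(f"value:{label}")
        if reasons:
            findings[col] = sorted(reasons)
    return findings


ROWS = [
    {"id": 1, "contact": "ana@example.com", "user_email": "ana@example.com"},
    {"id": 2, "contact": "555-12-3456", "user_email": "bo@example.com"},
]
findings = scan_columns(ROWS)
print(findings)
```

Running the value pass on every column, not only on those the name heuristic flagged, catches PII stored under a neutral column name such as `contact`.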
generate_audit_report
Generate a compliance audit report with full evidence chain. Supports on-demand and scheduled generation. Covers agent actions, policy evaluations, access grants, and PII detections.
List all active agent instances with their current health status and key metrics.
No required parameters.
check_agent_health
Check per-agent health status based on error rate and heartbeat recency. Returns healthy/degraded/unhealthy classification.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | No | Specific agent to check (omit for all) |
get_agent_metrics
Get p50/p95/p99 latency, error rates, token consumption, and confidence for an agent over a time period. Collection is deterministic: no LLM is involved in the collection path.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | Yes | Agent name (e.g. "pipelines", "incidents") |
| `period` | string | No | Time period: 1d, 7d (default: 7d) |
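For readers unfamiliar with p50/p95/p99, the sketch below computes them from raw latency samples with the nearest-rank method, together with an error rate. The choice of nearest-rank, the function names, and the output keys are assumptions; the source does not say which percentile definition the agent uses.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms, error_count, call_count):
    """Summarize latency percentiles and error rate (illustrative keys)."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "errorRate": error_count / call_count if call_count else 0.0,
    }


# Synthetic samples: 1 ms, 2 ms, ..., 100 ms.
summary = summarize(range(1, 101), error_count=3, call_count=100)
print(summary)
```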
detect_drift
Detect behavioral drift by comparing recent metrics against 7-day baseline. Alerts on error rate spikes (>5%) and latency anomalies (p99 > 2x baseline).
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | No | Specific agent to check (omit for all) |
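The two alert conditions described for detect_drift can be expressed as simple threshold checks. This sketch assumes that ">5%" refers to the recent error rate itself (not its change relative to the baseline) and that metrics arrive as dicts with `errorRate` and `p99` keys; both are assumptions, as are the alert labels.

```python
def detect_drift(recent, baseline,
                 error_rate_threshold=0.05, latency_factor=2.0):
    """Return a list of alert labels for the given recent metrics."""
    alerts = []
    # ASSUMED reading: alert when the recent error rate exceeds 5%.
    if recent["errorRate"] > error_rate_threshold:
        alerts.append("error_rate_spike")
    # Alert when recent p99 latency exceeds twice the baseline p99.
    if recent["p99"] > latency_factor * baseline["p99"]:
        alerts.append("latency_anomaly")
    return alerts


baseline = {"errorRate": 0.01, "p99": 400}  # e.g. from the 7-day window
recent = {"errorRate": 0.08, "p99": 950}
alerts = detect_drift(recent, baseline)
print(alerts)
```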
get_audit_trail
Retrieve SHA-256 hash-chain audit log entries. Each entry is cryptographically linked to the previous one. Supports filtering by agent name and limiting results.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | No | Filter by agent name (omit for all) |
| `limit` | number | No | Max entries to return (default: 20) |
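A hash chain links each log entry to its predecessor by including the previous entry's hash in the input of the next one, so modifying any earlier entry invalidates every hash after it. The following is a minimal sketch of that idea using SHA-256 from Python's standard library. The entry layout (`payload`, `prevHash`, `hash`), the all-zero genesis value, and the JSON canonicalization are assumptions, not the platform's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a new entry linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prevHash": prev,
        "hash": entry_hash(prev, payload),
    })


def verify_chain(chain):
    """Recompute every hash and check each link to its predecessor."""
    prev = GENESIS
    for entry in chain:
        if entry["prevHash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, {"agent": "pipelines", "action": "validate_pipeline"})
append_entry(log, {"agent": "governance", "action": "check_policy"})
print(verify_chain(log))
```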
get_evaluation_report
Get aggregated human evaluation scores for an agent. Breaks down by accuracy, completeness, safety, and helpfulness.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | Yes | Agent name |
| `period` | string | No | Time period: 7d, 30d (default: 7d) |
dw-usage-intelligence
get_tool_usage_metrics
Get usage volume, unique users, trend direction, and response times for MCP tools. Supports grouping by tool, agent, or user.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `toolName` | string | No | Filter by specific MCP tool name |
| `agentName` | string | No | Filter by agent |
| `period` | string | No | Time period: 1d, 7d, 30d (default: 7d) |
| `groupBy` | string | No | Group by: tool, agent, or user (default: tool) |
get_session_analytics
Analyze practitioner interaction sessions: duration, depth (tools per session), agents per session, and user type classification (power_user, regular, occasional).
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `userId` | string | No | Specific user (omit for all) |
| `period` | string | No | 7d, 30d (default: 7d) |
| `sessionGapMinutes` | number | No | Minutes of inactivity before a new session (default: 30) |
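The `sessionGapMinutes` parameter implies gap-based sessionization: consecutive tool calls belong to the same session until the pause between them reaches the gap. A minimal sketch follows, assuming timestamps are given in minutes and that a pause equal to the gap already starts a new session; the boundary behaviour and the function name are assumptions.

```python
def split_sessions(timestamps_min, session_gap_minutes=30):
    """Group event timestamps (in minutes) into sessions by inactivity gap."""
    sessions = []
    for ts in sorted(timestamps_min):
        # Extend the current session while the pause is shorter than the gap.
        if sessions and ts - sessions[-1][-1] < session_gap_minutes:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


events = [0, 10, 25, 60, 65, 200]
sessions = split_sessions(events)
print(sessions)
print([len(s) for s in sessions])  # calls per session, a rough "depth"
```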
get_usage_heatmap
Get usage heatmap data showing when and where practitioners interact with the platform. Supports hourly, daily, and agent_x_user dimensions.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `dimension` | string | No | hourly, daily, or agent_x_user (default: hourly) |
| `period` | string | No | 7d, 30d (default: 7d) |
| `agentName` | string | No | Filter by agent |
get_workflow_patterns
Identify common multi-tool and multi-agent workflow sequences. Reveals how practitioners chain tools together and what percentage of usage is standalone vs. part of workflows.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `userId` | string | No | Analyze for specific user (omit for all) |
| `minSequenceLength` | number | No | Minimum tools in a sequence (default: 2) |
| `period` | string | No | 7d, 30d (default: 30d) |
| `topN` | number | No | Return top N patterns (default: 10) |
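One simple way to surface common tool chains is to count contiguous tool sequences (n-grams) across sessions and keep the most frequent. The sketch below counts only sequences of exactly `min_sequence_length` tools, which is a simplification; the actual pattern-mining method is not described in the source, and the function name is hypothetical.

```python
from collections import Counter


def top_patterns(sessions, min_sequence_length=2, top_n=10):
    """Count contiguous tool sequences and return the most common ones."""
    counts = Counter()
    n = min_sequence_length
    for tools in sessions:
        # Slide a window of n tools across the session's ordered calls.
        for i in range(len(tools) - n + 1):
            counts[tuple(tools[i:i + n])] += 1
    return counts.most_common(top_n)


# Each inner list is one session's ordered tool calls.
SESSIONS = [
    ["search_datasets", "get_context", "check_freshness"],
    ["search_datasets", "get_context"],
    ["get_lineage"],
]
patterns = top_patterns(SESSIONS, top_n=1)
print(patterns)
```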
detect_usage_anomalies
Detect anomalies in practitioner usage patterns: sudden drops (friction), unusual spikes (automation loops or incidents), and behavior shifts.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | No | Check specific agent (omit for all) |
| `sensitivity` | string | No | low, medium, high (default: medium) |
get_adoption_dashboard
Get platform adoption metrics: which agents and tools are being adopted, growing, underused, or shelfware.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `period` | string | No | 7d, 30d, 90d (default: 30d) |
| `threshold` | number | No | Minimum calls per user to count as "adopted" (default: 5) |
get_usage_activity_log
Retrieve SHA-256 hash-chained practitioner activity log. Shows who called which tool, when, and with what outcome. Verifies chain integrity for compliance.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `userId` | string | No | Filter by user ID |
| `agentName` | string | No | Filter by agent |
| `toolName` | string | No | Filter by specific tool |
| `since` | string | No | Relative time: 1h, 24h, 7d, 30d (default: 24h) |
| `limit` | number | No | Max entries (default: 50) |
list_active_agents
List all active agent instances with health status and key metrics.
No required parameters.
check_agent_health
Check per-agent health status based on error rate and heartbeat recency.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | No | Specific agent to check (omit for all) |
get_agent_metrics
Get p50/p95/p99 latency, error rates, token consumption, and confidence for an agent.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | Yes | Agent name |
| `period` | string | No | 1d, 7d (default: 7d) |
detect_drift
Detect behavioral drift by comparing recent metrics against 7-day baseline.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | No | Specific agent (omit for all) |
get_audit_trail
Retrieve SHA-256 hash-chain audit log entries.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | No | Filter by agent |
| `limit` | number | No | Max entries (default: 20) |
get_evaluation_report
Get aggregated human evaluation scores for an agent.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `agentName` | string | Yes | Agent name |
| `period` | string | No | 7d, 30d (default: 7d) |
dw-ml
suggest_features
Analyze a dataset and suggest feature engineering transformations for ML model training. Returns ranked feature suggestions with expected impact scores.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `datasetId` | string | Yes | Dataset or table identifier to analyze |
| `customerId` | string | Yes | Customer ID for tenant context |
| `targetColumn` | string | No | Target variable for supervised learning suggestions |
| `maxSuggestions` | number | No | Maximum number of feature suggestions (default: 20) |
select_model
Recommend ML model architectures based on data characteristics, problem type, and constraints. Returns ranked model suggestions with estimated training time and resource requirements.
Get AWS cost breakdown by service. Params: startDate, endDate
get_aws_cost_forecast
Get AWS cost forecast. Params: months
get_aws_cost_recommendations
Get AWS cost optimization recommendations
Identity
| Tool | Description |
| --- | --- |
| `list_okta_users` | List users from Okta. Optional: filter |
| `get_okta_user` | Get a specific Okta user. Params: userId |
| `list_azure_ad_users` | List users from Azure AD / Entra ID. Optional: filter |
| `get_azure_ad_user` | Get a specific Azure AD user. Params: userId |
Observability (OTel & Datadog)
| Tool | Description |
| --- | --- |
| `query_otel_metrics` | Query metrics from OpenTelemetry. Params: query, start, end |
| `list_otel_alerts` | List alerts from OpenTelemetry |
| `get_otel_trace` | Get a trace by ID from OpenTelemetry. Params: traceId |
| `query_datadog_metrics` | Query metrics from Datadog. Params: query, start, end |
| `list_datadog_monitors` | List monitors from Datadog |
| `get_datadog_trace` | Get a trace by ID from Datadog. Params: traceId |
Quality (GX, Soda, Monte Carlo)
All quality connectors follow a consistent pattern: list suites, run suite, get results, list monitors.
| Tool | Description |
| --- | --- |
| `list_gx_suites` | List Great Expectations Cloud suites |
| `run_gx_suite` | Run a Great Expectations suite. Params: suiteId |
| `get_gx_results` | Get Great Expectations check results. Params: suiteId |
| `list_gx_monitors` | List Great Expectations monitors |
| `list_soda_suites` | List Soda Cloud check suites |
| `run_soda_suite` | Run a Soda Cloud check suite. Params: suiteId |
| `get_soda_results` | Get Soda Cloud check results. Params: suiteId |
| `list_soda_monitors` | List Soda Cloud monitors |
| `list_monte_carlo_suites` | List Monte Carlo monitor suites |
| `run_monte_carlo_suite` | Run a Monte Carlo monitor suite. Params: suiteId |
| `get_monte_carlo_results` | Get Monte Carlo check results. Params: suiteId |
| `list_monte_carlo_monitors` | List Monte Carlo monitors |
BI (Looker & Tableau)
Both BI connectors follow a consistent pattern: list dashboards, get dashboard, list reports, get data sources.
| Tool | Description |
| --- | --- |
| `list_looker_dashboards` | List Looker dashboards |
| `get_looker_dashboard` | Get Looker dashboard detail. Params: dashboardId |
| `list_looker_reports` | List Looker reports (Looks) |
| `get_looker_data_sources` | Get Looker data source connections |
| `list_tableau_dashboards` | List Tableau dashboards |
| `get_tableau_dashboard` | Get Tableau dashboard detail. Params: dashboardId |
| `list_tableau_reports` | List Tableau workbooks |
| `get_tableau_data_sources` | Get Tableau data source connections |
ITSM (ServiceNow & Jira SM)
Both ITSM connectors follow a consistent pattern: create, get, list, update tickets.
ServiceNow (4 tools)
| Tool | Description |
| --- | --- |
| `create_servicenow_ticket` | Create a ServiceNow incident. Params: summary, description, priority, category |
| `get_servicenow_ticket` | Get a ServiceNow incident. Params: ticketId |
| `list_servicenow_tickets` | List ServiceNow incidents. Optional: status, priority |
| `update_servicenow_ticket` | Update a ServiceNow incident. Params: ticketId, optional: summary, priority |
Jira Service Management (4 tools)
| Tool | Description |
| --- | --- |
| `create_jira_sm_ticket` | Create a Jira SM ticket. Params: summary, description, priority, category |
| `get_jira_sm_ticket` | Get a Jira SM ticket. Params: ticketId |
| `list_jira_sm_tickets` | List Jira SM tickets. Optional: status, priority |
| `update_jira_sm_ticket` | Update a Jira SM ticket. Params: ticketId, optional: summary, priority |
Note: All enterprise connector tools require a customerId parameter for multi-tenant isolation. This parameter is omitted from the compact tables above for brevity.