---
name: autonomous-data-loader
description: generate and safely execute oracle autonomous ai lakehouse data loading and oci object storage lakehouse access workflows using dbms_cloud. use when the user wants to list oci object storage files, choose files or prefixes to load, create conservative csv staging tables, generate or run copy_data or copy_collection, tune dbms_cloud format options, load json documents into soda collections, create external tables to query apache iceberg data stored in oci object storage using direct metadata.json or hadoop catalog patterns, monitor user_load_operations or dba_load_operations, inspect logfile_table or badfile_table, troubleshoot rejected rows, reconcile loads, or profile staged data after loading. this skill is mcp-first with generate-only fallback and is scoped to oci object storage and dbms_cloud-based workflows.
---

# Autonomous AI Lakehouse Data Loader

## Purpose

Use this skill to help users load data from OCI Object Storage into Oracle Autonomous AI Lakehouse with `DBMS_CLOUD`, and to create external-table access to Apache Iceberg data stored in OCI Object Storage. The skill is designed for a portable Agent Skill workflow: it can generate SQL and PL/SQL for manual execution, or execute through an available MCP SQL tool when connected to the target Autonomous database.

## Core Scope

Handle these workflows:

- Discover objects in OCI Object Storage with `DBMS_CLOUD.LIST_OBJECTS`.
- Normalize Object Storage URIs and choose a single file, an explicit file list, a prefix, a wildcard, or a regex pattern.
- Prefer existing `DBMS_CLOUD` credential names and never request secrets in chat.
- Check whether target tables exist before generating `COPY_DATA`.
- Generate and optionally execute `DBMS_CLOUD.COPY_DATA` for supported file loads into existing relational tables.
- Generate and optionally execute `DBMS_CLOUD.COPY_COLLECTION` for JSON documents into SODA collections.
- For CSV without an existing target table, offer conservative staging from the CSV header using `VARCHAR2(4000)` columns.
- Generate format options for CSV, JSON, Parquet, ORC, and Avro. Treat XML as version-specific and verify official documentation before generating XML load workflows.
- Create and validate external tables that query Apache Iceberg data stored in OCI Object Storage, using only the direct `metadata.json` and HadoopCatalog-on-OCI patterns documented for Autonomous AI Database.
- Monitor and reconcile loads with the native `USER_LOAD_OPERATIONS` or `DBA_LOAD_OPERATIONS` views.
- Inspect `LOGFILE_TABLE` and `BADFILE_TABLE` after failures or rejected rows.
- Profile staged data after load and present curated DDL only as a proposal.

Do not make Data Pump or `DBMS_CLOUD_PIPELINE` part of any default workflow. Do not add non-OCI Iceberg providers such as Unity, Polaris, AWS Glue, S3, Azure, or GCS to the default workflow.
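
For the relational-load path, a minimal `COPY_DATA` call looks like the following sketch. The credential name, table name, and bucket URI are placeholders, and the format options shown are a conservative CSV starting point rather than a complete tuning recipe:

```sql
-- Placeholders: MY_CRED, SALES_STAGE, and the bucket URI are illustrative only.
BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'SALES_STAGE',
    credential_name => 'MY_CRED',
    file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/o/sales/2024*.csv',
    format          => JSON_OBJECT('type'        VALUE 'csv',
                                   'skipheaders' VALUE '1',
                                   'rejectlimit' VALUE '100')
  );
END;
/
```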

## Execution Model

Default to MCP-enabled execution when a SQL execution tool is available. If no MCP SQL tool is available, use generate-only mode.

### MCP-enabled mode

- Use the available MCP SQL execution tool for read-only inspection queries.
- Do not assume a specific tool name. Prefer the SQL tool connected to the target Autonomous AI Lakehouse database.
- Execute read-only checks directly when useful: dictionary queries, `LIST_OBJECTS`, load-history queries, log and badfile inspection, and Iceberg external-table sanity checks such as `COUNT(*)`.
- For mutating operations, generate the SQL or PL/SQL first, explain the impact, and require approval before execution.
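
The read-only checks above can be sketched as plain queries; the credential name, bucket URI, and table name below are placeholders:

```sql
-- List candidate objects under a prefix (placeholder credential and URI).
SELECT object_name, bytes, last_modified
FROM   DBMS_CLOUD.LIST_OBJECTS(
         'MY_CRED',
         'https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/o/sales/');

-- Check whether the target table exists and inspect its columns.
SELECT column_name, data_type
FROM   user_tab_columns
WHERE  table_name = 'SALES_STAGE'
ORDER  BY column_id;
```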

### Generate-only mode

- Generate the SQL or PL/SQL and ask the user to execute it manually in their preferred Oracle client.
- Ask the user to paste results back when the next step depends on inspection output.

## Approval Policy

Support two approval styles:

- **Strict approval**: ask before every mutating operation. This is the default.
- **Batch approval**: show the complete non-destructive mutating plan first, then execute the approved plan. Use only when the user asks for batch approval or clearly approves the entire plan.

Always require strict approval for destructive operations, even when batch approval is active.

Mutating operations include:

- `CREATE TABLE`, `ALTER TABLE`, `CREATE COLLECTION` patterns, and similar DDL.
- `DBMS_CLOUD.COPY_DATA`.
- `DBMS_CLOUD.COPY_COLLECTION`, which may create a missing SODA collection; treat it as mutating even before any rows or documents are loaded.
- `DBMS_CLOUD.CREATE_CREDENTIAL`.
- `DBMS_CLOUD.CREATE_EXTERNAL_TABLE` for Iceberg access.
- `DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE` for Iceberg Object Storage ACL setup.
- `INSERT`, `UPDATE`, `DELETE`, `MERGE`.

Destructive operations include:

- `DROP TABLE`.
- `TRUNCATE TABLE`.
- `ALTER TABLE DROP COLUMN`.
- `DELETE` without a narrowly scoped predicate.
- Replacing, truncating, or recreating an existing staging table.

Prefer non-destructive alternatives, such as a new staging table name, before recommending destructive cleanup.
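
As an illustration of a mutating call that would be presented for approval before execution, a `COPY_COLLECTION` load of newline-delimited JSON might look like this sketch (the collection name, credential, and URI are placeholders):

```sql
-- May create the ORDERS_JSON collection if it does not exist, so it
-- requires approval even before any documents are loaded.
BEGIN
  DBMS_CLOUD.COPY_COLLECTION(
    collection_name => 'ORDERS_JSON',
    credential_name => 'MY_CRED',
    file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/o/orders/*.json',
    format          => JSON_OBJECT('recorddelimiter' VALUE '0x''0A''')
  );
END;
/
```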

## Guardrails

- Never ask users to paste secrets, API keys, auth tokens, private keys, or passwords into the prompt.
- Prefer an existing `DBMS_CLOUD` credential name.
- If a credential is missing, generate a `CREATE_CREDENTIAL` template with placeholders and warn users to replace the placeholders outside the chat.
- Do not infer a final CSV business schema from a filename, bucket, or folder alone.
- For CSV without a target table, ask whether the user wants conservative staging, a user-provided schema, or profiling first.
- Do not mix formats in a single `COPY_DATA` operation.
- Do not load from a whole prefix until object discovery shows the files are sufficiently homogeneous.
- Exclude marker/control files such as `_SUCCESS`, `.crc`, manifests, readme files, and zero-byte files unless the user explicitly requests otherwise.
- Treat generated curated DDL as proposed until the user approves it.
- For Iceberg workflows, keep the scope to OCI Object Storage only and generate only external-table access patterns; do not treat Iceberg as a `COPY_DATA` load.
- For Iceberg direct metadata, warn that the table points to a specific `metadata.json` snapshot and may need to be recreated after snapshot or schema changes.
- For Iceberg HadoopCatalog on OCI, require the lakehouse folder URI and `iceberg_table_path`.
- Warn users about documented Iceberg limitations before creating an external table: fixed external-table schema, no query-time time travel, unsupported merge-on-read delete files, and provider/version-specific restrictions.
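
A placeholder-only `CREATE_CREDENTIAL` template consistent with these guardrails might look like the following; the angle-bracket values must be filled in outside the chat, never pasted into the conversation:

```sql
-- Template only: replace the placeholders in your own SQL client.
-- Do NOT paste real usernames, auth tokens, or passwords into chat.
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'MY_CRED',
    username        => '<oci_username>',
    password        => '<auth_token_placeholder>'
  );
END;
/
```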

## Workflow Decision Tree

1. Identify the source request:
   - bucket or prefix discovery: use `references/object-discovery-and-selection.md`.
   - direct relational table load: use `references/copy-data.md`.
   - JSON document collection load: use `references/copy-collection-json.md`.
   - Apache Iceberg data stored in OCI Object Storage: use `references/iceberg-oci-object-storage.md`.
   - failed load or rejected rows: use `references/monitoring-and-troubleshooting.md`.
   - CSV with no target table: use `references/csv-staging-and-profiling.md`.

2. Collect minimum inputs:
   - `credential_name` or an instruction to create one.
   - OCI Object Storage URI, bucket/prefix, or exact file URI.
   - target table or collection name, unless the user wants discovery/planning only.
   - format, or enough evidence to infer the format from selected object names.
   - for Iceberg: external table name, credential name, OCI Object Storage URI for `metadata.json` or the lakehouse folder, and optionally `iceberg_table_path` for HadoopCatalog.

3. Run read-only pre-checks when MCP is available:
   - list object candidates with `DBMS_CLOUD.LIST_OBJECTS`.
   - check target table or collection existence.
   - inspect target columns when loading into a relational table.
   - inspect recent load history when troubleshooting.
   - inspect Iceberg metadata file or lakehouse folder candidates when building an Iceberg external table.

4. Plan the load or access pattern:
   - choose an exact file list, prefix/wildcard, or regex pattern.
   - select `COPY_DATA`, `COPY_COLLECTION`, or `CREATE_EXTERNAL_TABLE` for Iceberg query access.
   - select format options or the Iceberg access protocol configuration.
   - decide direct load versus user-named staging.

5. For mutating operations:
   - present the SQL or PL/SQL.
   - explain the risk.
   - ask for strict or batch approval.
   - execute only after approval if MCP is available.

6. After execution:
   - query `USER_LOAD_OPERATIONS` or `DBA_LOAD_OPERATIONS` for load operations.
   - for Iceberg external tables, run a read-only sanity check such as `SELECT COUNT(*)` and inspect the table columns.
   - reconcile status, operation id, start/update time, log table, badfile table, and row counts where possible.
   - if the load failed or rejected rows are present, switch to troubleshooting.
   - if CSV staging was used, offer post-load profiling and a curated DDL proposal.
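
The post-execution check in step 6 can be sketched as a query against `USER_LOAD_OPERATIONS`; the columns selected here are commonly documented ones, but verify the exact set against your database version:

```sql
-- Most recent load operations, newest first.
SELECT id, type, status, table_name, rows_loaded,
       logfile_table, badfile_table, start_time, update_time
FROM   user_load_operations
ORDER  BY start_time DESC
FETCH FIRST 5 ROWS ONLY;
```

If `STATUS` is `FAILED` or rows were rejected, query the tables named in `LOGFILE_TABLE` and `BADFILE_TABLE` for that operation before retrying.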

## Response Style

Be flexible and concise. Do not force every answer into a rigid template. For mutating or destructive operations, always clearly show:

- what will change,
- the SQL or PL/SQL involved,
- whether approval is required,
- how to monitor the result,
- and how to troubleshoot failures.

For Iceberg external-table access, clearly state that the operation creates query access to data in Object Storage; it does not copy the Iceberg data into the Autonomous database.
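
As a sketch of the direct `metadata.json` pattern, assuming placeholder names and noting that the exact `format` keys for Iceberg access are version-specific and should be verified against the current Oracle documentation before use:

```sql
-- Placeholders throughout; the format payload is an assumption to verify
-- against the Iceberg external-table docs for your database version.
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'ICEBERG_SALES',
    credential_name => 'MY_CRED',
    file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/lake/o/sales/metadata/v3.metadata.json',
    format          => '{"access_protocol":{"protocol_type":"iceberg"}}'
  );
END;
/

-- Read-only sanity check after creation:
SELECT COUNT(*) FROM iceberg_sales;
```

Note that this creates query access only; the table is pinned to the referenced `metadata.json` snapshot and may need to be recreated after snapshot or schema changes.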

## References

Use these files when relevant:

- `references/oracle-docs-index.md` for official Oracle documentation links.
- `references/version-notes.md` for the v0.1 scope and release notes.
- `references/minimum-inputs.md` for minimum required inputs by workflow.
- `references/object-discovery-and-selection.md` for object listing and file selection.
- `references/source-and-credentials.md` for OCI Object Storage credentials and URI patterns.
- `references/copy-data.md` for relational table loads with `DBMS_CLOUD.COPY_DATA`.
- `references/copy-collection-json.md` for JSON document loads into SODA collections.
- `references/format-options.md` for format option guidance.
- `references/iceberg-oci-object-storage.md` for querying Iceberg data in OCI Object Storage with external tables.
- `references/csv-staging-and-profiling.md` for CSV staging and post-load profiling.
- `references/monitoring-and-troubleshooting.md` for load monitoring, logs, badfiles, and retry guidance.
- `references/mcp-execution.md` for MCP-first execution behavior.

Use examples in `examples/` only when the user needs a concrete pattern.