feat: Add field schema documentation and output schema guidance (#2317)

jancurn · claude · TC-MO · web-flow · commit 9dd34f42d70c · 2026-03-09T15:36:45.000+01:00
## Summary

This PR enhances the Actor documentation by adding comprehensive guidance on field schemas and output schemas, with a focus on AI agent integration. The changes include new documentation sections, examples, and clarifications about the importance of metadata for AI agents.

## Key Changes

- **Dataset Schema Updates**:
  - Changed `fields` property from required to optional in the dataset schema table
  - Fixed formatting of "JSON Schema Draft 2020-12" (en-dash to hyphen)
  - Added reference link to new Field schema section
  - Added new "Field schema" section explaining JSON Schema usage and importance for AI agents
  - Included comprehensive example of a product scraper with full field metadata (title, description, example properties)
  - Added naming convention guidance recommending camelCase for field names
- **Output Schema Documentation**:
  - Added new "Why output schema matters" section highlighting importance for AI agent integration, user experience, and API consumers
  - Added tip recommending that all Actors define output schema, even if empty
  - Added note clarifying the relationship between output schema and dataset schema
  - Added new "Web crawler with multiple output types" example showing a complete output schema with multiple datasets and key-value store collections
  - Included explanatory text about how metadata helps AI agents understand Actor capabilities

## Notable Implementation Details

- The field schema documentation emphasizes the importance of `title`, `description`, and `example` properties for AI agent integration
- Examples demonstrate best practices for documenting Actor outputs with clear, actionable metadata
- Documentation now explicitly connects dataset schema field definitions with output schema declarations
- Added cross-references between related documentation sections to improve navigation

https://claude.ai/code/session_01Bag9ToAChMbL1HUtqgLVv9

---

> [!NOTE]
> **Low Risk**
> Documentation-only changes; no runtime, API, or schema-validation behavior is modified.
>
> **Overview**
> Adds new documentation explaining how to define `fields` using JSON Schema (including why `title`/`description`/`example` matter for AI agents), with a full example and naming guidance.
>
> Updates the dataset schema spec to make `fields` optional and links it to the new *Field schema* section, and expands output schema docs with an AI-focused rationale, guidance to always define an (even empty) output schema, a note clarifying how output schema complements dataset schema, and a richer multi-output crawler example.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a7cc7e0. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Michał Olender <92638966+TC-MO@users.noreply.github.com>
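
Reviewer note (illustrative only, not part of the change): the description stresses `title`, `description`, and `example` on every field. One way to spot-check a schema against that guidance might look like the sketch below. It assumes a dataset schema shaped like the product scraper example in this PR; the function name `missing_metadata` and the inline `SAMPLE` document are hypothetical and do not come from the Apify tooling.

```python
import json

# Metadata keys the new "Field schema" docs recommend for every field.
RECOMMENDED_KEYS = ("title", "description", "example")


def missing_metadata(dataset_schema: dict) -> dict[str, list[str]]:
    """Map each field name to the recommended metadata keys it lacks.

    `fields` is optional in the dataset schema, so a schema without it
    simply yields an empty report.
    """
    properties = dataset_schema.get("fields", {}).get("properties", {})
    report = {}
    for name, spec in properties.items():
        missing = [key for key in RECOMMENDED_KEYS if key not in spec]
        if missing:
            report[name] = missing
    return report


# Hypothetical sample: one fully documented field, one with only a title.
SAMPLE = json.loads("""
{
    "actorSpecification": 1,
    "fields": {
        "type": "object",
        "properties": {
            "productName": {
                "type": "string",
                "title": "Product name",
                "description": "The full name of the product.",
                "example": "Wireless Bluetooth Headphones"
            },
            "price": {
                "type": "number",
                "title": "Price"
            }
        }
    },
    "views": {}
}
""")

if __name__ == "__main__":
    # Prints: {'price': ['description', 'example']}
    print(missing_metadata(SAMPLE))
```

A check like this only confirms that the keys are present. Whether the descriptions are actually useful to an agent still needs a human read.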
diff --git a/sources/platform/actors/development/actor_definition/dataset_schema/index.md b/sources/platform/actors/development/actor_definition/dataset_schema/index.md
@@ -200,7 +200,7 @@ The dataset schema structure defines the various components and properties that
 | Property | Type | Required | Description |
 | --- | --- | --- | --- |
 | `actorSpecification` | integer | true | Specifies the version of dataset schema <br/>structure document. <br/>Currently only version 1 is available. |
-| `fields` | JSONSchema compatible object | true | Schema of one dataset object. <br/>Use JsonSchema Draft 2020–12 or <br/>other compatible formats. |
+| `fields` | JSONSchema compatible object | false | Schema of one dataset object. <br/>Use JSON Schema Draft 2020-12 or <br/>other compatible formats. Refer to the [Field schema](#field-schema) section for details. |
 | `views` | DatasetView object | true | An object with a description of an API <br/>and UI views. |
 
 ### DatasetView object definition
@@ -236,3 +236,114 @@ The dataset schema structure defines the various components and properties that
 | --- | --- | --- | --- |
 | `label` | string | false | In the Table view, the label will be visible as the table column's header. |
 | `format` | One of <ul><li>`text`</li><li>`number`</li><li>`date`</li><li>`link`</li><li>`boolean`</li><li>`image`</li><li>`array`</li><li>`object`</li></ul> | false | Describes how output data values are formatted to be rendered in the Output tab UI. |
+
+## Field schema
+
+The `fields` property in the dataset schema defines the structure of individual dataset items using [JSON Schema](https://json-schema.org/). This schema enables validation and provides metadata that helps both humans and AI agents understand your Actor's output.
+
+### Why field descriptions matter
+
+When AI agents interact with Actors through the MCP server or API, they rely on the field schema to understand what data the Actor produces. Including `title`, `description`, and `example` properties for each field enables agents to:
+
+- Understand the meaning of each output field
+- Chain Actors together by matching inputs to outputs
+- Generate appropriate queries and handle responses correctly
+
+Without this metadata, agents must infer field meanings from names alone, which leads to errors and a degraded experience.
+
+### Define field metadata
+
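+Each field in your schema can include the following properties. Most are standard JSON Schema keywords; `example` is the singular form used in this guide, whereas JSON Schema itself defines `examples` (an array):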
+
+| Property | Type | Description |
+| --- | --- | --- |
+| `type` | string | The data type (`string`, `number`, `boolean`, `array`, `object`, `null`). |
+| `title` | string | A human-readable name for the field. |
+| `description` | string | Explains what the field contains and how to interpret it. |
+| `example` | any | A sample value that demonstrates the expected format. |
+| `enum` | array | A list of allowed values for the field. |
+
+### Example with field descriptions
+
+The following example shows a dataset schema for a product scraper with full field metadata:
+
+```json title=".actor/dataset_schema.json"
+{
+    "actorSpecification": 1,
+    "fields": {
+        "$schema": "http://json-schema.org/draft-07/schema#",
+        "type": "object",
+        "properties": {
+            "productName": {
+                "type": "string",
+                "title": "Product name",
+                "description": "The full name of the product as displayed on the product page.",
+                "example": "Wireless Bluetooth Headphones"
+            },
+            "price": {
+                "type": "number",
+                "title": "Price",
+                "description": "The current price, expressed in the currency given by the `currency` field. Does not include shipping or taxes.",
+                "example": 49.99
+            },
+            "currency": {
+                "type": "string",
+                "title": "Currency code",
+                "description": "Three-letter ISO 4217 currency code.",
+                "example": "USD",
+                "enum": ["USD", "EUR", "GBP"]
+            },
+            "inStock": {
+                "type": "boolean",
+                "title": "In stock",
+                "description": "Whether the product is currently available for purchase.",
+                "example": true
+            },
+            "categories": {
+                "type": "array",
+                "title": "Categories",
+                "description": "List of category names the product belongs to, from broadest to most specific.",
+                "items": {
+                    "type": "string"
+                },
+                "example": ["Electronics", "Audio", "Headphones"]
+            },
+            "url": {
+                "type": "string",
+                "title": "Product URL",
+                "description": "Direct link to the product page.",
+                "example": "https://example.com/products/wireless-headphones"
+            }
+        },
+        "required": ["productName", "price", "url"]
+    },
+    "views": {
+        "overview": {
+            "title": "Overview",
+            "transformation": {
+                "fields": ["productName", "price", "inStock", "url"]
+            },
+            "display": {
+                "component": "table",
+                "properties": {
+                    "url": {
+                        "label": "Link",
+                        "format": "link"
+                    },
+                    "inStock": {
+                        "format": "boolean"
+                    }
+                }
+            }
+        }
+    }
+}
+```
+
+:::tip Naming convention
+
+Use `camelCase` for field names in your schema. This matches the convention used in input schemas and ensures consistency across your Actor's configuration.
+
+:::
+
+For validation options and error handling, see [Dataset validation](./validation.md).
diff --git a/sources/platform/actors/development/actor_definition/output_schema/index.md b/sources/platform/actors/development/actor_definition/output_schema/index.md
@@ -12,6 +12,20 @@ slug: /actors/development/actor-definition/output-schema
 
 The Actor output schema builds upon the schemas for the [dataset](/platform/actors/development/actor-definition/dataset-schema) and [key-value store](/platform/actors/development/actor-definition/key-value-store-schema). It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results, and the Actor run's `GET` endpoint includes them in the output property.
 
+## Why output schema matters
+
+Output schema is essential for:
+
+- AI agent integration: When agents use Actors through the MCP server or API, they need to know what results to expect. Without an output schema, agents cannot effectively chain Actors or process results.
+- User experience: Clear output definitions help users understand what data they will receive before running an Actor.
+- API consumers: The output schema appears in the `GET Run` API response, enabling programmatic discovery of Actor outputs.
+
+:::tip Define output schema
+
+Even if your Actor produces no output, define an empty output schema. This tells users and AI agents that the Actor completed successfully with no output, so they don't assume the run failed.
+
+:::
+
 ## Structure
 
 Place the output configuration files in the `.actor` folder in the Actor's root directory.
@@ -77,18 +91,21 @@ The output schema defines the collections of keys and their properties. It allow
 
 ### Available template variables
 
-| Variable                           | Type   | Description                                                                                                                      |
-|------------------------------------|--------|----------------------------------------------------------------------------------------------------------------------------------|
-| `links`                            | object | Contains quick links to most commonly used URLs                                                                                  |
-| `links.publicRunUrl`               | string | Public run url in format `https://console.apify.com/view/runs/:runId`                                                            |
-| `links.consoleRunUrl`              | string | Console run url in format `https://console.apify.com/actors/runs/:runId`                                                         |
-| `links.apiRunUrl`                  | string | API run url in format `https://api.apify.com/v2/actor-runs/:runId`                                                               |
-| `links.apiDefaultDatasetUrl`       | string | API url of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`                                       |
-| `links.apiDefaultKeyValueStoreUrl` | string | API url of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`                 |
-| `run`                              | object | Contains information about the run same as it is returned from the `GET Run` API endpoint                                        |
-| `run.containerUrl`                 | string | URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`                                      |
-| `run.defaultDatasetId`             | string | ID of the default dataset                                                                                                        |
-| `run.defaultKeyValueStoreId`       | string | ID of the default key-value store                                                                                                |
+| Variable                                | Type   | Description                                                                                                      |
+|-----------------------------------------|--------|------------------------------------------------------------------------------------------------------------------|
+| `links`                                 | object | Contains quick links to most commonly used URLs                                                                  |
+| `links.publicRunUrl`                    | string | Public run url in format `https://console.apify.com/view/runs/:runId`                                            |
+| `links.consoleRunUrl`                   | string | Console run url in format `https://console.apify.com/actors/runs/:runId`                                         |
+| `links.apiRunUrl`                       | string | API run url in format `https://api.apify.com/v2/actor-runs/:runId`                                               |
+| `links.apiDefaultDatasetUrl`            | string | API url of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`                       |
+| `links.apiDefaultKeyValueStoreUrl`      | string | API url of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId` |
+| `run`                                   | object | Contains information about the run same as it is returned from the `GET Run` API endpoint                        |
+| `run.containerUrl`                      | string | URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`                      |
+| `run.defaultDatasetId`                  | string | ID of the default dataset                                                                                        |
+| `run.defaultKeyValueStoreId`            | string | ID of the default key-value store                                                                                |
+| `storages`                              | object | Contains references to named storages defined in the Actor's storage configuration                               |
+| `storages.datasets.<name>.apiUrl`       | string | API URL of a named dataset in format `https://api.apify.com/v2/datasets/:datasetId`                              |
+| `storages.keyValueStores.<name>.apiUrl` | string | API URL of a named key-value store in format `https://api.apify.com/v2/key-value-stores/:keyValueStoreId`        |
 
 ## How templates work
 
@@ -248,6 +265,17 @@ This example shows a schema definition for a basic social media scraper. The scr
 
 After you define `views` and `collections` in `dataset_schema.json` and `key_value_store.json`, you can use them in the output schema.
 
+:::note Output schema complements dataset schema
+
+The output schema defines *where* data is stored and how to access it. The [dataset schema](/platform/actors/development/actor-definition/dataset-schema) defines *what* fields each item contains, including descriptions and examples. Use both schemas together:
+
+- Output schema: Declares that results are in the default dataset
+- Dataset schema: Describes each field with `title`, `description`, and `example`
+
+This combination gives AI agents complete information about your Actor's output structure.
+
+:::
+
 ```json title=".actor/output_schema.json"
 {
     "actorOutputSchemaVersion": 1,
@@ -345,6 +373,47 @@ When the run finishes, Apify Console displays the HTML report in an iframe:
 
 ![HTML report in Output tab](images/output-schema-record-example.png)
 
+### Web crawler with multiple output types
+
+This example shows a complete output schema for a web crawler Actor with multiple output types: crawled page data, errors, and files stored in key-value store collections.
+
+```json title=".actor/output_schema.json"
+{
+    "$schema": "https://apify-projects.github.io/actor-json-schemas/output.json?v=0.3",
+    "actorOutputSchemaVersion": 1,
+    "title": "Output schema of the Actor",
+    "properties": {
+        "crawlResults": {
+            "type": "string",
+            "title": "Crawl results",
+            "template": "{{links.apiDefaultDatasetUrl}}/items"
+        },
+        "screenshots": {
+            "type": "string",
+            "title": "Screenshots",
+            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys?collection=screenshots"
+        },
+        "downloadedFiles": {
+            "type": "string",
+            "title": "Downloaded files",
+            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys?collection=downloaded-files"
+        },
+        "htmlSnapshots": {
+            "type": "string",
+            "title": "HTML snapshots",
+            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys?collection=html-snapshots"
+        },
+        "crawlErrors": {
+            "type": "string",
+            "title": "Errors",
+            "template": "{{storages.datasets.errors.apiUrl}}/items"
+        }
+    }
+}
+```
+
+Each output has a `title` that identifies the data it points to. This metadata helps AI agents understand the Actor's capabilities and select the appropriate output for their needs.
+
 ### Actor with no output
 
 If your Actor produces no output (for example, an integration Actor that performs an action), users might see the empty **Output** tab and think the Actor failed. To avoid this, specify that the Actor produces no output.