Commit 9dd34f4

jancurn, claude, and TC-MO authored
feat: Add field schema documentation and output schema guidance (#2317)
## Summary

This PR enhances the Actor documentation by adding comprehensive guidance on field schemas and output schemas, with a focus on AI agent integration. The changes include new documentation sections, examples, and clarifications about the importance of metadata for AI agents.

## Key Changes

- **Dataset Schema Updates**:
  - Changed `fields` property from required to optional in the dataset schema table
  - Fixed formatting of "JSON Schema Draft 2020-12" (en-dash to hyphen)
  - Added reference link to new Field schema section
  - Added new "Field schema" section explaining JSON Schema usage and importance for AI agents
  - Included comprehensive example of a product scraper with full field metadata (title, description, example properties)
  - Added naming convention guidance recommending camelCase for field names
- **Output Schema Documentation**:
  - Added new "Why output schema matters" section highlighting importance for AI agent integration, user experience, and API consumers
  - Added tip recommending that all Actors define output schema, even if empty
  - Added note clarifying the relationship between output schema and dataset schema
  - Added new "Web crawler with multiple output types" example showing a complete output schema with multiple datasets and key-value store collections
  - Included explanatory text about how metadata helps AI agents understand Actor capabilities

## Notable Implementation Details

- The field schema documentation emphasizes the importance of `title`, `description`, and `example` properties for AI agent integration
- Examples demonstrate best practices for documenting Actor outputs with clear, actionable metadata
- Documentation now explicitly connects dataset schema field definitions with output schema declarations
- Added cross-references between related documentation sections to improve navigation

https://claude.ai/code/session_01Bag9ToAChMbL1HUtqgLVv9

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Documentation-only changes; no runtime, API, or schema-validation behavior is modified.
>
> **Overview**
> Adds new documentation explaining how to define `fields` using JSON Schema (including why `title`/`description`/`example` matter for AI agents), with a full example and naming guidance.
>
> Updates the dataset schema spec to make `fields` optional and links it to the new *Field schema* section, and expands output schema docs with an AI-focused rationale, guidance to always define an (even empty) output schema, a note clarifying how output schema complements dataset schema, and a richer multi-output crawler example.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a7cc7e0. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Michał Olender <92638966+TC-MO@users.noreply.github.com>
1 parent 41957fb commit 9dd34f4

2 files changed

Lines changed: 193 additions & 13 deletions

File tree

  • sources/platform/actors/development/actor_definition

sources/platform/actors/development/actor_definition/dataset_schema/index.md

Lines changed: 112 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -200,7 +200,7 @@ The dataset schema structure defines the various components and properties that
200200
| Property | Type | Required | Description |
201201
| --- | --- | --- | --- |
202202
| `actorSpecification` | integer | true | Specifies the version of dataset schema <br/>structure document. <br/>Currently only version 1 is available. |
203-
| `fields` | JSONSchema compatible object | true | Schema of one dataset object. <br/>Use JsonSchema Draft 202012 or <br/>other compatible formats. |
203+
| `fields` | JSONSchema compatible object | false | Schema of one dataset object. <br/>Use JSON Schema Draft 2020-12 or <br/>other compatible formats. Refer to the [Field schema](#field-schema) section for details. |
204204
| `views` | DatasetView object | true | An object with a description of an API <br/>and UI views. |
205205

206206
### DatasetView object definition
@@ -236,3 +236,114 @@ The dataset schema structure defines the various components and properties that
236236
| --- | --- | --- | --- |
237237
| `label` | string | false | In the Table view, the label will be visible as the table column's header. |
238238
| `format` | One of <ul><li>`text`</li><li>`number`</li><li>`date`</li><li>`link`</li><li>`boolean`</li><li>`image`</li><li>`array`</li><li>`object`</li></ul> | false | Describes how output data values are formatted to be rendered in the Output tab UI. |
239+
240+
## Field schema
241+
242+
The `fields` property in the dataset schema defines the structure of individual dataset items using [JSON Schema](https://json-schema.org/). This schema enables validation and provides metadata that helps both humans and AI agents understand your Actor's output.
243+
244+
### Why field descriptions matter
245+
246+
When AI agents interact with Actors through the MCP server or API, they rely on the field schema to understand what data the Actor produces. Including `title`, `description`, and `example` properties for each field enables agents to:
247+
248+
- Understand the meaning of each output field
249+
- Chain Actors together by matching inputs to outputs
250+
- Generate appropriate queries and handle responses correctly
251+
252+
Without this metadata, agents must infer field meanings from names alone, which leads to errors and a degraded experience.
253+
254+
### Define field metadata
255+
256+
Each field in your schema can include the following properties:
257+
258+
| Property | Type | Description |
259+
| --- | --- | --- |
260+
| `type` | string | The data type (`string`, `number`, `boolean`, `array`, `object`, `null`). |
261+
| `title` | string | A human-readable name for the field. |
262+
| `description` | string | Explains what the field contains and how to interpret it. |
263+
| `example` | any | A sample value that demonstrates the expected format. |
264+
| `enum` | array | A list of allowed values for the field. |
265+
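The properties in the table above can be checked mechanically before publishing an Actor. The following is a minimal sketch, not part of the Apify tooling: `missing_metadata` and `RECOMMENDED_KEYS` are hypothetical names, and the sketch assumes the `fields` object has already been loaded into a Python dictionary.

```python
# Hypothetical helper (not an Apify API): report fields that lack the
# metadata that helps humans and AI agents interpret the output.
RECOMMENDED_KEYS = ("title", "description", "example")


def missing_metadata(fields_schema: dict) -> dict[str, list[str]]:
    """Map each field name to the recommended keys it does not define."""
    report = {}
    for name, definition in fields_schema.get("properties", {}).items():
        missing = [key for key in RECOMMENDED_KEYS if key not in definition]
        if missing:
            report[name] = missing
    return report


# Illustrative `fields` object: one fully described field, one sparse field.
fields = {
    "type": "object",
    "properties": {
        "productName": {
            "type": "string",
            "title": "Product name",
            "description": "The full name of the product.",
            "example": "Wireless Bluetooth Headphones",
        },
        "price": {"type": "number", "title": "Price"},
    },
}

print(missing_metadata(fields))  # {'price': ['description', 'example']}
```

A check like this could run in CI so that new fields are not added without descriptions.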
266+
### Example with field descriptions
267+
268+
The following example shows a dataset schema for a product scraper with full field metadata:
269+
270+
```json title=".actor/dataset_schema.json"
271+
{
272+
"actorSpecification": 1,
273+
"fields": {
274+
"$schema": "http://json-schema.org/draft-07/schema#",
275+
"type": "object",
276+
"properties": {
277+
"productName": {
278+
"type": "string",
279+
"title": "Product name",
280+
"description": "The full name of the product as displayed on the product page.",
281+
"example": "Wireless Bluetooth Headphones"
282+
},
283+
"price": {
284+
"type": "number",
285+
"title": "Price",
286+
"description": "The current price, expressed in the currency given by the `currency` field. Does not include shipping or taxes.",
287+
"example": 49.99
288+
},
289+
"currency": {
290+
"type": "string",
291+
"title": "Currency code",
292+
"description": "Three-letter ISO 4217 currency code.",
293+
"example": "USD",
294+
"enum": ["USD", "EUR", "GBP"]
295+
},
296+
"inStock": {
297+
"type": "boolean",
298+
"title": "In stock",
299+
"description": "Whether the product is currently available for purchase.",
300+
"example": true
301+
},
302+
"categories": {
303+
"type": "array",
304+
"title": "Categories",
305+
"description": "List of category names the product belongs to, from broadest to most specific.",
306+
"items": {
307+
"type": "string"
308+
},
309+
"example": ["Electronics", "Audio", "Headphones"]
310+
},
311+
"url": {
312+
"type": "string",
313+
"title": "Product URL",
314+
"description": "Direct link to the product page.",
315+
"example": "https://example.com/products/wireless-headphones"
316+
}
317+
},
318+
"required": ["productName", "price", "url"]
319+
},
320+
"views": {
321+
"overview": {
322+
"title": "Overview",
323+
"transformation": {
324+
"fields": ["productName", "price", "inStock", "url"]
325+
},
326+
"display": {
327+
"component": "table",
328+
"properties": {
329+
"url": {
330+
"label": "Link",
331+
"format": "link"
332+
},
333+
"inStock": {
334+
"format": "boolean"
335+
}
336+
}
337+
}
338+
}
339+
}
340+
}
341+
```
342+
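To illustrate what the field schema above lets a consumer verify, here is a simplified sketch that checks one dataset item against the `required`, `type`, and `enum` keywords. It is a stand-in written for illustration only: it is not the platform's validator, it covers only the keywords used in the example, and `check_item` and `JSON_TYPES` are hypothetical names.

```python
# Simplified stand-in for JSON Schema validation. It handles only the
# keywords used in the example schema: required, type, and enum.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_item(item: dict, fields_schema: dict) -> list[str]:
    """Return human-readable problems found in one dataset item."""
    problems = []
    for name in fields_schema.get("required", []):
        if name not in item:
            problems.append(f"missing required field: {name}")
    for name, value in item.items():
        definition = fields_schema.get("properties", {}).get(name)
        if definition is None:
            continue  # fields the schema does not describe are ignored here
        expected = JSON_TYPES.get(definition.get("type"))
        # bool is a subclass of int in Python, so reject it for "number"
        bool_as_number = definition.get("type") == "number" and isinstance(value, bool)
        if expected and (not isinstance(value, expected) or bool_as_number):
            problems.append(f"{name}: expected {definition['type']}")
        if "enum" in definition and value not in definition["enum"]:
            problems.append(f"{name}: value not in enum")
    return problems


# Trimmed copy of the example schema's `fields` object.
schema = {
    "properties": {
        "productName": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        "url": {"type": "string"},
    },
    "required": ["productName", "price", "url"],
}

item = {"productName": "Wireless Bluetooth Headphones", "price": "49.99", "currency": "CZK"}
for problem in check_item(item, schema):
    print(problem)
```

For real projects, a full JSON Schema validator is a better fit than a hand-written check like this one.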
343+
:::tip Naming convention
344+
345+
Use `camelCase` for field names in your schema. This matches the convention used in input schemas and ensures consistency across your Actor's configuration.
346+
347+
:::
348+
349+
For validation options and error handling, see [Dataset validation](./validation.md).

sources/platform/actors/development/actor_definition/output_schema/index.md

Lines changed: 81 additions & 12 deletions
@@ -12,6 +12,20 @@ slug: /actors/development/actor-definition/output-schema
1212

1313
The Actor output schema builds upon the schemas for the [dataset](/platform/actors/development/actor-definition/dataset-schema) and [key-value store](/platform/actors/development/actor-definition/key-value-store-schema). It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results, and the Actor run's `GET` endpoint includes them in the output property.
1414

15+
## Why output schema matters
16+
17+
Output schema is essential for:
18+
19+
- AI agent integration: When agents use Actors through the MCP server or API, they need to know what results to expect. Without output schema, agents cannot effectively chain Actors or process results.
20+
- User experience: Clear output definitions help users understand what data they will receive before running an Actor.
21+
- API consumers: The output schema appears in the `GET Run` API response, enabling programmatic discovery of Actor outputs.
22+
23+
:::tip Define output schema
24+
25+
Even if your Actor produces no output, define an empty output schema. This tells users and AI agents that the Actor completed successfully without producing output, so they do not assume the run failed.
26+
27+
:::
28+
1529
## Structure
1630

1731
Place the output configuration files in the `.actor` folder in the Actor's root directory.
@@ -77,18 +91,21 @@ The output schema defines the collections of keys and their properties. It allow
7791

7892
### Available template variables
7993

80-
| Variable | Type | Description |
81-
|------------------------------------|--------|----------------------------------------------------------------------------------------------------------------------------------|
82-
| `links` | object | Contains quick links to most commonly used URLs |
83-
| `links.publicRunUrl` | string | Public run url in format `https://console.apify.com/view/runs/:runId` |
84-
| `links.consoleRunUrl` | string | Console run url in format `https://console.apify.com/actors/runs/:runId` |
85-
| `links.apiRunUrl` | string | API run url in format `https://api.apify.com/v2/actor-runs/:runId` |
86-
| `links.apiDefaultDatasetUrl` | string | API url of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId` |
87-
| `links.apiDefaultKeyValueStoreUrl` | string | API url of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId` |
88-
| `run` | object | Contains information about the run same as it is returned from the `GET Run` API endpoint |
89-
| `run.containerUrl` | string | URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/` |
90-
| `run.defaultDatasetId` | string | ID of the default dataset |
91-
| `run.defaultKeyValueStoreId` | string | ID of the default key-value store |
94+
| Variable | Type | Description |
95+
|-----------------------------------------|--------|------------------------------------------------------------------------------------------------------------------|
96+
| `links` | object | Contains quick links to most commonly used URLs |
97+
| `links.publicRunUrl` | string | Public run URL in format `https://console.apify.com/view/runs/:runId` |
98+
| `links.consoleRunUrl` | string | Console run URL in format `https://console.apify.com/actors/runs/:runId` |
99+
| `links.apiRunUrl` | string | API run URL in format `https://api.apify.com/v2/actor-runs/:runId` |
100+
| `links.apiDefaultDatasetUrl` | string | API URL of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId` |
101+
| `links.apiDefaultKeyValueStoreUrl` | string | API URL of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId` |
102+
| `run` | object | Contains the same run information that the `GET Run` API endpoint returns |
103+
| `run.containerUrl` | string | URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/` |
104+
| `run.defaultDatasetId` | string | ID of the default dataset |
105+
| `run.defaultKeyValueStoreId` | string | ID of the default key-value store |
106+
| `storages` | object | Contains references to named storages defined in the Actor's storage configuration |
107+
| `storages.datasets.<name>.apiUrl` | string | API URL of a named dataset in format `https://api.apify.com/v2/datasets/:datasetId` |
108+
| `storages.keyValueStores.<name>.apiUrl` | string | API URL of a named key-value store in format `https://api.apify.com/v2/key-value-stores/:keyValueStoreId` |
92109
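As a rough illustration of how the variables in the table map to final URLs, the sketch below substitutes `{{dotted.path}}` placeholders from a nested dictionary. It shows the general idea only and is not the platform's implementation: `resolve_template`, the `context` dictionary, and the dataset ID `EXAMPLE_DATASET_ID` are assumptions made for this example.

```python
import re

# Matches {{variable.path}} placeholders, allowing optional inner spaces.
TEMPLATE_VARIABLE = re.compile(r"\{\{\s*([^{}\s]+)\s*\}\}")


def resolve_template(template: str, context: dict) -> str:
    """Replace each {{dotted.path}} with the value found in `context`."""

    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError for unknown variables
        return str(value)

    return TEMPLATE_VARIABLE.sub(lookup, template)


# Hypothetical run context; the dataset ID is a placeholder.
context = {
    "links": {
        "apiDefaultDatasetUrl": "https://api.apify.com/v2/datasets/EXAMPLE_DATASET_ID",
    },
}

print(resolve_template("{{links.apiDefaultDatasetUrl}}/items", context))
# https://api.apify.com/v2/datasets/EXAMPLE_DATASET_ID/items
```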

93110
## How templates work
94111

@@ -248,6 +265,17 @@ This example shows a schema definition for a basic social media scraper. The scr
248265

249266
After you define `views` and `collections` in `dataset_schema.json` and `key_value_store.json`, you can use them in the output schema.
250267

268+
:::note Output schema complements dataset schema
269+
270+
The output schema defines *where* data is stored and how to access it. The [dataset schema](/platform/actors/development/actor-definition/dataset-schema) defines *what* fields each item contains, including descriptions and examples. Use both schemas together:
271+
272+
- Output schema: Declares that results are in the default dataset
273+
- Dataset schema: Describes each field with `title`, `description`, and `example`
274+
275+
This combination gives AI agents complete information about your Actor's output structure.
276+
277+
:::
278+
251279
```json title=".actor/output_schema.json"
252280
{
253281
"actorOutputSchemaVersion": 1,
@@ -345,6 +373,47 @@ When the run finishes, Apify Console displays the HTML report in an iframe:
345373

346374
![HTML report in Output tab](images/output-schema-record-example.png)
347375

376+
### Web crawler with multiple output types
377+
378+
This example shows a complete output schema for a web crawler Actor with multiple output types: crawled page data, errors, and files stored in key-value store collections.
379+
380+
```json title=".actor/output_schema.json"
381+
{
382+
"$schema": "https://apify-projects.github.io/actor-json-schemas/output.json?v=0.3",
383+
"actorOutputSchemaVersion": 1,
384+
"title": "Output schema of the Actor",
385+
"properties": {
386+
"crawlResults": {
387+
"type": "string",
388+
"title": "Crawl results",
389+
"template": "{{links.apiDefaultDatasetUrl}}/items"
390+
},
391+
"screenshots": {
392+
"type": "string",
393+
"title": "Screenshots",
394+
"template": "{{links.apiDefaultKeyValueStoreUrl}}/keys?collection=screenshots"
395+
},
396+
"downloadedFiles": {
397+
"type": "string",
398+
"title": "Downloaded files",
399+
"template": "{{links.apiDefaultKeyValueStoreUrl}}/keys?collection=downloaded-files"
400+
},
401+
"htmlSnapshots": {
402+
"type": "string",
403+
"title": "HTML snapshots",
404+
"template": "{{links.apiDefaultKeyValueStoreUrl}}/keys?collection=html-snapshots"
405+
},
406+
"crawlErrors": {
407+
"type": "string",
408+
"title": "Errors",
409+
"template": "{{storages.datasets.errors.apiUrl}}/items"
410+
}
411+
}
412+
}
413+
```
414+
415+
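Each output has a `title` and a `template` that resolves to the URL of the data. Adding a `description` to each property, which this example omits, would explain what the data contains. This metadata helps AI agents understand the Actor's capabilities and select the appropriate output for their needs.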
416+
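A schema like the one above can be checked for misspelled template variables before deployment. The sketch below is an illustrative lint under stated assumptions: it recognizes only the variables listed in the template variables table, so it may flag other `run.*` fields that the platform accepts, and `unknown_variables` and `KNOWN_VARIABLE_PATTERNS` are hypothetical names.

```python
import re

# Variables listed in the template variables table; <name> is any storage name.
KNOWN_VARIABLE_PATTERNS = [
    r"links\.(publicRunUrl|consoleRunUrl|apiRunUrl"
    r"|apiDefaultDatasetUrl|apiDefaultKeyValueStoreUrl)",
    r"run\.(containerUrl|defaultDatasetId|defaultKeyValueStoreId)",
    r"storages\.(datasets|keyValueStores)\.[^.]+\.apiUrl",
]
TEMPLATE_VARIABLE = re.compile(r"\{\{\s*([^{}\s]+)\s*\}\}")


def unknown_variables(output_schema: dict) -> dict[str, list[str]]:
    """Map each output property to template variables this sketch does not know."""
    report = {}
    for name, prop in output_schema.get("properties", {}).items():
        variables = TEMPLATE_VARIABLE.findall(prop.get("template", ""))
        unknown = [
            variable
            for variable in variables
            if not any(re.fullmatch(pattern, variable) for pattern in KNOWN_VARIABLE_PATTERNS)
        ]
        if unknown:
            report[name] = unknown
    return report


schema = {
    "actorOutputSchemaVersion": 1,
    "properties": {
        "crawlResults": {"type": "string", "template": "{{links.apiDefaultDatasetUrl}}/items"},
        "crawlErrors": {"type": "string", "template": "{{storages.datasets.errors.apiUrl}}/items"},
        # Deliberate typo for illustration: the variable is apiDefaultDatasetUrl.
        "broken": {"type": "string", "template": "{{links.apiDatasetUrl}}/items"},
    },
}

print(unknown_variables(schema))  # {'broken': ['links.apiDatasetUrl']}
```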
348417
### Actor with no output
349418

350419
If your Actor produces no output (for example, an integration Actor that performs an action), users might see the empty **Output** tab and think the Actor failed. To avoid this, specify that the Actor produces no output.
