docs: restructure dataset schema page with introduction and guidance #2554
jancurn wants to merge 6 commits into
Adds a comprehensive "Why use views" section to dataset schema docs that explains the purpose and benefits of views, when to use them, how to organize views by use case, and what views are NOT for. Also includes a practical multi-view example for an e-commerce scraper. This addresses feedback that the documentation explained HOW to configure views but not WHY or WHEN to use them.
Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3
https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
- Shorten the why/when content significantly
- Add link to Google Maps Scraper as real-world example
- Keep the anti-pattern note (useful guidance)
- Remove redundant explanations

https://claude.ai/code/session_018Upw3aA9syy5Jm84F1xp9f
Adds clarification that views only affect Console UI display, not how data is exported to JSON, CSV, or other formats. https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
Reorganizes the page to provide better context before diving into details:
- Adds introduction explaining what dataset schema is and its two components
- Moves file structure section before examples
- Reorganizes into clear Fields and Views sections as parallel concepts
- Consolidates reference tables at the end
- Maintains all existing content but in a more logical flow

The page now follows the same pattern as other actor definition pages (input_schema, output_schema) where concepts are introduced before examples.
https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
| } | ||
| } | ||
| } | ||
| "fields": { /* JSON Schema describing each item */ }, |
Instead of these comments, keep the short examples there.
Addresses review feedback to show actual field/view examples instead of comments in the schema components overview. https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
TC-MO left a comment
Thanks Jan! Left some inline suggestions, mostly:
- Bold reserved for UI elements (a few bullet lists use term as a label pattern)
- "Output tab UI" > "Output tab" for consistency (the tab is in the UI)
- A couple of gerund headings and one all-caps "NOT" to soften
- Small tightening passes where prose restates what's just above
- One technical fix: $schema in the example uses draft-07 but the reference table specifies Draft 2020-12, changed it for consistency's sake
One pattern that's worth applying consistently across the page (not just the bullets at LOC15-16): `views` is required and `fields` is optional, so required-first ordering would apply to:
- The example JSON at LOC20-38 (views before fields)
- The major sections (move Views section above Fields section)
- The reference table at LOC427-428 (views row before fields row)
Happy to make those changes myself if that is easier.
| - **`fields`** (optional) - JSON Schema describing the structure of each dataset item. Enables validation and provides metadata for AI agents. | ||
| - **`views`** (required) - Display configurations that control how data appears in the Output tab. Each view can show different fields, ordering, and formatting. |
Bold is reserved for UI elements, so I would not use it here.
Additionally, we should flip the order to lead with required and then continue with optional.
| - **`fields`** (optional) - JSON Schema describing the structure of each dataset item. Enables validation and provides metadata for AI agents. | |
| - **`views`** (required) - Display configurations that control how data appears in the Output tab. Each view can show different fields, ordering, and formatting. | |
| - `views` _(required)_ - Display configurations for how data appears in the Output tab. Each view can show different fields, ordering, and formatting. | |
| - `fields` _(optional)_ - JSON Schema describing each dataset item. Enables validation and provides metadata for AI agents. |
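To make the required-first ordering concrete, here is a minimal sketch of a schema written as a JavaScript object literal, with `views` ahead of `fields`. The view name `overview`, the field names, and the `missingRequired` helper are hypothetical, and the `display.component` value is an assumption to check against the reference tables. Only the property names `actorSpecification`, `views`, `fields`, `transformation`, and `display` come from the page under review.

```javascript
// Illustrative schema object; the view name and field names are made up.
const datasetSchema = {
  actorSpecification: 1, // required
  views: {
    // required, so it is listed before the optional `fields`
    overview: {
      transformation: { fields: ["name", "price"] },
      display: { component: "table" }, // assumed value, not taken from the PR
    },
  },
  fields: {}, // optional JSON Schema describing each dataset item
};

// Hypothetical helper: lists required top-level properties that are absent.
function missingRequired(schema) {
  return ["actorSpecification", "views"].filter((key) => !(key in schema));
}

console.log(missingRequired(datasetSchema)); // []
console.log(missingRequired({ fields: {} })); // [ 'actorSpecification', 'views' ]
```

Key order has no effect on how JSON is parsed; placing `views` first is purely a readability convention for the docs.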
| A single view is fine for simple Actors with fewer than 10 fields where all fields are equally relevant. | ||
| ### Organizing views by use case |
We try not to use gerunds in headings; it simplifies search patterns.
| ### Organizing views by use case | |
| ### Organize views by use case |
| The first view defined becomes the default tab. | ||
| ## Handling nested structures |
We try not to use gerunds in headings; it simplifies search patterns.
| ## Handling nested structures | |
| ## Handle nested structures |
| 1. `transformation` - Which fields to fetch and how to transform them | ||
| 2. `display` - How to visually present the data in the UI |
Use only `1.` for numbered lists. Markdown auto-numbers them at build, which makes the list easier to maintain.
| 1. `transformation` - Which fields to fetch and how to transform them | |
| 2. `display` - How to visually present the data in the UI | |
| 1. `transformation` - Which fields to fetch and how to transform them | |
| 1. `display` - How to visually present the data in the UI |
| You have two choices of how to organize files within the `.actor` folder. | ||
| ### Single configuration file | ||
| ### Inline in actor.json |
| ### Inline in actor.json | |
| ### Inline in `actor.json` |
| } | ||
| ``` | ||
| The first view defined becomes the default tab. |
I would cut this; it is already stated at LOC 213.
| } | ||
| ``` | ||
| With `unwind: ["reviews"]`, a product with 5 reviews becomes 5 rows in the output, each containing the product name plus one review's data. |
It is better to spell out small numbers.
| With `unwind: ["reviews"]`, a product with 5 reviews becomes 5 rows in the output, each containing the product name plus one review's data. | |
| With `unwind: ["reviews"]`, a product with five reviews becomes five rows in the output, each containing the product name plus one review's data. |
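The row expansion described in this sentence can be illustrated with a small standalone sketch. The `unwindItem` helper and the sample product are hypothetical and only mimic the described behavior (one row per review, each carrying the product name). This is not the platform's implementation, and merging each review's properties into the parent row is an assumption.

```javascript
// Hypothetical helper: expands one array field into one row per element.
function unwindItem(item, field) {
  const { [field]: nested, ...parent } = item;
  // Items without an array under `field` come back unchanged as a single row.
  if (!Array.isArray(nested)) return [item];
  // Assumption: each element's properties are merged next to the parent's.
  return nested.map((entry) => ({ ...parent, ...entry }));
}

// Made-up sample: one product with five reviews.
const product = {
  name: "Example product",
  reviews: [1, 2, 3, 4, 5].map((n) => ({ rating: n, text: `Review ${n}` })),
};

const rows = unwindItem(product, "reviews");
console.log(rows.length); // 5
console.log(rows[0]); // { name: 'Example product', rating: 1, text: 'Review 1' }
```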
| ### Flatten in Actor code | ||
| Alternatively, flatten nested structures in your Actor code before calling `Actor.pushData()`. |
Single-sentence subsections are not the greatest practice. I would recommend removing the H3 and folding this into the previous section.
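If the folded-in sentence needs an illustration, a rough sketch of flattening in Actor code could look like the following. The `flattenItem` helper, the dot-separated key convention, and the sample item are assumptions made for illustration. The `Actor.pushData()` call is left as a comment so the snippet runs without the SDK.

```javascript
// Hypothetical helper: copies nested object properties to dot-separated top-level keys.
// Note: class instances such as Date are not handled specially in this sketch.
function flattenItem(value, prefix = "", out = {}) {
  for (const [key, inner] of Object.entries(value)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const isPlainObject =
      inner !== null && typeof inner === "object" && !Array.isArray(inner);
    if (isPlainObject) {
      flattenItem(inner, path, out); // recurse into nested objects
    } else {
      out[path] = inner; // primitives and arrays are kept as-is
    }
  }
  return out;
}

// Made-up sample item with one nested object.
const item = {
  name: "Example product",
  price: { amount: 10, currency: "USD" },
};

const flat = flattenItem(item);
console.log(flat);
// In an Actor, the flattened item would then be stored, for example:
// await Actor.pushData(flat);
```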
| | Property | Type | Required | Description | | ||
| | --- | --- | --- | --- | | ||
| | `actorSpecification` | integer | true | Version of the dataset schema structure. Currently only version 1 is available. | |
Cleanup opportunity. We should avoid hedging features like this. Only v1 exists. The doc can be updated if and when other versions arrive.
| | `actorSpecification` | integer | true | Version of the dataset schema structure. Currently only version 1 is available. | | |
| | `actorSpecification` | integer | true | Version of the dataset schema structure. Only version 1 is available. | |
| | Property | Type | Required | Description | | ||
| | --- | --- | --- | --- | | ||
| | `fields` | string[] | true | Fields to include in the output. Order determines column order in the UI. Missing field values display as **undefined**. | |
I think this is a value, not a UI element, right?
| | `fields` | string[] | true | Fields to include in the output. Order determines column order in the UI. Missing field values display as **undefined**. | | |
| | `fields` | string[] | true | Fields to include in the output. Order determines column order in the UI. Missing field values display as `undefined`. | |
Summary
Restructures the dataset schema documentation to provide proper context before diving into details:
- Adds an introduction explaining what dataset schema is and its two components (`fields` and `views`)

The page now follows the same pattern as other actor definition pages (input_schema, output_schema).
Context
Based on feedback from Martin Sabo and Jaroslav Hejlek in #dev-docs - the documentation explained HOW to configure views but not WHY or WHEN to use them.
Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3
Test plan
https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG