docs: restructure dataset schema page with introduction and guidance#2554

Open
jancurn wants to merge 6 commits into master from claude/slack-session-oDBaR

@jancurn
Member

@jancurn jancurn commented May 21, 2026

Summary

Restructures the dataset schema documentation to provide proper context before diving into details:

  • Adds introduction explaining what dataset schema is and its two components (fields and views)
  • Moves file structure section before examples so readers know where to put code
  • Reorganizes content into clear Fields and Views sections as parallel concepts
  • Adds guidance on why/when to use views (addressing user feedback)
  • Documents what views are NOT for (anti-patterns, export format misconceptions)
  • Adds multi-view example for different use cases (Marketing vs Pricing)
  • Consolidates reference tables at the end
  • Links to Google Maps Scraper as real-world example

The page now follows the same pattern as other actor definition pages (input_schema, output_schema).

Context

Based on feedback from Martin Sabo and Jaroslav Hejlek in #dev-docs: the documentation explained how to configure views but not why or when to use them.

Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3

Test plan

  • Verify page renders correctly in preview
  • Check all code examples are valid JSON
  • Verify Google Maps Scraper link works
  • Confirm new headings appear in table of contents

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG

Adds a comprehensive "Why use views" section to dataset schema docs that
explains the purpose and benefits of views, when to use them, how to organize
views by use case, and what views are NOT for. Also includes a practical
multi-view example for an e-commerce scraper.

This addresses feedback that the documentation explained HOW to configure
views but not WHY or WHEN to use them.

Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
@apify-service-account
Contributor

apify-service-account commented May 21, 2026

✅ Preview for this PR (commit 07bc8a6) is ready at https://pr-2554.preview.docs.apify.com (see action run).

claude added 2 commits May 21, 2026 11:12
- Shorten the why/when content significantly
- Add link to Google Maps Scraper as real-world example
- Keep the anti-pattern note (useful guidance)
- Remove redundant explanations

https://claude.ai/code/session_018Upw3aA9syy5Jm84F1xp9f
Adds clarification that views only affect Console UI display, not how
data is exported to JSON, CSV, or other formats.

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
@jancurn jancurn requested a review from TC-MO May 21, 2026 11:28
Reorganizes the page to provide better context before diving into details:

- Adds introduction explaining what dataset schema is and its two components
- Moves file structure section before examples
- Reorganizes into clear Fields and Views sections as parallel concepts
- Consolidates reference tables at the end
- Maintains all existing content but in a more logical flow

The page now follows the same pattern as other actor definition pages
(input_schema, output_schema) where concepts are introduced before examples.

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
@jancurn jancurn changed the title docs: add guidance on why and when to use dataset views [WIP] docs: add guidance on why and when to use dataset views May 21, 2026
@jancurn jancurn changed the title [WIP] docs: add guidance on why and when to use dataset views docs: restructure dataset schema page with introduction and guidance May 21, 2026
}
}
}
"fields": { /* JSON Schema describing each item */ },
Member Author

Instead of these comments, keep the short examples here.

Addresses review feedback to show actual field/view examples instead of
comments in the schema components overview.

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
@jancurn jancurn changed the title docs: restructure dataset schema page with introduction and guidance [WIP] docs: restructure dataset schema page with introduction and guidance May 21, 2026
@jancurn jancurn changed the title [WIP] docs: restructure dataset schema page with introduction and guidance docs: restructure dataset schema page with introduction and guidance May 21, 2026
Contributor

@TC-MO TC-MO left a comment

Thanks Jan! Left some inline suggestions, mostly:

  • Bold is reserved for UI elements (a few bullet lists use a bold term as a label)
  • "Output tab UI" > "Output tab" for consistency (the tab is in the UI)
  • A couple of gerund headings and one all-caps "NOT" to soften
  • Small tightening passes where prose restates what's just above
  • One technical fix: `$schema` in the example uses draft-07, but the reference table specifies Draft 2020-12, so I changed it for consistency

One pattern worth applying consistently across the page (not just the bullets at LOC15-16): `views` is required and `fields` is optional, so required-first ordering would apply to:

  • The example JSON at LOC20-38 (`views` before `fields`)
  • The major sections (move the Views section above the Fields section)
  • The reference table at LOC427-428 (`views` row before `fields` row)

Happy to make those changes myself if that is easier.
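For reference, the required-first ordering can be sketched as a TypeScript object that mirrors the schema file. This is a minimal sketch, not the canonical schema: the `DatasetSchema` and `ViewConfig` types, the `overview` view, and the `title`/`price` fields are hypothetical, and placing `$schema` inside `fields` with the Draft 2020-12 URI is an assumption based on the reference table mentioned above. Because `JSON.stringify` emits string keys in insertion order, declaring `views` before `fields` carries through to the serialized file.

```typescript
// Hypothetical types mirroring the schema shape discussed in this review:
// `views` is required, `fields` is optional.
interface ViewConfig {
  title: string;
  transformation: { fields: string[] };
}

interface DatasetSchema {
  actorSpecification: 1; // only version 1 exists
  views: Record<string, ViewConfig>;
  fields?: Record<string, unknown>; // a JSON Schema object
}

// Required-first ordering: `views` is declared before `fields`.
const schema: DatasetSchema = {
  actorSpecification: 1,
  views: {
    overview: {
      title: "Overview",
      transformation: { fields: ["title", "price"] },
    },
  },
  fields: {
    // Assumption: Draft 2020-12, matching the reference table.
    $schema: "https://json-schema.org/draft/2020-12/schema",
    type: "object",
    properties: {
      title: { type: "string" },
      price: { type: "number" },
    },
  },
};

// JSON.stringify keeps insertion order for string keys,
// so the serialized file lists `views` before `fields`.
const json = JSON.stringify(schema, null, 2);
console.log(json);
```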

Comment on lines +15 to +16
- **`fields`** (optional) - JSON Schema describing the structure of each dataset item. Enables validation and provides metadata for AI agents.
- **`views`** (required) - Display configurations that control how data appears in the Output tab. Each view can show different fields, ordering, and formatting.
Contributor

Bold is reserved for UI elements, so I would not use it here.
Additionally, we should flip the order to lead with the required property and follow with the optional one.

Suggested change
Before:
- **`fields`** (optional) - JSON Schema describing the structure of each dataset item. Enables validation and provides metadata for AI agents.
- **`views`** (required) - Display configurations that control how data appears in the Output tab. Each view can show different fields, ordering, and formatting.
After:
- `views` _(required)_ - Display configurations for how data appears in the Output tab. Each view can show different fields, ordering, and formatting.
- `fields` _(optional)_ - JSON Schema describing each dataset item. Enables validation and provides metadata for AI agents.


A single view is fine for simple Actors with fewer than 10 fields where all fields are equally relevant.

### Organizing views by use case
Contributor

We try not to use gerunds in headings; this simplifies search patterns.

Suggested change
Before:
### Organizing views by use case
After:
### Organize views by use case
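To make "organize views by use case" concrete, here is a hedged sketch of the Marketing vs. Pricing split from the PR summary, written as a TypeScript object. The view keys and field names are hypothetical. It also illustrates why the first view defined becomes the default tab: for non-numeric string keys, JavaScript objects preserve insertion order, so the first key is the first view written in the file (assuming the schema is read in document order).

```typescript
// Hypothetical e-commerce dataset with one view per audience.
type View = { title: string; transformation: { fields: string[] } };

const views: Record<string, View> = {
  marketing: {
    title: "Marketing",
    transformation: { fields: ["productName", "imageUrl", "rating", "reviewCount"] },
  },
  pricing: {
    title: "Pricing",
    transformation: { fields: ["productName", "price", "currency", "discount"] },
  },
};

// Non-numeric string keys keep insertion order, so this is the first view
// defined, which the docs say becomes the default tab.
const defaultView = Object.keys(views)[0];
console.log(`default tab: ${defaultView}`); // default tab: marketing

// Each audience gets only the columns it needs.
for (const key of Object.keys(views)) {
  console.log(`${key}: ${views[key].transformation.fields.join(", ")}`);
}
```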


The first view defined becomes the default tab.

## Handling nested structures
Contributor

We try not to use gerunds in headings; this simplifies search patterns.

Suggested change
Before:
## Handling nested structures
After:
## Handle nested structures

Comment on lines +316 to +317
1. `transformation` - Which fields to fetch and how to transform them
2. `display` - How to visually present the data in the UI
Contributor

Use only `1.` for numbered list items. Markdown auto-numbers them at build time, which makes the list easier to maintain.

Suggested change
Before:
1. `transformation` - Which fields to fetch and how to transform them
2. `display` - How to visually present the data in the UI
After:
1. `transformation` - Which fields to fetch and how to transform them
1. `display` - How to visually present the data in the UI
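The `transformation`/`display` split can be sketched as TypeScript interfaces. This is our reading of the schema rather than its definitive type: the `display` properties shown (`component`, `properties`, `label`, `format`) should be checked against the reference tables, and the `columnLabels` helper is hypothetical. The point it illustrates is the division of labor: `transformation` decides which fields become columns and in what order, while `display` only affects presentation.

```typescript
// Which fields to fetch and how to transform them.
interface Transformation {
  fields: string[]; // also sets column order
  unwind?: string[]; // optional: expand nested fields
}

// How to visually present the data in the UI (assumed shape).
interface Display {
  component: "table";
  // Per-field presentation hints, keyed by field name.
  properties?: Record<string, { label?: string; format?: string }>;
}

interface View {
  title: string;
  transformation: Transformation;
  display: Display;
}

const overview: View = {
  title: "Overview",
  transformation: { fields: ["name", "price", "url"] },
  display: {
    component: "table",
    properties: {
      price: { label: "Price (USD)", format: "number" },
      url: { label: "Link", format: "link" },
    },
  },
};

// Hypothetical helper: in this sketch the column set comes from
// `transformation`, and `display` only supplies optional labels.
function columnLabels(view: View): string[] {
  return view.transformation.fields.map((field) => {
    const hint = view.display.properties ? view.display.properties[field] : undefined;
    return hint && hint.label ? hint.label : field;
  });
}

console.log(columnLabels(overview).join(", ")); // name, Price (USD), Link
```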

You have two choices of how to organize files within the `.actor` folder.

### Single configuration file
### Inline in actor.json
Contributor

Suggested change
Before:
### Inline in actor.json
After:
### Inline in `actor.json`
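The two layouts differ only in where the schema lives: `actor.json` either references a separate file or embeds the schema object. The sketch below models both as TypeScript objects. The `storages.dataset` location and the `./dataset_schema.json` path reflect our understanding of the `actor.json` format and should be treated as assumptions; the Actor name and the `describeDatasetStorage` helper are hypothetical.

```typescript
// Minimal stand-in for a dataset schema; see the reference table for all properties.
interface DatasetSchema {
  actorSpecification: 1;
  views: Record<string, { title: string; transformation: { fields: string[] } }>;
}

const inlineSchema: DatasetSchema = {
  actorSpecification: 1,
  views: { overview: { title: "Overview", transformation: { fields: ["name"] } } },
};

// Assumed layout A: actor.json references a separate file by relative path.
const actorJsonWithFile = {
  actorSpecification: 1,
  name: "my-actor", // hypothetical
  version: "0.1",
  storages: { dataset: "./dataset_schema.json" },
};

// Assumed layout B: the schema object is inlined in actor.json.
const actorJsonInline = {
  actorSpecification: 1,
  name: "my-actor", // hypothetical
  version: "0.1",
  storages: { dataset: inlineSchema },
};

// Hypothetical helper that tells the two layouts apart.
function describeDatasetStorage(dataset: string | DatasetSchema): string {
  return typeof dataset === "string"
    ? `separate file at ${dataset}`
    : `inline schema with views: ${Object.keys(dataset.views).join(", ")}`;
}

console.log(describeDatasetStorage(actorJsonWithFile.storages.dataset));
console.log(describeDatasetStorage(actorJsonInline.storages.dataset));
```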

}
```

The first view defined becomes the default tab.
Contributor

I would cut this; it is already stated at LOC 213.

}
```

With `unwind: ["reviews"]`, a product with 5 reviews becomes 5 rows in the output, each containing the product name plus one review's data.
Contributor

It is better to spell out small numbers.

Suggested change
Before:
With `unwind: ["reviews"]`, a product with 5 reviews becomes 5 rows in the output, each containing the product name plus one review's data.
After:
With `unwind: ["reviews"]`, a product with five reviews becomes five rows in the output, each containing the product name plus one review's data.
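The row expansion described in this sentence can be modeled in plain TypeScript. This is a sketch of the described semantics, not Apify's implementation: each array element becomes a separate row merged with the parent item, an object value is merged into its parent, and anything else is left unchanged. The handling of key collisions and empty arrays is an assumption, and `unwind`, `unwindField`, and the sample product are hypothetical.

```typescript
type Item = Record<string, unknown>;

function isPlainObject(value: unknown): value is Item {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Unwind a single field of one item.
function unwindField(item: Item, field: string): Item[] {
  const value = item[field];
  const parent: Item = { ...item };
  delete parent[field];
  if (Array.isArray(value)) {
    // One row per element. Assumptions: an empty array yields no rows,
    // and on key collisions the element's value wins.
    return value.map((element) =>
      isPlainObject(element) ? { ...parent, ...element } : { ...parent, [field]: element },
    );
  }
  if (isPlainObject(value)) {
    return [{ ...parent, ...value }]; // objects merge into the parent
  }
  return [item]; // missing or scalar: keep the item as-is
}

// Apply unwind fields in order, like `unwind: ["reviews"]`.
function unwind(items: Item[], fields: string[]): Item[] {
  return fields.reduce(
    (rows, field) => rows.reduce<Item[]>((acc, row) => acc.concat(unwindField(row, field)), []),
    items,
  );
}

const product = {
  name: "Desk lamp",
  reviews: [1, 2, 3, 4, 5].map((n) => ({ rating: n, text: `Review ${n}` })),
};

const rows = unwind([product], ["reviews"]);
console.log(rows.length); // 5
console.log(rows[0]); // { name: 'Desk lamp', rating: 1, text: 'Review 1' }
```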

Comment on lines +416 to +418
### Flatten in Actor code

Alternatively, flatten nested structures in your Actor code before calling `Actor.pushData()`.
Contributor

Single-sentence subsections are not good practice. I would recommend removing the H3 and folding this sentence into the previous section.
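If the flatten alternative is folded into the previous section, a short sketch may help readers picture it. The `flatten` helper below is hypothetical: it turns nested objects into dot-separated keys and leaves arrays alone. The `Actor.pushData()` call is left as a comment so the snippet runs without the Apify SDK installed; the dot-separated key convention is an assumption, not something the platform requires.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Hypothetical helper: { seller: { name: "Acme" } } becomes { "seller.name": "Acme" }.
// Arrays are kept as values; an empty nested object contributes no keys.
function flatten(value: JsonObject, prefix = ""): JsonObject {
  const out: JsonObject = {};
  for (const key of Object.keys(value)) {
    const child = value[key];
    const path = prefix ? `${prefix}.${key}` : key;
    if (child !== null && typeof child === "object" && !Array.isArray(child)) {
      const nested = flatten(child, path);
      for (const nestedKey of Object.keys(nested)) {
        out[nestedKey] = nested[nestedKey];
      }
    } else {
      out[path] = child;
    }
  }
  return out;
}

const scraped = {
  name: "Desk lamp",
  price: { amount: 29.99, currency: "USD" },
  seller: { name: "Acme", rating: 4.6 },
};

const flat = flatten(scraped);
console.log(flat);

// Inside an Actor, push the flat item instead of the nested one:
// await Actor.pushData(flat);
```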


| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `actorSpecification` | integer | true | Version of the dataset schema structure. Currently only version 1 is available. |
Contributor

Cleanup opportunity. We should avoid hedging features like this. Only v1 exists. The doc can be updated if and when other versions arrive.

Suggested change
Before:
| `actorSpecification` | integer | true | Version of the dataset schema structure. Currently only version 1 is available. |
After:
| `actorSpecification` | integer | true | Version of the dataset schema structure. Only version 1 is available. |


| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `fields` | string[] | true | Fields to include in the output. Order determines column order in the UI. Missing field values display as **undefined**. |
Contributor

I think this is a value, not a UI element, right?

Suggested change
Before:
| `fields` | string[] | true | Fields to include in the output. Order determines column order in the UI. Missing field values display as **undefined**. |
After:
| `fields` | string[] | true | Fields to include in the output. Order determines column order in the UI. Missing field values display as `undefined`. |
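The `fields` row combines two rules: the array order sets the column order, and an item that lacks a listed field shows `undefined` in that cell. A small projection function makes both concrete. It only models the table rendering described in this row; `projectRow` and the sample items are hypothetical.

```typescript
type Item = Record<string, unknown>;

// Hypothetical model of one table row: cells follow the view's `fields`
// order, and a key missing from the item reads as undefined.
function projectRow(item: Item, fields: string[]): string[] {
  return fields.map((field) => String(item[field]));
}

const viewFields = ["price", "name", "rating"]; // column order comes from this array
const items: Item[] = [
  { name: "Desk lamp", price: 29.99, rating: 4.5 },
  { name: "Floor lamp", price: 59 }, // no `rating`
];

for (const item of items) {
  console.log(projectRow(item, viewFields).join(" | "));
}
// 29.99 | Desk lamp | 4.5
// 59 | Floor lamp | undefined
```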
