Commit cfff7bb

docs: Add guide on validating Actor input with Pydantic (#941)
Adds a guide on validating Actor input with [Pydantic](https://docs.pydantic.dev/), so Actor code works with a typed, guaranteed-valid object instead of reaching into a raw input dictionary. Follows the structure of the existing guides.

- `docs/03_guides/11_pydantic.mdx`: the guide. It covers why raw-dict access is fragile, defining an input model (aliases bridging `camelCase`↔`snake_case`, defaults, constraints, a custom validator, `extra='ignore'`), validating with `model_validate` and failing fast on `ValidationError`, the relationship to the platform input schema, and a few extra features (`HttpUrl`/`EmailStr`, `model_validator`, `SecretStr`).
- `code/11_pydantic.py`: the runnable example Actor (Run on Apify), plus the `11_http_url.py`, `11_model_validator.py`, `11_raw_input.py`, and `11_secret_str.py` snippets.
- Cross-links added to the **Actor input** concepts page and the quick-start guides list.

Verified locally for valid input (passes validation, writes `OUTPUT`, exit 0) and invalid input (per-field summary keyed by input-schema alias, run ends FAILED). Lint + type-check pass.
1 parent e54d9f9 commit cfff7bb

17 files changed

Lines changed: 487 additions & 1 deletion

docs/01_introduction/quick-start.mdx

Lines changed: 1 addition & 0 deletions
@@ -109,3 +109,4 @@ To see how you can integrate the Apify SDK with popular web scraping libraries,
 - [Browser Use](../guides/browser-use)
 - [Running webserver](../guides/running-webserver)
 - [uv](../guides/uv)
+- [Validate Actor input with Pydantic](../guides/input-validation)

docs/02_concepts/02_actor_input.mdx

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ For example, if an Actor received a JSON input with two fields, `{ "firstNumber"
 {InputExample}
 </RunnableCodeBlock>
 
+## Validating input
+
+Reading values straight out of the raw input dictionary works for simple cases, but it gives you no type guarantees, no constraint checks, and no clear error when the input is malformed. For anything beyond a couple of fields, validate the input with [Pydantic](https://docs.pydantic.dev/). Your code then works with a typed, guaranteed-valid object instead. For the recommended approach, see [Validate Actor input with Pydantic](../guides/input-validation).
+
 ## Loading URLs from Actor input
 
 Actors commonly receive a list of URLs to process via their input. The <ApiLink to="class/ApifyRequestList">`ApifyRequestList`</ApiLink> class (from `apify.request_loaders`) can parse the standard Apify input format for URL sources. It supports both direct URL objects (`{"url": "https://example.com"}`) and remote URL lists (`{"requestsFromUrl": "https://example.com/urls.txt"}`), where the remote file contains one URL per line.

docs/03_guides/11_pydantic.mdx

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+---
+id: input-validation
+title: Input validation with Pydantic
+description: Parse, validate, and type your Actor's input with Pydantic models instead of reaching into a raw dictionary.
+---
+
+import CodeBlock from '@theme/CodeBlock';
+import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
+import ApiLink from '@theme/ApiLink';
+
+import RawInputExample from '!!raw-loader!roa-loader!./code/11_raw_input.py';
+import PydanticExample from '!!raw-loader!roa-loader!./code/11_pydantic.py';
+import HttpUrlExample from '!!raw-loader!./code/11_http_url.py';
+import ModelValidatorExample from '!!raw-loader!./code/11_model_validator.py';
+import SecretStrExample from '!!raw-loader!./code/11_secret_str.py';
+
+In this guide, you'll learn how to validate your Apify Actor's input with [Pydantic](https://docs.pydantic.dev/), so that your code works with a typed, guaranteed-valid object instead of a raw dictionary.
+
+## Introduction
+
+An Actor reads its input with <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, which returns the input record as a plain `dict`. Working with that dictionary directly is fragile:
+
+<RunnableCodeBlock className="language-python" language="python">
+{RawInputExample}
+</RunnableCodeBlock>
+
+- There are no type guarantees. `max_results` can arrive as the string `"10"` or `None` and you won't know until something breaks.
+- There's no validation. Nothing stops `max_results` from being `0` or `-5`, or `search_terms` from being empty.
+- A typo in a key, like `maxResult` instead of `maxResults`, silently falls back to the default instead of failing.
+- Defaults are scattered across the codebase, and your editor can't autocomplete the fields or catch mistakes.
+
+[Pydantic](https://docs.pydantic.dev/) solves all of these problems. You declare the shape of your input once as a model, and Pydantic parses the raw dictionary into a typed object, applies defaults, enforces constraints, and produces clear error messages when the input doesn't match.
+
+To use Pydantic, install it into your Actor's environment:
+
+```bash
+pip install pydantic
+```
+
+## Example Actor
+
+The following Actor declares its input as a Pydantic `BaseModel`, validates the raw input against it, and then works with a fully typed object. On invalid input it fails fast with a readable error. On valid input it logs the normalized values and stores them as the Actor's output.
+
+<RunnableCodeBlock className="language-python" language="python">
+{PydanticExample}
+</RunnableCodeBlock>
+
+### About the model
+
+- Apify input fields conventionally use camel case (`maxResults`), while Python attributes use snake case (`max_results`). Since every field follows that convention, `alias_generator=to_camel` derives the camel case alias for the whole model at once, instead of spelling out `Field(alias=...)` on each field. `populate_by_name=True` lets the model accept either spelling, which is handy in tests.
+- A field without a default (`search_terms`) is required. A field with a default (`max_results`) is optional. There's a single, obvious place where every default lives.
+- `ge=1, le=100` enforces a numeric range, `min_length=1` rejects an empty list, and `Literal['json', 'csv']` restricts a field to a fixed set of choices, mirroring an `enum` in the input schema.
+- The `field_validator` normalizes the search terms (trimming whitespace, dropping empties) and rejects input that has nothing left. The rest of your code never has to repeat those checks.
+- `extra='ignore'` means adding a new field to your input schema won't break an older Actor build that doesn't know about it yet. Use `extra='forbid'` instead if you prefer to reject anything unexpected.
+
+### About the validation
+
+- `model_validate` parses the raw dictionary into a typed `ActorInput` instance. It fills in defaults and guarantees every field is valid, or raises a `ValidationError` that describes every problem at once.
+- Catching that error, logging a readable summary, and re-raising makes the Actor fail fast with a clear explanation right at the start, rather than crashing with an obscure error somewhere deep in the run. Because the body runs inside `async with Actor:`, the re-raised exception automatically marks the run as `FAILED`.
+- The error messages refer to the fields by their input-schema aliases. For invalid input like `{"searchTerms": [], "maxResults": 999, "outputFormat": "xml"}`, the log shows exactly what's wrong:
+
+```text
+The Actor input is invalid:
+3 validation errors for ActorInput
+searchTerms
+  List should have at least 1 item after validation, not 0 ...
+maxResults
+  Input should be less than or equal to 100 ...
+outputFormat
+  Input should be 'json' or 'csv' ...
+```
+
+Once validation passes, the rest of `main` works with `actor_input.search_terms`, `actor_input.max_results`, and `actor_input.output_format`, all correctly typed, with editor autocompletion and static type checking.
+
+## Relationship to the input schema
+
+Pydantic validation complements the Actor's [input schema](https://docs.apify.com/platform/actors/development/input-schema) (`.actor/input_schema.json`). It doesn't replace it. The two serve different layers:
+
+- The input schema drives the [Apify Console](https://console.apify.com/) form, documents the fields for your users, and lets the platform validate input before the run even starts. Keep declaring your fields there.
+- The Pydantic model validates the input again inside your Python code, where it gives you a typed object, IDE support, and richer rules (normalization, cross-field checks, custom formats) that the input schema can't express. It's also your safety net for runs started programmatically by [another Actor](../concepts/interacting-with-other-actors) or executed [locally](https://docs.apify.com/cli/docs/reference#apify-run), and for keeping the two definitions honest with each other.
+
+Keep the model's aliases in sync with the field keys in `input_schema.json`, and the two definitions describe the same input from both sides.
+
+## Useful validation features
+
+Pydantic offers extra features for validating Actor input. For the full set of types, constraints, and validators, see the [Pydantic documentation](https://docs.pydantic.dev/latest/concepts/models/).
+
+### Format-validated types
+
+For common string formats, use format-validated types such as `HttpUrl` for URLs or `EmailStr` for e-mail addresses:
+
+<CodeBlock className="language-python">
+{HttpUrlExample}
+</CodeBlock>
+
+### Cross-field validation
+
+When one field's validity depends on another, use `model_validator`:
+
+<CodeBlock className="language-python">
+{ModelValidatorExample}
+</CodeBlock>
+
+### Secret input fields
+
+The SDK decrypts [secret input fields](https://docs.apify.com/platform/actors/development/secret-input) for you before <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> returns, so you receive plaintext. To keep them from leaking into logs or `model_dump()` output, wrap such fields in Pydantic's `SecretStr` and read the plaintext with `get_secret_value()` when you actually need it:
+
+<CodeBlock className="language-python">
+{SecretStrExample}
+</CodeBlock>
+
+## Conclusion
+
+In this guide, you learned how to validate Actor input with Pydantic: declaring the input as a model with aliases, defaults, and constraints, parsing the raw input with `model_validate`, failing fast with a readable error when the input is invalid, and working with a typed object for the rest of the run. To get started with your own Actors, see the [Actor templates](https://apify.com/templates/categories/python). If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy validating!
+
+## Additional resources
+
+- [Pydantic: Official documentation](https://docs.pydantic.dev/)
+- [Pydantic: Models](https://docs.pydantic.dev/latest/concepts/models/)
+- [Pydantic: Validators](https://docs.pydantic.dev/latest/concepts/validators/)
+- [Apify: Actor input](https://docs.apify.com/platform/actors/running/input)
+- [Apify: Input schema specification](https://docs.apify.com/platform/actors/development/input-schema)
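
The guide's advice to keep the model's aliases in sync with the keys in `input_schema.json` could be checked mechanically. The following is a minimal sketch and not part of this commit: `unmatched_keys` is a hypothetical helper, the model is a trimmed copy of the guide's `ActorInput`, and the inline `schema` dict (including its `proxy` field) is an invented stand-in for a parsed `.actor/input_schema.json`. It relies on `model_json_schema()` listing properties by alias, which is Pydantic's default.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field
from pydantic.alias_generators import to_camel


class ActorInput(BaseModel):
    """Trimmed copy of the guide's model (custom validator omitted)."""

    model_config = ConfigDict(populate_by_name=True, extra='ignore', alias_generator=to_camel)

    search_terms: list[str] = Field(min_length=1)
    max_results: int = Field(default=10, ge=1, le=100)
    output_format: Literal['json', 'csv'] = 'json'


def unmatched_keys(model: type[BaseModel], schema: dict) -> tuple[set[str], set[str]]:
    """Hypothetical helper: return (keys only in the model, keys only in the schema)."""
    model_keys = set(model.model_json_schema()['properties'])
    schema_keys = set(schema.get('properties', {}))
    return model_keys - schema_keys, schema_keys - model_keys


# Invented stand-in for json.loads(Path('.actor/input_schema.json').read_text()).
schema = {
    'title': 'Example input',
    'type': 'object',
    'schemaVersion': 1,
    'properties': {
        'searchTerms': {'title': 'Search terms', 'type': 'array', 'editor': 'stringList'},
        'maxResults': {'title': 'Max results', 'type': 'integer'},
        'proxy': {'title': 'Proxy', 'type': 'object', 'editor': 'proxy'},
    },
}

only_in_model, only_in_schema = unmatched_keys(ActorInput, schema)
print('Missing from input_schema.json:', sorted(only_in_model))
print('Missing from the model:', sorted(only_in_schema))
```

A check like this could run in a unit test. With `extra='ignore'`, keys that exist only in the schema are dropped silently at validation time, so a test is one of the few places where such drift would surface.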

docs/03_guides/code/11_http_url.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+from pydantic import BaseModel, EmailStr, HttpUrl
+
+
+class ActorInput(BaseModel):
+    target_url: HttpUrl
+    # `EmailStr` needs the `pydantic[email]` extra installed.
+    contact_email: EmailStr
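
To see what a format-validated type adds over a plain `str`, the snippet above can be exercised as follows. This is an illustrative sketch rather than part of the commit: it keeps only the `HttpUrl` field so that it runs without the `pydantic[email]` extra, and the candidate URLs are made up.

```python
from pydantic import BaseModel, HttpUrl, ValidationError


class ActorInput(BaseModel):
    target_url: HttpUrl


# A well-formed http(s) URL is parsed into a URL object with structured parts.
valid = ActorInput.model_validate({'target_url': 'https://example.com/products'})
print(valid.target_url.host)

# Anything that is not an http(s) URL is rejected at validation time.
rejected = []
for candidate in ('not a url', 'ftp://example.com/file'):
    try:
        ActorInput.model_validate({'target_url': candidate})
    except ValidationError as exc:
        rejected.append((candidate, exc.errors()[0]['msg']))

for candidate, message in rejected:
    print(f'{candidate!r} rejected: {message}')
```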

docs/03_guides/code/11_model_validator.py

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+from typing import Self
+
+from pydantic import BaseModel, model_validator
+
+
+class ActorInput(BaseModel):
+    min_price: int = 0
+    max_price: int = 100
+
+    @model_validator(mode='after')
+    def _check_range(self) -> Self:
+        if self.min_price > self.max_price:
+            raise ValueError('min_price must not exceed max_price')
+        return self
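
A short usage sketch shows how a failed cross-field check surfaces. It is not part of the commit. The model is re-declared with a string return annotation instead of `typing.Self` so that the example also runs on Python versions older than 3.11, and the input values are made up.

```python
from pydantic import BaseModel, ValidationError, model_validator


class ActorInput(BaseModel):
    min_price: int = 0
    max_price: int = 100

    @model_validator(mode='after')
    def _check_range(self) -> 'ActorInput':
        # Runs after both fields have been validated individually.
        if self.min_price > self.max_price:
            raise ValueError('min_price must not exceed max_price')
        return self


# A consistent range passes and keeps the provided values.
ok = ActorInput.model_validate({'min_price': 10, 'max_price': 50})

# An inverted range fails with a single model-level error.
errors = []
try:
    ActorInput.model_validate({'min_price': 50, 'max_price': 10})
except ValidationError as exc:
    errors = exc.errors()

for error in errors:
    # A model-level error has an empty location because it belongs to no single field.
    print(error['loc'], error['msg'])
```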

docs/03_guides/code/11_pydantic.py

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+import asyncio
+from typing import Literal
+
+from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator
+from pydantic.alias_generators import to_camel
+
+from apify import Actor
+
+
+class ActorInput(BaseModel):
+    """Typed and validated representation of the Actor input."""
+
+    # Derive each field's camelCase alias (searchTerms, maxResults, ...) automatically;
+    # accept both spellings and ignore extras.
+    model_config = ConfigDict(
+        populate_by_name=True, extra='ignore', alias_generator=to_camel
+    )
+
+    # Required: non-empty list of search terms (normalized below).
+    search_terms: list[str] = Field(min_length=1)
+
+    # Optional: 1-100, defaults to 10.
+    max_results: int = Field(default=10, ge=1, le=100)
+
+    # Optional: restricted to a fixed set of choices.
+    output_format: Literal['json', 'csv'] = Field(default='json')
+
+    @field_validator('search_terms')
+    @classmethod
+    def _normalize_terms(cls, value: list[str]) -> list[str]:
+        # Trim whitespace and drop empty terms.
+        cleaned = [term.strip() for term in value if term.strip()]
+        if not cleaned:
+            raise ValueError('searchTerms must contain at least one non-empty term')
+        return cleaned
+
+
+async def main() -> None:
+    async with Actor:
+        # Read the raw input (a plain dict, not yet validated).
+        raw_input = await Actor.get_input() or {}
+
+        # Validate the raw input against the model.
+        try:
+            actor_input = ActorInput.model_validate(raw_input)
+        except ValidationError as exc:
+            # Log a per-field summary, then re-raise to fail the run.
+            Actor.log.error('The Actor input is invalid:\n%s', exc)
+            raise
+
+        # Work with typed attributes from here on.
+        Actor.log.info('Input passed validation: %s', actor_input.model_dump())
+
+        max_results = actor_input.max_results
+        for term in actor_input.search_terms:
+            Actor.log.info('Processing %r (max %d results)', term, max_results)
+
+        # Store the normalized input as output.
+        await Actor.set_value('OUTPUT', actor_input.model_dump())
+
+
+if __name__ == '__main__':
+    asyncio.run(main())
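
The model's behaviour can be tried without the Apify SDK or an Actor run. The sketch below is an illustration under stated assumptions, not part of the commit: it copies the `ActorInput` model from the file above, `summarize` is a hypothetical helper, and the sample inputs (including the `debugMode` key) are made up. It shows alias handling, normalization, and the per-field errors available from `ValidationError.errors()`.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator
from pydantic.alias_generators import to_camel


class ActorInput(BaseModel):
    model_config = ConfigDict(populate_by_name=True, extra='ignore', alias_generator=to_camel)

    search_terms: list[str] = Field(min_length=1)
    max_results: int = Field(default=10, ge=1, le=100)
    output_format: Literal['json', 'csv'] = Field(default='json')

    @field_validator('search_terms')
    @classmethod
    def _normalize_terms(cls, value: list[str]) -> list[str]:
        cleaned = [term.strip() for term in value if term.strip()]
        if not cleaned:
            raise ValueError('searchTerms must contain at least one non-empty term')
        return cleaned


def summarize(exc: ValidationError) -> list[str]:
    """Hypothetical helper: one 'field: message' line per validation error."""
    return ['.'.join(str(part) for part in err['loc']) + ': ' + err['msg'] for err in exc.errors()]


# camelCase keys as the platform delivers them; the unknown key is ignored.
from_platform = ActorInput.model_validate({'searchTerms': ['  apify ', ''], 'debugMode': True})

# snake_case keys are accepted as well because of populate_by_name=True.
from_test = ActorInput.model_validate({'search_terms': ['apify']})

# Dump with aliases to get back the input-schema spelling.
print(from_platform.model_dump(by_alias=True))

summary = []
try:
    ActorInput.model_validate({'searchTerms': [], 'maxResults': 999, 'outputFormat': 'xml'})
except ValidationError as exc:
    summary = summarize(exc)

print('\n'.join(summary))
```

Iterating over `exc.errors()` rather than logging `str(exc)` is a design choice: it gives control over the wording of each line, at the cost of losing Pydantic's default formatting.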

docs/03_guides/code/11_raw_input.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+import asyncio
+
+from apify import Actor
+
+
+async def main() -> None:
+    # Enter the context of the Actor.
+    async with Actor:
+        # Read the input and reach into the raw dict.
+        actor_input = await Actor.get_input() or {}
+        search_terms = actor_input.get('searchTerms', [])
+        max_results = actor_input.get('maxResults', 10)
+
+        Actor.log.info('search_terms=%s, max_results=%s', search_terms, max_results)
+
+
+if __name__ == '__main__':
+    asyncio.run(main())
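
The fragility the guide attributes to raw-dict access can be reproduced with a plain dictionary and no SDK at all. In this small sketch, which is not part of the commit, the `raw_input` value is a made-up example with a misspelled key, a string where a number is expected, and an empty list.

```python
# Made-up raw input: 'maxResult' is a typo for 'maxResults', and its value is a string.
raw_input = {'searchTerms': [], 'maxResult': '5'}

# The misspelled key is never read, so the default silently wins.
max_results = raw_input.get('maxResults', 10)

# Nothing objects to an empty list of search terms either.
search_terms = raw_input.get('searchTerms', [])

print(f'search_terms={search_terms}, max_results={max_results}')
```

The code runs without any error, which is the problem: the caller asked for 5 results and gets the default of 10, with no search terms to process.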

docs/03_guides/code/11_secret_str.py

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+from pydantic import BaseModel, SecretStr
+
+
+class ActorInput(BaseModel):
+    # Masked in logs and `model_dump()`; read the plaintext with `get_secret_value()`.
+    api_token: SecretStr
+
+
+actor_input = ActorInput.model_validate({'api_token': 'my-secret-token'})
+token = actor_input.api_token.get_secret_value()
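
The masking can be observed directly. The following sketch, which is not part of the commit, reuses the model and the placeholder token from the snippet above and checks the places where a secret typically leaks: string conversion, the model's `repr`, and JSON serialization.

```python
from pydantic import BaseModel, SecretStr


class ActorInput(BaseModel):
    api_token: SecretStr


actor_input = ActorInput.model_validate({'api_token': 'my-secret-token'})

# String conversion, repr, and JSON output all show a mask instead of the value.
masked = str(actor_input.api_token)
model_repr = repr(actor_input)
as_json = actor_input.model_dump_json()

# The plaintext is available only through an explicit call.
plaintext = actor_input.api_token.get_secret_value()

print(masked)
print(model_repr)
print(as_json)
```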

src/apify/_actor.py

Lines changed: 9 additions & 1 deletion
@@ -699,7 +699,15 @@ async def push_data(self, data: dict | list[dict], *, charged_event_name: str |
 
     @_ensure_context
     async def get_input(self) -> Any:
-        """Get the Actor input value from the default key-value store associated with the current Actor run."""
+        """Get the Actor input value from the default key-value store associated with the current Actor run.
+
+        The input is the deserialized contents of the input record (the `INPUT` key by default), so it is typically
+        a `dict` keyed by the fields declared in the Actor's input schema. Any secret input fields are decrypted to
+        plaintext before being returned.
+
+        Returns:
+            The Actor input, usually a `dict` of input fields, or `None` if the Actor has no input.
+        """
         input_value = await self.get_value(self.configuration.input_key)
         input_secrets_private_key = self.configuration.input_secrets_private_key_file
         input_secrets_key_passphrase = self.configuration.input_secrets_private_key_passphrase

website/versioned_docs/version-3.4/01_introduction/quick-start.mdx

Lines changed: 1 addition & 0 deletions
@@ -109,3 +109,4 @@ To see how you can integrate the Apify SDK with popular web scraping libraries,
 - [Browser Use](../guides/browser-use)
 - [Running webserver](../guides/running-webserver)
 - [uv](../guides/uv)
+- [Validate Actor input with Pydantic](../guides/input-validation)
