Commit dcdee59

fix: address apify-client v3 adaptation review findings
- new_client: scale max timeout tier 12x (was 24x); drop private apify_client._consts import via conditional kwargs
- call/call_task: annotate -> Run (they never return None)
- key-value store: fix error message copied from the dataset client
- Webhook: drop description/should_interpolate_strings (not in the v4 ad-hoc representation); restore request_url validation
- charging: report tier-priced charge attempts accurately instead of as "unknown event"
- storage clients: drop redundant int() coercions and a RequestQueueStats round-trip
- re-export ActorEnvVars/ApifyEnvVars from apify; drop redundant pydantic[email] extra
- docs: refresh upgrading guide and CLAUDE.md; fix stale test annotations
1 parent 00bbca2 commit dcdee59

13 files changed

Lines changed: 78 additions & 57 deletions

.rules.md

Lines changed: 6 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ This file provides guidance to programming agents when working with code in this
44

55
## Project Overview
66

7-
The Apify SDK for Python (`apify` package on PyPI) is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides Actor lifecycle management, storage access (datasets, key-value stores, request queues), event handling, proxy configuration, and pay-per-event charging. It builds on top of the [Crawlee](https://crawlee.dev/python) web scraping framework and the [Apify API Client](https://docs.apify.com/api/client/python). Supports Python 3.10–3.14. Build system: hatchling.
7+
The Apify SDK for Python (`apify` package on PyPI) is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides Actor lifecycle management, storage access (datasets, key-value stores, request queues), event handling, proxy configuration, and pay-per-event charging. It builds on top of the [Crawlee](https://crawlee.dev/python) web scraping framework and the [Apify API Client](https://docs.apify.com/api/client/python). Supports Python 3.11–3.14. Build system: hatchling.
88

99
## Common Commands
1010

@@ -46,7 +46,7 @@ uv run poe e2e-tests
4646
## Code Style
4747

4848
- **Formatter/Linter**: Ruff (line length 120, single quotes for inline, double quotes for docstrings)
49-
- **Type checker**: ty (targets Python 3.10)
49+
- **Type checker**: ty (targets Python 3.11)
5050
- **All ruff rules enabled** with specific ignores — see `pyproject.toml` `[tool.ruff.lint]` for the full ignore list
5151
- Tests are exempt from docstring rules (`D`), assert warnings (`S101`), and private member access (`SLF001`)
5252
- Unused imports are allowed in `__init__.py` files (re-exports)
@@ -71,7 +71,7 @@ uv run poe e2e-tests
7171

7272
- **`_proxy_configuration.py`** — `ProxyConfiguration` manages Apify proxy setup (residential, datacenter, groups, country targeting).
7373

74-
- **`_models.py`** — Pydantic models for API data structures (Actor runs, webhooks, pricing info, etc.).
74+
- **`_webhook.py`** — The `Webhook` dataclass (ad-hoc / persistent webhook definition) and the `to_client_representations` helper. Response and data models are no longer defined in the SDK — they come from `apify-client` v3 (e.g. `Run`, the Actor pricing-info models).
7575

7676
### Storage Clients (`src/apify/storage_clients/`)
7777

@@ -101,8 +101,9 @@ Optional integration (`apify[scrapy]` extra) providing Scrapy scheduler, middlew
101101
### Key Dependencies
102102

103103
- **`crawlee`** — Base framework providing storage abstractions, event system, configuration, service locator pattern
104-
- **`apify-client`** — HTTP client for the Apify API (`ApifyClientAsync`)
105-
- **`apify-shared`** — Shared constants and utilities (`ApifyEnvVars`, `ActorEnvVars`, etc.)
104+
- **`apify-client`** — HTTP client for the Apify API (`ApifyClientAsync`); also the source of response and data models (`Run`, pricing info, webhook representations)
105+
106+
The SDK no longer depends on `apify-shared`. The platform env-var enums (`ApifyEnvVars`, `ActorEnvVars`) are vendored in `apify._consts` and re-exported from the top-level `apify` package.
106107

107108
## Testing
108109

docs/04_upgrading/upgrading_to_v4.md

Lines changed: 18 additions & 2 deletions
@@ -55,6 +55,22 @@ The deprecated `latest_sdk_version`, `log_format`, and `standby_port` fields hav
5555

5656
The SDK is now built on [`apify-client`](https://docs.apify.com/api/client/python) v3 and no longer depends on `apify-shared`. The sections below cover the user-visible consequences; see the client's [Upgrading to v3](https://docs.apify.com/api/client/python/docs/upgrading/upgrading-to-v3) guide for the full list of changes in the client itself.
5757

58+
### Environment variable enums moved
59+
60+
If you imported the platform environment-variable enums from `apify_shared.consts` (`ApifyEnvVars`, `ActorEnvVars`), import them from `apify` instead — they are now vendored in the SDK and re-exported from the top-level package.
61+
62+
**Before (v3.x):**
63+
64+
```python
65+
from apify_shared.consts import ApifyEnvVars
66+
```
67+
68+
**Now (v4.0):**
69+
70+
```python
71+
from apify import ApifyEnvVars
72+
```
73+
5874
## Typed responses
5975

6076
`Actor.start`, `Actor.abort`, `Actor.call`, and `Actor.call_task` now return `apify_client._models.Run` instead of the SDK-side `ActorRun`. Both are [Pydantic](https://docs.pydantic.dev/latest/) models with the same snake_case fields, so field access is unchanged — only the type and import path differ. The SDK no longer ships its own response models (`apify._models` has been removed); response shapes come from `apify-client`.
@@ -89,7 +105,7 @@ The Actor pricing-info models exposed through `Actor.configuration.actor_pricing
89105

90106
## `Webhook` API simplified
91107

92-
The `Webhook` model has been slimmed down to only the fields a user sets when defining a webhook. Server-populated response fields (`id`, `created_at`, `modified_at`, `user_id`, `is_ad_hoc`, `condition`, `last_dispatch`, `stats`) and the unused `WebhookCondition` helper class have been removed. `Webhook` is now a plain `@dataclass` instead of a Pydantic `BaseModel` — construct it with snake_case kwargs; `.model_dump()` / `.model_validate()` are gone.
108+
The `Webhook` model has been slimmed down to only the fields a user sets when defining a webhook. Server-populated response fields (`id`, `created_at`, `modified_at`, `user_id`, `is_ad_hoc`, `condition`, `last_dispatch`, `stats`) and the unused `WebhookCondition` helper class have been removed. The `description` and `should_interpolate_strings` fields have also been removed — they are not part of the ad-hoc webhook representation (`event_types`, `request_url`, `payload_template`, `headers_template`) that `Actor.start` / `Actor.call` / `Actor.call_task` and `Actor.add_webhook` now send. `Webhook` is now a plain `@dataclass` instead of a Pydantic `BaseModel` — construct it with snake_case kwargs; `.model_dump()` / `.model_validate()` are gone.
93109

94110
The retry and idempotency kwargs that used to live on `Actor.add_webhook` have moved onto the `Webhook` instance itself.
95111

@@ -130,7 +146,7 @@ The `webhooks` argument on `Actor.start`, `Actor.call`, and `Actor.call_task` st
130146

131147
## `Actor.new_client` `timeout` scales all tiers
132148

133-
`apify-client` v3 split its single timeout into four tiers (short / medium / long / max). `Actor.new_client(timeout=...)` still takes a single `timedelta`; the SDK uses it as the medium-tier baseline and scales the other tiers proportionally (short = `timeout / 6`, long = `timeout * 12`, max = `timeout * 24`). The public signature is unchanged — no migration needed.
149+
`apify-client` v3 split its single timeout into four tiers (short / medium / long / max). `Actor.new_client(timeout=...)` still takes a single `timedelta`; the SDK uses it as the medium-tier baseline and scales the other tiers proportionally (short = `timeout / 6`, long = `timeout * 12`, max = `timeout * 12`). The public signature is unchanged — no migration needed.
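
The tier arithmetic described above can be sketched in isolation. This is a minimal illustration, not SDK code: `scale_timeout_tiers` is a hypothetical helper name, and only the ratios (short = baseline / 6, long = max = baseline * 12) come from the guide text.

```python
from __future__ import annotations

from datetime import timedelta


def scale_timeout_tiers(timeout: timedelta) -> dict[str, timedelta]:
    """Derive the four tier values from a single medium-tier baseline.

    Hypothetical helper; the key names mirror the keyword arguments shown in the diff.
    """
    return {
        'timeout_short': timeout / 6,
        'timeout_medium': timeout,
        'timeout_long': timeout * 12,
        'timeout_max': timeout * 12,  # same multiplier as the long tier after this commit
    }


tiers = scale_timeout_tiers(timedelta(seconds=30))
for name, value in tiers.items():
    print(f'{name}: {value.total_seconds():.0f}s')
```

With a 30-second baseline the short tier comes out at 5 seconds and both the long and max tiers at 360 seconds.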
134150

135151
## Using the client from `Actor.new_client`
136152

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ dependencies = [
4141
"impit>=0.8.0",
4242
"lazy-object-proxy>=1.11.0",
4343
"more_itertools>=10.2.0",
44-
"pydantic[email]>=2.11.0",
44+
"pydantic>=2.11.0",
4545
"typing-extensions>=4.1.0",
4646
"websockets>=14.0",
4747
"yarl>=1.18.0",

src/apify/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,7 @@
1414

1515
from apify._actor import Actor
1616
from apify._configuration import Configuration
17+
from apify._consts import ActorEnvVars, ApifyEnvVars
1718
from apify._proxy_configuration import ProxyConfiguration, ProxyInfo
1819
from apify._webhook import Webhook
1920
from apify.events._types import ActorEventTypes
@@ -22,7 +23,9 @@
2223

2324
__all__ = [
2425
'Actor',
26+
'ActorEnvVars',
2527
'ActorEventTypes',
28+
'ApifyEnvVars',
2629
'Configuration',
2730
'Event',
2831
'EventAbortingData',

src/apify/_actor.py

Lines changed: 23 additions & 26 deletions
@@ -13,14 +13,6 @@
1313
from pydantic import AliasChoices
1414

1515
from apify_client import ApifyClientAsync
16-
from apify_client._consts import (
17-
DEFAULT_MAX_RETRIES,
18-
DEFAULT_MIN_DELAY_BETWEEN_RETRIES,
19-
DEFAULT_TIMEOUT_LONG,
20-
DEFAULT_TIMEOUT_MAX,
21-
DEFAULT_TIMEOUT_MEDIUM,
22-
DEFAULT_TIMEOUT_SHORT,
23-
)
2416
from crawlee import service_locator
2517
from crawlee.errors import ServiceConflictError
2618
from crawlee.events import (
@@ -516,20 +508,28 @@ def new_client(
516508
timeout: Baseline HTTP timeout for medium-duration API operations. The underlying client uses
517509
separate timeout tiers for short/medium/long/max-duration calls; passing a value here scales
518510
all four tiers proportionally (short = `timeout / 6`, long = `timeout * 12`,
519-
max = `timeout * 24`).
511+
max = `timeout * 12`).
520512
"""
521-
return ApifyClientAsync(
522-
token=token or self.configuration.token,
523-
api_url=api_url or self.configuration.api_base_url,
524-
max_retries=max_retries if max_retries is not None else DEFAULT_MAX_RETRIES,
525-
min_delay_between_retries=min_delay_between_retries
526-
if min_delay_between_retries is not None
527-
else DEFAULT_MIN_DELAY_BETWEEN_RETRIES,
528-
timeout_short=timeout / 6 if timeout is not None else DEFAULT_TIMEOUT_SHORT,
529-
timeout_medium=timeout if timeout is not None else DEFAULT_TIMEOUT_MEDIUM,
530-
timeout_long=timeout * 12 if timeout is not None else DEFAULT_TIMEOUT_LONG,
531-
timeout_max=timeout * 24 if timeout is not None else DEFAULT_TIMEOUT_MAX,
532-
)
513+
# Forward only the explicitly provided options; omitting the rest lets `ApifyClientAsync` apply its
514+
# own defaults, so the SDK doesn't have to import and re-pass the client's private default constants.
515+
client_kwargs: dict[str, Any] = {
516+
'token': token or self.configuration.token,
517+
'api_url': api_url or self.configuration.api_base_url,
518+
}
519+
if max_retries is not None:
520+
client_kwargs['max_retries'] = max_retries
521+
if min_delay_between_retries is not None:
522+
client_kwargs['min_delay_between_retries'] = min_delay_between_retries
523+
if timeout is not None:
524+
# `apify-client` v3 splits the timeout into four tiers; scale them from the single baseline,
525+
# mirroring the client's default ratios (medium = baseline, short = baseline / 6,
526+
# long = max = baseline * 12).
527+
client_kwargs['timeout_short'] = timeout / 6
528+
client_kwargs['timeout_medium'] = timeout
529+
client_kwargs['timeout_long'] = timeout * 12
530+
client_kwargs['timeout_max'] = timeout * 12
531+
532+
return ApifyClientAsync(**client_kwargs)
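
Reviewer note: the conditional-kwargs pattern used in this hunk can be shown on its own. The sketch below uses a hypothetical stand-in constructor (`fake_client`) with made-up defaults (4 retries, 500 ms); it is not `ApifyClientAsync` and those defaults are assumptions for illustration only.

```python
from __future__ import annotations

from typing import Any


def fake_client(*, token: str | None = None, max_retries: int = 4, min_delay_ms: int = 500) -> dict[str, Any]:
    """Hypothetical stand-in for a client constructor that owns its own defaults."""
    return {'token': token, 'max_retries': max_retries, 'min_delay_ms': min_delay_ms}


def new_client_sketch(
    token: str | None = None,
    max_retries: int | None = None,
    min_delay_ms: int | None = None,
) -> dict[str, Any]:
    """Forward only the options the caller actually set."""
    kwargs: dict[str, Any] = {'token': token}
    # `is not None` (rather than truthiness) keeps an explicit 0 from being dropped.
    if max_retries is not None:
        kwargs['max_retries'] = max_retries
    if min_delay_ms is not None:
        kwargs['min_delay_ms'] = min_delay_ms
    return fake_client(**kwargs)


print(new_client_sketch(token='t'))        # callee defaults apply
print(new_client_sketch(max_retries=0))    # explicit zero is forwarded
```

The wrapper never needs to know the callee's default values, which is what lets the commit drop the private constants import.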
533533

534534
@_ensure_context
535535
async def open_dataset(
@@ -998,7 +998,7 @@ async def call(
998998
webhooks: list[Webhook] | None = None,
999999
wait: timedelta | None = None,
10001000
logger: logging.Logger | None | Literal['default'] = 'default',
1001-
) -> Run | None:
1001+
) -> Run:
10021002
"""Start an Actor on the Apify Platform and wait for it to finish before returning.
10031003
10041004
It waits indefinitely, unless the wait argument is provided.
@@ -1076,7 +1076,7 @@ async def call_task(
10761076
webhooks: list[Webhook] | None = None,
10771077
wait: timedelta | None = None,
10781078
token: str | None = None,
1079-
) -> Run | None:
1079+
) -> Run:
10801080
"""Start an Actor task on the Apify Platform and wait for it to finish before returning.
10811081
10821082
It waits indefinitely, unless the wait argument is provided.
@@ -1260,9 +1260,6 @@ async def add_webhook(self, webhook: Webhook, *, idempotency_key: str | None = N
12601260
webhook: The webhook to be added. It is automatically bound to the current Actor run.
12611261
idempotency_key: Deprecated. Pass `idempotency_key` on the `Webhook` instance instead.
12621262
Will be removed in version 5.0.0.
1263-
1264-
Returns:
1265-
The created webhook.
12661263
"""
12671264
if idempotency_key is not None:
12681265
warnings.warn(

src/apify/_charging.py

Lines changed: 9 additions & 1 deletion
@@ -262,6 +262,7 @@ def __init__(self, configuration: Configuration, client: ApifyClientAsync) -> No
262262

263263
self._charging_state: dict[str, ChargingStateItem] = {}
264264
self._pricing_info: dict[str, PricingInfoItem] = {}
265+
self._tier_priced_events: set[str] = set()
265266

266267
self._not_ppe_warning_printed = False
267268
self.active = False
@@ -297,7 +298,10 @@ async def __aenter__(self) -> None:
297298
actor_charge_events = pricing_info.pricing_per_event.actor_charge_events or {}
298299
for event_name, event_pricing in actor_charge_events.items():
299300
if event_pricing.event_price_usd is None:
300-
continue # tier-priced event - not chargeable via the SDK's flat-price path
301+
# Tier-priced event - not chargeable via the SDK's flat-price path; tracked so a later
302+
# charge attempt is reported accurately rather than as an "unknown event".
303+
self._tier_priced_events.add(event_name)
304+
continue
301305
self._pricing_info[event_name] = PricingInfoItem(
302306
price=Decimal(str(event_pricing.event_price_usd)),
303307
title=event_pricing.event_title,
@@ -401,6 +405,10 @@ async def charge(self, event_name: str, count: int = 1) -> ChargeResult:
401405
pass
402406
elif event_name in self._pricing_info:
403407
await self._client.run(self._actor_run_id).charge(event_name, count=charged_count)
408+
elif event_name in self._tier_priced_events:
409+
logger.warning(
410+
f"Event '{event_name}' is tier-priced and is not chargeable via the pay-per-event API."
411+
)
404412
else:
405413
logger.warning(f"Attempting to charge for an unknown event '{event_name}'")
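
Reviewer note: the three-way outcome introduced here (flat-priced, tier-priced, unknown) can be sketched without the SDK. The function names and the plain `dict[str, float | None]` input are simplifications assumed for illustration; the real code reads `event_price_usd` from pricing-info models.

```python
from __future__ import annotations

from decimal import Decimal


def split_events(event_prices: dict[str, float | None]) -> tuple[dict[str, Decimal], set[str]]:
    """Separate flat-priced events from tier-priced ones (price is None)."""
    flat: dict[str, Decimal] = {}
    tiered: set[str] = set()
    for name, price in event_prices.items():
        if price is None:
            tiered.add(name)  # remembered so a later charge attempt is not reported as unknown
            continue
        flat[name] = Decimal(str(price))
    return flat, tiered


def describe_charge_attempt(event_name: str, flat: dict[str, Decimal], tiered: set[str]) -> str:
    """Classify a charge attempt the way the updated branch order does."""
    if event_name in flat:
        return 'charged'
    if event_name in tiered:
        return 'tier-priced'
    return 'unknown'


flat, tiered = split_events({'page-scraped': 0.002, 'premium-export': None})
for event in ('page-scraped', 'premium-export', 'typo-event'):
    print(event, '->', describe_charge_attempt(event, flat, tiered))
```

The event names are invented; the point is only that a `None` price is tracked separately instead of being silently skipped.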
406414

src/apify/_configuration.py

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ def _parse_actor_pricing_info(data: Any) -> Any:
8181
if data is None or data == '':
8282
return None
8383
pricing_info = json.loads(data) if isinstance(data, str) else data
84-
if isinstance(pricing_info, dict) and not pricing_info.get('pricingModel'):
84+
if isinstance(pricing_info, dict) and not (pricing_info.get('pricingModel') or pricing_info.get('pricing_model')):
8585
return None
8686
return pricing_info
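
Reviewer note: a standalone copy of the logic visible in this hunk makes the effect of the extra key check easier to see. The function is renamed `parse_actor_pricing_info_sketch` to mark it as an illustration; the inputs below are assumed examples.

```python
from __future__ import annotations

import json
from typing import Any


def parse_actor_pricing_info_sketch(data: Any) -> Any:
    """Return the pricing info, or None when it is empty or carries no pricing model."""
    if data is None or data == '':
        return None
    pricing_info = json.loads(data) if isinstance(data, str) else data
    # Accept the key under either its camelCase alias or its snake_case field name.
    if isinstance(pricing_info, dict) and not (
        pricing_info.get('pricingModel') or pricing_info.get('pricing_model')
    ):
        return None
    return pricing_info


print(parse_actor_pricing_info_sketch('{"pricingModel": "PAY_PER_EVENT"}'))
print(parse_actor_pricing_info_sketch({'pricing_model': 'PAY_PER_EVENT'}))
print(parse_actor_pricing_info_sketch({'unrelated': 1}))
```

Before the change, the second call would have been treated as missing a pricing model and discarded.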
8787

src/apify/_webhook.py

Lines changed: 4 additions & 5 deletions
@@ -4,6 +4,7 @@
44
from typing import TYPE_CHECKING
55

66
from apify_client._models import WebhookRepresentation
7+
from crawlee._utils.urls import validate_http_url
78

89
from apify._utils import docs_group
910

@@ -41,11 +42,9 @@ class Webhook:
4142
do_not_retry: bool | None = None
4243
"""Whether to skip retrying the request on failure."""
4344

44-
description: str | None = None
45-
"""Human-readable description of the webhook."""
46-
47-
should_interpolate_strings: bool | None = None
48-
"""Whether to interpolate variables in string fields of the payload."""
45+
def __post_init__(self) -> None:
46+
# Fail fast on a malformed URL at construction time instead of deferring the error to the API call.
47+
validate_http_url(self.request_url)
4948

5049

5150
def to_client_representations(webhooks: list[Webhook] | None) -> list[WebhookRepresentation] | None:
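
Reviewer note: the fail-fast validation added in `__post_init__` can be illustrated with the standard library alone. This sketch swaps in `urllib.parse.urlparse` for crawlee's `validate_http_url`, whose exact behavior is not shown in the diff, so the accepted and rejected inputs here are assumptions. `WebhookSketch` is a hypothetical, trimmed-down dataclass, and `event_types` is simplified to plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class WebhookSketch:
    """Hypothetical two-field stand-in for the SDK's `Webhook` dataclass."""

    event_types: list[str]
    request_url: str

    def __post_init__(self) -> None:
        # Reject anything that is not an absolute http(s) URL at construction time.
        parsed = urlparse(self.request_url)
        if parsed.scheme not in ('http', 'https') or not parsed.netloc:
            raise ValueError(f'Invalid HTTP URL: {self.request_url!r}')


valid = WebhookSketch(event_types=['ACTOR.RUN.SUCCEEDED'], request_url='https://example.com/hook')
print(valid.request_url)

try:
    WebhookSketch(event_types=['ACTOR.RUN.SUCCEEDED'], request_url='not-a-url')
except ValueError as exc:
    print('rejected:', exc)
```

Validating in `__post_init__` means the error surfaces where the webhook is defined rather than later, when the API call is made.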

src/apify/storage_clients/_apify/_dataset_client.py

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ async def get_metadata(self) -> DatasetMetadata:
6969
created_at=metadata.created_at,
7070
modified_at=metadata.modified_at,
7171
accessed_at=metadata.accessed_at,
72-
item_count=int(metadata.item_count),
72+
item_count=metadata.item_count,
7373
)
7474

7575
@classmethod

src/apify/storage_clients/_apify/_key_value_store_client.py

Lines changed: 2 additions & 2 deletions
@@ -46,7 +46,7 @@ async def get_metadata(self) -> ApifyKeyValueStoreMetadata:
4646
metadata = await self._api_client.get()
4747

4848
if metadata is None:
49-
raise ValueError('Failed to retrieve dataset metadata.')
49+
raise ValueError('Failed to retrieve key-value store metadata.')
5050

5151
return ApifyKeyValueStoreMetadata(
5252
id=metadata.id,
@@ -148,7 +148,7 @@ async def iterate_keys(
148148
for item in list_key_page.items:
149149
record_metadata = KeyValueStoreRecordMetadata(
150150
key=item.key,
151-
size=int(item.size),
151+
size=item.size,
152152
content_type='application/octet-stream', # Content type not available from list_keys
153153
)
154154
yield record_metadata
