
Commit 365fb16 (parent 62d8352)

fix: Update Apify tools documentation for improved clarity and expanded details on input, usage, and examples.

1 file changed: 47 additions & 35 deletions

src/strands_tools/apify.py
@@ -1,8 +1,10 @@
 """Apify platform tools for Strands Agents.
 
-This module provides web scraping, data extraction, and automation capabilities
-using the Apify platform. It lets you run any Actor, task, fetch dataset
-results, and scrape individual URLs.
+Apify is a large marketplace of tools for web scraping, data extraction,
+and web automation. These tools are called Actors — serverless cloud applications that
+take JSON input and store results in a dataset (structured, tabular output) or key-value
+store (files and unstructured data). Actors exist for social media, e-commerce, search
+engines, maps, travel sites, and many other sources.
 
 Available Tools:
 ---------------
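The dataset vs key-value store distinction in the new module docstring can be made concrete with a small sketch. This is illustrative only: plain Python data structures stand in for the two Apify storage types, and all values are made up.

```python
# A dataset is append-only, tabular output: a list of JSON-like records.
dataset = []
dataset.append({"url": "https://example.com", "title": "Example Domain"})
dataset.append({"url": "https://example.com/about", "title": "About"})

# A key-value store holds files and unstructured data under named keys.
key_value_store = {}
key_value_store["OUTPUT"] = b"%PDF-1.7 example bytes"  # e.g. a rendered PDF

assert len(dataset) == 2 and "OUTPUT" in key_value_store
```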
@@ -16,7 +18,7 @@
 Setup Requirements:
 ------------------
 1. Create an Apify account at https://apify.com
-2. Obtain your API token: Apify Console > Settings > API & Integrations > Personal API tokens
+2. Get your API token: Apify Console > Settings > API & Integrations > Personal API tokens
 3. Install the optional dependency: pip install strands-agents-tools[apify]
 4. Set the environment variable:
    APIFY_API_TOKEN=your_api_token_here
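Steps 3 and 4 of the setup above can be run as shell commands. The token value is a placeholder, and the install line is shown as a comment since it only needs to run once:

```shell
# Step 3: install the optional dependency (run once):
# pip install 'strands-agents-tools[apify]'

# Step 4: export the token for the current shell (placeholder value):
export APIFY_API_TOKEN=your_api_token_here
```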
@@ -366,18 +368,22 @@ def apify_run_actor(
 ) -> Dict[str, Any]:
     """Run any Apify Actor and return the run metadata as JSON.
 
-    Executes the Actor synchronously - blocks until the Actor run finishes or the timeout
-    is reached. Use this when you need to run a specific Actor and then inspect or process
-    the results separately.
+    An Actor is a serverless cloud app on the Apify platform — it takes JSON input,
+    runs the scraping or automation job, and writes results to a dataset. This tool
+    executes the Actor synchronously and returns run metadata only (run_id, status,
+    dataset_id, timestamps). Use apify_run_actor_and_get_dataset to also fetch the
+    output data in one call, or apify_scrape_url for quick single-URL extraction.
 
     Common Actors:
-    - "apify/website-content-crawler" - scrape websites and extract content
-    - "apify/web-scraper" - general-purpose web scraper
-    - "apify/google-search-scraper" - scrape Google search results
+    - "apify/website-content-crawler" scrape websites and extract content as markdown
+    - "apify/web-scraper" general-purpose web scraper with JS rendering
+    - "apify/google-search-scraper" scrape Google search results
 
     Args:
-        actor_id: Actor identifier, e.g. "apify/website-content-crawler" or "username/actor-name".
-        run_input: JSON-serializable input for the Actor. Each Actor defines its own input schema.
+        actor_id: Actor identifier in "username/actor-name" format,
+            e.g. "apify/website-content-crawler". Find Actors at https://apify.com/store.
+        run_input: JSON-serializable input for the Actor. Each Actor defines its own
+            input schema — check the Actor README on Apify Store for required fields.
         timeout_secs: Maximum time in seconds to wait for the Actor run to finish. Defaults to 300.
         memory_mbytes: Memory allocation in MB for the Actor run. Uses Actor default if not set.
         build: Actor build tag or number to run a specific version. Uses latest build if not set.
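A sketch of the calling pattern this docstring describes: building a JSON-serializable run_input and reading the run metadata afterwards. The crawler input field names are assumptions based on that Actor's public README, and the metadata dict is a hypothetical example shaped like the fields listed above (run_id, status, dataset_id); neither comes from this repository.

```python
import json

# Hypothetical input for "apify/website-content-crawler"; check the Actor
# README on Apify Store for the authoritative input schema.
run_input = {
    "startUrls": [{"url": "https://example.com"}],
    "maxCrawlPages": 5,
}
assert json.loads(json.dumps(run_input)) == run_input  # JSON-serializable

# apify_run_actor returns run metadata only; a successful run might look like:
run = {"run_id": "example-run-id", "status": "SUCCEEDED", "dataset_id": "example-dataset-id"}
if run["status"] == "SUCCEEDED":
    dataset_id = run["dataset_id"]  # pass this to apify_get_dataset_items
```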
@@ -419,8 +425,9 @@ def apify_get_dataset_items(
 ) -> Dict[str, Any]:
     """Fetch items from an existing Apify dataset and return them as JSON.
 
-    Use this after running an Actor to retrieve the structured results from its
-    default dataset, or to access any dataset by ID.
+    Every Actor run writes its output to a dataset — a structured, append-only store
+    for tabular data. Use the dataset_id from the run metadata returned by apify_run_actor
+    or apify_run_task. Use offset for pagination through large datasets.
 
     Args:
         dataset_id: The Apify dataset ID to fetch items from.
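The offset-based pagination mentioned above can be sketched with a stub: fetch_page is a hypothetical stand-in for calling apify_get_dataset_items with limit and offset, operating on an in-memory list instead of a real dataset.

```python
def fetch_page(items, limit, offset):
    # Stand-in for apify_get_dataset_items(dataset_id, limit=..., offset=...)
    return items[offset:offset + limit]

data = list(range(250))  # pretend dataset with 250 items
collected, offset, limit = [], 0, 100

# Page through until an empty page signals the end of the dataset.
while True:
    page = fetch_page(data, limit, offset)
    if not page:
        break
    collected.extend(page)
    offset += limit

assert collected == data  # all items retrieved across 3 pages
```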
@@ -457,15 +464,17 @@ def apify_run_actor_and_get_dataset(
 ) -> Dict[str, Any]:
     """Run an Apify Actor and fetch its dataset results in one step.
 
-    Convenience tool that combines running an Actor and fetching its default
-    dataset items into a single call. Use this when you want both the run metadata and the
+    Convenience tool that combines running an Actor and fetching its default dataset
+    items into a single call. Use this when you want both the run metadata and the
     result data without making two separate tool calls.
 
     Args:
-        actor_id: Actor identifier, e.g. "apify/website-content-crawler" or "username/actor-name".
-        run_input: JSON-serializable input for the Actor.
+        actor_id: Actor identifier in "username/actor-name" format,
+            e.g. "apify/website-content-crawler". Find Actors at https://apify.com/store.
+        run_input: JSON-serializable input for the Actor. Each Actor defines its own
+            input schema — check the Actor README on Apify Store for required fields.
         timeout_secs: Maximum time in seconds to wait for the Actor run to finish. Defaults to 300.
-        memory_mbytes: Memory allocation in MB for the Actor run.
+        memory_mbytes: Memory allocation in MB for the Actor run. Uses Actor default if not set.
         build: Actor build tag or number to run a specific version. Uses latest build if not set.
         dataset_items_limit: Maximum number of dataset items to return. Defaults to 100.
         dataset_items_offset: Number of dataset items to skip for pagination. Defaults to 0.
@@ -509,15 +518,16 @@ def apify_run_task(
     timeout_secs: int = DEFAULT_TIMEOUT_SECS,
     memory_mbytes: Optional[int] = None,
 ) -> Dict[str, Any]:
-    """Run an Apify task and return the run metadata as JSON.
+    """Run a saved Apify task and return the run metadata as JSON.
 
-    Tasks are saved Actor configurations with preset inputs. Use this when a task
-    has already been configured in Apify Console, so you don't need to specify
-    the full Actor input every time.
+    Tasks are saved Actor configurations with preset inputs, managed in Apify Console.
+    Use this when a task has already been configured, so you don't need to specify
+    the full Actor input every time. Use apify_run_task_and_get_dataset to also fetch
+    the output data in one call.
 
     Args:
-        task_id: Task identifier, e.g. "user/my-task" or a task ID string.
-        task_input: Optional JSON-serializable input to override the task's default input.
+        task_id: Task identifier in "username~task-name" format or a task ID string.
+        task_input: Optional JSON-serializable input to override the task's default input fields.
         timeout_secs: Maximum time in seconds to wait for the task run to finish. Defaults to 300.
         memory_mbytes: Memory allocation in MB for the task run. Uses task default if not set.
 
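The corrected "username~task-name" identifier format (note the tilde, not a slash) can be illustrated with a made-up task name; both the username and the task name below are hypothetical.

```python
# Task identifiers use "username~task-name", per the updated docstring.
task_id = "jane-doe~daily-news-scrape"  # hypothetical task

username, task_name = task_id.split("~")
assert username == "jane-doe"
assert task_name == "daily-news-scrape"
```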
@@ -558,17 +568,17 @@ def apify_run_task_and_get_dataset(
     dataset_items_limit: int = DEFAULT_DATASET_ITEMS_LIMIT,
     dataset_items_offset: int = 0,
 ) -> Dict[str, Any]:
-    """Run an Apify task and fetch its dataset results in one step.
+    """Run a saved Apify task and fetch its dataset results in one step.
 
-    Convenience tool that combines running a task and fetching its default
-    dataset items into a single call. Use this when you want both the run metadata and the
+    Convenience tool that combines running a task and fetching its default dataset
+    items into a single call. Use this when you want both the run metadata and the
     result data without making two separate tool calls.
 
     Args:
-        task_id: Task identifier, e.g. "user/my-task" or a task ID string.
-        task_input: Optional JSON-serializable input to override the task's default input.
+        task_id: Task identifier in "username~task-name" format or a task ID string.
+        task_input: Optional JSON-serializable input to override the task's default input fields.
         timeout_secs: Maximum time in seconds to wait for the task run to finish. Defaults to 300.
-        memory_mbytes: Memory allocation in MB for the task run.
+        memory_mbytes: Memory allocation in MB for the task run. Uses task default if not set.
         dataset_items_limit: Maximum number of dataset items to return. Defaults to 100.
         dataset_items_offset: Number of dataset items to skip for pagination. Defaults to 0.
 
@@ -613,14 +623,16 @@ def apify_scrape_url(
 
     Uses the Website Content Crawler Actor under the hood, pre-configured for
     fast single-page scraping. This is the simplest way to extract readable content
-    from any web page.
+    from any web page — no Actor input schema needed. For multi-page crawls, use
+    apify_run_actor_and_get_dataset with "apify/website-content-crawler" directly.
 
     Args:
         url: The URL to scrape, e.g. "https://example.com".
         timeout_secs: Maximum time in seconds to wait for scraping to finish. Defaults to 120.
-        crawler_type: Crawler engine to use. One of "cheerio" (fastest, no JS rendering,
-            default), "playwright:adaptive" (fast, renders JS if present), or
-            "playwright:firefox" (reliable, renders JS, best at avoiding blocking but slower).
+        crawler_type: Crawler engine to use. One of:
+            - "cheerio" (default): Fastest, no JavaScript rendering. Best for static HTML.
+            - "playwright:adaptive": Renders JS only when needed. Good general-purpose choice.
+            - "playwright:firefox": Full JS rendering, best at bypassing anti-bot protection but slowest.
 
     Returns:
         Dict with status and content containing the markdown content of the scraped page.
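The crawler_type trade-offs listed above can be captured as a small decision helper. This is a sketch of the selection logic only, not part of the tool; the function name and its two boolean inputs are illustrative.

```python
def pick_crawler(needs_js: bool, blocked_before: bool) -> str:
    """Choose a crawler_type per the trade-offs in the docstring above."""
    if blocked_before:
        return "playwright:firefox"   # best at bypassing anti-bot protection, slowest
    if needs_js:
        return "playwright:adaptive"  # renders JS only when needed
    return "cheerio"                  # fastest, static HTML only

assert pick_crawler(needs_js=False, blocked_before=False) == "cheerio"
```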
