BlackFalconData-org/youtube-data-scraper

YouTube Scraper

Scrape YouTube videos, channels, comments, and transcripts in one tool — by keyword or by video, channel, and playlist URL. Get rich per-video metadata, comments with replies, and translated transcripts. Incremental mode and RAG-ready output included.

YouTube Scraper on Apify →


🚀 How to use this actor

💚 $5 free Apify credits — every month

No credit card required. No commitment. Cancel anytime.

  1. Click sign up — pick GitHub, Google, or email; takes ~30 seconds
  2. Open this actor — input is pre-filled with a working example
  3. Click Start — export results as JSON, CSV, or Excel

Your $5 monthly platform credit is enough to run this actor right away — and again every month — scraping typically several hundred to several thousand results per run, depending on your input.

Key features

Incremental mode — Only get new or changed listings since your last run. Content hash per listing — no duplicates, no re-processing.

Change classification — Track items as new, updated, unchanged, or expired across runs. Build audit trails of how listings evolve over time.

Result cap — Stop after N listings. Set to 0 for the full catalog.

Proxy support — Route traffic through Apify Proxy or your own proxy group to avoid regional blocks and rate limits.

Export anywhere — Download as JSON, CSV, or Excel. Stream via Apify API, webhooks, or integrations with Make, Zapier, Airbyte, Keboola.

Structured data — Every listing returns the same schema with consistent field naming. All fields always present — null when unavailable, never omitted.


Use cases

Data pipeline automation: Integrate with your ETL pipeline to collect structured listings from YouTube on a schedule. Export to CSV, JSON, or directly to your database.

Market research: Monitor listings, track trends, and analyze market dynamics with structured, deduplicated data from YouTube.

Change monitoring: Run daily or hourly in incremental mode to capture only new, updated, or expired listings. Perfect for price-tracking, churn analysis, and alerting pipelines.

AI / LLM training data: Structured JSON per listing is ready for RAG pipelines, embeddings, and agent workflows. Compact mode trims tokens for LLM context windows.
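
For RAG use, the ragChunkSize and ragChunkOverlap inputs describe character-based chunking with overlap. The sketch below shows one plausible reading of those two settings. It is an illustration only: the function name and the exact boundary handling are assumptions, and the actor's own chunker may split text differently.

```python
def chunk_text(text: str, chunk_size: int = 0, overlap: int = 0) -> list[str]:
    """Split text into chunks of about chunk_size characters.

    chunk_size == 0 mirrors the documented default: one document per item.
    Consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        return [text]
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

A larger overlap keeps more context across chunk boundaries at the cost of more (and partly redundant) documents to embed.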


Quick start

{
  "searchQueries": [
    "apify"
  ],
  "scrapeVideos": true,
  "maxResults": 5
}
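
If you prefer to start runs from code instead of the Apify console, the same input can be posted to Apify's REST API. The sketch below only builds the request with the standard library and does not send it. The actor ID and token are placeholders (assumptions, not values from this README); copy the real ones from the actor's Apify page and your account settings.

```python
import json
import urllib.request

# Placeholder: replace with the actor's real ID from its Apify page.
ACTOR_ID = "USERNAME~ACTOR-NAME"


def build_run_request(token: str, run_input: dict,
                      actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor synchronously
    and returns its dataset items in the response body."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Same input as the quick-start JSON above.
quick_start_input = {
    "searchQueries": ["apify"],
    "scrapeVideos": True,
    "maxResults": 5,
}

request = build_run_request("YOUR_APIFY_TOKEN", quick_start_input)

# To actually run it (needs a valid token and actor ID):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The synchronous endpoint suits small runs like this one; for long runs, starting the run and polling its dataset is usually the safer pattern.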

Input parameters

Parameter Type Default Description
startUrls array [] YouTube video, channel, or playlist URLs to scrape directly (one per line). Each entry is routed automatically by type and processed as its own task. You can also paste a bare 11-character video ID (e.g. "dQw4w9WgXcQ") or a bare @handle (e.g. "@MrBeast"). Merged with Video / Channel / Playlist URLs below. Used by all modes. Default: empty. Also accepts the aliases url, urls, and startURLs (and the typed lists videoUrls / channelUrls / playlistUrls below all merge in here).
videoUrls array [] Video URLs or bare 11-character video IDs (one per line), e.g. "https://www.youtube.com/watch?v=dQw4w9WgXcQ" or "dQw4w9WgXcQ". Use this to paste a list of videos separately from channels and playlists. Merged with Start URLs. Feeds Video, Comments, and Transcripts modes. Default: empty. Also accepts the aliases videoUrl, url, and urls.
channelUrls array [] Channel URLs or bare @handles (one per line), e.g. "https://www.youtube.com/@MrBeast" or "@veritasium". Use this to paste a list of channels separately from videos and playlists. Merged with Start URLs. Feeds Video mode (channel uploads) and Channel mode (channel profiles). Default: empty. Also accepts the alias channelUrl.
playlistUrls array [] Playlist URLs or bare playlist IDs (one per line), e.g. "https://www.youtube.com/playlist?list=PLxxxx" or "PLxxxx". Each playlist's videos are collected and tagged with their source playlistId. Merged with Start URLs. Feeds Video mode. Default: empty. Also accepts the alias playlistUrl.
searchQueries array [] Keywords to search on YouTube (one per line), e.g. "nodejs tutorial". Each query runs as a separate search task and respects the Search Filters below. You can also paste several queries separated by commas. Required for the Trending mode (to define the topic to rank). Default: empty. Also accepts the aliases query, q, keyword, keywords, searchQuery, and search.
scrapeVideos boolean true Collect video items (metadata, stats, descriptions) from search results, channel pages, playlists, and pasted video URLs. This is the default mode. Default: true.
scrapeChannels boolean false Collect channel items (name, subscriber count, video count, country, links) from channel URLs/@handles or channel results in a search. Combine with Channel URLs or a search using Result Type = Channels only. Default: false.
scrapeComments boolean false Fetch comments for each scraped video. Requires a video target (search, channel, playlist, or pasted Video URL). Comments automatically use US residential IPs when needed (residential access required on your account). Tune volume/order in the Comments settings below. Default: false.
scrapeTranscripts boolean false Download full transcripts (captions) for each scraped video. Requires a video target. ⚠️ Needs US residential IPs (RESIDENTIAL group, country US) or transcripts come back empty; also runs slower (a browser step). Configure format, language, and translation in the Transcripts settings below. Default: false.
scrapeTrending boolean false Collect the most-viewed recent videos for your topic, ranked by view count over a recent upload window. Requires at least one Search Query, or a Trending Topic, to define what to rank. Default: false.
trendingCategory string Optional extra topic to rank when Scrape Trending is on, e.g. "music", "gaming", or "movies". Added alongside your search queries as a topic to find the most-viewed recent videos for. Applies to Trending mode. Default: blank (rank only your search queries).
sortBy enum "relevance" Order of search results. Applies to Search mode. Options: Relevance (YouTube's default ranking); Upload date (newest first); View count (most-viewed first); Rating (highest-rated first). Default: "relevance".
uploadDate enum Restrict search results to videos uploaded within this window. Applies to Search mode. Options: Last hour; Today (last 24 hours); This week (last 7 days); This month (last 30 days); This year (last 365 days). Default: no restriction (any date).
videoDuration enum Filter search results by video length. Applies to Search mode. Options: Short (under 4 minutes); Medium (4–20 minutes); Long (over 20 minutes). Default: no restriction (any length).
resultType enum "video" Which kind of YouTube results a search should return. Applies to Search mode. Options: Videos only; Channels only (use together with Scrape Channels to find channels by keyword); Any (mixed results). To scrape a playlist, paste its playlist URL into Start URLs instead. Default: "video".
filterHd boolean false Only return videos available in high definition. Applies to Search mode. Default: false.
filterSubtitles boolean false Only return videos that have subtitles or closed captions. Useful as a pre-filter before transcript scraping. Applies to Search mode. Default: false.
filterCreativeCommons boolean false Only return videos published under a Creative Commons license. Applies to Search mode. Default: false.
filterLive boolean false Only return videos that are currently live. Applies to Search mode. Default: false.
filter4k boolean false Only return videos available in 4K resolution. Applies to Search mode. Default: false.
filter360 boolean false Only return 360-degree videos. Applies to Search mode. Default: false.
filterHdr boolean false Only return videos available in HDR. Applies to Search mode. Default: false.
includeKeywords array [] Post-fetch filter: keep only results whose title or description contains at least one of these keywords (case-insensitive), e.g. ["react", "vue"]. Applies to all modes. Default: empty (no filtering).
excludeKeywords array [] Post-fetch filter: drop results whose title or description contains any of these keywords (case-insensitive), e.g. ["shorts", "reaction"]. Applies to all modes. Default: empty (no filtering).
fromDate string Post-fetch filter: only include videos published on or after this date. Accepts an absolute ISO 8601 date (e.g. "2024-01-01") or a relative offset into the past (e.g. "7d", "2 weeks", "3 months", "1 year"). Applies to all modes. Default: no lower bound.
toDate string Post-fetch filter: only include videos published on or before this date. Accepts an absolute ISO 8601 date (e.g. "2024-12-31") or a relative offset into the past (e.g. "1d"). Applies to all modes. Default: no upper bound.
maxAgeDays integer Post-fetch filter: only include videos published within the last N days (e.g. 30). When combined with From Date, the more restrictive cutoff wins. Applies to all modes. Default: blank (no age limit).
customFilters string Advanced post-fetch filter expression (DSL) evaluated against each output item, e.g. "viewCount > 100000 AND durationSec < 600". For power users; leave blank for no custom filtering. Applies to all modes. Default: blank.
channelVideoType enum "videos" Which tab to scrape when visiting a channel. Applies to Channel URLs in Video mode. Options: Regular videos (the Videos tab, long-form uploads); Shorts (the Shorts tab); Live streams (past and upcoming live broadcasts). Default: "videos".
channelSortOrder enum "newest" Order of videos listed on a channel page. Applies to the Videos tab of Channel URLs in Video mode (sort is not available on the Shorts or Live tabs and is ignored there). Options: Newest first; Oldest first; Most popular (by view count). Default: "newest".
fetchAbout boolean true Fetch the channel About panel to fill in join date, country, total view count, and external links on channel items. Disable to skip the extra request for faster channel runs. Applies to Channel mode. Default: true.
transcriptFormat enum "segments" Output format for transcript data. Applies to Transcripts mode. Options: Timed segments (JSON array of { text, startMs, durationMs }); Plain text (one continuous string in fullText); SRT subtitles (numbered, timestamped subtitle blocks); WebVTT subtitles (web-standard caption format); Raw XML (the unprocessed caption document). All variants also include the plain fullText. Default: "segments".
transcriptLanguage string BCP-47 language code for the desired transcript track, e.g. "en" or "de". Applies to Transcripts mode. Default: blank (use the video's default caption language).
translateTo string BCP-47 language code to auto-translate the transcript into, e.g. "en" or "es". Applies to Transcripts mode. Default: blank (no translation).
preferAutoGenerated boolean false When both a manual and an auto-generated transcript exist for a video, prefer the auto-generated one. Applies to Transcripts mode. Default: false (prefer the manual transcript).
transcriptFields array [] Emit only this subset of transcript output fields, e.g. ["fullText", "srt"]. Applies to Transcripts mode. Default: empty (include all transcript fields).
commentsPerVideo integer 100 Maximum number of comments to fetch per video. Set 0 for unlimited. Applies to Comments mode. Default: 100.
commentsSort enum "top" Comment retrieval order. Applies to Comments mode. Options: Top comments (YouTube's most-relevant ranking, usually most-liked first); Newest first (most recently posted). Default: "top".
includeReplies boolean false Also fetch replies to each top-level comment (nested under the comment's replies field). Increases run time and result volume. Applies to Comments mode. Default: false.
maxResults integer 50 Maximum number of videos and channels to collect across all tasks. Comments and transcripts are extra per-video records, not counted against this limit (set their volume with Comments per video / Scrape transcripts). Set 0 for unlimited. Default: 50. Aliases: limit, max, maxItems, maxVideos.
maxResultsPerQuery integer Cap on the number of videos and channels each individual search query or target (channel / playlist) may contribute, applied independently of Max Results (per run). Like Max Results, this caps videos and channels only — comments and transcripts are additional per-video records (controlled by Comments per video and Scrape transcripts) and are not counted against it. Useful for spreading a budget evenly across several queries. Default: blank (no per-query cap).
maxShorts integer Maximum number of Shorts (vertical, short-form clips) to include, applied independently of Max Results. Extra Shorts are dropped while regular videos are kept. Set 0 to exclude Shorts entirely. Applies to Video and Trending modes. Default: blank (no separate Shorts cap).
maxRunSeconds integer Optional soft cap on how long the run may spend fetching, e.g. 300 for five minutes. When the run nears this limit it stops collecting new items and saves whatever it has gathered so far. Applies to all modes. Default: blank (no soft cap).
includeShorts boolean true Include YouTube Shorts (vertical, short-form clips) in video results. Turn off to keep only regular long-form videos. Applies to Video and Trending modes. Default: true.
shortsOnly boolean false Return only YouTube Shorts and drop regular videos. Takes precedence over Include Shorts. Applies to Video and Trending modes. Default: false.
detailLevel enum "standard" How much metadata to fetch per video. Higher levels return more fields but may require an extra request per item (slower, higher cost). Applies to Video and Trending modes. Options: Basic (only the fields available from the listing — title, channel, view count, duration; no per-video request); Standard (adds video-page fields such as likes, full description, category, keywords); Detailed (adds extra metadata such as chapters, available captions, hashtags); Complete (every field the actor can extract). Default: "standard".
descriptionFormat enum "text" Output format for video and channel description text. Applies to Video and Channel modes. Options: Plain text (raw text in descriptionText); HTML (escaped, with <br> for newlines and <a> for links, in descriptionHtml); Markdown (line breaks preserved, URLs linkified, in descriptionMarkdown); All formats (every variant alongside the plain text). Default: "text".
excludeEmptyFields boolean false Strip null and empty fields from output items to reduce dataset size and make records easier to read. Applies to all modes. Default: false.
includeRunMetadata boolean false Append a _meta block (run id, actor id, start/finish time, a non-sensitive input summary, and per-mode counts) to every output item, useful for auditing which run produced a record. Applies to all modes. Default: false.
ragOutput boolean false Output embedding-ready documents ({id, text, metadata}) instead of raw records — ready for RAG pipelines and vector databases. Applies to all modes. Default: false.
ragChunkSize integer 0 Split long text into chunks of about this many characters (0 = one document per item). Useful for embedding long transcripts. Applies when RAG Output is on. Default: 0 (no chunking).
ragChunkOverlap integer 0 Character overlap between consecutive chunks. Applies when RAG Output is on and a chunk size is set. Default: 0.
country string "US" Two-letter ISO country code used to localize YouTube results, e.g. "US", "DE", or "GB". Affects which results and trending content are returned. Applies to all modes. Default: "US". Also accepts the aliases gl, geo, and region.
hl string "en" BCP-47 language code for the YouTube interface language, e.g. "en" or "de". Affects the label text in responses (such as category names). Applies to all modes. Default: "en". Also accepts the aliases lang and language.
translateMetadataTo string Language code (e.g. "es" or "ja") to also fetch each video's title and description in. Uses YouTube's own localization (creator-supplied or auto-translated; coverage varies per video and language) and adds translatedTitle / translatedDescription alongside the originals. Adds an extra request per video. Applies to Video mode. Default: blank (no translation).
proxyConfiguration object {"useApifyProxy":true} Route requests through a proxy or your own proxy URLs. ⚠️ Transcripts and direct video-URL detail need US residential IPs — choose the RESIDENTIAL group with country US here, or supply your own US residential-IP proxy URLs. Comments automatically use US residential IPs when needed (requires residential access on your Apify account; billed as platform usage). Search, Channel, and Trending modes work on the default proxy. Recommended for larger runs to spread requests across IP addresses. Applies to all modes. Default: proxy enabled. Also accepts the alias proxy.
incrementalMode boolean false Track which items have been seen before. On subsequent runs with the same inputs, output only new and changed items (the rest are skipped). Applies to all modes. Default: false.
stateKey string Optional stable key identifying the incremental state to load and update, e.g. "my-tech-feed". Use the same key across runs to continue the same incremental feed. Applies when Incremental Mode is on. Default: blank (a key derived from the search inputs is used).
emitUnchanged boolean false When Incremental Mode is on, also output records whose tracked content has not changed since the last run (marked changeType: "UNCHANGED"). By default unchanged items are skipped so each run returns only new and changed items. Tracked content is title, description, channel, category, and the Short flag for videos (counts such as views/likes do not count as a change). Default: false.
emitExpired boolean false When Incremental Mode is on, output a small stub record (id plus changeType: "EXPIRED") for items that have disappeared since the last run. Only applies to fully-covered single-channel or single-playlist scrapes, where a missing item is a reliable signal it was removed; broad searches and capped runs never emit EXPIRED. Expired stubs carry no scraped fields and are not billed as results. Default: false.
telegramToken string Telegram bot token from @BotFather. Required to send Telegram notifications when a run finishes. Default: blank (Telegram notifications off).
telegramChatId string Telegram chat or channel ID to send notifications to, e.g. "-100123456789". Required when Telegram Bot Token is set. Default: blank.
discordWebhookUrl string Discord incoming webhook URL to post a run-summary message to. Default: blank (Discord notifications off).
slackWebhookUrl string Slack incoming webhook URL to post a run-summary message to. Default: blank (Slack notifications off).
whatsappAccessToken string WhatsApp Cloud API permanent access token (a System User token from Meta Business). Required to send WhatsApp notifications. Default: blank (WhatsApp notifications off).
whatsappPhoneNumberId string Your WhatsApp Business phone-number ID (numeric, from the Meta dashboard). Required when WhatsApp Access Token is set. Default: blank.
whatsappTo string Recipient phone number in E.164 format without the leading +, e.g. "14155552671". Required when WhatsApp Access Token is set. Default: blank.
webhookUrl string URL that receives a JSON POST with { metadata, items } after each run. Works with n8n, Make, Zapier, or any custom backend. Default: blank (no webhook).
webhookHeaders object Optional JSON object of custom HTTP headers sent with the webhook, e.g. {"Authorization": "Bearer token"}. Applies when Webhook URL is set. Default: none.
notificationLimit integer 5 Maximum number of items included in each notification message. Range 1–20. Applies when any notification channel is configured. Default: 5.
notifyOnlyChanges boolean false When Incremental Mode is on, only send notifications for NEW and UPDATED items (skip unchanged ones). Applies when a notification channel is configured. Default: false.
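
Several parameters above depend on each other (Trending needs a topic, Telegram and WhatsApp need companion fields, notificationLimit has a range). A small pre-flight check can catch these before a run is started. The sketch below is a hypothetical helper, not part of the actor; it covers only the cross-field rules stated in the table, and the actor may enforce more.

```python
def check_input(run_input: dict) -> list[str]:
    """Return a list of problems found in an actor input dict.

    An empty list means none of the rules checked here were violated.
    """
    problems = []

    # Trending mode needs a topic: a search query or a trending category.
    if run_input.get("scrapeTrending") and not (
        run_input.get("searchQueries") or run_input.get("trendingCategory")
    ):
        problems.append("scrapeTrending requires searchQueries or trendingCategory")

    # Telegram: the chat ID is required once a bot token is set.
    if run_input.get("telegramToken") and not run_input.get("telegramChatId"):
        problems.append("telegramChatId is required when telegramToken is set")

    # WhatsApp: phone-number ID and recipient are required with an access token.
    if run_input.get("whatsappAccessToken"):
        for key in ("whatsappPhoneNumberId", "whatsappTo"):
            if not run_input.get(key):
                problems.append(f"{key} is required when whatsappAccessToken is set")

    # notificationLimit is documented with a range of 1 to 20.
    limit = run_input.get("notificationLimit")
    if limit is not None and not 1 <= limit <= 20:
        problems.append("notificationLimit must be between 1 and 20")

    return problems


print(check_input({"scrapeTrending": True, "notificationLimit": 50}))
```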

Output fields

Every listing returns the same 17-field schema. Missing values are null — never omitted.

  • datasetType
  • videoId
  • title
  • channelName
  • channelHandle
  • viewCount
  • viewCountText
  • durationSec
  • durationText
  • publishedAt
  • publishedTimeText
  • isShort
  • status
  • url
  • embedUrl
  • shortUrl
  • scrapedAt

Sample output

One object per listing. Here is a real example from a production run:

{
  "datasetType": "video",
  "videoId": "n61ULEU7CO0",
  "title": "Best of lofi hip hop 2021 ✨ [beats to relax/study to]",
  "channelName": "Lofi Girl",
  "channelHandle": "@LofiGirl",
  "viewCount": 55051243,
  "viewCountText": "55,051,243 views",
  "durationSec": 22258,
  "durationText": "6:10:58",
  "publishedAt": "2022-05-30T20:38:41.032Z",
  "publishedTimeText": "4 years ago",
  "isShort": false
}

Truncated — full records contain 17 fields. See Output fields for the complete schema.
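
When consuming records downstream, it can help to normalize them to the full field list, for example when working from a truncated sample like the one above or when excludeEmptyFields is on. The sketch below is a hypothetical helper that pads missing fields with None. The duration formatter is an assumption about how durationText relates to durationSec; it reproduces the sample value, but the actor's formatting for other cases is not documented here.

```python
# The 17 fields listed under "Output fields".
OUTPUT_FIELDS = [
    "datasetType", "videoId", "title", "channelName", "channelHandle",
    "viewCount", "viewCountText", "durationSec", "durationText",
    "publishedAt", "publishedTimeText", "isShort", "status",
    "url", "embedUrl", "shortUrl", "scrapedAt",
]


def fill_missing(record: dict) -> dict:
    """Return a copy of record with every schema field present (None if absent)."""
    return {field: record.get(field) for field in OUTPUT_FIELDS}


def format_duration(seconds: int) -> str:
    """Format seconds as H:MM:SS, or M:SS when shorter than an hour (assumed style)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


sample = {"datasetType": "video", "videoId": "n61ULEU7CO0", "durationSec": 22258}
normalized = fill_missing(sample)
print(len(normalized), format_duration(sample["durationSec"]))  # prints "17 6:10:58"
```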

Try YouTube Scraper now — $5 free credit, no credit card →


Pricing

Pay only for what you extract. No subscription is required, and Apify's free $5 credit covers about 2,500 results at the Result price below (transcripts are a separate event).

Event Price (USD)
Actor Start $0.005
Result $0.002
Transcript $0.005

See the actor on Apify for current pricing.
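
To budget a run, the event prices in the table can be combined into a rough estimate. The sketch below assumes that each transcript is charged as its own event in addition to the video's Result event, and it ignores any platform usage such as residential proxy traffic. Both points are assumptions; check the actor's Apify page for the authoritative billing rules.

```python
from decimal import Decimal

# Event prices in USD, copied from the pricing table above.
PRICE_ACTOR_START = Decimal("0.005")
PRICE_RESULT = Decimal("0.002")
PRICE_TRANSCRIPT = Decimal("0.005")


def estimate_cost(results: int, transcripts: int = 0, runs: int = 1) -> Decimal:
    """Estimated event charges for a number of runs, results, and transcripts."""
    return (
        runs * PRICE_ACTOR_START
        + results * PRICE_RESULT
        + transcripts * PRICE_TRANSCRIPT
    )


def results_within_budget(budget: Decimal) -> int:
    """How many results a single run can return for the given budget (no transcripts)."""
    remaining = budget - PRICE_ACTOR_START
    return max(0, int(remaining // PRICE_RESULT))


print(estimate_cost(50))                    # prints 0.105
print(results_within_budget(Decimal("5")))  # prints 2497
```

Decimal is used instead of float so that fractions of a cent add up exactly.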


FAQ

How much does it cost? Pricing is pay-per-event: a small fee per actor start, per result, and per transcript (see Pricing). Apify's free $5 credit covers about 2,500 results before you pay anything.

How does incremental mode work? Each listing gets a content hash. On subsequent runs with the same inputs (or the same stateKey), only new or changed listings are emitted, which saves time, compute, and storage. Unchanged and expired listings can be emitted as well by enabling emitUnchanged and emitExpired.
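
The idea behind the content hash can be sketched in a few lines. This is an illustration of the general technique, not the actor's implementation: the hash function, the field names description and category, and the shape of the stored state are all assumptions. The tracked fields follow the emitUnchanged description (title, description, channel, category, and the Short flag), so view or like counts do not trigger a change.

```python
import hashlib
import json

# Assumed field names for the tracked content; counts are deliberately excluded.
TRACKED_FIELDS = ("title", "description", "channelName", "category", "isShort")


def content_hash(item: dict) -> str:
    """Hash only the tracked fields, so unrelated changes do not alter the digest."""
    tracked = {field: item.get(field) for field in TRACKED_FIELDS}
    payload = json.dumps(tracked, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def classify(items: list[dict], previous: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (changes, new_state): a changeType per videoId, and the hashes
    to store for the next run."""
    changes, new_state = {}, {}
    for item in items:
        video_id = item["videoId"]
        digest = content_hash(item)
        new_state[video_id] = digest
        if video_id not in previous:
            changes[video_id] = "NEW"
        elif previous[video_id] != digest:
            changes[video_id] = "UPDATED"
        else:
            changes[video_id] = "UNCHANGED"
    # Items seen last time but missing now. The actor only reports these for
    # fully covered single-channel or single-playlist scrapes.
    for video_id in previous.keys() - new_state.keys():
        changes[video_id] = "EXPIRED"
    return changes, new_state


first_run = [
    {"videoId": "a", "title": "Intro", "viewCount": 10},
    {"videoId": "b", "title": "Part 1", "viewCount": 20},
]
_, state = classify(first_run, previous={})

second_run = [
    {"videoId": "a", "title": "Intro", "viewCount": 999},  # only the view count changed
    {"videoId": "c", "title": "Part 2", "viewCount": 5},
]
changes, state = classify(second_run, previous=state)
print(changes)
```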

Do I need an API key or credentials? No. Just sign up for Apify, paste your input, and click Start. No credit card required.


Related products by Black Falcon Data

Browse all Black Falcon Data actors →


Getting started with Apify

New to Apify? Create a free account with $5 credit — no credit card required.

  1. Sign up — $5 platform credit included
  2. Open YouTube Scraper and configure your input
  3. Click Start — export results as JSON, CSV, or Excel

Need more later? See Apify pricing.


About Black Falcon Data

Black Falcon Data builds production-grade web scrapers for job boards and marketplace data. Browse our full actor catalog at www.blackfalcondata.com.


Last updated: 2026-06
