Data Job
VDK is the Versatile Data Kit SDK.
It provides standard functionality for data ingestion and processing, and a CLI for managing the lifecycle of a Data Job.
A Data Job is a data processing unit that allows data engineers to implement automated pull ingestion (the E in ELT) or batch data transformation into a Data Warehouse (the T in ELT). At its core, a Data Job is a directory containing different scripts and files.
A Data Job consists of steps. A step is a single unit of work within a Data Job. Which scripts or files are treated as steps and executed by vdk is customizable.
By default, there are two types of steps:
- SQL steps (SQL files)
- Python steps (Python files implementing run(job_input) method)
By default, steps are executed in alphanumeric order of their file names.
For example, given the files 10_drop_table.sql, 20_create_table.sql, and 30_ingest_to_table.py, the steps execute in that file-name order.
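As a sketch, a Python step such as 30_ingest_to_table.py could look like the following; the payload and the destination table name are illustrative assumptions, while `run(job_input)` and `send_object_for_ingestion` follow the VDK job API:

```python
# Hypothetical 30_ingest_to_table.py -- a Python step.
# VDK looks for a run(job_input) function in the file and
# calls it when this step's turn comes in the job.
def run(job_input):
    payload = {"id": 1, "city": "Sofia"}
    # send_object_for_ingestion is part of the VDK job_input
    # ingestion API; the table name here is a placeholder.
    job_input.send_object_for_ingestion(
        payload=payload, destination_table="example_table"
    )
```

A SQL step, by contrast, is simply a .sql file (e.g. 20_create_table.sql containing a CREATE TABLE statement) that vdk runs against the configured database.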
To create your first Data Job you need to:
- Install Quickstart VDK
- Execute the `vdk create` command
- Follow the Create First Data Job page
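A minimal sketch of those commands, assuming a Python environment with pip available; the job name and team name are placeholders (check `vdk create --help` for the exact options):

```shell
# Install Quickstart VDK
pip install quickstart-vdk

# Scaffold a new Data Job directory; name and team are placeholders
vdk create -n hello-world -t my-team
```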
An instance of a running Data Job deployment is called an execution.
To execute your Data Job you need to:
- Execute the `vdk run` command
- Follow the output of this run
Local executions always comprise a single attempt.
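Assuming a job directory like the one scaffolded above, a local run might look like this; the directory name is a placeholder:

```shell
# Run the Data Job in the given directory locally, in a single attempt
vdk run hello-world
```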
➡️ Next section: Ingestion