Configuring Spark session

SyncMaster worker creates a new SparkSession for each Run.

By default, the SparkSession is created with master=local, includes all .jar packages required for the involved DB/FileSystem connection types, and is limited by the resources configured for the transfer.

Custom Spark session configuration

It is possible to alter the default Spark session configuration via worker settings:

worker:
    spark_session_default_config:
        spark.master: local
        spark.driver.host: 127.0.0.1
        spark.driver.bindAddress: 0.0.0.0
        spark.sql.pyspark.jvmStacktrace.enabled: true
        spark.ui.enabled: false
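
For reference, the options above roughly correspond to the following pyspark builder calls. This is only a sketch of how such key-value options map onto a SparkSession; the worker's actual internal wiring may differ, and the application name below is just a placeholder:

from pyspark.sql import SparkSession

# Approximate equivalent of the spark_session_default_config options above.
spark = (
    SparkSession.builder
    .appName("syncmaster-transfer")  # placeholder name, not a SyncMaster default
    .config("spark.master", "local")
    .config("spark.driver.host", "127.0.0.1")
    .config("spark.driver.bindAddress", "0.0.0.0")
    .config("spark.sql.pyspark.jvmStacktrace.enabled", "true")
    .config("spark.ui.enabled", "false")
    .getOrCreate()
)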

Custom Spark session factory

It is also possible to provide a custom function which returns a SparkSession object:

worker:
    create_spark_session_function: my_worker.spark.create_custom_spark_session

Here is an example of such a function:

from syncmaster.db.models import Run
from syncmaster.dto.connections import ConnectionDTO
from syncmaster.worker.settings import WorkerSettings
from pyspark.sql import SparkSession

def create_custom_spark_session(
    run: Run,
    source: ConnectionDTO,
    target: ConnectionDTO,
    settings: WorkerSettings,
) -> SparkSession:
    # any custom code returning SparkSession object
    return SparkSession.builder.config(...).getOrCreate()
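
For a slightly fuller illustration, the sketch below builds a local session and pulls in a JDBC driver through the spark.jars.packages option. The application name, the driver coordinate, and the fact that run, source, target and settings are left unused are arbitrary choices for the example, not SyncMaster requirements:

from pyspark.sql import SparkSession

from syncmaster.db.models import Run
from syncmaster.dto.connections import ConnectionDTO
from syncmaster.worker.settings import WorkerSettings


def create_custom_spark_session(
    run: Run,
    source: ConnectionDTO,
    target: ConnectionDTO,
    settings: WorkerSettings,
) -> SparkSession:
    # Example only: a local session with the PostgreSQL JDBC driver added.
    # `source` and `target` describe the connections of the current transfer
    # and could be inspected here to pick connection-specific packages.
    return (
        SparkSession.builder
        .appName("syncmaster-custom")  # arbitrary example name
        .master("local[*]")
        .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
        .getOrCreate()
    )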

The module containing the custom function should be placed into the same Docker image or Python virtual environment used by the SyncMaster worker.

Note

For now, SyncMaster has not been tested with master=k8s or master=yarn, so there may be some caveats.