SyncMaster Worker creates a SparkSession for each Run.

By default, the SparkSession is created with `master=local`, includes all `.jar` packages required for the DB/FileSystem connection types in use, and is limited by the transfer's resource settings.

It is possible to alter the default SparkSession configuration via worker settings:
```yaml
worker:
  spark_session_default_config:
    spark.master: local
    spark.driver.host: 127.0.0.1
    spark.driver.bindAddress: 0.0.0.0
    spark.sql.pyspark.jvmStacktrace.enabled: true
    spark.ui.enabled: false
```

It is also possible to use a custom function which returns a SparkSession object:
```yaml
worker:
  create_spark_session_function: my_worker.spark.create_custom_spark_session
```

Here is a function example:
```python
from pyspark.sql import SparkSession

from syncmaster.db.models import Run
from syncmaster.dto.connections import ConnectionDTO
from syncmaster.worker.settings import WorkerSettings


def create_custom_spark_session(
    run: Run,
    source: ConnectionDTO,
    target: ConnectionDTO,
    settings: WorkerSettings,
) -> SparkSession:
    # any custom code returning a SparkSession object
    return SparkSession.builder.config(...).getOrCreate()
```

The module with the custom function should be placed into the same Docker image or Python virtual environment used by the SyncMaster worker.
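As a concrete illustration, a custom function could name the Spark application after the run and tune driver resources. This is a minimal sketch: the `run.id` attribute and all configuration values below are illustrative assumptions, not part of the SyncMaster API.

```python
from pyspark.sql import SparkSession

from syncmaster.db.models import Run
from syncmaster.dto.connections import ConnectionDTO
from syncmaster.worker.settings import WorkerSettings


def create_custom_spark_session(
    run: Run,
    source: ConnectionDTO,
    target: ConnectionDTO,
    settings: WorkerSettings,
) -> SparkSession:
    # Hypothetical example: name the application after the run so it is easy
    # to find in the Spark UI, and pin the session to 4 local cores.
    # ``run.id`` and the config values are assumptions made for illustration.
    return (
        SparkSession.builder
        .appName(f"syncmaster-run-{run.id}")
        .master("local[4]")
        .config("spark.driver.memory", "2g")
        .config("spark.sql.session.timeZone", "UTC")
        .getOrCreate()
    )
```

Because the function receives the source and target connection DTOs, it could also vary the configuration per connection type, for example attaching different `.jar` packages for different DB/FileSystem types.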
Note

For now, SyncMaster has not been tested with `master=k8s` or `master=yarn`, so there may be some caveats.