Here we'll show you how to install Spark 4.x on Linux. We tested it on Ubuntu 24.04 (including WSL), but it should work on other Linux distributions as well.
Spark 4.x requires Java 17 or 21. The simplest way is to install it via your package manager:
```bash
sudo apt update
sudo apt install default-jdk
```

Check that it works:

```bash
java --version
```

Output (example):

```
openjdk 21.0.10 2026-01-20
OpenJDK Runtime Environment (build 21.0.10+7-Ubuntu-124.04)
OpenJDK 64-Bit Server VM (build 21.0.10+7-Ubuntu-124.04, mixed mode, sharing)
```
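If you'd like to confirm the Java requirement programmatically, here is a small sketch (the helper name is ours, and it assumes `java` is on your PATH) that parses the major version out of the `java --version` banner:

```python
import re
import shutil
import subprocess

def java_major_version():
    """Return the major version of the `java` on PATH, or None if absent."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java --version` (double dash) prints the version banner on stdout
    out = subprocess.run([java, "--version"], capture_output=True, text=True).stdout
    match = re.search(r"\b(\d+)\.\d+\.\d+", out)
    return int(match.group(1)) if match else None

version = java_major_version()
print("Java major version:", version)
print("OK for Spark 4.x:", version in (17, 21))
```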
Set `JAVA_HOME` (add these lines to your `.bashrc` or `.zshrc`):

```bash
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export PATH="${JAVA_HOME}/bin:${PATH}"
```

We recommend using uv for managing Python packages:

```bash
uv init
uv add pyspark
```

Then run your scripts with `uv run`:

```bash
uv run python your_script.py
```

Alternatively, you can use pip:

```bash
pip install pyspark
```

Both approaches install PySpark along with a bundled Spark distribution - no separate Spark download needed.
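A quick way to confirm the package landed in your environment, without starting a Spark session (a minimal sketch; run it with `uv run python` or plain `python`, depending on which route you took):

```python
import importlib.util

# Look up pyspark without importing it (importing would start more machinery)
spec = importlib.util.find_spec("pyspark")
if spec is None:
    print("pyspark is NOT installed in this environment")
else:
    print("pyspark found at:", spec.origin)
```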
Create a test script `test_spark.py`:

```python
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[*]") \
    .appName('test') \
    .getOrCreate()

print(f"Spark version: {spark.version}")

df = spark.range(10)
df.show()

spark.stop()
```

Run it:

```bash
uv run python test_spark.py
```
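A side note on the `master("local[*]")` setting in the script above: `local[*]` runs Spark in local mode with as many worker threads as you have CPU cores. A small sketch of what the `*` expands to on your machine:

```python
import os

# local[*] asks Spark for one worker thread per available core;
# local[4] would pin it to 4 threads, local[1] to a single thread.
print(f"local[*] on this machine means up to {os.cpu_count()} worker threads")
```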