
## Installation Guidelines

### Option 1: Install via pip

Install the package with the data sources you need:

```shell
# Quote the extras so shells like zsh don't expand the square brackets

# Install with MCAP support
pip install "python-data-sources[mcap]"

# Install with MQTT support
pip install "python-data-sources[mqtt]"

# Install with ZipDCM support
pip install "python-data-sources[zipdcm]"

# Install with all data sources
pip install "python-data-sources[all]"
```

### Option 2: Install in Databricks

You can install the package directly in a Databricks notebook:

```shell
%pip install python-data-sources[all]
```

Or add it to your cluster's library configuration.

### Option 3: Deploy via Databricks Asset Bundles

1. Clone the project you'd like to run into your Databricks Workspace.
2. Open the Asset Bundle Editor in the Databricks UI.
3. Click "Deploy".
4. Navigate to the Deployments tab in the Asset Bundle UI (🚀 icon) and click "Run" on the available job. This runs the notebooks from this project sequentially.
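The UI steps above can also be driven from a terminal with the Databricks CLI. This is a sketch, not part of this project's documented workflow: it assumes the project ships a `databricks.yml`, that the CLI is authenticated against your workspace, and the job key placeholder depends on the bundle's `resources` section.

```shell
# Validate and deploy the bundle defined by the project's databricks.yml
databricks bundle validate
databricks bundle deploy

# Run the deployed job; the job key comes from the bundle's resources section
databricks bundle run <job_key>
```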

## Usage Examples

After installation, register and use the data sources:

### MCAP

```python
from pyspark.sql import SparkSession
from python_data_sources.mcap import MCAPDataSource

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(MCAPDataSource)

df = (
    spark.read.format("mcap")
    .option("path", "/path/to/data.mcap")
    .option("numPartitions", "4")
    .load()
)
```

### MQTT

```python
from pyspark.sql import SparkSession
from python_data_sources.mqtt import MqttDataSource

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(MqttDataSource)

df = (
    spark.readStream.format("mqtt_pub_sub")
    .option("broker_address", "mqtt.example.com")
    .option("topic", "sensors/#")
    .option("username", "user")
    .option("password", "pass")
    .load()
)
```
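Spark data source options are plain string key/value pairs, so connection settings can be collected in one place and passed with `.options(**...)`. The helper below is hypothetical (not part of the package); it simply mirrors the option names used in the example above:

```python
def mqtt_options(broker, topic, username, password):
    # Every Spark data source option is passed as a string key/value pair;
    # the keys here mirror the options used by the mqtt_pub_sub reader above.
    return {
        "broker_address": broker,
        "topic": topic,
        "username": username,
        "password": password,
    }

opts = mqtt_options("mqtt.example.com", "sensors/#", "user", "pass")
# Usage: spark.readStream.format("mqtt_pub_sub").options(**opts).load()
```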

### ZipDCM

```python
from pyspark.sql import SparkSession
from python_data_sources.zipdcm import ZipDCMDataSource

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ZipDCMDataSource)

df = (
    spark.read.format("zipdcm")
    .option("numPartitions", "2")
    .load("/path/to/dicom_files.zip")
)
```
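When a job uses more than one of these sources, the registration calls above can be collapsed into a loop. The sketch below is not part of the package's API; it assumes the module and class names from the examples above, and it skips any extra that was not installed rather than raising:

```python
def register_available(spark, sources):
    """Register each (module, class) pair whose module can be imported.

    Extras that were not installed raise ImportError and are skipped.
    Returns the names of the classes that were registered.
    """
    registered = []
    for module_name, class_name in sources:
        try:
            module = __import__(module_name, fromlist=[class_name])
        except ImportError:
            # This extra wasn't installed; move on to the next source.
            continue
        spark.dataSource.register(getattr(module, class_name))
        registered.append(class_name)
    return registered

# Module/class names taken from the examples above:
SOURCES = [
    ("python_data_sources.mcap", "MCAPDataSource"),
    ("python_data_sources.mqtt", "MqttDataSource"),
    ("python_data_sources.zipdcm", "ZipDCMDataSource"),
]
# Usage: register_available(spark, SOURCES)
```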