This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview

This is the official Python client for Databricks SQL. It implements PEP 249 (DB API 2.0) and uses Apache Thrift for communication with Databricks clusters/SQL warehouses.
## Development Commands

```bash
# Install dependencies
poetry install

# Install with PyArrow support (recommended)
poetry install --all-extras

# Run unit tests
poetry run python -m pytest tests/unit

# Run a specific test
poetry run python -m pytest tests/unit/test_client.py::ClientTestSuite::test_method_name

# Code formatting (required before commits)
poetry run black src

# Type checking
poetry run mypy --install-types --non-interactive src

# Check formatting without changing files
poetry run black src --check
```
## Architecture

- **Client Layer** (`src/databricks/sql/client.py`)
  - Main entry point implementing DB API 2.0
  - Handles connections, cursors, and query execution
  - Key classes: `Connection`, `Cursor`
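Because the client implements PEP 249, the standard connect/cursor/execute/fetch pattern applies. Below is a minimal sketch of that pattern using the stdlib `sqlite3` module as a stand-in driver (it also implements DB API 2.0) so the snippet runs without a warehouse; with this library you would obtain `conn` from `databricks.sql.connect(...)` instead:

```python
# PEP 249 (DB API 2.0) usage pattern. The stdlib sqlite3 driver stands in
# for databricks.sql so the sketch runs without a cluster; a Databricks
# connection from databricks.sql.connect(...) exposes the same shapes.
import sqlite3

def fetch_rows(conn, query, params=()):
    """Run a parameterized query on any DB API 2.0 connection."""
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        return cur.fetchall()
    finally:
        cur.close()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1), (2)")
rows = fetch_rows(conn, "SELECT x FROM t WHERE x >= ?", (1,))
print(rows)  # [(1,), (2,)]
```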
- **Backend Layer** (`src/databricks/sql/backend/`)
  - Thrift-based communication with Databricks
  - Handles protocol-level operations
  - Key files: `thrift_backend.py`, `databricks_client.py`
  - SEA (Statement Execution API) support in `experimental/backend/sea_backend.py`
- **Authentication** (`src/databricks/sql/auth/`)
  - Multiple auth methods: OAuth U2M/M2M, PAT, custom providers
  - Authentication flow abstraction
  - OAuth persistence support for token caching
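The token-caching idea behind OAuth persistence can be illustrated generically. The sketch below is hypothetical (not the connector's actual classes): it refreshes a token only when the cached one is near expiry:

```python
import time

class TokenCache:
    """Hypothetical sketch of expiry-aware OAuth token caching."""
    def __init__(self, fetch_token, skew=60.0):
        self._fetch_token = fetch_token   # callable -> (token, expires_at)
        self._skew = skew                 # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Fetch a fresh token only when missing or close to expiring.
        if self._token is None or time.time() >= self._expires_at - self._skew:
            self._token, self._expires_at = self._fetch_token()
        return self._token

calls = []
def fake_fetch():
    calls.append(1)
    return f"tok-{len(calls)}", time.time() + 3600

cache = TokenCache(fake_fetch)
assert cache.get() == cache.get() == "tok-1"  # second call hits the cache
```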
- **Data Transfer** (`src/databricks/sql/cloudfetch/`)
  - Cloud fetch for large results
  - Arrow format support for efficiency
  - Handles data pagination and streaming
  - Result set management in `result_set.py`
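The pagination pattern the fetch layer exposes to users is the PEP 249 `fetchmany` loop. A runnable sketch using `sqlite3` as a stand-in DB API 2.0 driver (with this library the cursor would come from a Databricks connection):

```python
import sqlite3

def iter_batches(cursor, query, batch_size=1000):
    """Yield result rows in fixed-size batches instead of fetchall()."""
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (n INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(10)])
batches = list(iter_batches(conn.cursor(), "SELECT n FROM nums", batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```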
- **Parameters** (`src/databricks/sql/parameters/`)
  - Native parameter support (v3.0.0+) - server-side parameterization
  - Inline parameters (legacy) - client-side interpolation
  - SQL injection prevention
  - Type mapping and conversion
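The difference between native (server-side) and inline (interpolated) parameters is what makes injection prevention work: with native parameters the value travels separately from the SQL text. A sketch using `sqlite3`'s qmark style to show the contrast (the connector's marker syntax differs, but the principle is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "alice' OR '1'='1"   # classic injection payload

# Parameterized: the payload is treated as a plain string value.
safe = conn.execute(
    "SELECT count(*) FROM users WHERE name = ?", (hostile,)
).fetchone()[0]

# String interpolation (the legacy inline style): the payload alters the query.
unsafe = conn.execute(
    f"SELECT count(*) FROM users WHERE name = '{hostile}'"
).fetchone()[0]

print(safe, unsafe)  # 0 1
```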
- **Telemetry** (`src/databricks/sql/telemetry/`)
  - Usage metrics and performance monitoring
  - Configurable batch processing and time-based flushing
  - Server-side flag integration
## Key Design Points

- **Result Sets**: Arrow format by default for efficient data transfer
- **Error Handling**: Comprehensive retry logic with exponential backoff
- **Resource Management**: Context managers for proper cleanup
- **Type System**: Strong typing with MyPy throughout
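The retry design point can be sketched generically; the connector's real policy lives in its HTTP layer and is more configurable, so treat this as an illustration of exponential backoff, not the actual implementation:

```python
import time

def retry_with_backoff(fn, retries=4, base_delay=0.01, backoff=2.0,
                       retryable=(ConnectionError,), sleep=time.sleep):
    """Call fn(), retrying retryable errors with exponentially growing delays."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == retries:
                raise          # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff   # 0.01s, 0.02s, 0.04s, ...

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"

result = retry_with_backoff(flaky)
print(result)  # ok (after two failed attempts)
```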
## Testing

Run unit tests:

```bash
poetry run python -m pytest tests/unit
```

For e2e tests, set environment variables or create a `test.env` file:

```bash
export DATABRICKS_SERVER_HOSTNAME="****"
export DATABRICKS_HTTP_PATH="/sql/1.0/endpoints/****"
export DATABRICKS_TOKEN="dapi****"
```

Then run:

```bash
poetry run python -m pytest tests/e2e
```
Test organization:

- `tests/unit/` - Fast, isolated unit tests
- `tests/e2e/` - Integration tests against real Databricks
- Test files follow the `test_*.py` naming convention
- Test suites: core, large queries, staging ingestion, retry logic
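Unit tests stay fast and isolated by mocking the cursor/backend rather than talking to a warehouse. A minimal sketch with the stdlib `unittest.mock` (the helper and query here are made up for illustration):

```python
from unittest.mock import MagicMock

def first_column(cursor, query):
    """Helper under test: run a query and return the first column of each row."""
    cursor.execute(query)
    return [row[0] for row in cursor.fetchall()]

# Stand-in for a cursor -- no network involved, so the test stays fast.
mock_cursor = MagicMock()
mock_cursor.fetchall.return_value = [(1, "a"), (2, "b")]

result = first_column(mock_cursor, "SELECT id, name FROM t")
mock_cursor.execute.assert_called_once_with("SELECT id, name FROM t")
print(result)  # [1, 2]
```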
## Development Guidelines

- **Dependency Management**: Always use Poetry, never pip directly
- **Code Style**: Black formatter with a 100-char line limit (PEP 8 with this exception)
- **Type Annotations**: Required for all new code
- **Thrift Files**: Generated code in `thrift_api/` - do not edit manually
- **Parameter Security**: Always use native parameters, never string interpolation
- **Arrow Support**: Optional but highly recommended for performance
- **Python Support**: 3.8+ (up to 3.13)
- **DCO**: Sign commits with the Developer Certificate of Origin
## Adding a New Feature

- Implement in the appropriate module under `src/databricks/sql/`
- Add unit tests in `tests/unit/`
- Add integration tests in `tests/e2e/` if needed
- Update type hints and ensure MyPy passes
- Run the Black formatter before committing
## Debugging Authentication Issues

- Check auth configuration in the `auth/` modules
- Review retry logic in `src/databricks/sql/utils.py`
- Enable debug logging for a detailed trace
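To enable that detailed trace, use the stdlib `logging` module. This assumes the connector's modules log via the usual `logging.getLogger(__name__)` convention, which places them under the `databricks.sql` namespace:

```python
import logging

# Route all log output to stderr with timestamps.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Turn on the connector's loggers specifically; child loggers such as
# databricks.sql.client inherit this level.
logging.getLogger("databricks.sql").setLevel(logging.DEBUG)
```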
## Working with Thrift

- Protocol definitions in `src/databricks/sql/thrift_api/`
- Backend implementation in `backend/thrift_backend.py`
- Don't modify generated Thrift files directly
## Examples

Example scripts are in the `examples/` directory:

- Basic query execution
- OAuth authentication patterns
- Parameter usage (native vs inline)
- Staging ingestion operations
- Custom credential providers