Merged
20 commits
d034520
feat: implement STRUCT type support for PyAthena SQLAlchemy dialect
laughingman7743 Aug 1, 2025
87fff0d
test: enhance struct type support test coverage
laughingman7743 Aug 1, 2025
536128f
style: fix linting issues in test files
laughingman7743 Aug 1, 2025
02f0520
fix: resolve circular import issues in SQLAlchemy modules
laughingman7743 Aug 1, 2025
d89da33
fix: resolve CI test failures for struct support
laughingman7743 Aug 1, 2025
3820ea7
improve: enhance struct converter with robust parsing and safety checks
laughingman7743 Aug 1, 2025
cf7d4e2
Fix struct converter validation and update test expectations
laughingman7743 Aug 1, 2025
857778b
Fix linting errors and add runtime import guidelines
laughingman7743 Aug 1, 2025
4a8bf02
Fix CI test failures and add debugging for complex data types
laughingman7743 Aug 1, 2025
13f00ec
Enhance struct converter and fix JSON conversion syntax
laughingman7743 Aug 2, 2025
d9a06ad
Remove debugging logs and restore proper test assertions
laughingman7743 Aug 2, 2025
cd11836
Fix struct converter parsing issues
laughingman7743 Aug 2, 2025
d4ad22e
Fix critical struct converter and JSON syntax issues
laughingman7743 Aug 2, 2025
23614fe
Fix boolean type conversion and complex case test expectations
laughingman7743 Aug 2, 2025
0c0d7ab
Fix array JSON conversion test and remove debug output
laughingman7743 Aug 2, 2025
a49832b
Fix complex data type tests and enhance struct converter handling
laughingman7743 Aug 2, 2025
6b48a8a
Simplify and refactor _to_struct converter method
laughingman7743 Aug 2, 2025
2b304e8
Add local testing environment setup to CLAUDE.md
laughingman7743 Aug 2, 2025
43aae79
Optimize struct converter performance with format detection
laughingman7743 Aug 2, 2025
99c53ba
Add comprehensive STRUCT type documentation
laughingman7743 Aug 2, 2025
58 changes: 58 additions & 0 deletions CLAUDE.md
Expand Up @@ -32,6 +32,43 @@ The project supports different cursor implementations for various use cases:

### Code Style and Quality

#### Import Guidelines
**CRITICAL: Runtime Imports are Prohibited**
- **NEVER** use `import` or `from ... import` statements inside functions, methods, or conditional blocks
- **ALWAYS** place all imports at the top of the file, after the license header and module docstring
- This applies to all files: source code, tests, scripts, documentation examples
- Runtime imports cause issues with static analysis, code completion, dependency tracking, and can mask import errors

**Bad Examples:**
```python
def my_function():
    from some_module import something  # NEVER do this
    import os  # NEVER do this
    if condition:
        from optional import feature  # NEVER do this
```

**Good Examples:**
```python
# At the top of the file, after license header
from __future__ import annotations

import os
from some_module import something
from typing import Optional

# Optional dependencies that are needed only for type hints can be
# imported under TYPE_CHECKING (they are not available at runtime)
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from optional import feature

def my_function():
    # Use imported modules here
    return something.process()
```

**Exception for Optional Dependencies**: The PyAthena source code does use runtime imports for optional dependencies such as `pyarrow` and `pandas`. When contributing new code or modifying tests, avoid runtime imports unless they are strictly necessary to handle an optional dependency.
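
One way to keep imports at module level while still supporting optional packages is to check for availability up front. The sketch below is illustrative only: `is_available` and `require` are hypothetical helper names, not part of PyAthena.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name: str) -> None:
    """Raise a clear error when an optional dependency is missing."""
    if not is_available(module_name):
        raise ImportError(f"Optional dependency '{module_name}' is not installed.")
```

A module could call `require("pandas")` at the start of a code path that needs the optional package, so a missing dependency fails with an explicit message.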

#### Commands
```bash
# Format code (auto-fix imports and format)
Expand Down Expand Up @@ -82,6 +119,27 @@ def method_name(self, param1: str, param2: Optional[int] = None) -> List[str]:
2. **Integration Tests**: Test actual AWS Athena interactions when modifying query execution logic
3. **SQLAlchemy Compliance**: Ensure SQLAlchemy dialect tests pass when modifying dialect code
4. **Mock AWS Services**: Use `moto` or similar for testing AWS interactions without real resources
5. **LINT First**: **ALWAYS** run `make chk` before running tests - ensure code passes all quality checks first

#### Local Testing Environment
To run tests locally, you need to set the following environment variables:

```bash
export AWS_DEFAULT_REGION=us-west-2
export AWS_ATHENA_S3_STAGING_DIR=s3://your-staging-bucket/path/
export AWS_ATHENA_WORKGROUP=primary
export AWS_ATHENA_SPARK_WORKGROUP=spark-primary
```

**CRITICAL: Pre-test Requirements**
```bash
# ALWAYS run quality checks first - tests will fail if code doesn't pass lint
make chk

# Only after lint passes, install dependencies and run tests
uv sync
uv run pytest tests/pyathena/test_file.py -v
```
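
Before running integration tests it can help to confirm that the variables listed above are set. A minimal sketch follows; the `missing_env_vars` helper and the `REQUIRED_VARS` tuple are hypothetical and not part of PyAthena's test suite.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the export statements above.
REQUIRED_VARS = (
    "AWS_DEFAULT_REGION",
    "AWS_ATHENA_S3_STAGING_DIR",
    "AWS_ATHENA_WORKGROUP",
    "AWS_ATHENA_SPARK_WORKGROUP",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; passing a mapping makes the check easy to unit test.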

#### Writing Tests
- Place tests in `tests/pyathena/` mirroring the source structure
26 changes: 26 additions & 0 deletions docs/introduction.rst
Expand Up @@ -35,6 +35,32 @@ Extra packages:
| fastparquet | ``pip install PyAthena[fastparquet]`` | >=0.4.0 |
+---------------+---------------------------------------+------------------+

.. _features:

Features
--------

PyAthena provides comprehensive support for Amazon Athena's data types and features:

**Core Features:**

- **DB API 2.0 Compliance**: Full PEP 249 compatibility for database operations
- **SQLAlchemy Integration**: Native dialect support with table reflection and ORM capabilities
- **Multiple Cursor Types**: Standard, Pandas, Arrow, and Spark cursor implementations
- **Async Support**: Asynchronous query execution for non-blocking operations

**Data Type Support:**

- **STRUCT/ROW Types**: :ref:`Complete support <sqlalchemy>` for complex nested data structures
- **ARRAY Types**: Native handling of array data with automatic Python list conversion
- **MAP Types**: Dictionary-like data structure support
- **JSON Integration**: Seamless JSON data parsing and conversion
- **Performance Optimized**: Smart format detection for efficient data processing

**Additional Features:**

- **Connection Management**: Efficient connection pooling and configuration
- **Result Caching**: Athena query result reuse capabilities
- **Error Handling**: Comprehensive exception handling and recovery
- **S3 Integration**: Direct S3 data access and staging support

.. _license:

License
165 changes: 165 additions & 0 deletions docs/sqlalchemy.rst
Expand Up @@ -302,3 +302,168 @@ or :code:`table_name$history` metadata. Again the hint goes after the select sta
.. code:: sql

SELECT * FROM table_name FOR VERSION AS OF 949530903748831860

Complex Data Types
------------------

STRUCT Type Support
~~~~~~~~~~~~~~~~~~~

PyAthena provides comprehensive support for Amazon Athena's STRUCT (also known as ROW) data types, enabling you to work with complex nested data structures in your Python applications.

Basic Usage
^^^^^^^^^^^

.. code:: python

    from sqlalchemy import Column, String, Integer, Table, MetaData
    from pyathena.sqlalchemy.types import AthenaStruct

    metadata = MetaData()

    # Define a table with STRUCT columns
    users = Table('users', metadata,
        Column('id', Integer),
        Column('profile', AthenaStruct(
            ('name', String),
            ('age', Integer),
            ('email', String)
        )),
        Column('settings', AthenaStruct(
            ('theme', String),
            ('notifications', AthenaStruct(
                ('email', String),
                ('push', String)
            ))
        ))
    )

This generates the following SQL structure:

.. code:: sql

    CREATE TABLE users (
        id INTEGER,
        profile ROW(name STRING, age INTEGER, email STRING),
        settings ROW(theme STRING, notifications ROW(email STRING, push STRING))
    )

Querying STRUCT Data
^^^^^^^^^^^^^^^^^^^^

PyAthena automatically converts STRUCT values in query results to Python dictionaries:

.. code:: python

    from sqlalchemy import text

    # `connection` is assumed to be an open SQLAlchemy connection to Athena,
    # for example one obtained from engine.connect()
    result = connection.execute(
        text("SELECT ROW('John Doe', 30, 'john@example.com') AS profile")
    ).fetchone()

    # Fields of an unnamed ROW are keyed by position
    profile = result.profile  # {"0": "John Doe", "1": 30, "2": "john@example.com"}

Named STRUCT Fields
^^^^^^^^^^^^^^^^^^^

For better readability, cast the STRUCT to JSON and parse the result. In the example below the ROW has no field names, so the JSON value is an array of values in field order:

.. code:: python

    import json

    from sqlalchemy import text

    # `connection` is assumed to be an open SQLAlchemy connection to Athena
    result = connection.execute(
        text("SELECT CAST(ROW('John', 30) AS JSON) AS user_data")
    ).fetchone()

    # Parse JSON result
    user_data = json.loads(result.user_data)  # ["John", 30]
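
If named keys are needed on the client side, the parsed array can be paired with the field names in declaration order. The following is a minimal sketch, assuming the query returned the JSON array shown above; ``FIELD_NAMES`` and ``to_named`` are hypothetical names, not PyAthena APIs.

.. code:: python

    import json
    from typing import Any, Dict, Sequence

    # Assumed order of the ROW fields in the query (hypothetical).
    FIELD_NAMES = ("name", "age")


    def to_named(values: Sequence[Any], names: Sequence[str] = FIELD_NAMES) -> Dict[str, Any]:
        """Pair positional values with field names, checking the lengths match."""
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(values)}")
        return dict(zip(names, values))


    user = to_named(json.loads('["John", 30]'))  # {"name": "John", "age": 30}

The length check guards against silently dropping values when the query and the name list drift apart.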

Data Format Support
^^^^^^^^^^^^^^^^^^^

PyAthena supports multiple STRUCT data formats:

**Athena Native Format:**

.. code:: python

    # Input: "{name=John, age=30}"
    # Output: {"name": "John", "age": 30}

**JSON Format (Recommended):**

.. code:: python

    # Input: '{"name": "John", "age": 30}'
    # Output: {"name": "John", "age": 30}

**Unnamed STRUCT Format:**

.. code:: python

    # Input: "{Alice, 25}"
    # Output: {"0": "Alice", "1": 25}
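
The three conversions above can be illustrated with a small parser. This is a simplified sketch for flat structs only, not PyAthena's actual converter: ``parse_struct`` and ``_convert_scalar`` are hypothetical names, and values that contain commas, ``=``, or nested braces are not handled.

.. code:: python

    import json
    from typing import Any, Dict, Optional


    def _convert_scalar(value: str) -> Any:
        """Best-effort conversion of one field value (hypothetical helper)."""
        lowered = value.lower()
        if lowered == "null":
            return None
        if lowered in ("true", "false"):
            return lowered == "true"
        try:
            return int(value)
        except ValueError:
            pass
        try:
            return float(value)
        except ValueError:
            return value


    def parse_struct(value: Optional[str]) -> Optional[Dict[str, Any]]:
        """Parse a flat STRUCT string into a dict (illustrative only)."""
        if value is None:
            return None
        value = value.strip()
        if not (value.startswith("{") and value.endswith("}")):
            raise ValueError(f"not a STRUCT string: {value!r}")
        # A JSON object starts with '{"', so hand it to the JSON parser.
        if value.startswith('{"'):
            return json.loads(value)
        inner = value[1:-1].strip()
        if not inner:
            return {}
        result: Dict[str, Any] = {}
        for index, part in enumerate(inner.split(",")):
            part = part.strip()
            if "=" in part:
                # Named field in Athena native format: key=value
                key, _, raw = part.partition("=")
                result[key.strip()] = _convert_scalar(raw.strip())
            else:
                # Unnamed field: key it by position
                result[str(index)] = _convert_scalar(part)
        return result

Checking the cheap ``{"`` prefix first means JSON input never goes through the slower field-by-field path.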

Performance Considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^

- **JSON Format**: Recommended for complex nested structures
- **Native Format**: Optimized for simple key-value pairs
- **Smart Detection**: PyAthena automatically detects the format to avoid unnecessary parsing overhead

Best Practices
^^^^^^^^^^^^^^

1. **Use JSON casting** for complex nested structures:

   .. code:: sql

       SELECT CAST(complex_struct AS JSON) FROM table_name

2. **Define clear field types** in AthenaStruct definitions:

   .. code:: python

       AthenaStruct(
           ('user_id', Integer),
           ('profile', AthenaStruct(
               ('name', String),
               ('preferences', AthenaStruct(
                   ('theme', String),
                   ('language', String)
               ))
           ))
       )

3. **Handle NULL values** appropriately in your application logic:

   .. code:: python

       if result.struct_column is not None:
           # Process struct data
           field_value = result.struct_column.get('field_name')
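
For nested structs, a NULL can appear at any level. One possible way to read a deep field safely is sketched below; ``get_nested`` is a hypothetical helper, not part of PyAthena, and it assumes struct values arrive as plain dictionaries.

.. code:: python

    from collections.abc import Mapping
    from typing import Any, Optional


    def get_nested(struct: Optional[Mapping], *path: str, default: Any = None) -> Any:
        """Walk a chain of keys, returning `default` if any level is missing or NULL."""
        current: Any = struct
        for key in path:
            if not isinstance(current, Mapping) or key not in current:
                return default
            current = current[key]
        return current


    row = {"profile": {"name": "John", "preferences": None}}
    theme = get_nested(row, "profile", "preferences", "theme", default="light")

Here ``preferences`` is NULL, so the lookup falls back to the supplied default instead of raising an exception.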

Migration from Raw Strings
^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Before (raw string handling):**

.. code:: python

    import json

    result = cursor.execute("SELECT struct_column FROM table_name").fetchone()
    raw_data = result[0]  # "{\"name\": \"John\", \"age\": 30}"
    parsed_data = json.loads(raw_data)

**After (automatic conversion):**

.. code:: python

    result = cursor.execute("SELECT struct_column FROM table_name").fetchone()
    struct_data = result[0]  # {"name": "John", "age": 30} - automatically converted
    name = struct_data['name']  # Direct access
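
During a gradual migration, calling code may receive both raw JSON strings and already-converted dictionaries. A small compatibility helper can smooth over the difference. This is a sketch under stated assumptions: ``as_struct`` is a hypothetical name, and any string input is assumed to be valid JSON.

.. code:: python

    import json
    from typing import Any, Dict, Optional, Union


    def as_struct(value: Union[str, Dict[str, Any], None]) -> Optional[Dict[str, Any]]:
        """Return a dict whether `value` is a raw JSON string or already converted."""
        if value is None or isinstance(value, dict):
            return value
        return json.loads(value)


    old_style = as_struct('{"name": "John", "age": 30}')
    new_style = as_struct({"name": "John", "age": 30})

Both calls produce the same dictionary, so downstream code can be updated independently of the driver upgrade.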