Commit 7dd504e

Merge pull request #587 from laughingman7743/feature/struct-type-support
feat: implement STRUCT type support for PyAthena SQLAlchemy dialect
2 parents 21cbf05 + 99c53ba commit 7dd504e

15 files changed

Lines changed: 1785 additions & 877 deletions

File tree

CLAUDE.md

Lines changed: 58 additions & 0 deletions
@@ -32,6 +32,43 @@ The project supports different cursor implementations for various use cases:

### Code Style and Quality

#### Import Guidelines

**CRITICAL: Runtime Imports are Prohibited**

- **NEVER** use `import` or `from ... import` statements inside functions, methods, or conditional blocks
- **ALWAYS** place all imports at the top of the file, after the license header and module docstring
- This applies to all files: source code, tests, scripts, and documentation examples
- Runtime imports cause issues with static analysis, code completion, and dependency tracking, and can mask import errors

**Bad Examples:**

```python
def my_function():
    from some_module import something  # NEVER do this
    import os  # NEVER do this

    if condition:
        from optional import feature  # NEVER do this
```

**Good Examples:**

```python
# At the top of the file, after the license header
from __future__ import annotations

import os
from typing import TYPE_CHECKING, Optional

from some_module import something

# Optional dependencies can be handled with TYPE_CHECKING
if TYPE_CHECKING:
    from optional import feature


def my_function():
    # Use imported modules here
    return something.process()
```

**Exception for Optional Dependencies**: The PyAthena codebase does use runtime imports for optional dependencies like `pyarrow` and `pandas` in the main source code. However, when contributing new code or modifying tests, avoid runtime imports unless absolutely necessary for optional dependency handling.

#### Commands

```bash
# Format code (auto-fix imports and format)
```
@@ -82,6 +119,27 @@ def method_name(self, param1: str, param2: Optional[int] = None) -> List[str]:
2. **Integration Tests**: Test actual AWS Athena interactions when modifying query execution logic
3. **SQLAlchemy Compliance**: Ensure SQLAlchemy dialect tests pass when modifying dialect code
4. **Mock AWS Services**: Use `moto` or similar for testing AWS interactions without real resources
5. **Lint First**: **ALWAYS** run `make chk` before running tests - ensure code passes all quality checks first

#### Local Testing Environment

To run tests locally, you need to set the following environment variables:

```bash
export AWS_DEFAULT_REGION=us-west-2
export AWS_ATHENA_S3_STAGING_DIR=s3://your-staging-bucket/path/
export AWS_ATHENA_WORKGROUP=primary
export AWS_ATHENA_SPARK_WORKGROUP=spark-primary
```

**CRITICAL: Pre-test Requirements**

```bash
# ALWAYS run quality checks first - tests will fail if code doesn't pass lint
make chk

# Only after lint passes, install dependencies and run tests
uv sync
uv run pytest tests/pyathena/test_file.py -v
```

#### Writing Tests

- Place tests in `tests/pyathena/` mirroring the source structure

docs/introduction.rst

Lines changed: 26 additions & 0 deletions
@@ -35,6 +35,32 @@ Extra packages:
| fastparquet   | ``pip install PyAthena[fastparquet]`` | >=0.4.0          |
+---------------+---------------------------------------+------------------+

.. _features:

Features
--------

PyAthena provides comprehensive support for Amazon Athena's data types and features:

**Core Features:**

- **DB API 2.0 Compliance**: Full PEP 249 compatibility for database operations
- **SQLAlchemy Integration**: Native dialect support with table reflection and ORM capabilities
- **Multiple Cursor Types**: Standard, Pandas, Arrow, and Spark cursor implementations
- **Async Support**: Asynchronous query execution for non-blocking operations

**Data Type Support:**

- **STRUCT/ROW Types**: :ref:`Complete support <sqlalchemy>` for complex nested data structures
- **ARRAY Types**: Native handling of array data with automatic Python list conversion
- **MAP Types**: Dictionary-like data structure support
- **JSON Integration**: Seamless JSON data parsing and conversion
- **Performance Optimized**: Smart format detection for efficient data processing

**Additional Features:**

- **Connection Management**: Efficient connection pooling and configuration
- **Result Caching**: Athena query result reuse capabilities
- **Error Handling**: Comprehensive exception handling and recovery
- **S3 Integration**: Direct S3 data access and staging support

.. _license:

License

docs/sqlalchemy.rst

Lines changed: 165 additions & 0 deletions
@@ -302,3 +302,168 @@ or :code:`table_name$history` metadata. Again the hint goes after the select sta
.. code:: sql

   SELECT * FROM table_name FOR VERSION AS OF 949530903748831860

Complex Data Types
------------------

STRUCT Type Support
~~~~~~~~~~~~~~~~~~~

PyAthena provides comprehensive support for Amazon Athena's STRUCT (also known as ROW) data types, enabling you to work with complex nested data structures in your Python applications.

Basic Usage
^^^^^^^^^^^

.. code:: python

   from sqlalchemy import Column, Integer, MetaData, String, Table

   from pyathena.sqlalchemy.types import AthenaStruct

   metadata = MetaData()

   # Define a table with STRUCT columns
   users = Table(
       'users',
       metadata,
       Column('id', Integer),
       Column('profile', AthenaStruct(
           ('name', String),
           ('age', Integer),
           ('email', String),
       )),
       Column('settings', AthenaStruct(
           ('theme', String),
           ('notifications', AthenaStruct(
               ('email', String),
               ('push', String),
           )),
       )),
   )

This generates the following SQL structure:

.. code:: sql

   CREATE TABLE users (
       id INTEGER,
       profile ROW(name STRING, age INTEGER, email STRING),
       settings ROW(theme STRING, notifications ROW(email STRING, push STRING))
   )

Querying STRUCT Data
^^^^^^^^^^^^^^^^^^^^

PyAthena automatically converts STRUCT data between different formats:

.. code:: python

   from sqlalchemy import select, text

   # Query STRUCT data using the ROW constructor
   result = connection.execute(
       select().from_statement(
           text("SELECT ROW('John Doe', 30, 'john@example.com') as profile")
       )
   ).fetchone()

   # Access STRUCT fields as a dictionary
   profile = result.profile  # {"0": "John Doe", "1": 30, "2": "john@example.com"}
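
Because an unnamed ROW comes back keyed by position (``"0"``, ``"1"``, ...), a small helper can re-key it with your own field names. This is a hypothetical convenience, not part of PyAthena's API:

```python
def name_struct_fields(positional, field_names):
    """Map positional ROW keys ("0", "1", ...) onto the given field names."""
    return {name: positional[str(i)] for i, name in enumerate(field_names)}


profile = {"0": "John Doe", "1": 30, "2": "john@example.com"}
named = name_struct_fields(profile, ["name", "age", "email"])
print(named)  # {'name': 'John Doe', 'age': 30, 'email': 'john@example.com'}
```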

Named STRUCT Fields
^^^^^^^^^^^^^^^^^^^

For better readability, use JSON casting to get named fields:

.. code:: python

   import json

   from sqlalchemy import select, text

   # Using CAST AS JSON for named field access
   result = connection.execute(
       select().from_statement(
           text("SELECT CAST(ROW('John', 30) AS JSON) as user_data")
       )
   ).fetchone()

   # Parse the JSON result
   user_data = json.loads(result.user_data)  # ["John", 30]

Data Format Support
^^^^^^^^^^^^^^^^^^^

PyAthena supports multiple STRUCT data formats:

**Athena Native Format:**

.. code:: python

   # Input: "{name=John, age=30}"
   # Output: {"name": "John", "age": 30}

**JSON Format (Recommended):**

.. code:: python

   # Input: '{"name": "John", "age": 30}'
   # Output: {"name": "John", "age": 30}

**Unnamed STRUCT Format:**

.. code:: python

   # Input: "{Alice, 25}"
   # Output: {"0": "Alice", "1": 25}
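
The detection between these formats can be approximated in a few lines of Python. The following is a simplified sketch of the idea only, not PyAthena's actual implementation; among other differences, it leaves native-format values as strings rather than converting them to Python types:

```python
import json


def parse_struct(raw):
    """Simplified sketch of STRUCT format detection (not PyAthena's actual code)."""
    text = raw.strip()
    # JSON format: a valid JSON object, e.g. '{"name": "John", "age": 30}'
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    inner = text.strip("{}")
    # Athena native format: "{name=John, age=30}" (values remain strings here)
    if "=" in inner:
        return {k.strip(): v.strip() for k, v in
                (pair.split("=", 1) for pair in inner.split(","))}
    # Unnamed format: "{Alice, 25}" -> positional string keys
    return {str(i): v.strip() for i, v in enumerate(inner.split(","))}


print(parse_struct('{"name": "John", "age": 30}'))  # {'name': 'John', 'age': 30}
print(parse_struct("{name=John, age=30}"))          # {'name': 'John', 'age': '30'}
print(parse_struct("{Alice, 25}"))                  # {'0': 'Alice', '1': '25'}
```

Trying JSON first is what makes the recommended format cheap: a successful `json.loads` short-circuits any hand-rolled parsing.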

Performance Considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^

- **JSON Format**: Recommended for complex nested structures
- **Native Format**: Optimized for simple key-value pairs
- **Smart Detection**: PyAthena automatically detects the format to avoid unnecessary parsing overhead

Best Practices
^^^^^^^^^^^^^^

1. **Use JSON casting** for complex nested structures:

   .. code:: sql

      SELECT CAST(complex_struct AS JSON) FROM table_name

2. **Define clear field types** in AthenaStruct definitions:

   .. code:: python

      AthenaStruct(
          ('user_id', Integer),
          ('profile', AthenaStruct(
              ('name', String),
              ('preferences', AthenaStruct(
                  ('theme', String),
                  ('language', String),
              )),
          )),
      )

3. **Handle NULL values** appropriately in your application logic:

   .. code:: python

      if result.struct_column is not None:
          # Process struct data
          field_value = result.struct_column.get('field_name')

Migration from Raw Strings
^^^^^^^^^^^^^^^^^^^^^^^^^^

**Before (raw string handling):**

.. code:: python

   import json

   result = cursor.execute("SELECT struct_column FROM table").fetchone()
   raw_data = result[0]  # '{"name": "John", "age": 30}'
   parsed_data = json.loads(raw_data)

**After (automatic conversion):**

.. code:: python

   result = cursor.execute("SELECT struct_column FROM table").fetchone()
   struct_data = result[0]  # {"name": "John", "age": 30} - automatically converted
   name = struct_data['name']  # Direct access
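
Nested STRUCTs convert to nested dictionaries, so deeper fields are reached by chained indexing. A short illustration with hypothetical data rather than a live query result:

```python
# Hypothetical converted value for a nested `settings` STRUCT column
settings = {"theme": "dark", "notifications": {"email": "on", "push": "off"}}

# Nested STRUCT fields are plain nested dictionaries after conversion
push_setting = settings["notifications"]["push"]
print(push_setting)  # off
```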
