pyathena-dev · laughingman7743 · Aug 2, 2025 · Aug 1, 2025 · Aug 1, 2025 · Aug 1, 2025
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -32,6 +32,43 @@ The project supports different cursor implementations for various use cases:
 
 ### Code Style and Quality
 
+#### Import Guidelines
+**CRITICAL: Runtime Imports are Prohibited**
+- **NEVER** use `import` or `from ... import` statements inside functions, methods, or conditional blocks
+- **ALWAYS** place all imports at the top of the file, after the license header and module docstring
+- This applies to all files: source code, tests, scripts, and documentation examples
+- Runtime imports interfere with static analysis, code completion, and dependency tracking, and they can mask import errors
+
+**Bad Examples:**
+```python
+def my_function():
+    from some_module import something  # NEVER do this
+    import os  # NEVER do this
+    if condition:
+        from optional import feature  # NEVER do this
+```
+
+**Good Examples:**
+```python
+# At the top of the file, after license header
+from __future__ import annotations
+
+import os
+from typing import TYPE_CHECKING, Optional
+
+from some_module import something
+
+# Optional dependencies used only in type hints can be guarded with TYPE_CHECKING
+if TYPE_CHECKING:
+    from optional import feature
+
+def my_function():
+    # Use imported modules here
+    return something.process()
+```
+
+**Exception for Optional Dependencies**: The PyAthena source code does use runtime imports for optional dependencies such as `pyarrow` and `pandas`. When contributing new code or modifying tests, use a runtime import only when it is required to handle an optional dependency.
+
 #### Commands
 ```bash
 # Format code (auto-fix imports and format)
@@ -82,6 +119,27 @@ def method_name(self, param1: str, param2: Optional[int] = None) -> List[str]:
 2. **Integration Tests**: Test actual AWS Athena interactions when modifying query execution logic
 3. **SQLAlchemy Compliance**: Ensure SQLAlchemy dialect tests pass when modifying dialect code
 4. **Mock AWS Services**: Use `moto` or similar for testing AWS interactions without real resources
+5. **LINT First**: **ALWAYS** run `make chk` before running tests - ensure code passes all quality checks first
+
+#### Local Testing Environment
+To run tests locally, you need to set the following environment variables:
+
+```bash
+export AWS_DEFAULT_REGION=us-west-2
+export AWS_ATHENA_S3_STAGING_DIR=s3://your-staging-bucket/path/
+export AWS_ATHENA_WORKGROUP=primary
+export AWS_ATHENA_SPARK_WORKGROUP=spark-primary
+```
+
+**CRITICAL: Pre-test Requirements**
+```bash
+# ALWAYS run quality checks first - tests will fail if code doesn't pass lint
+make chk
+
+# Only after lint passes, install dependencies and run tests
+uv sync
+uv run pytest tests/pyathena/test_file.py -v
+```
 
 #### Writing Tests
 - Place tests in `tests/pyathena/` mirroring the source structure

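The optional-dependency exception in the CLAUDE.md import guidelines above could be handled along these lines. This is a minimal sketch of one common pattern, not necessarily how PyAthena does it: the import still sits at the top of the module, guarded by `try`/`except ImportError`, and the helper name `rows_to_frame` is hypothetical.

```python
from typing import Any, Dict, List

# The optional dependency is still imported at the top of the module.
# If it is missing, a sentinel records that instead of failing at import time.
try:
    import pandas as pd
except ImportError:  # pandas is an optional extra
    pd = None


def rows_to_frame(rows: List[Dict[str, Any]]) -> Any:
    """Build a DataFrame from row dicts (hypothetical helper for illustration)."""
    if pd is None:
        # Fail at call time with an actionable message.
        raise ImportError(
            "pandas is required for rows_to_frame(); install it with `pip install pandas`"
        )
    return pd.DataFrame(rows)
```

Callers that never touch the pandas-backed helper can import the module without pandas installed, while static analysis tools still see the import at the top of the file.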
diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -35,6 +35,32 @@ Extra packages:
 | fastparquet   | ``pip install PyAthena[fastparquet]`` | >=0.4.0          |
 +---------------+---------------------------------------+------------------+
 
+.. _features:
+
+Features
+--------
+
+PyAthena provides comprehensive support for Amazon Athena's data types and features:
+
+**Core Features:**
+  - **DB API 2.0 Compliance**: Full PEP 249 compatibility for database operations
+  - **SQLAlchemy Integration**: Native dialect support with table reflection and ORM capabilities
+  - **Multiple Cursor Types**: Standard, Pandas, Arrow, and Spark cursor implementations
+  - **Async Support**: Asynchronous query execution for non-blocking operations
+
+**Data Type Support:**
+  - **STRUCT/ROW Types**: :ref:`Complete support <sqlalchemy>` for complex nested data structures
+  - **ARRAY Types**: Native handling of array data with automatic Python list conversion
+  - **MAP Types**: Dictionary-like data structure support
+  - **JSON Integration**: Seamless JSON data parsing and conversion
+  - **Performance Optimized**: Automatic detection of the STRUCT data format to avoid unnecessary parsing
+
+**Additional Features:**
+  - **Connection Management**: Efficient connection pooling and configuration
+  - **Result Caching**: Athena query result reuse capabilities
+  - **Error Handling**: Comprehensive exception handling and recovery
+  - **S3 Integration**: Direct S3 data access and staging support
+
 .. _license:
 
 License

diff --git a/docs/sqlalchemy.rst b/docs/sqlalchemy.rst
@@ -302,3 +302,168 @@ or :code:`table_name$history` metadata. Again the hint goes after the select sta
 .. code:: sql
 
         SELECT * FROM table_name FOR VERSION AS OF 949530903748831860
+
+Complex Data Types
+------------------
+
+STRUCT Type Support
+~~~~~~~~~~~~~~~~~~~
+
+PyAthena provides comprehensive support for Amazon Athena's STRUCT (also known as ROW) data types, enabling you to work with complex nested data structures in your Python applications.
+
+Basic Usage
+^^^^^^^^^^^
+
+.. code:: python
+
+    from sqlalchemy import Column, String, Integer, Table, MetaData
+    from pyathena.sqlalchemy.types import AthenaStruct
+
+    # Define a table with STRUCT columns
+    users = Table('users', MetaData(),
+        Column('id', Integer),
+        Column('profile', AthenaStruct(
+            ('name', String),
+            ('age', Integer),
+            ('email', String)
+        )),
+        Column('settings', AthenaStruct(
+            ('theme', String),
+            ('notifications', AthenaStruct(
+                ('email', String),
+                ('push', String)
+            ))
+        ))
+    )
+
+This generates the following SQL structure:
+
+.. code:: sql
+
+    CREATE TABLE users (
+        id INTEGER,
+        profile ROW(name STRING, age INTEGER, email STRING),
+        settings ROW(theme STRING, notifications ROW(email STRING, push STRING))
+    )
+
+Querying STRUCT Data
+^^^^^^^^^^^^^^^^^^^^
+
+PyAthena automatically converts STRUCT values in query results to Python dictionaries:
+
+.. code:: python
+
+    from sqlalchemy import text
+
+    # `connection` is assumed to be an open SQLAlchemy connection,
+    # for example one obtained from engine.connect()
+    # Query STRUCT data using the ROW constructor
+    result = connection.execute(
+        text("SELECT ROW('John Doe', 30, 'john@example.com') as profile")
+    ).fetchone()
+
+    # Access STRUCT fields as dictionary
+    profile = result.profile  # {"0": "John Doe", "1": 30, "2": "john@example.com"}
+
+Named STRUCT Fields
+^^^^^^^^^^^^^^^^^^^
+
+Casting a STRUCT to JSON returns a JSON string that can be parsed with ``json.loads``. The unnamed ROW in this example serializes as a JSON array:
+
+.. code:: python
+
+    import json
+
+    from sqlalchemy import text
+
+    # Using CAST AS JSON to receive the ROW as a JSON string
+    result = connection.execute(
+        text("SELECT CAST(ROW('John', 30) AS JSON) as user_data")
+    ).fetchone()
+    # Parse the JSON result
+    user_data = json.loads(result.user_data)  # ["John", 30]
+
+Data Format Support
+^^^^^^^^^^^^^^^^^^^
+
+PyAthena supports multiple STRUCT data formats:
+
+**Athena Native Format:**
+
+.. code:: python
+
+    # Input: "{name=John, age=30}"
+    # Output: {"name": "John", "age": 30}
+
+**JSON Format (Recommended):**
+
+.. code:: python
+
+    # Input: '{"name": "John", "age": 30}'
+    # Output: {"name": "John", "age": 30}
+
+**Unnamed STRUCT Format:**
+
+.. code:: python
+
+    # Input: "{Alice, 25}"
+    # Output: {"0": "Alice", "1": 25}
+
+Performance Considerations
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- **JSON Format**: Recommended for complex nested structures
+- **Native Format**: Optimized for simple key-value pairs
+- **Smart Detection**: PyAthena automatically detects the format to avoid unnecessary parsing overhead
+
+Best Practices
+^^^^^^^^^^^^^^
+
+1. **Use JSON casting** for complex nested structures:
+
+   .. code:: sql
+
+       SELECT CAST(complex_struct AS JSON) FROM table_name
+
+2. **Define clear field types** in AthenaStruct definitions:
+
+   .. code:: python
+
+       AthenaStruct(
+           ('user_id', Integer),
+           ('profile', AthenaStruct(
+               ('name', String),
+               ('preferences', AthenaStruct(
+                   ('theme', String),
+                   ('language', String)
+               ))
+           ))
+       )
+
+3. **Handle NULL values** appropriately in your application logic:
+
+   .. code:: python
+
+       if result.struct_column is not None:
+           # Process struct data
+           field_value = result.struct_column.get('field_name')
+
+Migration from Raw Strings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**Before (raw string handling):**
+
+.. code:: python
+
+    import json
+    result = cursor.execute("SELECT struct_column FROM table_name").fetchone()
+    raw_data = result[0]  # "{\"name\": \"John\", \"age\": 30}"
+    parsed_data = json.loads(raw_data)
+
+**After (automatic conversion):**
+
+.. code:: python
+
+    result = cursor.execute("SELECT struct_column FROM table_name").fetchone()
+    struct_data = result[0]  # {"name": "John", "age": 30} - automatically converted
+    name = struct_data['name']  # Direct access
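
As a rough illustration of the three input formats listed under "Data Format Support" in the diff above, the following sketch parses flat STRUCT strings into dictionaries. It is not PyAthena's converter: `parse_flat_struct` and `_convert_scalar` are hypothetical names, and nested structs, quoted values, and values containing commas or `=` are deliberately out of scope.

```python
import json
from typing import Any, Dict, Optional


def _convert_scalar(token: str) -> Any:
    """Convert a bare token to None, int, or float, else keep it as a string."""
    token = token.strip()
    if token.lower() == "null":
        return None
    # Only attempt numeric conversion when the token contains a digit,
    # so words such as "nan" or "inf" stay strings.
    if any(ch.isdigit() for ch in token):
        for cast in (int, float):
            try:
                return cast(token)
            except ValueError:
                continue
    return token


def parse_flat_struct(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Parse a flat STRUCT string in JSON, native, or unnamed format."""
    if raw is None:
        return None
    text = raw.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    # Format detection: a quote right after the brace suggests JSON.
    if text[1:].lstrip().startswith('"'):
        try:
            parsed = json.loads(text)
            return parsed if isinstance(parsed, dict) else None
        except json.JSONDecodeError:
            pass  # fall through to the native format
    body = text[1:-1].strip()
    if not body:
        return {}
    result: Dict[str, Any] = {}
    for index, part in enumerate(body.split(",")):
        if "=" in part:
            # Native format: key=value
            key, value = part.split("=", 1)
            result[key.strip()] = _convert_scalar(value)
        else:
            # Unnamed format: use the position as the key
            result[str(index)] = _convert_scalar(part)
    return result


print(parse_flat_struct("{name=John, age=30}"))          # {'name': 'John', 'age': 30}
print(parse_flat_struct("{Alice, 25}"))                  # {'0': 'Alice', '1': 25}
print(parse_flat_struct('{"name": "John", "age": 30}'))  # {'name': 'John', 'age': 30}
```

Checking for the JSON shape first means the JSON parser is only invoked when the input plausibly is JSON, which is one way to read the "Smart Detection" bullet.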