@@ -389,6 +389,112 @@ The `on_start_query_execution` callback is supported by the following cursor typ
 Note: `AsyncCursor` and its variants do not support this callback as they already
 return the query ID immediately through their different execution model.
 
+## Type hints for complex types
+
+*New in version 3.30.0.*
+
+The Athena API does not return element-level type information for complex types
+(array, map, row/struct). PyAthena parses the string representation returned by
+Athena, but without type metadata the converter can only apply heuristics, which
+may produce incorrect Python types for nested values (e.g. integers left as strings
+inside a struct).
+
+The `result_set_type_hints` parameter solves this by letting you provide Athena DDL
+type signatures for specific columns. The converter then uses precise, recursive
+type-aware conversion instead of heuristics.
+
+```python
+from pyathena import connect
+
+cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
+                 region_name="us-west-2").cursor()
+cursor.execute(
+    "SELECT col_array, col_map, col_struct FROM one_row_complex",
+    result_set_type_hints={
+        "col_array": "array(integer)",
+        "col_map": "map(integer, integer)",
+        "col_struct": "row(a integer, b integer)",
+    },
+)
+row = cursor.fetchone()
+# col_struct values are now integers, not strings:
+# {"a": 1, "b": 2} instead of {"a": "1", "b": "2"}
+```
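
To illustrate what recursive, type-aware conversion means here, the following is a minimal sketch and not PyAthena's actual converter. It assumes the complex value has already been parsed into nested Python lists and dicts whose leaves are strings, and that struct keys are lowercase. The names `split_top_level`, `convert`, and `SCALARS` are hypothetical and exist only for this illustration.

```python
from typing import Any, List

# Hypothetical scalar table for this sketch; a real converter covers more types.
SCALARS = {"integer": int, "bigint": int, "double": float, "varchar": str}


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts


def convert(value: Any, signature: str) -> Any:
    """Convert string leaves in `value` according to a type signature."""
    # Lowercasing also lowercases field names (an assumption of this sketch).
    signature = signature.strip().lower()
    if value is None:
        return None
    if "(" not in signature:
        # Scalar type: unknown type names fall back to str.
        return SCALARS.get(signature, str)(value)
    base, _, rest = signature.partition("(")
    args = split_top_level(rest[:-1])  # rest[:-1] drops the closing ")"
    if base == "array":
        return [convert(item, args[0]) for item in value]
    if base == "map":
        key_type, value_type = args
        return {convert(k, key_type): convert(v, value_type) for k, v in value.items()}
    if base in ("row", "struct"):
        # Each argument looks like "name type", e.g. "a integer".
        fields = [arg.split(None, 1) for arg in args]
        return {name: convert(value[name], type_) for name, type_ in fields}
    return value  # parameterized types this sketch does not handle


print(convert({"a": "1", "b": "2"}, "row(a integer, b integer)"))
```

Because each complex type recurses into its element types, the same function handles nested signatures such as `row(header row(stamp varchar, seq integer), x double, y double)` without special cases.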
+
+Column name matching is case-insensitive. Type hints support arbitrarily nested types:
+
+```python
+cursor.execute(
+    """
+    SELECT CAST(
+        ROW(ROW('2024-01-01', 123), 4.736, 0.583)
+        AS ROW(header ROW(stamp VARCHAR, seq INTEGER), x DOUBLE, y DOUBLE)
+    ) AS positions
+    """,
+    result_set_type_hints={
+        "positions": "row(header row(stamp varchar, seq integer), x double, y double)",
+    },
+)
+row = cursor.fetchone()
+positions = row[0]
+# positions["header"]["seq"] == 123 (int, not "123")
+# positions["x"] == 4.736 (float, not "4.736")
+```
+
+### Hive-style syntax
+
+You can paste type signatures from Hive DDL or ``DESCRIBE TABLE`` output directly.
+Hive-style angle brackets and colons are automatically converted to Trino-style syntax:
+
+```python
+# Both are equivalent:
+result_set_type_hints={"col": "array(struct(a integer, b varchar))"}  # Trino
+result_set_type_hints={"col": "array<struct<a:int,b:varchar>>"}       # Hive
+```
+
+The ``int`` alias is also supported and resolves to ``integer``.
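
The kind of rewriting described above can be approximated with a few string substitutions. This is only a sketch under simple assumptions (no colons or the bare word `int` inside field names); `normalize_hive_signature` is a hypothetical helper and not the function PyAthena uses internally.

```python
import re


def normalize_hive_signature(signature: str) -> str:
    """Rewrite Hive-style `<`, `>`, and `:` into the parenthesized form."""
    text = signature.replace("<", "(").replace(">", ")")
    text = text.replace(":", " ")               # "a:int" -> "a int"
    text = re.sub(r"\s*,\s*", ", ", text)       # uniform comma spacing
    text = re.sub(r"\bint\b", "integer", text)  # expand the `int` alias
    return text


print(normalize_hive_signature("array<struct<a:int,b:varchar>>"))
```

The word-boundary pattern leaves `integer` and `bigint` untouched, so a signature that is already in the parenthesized form passes through unchanged.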
+
+### Index-based hints for duplicate column names
+
+When a query produces columns with the same alias (e.g. ``SELECT a AS x, b AS x``),
+name-based hints cannot distinguish between them. Use integer keys to specify hints
+by zero-based column position:
+
+```python
+cursor.execute(
+    "SELECT a AS x, b AS x FROM my_table",
+    result_set_type_hints={
+        0: "array(integer)",         # first "x" column
+        1: "map(varchar, integer)",  # second "x" column
+    },
+)
+```
+
+Integer (index-based) hints take priority over string (name-based) hints for the same
+column. You can mix both styles in the same dictionary.
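
The documented precedence (index first, then case-insensitive name) can be sketched as a small lookup function. This illustrates the rule as described here and is not PyAthena's internal code; `resolve_hint` is a hypothetical helper.

```python
from typing import Dict, Optional, Union

HintKey = Union[int, str]


def resolve_hint(hints: Dict[HintKey, str], index: int, name: str) -> Optional[str]:
    """Return the hint for the column at `index` named `name`, or None."""
    # Integer keys win: they identify a column by zero-based position.
    if index in hints:
        return hints[index]
    # Otherwise fall back to a case-insensitive match on the column name.
    by_name = {key.lower(): value for key, value in hints.items() if isinstance(key, str)}
    return by_name.get(name.lower())


hints = {0: "array(integer)", "X": "map(varchar, integer)"}
print(resolve_hint(hints, 0, "x"))
print(resolve_hint(hints, 1, "x"))
```

With this mixed dictionary, the first `x` column resolves through its index and the second falls back to the name-based entry.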
+
+### Constraints
+
+* **Nested arrays in native format**: Athena's native (non-JSON) string representation
+  does not clearly delimit nested arrays. If your query returns nested arrays
+  (e.g. `array(array(integer))`), use `CAST(... AS JSON)` in your query to get
+  JSON-formatted output, which is parsed reliably.
+* **Arrow, Pandas, and Polars cursors**: These cursors accept `result_set_type_hints`
+  but their converters do not currently use the hints because they rely on their own
+  type systems. The parameter is passed through for forward compatibility and for
+  result sets that fall back to the default conversion path.
+
+### Breaking change in 3.30.0
+
+Prior to 3.30.0, PyAthena attempted to infer Python types for scalar values inside
+complex types using heuristics (e.g. `"123"` → `123`). Starting with 3.30.0, values
+inside complex types are **kept as strings** unless `result_set_type_hints` is provided.
+This change avoids silent misconversion but means existing code that relied on the
+heuristic behavior may see string values where it previously saw integers or floats.
+
+To restore typed conversion, pass `result_set_type_hints` with the appropriate type
+signatures for the affected columns.
+
 ## Environment variables
 
 Support [Boto3 environment variables](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables).