What happens?
According to the PySpark docs for pyspark.sql.functions.struct, struct() should accept either individual Columns or a single list of Columns as input.
In DuckDB's Spark API, passing a list does not work and raises:
```
InvalidInputException: Invalid Input Error: Expected argument of type Expression, received '<class 'list'>' instead
```
DuckDB currently only accepts the columns as separate positional arguments (i.e. an unpacked list).
To Reproduce
Input (example from the PySpark docs):
```python
from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql import functions as F
spark = session.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
df.select(F.struct([df.age, df.name]).alias("struct")).collect()
```
Output:
```
---------------------------------------------------------------------------
InvalidInputException Traceback (most recent call last)
Cell In[30], line 1
----> 1 df.select(F.struct([df.age, df.name]).alias("struct")).collect()
File ...\.venv\lib\site-packages\duckdb\experimental\spark\sql\functions.py:108, in struct(*cols)
106 def struct(*cols: Column) -> Column:
107 return Column(
--> 108 FunctionExpression("struct_pack", *[_inner_expr_or_val(x) for x in cols])
109 )
InvalidInputException: Invalid Input Error: Expected argument of type Expression, received '<class 'list'>' instead
```
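Judging from the traceback, the cause is Python's argument packing: struct is declared as `struct(*cols)`, so a single list argument arrives as a one-element tuple whose only element is the list itself. That list is then passed through to FunctionExpression, which rejects it. The plain-Python sketch below illustrates the packing without DuckDB; the function name `show_cols` is hypothetical.

```python
def show_cols(*cols):
    # *cols collects all positional arguments into a tuple
    return cols

# Passing a list: the tuple holds ONE element, the list itself.
packed = show_cols(["age", "name"])
print(packed)       # (['age', 'name'],)
print(len(packed))  # 1

# Unpacking with *: each list element becomes its own positional argument.
unpacked = show_cols(*["age", "name"])
print(unpacked)     # ('age', 'name')
```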
Workaround: unpack the list with the `*` operator:
```python
from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql import functions as F
spark = session.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
df.select(F.struct(*[df.age, df.name]).alias("struct")).collect()
```
Output:
```
[Row(struct={'age': 2, 'name': 'Alice'}),
 Row(struct={'age': 5, 'name': 'Bob'})]
```
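Until struct() accepts a list directly, one possible client-side workaround is a small wrapper that unpacks a single list or tuple argument before delegating, so both call styles from the PySpark docs work. This is a sketch, not part of DuckDB's API: the decorator name `accept_list_or_varargs` and the stand-in `fake_struct` are hypothetical, and `fake_struct` replaces `F.struct` so the example runs without DuckDB installed.

```python
from functools import wraps
from typing import Any, Callable, TypeVar

R = TypeVar("R")


def accept_list_or_varargs(fn: Callable[..., R]) -> Callable[..., R]:
    """Hypothetical decorator: let a *cols function also take one list/tuple."""

    @wraps(fn)
    def wrapper(*cols: Any) -> R:
        # A single list/tuple argument is treated as the full collection of
        # columns and unpacked; any other call is forwarded unchanged.
        if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
            cols = tuple(cols[0])
        return fn(*cols)

    return wrapper


@accept_list_or_varargs
def fake_struct(*cols: Any) -> tuple:
    # Stand-in for F.struct: just records the arguments it received.
    return ("struct_pack", cols)


print(fake_struct(["age", "name"]))  # ('struct_pack', ('age', 'name'))
print(fake_struct("age", "name"))    # ('struct_pack', ('age', 'name'))
```

With DuckDB installed, the same wrapper could be applied as `struct = accept_list_or_varargs(F.struct)`; `struct([df.age, df.name])` then forwards to `F.struct(df.age, df.name)`, which is equivalent to the `*` workaround above.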
OS:
win_amd64
DuckDB Version:
1.2.2 (duckdb-1.2.2-cp310-cp310-win_amd64.whl)
DuckDB Client:
Python
Hardware:
No response
Full Name:
Martin Bode
Affiliation:
N/A
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?