Commit cb7fd95

Unified arrow API (#287)
# Problem

The Arrow API of the Python client regularly causes confusion. The most important issues seem to be that:

* The Relational API has a total of six functions, four of which are aliases, three of which are unique to the Relational API.
* The naming of the two core functions is not as intuitive as it could be.

Also see #97.

## Core Functions

The Connection API (and with it, the `duckdb` module) and the Relational API have two core functions to create Arrow objects:

* `fetch_record_batch() -> pyarrow.lib.RecordBatchReader`
* `fetch_arrow_table() -> pyarrow.lib.Table`

The Connection API has another function to create a Relation from an Arrow object:

* `arrow(arrow_object, connection = None) -> DuckDBPyRelation`

## Aliases

The Connection and Relational APIs both have an alias for `fetch_record_batch()`:

* `arrow() -> pyarrow.lib.RecordBatchReader`

This function was the first we exposed in the API and is probably the most often used. Its return type changed over the course of 1.4.x, from `Table` to `RecordBatchReader`, which caused a number of issues.

The Relational API has three more aliases:

* `to_arrow_table() -> pyarrow.lib.Table`
* `fetch_arrow_reader() -> pyarrow.lib.RecordBatchReader`
* `record_batch() -> pyarrow.lib.RecordBatchReader` (deprecated since 1.4.0)

# Changes

## v1.5.0 API

The Connection and Relational APIs will have the following functions:

* `to_arrow_reader() -> pyarrow.lib.RecordBatchReader`
* `to_arrow_table() -> pyarrow.lib.Table`
* `arrow() -> pyarrow.lib.RecordBatchReader`

  **Note:** we will _not_ deprecate this function in v1.5.0, but we will discourage its use in both the documentation and the docstring. We encourage users to use `to_arrow_reader()` instead.

The Connection API will keep this function:

* `arrow(arrow_object, connection = None) -> DuckDBPyRelation`

## v1.5.0 Deprecated API

The `fetch_*` functions will be deprecated in v1.5.0 (emitting a `DeprecationWarning`) and removed in v1.6.0:

* `fetch_record_batch() -> pyarrow.lib.RecordBatchReader`
* `fetch_arrow_table() -> pyarrow.lib.Table`
* `fetch_arrow_reader() -> pyarrow.lib.RecordBatchReader`

## v1.5.0 Removed API

* `Relation::record_batch() -> pyarrow.lib.RecordBatchReader` will be removed.

# What's in a Name

Arrow's ADBC Driver Manager API uses the `fetch_*` naming convention ([docs](https://arrow.apache.org/adbc/0.9.0/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Cursor)):

* `fetch_arrow_table()`
* `fetch_record_reader()`
* `fetch_df()`

This is what we've adopted, in spite of our own (not consistently applied) convention of using `to_*`:

* `to_csv`
* `to_df`
* `to_parquet`
* `to_table`
* `to_view`

However, we also provide:

* `fetch_df`
* `fetch_df_chunk`

Looking at other libraries (Vortex, Pandas, etc.), there is precedent for moving from the `fetch_*` prefix to the `to_*` prefix, which seems the preferred way of expressing a conversion.
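The migration path above (old `fetch_*` names forwarding to the new `to_*` names while emitting a `DeprecationWarning`, plus a discouraged-but-kept `arrow()` alias) can be sketched in plain Python. This is a minimal stand-in, not the actual DuckDB implementation; the class name and the string return value are placeholders:

```python
import warnings


class Relation:
    """Stand-in for DuckDBPyRelation, showing the alias/deprecation pattern."""

    def to_arrow_reader(self, batch_size=1000000):
        # The real method returns a pyarrow.lib.RecordBatchReader;
        # a string stands in for it here.
        return f"reader(batch_size={batch_size})"

    def arrow(self, batch_size=1000000):
        """Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead."""
        # Kept (not deprecated in v1.5.0), but discouraged in docs and docstring.
        return self.to_arrow_reader(batch_size)

    def fetch_record_batch(self, rows_per_batch=1000000):
        # Deprecated shim: warn, then forward to the new name.
        warnings.warn(
            "fetch_record_batch() is deprecated, use to_arrow_reader() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.to_arrow_reader(rows_per_batch)


rel = Relation()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = rel.fetch_record_batch(500)

assert result == rel.to_arrow_reader(500)
assert caught[0].category is DeprecationWarning
```

The deprecated name stays a thin wrapper around the new one, so both return identical results until the `fetch_*` functions are removed in v1.6.0.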
2 parents 78aee48 + d3cca18 commit cb7fd95

50 files changed

Lines changed: 473 additions & 265 deletions


_duckdb-stubs/__init__.pyi

Lines changed: 45 additions & 12 deletions
```diff
@@ -95,6 +95,8 @@ __all__: list[str] = [
     "execute",
     "executemany",
     "extract_statements",
+    "to_arrow_reader",
+    "to_arrow_table",
     "fetch_arrow_table",
     "fetch_df",
     "fetch_df_chunk",
@@ -194,7 +196,11 @@ class DuckDBPyConnection:
     def __exit__(self, exc_type: object, exc: object, traceback: object) -> None: ...
     def append(self, table_name: str, df: pandas.DataFrame, *, by_name: bool = False) -> DuckDBPyConnection: ...
     def array_type(self, type: sqltypes.DuckDBPyType, size: pytyping.SupportsInt) -> sqltypes.DuckDBPyType: ...
-    def arrow(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader: ...
+    def arrow(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader:
+        """Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead."""
+        ...
+    def to_arrow_reader(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader: ...
+    def to_arrow_table(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.Table: ...
     def begin(self) -> DuckDBPyConnection: ...
     def checkpoint(self) -> DuckDBPyConnection: ...
     def close(self) -> None: ...
@@ -222,12 +228,16 @@ class DuckDBPyConnection:
     def execute(self, query: Statement | str, parameters: object = None) -> DuckDBPyConnection: ...
     def executemany(self, query: Statement | str, parameters: object = None) -> DuckDBPyConnection: ...
     def extract_statements(self, query: str) -> list[Statement]: ...
-    def fetch_arrow_table(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.Table: ...
+    def fetch_arrow_table(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.Table:
+        """Deprecated: use to_arrow_table() instead."""
+        ...
     def fetch_df(self, *, date_as_object: bool = False) -> pandas.DataFrame: ...
     def fetch_df_chunk(
         self, vectors_per_chunk: pytyping.SupportsInt = 1, *, date_as_object: bool = False
     ) -> pandas.DataFrame: ...
-    def fetch_record_batch(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader: ...
+    def fetch_record_batch(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader:
+        """Deprecated: use to_arrow_reader() instead."""
+        ...
     def fetchall(self) -> list[tuple[pytyping.Any, ...]]: ...
     def fetchdf(self, *, date_as_object: bool = False) -> pandas.DataFrame: ...
     def fetchmany(self, size: pytyping.SupportsInt = 1) -> list[tuple[pytyping.Any, ...]]: ...
@@ -487,7 +497,11 @@ class DuckDBPyRelation:
     def arg_min(
         self, arg_column: str, value_column: str, groups: str = "", window_spec: str = "", projected_columns: str = ""
     ) -> DuckDBPyRelation: ...
-    def arrow(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader: ...
+    def arrow(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader:
+        """Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead."""
+        ...
+    def to_arrow_reader(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader: ...
+    def to_arrow_table(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.Table: ...
     def avg(
         self, column: str, groups: str = "", window_spec: str = "", projected_columns: str = ""
     ) -> DuckDBPyRelation: ...
@@ -533,12 +547,18 @@ class DuckDBPyRelation:
     def favg(
         self, column: str, groups: str = "", window_spec: str = "", projected_columns: str = ""
     ) -> DuckDBPyRelation: ...
-    def fetch_arrow_reader(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader: ...
-    def fetch_arrow_table(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.Table: ...
+    def fetch_arrow_reader(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader:
+        """Deprecated: use to_arrow_reader() instead."""
+        ...
+    def fetch_arrow_table(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.Table:
+        """Deprecated: use to_arrow_table() instead."""
+        ...
     def fetch_df_chunk(
         self, vectors_per_chunk: pytyping.SupportsInt = 1, *, date_as_object: bool = False
     ) -> pandas.DataFrame: ...
-    def fetch_record_batch(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader: ...
+    def fetch_record_batch(self, rows_per_batch: pytyping.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader:
+        """Deprecated: use to_arrow_reader() instead."""
+        ...
     def fetchall(self) -> list[tuple[pytyping.Any, ...]]: ...
     def fetchdf(self, *, date_as_object: bool = False) -> pandas.DataFrame: ...
     def fetchmany(self, size: pytyping.SupportsInt = 1) -> list[tuple[pytyping.Any, ...]]: ...
@@ -656,7 +676,6 @@ class DuckDBPyRelation:
     def query(self, virtual_table_name: str, sql_query: str) -> DuckDBPyRelation: ...
     def rank(self, window_spec: str, projected_columns: str = "") -> DuckDBPyRelation: ...
     def rank_dense(self, window_spec: str, projected_columns: str = "") -> DuckDBPyRelation: ...
-    def record_batch(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.RecordBatchReader: ...
     def row_number(self, window_spec: str, projected_columns: str = "") -> DuckDBPyRelation: ...
     def select(self, *args: str | Expression, groups: str = "") -> DuckDBPyRelation: ...
     def select_dtypes(self, types: pytyping.List[sqltypes.DuckDBPyType | str]) -> DuckDBPyRelation: ...
@@ -692,7 +711,6 @@ class DuckDBPyRelation:
         self, column: str, groups: str = "", window_spec: str = "", projected_columns: str = ""
     ) -> DuckDBPyRelation: ...
     def tf(self) -> dict[str, tensorflow.Tensor]: ...
-    def to_arrow_table(self, batch_size: pytyping.SupportsInt = 1000000) -> pyarrow.lib.Table: ...
     def to_csv(
         self,
         file_name: str,
@@ -1067,9 +1085,18 @@ def array_type(
 @pytyping.overload
 def arrow(
     rows_per_batch: pytyping.SupportsInt = 1000000, *, connection: DuckDBPyConnection | None = None
-) -> pyarrow.lib.RecordBatchReader: ...
+) -> pyarrow.lib.RecordBatchReader:
+    """Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead."""
+    ...
+
 @pytyping.overload
 def arrow(arrow_object: pytyping.Any, *, connection: DuckDBPyConnection | None = None) -> DuckDBPyRelation: ...
+def to_arrow_reader(
+    batch_size: pytyping.SupportsInt = 1000000, *, connection: DuckDBPyConnection | None = None
+) -> pyarrow.lib.RecordBatchReader: ...
+def to_arrow_table(
+    batch_size: pytyping.SupportsInt = 1000000, *, connection: DuckDBPyConnection | None = None
+) -> pyarrow.lib.Table: ...
 def begin(*, connection: DuckDBPyConnection | None = None) -> DuckDBPyConnection: ...
 def checkpoint(*, connection: DuckDBPyConnection | None = None) -> DuckDBPyConnection: ...
 def close(*, connection: DuckDBPyConnection | None = None) -> None: ...
@@ -1128,7 +1155,10 @@ def executemany(
 def extract_statements(query: str, *, connection: DuckDBPyConnection | None = None) -> list[Statement]: ...
 def fetch_arrow_table(
     rows_per_batch: pytyping.SupportsInt = 1000000, *, connection: DuckDBPyConnection | None = None
-) -> pyarrow.lib.Table: ...
+) -> pyarrow.lib.Table:
+    """Deprecated: use to_arrow_table() instead."""
+    ...
+
 def fetch_df(*, date_as_object: bool = False, connection: DuckDBPyConnection | None = None) -> pandas.DataFrame: ...
 def fetch_df_chunk(
     vectors_per_chunk: pytyping.SupportsInt = 1,
@@ -1138,7 +1168,10 @@ def fetch_df_chunk(
 ) -> pandas.DataFrame: ...
 def fetch_record_batch(
     rows_per_batch: pytyping.SupportsInt = 1000000, *, connection: DuckDBPyConnection | None = None
-) -> pyarrow.lib.RecordBatchReader: ...
+) -> pyarrow.lib.RecordBatchReader:
+    """Deprecated: use to_arrow_reader() instead."""
+    ...
+
 def fetchall(*, connection: DuckDBPyConnection | None = None) -> list[tuple[pytyping.Any, ...]]: ...
 def fetchdf(*, date_as_object: bool = False, connection: DuckDBPyConnection | None = None) -> pandas.DataFrame: ...
 def fetchmany(
```

duckdb/__init__.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -143,6 +143,8 @@
     table_function,
     tf,
     threadsafety,
+    to_arrow_reader,
+    to_arrow_table,
     token_type,
     tokenize,
     torch,
@@ -374,6 +376,8 @@
     "tf",
     "threadsafety",
     "threadsafety",
+    "to_arrow_reader",
+    "to_arrow_table",
     "token_type",
     "tokenize",
     "torch",
```

duckdb/polars_io.py

Lines changed: 1 addition & 4 deletions
```diff
@@ -270,10 +270,7 @@ def source_generator(
         # Try to pushdown filter, if one exists
         if duck_predicate is not None:
             relation_final = relation_final.filter(duck_predicate)
-        if batch_size is None:
-            results = relation_final.fetch_arrow_reader()
-        else:
-            results = relation_final.fetch_arrow_reader(batch_size)
+        results = relation_final.to_arrow_reader() if batch_size is None else relation_final.to_arrow_reader(batch_size)

         for record_batch in iter(results.read_next_batch, None):
             if predicate is not None and duck_predicate is None:
```
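The loop in `polars_io.py` relies on the two-argument form of `iter()`: `iter(callable, sentinel)` calls `callable` repeatedly and stops when it returns the sentinel, which matches a reader whose `read_next_batch()` yields `None` at end of stream. A sketch with a stand-in reader (the `FakeReader` class and string "batches" are hypothetical, not DuckDB's actual reader):

```python
class FakeReader:
    """Stand-in for a record-batch reader whose read_next_batch()
    returns None once the stream is exhausted."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def read_next_batch(self):
        # next() with a default returns None instead of raising StopIteration.
        return next(self._batches, None)


reader = FakeReader(["batch-0", "batch-1", "batch-2"])
# iter(callable, sentinel): keep calling until the sentinel (None) comes back.
collected = [batch for batch in iter(reader.read_next_batch, None)]
assert collected == ["batch-0", "batch-1", "batch-2"]
```

This avoids an explicit `while True` / `break` loop around the reader.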

scripts/connection_methods.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -395,7 +395,7 @@
         "return": "polars.DataFrame"
     },
     {
-        "name": "fetch_arrow_table",
+        "name": "arrow_table",
         "function": "FetchArrow",
         "docs": "Fetch a result as Arrow table following execute()",
         "args": [
```

src/duckdb_py/duckdb_python.cpp

Lines changed: 27 additions & 13 deletions
```diff
@@ -446,31 +446,45 @@ static void InitializeConnectionMethods(py::module_ &m) {
     "Fetch a result as Polars DataFrame following execute()", py::arg("rows_per_batch") = 1000000, py::kw_only(),
     py::arg("lazy") = false, py::arg("connection") = py::none());
 m.def(
-    "fetch_arrow_table",
-    [](idx_t rows_per_batch, shared_ptr<DuckDBPyConnection> conn = nullptr) {
+    "to_arrow_table",
+    [](idx_t batch_size, shared_ptr<DuckDBPyConnection> conn = nullptr) {
         if (!conn) {
             conn = DuckDBPyConnection::DefaultConnection();
         }
-        return conn->FetchArrow(rows_per_batch);
+        return conn->FetchArrow(batch_size);
     },
-    "Fetch a result as Arrow table following execute()", py::arg("rows_per_batch") = 1000000, py::kw_only(),
+    "Fetch a result as Arrow table following execute()", py::arg("batch_size") = 1000000, py::kw_only(),
     py::arg("connection") = py::none());
 m.def(
-    "fetch_record_batch",
-    [](const idx_t rows_per_batch, shared_ptr<DuckDBPyConnection> conn = nullptr) {
+    "to_arrow_reader",
+    [](idx_t batch_size, shared_ptr<DuckDBPyConnection> conn = nullptr) {
         if (!conn) {
             conn = DuckDBPyConnection::DefaultConnection();
         }
-        return conn->FetchRecordBatchReader(rows_per_batch);
+        return conn->FetchRecordBatchReader(batch_size);
     },
-    "Fetch an Arrow RecordBatchReader following execute()", py::arg("rows_per_batch") = 1000000, py::kw_only(),
+    "Fetch an Arrow RecordBatchReader following execute()", py::arg("batch_size") = 1000000, py::kw_only(),
     py::arg("connection") = py::none());
 m.def(
-    "arrow",
+    "fetch_arrow_table",
+    [](idx_t rows_per_batch, shared_ptr<DuckDBPyConnection> conn = nullptr) {
+        if (!conn) {
+            conn = DuckDBPyConnection::DefaultConnection();
+        }
+        PyErr_WarnEx(PyExc_DeprecationWarning, "fetch_arrow_table() is deprecated, use to_arrow_table() instead.",
+                     0);
+        return conn->FetchArrow(rows_per_batch);
+    },
+    "Fetch a result as Arrow table following execute()", py::arg("rows_per_batch") = 1000000, py::kw_only(),
+    py::arg("connection") = py::none());
+m.def(
+    "fetch_record_batch",
     [](const idx_t rows_per_batch, shared_ptr<DuckDBPyConnection> conn = nullptr) {
         if (!conn) {
             conn = DuckDBPyConnection::DefaultConnection();
         }
+        PyErr_WarnEx(PyExc_DeprecationWarning, "fetch_record_batch() is deprecated, use to_arrow_reader() instead.",
+                     0);
         return conn->FetchRecordBatchReader(rows_per_batch);
     },
     "Fetch an Arrow RecordBatchReader following execute()", py::arg("rows_per_batch") = 1000000, py::kw_only(),
@@ -957,14 +971,14 @@ static void InitializeConnectionMethods(py::module_ &m) {
 // We define these "wrapper" methods manually because they are overloaded
 m.def(
     "arrow",
-    [](idx_t rows_per_batch, shared_ptr<DuckDBPyConnection> conn) -> duckdb::pyarrow::Table {
+    [](idx_t rows_per_batch, shared_ptr<DuckDBPyConnection> conn) -> duckdb::pyarrow::RecordBatchReader {
         if (!conn) {
             conn = DuckDBPyConnection::DefaultConnection();
         }
-        return conn->FetchArrow(rows_per_batch);
+        return conn->FetchRecordBatchReader(rows_per_batch);
     },
-    "Fetch a result as Arrow table following execute()", py::arg("rows_per_batch") = 1000000, py::kw_only(),
-    py::arg("connection") = py::none());
+    "Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead.",
+    py::arg("rows_per_batch") = 1000000, py::kw_only(), py::arg("connection") = py::none());
 m.def(
     "arrow",
     [](py::object &arrow_object, shared_ptr<DuckDBPyConnection> conn) -> unique_ptr<DuckDBPyRelation> {
```
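The module-level `arrow()` above is registered twice with pybind11: one overload takes an integer batch size and returns a reader, the other takes an Arrow object and returns a relation; pybind11 resolves the call by trying overloads in order. A hypothetical pure-Python dispatcher illustrating the same argument-type-based dispatch (the function body and return tuples are stand-ins, not DuckDB code):

```python
from numbers import Integral


def arrow(arg=1000000):
    """Sketch of an overloaded arrow(): an integer-like argument selects the
    reader overload, any other object selects the relation-from-object one."""
    if isinstance(arg, Integral):
        # Stand-in for returning a pyarrow.lib.RecordBatchReader.
        return ("record_batch_reader", int(arg))
    # Stand-in for returning a DuckDBPyRelation wrapping the Arrow object.
    return ("relation", arg)


assert arrow() == ("record_batch_reader", 1000000)
assert arrow(500) == ("record_batch_reader", 500)
assert arrow({"col": [1, 2]}) == ("relation", {"col": [1, 2]})
```

In the real binding, pybind11 performs this resolution from the two `m.def("arrow", ...)` registrations rather than from an explicit `isinstance` check.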

src/duckdb_py/pyconnection.cpp

Lines changed: 22 additions & 5 deletions
```diff
@@ -203,11 +203,28 @@ static void InitializeConnectionMethods(py::class_<DuckDBPyConnection, shared_pt
       py::kw_only(), py::arg("date_as_object") = false);
 m.def("pl", &DuckDBPyConnection::FetchPolars, "Fetch a result as Polars DataFrame following execute()",
       py::arg("rows_per_batch") = 1000000, py::kw_only(), py::arg("lazy") = false);
-m.def("fetch_arrow_table", &DuckDBPyConnection::FetchArrow, "Fetch a result as Arrow table following execute()",
-      py::arg("rows_per_batch") = 1000000);
-m.def("fetch_record_batch", &DuckDBPyConnection::FetchRecordBatchReader,
-      "Fetch an Arrow RecordBatchReader following execute()", py::arg("rows_per_batch") = 1000000);
-m.def("arrow", &DuckDBPyConnection::FetchRecordBatchReader, "Fetch an Arrow RecordBatchReader following execute()",
+m.def("to_arrow_table", &DuckDBPyConnection::FetchArrow, "Fetch a result as Arrow table following execute()",
+      py::arg("batch_size") = 1000000);
+m.def("to_arrow_reader", &DuckDBPyConnection::FetchRecordBatchReader,
+      "Fetch an Arrow RecordBatchReader following execute()", py::arg("batch_size") = 1000000);
+m.def(
+    "fetch_arrow_table",
+    [](DuckDBPyConnection &self, idx_t rows_per_batch) {
+        PyErr_WarnEx(PyExc_DeprecationWarning, "fetch_arrow_table() is deprecated, use to_arrow_table() instead.",
+                     0);
+        return self.FetchArrow(rows_per_batch);
+    },
+    "Fetch a result as Arrow table following execute()", py::arg("rows_per_batch") = 1000000);
+m.def(
+    "fetch_record_batch",
+    [](DuckDBPyConnection &self, idx_t rows_per_batch) {
+        PyErr_WarnEx(PyExc_DeprecationWarning, "fetch_record_batch() is deprecated, use to_arrow_reader() instead.",
+                     0);
+        return self.FetchRecordBatchReader(rows_per_batch);
+    },
+    "Fetch an Arrow RecordBatchReader following execute()", py::arg("rows_per_batch") = 1000000);
+m.def("arrow", &DuckDBPyConnection::FetchRecordBatchReader,
+      "Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead.",
       py::arg("rows_per_batch") = 1000000);
 m.def("torch", &DuckDBPyConnection::FetchPyTorch, "Fetch a result as dict of PyTorch Tensors following execute()");
 m.def("tf", &DuckDBPyConnection::FetchTF, "Fetch a result as dict of TensorFlow Tensors following execute()");
```

src/duckdb_py/pyrelation/initialize.cpp

Lines changed: 29 additions & 13 deletions
```diff
@@ -62,12 +62,21 @@ static void InitializeConsumers(py::class_<DuckDBPyRelation> &m) {
          py::arg("date_as_object") = false)
     .def("fetch_df_chunk", &DuckDBPyRelation::FetchDFChunk, "Execute and fetch a chunk of the rows",
          py::arg("vectors_per_chunk") = 1, py::kw_only(), py::arg("date_as_object") = false)
-    .def("arrow", &DuckDBPyRelation::ToRecordBatch,
-         "Execute and return an Arrow Record Batch Reader that yields all rows", py::arg("batch_size") = 1000000)
-    .def("fetch_arrow_table", &DuckDBPyRelation::ToArrowTable, "Execute and fetch all rows as an Arrow Table",
-         py::arg("batch_size") = 1000000)
     .def("to_arrow_table", &DuckDBPyRelation::ToArrowTable, "Execute and fetch all rows as an Arrow Table",
          py::arg("batch_size") = 1000000)
+    .def("to_arrow_reader", &DuckDBPyRelation::ToRecordBatch,
+         "Execute and return an Arrow Record Batch Reader that yields all rows", py::arg("batch_size") = 1000000)
+    .def("arrow", &DuckDBPyRelation::ToRecordBatch,
+         "Alias of to_arrow_reader(). We recommend using to_arrow_reader() instead.",
+         py::arg("batch_size") = 1000000)
+    .def(
+        "fetch_arrow_table",
+        [](pybind11::object &self, idx_t batch_size) {
+            PyErr_WarnEx(PyExc_DeprecationWarning,
+                         "fetch_arrow_table() is deprecated, use to_arrow_table() instead.", 0);
+            return self.attr("to_arrow_table")(batch_size);
+        },
+        "Execute and fetch all rows as an Arrow Table", py::arg("batch_size") = 1000000)
     .def("pl", &DuckDBPyRelation::ToPolars, "Execute and fetch all rows as a Polars DataFrame",
          py::arg("batch_size") = 1000000, py::kw_only(), py::arg("lazy") = false)
     .def("torch", &DuckDBPyRelation::FetchPyTorch, "Fetch a result as dict of PyTorch Tensors")
@@ -79,18 +88,25 @@ static void InitializeConsumers(py::class_<DuckDBPyRelation> &m) {
 )";
 m.def("__arrow_c_stream__", &DuckDBPyRelation::ToArrowCapsule, capsule_docs,
       py::arg("requested_schema") = py::none());
-m.def("fetch_record_batch", &DuckDBPyRelation::ToRecordBatch,
-      "Execute and return an Arrow Record Batch Reader that yields all rows", py::arg("rows_per_batch") = 1000000)
-    .def("fetch_arrow_reader", &DuckDBPyRelation::ToRecordBatch,
-         "Execute and return an Arrow Record Batch Reader that yields all rows", py::arg("batch_size") = 1000000)
+m.def(
+     "fetch_record_batch",
+     [](pybind11::object &self, idx_t rows_per_batch) {
+         PyErr_WarnEx(PyExc_DeprecationWarning,
+                      "fetch_record_batch() is deprecated, use to_arrow_reader() instead.", 0);
+         return self.attr("to_arrow_reader")(rows_per_batch);
+     },
+     "Execute and return an Arrow Record Batch Reader that yields all rows", py::arg("rows_per_batch") = 1000000)
     .def(
-        "record_batch",
-        [](pybind11::object &self, idx_t rows_per_batch) {
+        "fetch_arrow_reader",
+        [](pybind11::object &self, idx_t batch_size) {
             PyErr_WarnEx(PyExc_DeprecationWarning,
-                         "record_batch() is deprecated, use fetch_record_batch() instead.", 0);
-            return self.attr("fetch_record_batch")(rows_per_batch);
+                         "fetch_arrow_reader() is deprecated, use to_arrow_reader() instead.", 0);
+            if (PyErr_Occurred()) {
+                throw py::error_already_set();
+            }
+            return self.attr("to_arrow_reader")(batch_size);
         },
-        py::arg("batch_size") = 1000000);
+        "Execute and return an Arrow Record Batch Reader that yields all rows", py::arg("batch_size") = 1000000);
 }

 static void InitializeAggregates(py::class_<DuckDBPyRelation> &m) {
```
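The `PyErr_Occurred()` check after `PyErr_WarnEx` in the wrapper above matters because Python warnings can be escalated to exceptions (for example via `-W error::DeprecationWarning`), in which case the warning call itself sets an error that must be propagated instead of continuing. A pure-Python sketch of the same behavior (the shim function and its string return value are stand-ins):

```python
import warnings


def fetch_arrow_reader():
    """Hypothetical deprecated shim: warnings.warn raises if the interpreter
    has escalated DeprecationWarning to an error."""
    warnings.warn(
        "fetch_arrow_reader() is deprecated, use to_arrow_reader() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "reader"  # stand-in for the real RecordBatchReader


# Default filters: the call warns but still returns a value.
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    assert fetch_arrow_reader() == "reader"

# Escalated filters: the warning becomes an exception before any work runs,
# mirroring what the PyErr_Occurred() check in the pybind11 wrapper propagates.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        fetch_arrow_reader()
    except DeprecationWarning:
        escalated = True
assert escalated
```

This is why test suites that run with warnings-as-errors see deprecated calls fail immediately rather than silently succeed.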

tests/fast/api/test_dbapi_fetch.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -42,11 +42,11 @@ def test_multiple_fetch_arrow(self, duckdb_cursor):
         pytest.importorskip("pyarrow")
         con = duckdb.connect()
         c = con.execute("SELECT 42::BIGINT AS a")
-        table = c.fetch_arrow_table()
+        table = c.to_arrow_table()
         df = table.to_pandas()
         pd.testing.assert_frame_equal(df, pd.DataFrame.from_dict({"a": [42]}))
-        assert c.fetch_arrow_table() is None
-        assert c.fetch_arrow_table() is None
+        assert c.to_arrow_table() is None
+        assert c.to_arrow_table() is None

     def test_multiple_close(self, duckdb_cursor):
         con = duckdb.connect()
```
