This repository was archived by the owner on Apr 1, 2026. It is now read-only.

Commit 31853d4

Merge remote-tracking branch 'github/main' into session_simplify
2 parents ced3618 + 26df6e6 commit 31853d4


47 files changed (+2761 / -1260 lines)


CHANGELOG.md

Lines changed: 35 additions & 0 deletions
@@ -4,6 +4,41 @@
 
 [1]: https://pypi.org/project/bigframes/#history
 
+## [2.16.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.15.0...v2.16.0) (2025-08-20)
+
+
+### Features
+
+* Add `bigframes.pandas.options.display.precision` option ([#1979](https://github.com/googleapis/python-bigquery-dataframes/issues/1979)) ([15e6175](https://github.com/googleapis/python-bigquery-dataframes/commit/15e6175ec0aeb1b7b02d0bba9e8e1e018bd11c31))
+* Add level, inplace params to reset_index ([#1988](https://github.com/googleapis/python-bigquery-dataframes/issues/1988)) ([3446950](https://github.com/googleapis/python-bigquery-dataframes/commit/34469504b79a082d3380f9f25c597483aef2068a))
+* Add ML code samples from dbt blog post ([#1978](https://github.com/googleapis/python-bigquery-dataframes/issues/1978)) ([ebaa244](https://github.com/googleapis/python-bigquery-dataframes/commit/ebaa244a9eb7b87f7f9fd9c3bebe5c7db24cd013))
+* Add where, coalesce, fillna, casewhen, invert local impl ([#1976](https://github.com/googleapis/python-bigquery-dataframes/issues/1976)) ([f7f686c](https://github.com/googleapis/python-bigquery-dataframes/commit/f7f686cf85ab7e265d9c07ebc7f0cd59babc5357))
+* Adjust anywidget CSS to prevent overflow ([#1981](https://github.com/googleapis/python-bigquery-dataframes/issues/1981)) ([204f083](https://github.com/googleapis/python-bigquery-dataframes/commit/204f083a2f00fcc9fd1500dcd7a738eda3904d2f))
+* Format page number in table widget ([#1992](https://github.com/googleapis/python-bigquery-dataframes/issues/1992)) ([e83836e](https://github.com/googleapis/python-bigquery-dataframes/commit/e83836e8e1357f009f3f95666f1661bdbe0d3751))
+* Or, And, Xor can execute locally ([#1994](https://github.com/googleapis/python-bigquery-dataframes/issues/1994)) ([59c52a5](https://github.com/googleapis/python-bigquery-dataframes/commit/59c52a55ebea697855eb4c70529e226cc077141f))
+* Support callable bigframes function for dataframe where ([#1990](https://github.com/googleapis/python-bigquery-dataframes/issues/1990)) ([44c1ec4](https://github.com/googleapis/python-bigquery-dataframes/commit/44c1ec48cc4db1c4c9c15ec1fab43d4ef0758e56))
+* Support callable for series where method ([#2005](https://github.com/googleapis/python-bigquery-dataframes/issues/2005)) ([768b82a](https://github.com/googleapis/python-bigquery-dataframes/commit/768b82af96a5dd0c434edcb171036eb42cfb9b41))
+* When using `repr_mode = "anywidget"`, numeric values align right ([15e6175](https://github.com/googleapis/python-bigquery-dataframes/commit/15e6175ec0aeb1b7b02d0bba9e8e1e018bd11c31))
+
+
+### Bug Fixes
+
+* Address the packages issue for bigframes function ([#1991](https://github.com/googleapis/python-bigquery-dataframes/issues/1991)) ([68f1d22](https://github.com/googleapis/python-bigquery-dataframes/commit/68f1d22d5ed8457a5cabc7751ed1d178063dd63e))
+* Correct pypdf dependency specifier for remote PDF functions ([#1980](https://github.com/googleapis/python-bigquery-dataframes/issues/1980)) ([0bd5e1b](https://github.com/googleapis/python-bigquery-dataframes/commit/0bd5e1b3c004124d2100c3fbec2fbe1e965d1e96))
+* Enable default retries in calls to BQ Storage Read API ([#1985](https://github.com/googleapis/python-bigquery-dataframes/issues/1985)) ([f25d7bd](https://github.com/googleapis/python-bigquery-dataframes/commit/f25d7bd30800dffa65b6c31b0b7ac711a13d790f))
+* Fix the copyright year in dbt sample files ([#1996](https://github.com/googleapis/python-bigquery-dataframes/issues/1996)) ([fad5722](https://github.com/googleapis/python-bigquery-dataframes/commit/fad57223d129f0c95d0c6a066179bb66880edd06))
+
+
+### Performance Improvements
+
+* Faster session startup by deferring anon dataset fetch ([#1982](https://github.com/googleapis/python-bigquery-dataframes/issues/1982)) ([2720c4c](https://github.com/googleapis/python-bigquery-dataframes/commit/2720c4cf070bf57a0930d7623bfc41d89cc053ee))
+
+
+### Documentation
+
+* Add examples of running bigframes in kaggle ([#2002](https://github.com/googleapis/python-bigquery-dataframes/issues/2002)) ([7d89d76](https://github.com/googleapis/python-bigquery-dataframes/commit/7d89d76976595b75cb0105fbe7b4f7ca2fdf49f2))
+* Remove preview warning from partial ordering mode sample notebook ([#1986](https://github.com/googleapis/python-bigquery-dataframes/issues/1986)) ([132e0ed](https://github.com/googleapis/python-bigquery-dataframes/commit/132e0edfe9f96c15753649d77fcb6edd0b0708a3))
+
 ## [2.15.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.14.0...v2.15.0) (2025-08-11)
 
 

bigframes/core/compile/ibis_compiler/scalar_op_registry.py

Lines changed: 1 addition & 1 deletion
@@ -1062,7 +1062,7 @@ def isin_op_impl(x: ibis_types.Value, op: ops.IsInOp):
     if op.match_nulls and contains_nulls:
         return x.isnull() | x.isin(matchable_ibis_values)
     else:
-        return x.isin(matchable_ibis_values)
+        return x.isin(matchable_ibis_values).fillna(False)
 
 
 @scalar_op_compiler.register_unary_op(ops.ToDatetimeOp, pass_op=True)
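The one-line change matters because of SQL's three-valued logic. When `x` is NULL, `x IN (...)` evaluates to NULL rather than FALSE, and appending `.fillna(False)` turns that NULL into FALSE in the non-null-matching branch. The sketch below models both branches in plain Python, with `None` standing in for NULL. It assumes `matchable_ibis_values` is the value list with nulls removed, since that variable is defined outside the hunk. The names `sql_in` and `isin_op` are hypothetical.

```python
from typing import Optional, Sequence


def sql_in(x: Optional[object], matchable: Sequence[object]) -> Optional[bool]:
    """SQL `x IN (...)` over a NULL-free list: a NULL input yields NULL."""
    if x is None:
        return None
    return x in matchable


def isin_op(
    x: Optional[object], values: Sequence[object], match_nulls: bool
) -> Optional[bool]:
    """Model of the patched isin_op_impl; None stands in for NULL."""
    contains_nulls = any(v is None for v in values)
    matchable = [v for v in values if v is not None]
    if match_nulls and contains_nulls:
        # x.isnull() | x.isin(...): TRUE OR NULL is TRUE, so NULL inputs match.
        return True if x is None else sql_in(x, matchable)
    result = sql_in(x, matchable)
    # .fillna(False): a NULL result becomes FALSE instead of staying NULL.
    return False if result is None else result
```

Before the patch, the `else` branch behaved like `sql_in` alone, so `isin_op(None, [1, 2], match_nulls=False)` would have produced a missing value instead of `False`.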

bigframes/core/compile/polars/compiler.py

Lines changed: 7 additions & 5 deletions
@@ -198,6 +198,10 @@ def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
     def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
         return l_input | r_input
 
+    @compile_op.register(bool_ops.XorOp)
+    def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
+        return l_input ^ r_input
+
     @compile_op.register(num_ops.AddOp)
     def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
         return l_input + r_input
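The new `XorOp` rule lets `Xor` run locally in polars alongside the existing `And` and `Or` rules. Under three-valued (Kleene) logic, OR can ignore a NULL when the other side is TRUE. XOR has no such shortcut, so a NULL on either side yields NULL. The truth-table sketch below assumes polars' `|` and `^` on Boolean expressions follow this convention. `kleene_or` and `kleene_xor` are illustrative names, not bigframes or polars APIs.

```python
from typing import Optional


def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """OR under three-valued logic: TRUE wins even against NULL (None)."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def kleene_xor(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """XOR under three-valued logic: the result always depends on both sides."""
    if a is None or b is None:
        return None
    return a != b


if __name__ == "__main__":
    # Print the full 3x3 truth table for both operators.
    for a in (True, False, None):
        for b in (True, False, None):
            print(a, b, kleene_or(a, b), kleene_xor(a, b))
```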
@@ -259,11 +263,9 @@ def _(self, op: ops.ScalarOp, l_input: pl.Expr, r_input: pl.Expr) -> pl.Expr:
     def _(self, op: ops.ScalarOp, input: pl.Expr) -> pl.Expr:
         # TODO: Filter out types that can't be coerced to right type
         assert isinstance(op, gen_ops.IsInOp)
-        if op.match_nulls or not any(map(pd.isna, op.values)):
-            # newer polars version have nulls_equal arg
-            return input.is_in(op.values)
-        else:
-            return input.is_in(op.values) or input.is_null()
+        assert not op.match_nulls  # should be stripped by a lowering step rn
+        values = pl.Series(op.values, strict=False)
+        return input.is_in(values)
 
     @compile_op.register(gen_ops.FillNaOp)
     @compile_op.register(gen_ops.CoalesceOp)

bigframes/core/compile/polars/lowering.py

Lines changed: 32 additions & 0 deletions
@@ -13,8 +13,10 @@
 # limitations under the License.
 
 import dataclasses
+from typing import cast
 
 import numpy as np
+import pandas as pd
 
 from bigframes import dtypes
 from bigframes.core import bigframe_node, expression
@@ -316,6 +318,35 @@ def lower(self, expr: expression.OpExpression) -> expression.Expression:
         return expr
 
 
+class LowerIsinOp(op_lowering.OpLoweringRule):
+    @property
+    def op(self) -> type[ops.ScalarOp]:
+        return generic_ops.IsInOp
+
+    def lower(self, expr: expression.OpExpression) -> expression.Expression:
+        assert isinstance(expr.op, generic_ops.IsInOp)
+        arg = expr.children[0]
+        new_values = []
+        match_nulls = False
+        for val in expr.op.values:
+            # coercible, non-coercible
+            # float NaN/inf should be treated as distinct from 'true' null values
+            if cast(bool, pd.isna(val)) and not isinstance(val, float):
+                if expr.op.match_nulls:
+                    match_nulls = True
+            elif dtypes.is_compatible(val, arg.output_type):
+                new_values.append(val)
+            else:
+                pass
+
+        new_isin = ops.IsInOp(tuple(new_values), match_nulls=False).as_expr(arg)
+        if match_nulls:
+            return ops.coalesce_op.as_expr(new_isin, expression.const(True))
+        else:
+            # polars propagates nulls, so need to coalesce to false
+            return ops.coalesce_op.as_expr(new_isin, expression.const(False))
+
+
 def _coerce_comparables(
     expr1: expression.Expression,
     expr2: expression.Expression,
@@ -414,6 +445,7 @@ def _lower_cast(cast_op: ops.AsTypeOp, arg: expression.Expression):
     LowerModRule(),
     LowerAsTypeRule(),
     LowerInvertOp(),
+    LowerIsinOp(),
 )
 
 
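`LowerIsinOp` moves null handling out of the polars compiler. It strips null entries from the value list, remembering only whether the op asked to match nulls. It also drops type-incompatible values and wraps the rebuilt `IsInOp` in `coalesce(..., True)` or `coalesce(..., False)`. The plain-Python sketch below mirrors that split. `None` stands in for pandas' null markers, and the caller-supplied `is_compatible` predicate is a hypothetical stand-in for `dtypes.is_compatible(val, arg.output_type)`.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def lower_isin(
    values: Sequence[object],
    match_nulls: bool,
    is_compatible: Callable[[object], bool],
) -> Tuple[List[object], bool]:
    """Split an isin value list roughly the way LowerIsinOp does.

    None stands in for "true" nulls. A float NaN is not treated as null,
    mirroring the `not isinstance(val, float)` check in the rule.
    """
    new_values: List[object] = []
    null_match = False
    for val in values:
        if val is None:
            # A null value only matters if the op asked for null matching.
            if match_nulls:
                null_match = True
        elif is_compatible(val):
            new_values.append(val)
        # Values incompatible with the column type are dropped.
    return new_values, null_match


def evaluate_lowered(
    x: Optional[object], new_values: Sequence[object], null_match: bool
) -> bool:
    """coalesce(x IN new_values, null_match): NULL inputs become null_match."""
    result: Optional[bool] = None if x is None else (x in new_values)
    return null_match if result is None else result
```

For example, with an integer column, `lower_isin([1, None, "a"], True, lambda v: isinstance(v, int))` keeps `[1]` and sets the null-match flag. A NULL row then evaluates to `True`, while `2` evaluates to `False`. Because the lowered op always has `match_nulls=False`, the compiler can assert that flag away and call `is_in` directly.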
bigframes/core/compile/sqlglot/aggregations/unary_compiler.py

Lines changed: 9 additions & 0 deletions
@@ -37,6 +37,15 @@ def compile(
     return UNARY_OP_REGISTRATION[op](op, column, window=window)
 
 
+@UNARY_OP_REGISTRATION.register(agg_ops.CountOp)
+def _(
+    op: agg_ops.CountOp,
+    column: typed_expr.TypedExpr,
+    window: typing.Optional[window_spec.WindowSpec] = None,
+) -> sge.Expression:
+    return apply_window_if_present(sge.func("COUNT", column.expr), window)
+
+
 @UNARY_OP_REGISTRATION.register(agg_ops.SumOp)
 def _(
     op: agg_ops.SumOp,
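`CountOp` now compiles to SQL `COUNT(column)`, which `apply_window_if_present` wraps in a window clause when a window is given. `COUNT(expr)` skips NULLs, so the result is the number of non-null values in each group or frame. The sketch below models that in plain Python, with `None` standing in for NULL. The trailing `ROWS` frame used for the windowed case is an assumption, and `sql_count` and `windowed_count` are hypothetical helpers.

```python
from typing import List, Optional, Sequence


def sql_count(values: Sequence[Optional[object]]) -> int:
    """COUNT(column): counts non-NULL values only (None stands in for NULL)."""
    return sum(1 for v in values if v is not None)


def windowed_count(values: Sequence[Optional[object]], size: int) -> List[int]:
    """COUNT(column) OVER a trailing frame of `size` rows, one result per row."""
    return [
        sql_count(values[max(0, i - size + 1) : i + 1]) for i in range(len(values))
    ]
```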

bigframes/core/compile/sqlglot/compiler.py

Lines changed: 6 additions & 3 deletions
@@ -336,6 +336,9 @@ def compile_window(
                     this=is_observation_expr, expression=expr
                 )
             is_observation = ir._cast(is_observation_expr, "INT64")
+            observation_count = windows.apply_window_if_present(
+                sge.func("SUM", is_observation), window_spec
+            )
         else:
             # Operations like count treat even NULLs as valid observations
             # for the sake of min_periods notnull is just used to convert
@@ -344,10 +347,10 @@ def compile_window(
                 sge.Not(this=sge.Is(this=inputs[0], expression=sge.Null())),
                 "INT64",
             )
+            observation_count = windows.apply_window_if_present(
+                sge.func("COUNT", is_observation), window_spec
+            )
 
-        observation_count = windows.apply_window_if_present(
-            sge.func("SUM", is_observation), window_spec
-        )
         clauses.append(
             (
                 observation_count < sge.convert(window_spec.min_periods),
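This change moves `observation_count` into each branch. Ops that skip nulls keep `SUM(is_observation)`, so only non-null rows count toward `min_periods`. Count-like ops now use `COUNT(is_observation)`. Because `is_observation` is a 0/1 integer that is never NULL, this counts every row in the frame. Previously both branches shared the `SUM`, so count-like ops ignored NULL rows for `min_periods`, which contradicts the comment in the `else` branch. The sketch below models both counting rules in plain Python. The trailing frame and the helper names (`rolling`, `sum_non_null`, `count_non_null`) are assumptions for illustration.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[Optional[float]]


def sum_non_null(frame: Frame) -> float:
    return sum(v for v in frame if v is not None)


def count_non_null(frame: Frame) -> int:
    return sum(1 for v in frame if v is not None)


def rolling(
    values: Frame,
    size: int,
    min_periods: int,
    agg: Callable[[Frame], object],
    skips_nulls: bool,
) -> List[Optional[object]]:
    """Trailing-window aggregate that returns None below min_periods."""
    out: List[Optional[object]] = []
    for i in range(len(values)):
        frame = values[max(0, i - size + 1) : i + 1]
        # is_observation: CAST(x IS NOT NULL AS INT64), never NULL itself
        flags = [0 if v is None else 1 for v in frame]
        if skips_nulls:
            observations = sum(flags)  # SUM(is_observation): non-NULL rows only
        else:
            observations = len(flags)  # COUNT(is_observation): every row in the frame
        out.append(None if observations < min_periods else agg(frame))
    return out
```

With `[1, None, 3]`, a frame of 2 and `min_periods=2`, the count-like rule yields `[None, 1, 1]`. Running the same count through the old `SUM`-based rule (`skips_nulls=True`) suppresses every row.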

bigframes/core/compile/sqlglot/expressions/binary_compiler.py

Lines changed: 97 additions & 41 deletions
@@ -14,11 +14,12 @@
 
 from __future__ import annotations
 
-import bigframes_vendored.constants as constants
+import bigframes_vendored.constants as bf_constants
 import sqlglot.expressions as sge
 
 from bigframes import dtypes
 from bigframes import operations as ops
+import bigframes.core.compile.sqlglot.expressions.constants as constants
 from bigframes.core.compile.sqlglot.expressions.op_registration import OpRegistration
 from bigframes.core.compile.sqlglot.expressions.typed_expr import TypedExpr
 
@@ -37,50 +38,65 @@ def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
         return sge.Concat(expressions=[left.expr, right.expr])
 
     if dtypes.is_numeric(left.dtype) and dtypes.is_numeric(right.dtype):
-        left_expr = left.expr
-        if left.dtype == dtypes.BOOL_DTYPE:
-            left_expr = sge.Cast(this=left_expr, to="INT64")
-        right_expr = right.expr
-        if right.dtype == dtypes.BOOL_DTYPE:
-            right_expr = sge.Cast(this=right_expr, to="INT64")
+        left_expr = _coerce_bool_to_int(left)
+        right_expr = _coerce_bool_to_int(right)
         return sge.Add(this=left_expr, expression=right_expr)
 
     if (
         dtypes.is_time_or_date_like(left.dtype)
         and right.dtype == dtypes.TIMEDELTA_DTYPE
     ):
-        left_expr = left.expr
-        if left.dtype == dtypes.DATE_DTYPE:
-            left_expr = sge.Cast(this=left_expr, to="DATETIME")
+        left_expr = _coerce_date_to_datetime(left)
         return sge.TimestampAdd(
             this=left_expr, expression=right.expr, unit=sge.Var(this="MICROSECOND")
         )
     if (
         dtypes.is_time_or_date_like(right.dtype)
         and left.dtype == dtypes.TIMEDELTA_DTYPE
     ):
-        right_expr = right.expr
-        if right.dtype == dtypes.DATE_DTYPE:
-            right_expr = sge.Cast(this=right_expr, to="DATETIME")
+        right_expr = _coerce_date_to_datetime(right)
         return sge.TimestampAdd(
             this=right_expr, expression=left.expr, unit=sge.Var(this="MICROSECOND")
         )
     if left.dtype == dtypes.TIMEDELTA_DTYPE and right.dtype == dtypes.TIMEDELTA_DTYPE:
         return sge.Add(this=left.expr, expression=right.expr)
 
     raise TypeError(
-        f"Cannot add type {left.dtype} and {right.dtype}. {constants.FEEDBACK_LINK}"
+        f"Cannot add type {left.dtype} and {right.dtype}. {bf_constants.FEEDBACK_LINK}"
     )
 
 
-@BINARY_OP_REGISTRATION.register(ops.div_op)
+@BINARY_OP_REGISTRATION.register(ops.eq_op)
+def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
+    left_expr = _coerce_bool_to_int(left)
+    right_expr = _coerce_bool_to_int(right)
+    return sge.EQ(this=left_expr, expression=right_expr)
+
+
+@BINARY_OP_REGISTRATION.register(ops.eq_null_match_op)
 def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
     left_expr = left.expr
-    if left.dtype == dtypes.BOOL_DTYPE:
-        left_expr = sge.Cast(this=left_expr, to="INT64")
+    if right.dtype != dtypes.BOOL_DTYPE:
+        left_expr = _coerce_bool_to_int(left)
+
     right_expr = right.expr
-    if right.dtype == dtypes.BOOL_DTYPE:
-        right_expr = sge.Cast(this=right_expr, to="INT64")
+    if left.dtype != dtypes.BOOL_DTYPE:
+        right_expr = _coerce_bool_to_int(right)
+
+    sentinel = sge.convert("$NULL_SENTINEL$")
+    left_coalesce = sge.Coalesce(
+        this=sge.Cast(this=left_expr, to="STRING"), expressions=[sentinel]
+    )
+    right_coalesce = sge.Coalesce(
+        this=sge.Cast(this=right_expr, to="STRING"), expressions=[sentinel]
+    )
+    return sge.EQ(this=left_coalesce, expression=right_coalesce)
+
+
+@BINARY_OP_REGISTRATION.register(ops.div_op)
+def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
+    left_expr = _coerce_bool_to_int(left)
+    right_expr = _coerce_bool_to_int(right)
 
     result = sge.func("IEEE_DIVIDE", left_expr, right_expr)
     if left.dtype == dtypes.TIMEDELTA_DTYPE and dtypes.is_numeric(right.dtype):
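The new `eq_null_match_op` rule makes NULL compare equal to NULL. It casts both sides to STRING and `COALESCE`s NULL to a shared sentinel before applying `=`. A boolean operand is cast to INT64 only when the other side is not boolean, presumably because BigQuery cannot compare BOOL with INT64 directly. The plain-Python sketch below models this logic. The dtype labels (`"bool"`, `"int"`, `"string"`) are hypothetical, `None` stands in for NULL, and `str()` is only a rough stand-in for `CAST(... AS STRING)`.

```python
from typing import Optional

SENTINEL = "$NULL_SENTINEL$"


def coerce_bool_to_int(value: object, dtype: str) -> object:
    """Plain-value analogue of _coerce_bool_to_int; other dtypes pass through."""
    if dtype == "bool" and value is not None:
        return int(value)
    return value


def cast_to_string(value: object) -> Optional[str]:
    """Simplified CAST(x AS STRING): NULL stays NULL."""
    return None if value is None else str(value)


def eq_null_match(left: object, left_dtype: str, right: object, right_dtype: str) -> bool:
    # Bools are coerced to INT64 only when compared against a non-bool side.
    if right_dtype != "bool":
        left = coerce_bool_to_int(left, left_dtype)
    if left_dtype != "bool":
        right = coerce_bool_to_int(right, right_dtype)
    l_str = cast_to_string(left)
    r_str = cast_to_string(right)
    # COALESCE(..., sentinel) on both sides makes NULL = NULL evaluate to TRUE.
    return (SENTINEL if l_str is None else l_str) == (SENTINEL if r_str is None else r_str)
```

One trade-off of the sentinel approach is that a genuine string value equal to `"$NULL_SENTINEL$"` compares equal to NULL. For example, `eq_null_match(SENTINEL, "string", None, "string")` returns `True`.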
@@ -89,6 +105,39 @@ def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
     return result
 
 
+@BINARY_OP_REGISTRATION.register(ops.floordiv_op)
+def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
+    left_expr = _coerce_bool_to_int(left)
+    right_expr = _coerce_bool_to_int(right)
+
+    result: sge.Expression = sge.Cast(
+        this=sge.Floor(this=sge.func("IEEE_DIVIDE", left_expr, right_expr)), to="INT64"
+    )
+
+    # DIV(N, 0) will error in bigquery, but needs to return `0` for int, and
+    # `inf` for float in BQ so we short-circuit in this case.
+    # Multiplying left by zero propagates nulls.
+    zero_result = (
+        constants._INF
+        if (left.dtype == dtypes.FLOAT_DTYPE or right.dtype == dtypes.FLOAT_DTYPE)
+        else constants._ZERO
+    )
+    result = sge.Case(
+        ifs=[
+            sge.If(
+                this=sge.EQ(this=right_expr, expression=constants._ZERO),
+                true=zero_result * left_expr,
+            )
+        ],
+        default=result,
+    )
+
+    if dtypes.is_numeric(right.dtype) and left.dtype == dtypes.TIMEDELTA_DTYPE:
+        result = sge.Cast(this=sge.Floor(this=result), to="INT64")
+
+    return result
+
+
 @BINARY_OP_REGISTRATION.register(ops.ge_op)
 def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
     return sge.GTE(this=left.expr, expression=right.expr)
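The `floordiv_op` rule emits a `CASE` expression. A zero divisor short-circuits to `zero_result * left`, where `zero_result` is `inf` for float operands and `0` otherwise. Any other divisor goes through `CAST(FLOOR(IEEE_DIVIDE(left, right)) AS INT64)`. The sketch below models that expression in plain Python, with `None` standing in for NULL. The hypothetical `float_dtype` flag stands in for the `FLOAT_DTYPE` check on either operand. Like the diff, the sketch casts the non-zero branch to an integer regardless of operand types. The `TIMEDELTA` post-processing is omitted.

```python
import math
from typing import Optional, Union

Number = Union[int, float]


def floordiv_sketch(
    left: Optional[Number], right: Optional[Number], float_dtype: bool
) -> Optional[Number]:
    """Plain-Python model of the CASE expression compiled for floordiv_op."""
    if left is None or right is None:
        # NULL propagates: right = NULL never takes the zero branch, and
        # zero_result * NULL is NULL inside the zero branch.
        return None
    if right == 0:
        # CASE WHEN right = 0 THEN zero_result * left
        zero_result = math.inf if float_dtype else 0
        return zero_result * left
    # ELSE CAST(FLOOR(IEEE_DIVIDE(left, right)) AS INT64)
    return int(math.floor(left / right))
```

In this model, multiplying by `inf` gives a signed infinity for a non-zero `left`. `math.inf * 0.0` is NaN in Python, so `0.0` divided by `0` on float inputs comes out as NaN here.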
@@ -101,12 +150,8 @@ def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
 
 @BINARY_OP_REGISTRATION.register(ops.mul_op)
 def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
-    left_expr = left.expr
-    if left.dtype == dtypes.BOOL_DTYPE:
-        left_expr = sge.Cast(this=left_expr, to="INT64")
-    right_expr = right.expr
-    if right.dtype == dtypes.BOOL_DTYPE:
-        right_expr = sge.Cast(this=right_expr, to="INT64")
+    left_expr = _coerce_bool_to_int(left)
+    right_expr = _coerce_bool_to_int(right)
 
     result = sge.Mul(this=left_expr, expression=right_expr)
 
@@ -118,36 +163,33 @@ def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
     return result
 
 
+@BINARY_OP_REGISTRATION.register(ops.ne_op)
+def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
+    left_expr = _coerce_bool_to_int(left)
+    right_expr = _coerce_bool_to_int(right)
+    return sge.NEQ(this=left_expr, expression=right_expr)
+
+
 @BINARY_OP_REGISTRATION.register(ops.sub_op)
 def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
     if dtypes.is_numeric(left.dtype) and dtypes.is_numeric(right.dtype):
-        left_expr = left.expr
-        if left.dtype == dtypes.BOOL_DTYPE:
-            left_expr = sge.Cast(this=left_expr, to="INT64")
-        right_expr = right.expr
-        if right.dtype == dtypes.BOOL_DTYPE:
-            right_expr = sge.Cast(this=right_expr, to="INT64")
+        left_expr = _coerce_bool_to_int(left)
+        right_expr = _coerce_bool_to_int(right)
         return sge.Sub(this=left_expr, expression=right_expr)
 
     if (
         dtypes.is_time_or_date_like(left.dtype)
         and right.dtype == dtypes.TIMEDELTA_DTYPE
     ):
-        left_expr = left.expr
-        if left.dtype == dtypes.DATE_DTYPE:
-            left_expr = sge.Cast(this=left_expr, to="DATETIME")
+        left_expr = _coerce_date_to_datetime(left)
         return sge.TimestampSub(
             this=left_expr, expression=right.expr, unit=sge.Var(this="MICROSECOND")
         )
     if dtypes.is_time_or_date_like(left.dtype) and dtypes.is_time_or_date_like(
         right.dtype
     ):
-        left_expr = left.expr
-        if left.dtype == dtypes.DATE_DTYPE:
-            left_expr = sge.Cast(this=left_expr, to="DATETIME")
-        right_expr = right.expr
-        if right.dtype == dtypes.DATE_DTYPE:
-            right_expr = sge.Cast(this=right_expr, to="DATETIME")
+        left_expr = _coerce_date_to_datetime(left)
+        right_expr = _coerce_date_to_datetime(right)
         return sge.TimestampDiff(
             this=left_expr, expression=right_expr, unit=sge.Var(this="MICROSECOND")
         )
@@ -156,10 +198,24 @@ def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
         return sge.Sub(this=left.expr, expression=right.expr)
 
     raise TypeError(
-        f"Cannot subtract type {left.dtype} and {right.dtype}. {constants.FEEDBACK_LINK}"
+        f"Cannot subtract type {left.dtype} and {right.dtype}. {bf_constants.FEEDBACK_LINK}"
     )
 
 
 @BINARY_OP_REGISTRATION.register(ops.obj_make_ref_op)
 def _(op, left: TypedExpr, right: TypedExpr) -> sge.Expression:
     return sge.func("OBJ.MAKE_REF", left.expr, right.expr)
+
+
+def _coerce_bool_to_int(typed_expr: TypedExpr) -> sge.Expression:
+    """Coerce boolean expression to integer."""
+    if typed_expr.dtype == dtypes.BOOL_DTYPE:
+        return sge.Cast(this=typed_expr.expr, to="INT64")
+    return typed_expr.expr
+
+
+def _coerce_date_to_datetime(typed_expr: TypedExpr) -> sge.Expression:
+    """Coerce date expression to datetime."""
+    if typed_expr.dtype == dtypes.DATE_DTYPE:
+        return sge.Cast(this=typed_expr.expr, to="DATETIME")
+    return typed_expr.expr
