feat(bigquery): expose bytes processed stats after ExecuteUpdate #125

Draft
dataders wants to merge 2 commits into main from feat/bq-bytes-processed-stat


dataders commented Apr 6, 2026

Summary

After a BigQuery ExecuteUpdate call (e.g. CREATE TABLE AS SELECT), the driver now waits for the job to complete and stores TotalBytesProcessed and TotalBytesBilled from the BigQuery JobStatistics on the statement. These values are accessible via GetOptionInt using two new option keys:

  • adbc.bigquery.sql.stat.bytes_processed
  • adbc.bigquery.sql.stat.bytes_billed

Values reset to 0 at the start of each ExecuteUpdate call.
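As a rough illustration of how a consumer might read these values, here is a minimal, self-contained sketch. The two option keys are the ones listed above, and `GetOptionInt(key string) (int64, error)` is the accessor this description refers to; `fakeStatement`, `intOptionGetter`, `formatGiB`, and `costSummary` are hypothetical names used only so the example runs without a BigQuery connection.

```go
package main

import "fmt"

// Option keys as listed in this PR description.
const (
	optBytesProcessed = "adbc.bigquery.sql.stat.bytes_processed"
	optBytesBilled    = "adbc.bigquery.sql.stat.bytes_billed"
)

// intOptionGetter is the small slice of the statement API this sketch needs.
// (Hypothetical name; a real ADBC statement would be used in its place.)
type intOptionGetter interface {
	GetOptionInt(key string) (int64, error)
}

// fakeStatement is a hypothetical stand-in for a statement on which
// ExecuteUpdate has already completed.
type fakeStatement struct {
	stats map[string]int64
}

func (s *fakeStatement) GetOptionInt(key string) (int64, error) {
	v, ok := s.stats[key]
	if !ok {
		return 0, fmt.Errorf("unknown option %q", key)
	}
	return v, nil
}

// formatGiB renders a byte count in GiB with one decimal place.
func formatGiB(n int64) string {
	return fmt.Sprintf("%.1f GiB", float64(n)/float64(1<<30))
}

// costSummary reads both stats and builds a log-style annotation.
func costSummary(stmt intOptionGetter) (string, error) {
	processed, err := stmt.GetOptionInt(optBytesProcessed)
	if err != nil {
		return "", err
	}
	billed, err := stmt.GetOptionInt(optBytesBilled)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s processed, %s billed",
		formatGiB(processed), formatGiB(billed)), nil
}

func main() {
	stmt := &fakeStatement{stats: map[string]int64{
		optBytesProcessed: 1610612736, // 1.5 GiB
		optBytesBilled:    3 << 30,    // 3 GiB
	}}
	line, err := costSummary(stmt)
	if err != nil {
		panic(err)
	}
	fmt.Println(line) // prints "1.5 GiB processed, 3.0 GiB billed"
}
```

Because the values reset to 0 at the start of each ExecuteUpdate call, a caller would presumably read them right after the call whose cost it wants to report.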

Motivation

dbt-fusion users running BigQuery models currently have no visibility into query cost during development. The dbt-core BigQuery adapter has always surfaced ...GiB processed annotations in execution logs (from job.total_bytes_processed). This change unblocks dbt-fusion from doing the same.

Tracked in: dbt-labs/dbt-core#14462

Changes

  • driver.go: Add OptionIntStatBytesProcessed and OptionIntStatBytesBilled constants
  • record_reader.go: Add jobStats struct; extend runQuery signature with optional *jobStats out-param; populate it by calling job.Wait() on the DDL path (previously returned immediately without waiting)
  • statement.go: Add lastBytesProcessed/lastBytesBilled fields; wire up ExecuteUpdate to capture stats; expose via GetOptionInt

Test plan

  • Build passes: go build ./go/adbc/driver/bigquery/...
  • Existing tests pass: go test ./go/adbc/driver/bigquery/...
  • Manual: run a dbt model against BigQuery and verify bytes processed is non-zero after ExecuteUpdate

dataders and others added 2 commits March 18, 2026 12:31
…names

Enhances `LOAD_FLAG_SEARCH_SYSTEM` to also search well-known system
library directories, and teaches `search_path_list` to try the
platform-aware filename (e.g. `duckdb` → `libduckdb.dylib`) in each
search directory.

## Changes

**`system_lib_dirs()`** (new) — returns existing well-known lib paths:
- macOS: `/opt/homebrew/lib`, `/usr/local/lib`
- Linux: `/usr/lib`, arch-specific multiarch path, `/usr/local/lib`
- Windows: empty (uses registry)

**`get_search_paths()`** — extended the `LOAD_FLAG_SEARCH_SYSTEM` block
to append `system_lib_dirs()` after the ADBC config dir.

**`search_path_list()`** — after the bare-name attempt, also tries the
platform-aware filename via `libloading::library_filename()`.  Without
this, searching `/opt/homebrew/lib` for `duckdb` would never find
`libduckdb.dylib`.  The comment explains the motivating constraint:
macOS enforces matching Team IDs across all shared libraries in a
process, so a CDN-bundled driver (signed with one key) blocks
user-installed DuckDB extensions (signed with the DuckDB key); using the
system library avoids the mismatch.

**`driver_manifests.rst`** — documents the new system lib dir search
step under `LOAD_FLAG_SEARCH_SYSTEM`.

**Tests** — updates `test_get_search_paths` for the new behaviour; adds
`test_system_lib_dirs_returns_expected_paths` and
`test_search_path_list_uses_platform_filename`.

## Search order after this change (Unix/macOS)

1. `ADBC_DRIVER_PATH` env var (`LOAD_FLAG_SEARCH_ENV`)
2. Caller-provided `additional_search_paths`
3. `$CONDA_PREFIX/etc/adbc/drivers` (conda builds, `LOAD_FLAG_SEARCH_ENV`)
4. User config dir (`LOAD_FLAG_SEARCH_USER`)
5. System config dir (`LOAD_FLAG_SEARCH_SYSTEM`)
6. **NEW** System lib dirs — `/opt/homebrew/lib`, `/usr/local/lib`, etc.
7. OS dynamic linker fallback (`load_dynamic_from_name`)

## Motivation / downstream impact

This change was motivated by dbt-labs/fs#8693, which adds ~170 lines of
DuckDB-specific system library discovery to the `fs` repo because the
driver manager didn't search standard lib paths.  After this lands, that
PR can be simplified to a version bump plus a one-call replacement of its
custom discovery logic:

```rust
// Before (~105 lines: try_discover_system_duckdb_driver,
//   try_load_duckdb_from_env_paths, system_duckdb_search_paths,
//   duckdb_library_filename, plus a bespoke test):
if let Some(driver) = Self::try_discover_system_duckdb_driver(adbc_version) {
    return Ok(driver);
}

// After (single call; system lib dirs + platform filename handled in adbc):
if let Ok(driver) = ManagedAdbcDriver::load_from_name(
    backend, "duckdb", entrypoint, adbc_version, LOAD_FLAG_SEARCH_SYSTEM, None,
) {
    return Ok(driver);
}
```

The one minor behavioural difference: `fs` currently explicitly walks
`DYLD_LIBRARY_PATH`/`LD_LIBRARY_PATH` before the well-known paths; the
driver manager's `LOAD_FLAG_SEARCH_ENV` only covers `ADBC_DRIVER_PATH`.
In practice this is not a regression — the OS dynamic-linker fallback
(step 7) naturally honours those env vars — the only loss is a
per-path tracing log line.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cuteUpdate

After ExecuteUpdate completes (e.g. CREATE TABLE AS SELECT), store
TotalBytesProcessed and TotalBytesBilled from the BigQuery JobStatistics
on the statement struct. These are accessible via GetOptionInt with the
new OptionIntStatBytesProcessed and OptionIntStatBytesBilled keys.

This allows consumers like dbt-fusion to log `X.X GiB processed`
annotations in execution output, matching dbt-core's bigquery adapter
behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>