Commit 83f39b9

docs(destination-bigquery): update BigQuery destination docs
1 parent c03d43d commit 83f39b9

1 file changed

Lines changed: 87 additions & 39 deletions

docs/integrations/destinations/bigquery.md

@@ -1,9 +1,8 @@
 # BigQuery

-Setting up the BigQuery destination connector involves setting up the data loading method and configuring the BigQuery destination connector
-using the Airbyte UI.
-
-This page guides you through setting up the BigQuery destination connector.
+Use the BigQuery destination connector to load Airbyte data into Google BigQuery. This page
+explains the required Google Cloud resources, loading methods, connector settings, and
+BigQuery-specific behavior.

 ## Prerequisites

@@ -13,19 +12,20 @@ This page guides you through setting up the BigQuery destination connector.
 version `v0.40.0-alpha` or newer and upgrade your BigQuery connector to version `1.1.14` or newer
 - [A Google Cloud project with BigQuery enabled](https://cloud.google.com/bigquery/docs/quickstarts/query-public-dataset-console)
 - [A BigQuery dataset](https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui#create_a_dataset)
-to sync data to.
+to sync data to, or permission for Airbyte to create datasets in your Google Cloud project.

 **Note:** Queries written in BigQuery can only reference datasets in the same physical location.
 If you plan on combining the data that Airbyte syncs with data from other datasets in your
 queries, create the datasets in the same location on Google Cloud. For more information, read
 [Introduction to Datasets](https://cloud.google.com/bigquery/docs/datasets-intro)

-- (Required for Airbyte Cloud; Optional for Airbyte Open Source) A Google Cloud
-[Service Account](https://cloud.google.com/iam/docs/service-accounts) with the
+- (Required for Airbyte Cloud; optional for Airbyte Open Source) A Google Cloud
+[service account](https://cloud.google.com/iam/docs/service-accounts) with the
 [`BigQuery User`](https://cloud.google.com/bigquery/docs/access-control#bigquery) and
-[`BigQuery Data Editor`](https://cloud.google.com/bigquery/docs/access-control#bigquery) roles and
-the
-[Service Account Key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
+[`BigQuery Data Editor`](https://cloud.google.com/bigquery/docs/access-control#bigquery) roles.
+The connector creates datasets when needed, creates and updates tables, and runs load and query
+jobs. If you don't use default Google credentials, create a
+[service account key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
 - If you're using Airbyte Cloud and this destination uses IP-based access controls, add
 Airbyte's [IP addresses](/platform/operating-airbyte/ip-allowlist) to your allowlist.

@@ -35,11 +35,15 @@ This page guides you through setting up the BigQuery destination connector.

 #### Using Batched Standard Inserts

-You can use the BigQuery driver's built-in conversion to take `INSERT` statements and convert that to file uploads which are then loaded into BigQuery in batches. This is the simplest way to load data into BigQuery in a performant way. These staging files are managed by BigQuery and deleted automatically after the load is complete.
+Use Batched Standard Inserts to let the BigQuery driver convert large `INSERT` statements into file
+uploads that BigQuery loads in batches. This is the simplest way to load data into BigQuery
+efficiently. BigQuery manages and automatically deletes these staging files after the load
+completes.

 #### Using a Google Cloud Storage bucket

-If you want more control of how and where your staging files are stored, you can opt to use a GCS bucket.
+If you want more control of how and where your staging files are stored, you can opt to use a GCS
+bucket.

 To use a Google Cloud Storage bucket:

@@ -49,7 +53,7 @@ To use a Google Cloud Storage bucket:
 2. [Create an HMAC key and access ID](https://cloud.google.com/storage/docs/authentication/managing-hmackeys#create).
 3. Grant the
 [`Storage Object Admin` role](https://cloud.google.com/storage/docs/access-control/iam-roles#standard-roles)
-to the Google Cloud [Service Account](https://cloud.google.com/iam/docs/service-accounts). This
+to the Google Cloud [service account](https://cloud.google.com/iam/docs/service-accounts). This
 must be the same service account as the one you configure for BigQuery access in the
 [BigQuery connector setup step](#step-2-set-up-the-bigquery-connector).
 4. Make sure your Cloud Storage bucket is accessible from the machine running Airbyte. The easiest
@@ -82,29 +86,17 @@ You cannot change the location later.
 [GCS Staging](#using-a-google-cloud-storage-bucket).
 9. For **Service Account Key JSON (Required for cloud, optional for open-source)**, enter the Google
 Cloud
-[Service Account Key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
+[service account key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).

 :::note
-Be sure to copy all contents in the Account Key JSON file including the brackets.
+Be sure to copy all contents in the service account key JSON file, including the braces.
 :::

-11. For **Transformation Query Run Type (Optional)**, select **interactive** to have
-[BigQuery run interactive query jobs](https://cloud.google.com/bigquery/docs/running-queries#queries)
-or **batch** to have
-[BigQuery run batch queries](https://cloud.google.com/bigquery/docs/running-queries#batch).
-
-:::note
-Interactive queries are executed as soon as possible and count towards daily concurrent
-quotas and limits, while batch queries are executed as soon as idle resources are available in
-the BigQuery shared resource pool. If BigQuery hasn't started the query within 24 hours,
-BigQuery changes the job priority to interactive. Batch queries don't count towards your
-concurrent rate limit, making it easier to start many queries at once.
-:::
-
-11. For **Google BigQuery Client Chunk Size (Optional)**, use the default value of 15 MiB. Later, if
-you see networking or memory management problems with the sync (specifically on the
-destination), try decreasing the chunk size. In that case, the sync will be slower but more
-likely to succeed.
+10. For **CDC deletion mode**, choose how the destination handles delete records from CDC sources.
+**Hard delete** propagates source deletes to the destination table. **Soft delete** keeps the
+row and records the delete marker.
+11. Optional: expand **Advanced** to configure **Legacy raw tables** or **Airbyte Internal Table
+Dataset Name**. Enable **Legacy raw tables** only if you need the pre-3.0 raw table format.

 ## Supported sync modes

@@ -193,8 +185,8 @@ The service account does not have the proper permissions.
 - Make sure the BigQuery service account has `BigQuery User` and `BigQuery Data Editor` roles or
 equivalent permissions as those two roles.
 - If the GCS staging mode is selected, ensure the BigQuery service account has the right permissions
-to the GCS bucket and path or the `Cloud Storage Admin` role, which includes a superset of the
-required permissions.
+to the GCS bucket and path or the `Storage Object Admin` role, which includes the required object
+permissions.

 The HMAC key is wrong.

@@ -214,8 +206,6 @@ If your sync fails with `BigQueryException: 400 Bad Request` and the message
 headers on requests to `bigquery.googleapis.com`.
 - Verify the service account key has not been rotated or revoked since the connection was
 configured.
-- Try reducing the **Google BigQuery Client Chunk Size** from the default 15 MiB to a
-smaller value (for example, 5 MiB).
 - Try reducing concurrent syncs to your BigQuery instance or table. Contention is a
 possible contributing factor.

@@ -243,7 +233,65 @@ tutorials:

 ## Namespace support

-This destination supports [namespaces](https://docs.airbyte.com/platform/using-airbyte/core-concepts/namespaces). The namespace maps to a BigQuery dataset.
+This destination supports
+[namespaces](https://docs.airbyte.com/platform/using-airbyte/core-concepts/namespaces). The
+namespace maps to a BigQuery dataset.
+
+## Reference
+
+Use the following field names and values when you configure this destination with PyAirbyte,
+Terraform, or the Airbyte API.
+
+### Required fields
+
+| Field | Description |
+| :--- | :--- |
+| `project_id` | Google Cloud project ID for the project that contains the target BigQuery dataset. |
+| `dataset_location` | BigQuery dataset location. Use one of the locations shown in the Airbyte UI, such as `US`, `EU`, or a supported regional location like `us-east1`. |
+| `dataset_id` | Default BigQuery dataset ID. If the source stream doesn't specify a namespace, Airbyte writes tables to this dataset. |
+
+### Optional fields
+
+| Field | Description |
+| :--- | :--- |
+| `credentials_json` | Contents of the Google Cloud service account key JSON file. Required in Airbyte Cloud. Optional in Airbyte Open Source when the worker can use default Google credentials. |
+| `cdc_deletion_mode` | How to handle delete records from CDC sources. Valid values are `Hard delete` and `Soft delete`. Defaults to `Hard delete`. |
+| `disable_type_dedupe` | Set to `true` to write the legacy raw table format instead of final direct-load tables. Defaults to `false`. |
+| `raw_data_dataset` | Dataset for Airbyte internal tables. In legacy raw tables mode, raw tables are stored in this dataset. Defaults to `airbyte_internal`. |
+
+### Loading method examples
+
+For Batched Standard Inserts, set `loading_method.method` to `Standard`:
+
+```json
+{
+  "loading_method": {
+    "method": "Standard"
+  }
+}
+```
+
+For GCS Staging, set `loading_method.method` to `GCS Staging` and provide the bucket, path, and
+HMAC key credentials:
+
+```json
+{
+  "loading_method": {
+    "method": "GCS Staging",
+    "credential": {
+      "credential_type": "HMAC_KEY",
+      "hmac_key_access_id": "<your-hmac-access-id>",
+      "hmac_key_secret": "<your-hmac-secret>"
+    },
+    "gcs_bucket_name": "<bucket-name>",
+    "gcs_bucket_path": "<path-prefix>",
+    "keep_files_in_gcs-bucket": "Delete all tmp files from GCS"
+  }
+}
+```
+
+Set `keep_files_in_gcs-bucket` to `Keep all tmp files in GCS` if you want to retain temporary
+staging files after BigQuery load jobs complete.

 ## Changelog

@@ -252,8 +300,8 @@ This destination supports [namespaces](https://docs.airbyte.com/platform/using-a

 | Version | Date | Pull Request | Subject |
 |:------------|:-----------|:-----------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| 3.0.19 | 2026-05-21 | [78239](https://github.com/airbytehq/airbyte/pull/78239) | Promoting release candidate 3.0.19-rc.1 to a main version. |
-| 3.0.19-rc.1 | 2026-05-19 | [78239](https://github.com/airbytehq/airbyte/pull/78239) | Upgrade CDK to 1.0.13. Progressive rollout. |
+| 3.0.19 | 2026-05-22 | [78333](https://github.com/airbytehq/airbyte/pull/78333) | Promoting release candidate 3.0.19-rc.1 to a main version. |
+| 3.0.19-rc.1 | 2026-05-20 | [78239](https://github.com/airbytehq/airbyte/pull/78239) | Upgrade CDK to 1.0.13. Progressive rollout. |
 | 3.0.18 | 2026-03-31 | [75913](https://github.com/airbytehq/airbyte/pull/75913) | Finalize upgrade BigQuery Cloud dependencies and CDK version |
 | 3.0.18-rc.1 | 2026-03-27 | [75541](https://github.com/airbytehq/airbyte/pull/75541) | Upgrade BigQuery Cloud dependencies and CDK version |
 | 3.0.17 | 2026-01-28 | [72427](https://github.com/airbytehq/airbyte/pull/72427) | Finalize upgrade CDK to 0.2.0 |
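
Illustration, not part of the commit: the Reference section added in this diff lists field names for programmatic configuration. The sketch below shows one way those fields might be assembled into a single configuration object in Python. The helper `build_bigquery_config` and its `gcs_staging` argument are hypothetical names chosen for this example, the angle-bracket values are placeholders, and nothing here calls PyAirbyte, Terraform, or the Airbyte API.

```python
import json


def build_bigquery_config(project_id, dataset_location, dataset_id,
                          credentials_json=None, gcs_staging=None):
    """Assemble a config dict from the field names listed in the Reference section.

    `gcs_staging` is a hypothetical convenience argument: a dict holding the
    GCS Staging keys (credential, gcs_bucket_name, gcs_bucket_path, ...).
    """
    # The diff documents these three fields as required.
    for name, value in (("project_id", project_id),
                        ("dataset_location", dataset_location),
                        ("dataset_id", dataset_id)):
        if not value:
            raise ValueError(f"{name} is required")

    config = {
        "project_id": project_id,
        "dataset_location": dataset_location,
        "dataset_id": dataset_id,
        # Batched Standard Inserts unless GCS staging settings are supplied.
        "loading_method": {"method": "Standard"},
    }
    if gcs_staging is not None:
        config["loading_method"] = {"method": "GCS Staging", **gcs_staging}
    if credentials_json is not None:
        config["credentials_json"] = credentials_json
    return config


if __name__ == "__main__":
    example = build_bigquery_config("<project-id>", "US", "<dataset-id>")
    print(json.dumps(example, indent=2))
```

Defaulting to `Standard` mirrors the diff's description of Batched Standard Inserts as the simplest loading method. Whether the connector itself applies that default when `loading_method` is omitted is not stated in the diff.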
