 to sync data to, or permission for Airbyte to create datasets in your Google Cloud project.
 
 **Note:** Queries written in BigQuery can only reference datasets in the same physical location.
 If you plan on combining the data that Airbyte syncs with data from other datasets in your
 queries, create the datasets in the same location on Google Cloud. For more information, read
 [Introduction to Datasets](https://cloud.google.com/bigquery/docs/datasets-intro)
 
-- (Required for Airbyte Cloud; Optional for Airbyte Open Source) A Google Cloud
-[Service Account](https://cloud.google.com/iam/docs/service-accounts) with the
+- (Required for Airbyte Cloud; optional for Airbyte Open Source) A Google Cloud
+[service account](https://cloud.google.com/iam/docs/service-accounts) with the
 [`BigQuery User`](https://cloud.google.com/bigquery/docs/access-control#bigquery) and
-[`BigQuery Data Editor`](https://cloud.google.com/bigquery/docs/access-control#bigquery) roles and
-the
-[Service Account Key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
+[`BigQuery Data Editor`](https://cloud.google.com/bigquery/docs/access-control#bigquery) roles.
+The connector creates datasets when needed, creates and updates tables, and runs load and query
+jobs. If you don't use default Google credentials, create a
+[service account key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
 - If you're using Airbyte Cloud and this destination uses IP-based access controls, add
 Airbyte's [IP addresses](/platform/operating-airbyte/ip-allowlist) to your allowlist.
 
@@ -35,11 +35,15 @@ This page guides you through setting up the BigQuery destination connector.
 
 #### Using Batched Standard Inserts
 
-You can use the BigQuery driver's built-in conversion to take `INSERT` statements and convert that to file uploads which are then loaded into BigQuery in batches. This is the simplest way to load data into BigQuery in a performant way. These staging files are managed by BigQuery and deleted automatically after the load is complete.
+Use Batched Standard Inserts to let the BigQuery driver convert large `INSERT` statements into file
+uploads that BigQuery loads in batches. This is the simplest way to load data into BigQuery
+efficiently. BigQuery manages and automatically deletes these staging files after the load
+completes.
 
 #### Using a Google Cloud Storage bucket
 
-If you want more control of how and where your staging files are stored, you can opt to use a GCS bucket.
+If you want more control of how and where your staging files are stored, you can opt to use a GCS
+bucket.
 
 To use a Google Cloud Storage bucket:
 
@@ -49,7 +53,7 @@ To use a Google Cloud Storage bucket:
 2. [Create an HMAC key and access ID](https://cloud.google.com/storage/docs/authentication/managing-hmackeys#create).
 9. For **Service Account Key JSON (Required for cloud, optional for open-source)**, enter the Google
 Cloud
-[Service Account Key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
+[service account key in JSON format](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
 
 :::note
-Be sure to copy all contents in the Account Key JSON file including the brackets.
+Be sure to copy all contents in the service account key JSON file, including the braces.
 :::
 
-11. For **Transformation Query Run Type (Optional)**, select **interactive** to have
-[BigQuery run interactive query jobs](https://cloud.google.com/bigquery/docs/running-queries#queries)
-or **batch** to have
-[BigQuery run batch queries](https://cloud.google.com/bigquery/docs/running-queries#batch).
-
-:::note
-Interactive queries are executed as soon as possible and count towards daily concurrent
-quotas and limits, while batch queries are executed as soon as idle resources are available in
-the BigQuery shared resource pool. If BigQuery hasn't started the query within 24 hours,
-BigQuery changes the job priority to interactive. Batch queries don't count towards your
-concurrent rate limit, making it easier to start many queries at once.
-:::
-
-11. For **Google BigQuery Client Chunk Size (Optional)**, use the default value of 15 MiB. Later, if
-you see networking or memory management problems with the sync (specifically on the
-destination), try decreasing the chunk size. In that case, the sync will be slower but more
-likely to succeed.
+10. For **CDC deletion mode**, choose how the destination handles delete records from CDC sources.
+**Hard delete** propagates source deletes to the destination table. **Soft delete** keeps the
+row and records the delete marker.
+11. Optional: expand **Advanced** to configure **Legacy raw tables** or **Airbyte Internal Table
+Dataset Name**. Enable **Legacy raw tables** only if you need the pre-3.0 raw table format.
 
 ## Supported sync modes
 
@@ -193,8 +185,8 @@ The service account does not have the proper permissions.
 - Make sure the BigQuery service account has `BigQuery User` and `BigQuery Data Editor` roles or
 equivalent permissions as those two roles.
 - If the GCS staging mode is selected, ensure the BigQuery service account has the right permissions
-to the GCS bucket and path or the `Cloud Storage Admin` role, which includes a superset of the
-required permissions.
+to the GCS bucket and path or the `Storage Object Admin` role, which includes the required object
+permissions.
 
 The HMAC key is wrong.
 
@@ -214,8 +206,6 @@ If your sync fails with `BigQueryException: 400 Bad Request` and the message
 headers on requests to `bigquery.googleapis.com`.
 - Verify the service account key has not been rotated or revoked since the connection was
 configured.
-- Try reducing the **Google BigQuery Client Chunk Size** from the default 15 MiB to a
-smaller value (for example, 5 MiB).
 - Try reducing concurrent syncs to your BigQuery instance or table. Contention is a
 possible contributing factor.
 
@@ -243,7 +233,65 @@ tutorials:
 
 ## Namespace support
 
-This destination supports [namespaces](https://docs.airbyte.com/platform/using-airbyte/core-concepts/namespaces). The namespace maps to a BigQuery dataset.
+This destination supports
+[namespaces](https://docs.airbyte.com/platform/using-airbyte/core-concepts/namespaces). The
+namespace maps to a BigQuery dataset.
+
+## Reference
+
+Use the following field names and values when you configure this destination with PyAirbyte,
+Terraform, or the Airbyte API.
+
+### Required fields
+
+| Field | Description |
+| :--- | :--- |
+| `project_id` | Google Cloud project ID for the project that contains the target BigQuery dataset. |
+| `dataset_location` | BigQuery dataset location. Use one of the locations shown in the Airbyte UI, such as `US`, `EU`, or a supported regional location like `us-east1`. |
+| `dataset_id` | Default BigQuery dataset ID. If the source stream doesn't specify a namespace, Airbyte writes tables to this dataset. |
+
+### Optional fields
+
+| Field | Description |
+| :--- | :--- |
+| `credentials_json` | Contents of the Google Cloud service account key JSON file. Required in Airbyte Cloud. Optional in Airbyte Open Source when the worker can use default Google credentials. |
+| `cdc_deletion_mode` | How to handle delete records from CDC sources. Valid values are `Hard delete` and `Soft delete`. Defaults to `Hard delete`. |
+| `disable_type_dedupe` | Set to `true` to write the legacy raw table format instead of final direct-load tables. Defaults to `false`. |
+| `raw_data_dataset` | Dataset for Airbyte internal tables. In legacy raw tables mode, raw tables are stored in this dataset. Defaults to `airbyte_internal`. |
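Reviewer sketch, not part of the diff: one way the required and optional field tables could be exercised from Python. The helper name `build_bigquery_config` and the placeholder values are hypothetical; only the field names, allowed values, and defaults are taken from the tables. It assumes unknown keys (for example `loading_method`) should simply pass through.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("project_id", "dataset_location", "dataset_id")

# Defaults as documented in the optional-fields table.
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "cdc_deletion_mode": "Hard delete",
    "disable_type_dedupe": False,
    "raw_data_dataset": "airbyte_internal",
}

CDC_DELETION_MODES = ("Hard delete", "Soft delete")


def build_bigquery_config(**fields: Any) -> Dict[str, Any]:
    """Hypothetical helper: check required fields and apply documented defaults."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    # Caller-supplied values override the defaults; extra keys pass through.
    config = {**OPTIONAL_DEFAULTS, **fields}
    if config["cdc_deletion_mode"] not in CDC_DELETION_MODES:
        raise ValueError(f"cdc_deletion_mode must be one of {CDC_DELETION_MODES}")
    return config


config = build_bigquery_config(
    project_id="my-project",    # placeholder value
    dataset_location="US",
    dataset_id="airbyte_sync",  # placeholder value
)
```

`credentials_json` is left out here on the assumption that the worker uses default Google credentials, which the table describes as valid only for Airbyte Open Source.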
+
+### Loading method examples
+
+For Batched Standard Inserts, set `loading_method.method` to `Standard`:
+
+```json
+{
+  "loading_method": {
+    "method": "Standard"
+  }
+}
+```
+
+For GCS Staging, set `loading_method.method` to `GCS Staging` and provide the bucket, path, and
+HMAC key credentials:
+
+```json
+{
+  "loading_method": {
+    "method": "GCS Staging",
+    "credential": {
+      "credential_type": "HMAC_KEY",
+      "hmac_key_access_id": "<your-hmac-access-id>",
+      "hmac_key_secret": "<your-hmac-secret>"
+    },
+    "gcs_bucket_name": "<bucket-name>",
+    "gcs_bucket_path": "<path-prefix>",
+    "keep_files_in_gcs-bucket": "Delete all tmp files from GCS"
+  }
+}
+```
+
+Set `keep_files_in_gcs-bucket` to `Keep all tmp files in GCS` if you want to retain temporary
+staging files after BigQuery load jobs complete.
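Reviewer sketch, not part of the diff: a minimal example, assuming the `loading_method` object is generated in code rather than written by hand. The function `make_loading_method` and its parameters are hypothetical; the keys and string values mirror the two JSON examples in this hunk.

```python
import json
from typing import Any, Dict, Optional


def make_loading_method(
    bucket: Optional[str] = None,
    path: Optional[str] = None,
    hmac_access_id: Optional[str] = None,
    hmac_secret: Optional[str] = None,
    keep_files: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: return a Standard or GCS Staging loading_method."""
    if bucket is None:
        # No bucket given: fall back to Batched Standard Inserts.
        return {"method": "Standard"}

    if not (path and hmac_access_id and hmac_secret):
        raise ValueError("GCS Staging needs a path and both HMAC key values")

    return {
        "method": "GCS Staging",
        "credential": {
            "credential_type": "HMAC_KEY",
            "hmac_key_access_id": hmac_access_id,
            "hmac_key_secret": hmac_secret,
        },
        "gcs_bucket_name": bucket,
        "gcs_bucket_path": path,
        # The key contains a hyphen, spelled exactly as in the JSON example.
        "keep_files_in_gcs-bucket": (
            "Keep all tmp files in GCS" if keep_files
            else "Delete all tmp files from GCS"
        ),
    }


standard = make_loading_method()
staging = make_loading_method(
    bucket="<bucket-name>",
    path="<path-prefix>",
    hmac_access_id="<your-hmac-access-id>",
    hmac_secret="<your-hmac-secret>",
)
print(json.dumps({"loading_method": staging}, indent=2))
```

Mapping a boolean to the two `keep_files_in_gcs-bucket` strings keeps the exact wording in one place, which matters because the values are free-text labels rather than enum-style identifiers.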
 
 ## Changelog
 
@@ -252,8 +300,8 @@ This destination supports [namespaces](https://docs.airbyte.com/platform/using-a