fix(bigquery): route JOB_CREATION_REQUIRED through fast query path #13437
jinseopkim0 wants to merge 2 commits into
Code Review
This pull request enables fast query support when the job creation mode is set to JOB_CREATION_REQUIRED in QueryRequestInfo. This allows queries with this configuration to execute via the fast query path, returning a TableResult directly (with both JobId and QueryId populated) rather than creating a separate job. Unit and integration tests have been updated to reflect and verify this behavior. I have no feedback to provide.
    && config.getTimePartitioning() == null
    && config.getUserDefinedFunctions() == null
    && config.getWriteDisposition() == null
    && config.getJobCreationMode() != JobCreationMode.JOB_CREATION_REQUIRED;
qq, wouldn't this just end up always doing a fast query (default is set to required)? IIUC, I think it should be a fast query only for JobCreationMode.JOB_CREATION_OPTIONAL?
Is there any performance impact or behavioral change if we default to a fast query even if a user explicitly sets job_required?
Thanks for the questions.
> qq, wouldn't this just end up always doing a fast query (default is set to required)?
Yes, this is intended.
> IIUC, I think it should be a fast query only for JobCreationMode.JOB_CREATION_OPTIONAL?
Fast query should be for both. The BigQuery backend always creates a job in the background and returns a jobReference (including the jobId, see https://docs.cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#QueryResponse) for all jobs.query requests, so fast query makes sense.
> Is there any performance impact or behavioral change if we default to a fast query even if a user explicitly sets job_required?
The result is faster queries and lower latencies. There is no behavioral change as a job is still created in the background and tracked as expected.
Gotcha, I think faster queries are always better for the user, but something seems a bit weird about JobCreationMode to me. Why even have a JOB_CREATION_OPTIONAL configuration on the client side and not just do it under the hood on the server side?
Since we have this configuration, it seems odd to have a customer specify JOB_CREATION_REQUIRED and potentially not get a job back. Could the original issue be solved if fast query runs only when JOB_CREATION_OPTIONAL is specified? It seems like the issue was that the fast query logic was running on the wrong conditions?
Thanks for the questions.
> Why even have a JOB_CREATION_OPTIONAL configuration on the client side and not just do it under the hood on the server side?
JOB_CREATION_OPTIONAL may run stateless queries and jobReference may be null, enabling stateless optimizations.
> Since we have this configuration, it seems odd to have a customer specify JOB_CREATION_REQUIRED and potentially not get a job back.
If a customer specifies JOB_CREATION_REQUIRED, a jobReference (with a jobId) is returned in the response. (https://docs.cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#QueryResponse)
> Could the original issue be solved if fast query runs only when JOB_CREATION_OPTIONAL is specified? It seems like the issue was that the fast query logic was running on the wrong conditions?
JOB_CREATION_REQUIRED is the default mode. If fast query only ran for OPTIONAL, all default queries would remain on the slow fallback path. The original issue (b/522363981) was a latency issue caused by the fast query logic not running when the mode was JOB_CREATION_REQUIRED.
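To make the three options in this thread concrete, here is a minimal sketch of how each candidate condition treats each mode. It assumes the condition quoted in the diff is the one this PR removes, and that the other checks in the chain (time partitioning, UDFs, write disposition) already pass. `Mode` and `FastQueryPredicates` are hypothetical stand-ins for illustration, not the client library's classes.

```java
// Hypothetical stand-in for the client's JobCreationMode enum.
enum Mode {
  JOB_CREATION_MODE_UNSPECIFIED,
  JOB_CREATION_REQUIRED,
  JOB_CREATION_OPTIONAL
}

final class FastQueryPredicates {
  // Assumption taken from the thread: the default mode is JOB_CREATION_REQUIRED.
  static final Mode DEFAULT_MODE = Mode.JOB_CREATION_REQUIRED;

  // Condition quoted in the diff: REQUIRED is excluded from the fast path.
  static boolean beforeFix(Mode mode) {
    return mode != Mode.JOB_CREATION_REQUIRED;
  }

  // This PR: the mode no longer affects fast query eligibility.
  static boolean afterFix(Mode mode) {
    return true;
  }

  // Reviewer's alternative: fast path only when OPTIONAL is set explicitly.
  static boolean optionalOnly(Mode mode) {
    return mode == Mode.JOB_CREATION_OPTIONAL;
  }

  public static void main(String[] args) {
    for (Mode mode : Mode.values()) {
      System.out.printf(
          "%-30s beforeFix=%b afterFix=%b optionalOnly=%b%n",
          mode, beforeFix(mode), afterFix(mode), optionalOnly(mode));
    }
  }
}
```

Under these assumptions, a query that leaves the mode at its default is fast-query eligible only with the `afterFix` condition; `optionalOnly` keeps default queries on the fallback path unless the caller opts in.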
> JOB_CREATION_OPTIONAL may run stateless queries and jobReference may be null, enabling stateless optimizations.
Sorry, I meant that I think it's weird for the BigQuery service team to even expose a JobCreationMode proto if the intention was to do fast query by default. That's why it makes me think that we shouldn't do this by default and should only do it if the mode is set to optional (but if other BQ clients are doing this, then happy to stand corrected).
From what I see in b/522363981, they fixed it by explicitly setting JOB_CREATION_OPTIONAL. I think the only fix we need is to change the condition to config.getJobCreationMode() == JobCreationMode.JOB_CREATION_OPTIONAL to ensure that the benchmark uses fast query.
Routes queries under JobCreationMode.JOB_CREATION_REQUIRED to the fast query path (jobs.query API / 1 RPC) to avoid the slow fallback path (jobs.insert API / 2 RPCs).
https://docs.cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#QueryResponse says:
Thus, routing JOB_CREATION_REQUIRED through the fast path is preferred.
b/522363981
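As a rough illustration of the latency argument, the sketch below models the two paths using only the RPC counts stated in the description (1 for jobs.query, 2 for jobs.insert). `QueryPath` and `QueryRouting` are hypothetical names, and the model ignores everything except the per-query RPC count.

```java
// Hypothetical model of the two paths named in the PR description.
enum QueryPath {
  FAST_QUERY("jobs.query", 1),
  FALLBACK("jobs.insert", 2);

  final String api;
  final int rpcCount;

  QueryPath(String api, int rpcCount) {
    this.api = api;
    this.rpcCount = rpcCount;
  }
}

final class QueryRouting {
  // Picks the path based on the outcome of the eligibility check.
  static QueryPath route(boolean fastQueryEligible) {
    return fastQueryEligible ? QueryPath.FAST_QUERY : QueryPath.FALLBACK;
  }

  // Total RPCs issued for a batch of queries that all take the same path.
  static int totalRpcs(int queryCount, boolean fastQueryEligible) {
    return queryCount * route(fastQueryEligible).rpcCount;
  }

  public static void main(String[] args) {
    int queries = 1000;
    System.out.println("fallback path RPCs: " + totalRpcs(queries, false)); // 2000
    System.out.println("fast path RPCs:     " + totalRpcs(queries, true));  // 1000
  }
}
```

In this simplified model, moving default-mode queries to the fast path halves the number of round trips per query, which is the source of the latency improvement the description refers to.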