Skip to content

[PECOBLR-25] Add server feature flag consumption in the driver#828

Merged
samikshya-db merged 11 commits into
mainfrom
samikshya-chand_data/telemetryClosure
May 30, 2025
Merged

[PECOBLR-25] Add server feature flag consumption in the driver#828
samikshya-db merged 11 commits into
mainfrom
samikshya-chand_data/telemetryClosure

Conversation

@samikshya-db

@samikshya-db samikshya-db commented May 16, 2025

Copy link
Copy Markdown
Collaborator

Description

This PR adds client implementation of feature flag complex, i.e., achieves the following :

  1. Add server feature flag consumption in the repo
  2. Consume the telemetry feature flag in the telemetry client creation
  3. Remove the fetching of DatabricksConfig from threadContext (We instead fetch it from connectionContext for fetching server side feature flag context)
  4. As we are adding server side flag implementation, it would require the auth headers. Hence, the no-auth endpoint is no longer valid in context of this driver. This PR removes instances of no-auth endpoint too.

Testing

  • Added unit tests

Additional Notes to the Reviewer

NO_CHANGELOG=true

@samikshya-db samikshya-db changed the title [PECOBLR-25] [Draft] Add server feature flag consumption in the driver [PECOBLR-25] Add server feature flag consumption in the driver May 19, 2025
@gopalldb gopalldb requested a review from vikrantpuppala May 20, 2025 06:41

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this should be e, message?

i feel the args of our logger are very confusing, i keep making the same mistake everytime with our logger impl, we should improve this long term

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For error logs, e should be 1st param

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is the pattern across jdbc but I don't know why we started creating factory classes as singleton, is there a reason you know?

if not, can we stop using this pattern going forward and just have a simple factory pattern without the singleton-ness?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed through this class. It is not a stateless class and manages the reusable ClientConfigurator beans (again not sure of the whole context around ClientConfigurator instances). in that sense singleton should be fine imo?

@vikrantpuppala vikrantpuppala May 20, 2025

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reusable ClientConfigurator instances could just be stored in a static map (instances below changed to static), no?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static map should be fine too, that is another way of imposing singleton instances. We are doing same thing by having the static factory.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not, can we stop using this pattern going forward and just have a simple factory pattern without the singleton-ness?

I agree this is a pattern that I do not understand myself. (and like you pointed out, I have used the other way in the factoryFlagContext). I don't think there is any major differences to the pattern though.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static map should be fine too, that is another way of imposing singleton instances. We are doing same thing by having the static factory.

yes it's similar. I think this is not an anti-pattern at all. the name factory is misleading. I will rename such factory instances to Handler like DatabricksAuthClientHandler, DatabricksHttpClientHandler, etc.

Comment thread src/main/java/com/databricks/jdbc/auth/DatabricksAuthClientFactory.java Outdated
Comment thread src/main/java/com/databricks/jdbc/auth/DatabricksAuthClientFactory.java Outdated
Comment thread src/main/java/com/databricks/jdbc/auth/DatabricksAuthClientFactory.java Outdated

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be better to do statusCode / 100 == 2

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are expecting a 200 only from the server, I don't think we would want to include 2xx codes at this point.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but will the response be available in non-200 cases?

Comment thread src/main/java/com/databricks/jdbc/common/DatabricksDriverFeatureFlagsContext.java Outdated
Comment on lines 80 to 82

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make this function future-proof accounting for non-boolean flags?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[c1] I will skip it for now. I know a JSON will come our way (based on https://github.com/databricks-eng/universe/pull/1038986/files?w=1). I will make changes after server changes are merged and we know what the response will look like.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. We will get all kind of values in the feature value, that ideally itself should be a JSON. Feature enabled will be specific to a particular feature.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it'll be a json unless we plan on deviating from the existing safe implementation. The allowed types for a safe flag are Integer, String, Boolean, Long, Double, String List

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it'll be a json unless we plan on deviating from the existing safe implementation

The server implementation PR indicates that the response can be JSON. This is WIP and will change accordingly.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ not using singleton for this factory

Comment thread src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java Outdated
Comment thread src/main/java/com/databricks/jdbc/exception/DatabricksChunkDownloadException.java Outdated

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle this gracefully:

  1. The feature name may not exist (null, treat as false)
  2. It may not be boolean (parsing error, treat as false)

@samikshya-db samikshya-db May 22, 2025

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping this because:

  1. This is handled by hashmap.get
  2. This is handled by Boolean.parseBoolean

@jayantsing-db

Copy link
Copy Markdown
Collaborator

It looks like that for each connection, we will have feature flag context. That is okay. But right now, for each connection, it will send a request to connector-service. This can have serious implications to connector-service and safe-service given our user-base and concurrent environments. The connector-service includes a caching TTL in response. We should consume this field carefully and have appropriate caching on the client side.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we also send expiry time from server. If that is present, we should cache that as well, and keep updating this after reaching the threshold.

We can use readymade cacheLoader for doing this kind of stuff. Check LoadingCache from Guava: https://medium.com/@ramachandrankrish/building-an-in-memory-cache-with-ttl-expiry-using-guava-and-expiringmap-d767e12b4c1b

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of session-Id, we should cache this on key cluster-resource-Id (warehouse-Id or workspace-Id). This will allow reusing the value for another connection for the same cluster resource.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://databricks.atlassian.net/browse/PECOBLR-25?focusedCommentId=6550419 : more context here after the discussion with you and Jayant.

@samikshya-db samikshya-db May 23, 2025

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check LoadingCache from Guava

Nice suggestion 😄 Implemented ✅

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can instead use refreshAfterWrite(..), it will thus refresh asynchronously, and save latency when value is needed

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refreshAfterWrite does not apply to this use-case as we are fetching the feature flags in bulk. Guava LoadingCache expects per-key refresh, not full map refresh

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need per key expiry here? Won't all the keys be fetched in one shot? I had thought that cache would be keyed on compute resource-Id.

We are not going to have per feature flag separate expiry, thus this expiry will be irrelevant.

I think we can do like this:

CacheBuilder => compute resource-Id -> featureFlags
featureFlags can be regular map, Map<String, Values>

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, looks like you are talking about LoadingCache here. Makes sense, implemented ✅

@samikshya-db samikshya-db requested a review from gopalldb May 28, 2025 10:40
@samikshya-db samikshya-db merged commit 4d420f2 into databricks:main May 30, 2025
16 checks passed
@samikshya-db samikshya-db deleted the samikshya-chand_data/telemetryClosure branch May 30, 2025 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants