
Commit aa6bdc5

Honor AI_AGENT and pass raw values through (#815)
## Why

The Java SDK detects AI coding agents and surfaces them as `agent/<name>` in the User-Agent. Today the generic fallback (when no proprietary env var fires) only honors the agents.md `AGENT=<name>` standard. Vercel's `@vercel/detect-agent` library uses a parallel `AI_AGENT=<name>` convention that tools in the Vercel ecosystem set instead; we currently miss those.

Separately, the existing fallback coerces any unrecognized value to the literal string `"unknown"`. That buries useful signal: a tool setting `AI_AGENT=claude-code_2-1-141_agent` ends up as `agent/unknown`, discarding the very signal (tool name plus version variant) we want to see. Bucketing arbitrary names is an ETL concern, not the SDK's.

This mirrors the Go SDK change in databricks/databricks-sdk-go#1683.

## Changes

Two behavior changes in `src/main/java/com/databricks/sdk/core/UserAgent.java`:

1. **`AI_AGENT` fallback.** Add `AI_AGENT=<name>` as a secondary fallback after `AGENT=<name>`. `AGENT` wins when both are set to non-empty values; empty is treated as unset for both. Explicit product matchers (e.g. `CLAUDECODE`) still always win over both.
2. **Raw passthrough instead of `"unknown"`.** Drop the known-product lookup in the fallback. The value is piped through the existing `sanitize()` helper (disallowed chars become `-`, satisfying the User-Agent allowlist `[0-9A-Za-z_.+-]`) and capped at 64 chars to keep the header bounded. Known products like `cursor` or `claude-code` pass through unchanged because they already satisfy the allowlist. Note that the Java allowlist does not include `/`, so a value like `cursor/1.2.3` sanitizes to `cursor-1.2.3`.

Same change is landing in `databricks-sdk-py` as a sibling PR.

## Test plan

- [x] `mvn -pl databricks-sdk-java test -Dtest=UserAgentTest` passes (48 tests)
- [x] `mvn spotless:apply` clean
- [x] `AI_AGENT=<known product>` returns the product name
- [x] `AI_AGENT=<unrecognized>` returns the raw sanitized value (no longer `"unknown"`)
- [x] `AGENT` wins over `AI_AGENT` when both are non-empty
- [x] Empty `AGENT` falls through to `AI_AGENT`
- [x] Disallowed chars in `AGENT` / `AI_AGENT` are sanitized to `-`
- [x] Values longer than 64 chars are truncated
- [x] Explicit matcher (e.g. `CLAUDECODE`) still wins over both fallbacks
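
The fallback rules described in this commit message can be tried in isolation with a minimal sketch like the one below. It is not the SDK code: the class name `AgentFallbackSketch` is hypothetical, a plain `Map` stands in for the SDK's `Environment`, and the `sanitize` shown here is an assumed stand-in for the existing helper (whose body is not part of this commit), written only from the stated allowlist `[0-9A-Za-z_.+-]`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch of the AGENT / AI_AGENT fallback rules.
class AgentFallbackSketch {
  // Same cap as MAX_AGENT_FALLBACK_LEN in the change.
  static final int MAX_AGENT_FALLBACK_LEN = 64;

  // Assumption: stand-in for the SDK's sanitize() helper. Replaces every
  // character outside the allowlist [0-9A-Za-z_.+-] with a hyphen.
  static String sanitize(String s) {
    return s.replaceAll("[^0-9A-Za-z_.+-]", "-");
  }

  // AGENT is consulted first; AI_AGENT only when AGENT is unset or empty.
  // Returns "" when neither variable carries a non-empty value.
  static String agentEnvFallback(Map<String, String> env) {
    String v = env.get("AGENT");
    if (v == null || v.isEmpty()) {
      v = env.get("AI_AGENT");
    }
    if (v == null || v.isEmpty()) {
      return "";
    }
    v = sanitize(v);
    if (v.length() > MAX_AGENT_FALLBACK_LEN) {
      v = v.substring(0, MAX_AGENT_FALLBACK_LEN);
    }
    return v;
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("AGENT", ""); // empty is treated as unset
    env.put("AI_AGENT", "cursor/1.2.3");
    System.out.println(agentEnvFallback(env)); // prints "cursor-1.2.3"
  }
}
```

Sanitizing before truncating keeps the 64-character cap on the value that actually reaches the header; since each disallowed character maps to exactly one hyphen, the order does not change the result's length in this sketch.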
1 parent b27860a commit aa6bdc5

3 files changed

Lines changed: 154 additions & 20 deletions

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ## Release v0.117.0

 ### New Features and Improvements
+* Detect the `AI_AGENT` environment variable (Vercel `@vercel/detect-agent` convention) as a secondary fallback for the AI agent reported in the user agent, consulted only when the agents.md `AGENT` variable is unset or empty. An unrecognized `AGENT` or `AI_AGENT` value is now passed through as-is (sanitized to the user agent allowlist and capped at 64 characters) instead of being reported as `unknown`. Mirrors [databricks/databricks-sdk-go#1683](https://github.com/databricks/databricks-sdk-go/pull/1683).

 * Added `Paginator.newTokenPagination(...)` and `Paginator.newOffsetPagination(...)` factory methods in `com.databricks.sdk.support`, which make the pagination strategy explicit. The `Paginator` constructor is now deprecated in favor of these; it keeps its previous (offset/limit) behavior.

databricks-sdk-java/src/main/java/com/databricks/sdk/core/UserAgent.java

Lines changed: 30 additions & 16 deletions
@@ -249,12 +249,21 @@ private static class KnownAgent {
     }
   }

-  // The agents.md standard env var. When set to a value we don't specifically
-  // recognize, detection falls back to "unknown".
+  // The agents.md standard env var. Consulted first when no explicit matcher
+  // fires.
   private static final String AGENT_ENV_VAR = "AGENT";

+  // The Vercel @vercel/detect-agent convention env var. Consulted only as a
+  // secondary fallback when AGENT is unset or empty.
+  private static final String AI_AGENT_ENV_VAR = "AI_AGENT";
+
+  // Maximum length of a passed-through fallback agent value. Longer values are
+  // truncated to keep the user agent header bounded.
+  private static final int MAX_AGENT_FALLBACK_LEN = 64;
+
   // Canonical list of known AI coding agents.
-  // Keep this list in sync with databricks-sdk-go and databricks-sdk-py.
+  // Keep this list, and the AGENT/AI_AGENT fallback handling in
+  // agentEnvFallback, in sync with databricks-sdk-go and databricks-sdk-py.
   // Agents are listed alphabetically by product name.
   private static List<KnownAgent> listKnownAgents() {
     return Arrays.asList(
@@ -294,9 +303,8 @@ private static List<KnownAgent> listKnownAgents() {
   // stacked when one agent invokes another as a subagent (e.g. Claude Code
   // spawning a Cursor CLI subprocess), so the child process inherits env
   // vars from multiple layers.
-  // - Zero agents matched: if the agents.md standard AGENT env var is set to
-  // a known product name, return that product name. If it is set to any
-  // other non-empty value, return "unknown". Otherwise return "".
+  // - Zero agents matched: fall back to the generic AGENT / AI_AGENT env
+  // vars (see agentEnvFallback).
   //
   // Because explicit matchers win over AGENT, e.g. AGENT=cursor + CLAUDECODE=1
   // yields "claude-code", and AGENT=goose + CLAUDECODE=1 also yields
@@ -317,23 +325,29 @@ private static String lookupAgentProvider(Environment env) {
     if (matches.size() > 1) {
       return "multiple";
     }
-    return agentEnvFallback(env, agents);
+    return agentEnvFallback(env);
   }

-  // agentEnvFallback honors the agents.md AGENT=<name> standard.
-  // Returns the value if it matches a known product name, "unknown" if AGENT
-  // is set to any other non-empty value, and "" if AGENT is unset or empty.
-  private static String agentEnvFallback(Environment env, List<KnownAgent> agents) {
+  // agentEnvFallback honors the agents.md AGENT=<name> standard, with the
+  // Vercel @vercel/detect-agent AI_AGENT convention as a secondary fallback.
+  // AGENT takes precedence when both are non-empty.
+  //
+  // The raw value is passed through (no coercion to "unknown"), but sanitized
+  // to satisfy the user agent allowlist and capped at MAX_AGENT_FALLBACK_LEN
+  // characters. Returns "" when both AGENT and AI_AGENT are unset or empty.
+  private static String agentEnvFallback(Environment env) {
     String v = env.get(AGENT_ENV_VAR);
+    if (v == null || v.isEmpty()) {
+      v = env.get(AI_AGENT_ENV_VAR);
+    }
     if (v == null || v.isEmpty()) {
       return "";
     }
-    for (KnownAgent a : agents) {
-      if (a.product.equals(v)) {
-        return v;
-      }
+    v = sanitize(v);
+    if (v.length() > MAX_AGENT_FALLBACK_LEN) {
+      v = v.substring(0, MAX_AGENT_FALLBACK_LEN);
     }
-    return "unknown";
+    return v;
   }

   // Thread-safe lazy initialization of agent provider detection
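
The overall precedence described in the comments of this file (one explicit match wins, several matches yield `"multiple"`, zero matches fall back to `AGENT` then `AI_AGENT`) might be sketched roughly as follows. This is an illustration under assumptions, not the SDK's implementation: the class name, the `MATCHERS` table, and the `EXAMPLE_AGENT` variable are hypothetical; only the `CLAUDECODE` to `claude-code` pairing appears in this commit. The sketch also assumes a matcher fires when its variable is set to a non-empty value, and it omits sanitization and the length cap for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of explicit-matcher precedence over the env fallback.
class AgentPrecedenceSketch {
  // Assumed matcher table: env var name -> product name. EXAMPLE_AGENT is
  // made up for illustration and is not a real SDK matcher.
  static final Map<String, String> MATCHERS = new LinkedHashMap<>();

  static {
    MATCHERS.put("CLAUDECODE", "claude-code");
    MATCHERS.put("EXAMPLE_AGENT", "example-agent");
  }

  static String lookupAgentProvider(Map<String, String> env) {
    // Collect every explicit matcher whose env var is set and non-empty.
    List<String> matches = new ArrayList<>();
    for (Map.Entry<String, String> m : MATCHERS.entrySet()) {
      String v = env.get(m.getKey());
      if (v != null && !v.isEmpty()) {
        matches.add(m.getValue());
      }
    }
    if (matches.size() == 1) {
      return matches.get(0);
    }
    if (matches.size() > 1) {
      return "multiple";
    }
    // Zero explicit matches: AGENT first, then AI_AGENT. Sanitization and
    // truncation are left out of this sketch.
    String v = env.get("AGENT");
    if (v == null || v.isEmpty()) {
      v = env.get("AI_AGENT");
    }
    return v == null ? "" : v;
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("AGENT", "cursor");
    env.put("CLAUDECODE", "1");
    System.out.println(lookupAgentProvider(env)); // prints "claude-code"
  }
}
```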

databricks-sdk-java/src/test/java/com/databricks/sdk/core/UserAgentTest.java

Lines changed: 123 additions & 4 deletions
@@ -309,7 +309,7 @@ public void testAgentProviderAgentEnvAmp() {
   @Test
   public void testAgentProviderAgentEnvCursor() {
     // AGENT=cursor with no cursor-specific env var. Falls through to the
-    // AGENT fallback and matches "cursor" as a known product name.
+    // AGENT fallback and is passed through unchanged.
     setupAgentEnv(
         new HashMap<String, String>() {
           {
@@ -362,23 +362,142 @@ public void testAgentProviderAmpBothMatchers() {
   }

   @Test
-  public void testAgentProviderAgentEnvUnknown() {
+  public void testAgentProviderAgentEnvUnrecognizedPassthrough() {
+    // An unrecognized AGENT value is passed through as-is (no longer coerced
+    // to "unknown"), after sanitization.
     setupAgentEnv(
         new HashMap<String, String>() {
           {
             put("AGENT", "someweirdthing");
           }
         });
-    Assertions.assertTrue(UserAgent.asString().contains("agent/unknown"));
+    Assertions.assertTrue(UserAgent.asString().contains("agent/someweirdthing"));
+    Assertions.assertFalse(UserAgent.asString().contains("agent/unknown"));
+  }
+
+  @Test
+  public void testAgentProviderAgentEnvVersionedPassthrough() {
+    // A versioned variant whose characters are all in the allowlist
+    // ([0-9A-Za-z_.+-]) is passed through unchanged.
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AGENT", "my-tool-1.2.3");
+          }
+        });
+    Assertions.assertTrue(UserAgent.asString().contains("agent/my-tool-1.2.3"));
+  }
+
+  @Test
+  public void testAgentProviderAgentEnvSanitized() {
+    // Characters outside the user agent allowlist [0-9A-Za-z_.+-] become
+    // hyphens.
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AGENT", "weird agent!@#name");
+          }
+        });
+    Assertions.assertTrue(UserAgent.asString().contains("agent/weird-agent---name"));
+  }
+
+  @Test
+  public void testAgentProviderAgentEnvTruncated() {
+    // Values longer than 64 characters are truncated to 64.
+    StringBuilder sb = new StringBuilder();
+    for (int i = 0; i < 100; i++) {
+      sb.append("a");
+    }
+    String longValue = sb.toString();
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AGENT", longValue);
+          }
+        });
+    StringBuilder expected = new StringBuilder("agent/");
+    for (int i = 0; i < 64; i++) {
+      expected.append("a");
+    }
+    String userAgent = UserAgent.asString();
+    Assertions.assertTrue(userAgent.contains(expected.toString()));
+    // Must not contain a 65th 'a' after the prefix.
+    Assertions.assertFalse(userAgent.contains(expected.toString() + "a"));
   }

   @Test
   public void testAgentProviderAgentEnvEmpty() {
-    // AGENT="" should not trigger the unknown fallback.
+    // AGENT="" should not trigger the fallback.
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AGENT", "");
+          }
+        });
+    Assertions.assertFalse(UserAgent.asString().contains("agent/"));
+  }
+
+  @Test
+  public void testAgentProviderAiAgentFallback() {
+    // AI_AGENT is consulted when AGENT is unset.
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AI_AGENT", "vercel-agent");
+          }
+        });
+    Assertions.assertTrue(UserAgent.asString().contains("agent/vercel-agent"));
+  }
+
+  @Test
+  public void testAgentProviderAgentWinsOverAiAgent() {
+    // AGENT takes precedence over AI_AGENT when both are non-empty.
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AGENT", "primary");
+            put("AI_AGENT", "secondary");
+          }
+        });
+    Assertions.assertTrue(UserAgent.asString().contains("agent/primary"));
+    Assertions.assertFalse(UserAgent.asString().contains("agent/secondary"));
+  }
+
+  @Test
+  public void testAgentProviderEmptyAgentFallsBackToAiAgent() {
+    // AGENT="" falls back to AI_AGENT.
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AGENT", "");
+            put("AI_AGENT", "secondary");
+          }
+        });
+    Assertions.assertTrue(UserAgent.asString().contains("agent/secondary"));
+  }
+
+  @Test
+  public void testAgentProviderExplicitMatcherWinsOverAiAgent() {
+    // An explicit matcher wins over AI_AGENT.
+    setupAgentEnv(
+        new HashMap<String, String>() {
+          {
+            put("AI_AGENT", "vercel-agent");
+            put("CLAUDECODE", "1");
+          }
+        });
+    Assertions.assertTrue(UserAgent.asString().contains("agent/claude-code"));
+    Assertions.assertFalse(UserAgent.asString().contains("agent/vercel-agent"));
+  }
+
+  @Test
+  public void testAgentProviderBothEmptyReturnsNone() {
+    // Both AGENT and AI_AGENT empty yields no agent segment.
     setupAgentEnv(
         new HashMap<String, String>() {
           {
             put("AGENT", "");
+            put("AI_AGENT", "");
           }
         });
     Assertions.assertFalse(UserAgent.asString().contains("agent/"));
