
Commit f03b6da

Expand AI agent detection: Goose, Amp, Augment, Copilot (VS Code), Kiro, Windsurf (#1394)
## Why

We identify which AI agent is driving the SDK via the `agent/<name>` user-agent segment. The current list covers 9 agents. This PR adds 6 more and honors the emerging `AGENT=<name>` standard from agents.md, so we can see traffic from agents we haven't listed individually yet.

## Changes

Before: an agent was detected when its env var, from a fixed list, was set to a non-empty value; multiple matches returned an empty string.

Now:

- Each agent record maps one product-specific env var to a product name, and the agent fires when that env var is present, even if its value is empty.
- Exactly one match reports that product; more than one match (nested agents) reports `multiple`. The one exception is `COPILOT_CLI` + `COPILOT_MODEL`, which collapses to a single `copilot-cli` signal.
- `AGENT=<name>` is consulted only when no explicit env var matched: it resolves to the named product if that product is known, to `unknown` for any other non-empty value, and to `""` when unset or empty.

Implementation:

- `databricks/sdk/useragent.py`: replaced the `_KNOWN_AGENTS` dict with a list of frozen `_AgentRecord` dataclasses, each holding an `env_var` and a `product`. Rewrote `agent_provider()` to collect the products whose env var is present, drop `copilot-vscode` when `copilot-cli` also matched, and return the single product, `multiple`, or the result of the new `_agent_env_fallback()` helper. The cached result (`_agent_provider`) still uses the `None` vs `""` sentinel pattern.
- `tests/test_user_agent.py`: added coverage for each new agent (goose, amp, augment, copilot-vscode, kiro, windsurf); both Goose signals together and both Amp signals together (not treated as multiple agents); the `AGENT` fallback for a known product name and for an unrecognized value (`unknown`); empty `AGENT` not triggering the fallback; explicit env vars winning over `AGENT` (`AGENT=goose` + `CLAUDECODE=1`, `GOOSE_TERMINAL` + `AGENT=cursor`); two and three stacked agents (`multiple`); the Copilot BYOK collapse; and an empty explicit env var still counting as set.
- `NEXT_CHANGELOG.md`: entry under "Internal Changes".

New detections: Goose (`GOOSE_TERMINAL` or `AGENT=goose`), Amp (`AMP_CURRENT_THREAD_ID` or `AGENT=amp`), Augment (`AUGMENT_AGENT`), VS Code Copilot (`COPILOT_MODEL`), Kiro (`KIRO`), Windsurf (`WINDSURF_AGENT`).

Parallel PRs with identical behavior are being opened in `databricks-sdk-go` and `databricks-sdk-java`.
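The selection rules can be sketched as a pure function over an environment mapping, which makes the precedence easy to check without touching `os.environ` or the module-level cache. This is a simplified illustration, not the SDK's code: `detect_agent` and the trimmed `KNOWN_AGENTS` table are hypothetical, and only six of the fifteen records are included.

```python
from typing import List, Mapping, Tuple

# Hypothetical, trimmed table of (env_var, product) pairs. The SDK's real
# list is _KNOWN_AGENTS in databricks/sdk/useragent.py.
KNOWN_AGENTS: List[Tuple[str, str]] = [
    ("AMP_CURRENT_THREAD_ID", "amp"),
    ("CLAUDECODE", "claude-code"),
    ("COPILOT_CLI", "copilot-cli"),
    ("COPILOT_MODEL", "copilot-vscode"),
    ("CURSOR_AGENT", "cursor"),
    ("GOOSE_TERMINAL", "goose"),
]


def detect_agent(env: Mapping[str, str]) -> str:
    """Hypothetical pure version of the agent_provider() selection rules."""
    # Presence check: an env var set to "" still counts as a match.
    matches = [product for var, product in KNOWN_AGENTS if var in env]

    # COPILOT_CLI + COPILOT_MODEL is one copilot-cli signal, not two agents.
    if "copilot-cli" in matches and "copilot-vscode" in matches:
        matches = [m for m in matches if m != "copilot-vscode"]

    if len(matches) == 1:
        return matches[0]
    if len(matches) > 1:
        return "multiple"

    # Fallback: the agents.md AGENT=<name> convention, consulted only when
    # no explicit env var matched.
    value = env.get("AGENT", "")
    if not value:
        return ""
    if value in {product for _, product in KNOWN_AGENTS}:
        return value
    return "unknown"


print(detect_agent({"GOOSE_TERMINAL": "1", "AGENT": "goose"}))  # goose
print(detect_agent({"AGENT": "goose", "CLAUDECODE": "1"}))      # claude-code
print(detect_agent({"CLAUDECODE": "1", "CURSOR_AGENT": "1"}))   # multiple
print(detect_agent({"AGENT": "someweirdthing"}))                 # unknown
```

Keeping the environment as a parameter is only a testing convenience for this sketch; the SDK reads `os.environ` directly and caches the answer.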
## Test plan

- [x] `python3 -m pytest tests/test_user_agent.py -v` passes (37 tests)
- [x] `make fmt` leaves the three modified files clean
- [x] Each new product detected when its primary env var is set
- [x] `AGENT=goose` alone returns `goose`
- [x] `GOOSE_TERMINAL=1` + `AGENT=goose` returns `goose` (not `multiple`)
- [x] `AMP_CURRENT_THREAD_ID` + `AGENT=amp` returns `amp` (not `multiple`)
- [x] `AGENT=someweirdthing` returns `unknown`
- [x] `AGENT=""` returns `""` (fallback only fires on a non-empty value)
- [x] Two distinct agents set simultaneously return `multiple`
- [x] `AGENT=goose` + `CLAUDECODE=1` returns `claude-code` (explicit env var wins)

---------

Signed-off-by: simon <simon.faltum@databricks.com>
1 parent 5724d13 commit f03b6da

3 files changed

Lines changed: 250 additions & 27 deletions


NEXT_CHANGELOG.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -14,4 +14,6 @@
 
 ### Internal Changes
 
+* Expanded AI agent detection: added Goose, Amp, Augment, Copilot (VS Code), Kiro, Windsurf. Honors the `AGENT=<name>` standard (resolves to a known product if the value matches one, otherwise `unknown`). Presence-only env var matchers now treat an empty string as "set" for parity with the Go and Java SDKs. Explicit agent env vars (e.g. `CLAUDECODE`, `GOOSE_TERMINAL`) always take precedence over the generic `AGENT=<name>` signal. When multiple agent env vars are present (e.g. a Cursor CLI subagent invoked from Claude Code), the user-agent reports `agent/multiple`.
+
 ### API Changes
```
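The parity note in the changelog comes down to a truthiness check versus a membership check. The old Python code used `os.environ.get(env_var, "")`, which treats `CLAUDECODE=""` as unset; the new code uses `env_var in os.environ`, which treats it as set, matching Go's `os.LookupEnv` and Java's `env.get(...) != null` as the test comments describe. A minimal illustration over a plain dict; both helper names are hypothetical:

```python
from typing import Mapping


def old_style_match(env: Mapping[str, str], var: str) -> bool:
    # Pre-change behavior: a set-but-empty variable is falsy, so it is ignored.
    return bool(env.get(var, ""))


def presence_match(env: Mapping[str, str], var: str) -> bool:
    # Post-change behavior: any set variable matches, even when its value is "".
    return var in env


env = {"CLAUDECODE": ""}
print(old_style_match(env, "CLAUDECODE"))  # False
print(presence_match(env, "CLAUDECODE"))   # True
```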

databricks/sdk/useragent.py

Lines changed: 75 additions & 23 deletions
```diff
@@ -3,6 +3,7 @@
 import os
 import platform
 import re
+from dataclasses import dataclass
 from typing import List, Optional, Tuple
 
 from .version import __version__
@@ -224,19 +225,38 @@ def cicd_provider() -> str:
     return _cicd_provider
 
 
-# Canonical list of known AI coding agents.
+# Canonical list of known AI coding agents. Alphabetical by product name.
 # Keep this list in sync with databricks-sdk-go and databricks-sdk-java.
-_KNOWN_AGENTS = {
-    "ANTIGRAVITY_AGENT": "antigravity",  # Closed source (Google)
-    "CLAUDECODE": "claude-code",  # https://github.com/anthropics/claude-code
-    "CLINE_ACTIVE": "cline",  # https://github.com/cline/cline (v3.24.0+)
-    "CODEX_CI": "codex",  # https://github.com/openai/codex
-    "COPILOT_CLI": "copilot-cli",  # https://github.com/features/copilot
-    "CURSOR_AGENT": "cursor",  # Closed source
-    "GEMINI_CLI": "gemini-cli",  # https://google-gemini.github.io/gemini-cli
-    "OPENCODE": "opencode",  # https://github.com/opencode-ai/opencode
-    "OPENCLAW_SHELL": "openclaw",  # https://github.com/anthropics/openclaw
-}
+#
+# Each record has a single env var that identifies the product by presence
+# (the env var just needs to be set, even to an empty string).
+@dataclass(frozen=True)
+class _AgentRecord:
+    env_var: str
+    product: str
+
+
+_KNOWN_AGENTS: List[_AgentRecord] = [
+    _AgentRecord("AMP_CURRENT_THREAD_ID", "amp"),  # https://ampcode.com/ (also sets AGENT=amp, handled centrally)
+    _AgentRecord("ANTIGRAVITY_AGENT", "antigravity"),  # Closed source (Google)
+    _AgentRecord("AUGMENT_AGENT", "augment"),  # https://www.augmentcode.com/
+    _AgentRecord("CLAUDECODE", "claude-code"),  # https://github.com/anthropics/claude-code
+    _AgentRecord("CLINE_ACTIVE", "cline"),  # https://github.com/cline/cline (v3.24.0+)
+    _AgentRecord("CODEX_CI", "codex"),  # https://github.com/openai/codex
+    _AgentRecord("COPILOT_CLI", "copilot-cli"),  # https://github.com/features/copilot
+    _AgentRecord(
+        "COPILOT_MODEL", "copilot-vscode"
+    ),  # VS Code Copilot terminal, best-effort heuristic, not officially identified
+    _AgentRecord("CURSOR_AGENT", "cursor"),  # Closed source
+    _AgentRecord("GEMINI_CLI", "gemini-cli"),  # https://google-gemini.github.io/gemini-cli
+    _AgentRecord(
+        "GOOSE_TERMINAL", "goose"
+    ),  # https://block.github.io/goose/ (also sets AGENT=goose, handled centrally)
+    _AgentRecord("KIRO", "kiro"),  # https://kiro.dev/ (Amazon)
+    _AgentRecord("OPENCLAW_SHELL", "openclaw"),  # https://github.com/anthropics/openclaw
+    _AgentRecord("OPENCODE", "opencode"),  # https://github.com/opencode-ai/opencode
+    _AgentRecord("WINDSURF_AGENT", "windsurf"),  # https://codeium.com/windsurf (Codeium)
+]
 
 # Private variable to store the detected agent provider. This value is computed
 # at the first invocation of agent_provider() and is cached for subsequent calls.
@@ -247,22 +267,54 @@ def cicd_provider() -> str:
 def agent_provider() -> str:
     """Detect if running inside a known AI coding agent.
 
-    Returns the agent name if exactly one known agent env var is set (non-empty).
-    Returns empty string if zero or multiple agents detected.
-    Result is cached after first call.
+    Iterates the list of known agents. Each agent fires if its explicit,
+    product-specific env var is set. If exactly one agent fired, returns its
+    product name. If more than one fired, returns "multiple" (nested agents,
+    e.g. a Cursor CLI subagent invoked by Claude Code, inherit env vars from
+    every enclosing layer).
+
+    Explicit agent env vars (e.g. CLAUDECODE, GOOSE_TERMINAL) always take
+    precedence. The agents.md-standard AGENT=<name> env var is only consulted
+    as a fallback when no explicit matcher fired:
+    - If AGENT matches a known product name, return that product.
+    - Otherwise return "unknown".
 
-    Unlike CI/CD detection (which returns the first/sorted match), agent detection
-    uses an ambiguity guard: multiple matches return empty. Agent env vars can be
-    stacked (e.g., running Cline inside Cursor), so we only report when unambiguous.
+    This means AGENT=<name> never contributes to the multi-agent signal: if
+    any explicit matcher fires, AGENT is ignored entirely, even when it names
+    a different known product.
+
+    Result is cached after first call.
     """
     global _agent_provider
     if _agent_provider is not None:
         return _agent_provider
 
-    detected = []
-    for env_var, name in _KNOWN_AGENTS.items():
-        if os.environ.get(env_var, ""):
-            detected.append(name)
+    matches = [a.product for a in _KNOWN_AGENTS if a.env_var in os.environ]
+
+    # Known BYOK false positive: Copilot CLI users often set COPILOT_MODEL
+    # alongside COPILOT_CLI. That pair is a single copilot-cli signal, not a
+    # stacked multi-agent setup.
+    if "copilot-cli" in matches and "copilot-vscode" in matches:
+        matches = [m for m in matches if m != "copilot-vscode"]
 
-    _agent_provider = detected[0] if len(detected) == 1 else ""
+    if len(matches) == 1:
+        _agent_provider = matches[0]
+    elif len(matches) > 1:
+        _agent_provider = "multiple"
+    else:
+        _agent_provider = _agent_env_fallback()
     return _agent_provider
+
+
+def _agent_env_fallback() -> str:
+    """Honor the agents.md AGENT=<name> standard.
+
+    Returns the value if it matches a known product name, "unknown" if AGENT
+    is set to any other non-empty value, and "" if AGENT is unset or empty.
+    """
+    v = os.environ.get("AGENT", "")
+    if not v:
+        return ""
+    if v in {a.product for a in _KNOWN_AGENTS}:
+        return v
+    return "unknown"
```

tests/test_user_agent.py

Lines changed: 173 additions & 4 deletions
```diff
@@ -181,19 +181,188 @@ def test_agent_provider_openclaw(clean_useragent_env):
     assert useragent.agent_provider() == "openclaw"
 
 
+def test_agent_provider_goose_env_var(clean_useragent_env):
+    os.environ["GOOSE_TERMINAL"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "goose"
+
+
+def test_agent_provider_goose_via_agent_standard(clean_useragent_env):
+    os.environ["AGENT"] = "goose"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "goose"
+
+
+def test_agent_provider_goose_both_signals(clean_useragent_env):
+    # Both the dedicated env var and the AGENT=goose standard are set.
+    # This should NOT be ambiguous - it's still a single agent (goose).
+    os.environ["GOOSE_TERMINAL"] = "1"
+    os.environ["AGENT"] = "goose"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "goose"
+
+
+def test_agent_provider_amp_env_var(clean_useragent_env):
+    os.environ["AMP_CURRENT_THREAD_ID"] = "thread-123"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "amp"
+
+
+def test_agent_provider_amp_via_agent_standard(clean_useragent_env):
+    os.environ["AGENT"] = "amp"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "amp"
+
+
+def test_agent_provider_amp_both_signals(clean_useragent_env):
+    # Both AMP_CURRENT_THREAD_ID and AGENT=amp set - still single agent.
+    os.environ["AMP_CURRENT_THREAD_ID"] = "thread-123"
+    os.environ["AGENT"] = "amp"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "amp"
+
+
+def test_agent_provider_augment(clean_useragent_env):
+    os.environ["AUGMENT_AGENT"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "augment"
+
+
+def test_agent_provider_copilot_vscode(clean_useragent_env):
+    os.environ["COPILOT_MODEL"] = "gpt-4"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "copilot-vscode"
+
+
+def test_agent_provider_kiro(clean_useragent_env):
+    os.environ["KIRO"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "kiro"
+
+
+def test_agent_provider_windsurf(clean_useragent_env):
+    os.environ["WINDSURF_AGENT"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "windsurf"
+
+
+def test_agent_provider_unknown_agent_fallback(clean_useragent_env):
+    # AGENT set to a value that doesn't match any known agent
+    # should fall back to "unknown".
+    os.environ["AGENT"] = "someweirdthing"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "unknown"
+
+
+def test_agent_provider_agent_known_product_name_fallback(clean_useragent_env):
+    # AGENT=<known product name> with no other matchers set should resolve
+    # to the matching product (e.g. cursor is only identified by CURSOR_AGENT;
+    # AGENT=cursor is a reasonable implicit signal to attribute it).
+    os.environ["AGENT"] = "cursor"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "cursor"
+
+
+def test_agent_provider_known_matcher_wins_over_agent_fallback(clean_useragent_env):
+    # When a known matcher fires, it wins even if AGENT is set to an
+    # unrelated value. The AGENT fallback only applies when nothing else hits.
+    os.environ["AGENT"] = "somethingweird"
+    os.environ["CLAUDECODE"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "claude-code"
+
+
+def test_agent_provider_agent_empty_string(clean_useragent_env):
+    # AGENT="" (empty) should NOT trigger the fallback.
+    os.environ["AGENT"] = ""
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == ""
+
+
 def test_agent_provider_multiple_agents(clean_useragent_env):
+    # Nested agents (e.g. Claude Code spawning a Cursor CLI subagent) set
+    # multiple explicit matchers on the same process.
     os.environ["CLAUDECODE"] = "1"
     os.environ["CURSOR_AGENT"] = "1"
     from databricks.sdk import useragent
 
-    assert useragent.agent_provider() == ""
+    assert useragent.agent_provider() == "multiple"
+
+
+def test_agent_provider_three_stacked_agents(clean_useragent_env):
+    os.environ["CLAUDECODE"] = "1"
+    os.environ["CURSOR_AGENT"] = "1"
+    os.environ["AUGMENT_AGENT"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "multiple"
+
+
+def test_agent_provider_explicit_wins_over_agent_naming_different_product(clean_useragent_env):
+    # Explicit env vars always win over the generic AGENT=<name> signal.
+    # AGENT=goose names a different known product, but CLAUDECODE is explicit,
+    # so claude-code wins. AGENT is ignored entirely when any explicit
+    # matcher fires.
+    os.environ["AGENT"] = "goose"
+    os.environ["CLAUDECODE"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "claude-code"
+
+
+def test_agent_provider_explicit_goose_wins_over_agent_cursor(clean_useragent_env):
+    # Mirror of the above: GOOSE_TERMINAL is explicit, AGENT=cursor names a
+    # different known product. Explicit wins; AGENT is ignored.
+    os.environ["GOOSE_TERMINAL"] = "1"
+    os.environ["AGENT"] = "cursor"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "goose"
+
 
+def test_agent_provider_copilot_cli_and_vscode_collapses_to_copilot_cli(clean_useragent_env):
+    # Copilot CLI users (BYOK mode) often set COPILOT_MODEL alongside
+    # COPILOT_CLI. Treat the pair as a single copilot-cli signal rather than
+    # a stacked multi-agent setup.
+    os.environ["COPILOT_CLI"] = "1"
+    os.environ["COPILOT_MODEL"] = "gpt-4"
+    from databricks.sdk import useragent
 
-def test_agent_provider_empty_value(clean_useragent_env):
+    assert useragent.agent_provider() == "copilot-cli"
+
+
+def test_agent_provider_copilot_byok_collapse_then_still_multiple(clean_useragent_env):
+    # The Copilot BYOK collapse only removes the copilot-vscode match. If
+    # another agent is also present, the result is still "multiple".
+    os.environ["COPILOT_CLI"] = "1"
+    os.environ["COPILOT_MODEL"] = "gpt-4"
+    os.environ["CLAUDECODE"] = "1"
+    from databricks.sdk import useragent
+
+    assert useragent.agent_provider() == "multiple"
+
+
+def test_agent_provider_empty_value_still_counts_as_set(clean_useragent_env):
+    # Presence-only matchers fire even when the env var is set to an empty
+    # string. Parity with Go (os.LookupEnv) and Java (env.get != null).
     os.environ["CLAUDECODE"] = ""
     from databricks.sdk import useragent
 
-    assert useragent.agent_provider() == ""
+    assert useragent.agent_provider() == "claude-code"
 
 
 def test_user_agent_string_includes_agent(clean_useragent_env):
@@ -217,7 +386,7 @@ def test_user_agent_string_multiple_agents(clean_useragent_env):
     from databricks.sdk import useragent
 
     ua = useragent.to_string()
-    assert "agent/" not in ua
+    assert "agent/multiple" in ua
 
 
 def test_agent_provider_cached(clean_useragent_env):
```
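Because the detected agent is read from `os.environ` and cached per process, each test has to start from a clean environment and an empty cache. The `clean_useragent_env` fixture is not part of this diff, so the following is only a guess at the shape of that isolation, built on the standard library's `unittest.mock.patch.dict` with `clear=True`. The `detector` namespace, `agent()`, and `run_isolated()` are hypothetical stand-ins for `databricks.sdk.useragent` and its cache.

```python
import os
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a module holding a cached, env-driven result.
detector = SimpleNamespace(cache=None)


def agent() -> str:
    # Simplified detector: one presence-based matcher, cached after first use.
    if detector.cache is None:
        detector.cache = "claude-code" if "CLAUDECODE" in os.environ else ""
    return detector.cache


def run_isolated(env: dict) -> str:
    """Evaluate agent() against exactly `env`, starting from an empty cache."""
    detector.cache = None  # reset the cache so earlier calls cannot leak in
    # clear=True empties os.environ for the duration of the block and
    # restores the original contents on exit.
    with mock.patch.dict(os.environ, env, clear=True):
        return agent()


print(run_isolated({"CLAUDECODE": ""}))  # claude-code
print(repr(run_isolated({})))            # ''
```

Without the cache reset, the second call would still return `claude-code` from the first call, which is exactly the leakage the fixture has to prevent.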

0 commit comments

Comments
 (0)