
Commit 133c453

[telemetry] Detect Python package manager(s) at project setup (#1918)
## Changes

Measurement-only telemetry to learn which Python package manager(s) our users' projects actually use (pip / conda / uv / poetry), so the VPEX setup-flow investment can be prioritized from first-party data instead of public-survey estimates. No setup behavior changes; this is detection only.

The work splits cleanly into three layers so each is independently testable and the dependency direction stays correct (high-level → low-level):

- **Pure classifier** (`packageManagerDetection.ts`): given a set of already-collected signals, reports every applicable manager, a best-guess primary (priority `uv > poetry > conda > pip`), the firing signals, `hasLockfile`, and interpreter source. Side-effect free and total.
- **Emit** (`telemetry/packageManagerExtensions.ts`): adds `recordPackageManagerDetection` to the existing `Telemetry` class via the same `declare module` pattern as `commandExtensions.ts`. Keeps disk/Python-extension dependencies out of the telemetry client.
- **Collection** (`PackageManagerTelemetry.ts`): a best-effort, non-blocking collector that reads disk and already-resolved interpreter metadata, runs the pure classifier, and calls the emit method. Deduplicated per session on `(trigger, projectRoot)`; any failure degrades to `unknown` and is swallowed so it never disrupts setup.

Emission is wired into three setup touchpoints: project-open environment check (`auto_open`), the set-up-environment command (`explicit_command`), and first Run/Debug with Databricks Connect (`run`/`debug`).

A new `Events.PYTHON_ENV_SETUP_DETECTED` event carries a typed, documented schema (reuses the existing telemetry transport; opt-out honored; categorical data only, with no paths, package names, or cluster names). A handoff note for the analytics/dashboard owner is included at `src/telemetry/PACKAGE_MANAGER_DETECTION.md`.

**Detection correctness** (the parts most worth reviewing):

- `interpreterSource` is derived from the active interpreter alone, never from project files. A `uv.lock` project running a conda/venv/system interpreter reports that interpreter's real source, keeping the "uv project, interpreter not uv-managed yet" setup-flow gap visible. A genuinely uv-provisioned venv is identified by the `uv =` marker in `pyvenv.cfg`, not by `uv.lock`.
- conda is attributed only when the active interpreter resides under `CONDA_PREFIX` (path-boundary checked), not on the bare env var, which is session-global in the extension host (launching VS Code from an activated conda shell) and would otherwise over-count conda for uv/poetry/pip projects.
- `pyproject` `[tool.uv]`/`[tool.poetry]` detection uses a bounded table-header scan, not substring matching: ignores comments and in-value mentions, rejects prefix collisions (e.g. `tool.uvicorn`), and matches subtable and array-of-table headers (`[tool.uv.sources]`, `[[tool.poetry.source]]`).
- No external executable is run for telemetry: the uv-on-PATH probe was removed (it spawned a PATH-resolved `uv` for a weak, non-attributing signal). Detection reads only disk and already-resolved interpreter metadata.

**Scope / privacy:** measurement only, with no changes to setup behavior (the VPEX flows are a separate effort). Only enum/categorical data and a closed set of signal identifiers are emitted; the existing telemetry opt-out (`telemetry.telemetryLevel`) is respected by the transport.

## Tests

- [x] `yarn run test:unit`: 202 passing, 0 failing. Includes the pure classifier (each manager, interpreter sources, overlaps like uv+pip / conda+pip / poetry+uv, weak signals, none) and pure helpers (`pyprojectHasToolSection`, `pyvenvCfgMarksUv`, `interpreterUnderCondaPrefix`), covering the conda-prefix boundary and shell-global false-positive cases.
- [x] `yarn run build` (typecheck) passes.
- [x] `eslint` clean; `prettier` formatted.

Reviewer can validate with:

```bash
cd packages/databricks-vscode
yarn run build
yarn run test:unit
npx eslint src --ext ts && npx prettier . -c
```
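
For reviewers who want a concrete picture of the three pure helpers named above, here is a minimal sketch of how such checks might look. The function names come from the commit message, but the bodies, the 2000-line scan bound, and the separator normalization are assumptions made for illustration, not the code in this commit.

```typescript
// Hypothetical sketches of the pure helpers; bodies are illustrative only.

/** True if `toml` declares [tool.<name>], one of its subtables, or an array-of-tables under it. */
function pyprojectHasToolSection(toml: string, name: string): boolean {
    const prefix = `tool.${name}`;
    // Assumed bound: only look at the first 2000 lines.
    const lines = toml.split(/\r?\n/).slice(0, 2000);
    for (const raw of lines) {
        // Only table headers count: `[x]` or `[[x]]`, optionally followed by a
        // comment. Comment lines and key/value lines never match this shape.
        const m = /^\[\[?\s*([^\[\]]+?)\s*\]\]?\s*(#.*)?$/.exec(raw.trim());
        if (!m) {
            continue;
        }
        const header = m[1].replace(/\s*\.\s*/g, ".");
        // Exact match or a dotted child; "tool.uvicorn" is neither for "tool.uv".
        if (header === prefix || header.startsWith(prefix + ".")) {
            return true;
        }
    }
    return false;
}

/** True if pyvenv.cfg contains a `uv = ...` key (a key named uv, not a value mentioning it). */
function pyvenvCfgMarksUv(cfg: string): boolean {
    return cfg.split(/\r?\n/).some((line) => /^\s*uv\s*=/.test(line));
}

/** True only if the interpreter path lies inside the conda prefix directory. */
function interpreterUnderCondaPrefix(
    interpreterPath: string,
    condaPrefix: string | undefined
): boolean {
    if (!condaPrefix) {
        return false;
    }
    // Normalize separators and drop trailing slashes (case is not folded here).
    const norm = (p: string) => p.replace(/\\/g, "/").replace(/\/+$/, "");
    const prefix = norm(condaPrefix);
    if (prefix === "") {
        return false;
    }
    // Require a separator after the prefix so "/opt/conda-other/..." does not
    // count as being under "/opt/conda".
    return norm(interpreterPath).startsWith(prefix + "/");
}
```

This sketch does not handle TOML multi-line strings or quoted header keys; a production scan would need to decide how to treat them.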
1 parent 04f964b commit 133c453

13 files changed

Lines changed: 1581 additions & 7 deletions

packages/databricks-vscode/src/extension.ts

Lines changed: 25 additions & 3 deletions
@@ -75,6 +75,7 @@ import {BundleVariableTreeDataProvider} from "./ui/bundle-variables/BundleVariab
 import {ConfigurationTreeViewManager} from "./ui/configuration-view/ConfigurationTreeViewManager";
 import {getCLIDependenciesEnvVars} from "./utils/envVarGenerators";
 import {EnvironmentCommands} from "./language/EnvironmentCommands";
+import {PackageManagerTelemetry} from "./language/PackageManagerTelemetry";
 import {WorkspaceFolderManager} from "./vscode-objs/WorkspaceFolderManager";
 import {SyncCommands} from "./sync/SyncCommands";
 import {CodeSynchronizer} from "./sync";
@@ -335,6 +336,24 @@ export async function activate(
         customWhenContext,
         telemetry
     );
+    const packageManagerTelemetry = new PackageManagerTelemetry(
+        telemetry,
+        pythonExtensionWrapper,
+        () => {
+            try {
+                return workspaceFolderManager.activeProjectUri.fsPath;
+            } catch (e) {
+                return undefined;
+            }
+        },
+        () => {
+            if (connectionManager.serverless) {
+                return "serverless";
+            }
+            return connectionManager.cluster ? "cluster" : "none";
+        },
+        () => connectionManager.state === "CONNECTED"
+    );
     context.subscriptions.push(
         bundleFileWatcher,
         bundleValidateModel,
@@ -619,13 +638,15 @@ export async function activate(
             connectionManager,
             pythonExtensionWrapper,
             environmentDependenciesInstaller,
-            configureAutocomplete
+            configureAutocomplete,
+            packageManagerTelemetry
         )
     );
     const environmentCommands = new EnvironmentCommands(
         featureManager,
         pythonExtensionWrapper,
-        environmentDependenciesInstaller
+        environmentDependenciesInstaller,
+        packageManagerTelemetry
     );
     context.subscriptions.push(
         telemetry.registerCommand(
@@ -1003,7 +1024,8 @@ export async function activate(
         featureManager,
         context,
         customWhenContext,
-        telemetry
+        telemetry,
+        packageManagerTelemetry
     );
     const debugFactory = new DatabricksDebugAdapterFactory(
         connectionManager,

packages/databricks-vscode/src/language/EnvironmentCommands.ts

Lines changed: 4 additions & 1 deletion
@@ -5,16 +5,19 @@ import {Cluster} from "../sdk-extensions";
 import {EnvironmentDependenciesInstaller} from "./EnvironmentDependenciesInstaller";
 import {Environment} from "./MsPythonExtensionApi";
 import {environmentName} from "../utils/environmentUtils";
+import {PackageManagerTelemetry} from "./PackageManagerTelemetry";
 
 export class EnvironmentCommands {
     constructor(
         private featureManager: FeatureManager,
         private pythonExtension: MsPythonExtensionWrapper,
-        private installer: EnvironmentDependenciesInstaller
+        private installer: EnvironmentDependenciesInstaller,
+        private packageManagerTelemetry: PackageManagerTelemetry
     ) {}
 
     async setup(stepId?: string) {
         commands.executeCommand("configurationView.focus");
+        void this.packageManagerTelemetry.emitDetection("explicit_command");
         await window.withProgress(
             {location: {viewId: "configurationView"}},
             () => this._setup(stepId)

packages/databricks-vscode/src/language/EnvironmentDependenciesVerifier.ts

Lines changed: 7 additions & 1 deletion
@@ -10,6 +10,7 @@ import {ResolvedEnvironment} from "./MsPythonExtensionApi";
 import {NamedLogger} from "@databricks/sdk-experimental/dist/logging";
 import {ConfigureAutocomplete} from "./ConfigureAutocomplete";
 import {workspaceConfigs} from "../vscode-objs/WorkspaceConfigs";
+import {PackageManagerTelemetry} from "./PackageManagerTelemetry";
 
 export class EnvironmentDependenciesVerifier extends MultiStepAccessVerifier {
     private readonly logger = NamedLogger.getOrCreate(Loggers.Extension);
@@ -18,7 +19,8 @@ export class EnvironmentDependenciesVerifier extends MultiStepAccessVerifier {
         private readonly connectionManager: ConnectionManager,
         private readonly pythonExtension: MsPythonExtensionWrapper,
         private readonly installer: EnvironmentDependenciesInstaller,
-        private readonly configureAutocomplete: ConfigureAutocomplete
+        private readonly configureAutocomplete: ConfigureAutocomplete,
+        private readonly packageManagerTelemetry: PackageManagerTelemetry
     ) {
         super([
             "checkCluster",
@@ -404,6 +406,10 @@ export class EnvironmentDependenciesVerifier extends MultiStepAccessVerifier {
 
     override async check() {
         await this.connectionManager.waitForConnect();
+        // Emit package-manager detection only once connected (waitForConnect
+        // resolves on CONNECTED), so unauthenticated sessions are not reported.
+        // Deduplicated per session; never throws.
+        void this.packageManagerTelemetry.emitDetection("auto_open");
         await Promise.all([
             this.checkCluster(this.connectionManager.cluster),
             this.checkWorkspaceHasUc(),
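
The contract stated in the `check()` comment (emit only when connected, deduplicate per session, never throw) can be illustrated with a small sketch. `EmitGate`, its `tryClaim` method, and the free-standing `emitDetection` function are hypothetical names invented for this illustration; where the real collector stores its dedupe state is not shown in this diff, so the instance-level `Set` here is an assumption.

```typescript
type SetupTrigger = "auto_open" | "explicit_command" | "run" | "debug";

/** Hypothetical session-scoped gate: at most one emission per (trigger, projectRoot). */
class EmitGate {
    private readonly seen = new Set<string>();
    private readonly isConnected: () => boolean;

    constructor(isConnected: () => boolean) {
        this.isConnected = isConnected;
    }

    /** Returns true exactly once per key, and only while connected. */
    tryClaim(trigger: SetupTrigger, projectRoot: string | undefined): boolean {
        // Connectivity is checked before the key is recorded, so a
        // disconnected attempt does not consume the dedupe slot.
        if (projectRoot === undefined || !this.isConnected()) {
            return false;
        }
        const key = JSON.stringify([trigger, projectRoot]);
        if (this.seen.has(key)) {
            return false;
        }
        this.seen.add(key);
        return true;
    }
}

/** Best-effort wrapper: detection failures degrade to "unknown", and nothing escapes. */
async function emitDetection(
    gate: EmitGate,
    trigger: SetupTrigger,
    projectRoot: string | undefined,
    detect: (root: string) => Promise<string>,
    send: (trigger: SetupTrigger, primaryManager: string) => void
): Promise<void> {
    try {
        if (projectRoot === undefined || !gate.tryClaim(trigger, projectRoot)) {
            return;
        }
        let primary = "unknown";
        try {
            primary = await detect(projectRoot);
        } catch {
            // Detection is best-effort; keep "unknown".
        }
        send(trigger, primary);
    } catch {
        // Telemetry must never disrupt setup, so errors are swallowed.
    }
}
```

Because the wrapper never rejects, a caller can invoke it with `void` and move on, which matches how `check()` and `setup()` call `emitDetection` in this commit.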
Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
+import {expect} from "chai";
+import * as tmp from "tmp";
+import path from "node:path";
+import {writeFileSync} from "node:fs";
+import {Telemetry} from "../telemetry";
+import {MsPythonExtensionWrapper} from "./MsPythonExtensionWrapper";
+import {PackageManagerTelemetry, SetupTrigger} from "./PackageManagerTelemetry";
+
+type RecordedEvent = {
+    name: string;
+    props: Record<string, string>;
+    metrics: Record<string, number>;
+};
+
+/** A Telemetry backed by a fake reporter that captures sent events. */
+function makeTelemetry(level: "all" | "error" | "crash" | "off" = "all") {
+    const events: RecordedEvent[] = [];
+    const reporter = {
+        telemetryLevel: level,
+        sendTelemetryEvent: (
+            name: string,
+            props?: Record<string, string>,
+            metrics?: Record<string, number>
+        ) => {
+            events.push({name, props: props ?? {}, metrics: metrics ?? {}});
+        },
+        sendTelemetryErrorEvent: () => {},
+        sendDangerousTelemetryEvent: () => {},
+        sendDangerousTelemetryErrorEvent: () => {},
+        dispose: () => Promise.resolve(),
+    };
+    return {telemetry: new Telemetry(reporter as any), events};
+}
+
+describe(__filename, () => {
+    const cleanups: Array<() => void> = [];
+
+    afterEach(() => {
+        while (cleanups.length) {
+            cleanups.pop()!();
+        }
+    });
+
+    /**
+     * Create a throwaway project dir populated with the given files, passed as
+     * [name, contents] tuples (file names aren't valid identifiers, so a tuple
+     * list avoids object-literal key lint noise).
+     */
+    function makeProject(files: Array<[string, string]>): string {
+        const dir = tmp.dirSync({unsafeCleanup: true});
+        cleanups.push(dir.removeCallback);
+        for (const [name, contents] of files) {
+            writeFileSync(path.join(dir.name, name), contents);
+        }
+        return dir.name;
+    }
+
+    // Interpreter is irrelevant to these disk-signal tests; report none.
+    const noInterpreter = {
+        get pythonEnvironment() {
+            return Promise.resolve(undefined);
+        },
+    } as unknown as MsPythonExtensionWrapper;
+
+    function makePmt(
+        telemetry: Telemetry,
+        opts: {
+            projectRoot: string;
+            compute?: "cluster" | "serverless" | "none";
+            connected?: boolean;
+        }
+    ) {
+        return new PackageManagerTelemetry(
+            telemetry,
+            noInterpreter,
+            () => opts.projectRoot,
+            () => opts.compute ?? "none",
+            () => opts.connected ?? true
+        );
+    }
+
+    const emit = async (pmt: PackageManagerTelemetry, t: SetupTrigger) =>
+        pmt.emitDetection(t);
+
+    it("emits a detection event for a connected project (uv + pip)", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        const projectRoot = makeProject([
+            ["uv.lock", "version = 1\n"],
+            ["pyproject.toml", "[project]\nname='x'\n[tool.uv]\n"],
+            ["requirements-dev.txt", "requests\n"],
+        ]);
+        const pmt = makePmt(telemetry, {projectRoot, compute: "cluster"});
+
+        await emit(pmt, "explicit_command");
+
+        expect(events).to.have.length(1);
+        const e = events[0];
+        expect(e.name).to.equal("python_env.setup.detected");
+        expect(e.props["event.primaryManager"]).to.equal("uv");
+        expect(e.props["event.managersDetected"]).to.equal('["uv","pip"]');
+        expect(e.props["event.hasLockfile"]).to.equal("true");
+        expect(e.props["event.targetCompute"]).to.equal("cluster");
+        expect(e.props["event.setupTrigger"]).to.equal("explicit_command");
+        expect(e.props["event.interpreterSource"]).to.equal("unknown");
+    });
+
+    it("deduplicates per (trigger, projectRoot) within a session", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        const projectRoot = makeProject([["uv.lock", "version = 1\n"]]);
+        const pmt = makePmt(telemetry, {projectRoot});
+
+        await emit(pmt, "auto_open");
+        await emit(pmt, "auto_open");
+
+        expect(events).to.have.length(1);
+    });
+
+    it("does not emit while disconnected, and does not burn the dedupe slot", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        const projectRoot = makeProject([["uv.lock", "version = 1\n"]]);
+
+        const disconnected = makePmt(telemetry, {
+            projectRoot,
+            connected: false,
+        });
+        await emit(disconnected, "auto_open");
+        expect(events).to.have.length(0);
+
+        // A later connected emit for the same (trigger, project) still fires --
+        // i.e. the disconnected attempt did not consume the dedupe key.
+        const connected = makePmt(telemetry, {projectRoot, connected: true});
+        await emit(connected, "auto_open");
+        expect(events).to.have.length(1);
+    });
+
+    it("does not emit when telemetry is disabled", async () => {
+        const {telemetry, events} = makeTelemetry("error");
+        const projectRoot = makeProject([["uv.lock", "version = 1\n"]]);
+        const pmt = makePmt(telemetry, {projectRoot});
+
+        await emit(pmt, "auto_open");
+
+        expect(events).to.have.length(0);
+    });
+
+    it("reports unknown for a project with no recognizable signals", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        // `requirementsfoo.txt` (no separator) is NOT a requirements file, so
+        // pip must not be attributed.
+        const projectRoot = makeProject([["requirementsfoo.txt", "x\n"]]);
+        const pmt = makePmt(telemetry, {projectRoot});
+
+        await emit(pmt, "auto_open");
+
+        expect(events).to.have.length(1);
+        expect(events[0].props["event.managersDetected"]).to.equal("[]");
+        expect(events[0].props["event.primaryManager"]).to.equal("unknown");
+    });
+
+    it("attributes pip from a separator-suffixed requirements file", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        const projectRoot = makeProject([
+            ["requirements_test.txt", "pytest\n"],
+        ]);
+        const pmt = makePmt(telemetry, {projectRoot});
+
+        await emit(pmt, "auto_open");
+
+        expect(events[0].props["event.managersDetected"]).to.equal('["pip"]');
+        expect(events[0].props["event.primaryManager"]).to.equal("pip");
+    });
+
+    it("does not attribute pip for a tool-only pyproject", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        // Only linter config, no [project]/[build-system] -- not a pip signal.
+        const projectRoot = makeProject([
+            ["pyproject.toml", "[tool.ruff]\nline-length = 88\n"],
+        ]);
+        const pmt = makePmt(telemetry, {projectRoot});
+
+        await emit(pmt, "auto_open");
+
+        expect(events[0].props["event.managersDetected"]).to.equal("[]");
+        expect(events[0].props["event.primaryManager"]).to.equal("unknown");
+    });
+
+    it("attributes pip for a pyproject with [project] and no uv/poetry", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        const projectRoot = makeProject([
+            ["pyproject.toml", '[project]\nname = "x"\n'],
+        ]);
+        const pmt = makePmt(telemetry, {projectRoot});
+
+        await emit(pmt, "auto_open");
+
+        expect(events[0].props["event.managersDetected"]).to.equal('["pip"]');
+    });
+
+    it("omits pythonVersion from the event when the interpreter is unknown", async () => {
+        const {telemetry, events} = makeTelemetry("all");
+        const projectRoot = makeProject([["uv.lock", "version = 1\n"]]);
+        const pmt = makePmt(telemetry, {projectRoot});
+
+        await emit(pmt, "auto_open");
+
+        // The key must be absent, not the string "undefined".
+        expect(events[0].props).to.not.have.property("event.pythonVersion");
+    });
+});
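
Two expectations encoded by these tests are worth spelling out: requirements files need a separator after `requirements`, and `managersDetected` is ordered by the stated priority with the first entry as primary. The sketch below is one way to satisfy them. `isRequirementsFile`, `summarize`, and the accepted separator set (`-`, `_`, `.`) are assumptions for illustration; the classifier source is not part of this excerpt.

```typescript
type Manager = "uv" | "poetry" | "conda" | "pip";

// Priority as stated in the commit message: uv > poetry > conda > pip.
const PRIORITY: readonly Manager[] = ["uv", "poetry", "conda", "pip"];

/**
 * Assumed rule: "requirements.txt", or "requirements" followed by a separator
 * (-, _ or .) and any suffix, ending in ".txt".
 */
function isRequirementsFile(fileName: string): boolean {
    return /^requirements([-_.].*)?\.txt$/.test(fileName);
}

/** Orders detected managers by priority and picks the first as primary. */
function summarize(detected: ReadonlySet<Manager>): {
    managersDetected: Manager[];
    primaryManager: Manager | "unknown";
} {
    const managersDetected = PRIORITY.filter((m) => detected.has(m));
    const primaryManager =
        managersDetected.length > 0 ? managersDetected[0] : "unknown";
    return {managersDetected, primaryManager};
}
```

With these definitions, a project that fires both uv and pip signals serializes as `["uv","pip"]` with `uv` as primary, and an empty signal set yields `[]` with `unknown`, matching the assertions in the tests.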
