Architecture & Codebase Guide
Python Environment Tools (PET) is a high-performance tool written in Rust that discovers every Python installation on your computer — fast, without actually running Python. It exists because the VS Code Python extension needs to know about all available Python environments, and previously it had to spawn Python processes repeatedly (which is slow). PET replaces that with filesystem scanning and intelligent heuristics.
It runs as a JSON-RPC server over stdin/stdout so VS Code can talk to it continuously, asking "what Python environments exist?" and getting streaming results back in real-time.
┌─────────────────────────────────────────────────────┐
│ VS Code Python Extension │
│ (sends JSON-RPC requests) │
└──────────────────────┬──────────────────────────────┘
│ stdio
▼
┌─────────────────────────────────────────────────────┐
│ PET Binary (pet) │
│ CLI modes: find | resolve | server │
│ │
│ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ │
│ │ JSON-RPC │ │ Discovery │ │ Resolution │ │
│ │ Server │ │ Engine │ │ (spawn │ │
│ │ (jsonrpc) │ │ (find.rs) │ │ Python) │ │
│ └──────┬──────┘ └──────┬─────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Locator Chain (Priority Order) │ │
│ │ │ │
│ │ 1. Windows Store 9. Pipenv │ │
│ │ 2. Windows Registry 10. VirtualEnvWrapper │ │
│ │ 3. WinPython 11. Venv │ │
│ │ 4. PyEnv 12. VirtualEnv │ │
│ │ 5. Pixi 13. Homebrew │ │
│ │ 6. Conda 14. macOS (CLT/Xcode/Org)│ │
│ │ 7. UV 15. Linux Global │ │
│ │ 8. Poetry │ │
│ └──────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
When VS Code asks "find me all Python environments," PET runs four parallel phases:
1. Locator-specific discovery: each locator (Conda, PyEnv, Poetry, etc.) independently searches its own known locations:
   - Conda reads `~/.conda/environments.txt` and `.condarc` files
   - PyEnv scans `~/.pyenv/versions/`
   - Poetry checks `pypoetry/virtualenvs/` cache directories
   - Windows Registry reads `HKLM\Software\Python`
   - Homebrew checks `/opt/homebrew/bin/` and `/usr/local/bin/`
2. PATH variable scanning: scans every directory in the system PATH for Python executables.
3. Global virtualenv directories: searches well-known virtualenv storage locations such as `~/.virtualenvs`, `~/.local/share/virtualenvs`, and `WORKON_HOME`.
4. Workspace recursive search: recursively searches the currently open project folders, prioritizing `.venv`, `.conda`, and `.pixi/envs` directories.

All four phases run concurrently. As each environment is found, it's reported immediately to VS Code via JSON-RPC notification — no waiting for the full scan to finish.
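
The concurrent, streaming shape described here can be illustrated with a small sketch. It is not PET's code: `FoundEnv`, `discover`, the phase labels, and the hard-coded paths are all hypothetical, and each "phase" sends one placeholder result instead of scanning the filesystem. It assumes only the standard library (scoped threads and an `mpsc` channel).

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a discovered environment.
#[derive(Debug, Clone, PartialEq)]
pub struct FoundEnv {
    pub phase: &'static str,
    pub executable: PathBuf,
}

/// Runs four placeholder phases concurrently and hands each result to
/// `report` as soon as it arrives, rather than after every phase finishes.
pub fn discover(mut report: impl FnMut(FoundEnv)) {
    let (tx, rx) = mpsc::channel::<FoundEnv>();
    // Placeholder data: one made-up result per phase.
    let phases: [(&'static str, &'static str); 4] = [
        ("locators", "/opt/conda/bin/python"),
        ("path", "/usr/bin/python3"),
        ("global-virtualenvs", "/home/user/.virtualenvs/demo/bin/python"),
        ("workspace", "/work/project/.venv/bin/python"),
    ];

    thread::scope(|scope| {
        for (phase, exe) in phases {
            let tx = tx.clone();
            scope.spawn(move || {
                // A real phase would scan the filesystem here.
                tx.send(FoundEnv { phase, executable: PathBuf::from(exe) })
                    .unwrap();
            });
        }
        // Drop the original sender so the loop below ends once all phases are done.
        drop(tx);
        for env in rx {
            report(env);
        }
    });
}
```

Calling `discover(|env| println!("{:?}", env))` prints results in arrival order, which may differ between runs because the phases race.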
| Crate | Purpose |
|---|---|
| pet | Main binary — CLI entry point, JSON-RPC server, discovery orchestration, locator chain management |
| pet-core | Shared traits and types — the Locator trait, PythonEnvironment struct, Reporter trait, Configuration, caching abstractions, telemetry events |
| pet-fs | Filesystem utilities — path normalization (Windows case-insensitive), glob expansion, symlink resolution, tilde/env-var expansion |
| pet-python-utils | Python-specific utilities — spawning Python for info, dual-layer caching (memory + JSON files), version extraction from pyvenv.cfg and patchlevel.h headers, executable discovery |
| pet-jsonrpc | JSON-RPC 2.0 protocol implementation — message parsing, request/notification routing over stdin/stdout with Content-Length headers |
| pet-reporter | Reporter implementations — CacheReporter (deduplication), JsonRpcReporter (streaming to VS Code), StdioReporter (console output), CollectReporter (tests) |
The main entry point providing three modes:
- `pet find`: Lists environments in specified directories or workspace. Supports `--json` output, `--kind` filtering, and glob patterns in search paths.
- `pet resolve <executable>`: Deep-resolves a single Python executable by spawning it and returning full details.
- `pet server`: Starts the JSON-RPC server for continuous communication with VS Code.

Key source files:
- `crates/pet/src/main.rs`: CLI argument parsing via `clap`
- `crates/pet/src/lib.rs`: Library facade, tracing initialization, CLI wrappers
- `crates/pet/src/find.rs`: Multi-threaded discovery coordination with 4 parallel phases
- `crates/pet/src/jsonrpc.rs`: JSON-RPC server with request deduplication, generation-based staleness detection, and atomic missing-env reporting
- `crates/pet/src/locators.rs`: Ordered locator chain creation and unified identification logic
- `crates/pet/src/resolve.rs`: Single-executable resolution by spawning Python and merging results

Defines the fundamental abstractions used by every other crate:
- `Locator` trait: Core abstraction for environment discovery. Methods: `find()` (discovery), `try_from()` (identification), `configure()`, `supported_categories()`.
- `Reporter` trait: Async discovery reporting (`report_environment()`, `report_manager()`, `report_telemetry()`). `Send + Sync` for thread safety.
- `PythonEnvironment`: The main result struct with fields for executable, kind, version, prefix, manager, project, arch, symlinks, and error state. Includes a builder pattern for fluent construction.
- `PythonEnvironmentKind`: Enum of 21 environment types (Conda, Pixi, Homebrew, Pyenv, GlobalPaths, etc.).
- `EnvManager`: Represents discovered managers (Conda, Mamba, Pipenv, Poetry, Pyenv).
- `Configuration`: Runtime configuration passed to locators (workspace directories, executables, cache directory, tool paths).
- `LocatorCache<K, V>`: Thread-safe HashMap wrapper with RwLock and double-checked locking for concurrent read-heavy workloads.
- `PyVenvCfg`: Parser for `pyvenv.cfg` files to extract version, prompt, and environment metadata.
- `EnvironmentApi`: OS abstraction (home directory, PATH, env vars) with platform-specific implementations.
- Telemetry types: `RefreshPerformance`, `InaccuratePythonEnvironmentInfo`, `MissingCondaEnvironments`, `MissingPoetryEnvironments`.

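
To make the "RwLock with double-checked locking" idea behind `LocatorCache<K, V>` concrete, here is a minimal sketch. `SketchCache` and `get_or_insert_with` are hypothetical names chosen for illustration, and the real type's API may look quite different.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::RwLock;

/// Simplified, hypothetical cache; not the actual `LocatorCache` API.
pub struct SketchCache<K, V> {
    inner: RwLock<HashMap<K, V>>,
}

impl<K: Eq + Hash + Clone, V: Clone> SketchCache<K, V> {
    pub fn new() -> Self {
        SketchCache { inner: RwLock::new(HashMap::new()) }
    }

    /// Returns the cached value for `key`, computing it only when absent.
    pub fn get_or_insert_with(&self, key: &K, compute: impl FnOnce() -> V) -> V {
        // First check: shared read lock, the common fast path.
        {
            let map = self.inner.read().unwrap();
            if let Some(value) = map.get(key) {
                return value.clone();
            }
        }
        // Second check under the write lock: another thread may have inserted
        // the key between releasing the read lock and acquiring this one.
        let mut map = self.inner.write().unwrap();
        if let Some(value) = map.get(key) {
            return value.clone();
        }
        let value = compute();
        map.insert(key.clone(), value.clone());
        value
    }
}
```

Readers never block each other on the fast path; the write lock is taken only on a miss, which suits workloads where most lookups hit.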
Low-level path manipulation used throughout the project:
- `norm_case()`: On Windows, normalizes case via `GetLongPathNameW` without resolving junctions. On Unix, a no-op.
- `expand_glob_patterns()`: Brace expansion + glob matching (capped at 1024 expansions to prevent denial of service).
- `resolve_symlink()`: Filtered symlink resolution for Python/Conda executables only.
- `expand_path()`: Expands `~` and environment variables (`${HOME}`, `${USERNAME}`).
- `strip_trailing_separator()`: Removes trailing path separators while preserving roots.

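
As an illustration of the kind of expansion `expand_path()` performs, the sketch below expands a leading `~` and `${NAME}` placeholders. The function name `expand_path_sketch` and its signature are assumptions; the home directory and variable lookup are passed in so the example does not depend on the real environment, and it handles only `/` separators.

```rust
/// Hypothetical helper: expands a leading `~` and `${NAME}` placeholders.
/// `home` and `lookup` are injected instead of reading the process environment.
pub fn expand_path_sketch(
    input: &str,
    home: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> String {
    // Expand a leading tilde only when it is a whole path component.
    let mut text = if input == "~" {
        home.to_string()
    } else if let Some(rest) = input.strip_prefix("~/") {
        format!("{}/{}", home, rest)
    } else {
        input.to_string()
    };

    // Replace each `${NAME}`; unknown variables are copied through unchanged.
    let mut out = String::new();
    while let Some(start) = text.find("${") {
        let Some(len) = text[start..].find('}') else { break };
        let name = &text[start + 2..start + len];
        out.push_str(&text[..start]);
        match lookup(name) {
            Some(value) => out.push_str(&value),
            None => out.push_str(&text[start..=start + len]),
        }
        text = text[start + len + 1..].to_string();
    }
    out.push_str(&text);
    out
}
```

Leaving unknown variables untouched is a design choice of this sketch; a stricter version could return an error instead.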
High-level utilities for resolving and caching Python environment information:
- `ResolvedPythonEnv`: Represents a fully resolved Python environment (executable, prefix, version, architecture, symlinks). Created by spawning `python -c` with JSON output parsing.
- Dual-layer caching: In-memory `Arc<Mutex<>>` + JSON file backing store with SHA256 hash keys and mtime/ctime validation.
- Version extraction: From `pyvenv.cfg`, `patchlevel.h` header files, symlink creator tracing, and mtime heuristics.
- Executable discovery: `find_executable()`, `find_executables()`, `is_python_executable_name()`, broken symlink detection.
- Path filtering: `should_search_for_environments_in_path()` excludes 30+ directories (`node_modules`, `.git`, `__pycache__`, etc.).

JSON-RPC 2.0 server implementation:
- LSP-style Content-Length header protocol over stdin/stdout
- Handler registry for requests (with ID, expects response) and notifications (no ID)
- `send_message()`, `send_reply()`, `send_error()` for outbound communication
- Infinite server loop reading and routing messages

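
The LSP-style framing puts a `Content-Length` header and a blank line before each JSON body. The following sketch shows one way to write and read such frames using only the standard library; `write_message` and `read_message` are illustrative names, not functions from `pet-jsonrpc`.

```rust
use std::io::{self, BufRead, Read, Write};

/// Writes one message as `Content-Length: N\r\n\r\n<body>`.
pub fn write_message(out: &mut impl Write, body: &str) -> io::Result<()> {
    // The length is counted in bytes, not characters.
    write!(out, "Content-Length: {}\r\n\r\n{}", body.len(), body)?;
    out.flush()
}

/// Reads one framed message; returns `Ok(None)` on a clean end of input.
pub fn read_message(input: &mut impl BufRead) -> io::Result<Option<String>> {
    let mut content_length: Option<usize> = None;
    loop {
        let mut line = String::new();
        if input.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        let line = line.trim_end();
        if line.is_empty() {
            break; // A blank line ends the header section.
        }
        if let Some((name, value)) = line.split_once(':') {
            if name.eq_ignore_ascii_case("Content-Length") {
                content_length = value.trim().parse().ok();
            }
        }
    }
    let length = content_length
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length"))?;
    let mut body = vec![0u8; length];
    input.read_exact(&mut body)?;
    String::from_utf8(body)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

A server loop would call `read_message` on a locked, buffered stdin until it returns `Ok(None)`, parse each body as JSON, and route it to a request or notification handler.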
Multiple Reporter trait implementations:
- `CacheReporter`: Wraps any reporter with deduplication (tracks reported managers/environments by path).
- `JsonRpcReporter`: Sends discoveries as JSON-RPC notifications. Supports kind-based filtering.
- `StdioReporter`: Human-readable console output with statistics (used by CLI `find` command).
- `CollectReporter`: Accumulates results in vectors for testing.

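
The wrapping-with-deduplication pattern can be sketched as a decorator over a reduced reporter trait. Everything here (`SketchReporter`, `Collect`, `Dedup`) is hypothetical and much smaller than the real `Reporter` trait, which also reports managers and telemetry.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Hypothetical, trimmed-down reporter trait.
pub trait SketchReporter: Send + Sync {
    fn report_environment(&self, executable: &Path);
}

/// Test-style reporter that collects everything it is given.
#[derive(Default)]
pub struct Collect {
    pub seen: Mutex<Vec<PathBuf>>,
}

impl SketchReporter for Collect {
    fn report_environment(&self, executable: &Path) {
        self.seen.lock().unwrap().push(executable.to_path_buf());
    }
}

/// Wraps another reporter and forwards each path at most once.
pub struct Dedup<R: SketchReporter> {
    inner: R,
    reported: Mutex<HashSet<PathBuf>>,
}

impl<R: SketchReporter> Dedup<R> {
    pub fn new(inner: R) -> Self {
        Dedup { inner, reported: Mutex::new(HashSet::new()) }
    }

    pub fn inner(&self) -> &R {
        &self.inner
    }
}

impl<R: SketchReporter> SketchReporter for Dedup<R> {
    fn report_environment(&self, executable: &Path) {
        // `insert` returns false when the path was already recorded.
        let first_time = self.reported.lock().unwrap().insert(executable.to_path_buf());
        if first_time {
            self.inner.report_environment(executable);
        }
    }
}
```

Because the wrapper implements the same trait as the reporter it wraps, discovery code does not need to know whether deduplication is in place.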
Discovers Conda/Miniconda/Miniforge/Micromamba environments.
Discovery Strategy:
- Reads `~/.conda/environments.txt` for known environment paths
- Parses `.condarc` YAML files from 15+ system/user locations for `envs_dirs`
- Scans hardcoded known locations (platform-specific)
- For each conda install directory, enumerates `envs/` subdirectories

Key Techniques:
- Extracts Python version from `conda-meta/python-*.json` package files (no spawning)
- Traces conda installation from environment's `conda-meta/history` file (`# cmd:` lines)
- Detects Mamba/Micromamba managers alongside Conda
- Multi-threaded scanning with `rayon` for parallel environment processing
- WSL-aware: filters out incompatible `/mnt/<drive>/` paths on Linux subsystem
- Telemetry reports missing environments (found by spawning conda but not by filesystem parsing)

Files: lib.rs, conda_info.rs, conda_rc.rs, env_variables.rs, environment_locations.rs, environments.rs, manager.rs, package.rs, telemetry.rs, utils.rs
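
One way to get a Python version without spawning is to read it from the `conda-meta/python-*.json` file name, which follows conda's `<name>-<version>-<build>.json` layout. The sketch below takes that route; the function name is hypothetical, and PET may instead (or additionally) read the JSON body.

```rust
/// Extracts the version from a conda-meta file name such as
/// `python-3.11.4-h955ad1f_0.json` (layout: `<name>-<version>-<build>.json`).
/// Hypothetical helper for illustration only.
pub fn python_version_from_conda_meta(file_name: &str) -> Option<&str> {
    let stem = file_name.strip_suffix(".json")?;
    let rest = stem.strip_prefix("python-")?;
    // Split off the trailing build string; what remains should be the version.
    let (version, _build) = rest.rsplit_once('-')?;
    // Reject look-alike packages such as `python-dateutil-2.8.2-...`.
    if version.starts_with(|c: char| c.is_ascii_digit()) && !version.contains('-') {
        Some(version)
    } else {
        None
    }
}
```

The digit check matters because several unrelated packages also start with `python-`.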
Discovers PyEnv (Unix) and PyEnv-win (Windows) managed Python versions.
Discovery Strategy:
- Locates pyenv root via `PYENV_ROOT`/`PYENV` env vars or `~/.pyenv`
- Scans all subdirectories in `versions/`
- Distinguishes: regular installs vs. pyenv-virtualenv environments vs. conda-in-pyenv

Key Techniques:
- Extracts version from folder names using 4 regex patterns (stable, dev, beta, win32)
- Falls back to `patchlevel.h` header file parsing
- Detects `-win32` suffix for x86 architecture
- Delegates conda environments to the conda locator
- Caches manager info with self-hydrating cache pattern

Files: lib.rs, manager.rs, environments.rs, environment_locations.rs, env_variables.rs
Discovers Poetry project environments.
Discovery Strategy:
- Scans workspace directories for `pyproject.toml` files
- Generates project hash matching Poetry's algorithm: `SHA256(normalized_path)` → base64-URL, first 8 chars
- Lists environments from Poetry's `virtualenvs` cache directory
- Filters by hash prefix match or in-project `.venv`

Key Techniques:
- Reproduces Poetry's exact naming convention: `{sanitized-name}-{8-char-hash}-py{version}`
- Config priority cascade: local `poetry.toml` → env vars → global `config.toml`
- Fallback: spawns `poetry env list --full-path` for completeness
- Telemetry tracks discrepancies between file-based and spawn-based discovery

Files: lib.rs, manager.rs, config.rs, environment.rs, environment_locations.rs, environment_locations_spawn.rs, pyproject_toml.rs, env_variables.rs, telemetry.rs
Discovers Pipenv project environments.
Discovery Strategy (Three-Tier Fallback):
- Centralized directory detection: Checks if prefix is under a known virtualenv directory with `.project` file
- Naming pattern fallback: Validates directory name matches `{name}-{8-char-hash}` regex (handles corrupted `.project`)
- In-project detection: Walks up from executable looking for `Pipfile` in parent directory

Key Techniques:
- Searches 6+ centralized virtualenv directories (`WORKON_HOME`, `XDG_DATA_HOME`, `~/.virtualenvs`, etc.)
- Reads `.project` file for project directory linking
- Discovers Pipenv manager executable via PATH, pipx, AppData locations
- Robust to deleted projects: still identifies as pipenv even if Pipfile is gone

Files: lib.rs, manager.rs, env_variables.rs
Identifies Pixi package manager environments.
Identification Only (no active discovery):
- Detects `conda-meta/pixi` marker file in environment prefix
- Reuses `CondaPackageInfo` to extract Python version from conda metadata
- Minimal crate (~100 LOC): Pixi environments are typically found during workspace scanning

Files: lib.rs
Discovers UV-managed Python installations and workspace environments.
Dual Environment Types:
- `Uv`: Individual project environments or globally-managed Python installs
- `UvWorkspace`: Multi-project workspace environments with shared `.venv`

Discovery Strategy:
- Scans `{uv_install_dir}` for managed Python versions (dir pattern: `cpython-X.Y.Z-os-arch-libc`)
- Identifies virtual environments via `pyvenv.cfg` with `uv`, `version_info`, and `prompt` fields
- Validates workspace membership using glob pattern matching against `pyproject.toml` `[tool.uv.workspace]`

Key Techniques:
- Skips junction/symlink directories (minor-version aliases) to avoid duplicates
- Workspace member validation: evaluates `members` and `exclude` glob patterns with `require_literal_separator`
- Platform-aware install directory: `%APPDATA%/uv/python` (Windows), `$XDG_DATA_HOME/uv/python` (Unix)

Files: lib.rs
Identifies Python venv module environments.
- Detects `pyvenv.cfg` file at environment root
- Handles broken symlinks (reports error state) and missing executables
- No active discovery: relies on workspace/PATH scanning to find candidates
- Extracts version and prompt from `pyvenv.cfg`

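
Since `pyvenv.cfg` is a plain `key = value` file, version and prompt can be read without running Python. The sketch below parses the file's text into a map; `parse_pyvenv_cfg` and `version_of` are hypothetical helpers rather than the `PyVenvCfg` API, and the choice to fall back from `version` to `version_info` is an assumption based on the fields mentioned in this guide.

```rust
use std::collections::HashMap;

/// Parses `key = value` lines from the text of a `pyvenv.cfg` file.
/// Lines without `=` are ignored; keys are lowercased for lookup.
pub fn parse_pyvenv_cfg(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_lowercase(), value.trim().to_string()))
        .collect()
}

/// Picks the version from `version`, falling back to `version_info`
/// (assumption: some tools write only the latter).
pub fn version_of(cfg: &HashMap<String, String>) -> Option<&str> {
    cfg.get("version")
        .or_else(|| cfg.get("version_info"))
        .map(String::as_str)
}
```

In practice the caller would read the file with `std::fs::read_to_string` and treat a missing or unreadable file as "not a venv".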
Identifies legacy virtualenv package environments.
- Detects activation scripts (`activate`, `activate.bat`, `activate.ps1`, `activate.fish`) in `bin/`/`Scripts/`
- Excludes system paths (`/bin`, `/usr/bin`, `/usr/local/bin`) to prevent false positives
- Version extraction via symlink creator tracing

Identifies virtualenvwrapper-managed environments.
- Checks if environment is under `WORKON_HOME` directory (default: `~/.virtualenvs`)
- Reads `.project` file for project directory linking
- Platform-specific defaults: `%USERPROFILE%\Envs` (Windows), `$HOME/.virtualenvs` (Unix)
- Canonicalizes paths to handle symlinked `WORKON_HOME` directories

Aggregator that discovers virtual environments across 10+ well-known locations.
- Searches: `WORKON_HOME`, `XDG_DATA_HOME/virtualenvs`, `~/envs`, `~/.direnv`, `~/.venvs`, `~/.virtualenvs`, `~/.local/share/virtualenvs`, `VIRTUAL_ENV`
- Filters out conda environments (checks for `conda-meta` directory)
- Returns deduplicated, sorted list of discovered paths

Discovers Homebrew-installed Python.
- Scans `/opt/homebrew/bin` (Apple Silicon), `/usr/local/bin` (Intel), `/home/linuxbrew/.linuxbrew/bin` (Linux)
- Resolves through Cellar symlinks to find real executable location
- Builds comprehensive symlink lists across Framework, Cellar, and bin directories
- Verifies all candidate symlinks actually resolve to the target executable

Discovers Apple Command Line Tools Python.
- Checks `/Library/Developer/CommandLineTools/usr/bin` and Framework directories
- Resolves symlink chains from `/usr/bin/python3` → CLT Python
- Extracts version from `patchlevel.h` headers or spawns Python as fallback

Discovers python.org installer Python.
- Scans `/Library/Frameworks/Python.framework/Versions/`
- Handles `Current` symlink directory (skips during enumeration)
- Collects symlinks from `/usr/local/bin` and Framework directories

Discovers Xcode-bundled Python.
- Identifies executables under `/Applications/Xcode*.app/Contents/Developer/`
- Handles multiple Xcode installations with different version suffixes
- No active discovery: assumes Xcode Python is accessed via `/usr/bin/python3`

Discovers system Python installations.
- Searches `/bin`, `/usr/bin`, `/usr/local/bin` with canonicalization
- Requires spawning Python to get resolved info (unlike most locators)
- Tracks symlinks within the same directory to avoid duplicates
- Caches discovered environments to prevent reprocessing

Discovers registry-registered Python.
- Reads `HKLM\Software\Python` and `HKCU\Software\Python` hierarchies
- Extracts InstallPath, ExecutablePath, Version, SysArchitecture, DisplayName
- Filters out Windows Store entries and delegates conda environments to conda locator

Discovers Microsoft Store Python.
- Scans `%USERPROFILE%\AppData\Local\Microsoft\WindowsApps` for `PythonSoftwareFoundation.Python.*` directories
- Cross-references AppModel registry for display names and package locations
- Handles ambiguous `python.exe`/`python3.exe` symlinks (only one version gets them if multiple installed)

Discovers WinPython portable distributions.
- Detects `.winpython` or `winpython.ini` marker files
- Validates directory naming pattern (`WPy64-31300`, `WPy32-3900`)
- Searches user desktops, downloads, documents, and drive roots (C:, D:, E:)
- Extracts version from python folder names (`python-3.13.0.amd64`)

Processes the PATH environment variable for Python discovery.
- Normalizes paths: Unix resolves symlinks (merged-usr: `/bin` → `/usr/bin`), Windows normalizes case without resolving junctions (preserves Scoop paths)
- Filters out Windows Store `WindowsApps` paths (already handled by dedicated locator)
- Deduplicates via HashSet

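
The filtering and deduplication steps can be sketched with the standard library's PATH splitting. `unique_path_entries` is a hypothetical name, and the sketch deliberately leaves out the symlink and case normalization described in the first bullet.

```rust
use std::collections::HashSet;
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Splits a PATH-style value into unique directories, preserving first-seen order.
/// Hypothetical helper; normalization of symlinks and case is not shown.
pub fn unique_path_entries(path_var: &OsStr) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    env::split_paths(path_var)
        // Empty entries can appear when PATH contains doubled separators.
        .filter(|dir| !dir.as_os_str().is_empty())
        // Skip the Windows Store shim directory, which has its own locator.
        .filter(|dir| !dir.components().any(|c| c.as_os_str() == "WindowsApps"))
        // `insert` returns false for directories that were already seen.
        .filter(|dir| seen.insert(dir.clone()))
        .collect()
}
```

A typical call would pass `std::env::var_os("PATH")` when it is set. Keeping first-seen order matters because PATH order decides which `python` a shell would pick.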
Post-discovery verification and accuracy reporting.
- Compares discovered environment info against resolved (spawned) Python data
- Validates 5 fields: executable, symlinks, prefix, architecture, version
- Canonicalizes paths to handle venv symlinks and junctions
- Reports
InaccuratePythonEnvironmentInfoevents for telemetry
The communication between VS Code and PET follows this flow:
- `configure` (must be first): Sets workspace directories, known tool paths (conda, pipenv, poetry), environment directories, and cache directory. Supports glob patterns.
- `refresh`: Triggers environment discovery. Can be scoped:
  - By `searchKind`: Filter to a specific environment type (e.g., Conda only)
  - By `searchPaths`: Search specific directories/executables with glob support
- `resolve`: Deep-resolves a single Python executable by spawning it. Caches results.
- `find`: Identifies environments in a specific path without full refresh.
- `clear`: Clears the cache directory.

PET sends these notifications back to VS Code:
- `environment`: Discovered Python environment (streamed as found)
- `manager`: Discovered environment manager (Conda, Poetry, PyEnv, etc.)
- `log`: Log messages (info, warning, error, debug, trace)

The server's `RefreshCoordinator` prevents redundant work:
- Identical concurrent requests join the same discovery run (same options + config generation)
- Different requests serialize via condition variable and wait their turn
- A generation counter on the configuration ensures stale discoveries don't produce results
- A `GenerationGuardedReporter` discards notifications from outdated refresh runs

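
The generation-guard idea can be sketched with an atomic counter: a reporter remembers the generation it started under and drops anything reported after the configuration has moved on. `Generation` and `GuardedSink` are hypothetical types for illustration and do not mirror the real `GenerationGuardedReporter`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Shared configuration generation; bumped whenever the configuration changes.
#[derive(Default)]
pub struct Generation(AtomicU64);

impl Generation {
    pub fn current(&self) -> u64 {
        self.0.load(Ordering::SeqCst)
    }

    pub fn bump(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst) + 1
    }
}

/// Forwards notifications only while its refresh run is still current.
pub struct GuardedSink {
    generation: Arc<Generation>,
    started_at: u64,
    delivered: Mutex<Vec<String>>,
}

impl GuardedSink {
    pub fn new(generation: Arc<Generation>) -> Self {
        let started_at = generation.current();
        GuardedSink { generation, started_at, delivered: Mutex::new(Vec::new()) }
    }

    /// Returns true if the notification was delivered, false if discarded as stale.
    pub fn report(&self, env: &str) -> bool {
        if self.generation.current() != self.started_at {
            return false;
        }
        self.delivered.lock().unwrap().push(env.to_string());
        true
    }

    pub fn delivered(&self) -> Vec<String> {
        self.delivered.lock().unwrap().clone()
    }
}
```

The check is cheap enough to run on every notification, so a long-running scan that started under an old configuration simply goes quiet instead of needing to be cancelled.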
- No Process Spawning: PET reads filesystem metadata (config files, `pyvenv.cfg`, `conda-meta/*.json`, header files) instead of running `python --version`. It falls back to spawning only as a last resort.
- Report Immediately: Results stream to VS Code as found, not batched at the end. The `Reporter` trait decouples discovery from delivery.
- Complete Information in One Pass: Each locator gathers all available details (version, prefix, manager, symlinks) in a single filesystem scan rather than incrementally.
- Locator Priority Order: More specific locators run first (e.g., Windows Store before Windows Registry, Poetry before generic Venv). First match wins via short-circuit evaluation.
- Thread Safety: All locators, reporters, and caches use `Arc<Mutex<>>`/`Arc<RwLock<>>` for safe concurrent access across discovery threads. The `Locator` trait uses `&self` (not `&mut self`) with interior mutability to enable sharing via `Arc<dyn Locator>`.
- Cross-Platform: Conditional compilation (`#[cfg(windows)]`, `#[cfg(unix)]`, `#[cfg(target_os = "macos")]`) handles platform differences throughout.
- Symlink Preservation: User-facing paths are preserved as-is (not resolved to canonical form). The shortest executable path is selected for user-friendliness.

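
The priority-order and thread-safety principles combine naturally: locators take `&self`, live behind `Arc<dyn …>`, and are tried in order until one claims the executable. The sketch below uses a reduced, hypothetical trait (`SketchLocator` with `try_identify`, loosely modeled on `try_from()`), and its matching rules are placeholders rather than PET's real identification logic.

```rust
use std::path::Path;
use std::sync::Arc;

/// Hypothetical, reduced form of the locator abstraction.
pub trait SketchLocator: Send + Sync {
    fn kind(&self) -> &'static str;
    /// Returns true when this locator recognizes the executable.
    fn try_identify(&self, executable: &Path) -> bool;
}

pub struct PoetrySketch;
pub struct VenvSketch;

impl SketchLocator for PoetrySketch {
    fn kind(&self) -> &'static str {
        "Poetry"
    }
    fn try_identify(&self, executable: &Path) -> bool {
        // Placeholder rule: the path passes through a `pypoetry` directory.
        executable.components().any(|c| c.as_os_str() == "pypoetry")
    }
}

impl SketchLocator for VenvSketch {
    fn kind(&self) -> &'static str {
        "Venv"
    }
    fn try_identify(&self, executable: &Path) -> bool {
        // Placeholder rule; real identification would inspect pyvenv.cfg.
        executable.components().any(|c| c.as_os_str() == "bin")
    }
}

/// Walks the chain in priority order; `find` stops at the first match.
pub fn identify(chain: &[Arc<dyn SketchLocator>], executable: &Path) -> Option<&'static str> {
    chain
        .iter()
        .find(|locator| locator.try_identify(executable))
        .map(|locator| locator.kind())
}
```

With Poetry placed before Venv, a Poetry environment is never misreported as a generic venv even though both rules would match it.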
```bash
# Standard build
cargo build
# Release build (optimized: LTO, single codegen unit, opt-level 3)
cargo build --release
# Run JSON-RPC server
./target/debug/pet server
```

```bash
# Run all tests
cargo test --all
# Run with CI feature flags
cargo test --features ci
cargo test --features ci-poetry-global
cargo test --features ci-jupyter-container
cargo test --features ci-homebrew-container
# Run a specific test
cargo test <TESTNAME>
# Run tests for a specific package
cargo test -p pet-conda
```

```bash
# Format all code
cargo fmt --all
# Run clippy with warnings as errors
cargo clippy --all -- -D warnings
```

| Pipeline | Trigger | Signing | Purpose |
|---|---|---|---|
| playground.yml | Manual | No | Ad-hoc builds for experimentation |
| pre-release.yml | Every commit to main + weekly schedule | Yes (Microsoft) | Nightly builds with version suffix dev.{buildId} |
| stable.yml | Manual from release/* branches | Yes (Microsoft) | Production releases |

All pipelines build for 8 platforms: Linux x64/ARM64/ARMv7, macOS x64/ARM64, Windows x64/ARM64.
The primary integration test validates every discovered environment through six independent methods:
- Spawn Python → Verify `sys.prefix` and `sys.version` match discovery
- `locator.try_from(executable)` → Same info retrieved without full scan
- Symlink identification → Known symlinks correctly identify the environment
- `resolve()` method → Spawned details consistent with discovery
- JSON-RPC `find()` → JSON output consistent
- Field-by-field comparison → Version, prefix, architecture, manager all match

Different test suites run in specialized CI environments:
- `ci`: General CI runner (Windows, Linux, macOS)
- `ci-homebrew-container`: Linuxbrew container
- `ci-jupyter-container`: Codespaces Jupyter environment
- `ci-poetry-*`: Poetry configurations (global, project, custom)
- `ci-perf`: Performance benchmarks

Measures P50/P95/P99 percentiles across 10 iterations for:
- Server startup (spawn + configure)
- Full machine refresh
- Workspace-scoped refresh
- Kind-specific refresh
- Cold vs. warm resolution (cache effectiveness)
- Concurrent resolve performance
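
For readers unfamiliar with percentile reporting, the sketch below computes P50/P95/P99 with the nearest-rank method. The method and the `percentile` function are assumptions; the guide does not say which definition the benchmarks use.

```rust
use std::time::Duration;

/// Nearest-rank percentile over a sample set; `p` is in the range 0..=100.
/// Hypothetical helper; the benchmark suite may define percentiles differently.
pub fn percentile(samples: &[Duration], p: u32) -> Option<Duration> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: ceil(p / 100 * n), clamped to at least rank 1.
    let rank = (p as usize * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}
```

Under this definition, ten iterations give P95 and P99 the same value as the slowest run, so those two figures are best read as worst-case timings at this sample size.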
Configuration (from VS Code)
│
▼
┌─ Locator Chain ──────────────────────────────┐
│ │
│ For each locator (in priority order): │
│ find() → scans known filesystem locations │
│ try_from() → identifies individual envs │
│ │
│ Parallel phases: │
│ Phase 1: Locator-specific discovery │
│ Phase 2: PATH variable scanning │
│ Phase 3: Global virtualenv directories │
│ Phase 4: Workspace recursive search │
│ │
└──────────────┬───────────────────────────────┘
│
▼
Reporter (streaming)
│
┌──────────┼──────────┐
│ │ │
▼ ▼ ▼
JSON-RPC Console Collector
(VS Code) (CLI) (Tests)
pet-fs (standalone)
└─→ pet-python-utils (path normalization, symlink resolution)
└─→ pet-env-var-path (PATH processing)
└─→ pet-global-virtualenvs (path expansion)
pet-core (standalone)
└─→ All locator crates (Locator trait, PythonEnvironment, Reporter)
pet-jsonrpc (standalone)
└─→ pet-reporter (JSON-RPC notification sending)
pet-conda
└─→ pet-pyenv (conda-in-pyenv delegation)
└─→ pet-pixi (conda metadata reuse)
└─→ pet-windows-registry (conda env filtering)
pet-virtualenv
└─→ pet-venv (virtualenv detection reuse)
└─→ pet-virtualenvwrapper (is_virtualenv check)
└─→ pet-pipenv (virtualenv validation)
pet (main binary)
└─→ All crates (orchestration)