
Architecture & Codebase Guide

Stella Huang edited this page Apr 6, 2026 · 1 revision

Python Environment Tools (PET) — Architecture & Codebase Guide

What Is This Project?

Python Environment Tools (PET) is a high-performance tool written in Rust that discovers the Python installations on your computer quickly and, in most cases, without running Python. It exists because the VS Code Python extension needs to know about all available Python environments, and it previously had to spawn Python processes repeatedly to find them, which is slow. PET replaces most of that work with filesystem scanning and heuristics, spawning Python only as a fallback.

It runs as a JSON-RPC server over stdin/stdout so VS Code can talk to it continuously, asking "what Python environments exist?" and getting streaming results back in real-time.


The Big Picture Architecture

┌─────────────────────────────────────────────────────┐
│              VS Code Python Extension               │
│              (sends JSON-RPC requests)              │
└──────────────────────┬──────────────────────────────┘
                       │ stdio
                       ▼
┌─────────────────────────────────────────────────────┐
│                  PET Binary (pet)                   │
│  CLI modes: find | resolve | server                 │
│                                                     │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐     │
│  │ JSON-RPC   │  │ Discovery  │  │ Resolution │     │
│  │ Server     │  │ Engine     │  │ (spawn     │     │
│  │ (jsonrpc)  │  │ (find.rs)  │  │  Python)   │     │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘     │
│        │               │               │            │
│        ▼               ▼               ▼            │
│ ┌─────────────────────────────────────────────────┐ │
│ │         Locator Chain (Priority Order)          │ │
│ │                                                 │ │
│ │  1. Windows Store      9. Pipenv                │ │
│ │  2. Windows Registry  10. VirtualEnvWrapper     │ │
│ │  3. WinPython         11. Venv                  │ │
│ │  4. PyEnv             12. VirtualEnv            │ │
│ │  5. Pixi              13. Homebrew              │ │
│ │  6. Conda             14. macOS (CLT/Xcode/Org) │ │
│ │  7. UV                15. Linux Global          │ │
│ │  8. Poetry                                      │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘

How Discovery Works (The Core Algorithm)

When VS Code asks "find me all Python environments," PET runs four parallel phases:

Phase 1: Locator-Specific Discovery

Each locator (Conda, PyEnv, Poetry, etc.) independently searches its own known locations:

  • Conda reads ~/.conda/environments.txt and .condarc files
  • PyEnv scans ~/.pyenv/versions/
  • Poetry checks pypoetry/virtualenvs/ cache directories
  • Windows Registry reads HKLM\Software\Python and HKCU\Software\Python
  • Homebrew checks /opt/homebrew/bin/ and /usr/local/bin/ (and /home/linuxbrew/.linuxbrew/bin/ on Linux)

Phase 2: PATH Variable Scan

Scans every directory in the system PATH for Python executables.

Phase 3: Global Virtual Environment Directories

Searches well-known virtualenv storage locations: ~/.virtualenvs, ~/.local/share/virtualenvs, WORKON_HOME, etc.

Phase 4: Workspace Search

Recursively searches the currently open project folders, prioritizing .venv, .conda, and .pixi/envs directories.

All four phases run concurrently. As each environment is found, it's reported immediately to VS Code via JSON-RPC notification — no waiting for the full scan to finish.
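A minimal sketch of the concurrency pattern described above, using only the standard library: each phase runs on a scoped thread and sends results through a channel, and the caller's callback sees each environment as soon as it is sent. `FoundEnv` and `run_phases` are hypothetical names, and the phases receive precomputed candidate lists instead of scanning the filesystem; this is not PET's find.rs.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a discovered environment; PET's real result
/// type is `PythonEnvironment` from pet-core.
#[derive(Debug, Clone, PartialEq)]
pub struct FoundEnv {
    pub phase: &'static str,
    pub path: String,
}

/// Runs every phase on its own scoped thread and hands each result to
/// `on_found` as soon as it arrives, instead of waiting for all phases.
pub fn run_phases<F>(phases: Vec<(&'static str, Vec<String>)>, mut on_found: F)
where
    F: FnMut(FoundEnv),
{
    let (tx, rx) = mpsc::channel::<FoundEnv>();
    thread::scope(|s| {
        for (phase, candidates) in phases {
            let tx = tx.clone();
            s.spawn(move || {
                // A real phase would walk the filesystem; here candidates are given.
                for path in candidates {
                    // The receiver outlives every phase thread, so send should not fail.
                    let _ = tx.send(FoundEnv { phase, path });
                }
            });
        }
        // Drop the original sender so `rx` ends once every phase thread is done.
        drop(tx);
        for env in rx {
            on_found(env);
        }
    });
}
```

Arrival order across phases is nondeterministic, which is why a consumer such as VS Code has to treat the notifications as an unordered stream.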


The 26 Crates (Modules) Explained

Core Infrastructure (6 crates)

| Crate | Purpose |
|---|---|
| pet | Main binary: CLI entry point, JSON-RPC server, discovery orchestration, locator chain management |
| pet-core | Shared traits and types: the Locator trait, PythonEnvironment struct, Reporter trait, Configuration, caching abstractions, telemetry events |
| pet-fs | Filesystem utilities: path normalization (Windows case-insensitive), glob expansion, symlink resolution, tilde/env-var expansion |
| pet-python-utils | Python-specific utilities: spawning Python for info, dual-layer caching (memory + JSON files), version extraction from pyvenv.cfg and patchlevel.h headers, executable discovery |
| pet-jsonrpc | JSON-RPC 2.0 protocol implementation: message parsing, request/notification routing over stdin/stdout with Content-Length headers |
| pet-reporter | Reporter implementations: CacheReporter (deduplication), JsonRpcReporter (streaming to VS Code), StdioReporter (console output), CollectReporter (tests) |

pet (Main Binary)

The main entry point providing three modes:

  • pet find — Lists environments in specified directories or workspace. Supports --json output, --kind filtering, and glob patterns in search paths.
  • pet resolve <executable> — Deep-resolves a single Python executable by spawning it and returning full details.
  • pet server — Starts the JSON-RPC server for continuous communication with VS Code.

Key source files:

  • crates/pet/src/main.rs — CLI argument parsing via clap
  • crates/pet/src/lib.rs — Library facade, tracing initialization, CLI wrappers
  • crates/pet/src/find.rs — Multi-threaded discovery coordination with 4 parallel phases
  • crates/pet/src/jsonrpc.rs — JSON-RPC server with request deduplication, generation-based staleness detection, and atomic missing-env reporting
  • crates/pet/src/locators.rs — Ordered locator chain creation and unified identification logic
  • crates/pet/src/resolve.rs — Single-executable resolution by spawning Python and merging results

pet-core (Shared Foundation)

Defines the fundamental abstractions used by every other crate:

  • Locator trait — Core abstraction for environment discovery. Methods: find() (discovery), try_from() (identification), configure(), supported_categories().
  • Reporter trait — Async discovery reporting (report_environment(), report_manager(), report_telemetry()). Send + Sync for thread safety.
  • PythonEnvironment — The main result struct with fields for executable, kind, version, prefix, manager, project, arch, symlinks, and error state. Includes a builder pattern for fluent construction.
  • PythonEnvironmentKind — Enum of 21 environment types (Conda, Pixi, Homebrew, Pyenv, GlobalPaths, etc.).
  • EnvManager — Represents discovered managers (Conda, Mamba, Pipenv, Poetry, Pyenv).
  • Configuration — Runtime configuration passed to locators (workspace directories, executables, cache directory, tool paths).
  • LocatorCache<K, V> — Thread-safe HashMap wrapper with RwLock and double-checked locking for concurrent read-heavy workloads.
  • PyVenvCfg — Parser for pyvenv.cfg files to extract version, prompt, and environment metadata.
  • EnvironmentApi — OS abstraction (home directory, PATH, env vars) with platform-specific implementations.
  • Telemetry types: RefreshPerformance, InaccuratePythonEnvironmentInfo, MissingCondaEnvironments, MissingPoetryEnvironments.
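LocatorCache<K, V> is described as a RwLock-guarded HashMap with double-checked locking. The sketch below shows that pattern with the standard library only; `LocatorCacheSketch` and `get_or_insert_with` are illustrative names, not pet-core's API.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::RwLock;

/// Read-mostly cache in the spirit of `LocatorCache<K, V>` (hypothetical API).
pub struct LocatorCacheSketch<K, V> {
    map: RwLock<HashMap<K, V>>,
}

impl<K: Eq + Hash, V: Clone> LocatorCacheSketch<K, V> {
    pub fn new() -> Self {
        Self { map: RwLock::new(HashMap::new()) }
    }

    pub fn get(&self, key: &K) -> Option<V> {
        self.map.read().unwrap().get(key).cloned()
    }

    /// Returns the cached value, computing it at most once per key.
    pub fn get_or_insert_with<F: FnOnce() -> V>(&self, key: K, compute: F) -> V {
        // First check under the shared read lock: the fast path for hits.
        // The read guard is dropped at the end of this `if let`.
        if let Some(v) = self.map.read().unwrap().get(&key) {
            return v.clone();
        }
        // Second check under the write lock: another thread may have inserted
        // the value between releasing the read lock and acquiring this one.
        let mut map = self.map.write().unwrap();
        if let Some(v) = map.get(&key) {
            return v.clone();
        }
        let v = compute();
        map.insert(key, v.clone());
        v
    }
}
```

The second check matters because two threads can both miss under the read lock; only the first to take the write lock computes the value. Note that `compute` runs while the write lock is held, so a slow computation blocks other readers.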

pet-fs (Filesystem Layer)

Low-level path manipulation used throughout the project:

  • norm_case() — Windows: normalizes case via GetLongPathNameW without resolving junctions. Unix: no-op.
  • expand_glob_patterns() — Brace expansion + glob matching (capped at 1024 expansions to prevent denial of service).
  • resolve_symlink() — Filtered symlink resolution for Python/Conda executables only.
  • expand_path() — Expands ~ and environment variables (${HOME}, ${USERNAME}).
  • strip_trailing_separator() — Removes trailing path separators while preserving roots.
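A sketch of the tilde and ${VAR} expansion that expand_path() is described as doing. To keep it testable, the home directory and the variable lookup are passed in as parameters (a hypothetical signature); the real function presumably reads them from the environment, and its edge-case handling may differ.

```rust
/// Expands a leading `~` and every `${NAME}` occurrence (hypothetical API).
pub fn expand_path_sketch(
    input: &str,
    home: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> String {
    // Expand `~` only when it is the whole path or is followed by a separator,
    // so `~user/...` is left alone.
    let expanded = match input.strip_prefix('~') {
        Some(tail) if tail.is_empty() || tail.starts_with('/') || tail.starts_with('\\') => {
            format!("{home}{tail}")
        }
        _ => input.to_string(),
    };

    // Replace each `${NAME}` with its value; unknown names stay as written.
    let mut out = String::with_capacity(expanded.len());
    let mut rest = expanded.as_str();
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match lookup(name) {
                    Some(value) => out.push_str(&value),
                    // Copy `${NAME}` back verbatim.
                    None => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated `${`: copy the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```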

pet-python-utils (Python Utilities)

High-level utilities for resolving and caching Python environment information:

  • ResolvedPythonEnv — Represents a fully resolved Python environment (executable, prefix, version, architecture, symlinks). Created by spawning python -c with JSON output parsing.
  • Dual-layer caching — In-memory Arc<Mutex<>> + JSON file backing store with SHA256 hash keys and mtime/ctime validation.
  • Version extraction — From pyvenv.cfg, patchlevel.h header files, symlink creator tracing, and mtime heuristics.
  • Executable discovery: find_executable(), find_executables(), is_python_executable_name(), broken symlink detection.
  • Path filtering: should_search_for_environments_in_path() excludes 30+ directories (node_modules, .git, __pycache__, etc.).
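The directory filter can be pictured as a name check applied before descending into each directory during the recursive workspace search. The list below is a hypothetical subset of the 30+ names, and `should_search_sketch` is an illustrative name; the authoritative list lives in pet-python-utils.

```rust
use std::path::Path;

/// Hypothetical subset of the directory names skipped during the search.
const SKIPPED_DIR_NAMES: &[&str] = &[
    "node_modules", ".git", "__pycache__", ".hg", ".svn", ".mypy_cache", ".pytest_cache",
];

/// Returns false when the directory's own name is on the skip list, so the
/// caller never descends into it (the check runs at each level of the walk).
pub fn should_search_sketch(dir: &Path) -> bool {
    match dir.file_name().and_then(|name| name.to_str()) {
        Some(name) => !SKIPPED_DIR_NAMES.contains(&name),
        None => true,
    }
}
```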

pet-jsonrpc (Protocol Layer)

JSON-RPC 2.0 server implementation:

  • LSP-style Content-Length header protocol over stdin/stdout
  • Handler registry for requests (with ID, expects response) and notifications (no ID)
  • send_message(), send_reply(), send_error() for outbound communication
  • Infinite server loop reading and routing messages
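The Content-Length framing is the same as in LSP: a header block terminated by a blank line, followed by exactly Content-Length bytes of JSON. A standard-library sketch of writing and reading one frame (the function names are ours, not pet-jsonrpc's):

```rust
use std::io::{self, BufRead, Read, Write};

/// Writes one frame. Content-Length counts bytes, not characters.
pub fn write_message<W: Write>(out: &mut W, json: &str) -> io::Result<()> {
    write!(out, "Content-Length: {}\r\n\r\n{}", json.len(), json)?;
    out.flush()
}

/// Reads one frame. Returns Ok(None) on a clean EOF before any header.
pub fn read_message<R: BufRead>(input: &mut R) -> io::Result<Option<String>> {
    let mut content_length: Option<usize> = None;
    loop {
        let mut line = String::new();
        if input.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
        if line.is_empty() {
            break; // Blank line: end of headers.
        }
        if let Some(value) = line.strip_prefix("Content-Length:") {
            let n = value
                .trim()
                .parse::<usize>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            content_length = Some(n);
        }
        // Other headers (e.g. Content-Type) are ignored.
    }
    let len = content_length
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length"))?;
    let mut body = vec![0u8; len];
    input.read_exact(&mut body)?;
    String::from_utf8(body)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

A server loop would call read_message until it returns None and dispatch on the message's method field.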

pet-reporter (Reporting Layer)

Multiple Reporter trait implementations:

  • CacheReporter — Wraps any reporter with deduplication (tracks reported managers/environments by path).
  • JsonRpcReporter — Sends discoveries as JSON-RPC notifications. Supports kind-based filtering.
  • StdioReporter — Human-readable console output with statistics (used by CLI find command).
  • CollectReporter — Accumulates results in vectors for testing.
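CacheReporter is a decorator: it wraps another reporter and forwards only first-time reports. A sketch, assuming a simplified one-method trait (`ReporterSketch` and the other names are hypothetical; pet-core's Reporter also reports managers and telemetry):

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Simplified stand-in for pet-core's Reporter trait (hypothetical signature).
pub trait ReporterSketch: Send + Sync {
    fn report_environment(&self, executable: &Path);
}

/// Collects every report, in the spirit of CollectReporter.
pub struct CollectSketch(pub Mutex<Vec<PathBuf>>);

impl ReporterSketch for CollectSketch {
    fn report_environment(&self, executable: &Path) {
        self.0.lock().unwrap().push(executable.to_path_buf());
    }
}

/// Wraps another reporter and forwards each executable path only once.
pub struct CacheReporterSketch<R: ReporterSketch> {
    pub inner: R,
    seen: Mutex<HashSet<PathBuf>>,
}

impl<R: ReporterSketch> CacheReporterSketch<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, seen: Mutex::new(HashSet::new()) }
    }
}

impl<R: ReporterSketch> ReporterSketch for CacheReporterSketch<R> {
    fn report_environment(&self, executable: &Path) {
        // `insert` returns false if the path was already present; the lock is
        // released at the end of this statement, before forwarding.
        let first_time = self.seen.lock().unwrap().insert(executable.to_path_buf());
        if first_time {
            self.inner.report_environment(executable);
        }
    }
}
```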

Package Manager Locators (5 crates)

pet-conda

Discovers Conda/Miniconda/Miniforge/Micromamba environments.

Discovery Strategy:

  1. Reads ~/.conda/environments.txt for known environment paths
  2. Parses .condarc YAML files from 15+ system/user locations for envs_dirs
  3. Scans hardcoded known locations (platform-specific)
  4. For each conda install directory, enumerates envs/ subdirectories

Key Techniques:

  • Extracts Python version from conda-meta/python-*.json package files (no spawning)
  • Traces conda installation from environment's conda-meta/history file (# cmd: lines)
  • Detects Mamba/Micromamba managers alongside Conda
  • Multi-threaded scanning with rayon for parallel environment processing
  • WSL-aware: under Windows Subsystem for Linux, filters out Windows-drive /mnt/<drive>/ paths that the Linux side cannot use
  • Telemetry reports missing environments (found by spawning conda but not by filesystem parsing)

Files: lib.rs, conda_info.rs, conda_rc.rs, env_variables.rs, environment_locations.rs, environments.rs, manager.rs, package.rs, telemetry.rs, utils.rs
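Both conda-meta techniques above work on plain text, so they need no Python process. A sketch, assuming the usual conda-meta layout (package records named name-version-build.json, and a history file whose `# cmd:` lines begin with the conda executable path). The function names are hypothetical, and paths containing spaces are not handled.

```rust
/// Extracts the Python version from a package record name such as
/// `python-3.11.5-h955ad1f_0.json`.
pub fn python_version_from_conda_meta(file_name: &str) -> Option<String> {
    let stem = file_name.strip_suffix(".json")?;
    let rest = stem.strip_prefix("python-")?;
    let version = rest.split('-').next()?;
    // Reject packages such as `python-dateutil-...` whose next token is not numeric.
    if version.starts_with(|c: char| c.is_ascii_digit()) {
        Some(version.to_string())
    } else {
        None
    }
}

/// Returns the executable recorded on the first `# cmd:` line of a
/// conda-meta/history file.
pub fn conda_exe_from_history(history: &str) -> Option<String> {
    history
        .lines()
        .find_map(|line| line.trim().strip_prefix("# cmd:"))
        .and_then(|cmd| cmd.split_whitespace().next())
        .map(str::to_string)
}
```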

pet-pyenv

Discovers PyEnv (Unix) and PyEnv-win (Windows) managed Python versions.

Discovery Strategy:

  1. Locates pyenv root via PYENV_ROOT/PYENV env vars or ~/.pyenv
  2. Scans all subdirectories in versions/
  3. Distinguishes: regular installs vs. pyenv-virtualenv environments vs. conda-in-pyenv

Key Techniques:

  • Extracts version from folder names using 4 regex patterns (stable, dev, beta, win32)
  • Falls back to patchlevel.h header file parsing
  • Detects -win32 suffix for x86 architecture
  • Delegates conda environments to the conda locator
  • Caches manager info with self-hydrating cache pattern

Files: lib.rs, manager.rs, environments.rs, environment_locations.rs, env_variables.rs
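A hand-rolled approximation of the pyenv folder-name parsing. PET uses four regex patterns; this sketch uses plain string checks so it needs no external crate, and the accepted shapes (3.12.1, 3.13-dev, 3.13.0b1, 3.12.1-win32) are our reading of the pattern names, not PET's exact rules. `PyenvFolder` and `parse_pyenv_folder` are hypothetical.

```rust
/// Information recovered from a `versions/` folder name.
#[derive(Debug, PartialEq)]
pub struct PyenvFolder {
    pub version: String,
    pub is_x86: bool,
}

pub fn parse_pyenv_folder(name: &str) -> Option<PyenvFolder> {
    // A `-win32` suffix (pyenv-win) marks a 32-bit x86 build.
    let (base, is_x86) = match name.strip_suffix("-win32") {
        Some(b) => (b, true),
        None => (name, false),
    };
    let core = base.strip_suffix("-dev").unwrap_or(base);
    // Must start with a digit and contain only digits, dots and pre-release
    // letters (a, b, rc); this rejects names like `miniconda3-latest`.
    let valid = core.starts_with(|c: char| c.is_ascii_digit())
        && core
            .chars()
            .all(|c| c.is_ascii_digit() || matches!(c, '.' | 'a' | 'b' | 'r' | 'c'));
    if !valid {
        return None;
    }
    Some(PyenvFolder { version: base.to_string(), is_x86 })
}
```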

pet-poetry

Discovers Poetry project environments.

Discovery Strategy:

  1. Scans workspace directories for pyproject.toml files
  2. Generates project hash matching Poetry's algorithm: SHA256(normalized_path) → base64-URL, first 8 chars
  3. Lists environments from Poetry's virtualenvs cache directory
  4. Filters by hash prefix match or in-project .venv

Key Techniques:

  • Reproduces Poetry's exact naming convention: {sanitized-name}-{8-char-hash}-py{version}
  • Config priority cascade: local poetry.toml → env vars → global config.toml
  • Fallback: spawns poetry env list --full-path for completeness
  • Telemetry tracks discrepancies between file-based and spawn-based discovery

Files: lib.rs, manager.rs, config.rs, environment.rs, environment_locations.rs, environment_locations_spawn.rs, pyproject_toml.rs, env_variables.rs, telemetry.rs
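Poetry's naming scheme can be sketched as below. The SHA-256 digest of the normalized project path is taken as an input (computing it needs a hashing crate), and only its first 6 bytes matter, because 8 base64 characters encode exactly 48 bits. The sanitization step (lowercase, replace a few shell-unsafe characters with `_`, truncate to 42 characters) is an assumption based on Poetry's Python implementation and should be checked against the Poetry version you target. Function names are hypothetical.

```rust
const URL_SAFE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// First 8 characters of the URL-safe base64 encoding of `digest`.
pub fn first_8_base64url(digest: &[u8; 32]) -> String {
    digest[..6]
        .chunks(3)
        .flat_map(|c| {
            // Pack 3 bytes into 24 bits, then emit four 6-bit symbols.
            let n = (u32::from(c[0]) << 16) | (u32::from(c[1]) << 8) | u32::from(c[2]);
            [18u32, 12, 6, 0].map(|shift| URL_SAFE[((n >> shift) & 0x3f) as usize] as char)
        })
        .collect()
}

/// Builds `{sanitized-name}-{8-char-hash}-py{X.Y}`.
pub fn poetry_env_name(project: &str, digest: &[u8; 32], py_version: &str) -> String {
    let sanitized: String = project
        .to_lowercase()
        .chars()
        .map(|c| if " $`!*@\"\\\r\n\t".contains(c) { '_' } else { c })
        .take(42)
        .collect();
    format!("{sanitized}-{}-py{py_version}", first_8_base64url(digest))
}
```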

pet-pipenv

Discovers Pipenv project environments.

Discovery Strategy — Three-Tier Fallback:

  1. Centralized directory detection — Checks if prefix is under a known virtualenv directory with .project file
  2. Naming pattern fallback — Validates directory name matches {name}-{8-char-hash} regex (handles corrupted .project)
  3. In-project detection — Walks up from executable looking for Pipfile in parent directory

Key Techniques:

  • Searches 6+ centralized virtualenv directories (WORKON_HOME, XDG_DATA_HOME, ~/.virtualenvs, etc.)
  • Reads .project file for project directory linking
  • Discovers Pipenv manager executable via PATH, pipx, AppData locations
  • Robust to deleted projects — still identifies as pipenv even if Pipfile is gone

Files: lib.rs, manager.rs, env_variables.rs
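The naming-pattern fallback amounts to a shape check on the directory name. A regex-free sketch, assuming the hash is 8 characters from the URL-safe base64 alphabet; the exact regex in PET may differ, and the function name is hypothetical.

```rust
/// True if `dir_name` looks like `{name}-{8-char-hash}`.
pub fn looks_like_pipenv_env_name(dir_name: &str) -> bool {
    let bytes = dir_name.as_bytes();
    // At least one name character, the '-' separator, and 8 hash characters.
    if bytes.len() < 10 {
        return false;
    }
    let (head, hash) = bytes.split_at(bytes.len() - 8);
    head.ends_with(b"-")
        && hash
            .iter()
            .all(|b| b.is_ascii_alphanumeric() || *b == b'_' || *b == b'-')
}
```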

pet-pixi

Identifies Pixi package manager environments.

Identification Only (no active discovery):

  • Detects conda-meta/pixi marker file in environment prefix
  • Reuses CondaPackageInfo to extract Python version from conda metadata
  • Minimal crate (~100 LOC) — Pixi environments are typically found during workspace scanning

Files: lib.rs


Modern Tool Locators (1 crate)

pet-uv

Discovers UV-managed Python installations and workspace environments.

Dual Environment Types:

  • Uv — Individual project environments or globally-managed Python installs
  • UvWorkspace — Multi-project workspace environments with shared .venv

Discovery Strategy:

  1. Scans {uv_install_dir} for managed Python versions (dir pattern: cpython-X.Y.Z-os-arch-libc)
  2. Identifies virtual environments via pyvenv.cfg with uv, version_info, and prompt fields
  3. Validates workspace membership using glob pattern matching against pyproject.toml[tool.uv.workspace]

Key Techniques:

  • Skips junction/symlink directories (minor-version aliases) to avoid duplicates
  • Workspace member validation: evaluates members and exclude glob patterns with require_literal_separator
  • Platform-aware install directory: %APPDATA%/uv/python (Windows), $XDG_DATA_HOME/uv/python (Unix)

Files: lib.rs
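The managed-install directory names follow the cpython-X.Y.Z-os-arch-libc shape described above, so they can be split on `-` without regexes. A sketch with hypothetical type and function names; it does not detect the junction/symlink alias directories, which PET skips separately.

```rust
/// Parts of a uv-managed install directory name.
#[derive(Debug, PartialEq)]
pub struct UvInstallName<'a> {
    pub implementation: &'a str,
    pub version: &'a str,
    pub os: &'a str,
    pub arch: &'a str,
    pub libc: &'a str,
}

pub fn parse_uv_install_dir(name: &str) -> Option<UvInstallName<'_>> {
    // splitn(5, ..) keeps any extra '-' inside the last field.
    let mut parts = name.splitn(5, '-');
    let implementation = parts.next()?;
    let version = parts.next()?;
    let os = parts.next()?;
    let arch = parts.next()?;
    let libc = parts.next()?;
    if !version.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    Some(UvInstallName { implementation, version, os, arch, libc })
}
```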


Virtual Environment Locators (4 crates)

pet-venv

Identifies Python venv module environments.

  • Detects pyvenv.cfg file at environment root
  • Handles broken symlinks (reports error state) and missing executables
  • No active discovery — relies on workspace/PATH scanning to find candidates
  • Extracts version and prompt from pyvenv.cfg
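pyvenv.cfg is a flat `key = value` file, which is why version and prompt can be read without spawning Python. A minimal sketch; it assumes the stdlib venv module writes `version` while uv and virtualenv write `version_info`, and it is not pet-core's PyVenvCfg.

```rust
use std::collections::HashMap;

/// Parses one `key = value` pair per line; other lines are ignored.
pub fn parse_pyvenv_cfg(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.trim().to_lowercase(), v.trim().to_string()))
        .collect()
}

/// Prefers `version` and falls back to `version_info`.
pub fn pyvenv_version(cfg: &HashMap<String, String>) -> Option<&str> {
    cfg.get("version")
        .or_else(|| cfg.get("version_info"))
        .map(String::as_str)
}
```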

pet-virtualenv

Identifies legacy virtualenv package environments.

  • Detects activation scripts (activate, activate.bat, activate.ps1, activate.fish) in bin/ or Scripts/
  • Excludes system paths (/bin, /usr/bin, /usr/local/bin) to prevent false positives
  • Version extraction via symlink creator tracing

pet-virtualenvwrapper

Identifies virtualenvwrapper-managed environments.

  • Checks if environment is under WORKON_HOME directory (default: ~/.virtualenvs)
  • Reads .project file for project directory linking
  • Platform-specific defaults: %USERPROFILE%\Envs (Windows), $HOME/.virtualenvs (Unix)
  • Canonicalizes paths to handle symlinked WORKON_HOME directories

pet-global-virtualenvs

Aggregator that discovers virtual environments across 10+ well-known locations.

  • Searches locations including WORKON_HOME, XDG_DATA_HOME/virtualenvs, ~/envs, ~/.direnv, ~/.venvs, ~/.virtualenvs, ~/.local/share/virtualenvs, and VIRTUAL_ENV
  • Filters out conda environments (checks for conda-meta directory)
  • Returns deduplicated, sorted list of discovered paths

Platform-Specific Locators (6 crates)

pet-homebrew (macOS/Linux)

Discovers Homebrew-installed Python.

  • Scans /opt/homebrew/bin (Apple Silicon), /usr/local/bin (Intel), /home/linuxbrew/.linuxbrew/bin (Linux)
  • Resolves through Cellar symlinks to find real executable location
  • Builds comprehensive symlink lists across Framework, Cellar, and bin directories
  • Verifies all candidate symlinks actually resolve to the target executable

pet-mac-commandlinetools (macOS)

Discovers Apple Command Line Tools Python.

  • Checks /Library/Developer/CommandLineTools/usr/bin and Framework directories
  • Resolves symlink chains from /usr/bin/python3 → CLT Python
  • Extracts version from patchlevel.h headers or spawns Python as fallback

pet-mac-python-org (macOS)

Discovers python.org installer Python.

  • Scans /Library/Frameworks/Python.framework/Versions/
  • Handles Current symlink directory (skips during enumeration)
  • Collects symlinks from /usr/local/bin and Framework directories

pet-mac-xcode (macOS)

Discovers Xcode-bundled Python.

  • Identifies executables under /Applications/Xcode*.app/Contents/Developer/
  • Handles multiple Xcode installations with different version suffixes
  • No active discovery — assumes Xcode Python is accessed via /usr/bin/python3

pet-linux-global-python (Linux)

Discovers system Python installations.

  • Searches /bin, /usr/bin, /usr/local/bin with canonicalization
  • Requires spawning Python to get resolved info (unlike most locators)
  • Tracks symlinks within the same directory to avoid duplicates
  • Caches discovered environments to prevent reprocessing

pet-windows-registry (Windows)

Discovers registry-registered Python.

  • Reads HKLM\Software\Python and HKCU\Software\Python hierarchies
  • Extracts InstallPath, ExecutablePath, Version, SysArchitecture, DisplayName
  • Filters out Windows Store entries and delegates conda environments to conda locator

Windows-Only Locators (2 crates)

pet-windows-store (Windows)

Discovers Microsoft Store Python.

  • Scans %USERPROFILE%\AppData\Local\Microsoft\WindowsApps for PythonSoftwareFoundation.Python.* directories
  • Cross-references AppModel registry for display names and package locations
  • Handles ambiguous python.exe/python3.exe symlinks: when several versions are installed, only one of them receives these links

pet-winpython (Windows)

Discovers WinPython portable distributions.

  • Detects .winpython or winpython.ini marker files
  • Validates directory naming pattern (WPy64-31300, WPy32-3900)
  • Searches user desktops, downloads, documents, and drive roots (C:, D:, E:)
  • Extracts version from python folder names (python-3.13.0.amd64)

Utility Crates (2 crates)

pet-env-var-path

Processes the PATH environment variable for Python discovery.

  • Normalizes paths: on Unix, resolves symlinks (merged-usr: /bin → /usr/bin); on Windows, normalizes case without resolving junctions (preserving Scoop paths)
  • Filters out Windows Store WindowsApps paths (already handled by dedicated locator)
  • Deduplicates via HashSet
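The PATH handling can be sketched with `std::env::split_paths`, which applies the platform's separator (`:` on Unix, `;` on Windows). This version omits the symlink resolution and case normalization described above and keeps first-seen order; the function name is hypothetical.

```rust
use std::collections::HashSet;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Splits a PATH-style value, drops empty entries and WindowsApps
/// directories, and removes duplicates while preserving order.
pub fn python_search_dirs(path_var: &OsStr) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    std::env::split_paths(path_var)
        .filter(|p| !p.as_os_str().is_empty())
        // WindowsApps is left to the dedicated Windows Store locator.
        .filter(|p| !p.components().any(|c| c.as_os_str().eq_ignore_ascii_case("WindowsApps")))
        .filter(|p| seen.insert(p.clone()))
        .collect()
}
```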

pet-telemetry

Post-discovery verification and accuracy reporting.

  • Compares discovered environment info against resolved (spawned) Python data
  • Validates 5 fields: executable, symlinks, prefix, architecture, version
  • Canonicalizes paths to handle venv symlinks and junctions
  • Reports InaccuratePythonEnvironmentInfo events for telemetry

The JSON-RPC Protocol

The communication between VS Code and PET follows this flow:

Requests (Client → Server)

  1. configure (must be first) — Sets workspace directories, known tool paths (conda, pipenv, poetry), environment directories, and cache directory. Supports glob patterns.

  2. refresh — Triggers environment discovery. Can be scoped:

    • By searchKind — Filter to a specific environment type (e.g., Conda only)
    • By searchPaths — Search specific directories/executables with glob support
  3. resolve — Deep-resolves a single Python executable by spawning it. Caches results.

  4. find — Identifies environments in a specific path without full refresh.

  5. clear — Clears the cache directory.

Notifications (Server → Client)

  • environment — Discovered Python environment (streamed as found)
  • manager — Discovered environment manager (Conda, Poetry, PyEnv, etc.)
  • log — Log messages (info, warning, error, debug, trace)

Refresh Deduplication

The server uses a RefreshCoordinator to avoid redundant work:

  • Identical concurrent requests (same options and same configuration generation) join the same discovery run
  • Requests with different options are serialized with a condition variable and wait their turn
  • A generation counter on the configuration ensures stale discoveries don't produce results
  • A GenerationGuardedReporter discards notifications from outdated refresh runs
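The generation check can be pictured as follows: each refresh records the configuration generation it started under, and its reporter drops notifications once a newer configure request has bumped the counter. Names are hypothetical, and a `Vec<String>` stands in for the JSON-RPC reporter; this is not the actual GenerationGuardedReporter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Shared configuration generation; a configure request bumps it.
pub struct ConfigGeneration(AtomicU64);

impl ConfigGeneration {
    pub fn new() -> Self {
        Self(AtomicU64::new(0))
    }
    pub fn current(&self) -> u64 {
        self.0.load(Ordering::SeqCst)
    }
    pub fn bump(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst) + 1
    }
}

/// Forwards reports only while its starting generation is still current.
pub struct GenerationGuardSketch {
    generation: Arc<ConfigGeneration>,
    started_at: u64,
    sink: Mutex<Vec<String>>,
}

impl GenerationGuardSketch {
    pub fn new(generation: Arc<ConfigGeneration>) -> Self {
        let started_at = generation.current();
        Self { generation, started_at, sink: Mutex::new(Vec::new()) }
    }

    /// Returns false when the report was discarded as stale.
    pub fn report(&self, env: &str) -> bool {
        if self.generation.current() != self.started_at {
            return false;
        }
        self.sink.lock().unwrap().push(env.to_string());
        true
    }
}
```

A bump that lands between the check and the push can still let one result through; whether that matters depends on how the real coordinator orders configure and refresh.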

Key Design Principles

  1. Avoid Process Spawning: PET reads filesystem metadata (config files, pyvenv.cfg, conda-meta/*.json, header files) instead of running python --version, and spawns Python only as a last resort or where no metadata is available (for example, the Linux global-Python locator).

  2. Report Immediately — Results stream to VS Code as found, not batched at the end. The Reporter trait decouples discovery from delivery.

  3. Complete Information in One Pass — Each locator gathers all available details (version, prefix, manager, symlinks) in a single filesystem scan rather than incrementally.

  4. Locator Priority Order — More specific locators run first (e.g., Windows Store before Windows Registry, Poetry before generic Venv). First match wins via short-circuit evaluation.

  5. Thread Safety — All locators, reporters, and caches use Arc<Mutex<>> / Arc<RwLock<>> for safe concurrent access across discovery threads. The Locator trait uses &self (not &mut self) with interior mutability to enable sharing via Arc<dyn Locator>.

  6. Cross-Platform — Conditional compilation (#[cfg(windows)], #[cfg(unix)], #[cfg(target_os = "macos")]) handles platform differences throughout.

  7. Symlink Preservation — User-facing paths are preserved as-is (not resolved to canonical form). The shortest executable path is selected for user-friendliness.
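Principles 4 and 5 combine as follows: locators take &self, so a chain can be stored as `Vec<Arc<dyn ...>>`, and identification walks it in priority order, stopping at the first locator that claims the environment. A sketch with two toy locators; all names and matching rules are hypothetical.

```rust
use std::sync::Arc;

/// Simplified identification input (PET's real type carries more fields).
pub struct Candidate<'a> {
    pub prefix: &'a str,
    pub has_pyvenv_cfg: bool,
}

/// `&self` methods let one locator instance be shared across threads.
pub trait LocatorSketch: Send + Sync {
    /// Returns Some(kind) if this locator claims the environment.
    fn try_from(&self, env: &Candidate<'_>) -> Option<&'static str>;
}

pub struct PoetrySketch;
impl LocatorSketch for PoetrySketch {
    fn try_from(&self, env: &Candidate<'_>) -> Option<&'static str> {
        // Toy rule: Poetry's cache directory appears in the prefix.
        env.prefix.contains("pypoetry").then_some("Poetry")
    }
}

pub struct VenvSketch;
impl LocatorSketch for VenvSketch {
    fn try_from(&self, env: &Candidate<'_>) -> Option<&'static str> {
        env.has_pyvenv_cfg.then_some("Venv")
    }
}

/// `find_map` short-circuits at the first locator that returns Some.
pub fn identify(chain: &[Arc<dyn LocatorSketch>], env: &Candidate<'_>) -> Option<&'static str> {
    chain.iter().find_map(|locator| locator.try_from(env))
}
```

A Poetry environment also has a pyvenv.cfg, so putting the generic Venv locator first would misclassify it; that is the reason for running specific locators before generic ones.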


Build, Test, and CI/CD

Building

```shell
# Standard build
cargo build

# Release build (optimized: LTO, single codegen unit, opt-level 3)
cargo build --release

# Run JSON-RPC server
./target/debug/pet server
```

Testing

```shell
# Run all tests
cargo test --all

# Run with CI feature flags
cargo test --features ci
cargo test --features ci-poetry-global
cargo test --features ci-jupyter-container
cargo test --features ci-homebrew-container

# Run a specific test
cargo test <TESTNAME>

# Run tests for a specific package
cargo test -p pet-conda
```

Required Before Committing

```shell
# Format all code
cargo fmt --all

# Run clippy with warnings as errors
cargo clippy --all -- -D warnings
```

CI Pipelines (Azure DevOps)

| Pipeline | Trigger | Signing | Purpose |
|---|---|---|---|
| playground.yml | Manual | No | Ad-hoc builds for experimentation |
| pre-release.yml | Every commit to main + weekly schedule | Yes (Microsoft) | Nightly builds with version suffix dev.{buildId} |
| stable.yml | Manual from release/* branches | Yes (Microsoft) | Production releases |

All pipelines build for Linux (x64, ARM64, ARMv7), macOS (x64, ARM64), and Windows (x64, ARM64).


Test Strategy

6-Layer Verification Pattern (ci_test.rs)

The primary integration test validates every discovered environment through six independent methods:

  1. Spawn Python → Verify sys.prefix and sys.version match discovery
  2. locator.try_from(executable) → Same info retrieved without full scan
  3. Symlink identification → Known symlinks correctly identify the environment
  4. resolve() method → Spawned details consistent with discovery
  5. JSON-RPC find() → JSON output consistent
  6. Field-by-field comparison → Version, prefix, architecture, manager all match

Feature-Gated Container Tests

Different test suites run in specialized CI environments:

  • ci — General CI runner (Windows, Linux, macOS)
  • ci-homebrew-container — Linuxbrew container
  • ci-jupyter-container — Codespaces Jupyter environment
  • ci-poetry-* — Poetry configurations (global, project, custom)
  • ci-perf — Performance benchmarks

Performance Testing (e2e_performance.rs)

Measures P50/P95/P99 percentiles across 10 iterations for:

  • Server startup (spawn + configure)
  • Full machine refresh
  • Workspace-scoped refresh
  • Kind-specific refresh
  • Cold vs. warm resolution (cache effectiveness)
  • Concurrent resolve performance
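The wiki does not state which percentile definition e2e_performance.rs uses. As an illustration only, here is the nearest-rank method applied to a list of timings (the function name is hypothetical):

```rust
/// Nearest-rank percentile of `samples` (e.g. milliseconds); p is in 0..=100.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Rank = ceil(p/100 * n), at least 1, used as a 1-based index.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil().max(1.0) as usize;
    Some(sorted[rank - 1])
}
```

With 10 iterations, P95 and P99 both land on the slowest sample under this definition, so those two figures coincide until more iterations are run.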

Data Flow Summary

Configuration (from VS Code)
        │
        ▼
┌─ Locator Chain ──────────────────────────────┐
│                                              │
│  For each locator (in priority order):       │
│    find() → scans known filesystem locations │
│    try_from() → identifies individual envs   │
│                                              │
│  Parallel phases:                            │
│    Phase 1: Locator-specific discovery       │
│    Phase 2: PATH variable scanning           │
│    Phase 3: Global virtualenv directories    │
│    Phase 4: Workspace recursive search       │
│                                              │
└──────────────┬───────────────────────────────┘
               │
               ▼
        Reporter (streaming)
               │
    ┌──────────┼──────────┐
    │          │          │
    ▼          ▼          ▼
 JSON-RPC   Console   Collector
 (VS Code)  (CLI)     (Tests)

Crate Dependency Graph

pet-fs (standalone)
  └─→ pet-python-utils (path normalization, symlink resolution)
  └─→ pet-env-var-path (PATH processing)
  └─→ pet-global-virtualenvs (path expansion)

pet-core (standalone)
  └─→ All locator crates (Locator trait, PythonEnvironment, Reporter)

pet-jsonrpc (standalone)
  └─→ pet-reporter (JSON-RPC notification sending)

pet-conda
  └─→ pet-pyenv (conda-in-pyenv delegation)
  └─→ pet-pixi (conda metadata reuse)
  └─→ pet-windows-registry (conda env filtering)

pet-virtualenv
  └─→ pet-venv (virtualenv detection reuse)
  └─→ pet-virtualenvwrapper (is_virtualenv check)
  └─→ pet-pipenv (virtualenv validation)

pet (main binary)
  └─→ All crates (orchestration)
