
perf: reduce tracker cold-start and concurrent measurement overhead #1246

Open
davidberenstein1957 wants to merge 17 commits into master from davidberenstein1957/codecarbon-api-speed-test

davidberenstein1957 commented Jun 17, 2026

Collaborator

Summary

This PR reduces CodeCarbon measurement launch latency and improves concurrent-run throughput while preserving existing behavior. Changes focus on deferring work until it is needed, caching hardware detection within a process, and slimming import paths.

Performance results — cold launch (offline Mac ARM)

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Tracker `__init__` | ~15.7 s | ~94 ms | ~99% faster |
| `start()` (cold) | ~1.0 s | ~194 ms | ~81% faster |
| First sample (cold) | ~18.2 s | ~288 ms | ~98% faster (~63×) |
| Warm lifecycle (init+start+stop, same process) | ~62 ms | ~6 ms | ~10× |
| CLI monitor subprocess overhead (`codecarbon monitor … sleep 2`) | ~1.5 s | ~900–1000 ms | ~35–40% faster |

Cold-path numbers measure the first tracker in a fresh process; warm-path numbers reuse the cached hardware detection within the same process.
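For readers checking the Improvement column, the arithmetic can be reproduced from the before and after timings. This is a small sketch; the helper names `percent_faster` and `speedup` are illustrative and not part of CodeCarbon.

```python
def percent_faster(before_s: float, after_s: float) -> float:
    """Share of wall-clock time removed, relative to the baseline, in percent."""
    return (before_s - after_s) / before_s * 100.0


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new timing is than the baseline."""
    return before_s / after_s


# Timings copied from the cold-launch table above, in seconds.
init_gain = percent_faster(15.7, 0.094)
first_sample_speedup = speedup(18.2, 0.288)

print(f"Tracker __init__: {init_gain:.1f}% faster")
print(f"First sample: {first_sample_speedup:.0f}x")
```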

Performance results — run throughput (offline, warm, same process)

Repeated OfflineEmissionsTracker(output_methods=[]) lifecycles (init → start → stop) in one Python process:

| Mode | Before | After | Improvement |
| --- | --- | --- | --- |
| Sequential runs / min | ~926 | ~2,200 | ~2.4× |
| Parallel runs / min (8 threads) | ~7,268 | ~12,300 | ~1.7× |
| Warm run latency (p50) | ~62 ms | ~6 ms | ~10× |

Before = master baseline (2026-06-17); after = hardware cache + warm lifecycle optimizations. Parallel benchmark: 8 worker threads, 15 s sustained load.
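The benchmark scripts themselves are not part of this PR, so the following is only a rough sketch of how a runs-per-minute figure of this kind could be measured. `runs_per_minute` and `fake_lifecycle` are hypothetical names; in a real measurement `fake_lifecycle` would be replaced by one tracker init → start → stop cycle.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def runs_per_minute(
    lifecycle: Callable[[], None], duration_s: float, workers: int = 1
) -> float:
    """Call `lifecycle` repeatedly for about `duration_s` seconds on
    `workers` threads and return the completed runs per minute."""
    started = time.perf_counter()
    deadline = started + duration_s

    def worker() -> int:
        count = 0
        while time.perf_counter() < deadline:
            lifecycle()
            count += 1
        return count

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        total = sum(f.result() for f in futures)

    elapsed = time.perf_counter() - started
    return total / elapsed * 60.0


def fake_lifecycle() -> None:
    """Stand-in for one init -> start -> stop cycle (assumption: ~1 ms)."""
    time.sleep(0.001)


sequential = runs_per_minute(fake_lifecycle, duration_s=0.2)
parallel = runs_per_minute(fake_lifecycle, duration_s=0.2, workers=8)
print(f"sequential: {sequential:.0f} runs/min")
print(f"parallel (8 threads): {parallel:.0f} runs/min")
```

The durations here are kept short so the sketch finishes quickly; the figures above were taken under 15 s of sustained load.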

What changed

Tracker lifecycle

  • Lazy-import heavy modules; defer hardware probing, geo validation, and emissions engine until first use
  • Skip 1 Hz power monitor when output_methods=[]; skip redundant measurement on stop() when a sample was just taken
  • Prime cpu_percent globally, once per process
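The deferral and skip-on-stop ideas in this list can be illustrated with a minimal sketch. `DeferredTracker`, `_ensure_hardware`, and `min_sample_gap_s` are hypothetical names chosen for the example; they are not the PR's actual implementation.

```python
import time
from typing import Optional


class DeferredTracker:
    """Hypothetical tracker: cheap __init__, expensive work on first use."""

    def __init__(self, min_sample_gap_s: float = 1.0) -> None:
        # Nothing expensive happens here; hardware is probed lazily.
        self._hardware: Optional[dict] = None
        self._last_sample_at: Optional[float] = None
        self._min_gap = min_sample_gap_s
        self.samples = 0

    def _ensure_hardware(self) -> dict:
        if self._hardware is None:
            # Placeholder for the expensive hardware probe.
            self._hardware = {"cpu": "example-cpu"}
        return self._hardware

    def _measure(self) -> None:
        self._ensure_hardware()
        self.samples += 1
        self._last_sample_at = time.monotonic()

    def start(self) -> None:
        self._measure()

    def stop(self) -> None:
        # Skip the final measurement if a sample was taken moments ago.
        recent = (
            self._last_sample_at is not None
            and time.monotonic() - self._last_sample_at < self._min_gap
        )
        if not recent:
            self._measure()


tracker = DeferredTracker()
hardware_before_start = tracker._hardware  # still None: nothing probed yet
tracker.start()
tracker.stop()  # called right after start(), so no second sample is taken
```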

Hardware detection

  • Process-level hardware setup cache (hardware_cache.py) — CPU/GPU/RAM detection reused across instances
  • Platform-aware CPU backend order; cached GPU/CPU/PowerMetrics probes via @lru_cache
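As a rough illustration of caching a probe with `functools.lru_cache`, the sketch below uses a hypothetical `detect_cpu_model` probe and a hypothetical `clear_probe_cache` helper; the PR's real probe functions and cache-clearing helpers may look different.

```python
from functools import lru_cache

probe_calls = 0  # counts how often the expensive probe really runs


@lru_cache(maxsize=None)
def detect_cpu_model() -> str:
    """Hypothetical expensive probe; the result is cached for the process."""
    global probe_calls
    probe_calls += 1
    return "example-cpu"


def clear_probe_cache() -> None:
    """Reset the cached result, for example between tests."""
    detect_cpu_model.cache_clear()


detect_cpu_model()
detect_cpu_model()  # served from the cache, the probe does not run again
calls_before_clear = probe_calls

clear_probe_cache()
detect_cpu_model()  # the probe runs again after the cache is cleared
calls_after_clear = probe_calls
```

Using `lru_cache` on a zero-argument function gives a once-per-process result plus a built-in `cache_clear()`, which is what makes it a convenient replacement for hand-rolled module globals.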

API write path

  • Lazy POST /runs — deferred until first emission upload (create_run_automatically=False + _ensure_api_run)
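A minimal sketch of the deferred run creation follows. Only the name `_ensure_api_run` comes from the description above; its body, `FakeApiClient`, `LazyApiOutput`, and the recorded call strings are assumptions made for illustration.

```python
from typing import List, Optional


class FakeApiClient:
    """Hypothetical stand-in for an HTTP client; it records calls only."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def create_run(self) -> str:
        self.calls.append("POST /runs")
        return "run-1"

    def add_emission(self, run_id: str, payload: dict) -> None:
        self.calls.append(f"upload emission for {run_id}")


class LazyApiOutput:
    """Creates the remote run only when the first emission is uploaded."""

    def __init__(self, client: FakeApiClient) -> None:
        self._client = client
        self._run_id: Optional[str] = None  # no network call at construction

    def _ensure_api_run(self) -> str:
        if self._run_id is None:
            self._run_id = self._client.create_run()
        return self._run_id

    def out(self, payload: dict) -> None:
        self._client.add_emission(self._ensure_api_run(), payload)


client = FakeApiClient()
output = LazyApiOutput(client)
calls_after_init = list(client.calls)  # empty: the run is not created yet

output.out({"emissions": 0.1})
output.out({"emissions": 0.2})  # reuses the run created by the first upload
```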

CLI

  • Lazy imports in cli/main.py for faster codecarbon monitor startup
  • Single entry point: codecarbon monitor (with optional --log-level)
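The lazy-import idea for a CLI can be sketched as below. This uses `argparse` only to stay within the standard library; the structure of the real `cli/main.py` is not shown in this PR description, and `run_monitor` and the `example` program name are hypothetical.

```python
import argparse
from typing import List, Optional


def run_monitor(args: argparse.Namespace) -> int:
    # The heavy dependency is imported only when the subcommand runs.
    # `decimal` stands in for the real tracker module here.
    import decimal  # noqa: F401

    return 0


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="example")
    sub = parser.add_subparsers(dest="command", required=True)

    monitor = sub.add_parser("monitor")
    monitor.add_argument("--log-level", default="ERROR")
    monitor.set_defaults(func=run_monitor)

    args = parser.parse_args(argv)
    return args.func(args)
```

Because the import sits inside `run_monitor`, building the parser or printing help does not pay for it.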

Docs

  • Fix --log-level default in CLI reference (ERROR, not INFO)
  • FAQ note explaining warm hardware reuse within one Python process

Intentionally not included

These were explored during the perf work but removed to keep the diff focused on high-impact changes:

  • Process-level output handler cache, config cache, HTTP session pooling, and ApiClient pooling
  • Separate codecarbon-monitor console script
  • Benchmark scripts and carbonserver Docker startup shortcuts

Test plan

  • CODECARBON_ALLOW_MULTIPLE_RUNS=True pytest --ignore=tests/test_viz_data.py -m 'not integ_test' tests/ (541 passed locally)
  • tests/test_hardware_cache.py — cache hit/miss, clear_cache, round-trip reuse
  • CLI monitor tests — offline validation, --log-level, wrapped-command delegation
  • Manual: codecarbon monitor --offline --country-iso-code FRA -- sleep 1
  • Manual: codecarbon monitor --offline --country-iso-code FRA --log-level debug -- python train.py

Notes

Throughput numbers captured on offline Mac ARM (2026-06-18). The first tracker in a process is slower than subsequent ones because hardware detection runs once and is reused — see the FAQ.

Defer heavy imports and hardware probing until first use, cache hardware
setup per process, and add a lightweight codecarbon-monitor CLI entry point
so measurement launch and parallel runs stay fast without changing behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
davidberenstein1957 and others added 4 commits June 17, 2026 23:24
Skip the slow powermetrics sudo probe on Apple Silicon when cpu_load
setup succeeds, strip leaked subcommand tokens from monitor ctx.args,
and update tests for lazy tracker imports in run_and_monitor.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use class-name hardware cache serialization to survive module reloads in
tests, lazy-import get_datetime_with_timezone in config CLI, add probe cache
clear helpers, and update tests for lazy imports and get_cached_tdp.

Co-authored-by: Cursor <cursoragent@cursor.com>
Provide harnesses to measure cold-start, throughput, and API latency during
optimization so regressions can be caught and logged consistently.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove local-only harnesses used during optimization; the library perf
changes and their tests are sufficient for review without dev tooling.

Co-authored-by: Cursor <cursoragent@cursor.com>
@codecov

codecov Bot commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 97.33728% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.35%. Comparing base (58acafa) to head (8ccb30e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| codecarbon/core/resource_tracker.py | 86.00% | 7 Missing ⚠️ |
| codecarbon/core/hardware_cache.py | 98.30% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1246      +/-   ##
==========================================
+ Coverage   89.17%   89.35%   +0.17%     
==========================================
  Files          47       48       +1     
  Lines        4510     4762     +252     
==========================================
+ Hits         4022     4255     +233     
- Misses        488      507      +19     

davidberenstein1957 and others added 2 commits June 17, 2026 23:50
Apply formatter/linter fixes, extract platform CPU backend selection to
satisfy flake8 complexity, stabilize the force_cpu_power load test with a
mocked cpu_percent, and add hardware_cache/monitor_main coverage tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
Avoid isinstance checks across module reload boundaries and mock
AppleSiliconChip rebuild so powermetrics is not required on non-macOS runners.

Co-authored-by: Cursor <cursoragent@cursor.com>
davidberenstein1957 self-assigned this Jun 18, 2026
davidberenstein1957 and others added 10 commits June 18, 2026 05:10
Add targeted tests for HTTP session reuse, hardware cache round-trips,
platform CPU backend selection, and other newly introduced code paths so
codecov patch checks pass on the PR.

Co-authored-by: Cursor <cursoragent@cursor.com>
Reuse output handlers, ApiClient instances, config reads, and Logfire
setup across repeated tracker lifecycles so CSV/API/Logfire paths stay
fast on warm runs. Add benchmark scripts for lifecycle and per-output
throughput measurement.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Remove output_cache since micro-benchmarks showed no meaningful full-lifecycle
gain; retain config caching, ApiClient pooling, and Logfire configure-once.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop session, config, logfire, and file-header caches that added complexity without clear wins, revert carbonserver bootstrap shortcuts, and align tests with direct ApiClient usage.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace hand-rolled globals for GPU/CPU/PowerMetrics probes with functools.lru_cache, use direct imports in hardware_cache.clear_cache(), and dedupe CodeCarbonAPIOutput emit paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
Restore lazy sys.modules clearing so conftest teardown does not load gpu_nvidia before FakeGPUEnv tests install mock pynvml.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop codecarbon-monitor in favor of codecarbon monitor, add --log-level
there, and document warm hardware reuse plus the correct log-level default.

Co-authored-by: Cursor <cursoragent@cursor.com>
Capture cpu counts, canonical GPU ids, and RAPL settings in cached plans,
sync tracker state on apply, and pass tracking_mode through all CPU backends.

Co-authored-by: Cursor <cursoragent@cursor.com>
Align test_set_cpu_tracking_skips_tdp_when_rapl_available with the
resource tracker change that passes tracking_mode to CPU.from_utils.

Co-authored-by: Cursor <cursoragent@cursor.com>

inimaz left a comment

Collaborator

Nice @davidberenstein1957, thanks a lot for taking a look at this. There are many improvements done at once here. Could you split it into smaller PRs? That would make it easier to review. I have added some comments already.

Comment thread codecarbon/core/cpu.py
        self.model, self.tdp = self._main()

    @staticmethod
    def _get_cpu_constant_power(match: str, cpu_power_df: pd.DataFrame) -> int:

Collaborator

Do not delete this. If we are using pandas only for typing, we can do:

from typing import TYPE_CHECKING
if TYPE_CHECKING:
    import pandas as pd
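One detail worth spelling out for this suggestion: with a `TYPE_CHECKING`-only import, `pd` does not exist at runtime, so annotations that mention it need to be quoted (or the module needs `from __future__ import annotations`). A minimal sketch, with `row_count` as a hypothetical function:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; pandas is not imported when the module loads.
    import pandas as pd


def row_count(df: "pd.DataFrame") -> int:
    """The quoted annotation is never evaluated at runtime, so this works
    even though `pd` is undefined outside type checking."""
    return len(df)
```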

Comment thread codecarbon/core/cpu.py
        return None

    def _get_matching_cpu(
        self, model_raw: str, cpu_df: pd.DataFrame, greedy=False

Collaborator

Same

from typing import Dict, Optional

import pandas as pd

Collaborator

Same in this file

def _hardware_kind(hw) -> str:
    """Classify hardware without isinstance (safe if modules were reloaded)."""
    name = type(hw).__name__
    if name == "RAM":

Collaborator

Let's do an Enum out of these strings
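A possible shape for that Enum, as a sketch only: `HardwareKind`, its members other than `RAM`, and the `UNKNOWN` fallback are assumptions, since only the `"RAM"` string is visible in the snippet above. Classification still goes through the class name, so it keeps working across module reloads.

```python
from enum import Enum


class HardwareKind(str, Enum):
    """Hypothetical enum replacing the bare strings."""

    CPU = "CPU"
    GPU = "GPU"
    RAM = "RAM"
    UNKNOWN = "UNKNOWN"


def hardware_kind(hw: object) -> HardwareKind:
    """Classify by class name rather than isinstance."""
    name = type(hw).__name__
    try:
        return HardwareKind(name)
    except ValueError:
        return HardwareKind.UNKNOWN


class RAM:  # stand-in for the real hardware class
    pass
```

Subclassing `str` keeps the members comparable to the existing string values, which may ease an incremental migration.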
