Cluster independent #59
Draft
homm wants to merge 26 commits into
Co-authored-by: Codex <codex@openai.com>
This was referenced Apr 28, 2026
$ time cargo bb

Timer precision: 41 ns

| bench | fastest | slowest | median | mean | samples | iters |
|---|---|---|---|---|---|---|
| ioreport / get_sample | 2.769 ms | 4.39 ms | 2.892 ms | 2.897 ms | 100 | 100 |
| ioreport / subscription | 87.26 ms | 93.58 ms | 88.55 ms | 89.29 ms | 10 | 10 |
| sampler / get_metrics | 18.77 ms | 24.82 ms | 19.33 ms | 20.32 ms | 10 | 10 |
| smc / full_init | 670.6 ms | 738 ms | 730.9 ms | 713.2 ms | 3 | 3 |
| smc / read_all_keys | 670.4 ms | 689 ms | 679 ms | 679.5 ms | 3 | 3 |

cargo bb 0.47s user 1.65s system 33% cpu 6.418 total
Preface
I understand that this PR is unusually large and changes too many concerns at once. In an ideal world, this work would be split into several smaller, logically independent pull requests and reviewed step by step.
At the same time, I want to be transparent about the tradeoff here: for me, separating this into cleanly isolated pieces would be a substantial amount of additional work on top of the implementation itself, because many of the changes are tightly connected by the new internal model and project structure. For that reason, I would like to propose the changes in their current form first, with a detailed explanation of what was changed, why it was changed, and what behavior is affected.
Summary
This PR significantly reworks the internal architecture of `macmon`. It separates metric collection from the CLI/TUI, introduces a reusable library layer, stabilizes the data model, and makes the project usable from external native consumers.

Compared to `main`, this branch:

- splits the project into a `lib` and a `macmon` crate;
- adds a native FFI layer with `XCFramework` packaging;
- changes the `pipe` JSON format to reflect the new data model;

Main Changes
1. Project split into library + app
The project is now a Cargo workspace with two crates:
- `crates/lib`: `macmon-lib`
- `crates/macmon`: CLI/TUI application

This changes the role of the binary crate. It is now mostly a consumer of the library, instead of being the place where collection logic and UI live together.
Why this matters
2. New FFI / native integration layer
A new public FFI surface was added:
- `crates/lib/include/macmon.h`;
- `module.modulemap` and release packaging for `XCFramework`.

This means `macmon` can now be integrated from C / Objective-C / Swift directly, without having to parse `stdout` from `macmon pipe`.

The application itself does not use this layer as an integration boundary. The `crates/macmon` crate consumes `macmon-lib` directly from Rust, while the FFI surface exists specifically for native consumers outside Rust.

This separation already made it practical to ship bindings for other ecosystems. Two examples are available today in `macmon-bindings`, which provides Python and Swift bindings on top of the shared library. At the moment, these bindings are published from my fork. If this change is accepted upstream, I will be happy to switch the references over to the main repository.

3. Metrics and SoC model redesign
The metrics model was redesigned together with the SoC data model to replace a fixed, field-based public shape with a domain-oriented one.
Previously, the code already collected SoC-specific information from system sources, such as CPU core counts and DVFS tables. That information was then projected into a hardcoded public model built around dedicated `ECPU` and `PCPU` fields. As a result, both the Rust data structures and the JSON output exposed a flattened schema such as `ecpu_usage`, `pcpu_usage`, and `gpu_usage`.

In this branch, the public model is organized around explicit hardware domains instead. `SocInfo` now exposes `cpu_domains`, and metrics are grouped into named categories such as:

- `cpu_usage`
- `gpu_usage`
- `power`
- `memory`
- `temp`

For CPU domains, the output now includes:
For GPU data, the output now includes:
A key outcome of this redesign is that the same conceptual model is now preserved across all layers:
This keeps the internal representation and the public contract aligned, reduces translation between layers, and makes the library easier to test, reuse, and expose through bindings.
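To make the domain-oriented shape more concrete, here is a minimal Rust sketch of how a consumer might iterate over discovered CPU domains instead of reading fixed `ECPU`/`PCPU` fields. The field names `cpu_domains`, `name`, `units`, `freqs_mhz`, and `gpu_freqs_mhz` come from the `--soc-info` description in this PR. Everything else is an assumption for illustration and not the actual `macmon-lib` types: the struct layout, the integer types, the domain names `"E"`/`"P"`, the frequency values, and the helper methods.

```rust
/// Hypothetical mirror of one CPU domain as described for `--soc-info`:
/// a name, a unit (core) count, and the domain's DVFS table in MHz.
#[derive(Debug, Clone)]
pub struct CpuDomain {
    pub name: String,
    pub units: u32,
    pub freqs_mhz: Vec<u32>,
}

/// Hypothetical SoC description organized by domain rather than by
/// dedicated ECPU/PCPU fields.
#[derive(Debug, Clone)]
pub struct SocInfo {
    pub cpu_domains: Vec<CpuDomain>,
    pub gpu_freqs_mhz: Vec<u32>,
}

impl SocInfo {
    /// Total number of CPU cores across all discovered domains.
    pub fn total_cpu_units(&self) -> u32 {
        self.cpu_domains.iter().map(|d| d.units).sum()
    }

    /// Highest DVFS frequency of the named domain, or `None` if the
    /// domain does not exist or its table is empty.
    pub fn max_freq_mhz(&self, name: &str) -> Option<u32> {
        self.cpu_domains
            .iter()
            .find(|d| d.name == name)
            .and_then(|d| d.freqs_mhz.iter().copied().max())
    }
}

fn main() {
    // Illustrative values only; real tables are discovered from the system.
    let soc = SocInfo {
        cpu_domains: vec![
            CpuDomain { name: "E".into(), units: 4, freqs_mhz: vec![1000, 2000] },
            CpuDomain { name: "P".into(), units: 6, freqs_mhz: vec![1500, 4000] },
        ],
        gpu_freqs_mhz: vec![400, 1400],
    };
    // The consumer loops over whatever domains exist; no per-cluster fields.
    for d in &soc.cpu_domains {
        println!("{}: {} cores, max {:?} MHz", d.name, d.units, soc.max_freq_mhz(&d.name));
    }
    println!("total cores: {}", soc.total_cpu_units());
    println!("GPU DVFS states: {}", soc.gpu_freqs_mhz.len());
}
```

Because the consumer only loops over `cpu_domains`, a chip with a different number or naming of clusters needs no schema change.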
Example
Old shape:
New shape:
4. Sampling and calculation fixes
This PR also includes behavior fixes, not just structural changes.
Sampling now matches the requested interval
In the previous implementation, a requested `1 s` interval was internally handled as `4 x 250 ms` samples plus an extra averaging pass. The sampling loop was implemented in `IOReport::get_samples`.

In this branch, the same `1 s` interval is sampled as an actual `1 s` interval instead of being split into smaller internal windows.

The earlier `4 x 250 ms` scheme was introduced as a workaround for the behavior described in issue #10. However, later investigation showed that the original problem had a different cause. The older implementation sampled only a fixed 80 ms window and then slept for the remainder of the interval, while `get_metrics` used that short window as the sampling duration.

Sampling is now structured around consecutive real samples and their delta, so the reported metrics are derived from the actual elapsed interval itself. The extra `4 x 250 ms` smoothing loop is therefore no longer necessary. Removing it also made the `IOReport` sampling path substantially simpler, because the internal multi-sample orchestration that used to live in `IOReport::get_samples` is gone.

Interval management is now fully owned by the caller
Another related change is that sampling no longer sleeps internally.
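The split between a non-blocking sampler and a caller-owned interval can be illustrated with a minimal sketch. `DeltaSampler`, the signature of its `next_sample`, and the single monotonically increasing `u64` counter are hypothetical simplifications. The real `IOReport::next_sample` works on IOReport sample data, not on a `u64`. The sketch only shows two ideas: the rate is divided by the elapsed time measured between two real samples, and the caller alone decides when to sample.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, simplified sampler. It remembers the previous cumulative
/// counter value and returns the rate over the time since then.
/// It never sleeps and has no notion of a configured interval.
struct DeltaSampler {
    prev: Option<(Instant, u64)>,
}

impl DeltaSampler {
    fn new() -> Self {
        Self { prev: None }
    }

    /// Records `counter` observed at `at` and returns the per-second rate
    /// since the previous call, or `None` for the very first sample.
    fn next_sample(&mut self, at: Instant, counter: u64) -> Option<f64> {
        let rate = self.prev.map(|(prev_at, prev_counter)| {
            let elapsed = at.duration_since(prev_at).as_secs_f64();
            let delta = counter.saturating_sub(prev_counter) as f64;
            if elapsed > 0.0 { delta / elapsed } else { 0.0 }
        });
        self.prev = Some((at, counter));
        rate
    }
}

fn main() {
    let mut sampler = DeltaSampler::new();
    let interval = Duration::from_millis(100); // owned by the caller
    let start = Instant::now();

    // Stand-in for a real cumulative hardware counter:
    // it grows by 1000 units per elapsed millisecond.
    let read_counter = |now: Instant| now.duration_since(start).as_millis() as u64 * 1000;

    sampler.next_sample(start, read_counter(start)); // baseline, yields None
    for _ in 0..3 {
        thread::sleep(interval); // the caller, not the sampler, waits
        let now = Instant::now();
        if let Some(rate) = sampler.next_sample(now, read_counter(now)) {
            println!("{rate:.0} units/s");
        }
    }
}
```

The divisor is the measured elapsed time, not the nominal interval. A late wake-up from `sleep` therefore only lengthens the window; it does not skew the computed rate.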
Previously, `IOReport` itself managed timing by sleeping between internal samples in `IOReport::get_samples`. In this branch, `crates/lib/src/platform/io_report.rs` moves that responsibility out of `IOReport`. `IOReport::next_sample` now just computes the delta between the previous sample and the current one. It does not block, does not sleep, and does not decide how often sampling should happen.

This is a better separation of responsibilities:

- `IOReport` is responsible only for sample-to-sample delta calculation;

CPU usage no longer scales load by maximum frequency
Another correctness issue was the meaning of the old `pcpu_usage` value itself. In the previous implementation, `calc_freq` did not return a plain utilization ratio. Instead, it scaled usage by `avg_freq / max_freq`, and the resulting value was then exposed as `pcpu_usage`.

That is not a reliable definition of CPU load. A core's theoretical maximum frequency is not necessarily reachable at a given moment under the current thermal and power constraints. For example, in Low Power Mode under a full all-core workload, performance cores can be busy for essentially `100%` of the interval while running at a much lower frequency. In that situation, the old formula could report something closer to `40%` for the P-cluster, simply because the current frequency was normalized by the maximum possible frequency.

This also means that issue #46 was right that the reported value was misleading, but it likely misidentified the cause. The problem was not necessarily that one CPU cluster was being dropped. The more fundamental issue was that the metric itself was frequency-scaled in a way that under-reported real CPU occupancy.
In this branch, CPU usage is derived from the actual residency ratio over the measured interval, and frequency is reported separately. That makes the result more faithful under power limits, thermal throttling, and other situations where frequency and utilization diverge.
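The two definitions can be compared with a small worked sketch. The residency representation is a hypothetical simplification: time spent active in each DVFS state, plus idle time. The type, the method names, and the numbers are also hypothetical. They are chosen to reproduce the Low Power Mode scenario above. A cluster busy for the whole interval at 1800 MHz, against a 4500 MHz maximum, has a usage of 100% under the residency definition. The old `usage * avg_freq / max_freq` formula gives 1.0 × 1800 / 4500 = 40%.

```rust
/// Hypothetical per-interval residency for one CPU domain: seconds spent
/// active in each DVFS state as `(freq_mhz, seconds)`, plus idle seconds.
struct Residency {
    active: Vec<(u32, f64)>,
    idle_s: f64,
}

impl Residency {
    fn active_s(&self) -> f64 {
        self.active.iter().map(|&(_, s)| s).sum()
    }

    /// Utilization: fraction of the interval spent in any active state.
    fn usage(&self) -> f64 {
        let total = self.active_s() + self.idle_s;
        if total > 0.0 { self.active_s() / total } else { 0.0 }
    }

    /// Residency-weighted average frequency over active time only (MHz),
    /// reported separately from utilization.
    fn avg_freq_mhz(&self) -> f64 {
        let active = self.active_s();
        if active <= 0.0 {
            return 0.0;
        }
        self.active.iter().map(|&(f, s)| f as f64 * s).sum::<f64>() / active
    }

    /// Old-style metric: utilization scaled by avg_freq / max_freq.
    fn freq_scaled_usage(&self, max_freq_mhz: u32) -> f64 {
        self.usage() * self.avg_freq_mhz() / max_freq_mhz as f64
    }
}

fn main() {
    // Fully busy for a 1 s interval, but capped at 1800 MHz out of a
    // theoretical 4500 MHz maximum.
    let r = Residency { active: vec![(1800, 1.0)], idle_s: 0.0 };
    println!("usage       = {:.0}%", r.usage() * 100.0);
    println!("avg freq    = {:.0} MHz", r.avg_freq_mhz());
    println!("freq-scaled = {:.0}%", r.freq_scaled_usage(4500) * 100.0);
}
```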
Temperature handling needs review
Temperature reporting previously used two different mechanisms, HID sensors and SMC keys, with the HID path implemented in `IOHIDSensors`.

This branch removes the HID sensor path and reports temperatures only from SMC keys. The current implementation discovers readable CPU keys with `Tp0*`/`Tp1*` prefixes and GPU keys with the `Tg0*` prefix. It ignores zero readings and averages only values in the `15..150 °C` range.

This is probably the most questionable part of the PR. During the refactoring, I treated the HID path as an older mechanism that could be removed to simplify the temperature code. One reason was practical: the HID path did not work on my M4 Pro machine running macOS 15, while the SMC path did. However, I have not been able to confirm that assumption from external sources such as other projects, Apple documentation, or forum discussions. I also have no evidence that HID sensors are generally obsolete or safe to drop for all supported machines.
So I would like reviewer input here. If HID sensors are still needed for some Apple Silicon models or macOS versions, this part should probably be reverted or changed into a fallback instead of being removed outright. If SMC-only reporting is considered acceptable, then the benefit is a smaller and more predictable temperature path, but I do not want to present that as proven without broader hardware coverage.
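For reference, the SMC-only averaging rule described above can be sketched as follows. The rule is: keep keys with the requested prefixes, drop zero readings, and average only values in 15..150 °C. Enumerating and reading keys from the SMC is out of scope. Several details are assumptions: the `(key, value)` input slice, the `avg_temp` helper, the example key names and values, and the choice of inclusive bounds.

```rust
/// Hypothetical helper: averages SMC temperature readings whose key starts
/// with one of `prefixes`, ignoring zero readings and values outside
/// 15..=150 °C (inclusive bounds are an assumption). Returns `None` when
/// no reading qualifies.
fn avg_temp(readings: &[(&str, f32)], prefixes: &[&str]) -> Option<f32> {
    let valid: Vec<f32> = readings
        .iter()
        .filter(|(key, _)| prefixes.iter().any(|p| key.starts_with(*p)))
        .map(|&(_, value)| value)
        .filter(|&v| v != 0.0 && (15.0..=150.0).contains(&v))
        .collect();
    if valid.is_empty() {
        None
    } else {
        Some(valid.iter().sum::<f32>() / valid.len() as f32)
    }
}

fn main() {
    // Illustrative keys and values, not real readings.
    let readings: [(&str, f32); 5] = [
        ("Tp01", 48.0),
        ("Tp05", 52.0),
        ("Tp1h", 0.0),   // zero reading: ignored
        ("Tg0f", 41.0),
        ("Tg0j", 200.0), // outside 15..=150: ignored
    ];
    let cpu = avg_temp(&readings, &["Tp0", "Tp1"]);
    let gpu = avg_temp(&readings, &["Tg0"]);
    println!("cpu_avg = {cpu:?}, gpu_avg = {gpu:?}");
}
```

If HID sensors turn out to be needed, one option is to call a function like this first and use HID readings only when it returns `None`. That would be the fallback variant mentioned above.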
Power reporting is grouped by scope
Power fields are now grouped under `power` instead of being emitted as separate top-level JSON fields.

The aggregate field also changed:

- before: `all_power = cpu_power + gpu_power + ane_power`
- after: `power.package = power.cpu + power.gpu + power.ane + power.ram + power.gpu_ram`
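The new aggregate can be checked against the example values in this section with a small sketch. The `Power` struct only mirrors the field names of the new `power` object and is not the actual `macmon-lib` type. `package()` and `old_all_power()` are hypothetical helpers written from the two formulas above. `board`, `battery`, and `dc_in` are separate SMC readings and are not part of the sum, so they are omitted.

```rust
/// Hypothetical mirror of the summed fields of the grouped `power` object (watts).
#[derive(Debug, Default)]
struct Power {
    cpu: f32,
    gpu: f32,
    ane: f32,
    ram: f32,
    gpu_ram: f32,
}

impl Power {
    /// New aggregate: package = cpu + gpu + ane + ram + gpu_ram.
    fn package(&self) -> f32 {
        self.cpu + self.gpu + self.ane + self.ram + self.gpu_ram
    }

    /// Old aggregate, which left out both RAM terms.
    fn old_all_power(&self) -> f32 {
        self.cpu + self.gpu + self.ane
    }
}

fn main() {
    // Component values taken from the "new shape" example in this PR;
    // `ane` and `gpu_ram` are 0.0 there.
    let p = Power { cpu: 0.7379913, gpu: 0.37346822, ram: 0.2882096, ..Default::default() };
    println!("package      ≈ {:.4} W", p.package());
    println!("old formula  ≈ {:.4} W", p.old_all_power());
}
```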
`ram_power` and `gpu_ram_power` are therefore now included in the package aggregate, instead of being reported next to an `all_power` value that did not include them.

Old shape:

`{ "all_power": 1.1913399696350098, "cpu_power": 0.8916881680488586, "gpu_power": 0.2996518015861511, "ane_power": 0.0, "ram_power": 0.25657692551612854, "gpu_ram_power": 0.0, "sys_power": 16.489110946655273 }`

New shape:

`{ "power": { "package": 1.399669, "cpu": 0.7379913, "gpu": 0.37346822, "ram": 0.2882096, "gpu_ram": 0.0, "ane": 0.0, "board": 10.063049, "battery": 0.4016017, "dc_in": 9.783953 } }`

The SMC readings that used to be represented only by `sys_power` are now split into separate fields where available:

- `power.board`: SMC `PSTR`
- `power.battery`: SMC `PPBR`
- `power.dc_in`: SMC `PDTR`

6. Startup time optimizations
This PR also includes a separate set of startup-path optimizations aimed at reducing the time to the first sample.
The main changes here are:
Why this matters
This makes the UI start updating sooner and makes scripted measurements more predictable: the command spends less extra time outside the requested sampling window.
For a single first sample:
The time to produce a first sample dropped from about `3.0s` to about `0.66s`. This directly improves the TUI, because the first visible update arrives sooner.

For scripts that request a longer measurement window, the total command time is now much closer to the requested interval:

For a requested `5s` measurement, the total command runtime dropped from about `7.93s` to about `5.28s`. The start-to-finish duration is now much closer to the configured sampling window.

The same change also reduces drift across repeated short samples. With `-i 50 -s 100`, the requested sampling time is `100 x 50ms = 5s`:

The total runtime dropped from about `20.37s` to about `5.92s`. This still includes startup time, and some drift is expected because timer scheduling is not exact. The accumulated drift over many samples is nevertheless substantially smaller.

7. CLI / TUI updates
The CLI is now a thinner wrapper over `macmon-lib`.

`pipe` output uses the library metrics schema

`pipe` still emits one JSON object per line. What changed is the object schema: the CLI now serializes the same `Metrics` model that `macmon-lib` exposes. `pipe` output now uses the grouped fields described above: `cpu_usage`, `gpu_usage`, `power`, `memory`, `temp`, and `timestamp`.

`--soc-info`

`--soc-info` still adds a top-level `soc` object to each `pipe` sample. The shape of that object changed with the SoC model:

- before: `ecpu_cores`, `pcpu_cores`, `ecpu_freqs`, and `pcpu_freqs`;
- after: `cpu_domains`, where each domain has a `name`, `units`, and `freqs_mhz`.

GPU frequency metadata was also renamed from `gpu_freqs` to `gpu_freqs_mhz`.

TUI
The TUI now renders CPU/GPU sections based on the discovered domain structure instead of assuming only the previous fixed layout. Labels and usage/power presentation were also updated to fit the richer data model.
User-visible / consumer-visible effects
Improvements
- `pipe` output;

Compatibility impact
This PR includes breaking output changes for consumers of `macmon pipe`.

Breaking changes

- `pipe` JSON format is changed;
- `cpu_temp_avg` -> `cpu_avg`
- `gpu_temp_avg` -> `gpu_avg`

Any external scripts or dashboards that parse the old JSON shape will need to be updated.
Known regression
`macmon debug` currently prints much less diagnostic information than it did before this refactoring. This is not an intended compatibility break; it is a known regression in this branch. I plan to restore the useful debug output after getting feedback on the main library/model changes.

Testing
This PR extends the existing test coverage to cover the new library/API split and the new metrics model.
Compared to `main`, it adds tests for:

CI on `main` previously checked formatting and buildability but did not run the test suite. This PR updates `check.yml` to run workspace tests as well.

One FFI smoke test remains environment-dependent, because sampler initialization may fail depending on host runtime state and the available macOS interfaces. The test still runs, but it now treats `MACMON_STATUS_INIT_FAILED` as a host-specific limitation rather than a hard failure. This keeps CI from becoming flaky on machines where sampler initialization is unavailable.

Personal motivation: StillCore
My personal motivation for this work is StillCore, a macOS menu bar app I am building for Apple Silicon metrics. It shows live charts for SoC power, temperature, frequency, and usage, and it uses `macmon-lib` through Swift bindings rather than parsing `macmon pipe` output.

StillCore currently builds against the `CMacmon.xcframework` produced from this branch and the sibling `macmon-bindings` checkout.

I do not want `macmon` to carry StillCore-specific code or requirements. I mention StillCore for a narrower reason: it is a real native macOS app that uses this branch as its metrics backend. It therefore helped shape the library API, packaging, sampler lifecycle, domain-based CPU/GPU model, grouped power fields, and Swift integration.

What's next
At this point, I would mostly like to understand how this direction fits the maintainer's view of `macmon`.

The main questions for me are:

- `macmon debug` regression;

I am happy to continue the work in whichever form is most useful for the project. However, I would like feedback on the overall direction before spending more time reshaping the branch.