
Cluster independent #59

Draft
homm wants to merge 26 commits into vladkens:main from homm:cluster-independent

homm commented Apr 26, 2026

Preface

I understand that this PR is unusually large and touches too many concerns at once. In an ideal world, this work would be split into several smaller, logically independent pull requests and reviewed step by step.

At the same time, I want to be transparent about the tradeoff here: for me, separating this into cleanly isolated pieces would be a substantial amount of additional work on top of the implementation itself, because many of the changes are tightly connected by the new internal model and project structure. For that reason, I would like to propose the changes in their current form first, with a detailed explanation of what was changed, why it was changed, and what behavior is affected.

Summary

This PR significantly reworks the internal architecture of macmon to separate metric collection from the CLI/TUI, introduce a reusable library layer, stabilize the data model, and make the project usable from external native consumers.

Compared to main, this branch:

  • splits the project into a workspace with dedicated lib and macmon crates;
  • introduces a public FFI layer with C headers and XCFramework packaging;
  • redesigns the metrics model to be domain-based instead of hardcoded around a fixed CPU/GPU layout;
  • updates the pipe JSON format to reflect the new data model;
  • fixes several issues in frequency calculation, startup behavior, and power reporting;
  • changes temperature reporting in a way that needs review.

Main Changes

1. Project split into library + app

The project is now a Cargo workspace with two crates:

  • crates/lib: macmon-lib
  • crates/macmon: CLI/TUI application

This changes the role of the binary crate: it is now mostly a consumer of the library instead of being the place where collection logic and UI live together.

Why this matters

  • metric collection and interpretation can now evolve independently from the terminal UI;
  • serialization and public data contracts are easier to test;
  • external consumers can reuse the same logic without shelling out to the binary.

2. New FFI / native integration layer

A new public FFI surface was added:

  • C-compatible structs for metrics and SoC info;
  • sampler lifecycle functions;
  • error/status reporting;
  • exported header: crates/lib/include/macmon.h;
  • module.modulemap and release packaging for XCFramework.

This means macmon can now be integrated from C / Objective-C / Swift directly, without having to parse stdout from macmon pipe.

The application itself does not use this layer as an integration boundary. The crates/macmon crate consumes macmon-lib directly from Rust, while the FFI surface exists specifically for native consumers outside Rust.

This separation already made it practical to ship bindings for other ecosystems. Two examples are available today in macmon-bindings, which provides Python and Swift bindings on top of the shared library. At the moment, these bindings are published from my fork. If this change is accepted upstream, I will be happy to switch the references over to the main repository.

3. Metrics and SoC model redesign

The metrics model was redesigned together with the SoC data model to replace a fixed, field-based public shape with a domain-oriented one.

Previously, the code already collected SoC-specific information such as CPU core counts and DVFS tables from system sources, but that information was projected into a hardcoded public model built around dedicated ECPU and PCPU fields. As a result, both the Rust data structures and the JSON output exposed a flattened schema such as ecpu_usage, pcpu_usage, and gpu_usage.

In this branch, the public model is organized around explicit hardware domains instead. SocInfo now exposes cpu_domains, and metrics are grouped into named categories such as:

  • cpu_usage
  • gpu_usage
  • power
  • memory
  • temp

For CPU domains, the output now includes:

  • domain name;
  • number of units;
  • average frequency;
  • usage;
  • per-core frequency/usage pairs.

For GPU data, the output now includes:

  • cluster or channel name;
  • number of units;
  • frequency;
  • usage.

A key outcome of this redesign is that the same conceptual model is now preserved across all layers:

  • in the Rust implementation;
  • in the shared library API;
  • in the JSON output.

This keeps the internal representation and the public contract aligned, reduces translation between layers, and makes the library easier to test, reuse, and expose through bindings.

Example

Old shape:

$ macmon pipe -i 100 -s 1 | jq
{
  "all_power": 1.1913399696350098,
  "ane_power": 0.0,
  "cpu_power": 0.8916881680488586,
  "ecpu_usage": [1902, 0.6472986936569214],
  "gpu_power": 0.2996518015861511,
  "gpu_ram_power": 0.0,
  "gpu_usage": [338, 0.062442369759082794],
  "memory": {
    "ram_total": 25769803776,
    "ram_usage": 21063680000,
    "swap_total": 3221225472,
    "swap_usage": 2080571392
  },
  "pcpu_usage": [2044, 0.024696044623851776],
  "ram_power": 0.25657692551612854,
  "sys_power": 16.489110946655273,
  "temp": {
    "cpu_temp_avg": 48.95563888549805,
    "gpu_temp_avg": 44.01784133911133
  },
  "timestamp": "2026-03-29T20:59:59.337500+00:00"
}

New shape:

$ ./target/release/macmon pipe -i 100 -s 1 | jq
{
  "cpu_usage": {
    "ECPU": {
      "units": 4,
      "freq_mhz": 2195,
      "usage": 0.35363516,
      "cores": [
        [2019, 0.46434975],
        [2215, 0.34999058],
        [2222, 0.3192543],
        [2325, 0.28094602]
      ]
    },
    "PCPU": {
      "units": 10,
      "freq_mhz": 2811,
      "usage": 0.042013668,
      "cores": [
        [1407, 0.030797029],
        [1547, 0.018137457],
        [1542, 0.018063253],
        [1549, 0.018691722],
        [1649, 0.0131477965],
        [4043, 0.04576475],
        [4105, 0.056412835],
        [4016, 0.0839907],
        [4157, 0.063755155],
        [4096, 0.07137596]
      ]
    }
  },
  "gpu_usage": {
    "GPUPH": {
      "units": 20,
      "freq_mhz": 338,
      "usage": 0.29283616
    }
  },
  "power": {
    "package": 1.399669,
    "cpu": 0.7379913,
    "gpu": 0.37346822,
    "ram": 0.2882096,
    "gpu_ram": 0.0,
    "ane": 0.0,
    "board": 10.063049,
    "battery": 0.4016017,
    "dc_in": 9.783953
  },
  "memory": {
    "ram_total": 25769803776,
    "ram_usage": 21190000640,
    "swap_total": 3221225472,
    "swap_usage": 2080571392
  },
  "temp": {
    "cpu_avg": 52.61393,
    "gpu_avg": 49.554974
  },
  "timestamp": "2026-03-29T21:03:35.734702+00:00"
}
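In the new-shape sample above, each domain's freq_mhz and usage are consistent with plain means of its cores entries (for ECPU, the four core frequencies average to about 2195 MHz). The sketch below models one domain in Rust and derives the aggregates that way. It is an illustration only: the struct, the from_cores constructor, the Rust types, and rounding to the nearest MHz are assumptions, not the macmon-lib API.

```rust
/// One CPU domain as it appears under `cpu_usage` in the new pipe output.
/// Field names follow the JSON sample; the Rust types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct CpuDomain {
    pub name: String,
    pub units: usize,
    pub freq_mhz: u32,
    pub usage: f32,
    /// Per-core (frequency in MHz, usage ratio) pairs.
    pub cores: Vec<(u32, f32)>,
}

impl CpuDomain {
    /// Hypothetical constructor: derives the domain-level values as plain
    /// means of the per-core pairs. Returns `None` for an empty domain.
    pub fn from_cores(name: &str, cores: &[(u32, f32)]) -> Option<Self> {
        if cores.is_empty() {
            return None;
        }
        let n = cores.len() as f64;
        let freq_sum: f64 = cores.iter().map(|&(f, _)| f as f64).sum();
        let usage_sum: f64 = cores.iter().map(|&(_, u)| u as f64).sum();
        Some(Self {
            name: name.to_string(),
            units: cores.len(),
            // Rounding to the nearest MHz is an assumption of this sketch.
            freq_mhz: (freq_sum / n).round() as u32,
            usage: (usage_sum / n) as f32,
            cores: cores.to_vec(),
        })
    }
}
```

Calling CpuDomain::from_cores("ECPU", ...) with the four ECPU pairs from the sample reproduces the reported units, freq_mhz, and usage values. Whether the library computes its aggregates this way for every domain is not confirmed here.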

4. Sampling and calculation fixes

This PR also includes behavior fixes, not just structural changes.

Sampling now matches the requested interval

On main, a requested 1 s interval was handled internally as 4 x 250 ms samples plus an extra averaging pass, with the sampling loop implemented in IOReport::get_samples.

In this branch, the same 1 s interval is sampled as an actual 1 s interval instead of being split into smaller internal windows.

That earlier 4 x 250 ms scheme was introduced as a workaround for the behavior described in issue #10. However, later investigation showed that the original problem had a different cause: the older implementation sampled only a fixed 80 ms window and then slept for the remainder of the interval, while get_metrics used that short window as the sampling duration.

Now that sampling is structured around consecutive real samples and their delta, the reported metrics are derived from the actual elapsed interval itself. Because of that, the extra 4 x 250 ms smoothing loop is no longer necessary. It also allowed the IOReport sampling path to become substantially simpler by removing the internal multi-sample orchestration that used to live in IOReport::get_samples.

Interval management is now fully owned by the caller

Another related change is that sampling no longer sleeps internally.

Previously, IOReport itself managed timing by sleeping between internal samples in IOReport::get_samples. In this branch, crates/lib/src/platform/io_report.rs moves that responsibility out of IOReport::next_sample, which now just computes the delta between the previous sample and the current one. It does not block, does not sleep, and does not decide how often sampling should happen.

This is a better separation of responsibilities:

  • IOReport is responsible only for sample-to-sample delta calculation;
  • the caller is responsible for scheduling and interval management;
  • the elapsed time used for metric calculation is the actual measured interval, not a synthetic sub-interval chosen inside the sampling layer.
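
The split of responsibilities above can be sketched as a sampler that only subtracts consecutive snapshots and never sleeps. This is a minimal illustration, not the code in crates/lib/src/platform/io_report.rs: Snapshot, Delta, DeltaSampler, and the millijoule energy counter are hypothetical stand-ins, and only the name next_sample is taken from the description.

```rust
use std::time::Duration;

/// A raw counter reading taken at a known time since sampler start.
/// Both fields are hypothetical stand-ins for what IOReport provides.
#[derive(Debug, Clone, Copy)]
pub struct Snapshot {
    pub at: Duration,
    /// Cumulative energy counter in millijoules.
    pub energy_mj: u64,
}

/// Difference between two consecutive snapshots.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Delta {
    pub elapsed: Duration,
    pub energy_mj: u64,
}

impl Delta {
    /// Average power in watts over the actually measured interval.
    /// Returns `None` when no time has elapsed.
    pub fn watts(&self) -> Option<f64> {
        let secs = self.elapsed.as_secs_f64();
        if secs > 0.0 {
            Some(self.energy_mj as f64 / 1000.0 / secs)
        } else {
            None
        }
    }
}

/// Keeps only the previous snapshot; scheduling is left to the caller.
pub struct DeltaSampler {
    prev: Snapshot,
}

impl DeltaSampler {
    pub fn new(first: Snapshot) -> Self {
        Self { prev: first }
    }

    /// Never blocks or sleeps: subtracts the previous snapshot from the
    /// current one and remembers the current one for the next call.
    pub fn next_sample(&mut self, current: Snapshot) -> Delta {
        let delta = Delta {
            elapsed: current.at.saturating_sub(self.prev.at),
            energy_mj: current.energy_mj.saturating_sub(self.prev.energy_mj),
        };
        self.prev = current;
        delta
    }
}
```

In this sketch the caller decides when to take the next snapshot, for example by sleeping between calls. If the caller wakes up late, the derived power is still divided by the real elapsed time rather than by the nominal interval.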

CPU usage no longer scales load by maximum frequency

Another correctness issue was in the meaning of the old pcpu_usage value itself. In the previous implementation, calc_freq did not return a plain utilization ratio. Instead, it scaled usage by avg_freq / max_freq, and the resulting value was then exposed as pcpu_usage.

That is not a reliable definition of CPU load. The fact that a core has some theoretical maximum frequency does not mean that this frequency is actually reachable at the moment under the current thermal and power constraints. For example, in Low Power Mode under a full all-core workload, performance cores can be busy for essentially 100% of the interval while running at a much lower frequency. In that situation, the old formula could report something closer to 40% for the P-cluster simply because current frequency was being normalized by maximum possible frequency.

This also means that issue #46 correctly pointed out that the reported value was misleading, but likely described the cause incorrectly. The problem was not necessarily that one CPU cluster was being dropped. The more fundamental issue was that the metric itself was frequency-scaled in a way that under-reported real CPU occupancy.

In this branch, CPU usage is derived from the actual residency ratio over the measured interval, and frequency is reported separately. That makes the result more faithful under power limits, thermal throttling, and other situations where frequency and utilization diverge.
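
To make the difference concrete, the sketch below compares the two definitions for a core that is busy for the whole interval while running below its maximum frequency. The function names and the 1800 / 4500 MHz figures are illustrative assumptions chosen to match the "closer to 40%" example above. They are not taken from calc_freq or from a real DVFS table.

```rust
/// Utilization as a plain residency ratio: busy time over total time.
/// Returns 0.0 when no ticks were observed.
pub fn residency_usage(active_ticks: u64, idle_ticks: u64) -> f32 {
    let total = active_ticks + idle_ticks;
    if total == 0 {
        0.0
    } else {
        active_ticks as f32 / total as f32
    }
}

/// The old-style value as described above: usage scaled by
/// avg_freq / max_freq. Returns 0.0 when max_freq_mhz is zero.
pub fn freq_scaled_usage(usage: f32, avg_freq_mhz: u32, max_freq_mhz: u32) -> f32 {
    if max_freq_mhz == 0 {
        0.0
    } else {
        usage * avg_freq_mhz as f32 / max_freq_mhz as f32
    }
}
```

With a fully busy core, residency_usage(1000, 0) is 1.0, while freq_scaled_usage(1.0, 1800, 4500) is 0.4. Reporting the residency ratio and the frequency as separate values keeps both pieces of information instead of folding them into one number.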

Temperature handling needs review

Temperature reporting previously used two different mechanisms: HID sensors and SMC keys, with the HID path implemented in IOHIDSensors.

This branch removes the HID sensor path and reports temperatures only from SMC keys. The current implementation discovers readable CPU keys with Tp0* / Tp1* prefixes and GPU keys with the Tg0* prefix, ignores zero readings, and averages only values in the 15..150 °C range.

This is probably the most questionable part of the PR. During the refactoring I treated the HID path as an older model that was worth removing to simplify the temperature code. One reason was practical: the HID path did not work on my M4 Pro machine running macOS 15, while the SMC path did. However, I have not been able to confirm that assumption from external sources such as other projects, Apple documentation, or forum discussions, and I do not have evidence that HID sensors are generally obsolete or safe to drop for all supported machines.

So I would like reviewer input here. If HID sensors are still needed for some Apple Silicon models or macOS versions, this part should probably be reverted or changed into a fallback instead of being removed outright. If SMC-only reporting is considered acceptable, then the benefit is a smaller and more predictable temperature path, but I do not want to present that as proven without broader hardware coverage.
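
For reviewers, the SMC-only selection and filtering rules described above can be summarized in a few lines. This is a sketch under stated assumptions, not the code in the branch: TempGroup, classify_key, and average_temp are hypothetical names, and treating both ends of the 15..150 range as inclusive is a guess.

```rust
/// Which average an SMC temperature key contributes to.
#[derive(Debug, PartialEq)]
pub enum TempGroup {
    Cpu,
    Gpu,
}

/// Classifies an SMC key by the prefixes named in the description:
/// Tp0* / Tp1* for CPU and Tg0* for GPU. Other keys are ignored.
pub fn classify_key(key: &str) -> Option<TempGroup> {
    if key.starts_with("Tp0") || key.starts_with("Tp1") {
        Some(TempGroup::Cpu)
    } else if key.starts_with("Tg0") {
        Some(TempGroup::Gpu)
    } else {
        None
    }
}

/// Averages only plausible readings. Zero readings fall outside the
/// accepted range, so they are dropped by the same filter.
/// Inclusive bounds are an assumption of this sketch.
pub fn average_temp(readings: &[f32]) -> Option<f32> {
    let valid: Vec<f32> = readings
        .iter()
        .copied()
        .filter(|t| (15.0_f32..=150.0_f32).contains(t))
        .collect();
    if valid.is_empty() {
        None
    } else {
        Some(valid.iter().sum::<f32>() / valid.len() as f32)
    }
}
```

Returning None when no reading passes the filter leaves room for a fallback source, such as the HID path, if reviewers decide it should stay.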

Power reporting is grouped by scope

Power fields are now grouped under power instead of being emitted as separate top-level JSON fields.

The aggregate field also changed:

  • before: all_power = cpu_power + gpu_power + ane_power
  • now: power.package = power.cpu + power.gpu + power.ane + power.ram + power.gpu_ram

So ram_power and gpu_ram_power are now included in the package aggregate instead of being reported next to an all_power value that did not include them.

Old shape:

{
  "all_power": 1.1913399696350098,
  "cpu_power": 0.8916881680488586,
  "gpu_power": 0.2996518015861511,
  "ane_power": 0.0,
  "ram_power": 0.25657692551612854,
  "gpu_ram_power": 0.0,
  "sys_power": 16.489110946655273
}

New shape:

{
  "power": {
    "package": 1.399669,
    "cpu": 0.7379913,
    "gpu": 0.37346822,
    "ram": 0.2882096,
    "gpu_ram": 0.0,
    "ane": 0.0,
    "board": 10.063049,
    "battery": 0.4016017,
    "dc_in": 9.783953
  }
}
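
The aggregate change can be checked against the two samples above. The sketch below uses a hypothetical Power struct whose field names mirror the new JSON keys. The two methods restate the before and now formulas from this section and are not the library's own code.

```rust
/// SoC-level power readings in watts. Field names mirror the new JSON
/// keys; the struct itself is a hypothetical illustration.
#[derive(Debug, Clone, Copy)]
pub struct Power {
    pub cpu: f32,
    pub gpu: f32,
    pub ane: f32,
    pub ram: f32,
    pub gpu_ram: f32,
}

impl Power {
    /// Old aggregate: all_power = cpu_power + gpu_power + ane_power.
    pub fn all_power_old(&self) -> f32 {
        self.cpu + self.gpu + self.ane
    }

    /// New aggregate: power.package also includes ram and gpu_ram.
    pub fn package(&self) -> f32 {
        self.cpu + self.gpu + self.ane + self.ram + self.gpu_ram
    }
}
```

Fed with the values from the old sample, all_power_old() matches the reported all_power, and with the values from the new sample, package() matches power.package, both up to floating-point rounding.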

The SMC readings that used to be represented only by sys_power are now split into separate fields where available:

  • power.board: SMC PSTR
  • power.battery: SMC PPBR
  • power.dc_in: SMC PDTR

6. Startup time optimizations

This PR also includes a separate set of startup-path optimizations aimed at reducing the time to the first sample.

The main changes here are:

  • SMC initialization is now started in the background instead of blocking the whole startup path immediately;
  • sampler initialization subscribes to fewer channels and avoids some unnecessary work up front;
  • IOReport sampling no longer sleeps internally, so the first metrics request does not block the rest of the startup flow while waiting for the sampling interval.

Why this matters

This makes the UI start updating sooner and makes scripted measurements more predictable: the command spends less extra time outside the requested sampling window.

For a single first sample:

$ time macmon pipe -i 1 -s 1
macmon pipe -i 1 -s 1  0.13s user 0.55s system 22% cpu 3.006 total

$ time target/release/macmon pipe -i 1 -s 1
target/release/macmon pipe -i 1 -s 1  0.09s user 0.23s system 49% cpu 0.658 total

The time to produce a first sample dropped from about 3.0s to about 0.66s. That directly improves the TUI because the first visible update arrives sooner.

For scripts that request a longer measurement window, the total command time is now much closer to the requested interval:

$ time macmon pipe -i 5000 -s 1
0.13s user 0.57s system 8% cpu 7.934 total

$ time target/release/macmon pipe -i 5000 -s 1
0.08s user 0.22s system 5% cpu 5.276 total

For a requested 5s measurement, the total command runtime moved from about 7.93s to about 5.28s, which makes the start-to-finish duration much closer to the configured sampling window.

The same change also reduces drift across repeated short samples. With -i 50 -s 100, the requested sampling time is 100 x 50ms = 5s:

$ time macmon pipe -i 50 -s 100
0.86s user 3.46s system 21% cpu 20.369 total

$ time target/release/macmon pipe -i 50 -s 100
0.14s user 1.02s system 19% cpu 5.915 total

The total runtime dropped from about 20.37s to about 5.92s. This still includes startup time, and some drift is still expected because timer scheduling is not exact, but the accumulated drift over many samples is substantially smaller.
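
As a rough way to read these timings, the extra wall time can be spread over the number of samples. The helper below is a hypothetical illustration, and the result is only an upper bound on per-sample drift because the measured totals include startup time.

```rust
/// Extra wall time per sample, in milliseconds, beyond the requested
/// interval * samples. Returns `None` when `samples` is zero.
pub fn overhead_per_sample_ms(total_secs: f64, interval_ms: u64, samples: u64) -> Option<f64> {
    if samples == 0 {
        return None;
    }
    let requested_ms = (interval_ms * samples) as f64;
    Some((total_secs * 1000.0 - requested_ms) / samples as f64)
}
```

For the -i 50 -s 100 runs above, this gives about 153.7 ms of extra time per sample on main (20.369 s total) and about 9.2 ms on this branch (5.915 s total).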

7. CLI / TUI updates

The CLI is now a thinner wrapper over macmon-lib.

pipe output uses the library metrics schema

pipe still emits one JSON object per line; only the object schema changed. The CLI now serializes the same Metrics model that macmon-lib exposes, so the output uses the grouped fields described above: cpu_usage, gpu_usage, power, memory, temp, and timestamp.

--soc-info

--soc-info still adds a top-level soc object to each pipe sample. The shape of that object changed with the SoC model:

  • before: fixed E/P fields such as ecpu_cores, pcpu_cores, ecpu_freqs, and pcpu_freqs;
  • now: cpu_domains, where each domain has a name, units, and freqs_mhz.

GPU frequency metadata was also renamed from gpu_freqs to gpu_freqs_mhz.

TUI

The TUI now renders CPU/GPU sections based on the discovered domain structure instead of assuming only the previous fixed layout. Labels and usage/power presentation were also updated to fit the richer data model.

User-visible / consumer-visible effects

Improvements

  • more detailed and structured pipe output;
  • better support for domain-based CPU/GPU reporting;
  • reusable native library interface;
  • grouped power output with package, board, battery, and DC input fields;
  • improved startup behavior and more defensive metric calculation.

Compatibility impact

This PR includes breaking output changes for consumers of macmon pipe.

Breaking changes

  • the pipe JSON format has changed;
  • flat usage/power fields were replaced by structured objects;
  • temperature field names changed:
    • cpu_temp_avg -> cpu_avg
    • gpu_temp_avg -> gpu_avg

Any external scripts or dashboards parsing the old JSON shape will need to be updated.
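
For consumers updating their parsers, the renames visible in the old and new samples can be written down as a lookup table. The new_path helper below is hypothetical and uses dotted paths only as notation. It covers the fields that have a direct counterpart and returns None for the ones that need a manual decision.

```rust
/// Maps an old pipe field (dotted path) to its closest new location.
/// Returns `None` for fields without a one-to-one replacement.
pub fn new_path(old: &str) -> Option<&'static str> {
    match old {
        // Power: same quantities, now nested under `power`.
        "cpu_power" => Some("power.cpu"),
        "gpu_power" => Some("power.gpu"),
        "ane_power" => Some("power.ane"),
        "ram_power" => Some("power.ram"),
        "gpu_ram_power" => Some("power.gpu_ram"),
        // Closest replacement, but the definition changed:
        // `power.package` also includes ram and gpu_ram.
        "all_power" => Some("power.package"),
        // Temperature: renamed inside `temp`.
        "temp.cpu_temp_avg" => Some("temp.cpu_avg"),
        "temp.gpu_temp_avg" => Some("temp.gpu_avg"),
        // Unchanged between the two samples.
        "memory.ram_total" => Some("memory.ram_total"),
        "memory.ram_usage" => Some("memory.ram_usage"),
        "memory.swap_total" => Some("memory.swap_total"),
        "memory.swap_usage" => Some("memory.swap_usage"),
        "timestamp" => Some("timestamp"),
        // sys_power is split into board / battery / dc_in, and the usage
        // arrays became per-domain objects, so those need manual handling.
        _ => None,
    }
}
```

The usage fields are left out on purpose: ecpu_usage, pcpu_usage, and gpu_usage were [frequency, usage] pairs, while the new output keys them by discovered domain name, and the CPU usage value itself is now a residency ratio rather than a frequency-scaled one.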

Known regression

macmon debug currently prints much less diagnostic information than it did before this refactoring. That is not an intended compatibility break; it is a known regression in this branch. I plan to restore the useful debug output after getting feedback on the main library/model changes.

Testing

This PR extends the existing tests to cover the new library/API split and the new metrics model.

Compared to main, it adds tests for:

  • FFI conversion and memory ownership behavior;
  • metric serialization shape;
  • SoC/domain layout behavior;
  • CLI/debug output structure;
  • frequency calculation helpers.

CI on main previously checked formatting and buildability, but did not run the test suite. This PR updates check.yml to run workspace tests as well.

One FFI smoke test remains environment-dependent because sampler initialization may fail depending on host runtime state and available macOS interfaces. The test still runs, but it now treats MACMON_STATUS_INIT_FAILED as a host-specific limitation rather than a hard failure, so it does not make CI flaky on machines where sampler initialization is unavailable.

Personal motivation: StillCore

My personal motivation for this work is StillCore, a macOS menu bar app I am building for Apple Silicon metrics. It shows live charts for SoC power, temperature, frequency, and usage, and it uses macmon-lib through Swift bindings rather than parsing macmon pipe output.

StillCore currently builds against the CMacmon.xcframework produced from this branch and the sibling macmon-bindings checkout.

I do not want to make macmon carry StillCore-specific code or requirements. The reason I mention it here is narrower: StillCore is a real native macOS app that uses this branch as its metrics backend, so it helped shape the library API, packaging, sampler lifecycle, domain-based CPU/GPU model, grouped power fields, and Swift integration.

What's next

At this point, I would mostly like to understand how this direction fits the maintainer's view of macmon.

The main questions for me are:

  • whether the library/API split and native integration layer are useful for the project upstream;
  • whether the new domain-based metrics model is a good direction for the public JSON and library contracts;
  • which parts should be changed or reverted before this can move forward, especially the temperature sensor changes and the current macmon debug regression;
  • whether it is acceptable to fix the disputed parts on top of this branch, or whether I should spend the extra effort to split the work into several smaller, more atomic pull requests.

I am happy to continue the work in whichever form is more useful for the project, but I would like feedback on that direction before spending more time reshaping the branch.

$ time cargo bb
Timer precision: 41 ns
bench                fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ ioreport                        │               │               │               │         │
│  ├─ get_sample     2.769 ms      │ 4.39 ms       │ 2.892 ms      │ 2.897 ms      │ 100     │ 100
│  ╰─ subscription   87.26 ms      │ 93.58 ms      │ 88.55 ms      │ 89.29 ms      │ 10      │ 10
├─ sampler                         │               │               │               │         │
│  ╰─ get_metrics    18.77 ms      │ 24.82 ms      │ 19.33 ms      │ 20.32 ms      │ 10      │ 10
╰─ smc                             │               │               │               │         │
   ├─ full_init      670.6 ms      │ 738 ms        │ 730.9 ms      │ 713.2 ms      │ 3       │ 3
   ╰─ read_all_keys  670.4 ms      │ 689 ms        │ 679 ms        │ 679.5 ms      │ 3       │ 3
cargo bb  0.47s user 1.65s system 33% cpu 6.418 total

Co-authored-by: Codex <codex@openai.com>