Skip to content

feat(stats): keep the hashrate meter alive across jobs (--internal_accumulate_hash)#174

Merged
luminousmining merged 2 commits into
luminousmining:mainfrom
yuzi-co:feat/hashrate-time-windowed-meter
Jun 14, 2026
Merged

feat(stats): keep the hashrate meter alive across jobs (--internal_accumulate_hash)#174
luminousmining merged 2 commits into
luminousmining:mainfrom
yuzi-co:feat/hashrate-time-windowed-meter

Conversation

@yuzi-co

@yuzi-co yuzi-co commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Reworks this PR to the --internal_accumulate_hash design we converged on in the thread, replacing the time-windowed EWMA meter. The EWMA implementation is preserved on feat/hashrate-ewma-meter in case we ever want to revisit smoothing separately.

Problem

Slow / memory-hard kernels (Xelis, Octopus, Ergo…) complete far fewer than --internal_kernel_count launches per job window. The meter is count-gated and reset on every job update — updateBatchNonce() zeroes the stored hashrate — so the threshold is never reached, getHashrate() returns 0, and the HASHRATE board is hidden entirely even while valid shares land.

Fix

New Kernel-section flag --internal_accumulate_hash (default true):

  • true — job and constant updates no longer reset the hash count or the elapsed window; only a memory rebuild (e.g. DAG) resets. The meter accumulates a true nonces / elapsed average that spans jobs, so slow kernels publish a real number out of the box and read 0 only when the GPU is genuinely idle.
  • false — restores the previous per-job reset behaviour exactly.

The work loop also opens the meter window once at start-up: a non-epoch algorithm's first job is a constant update (not a memory update), so under accumulation its clock would otherwise never start.

Changes

  • config / CLI: new --internal_accumulate_hash bool wired into DeviceOccupancy (default true).
  • device: gate the epoch/period reset and the updateBatchNonce reset behind (!accumulate || memory update); memory rebuilds always reset; open the meter window once before the loop.
  • docs: document the flag in PARAMETERS.md.
  • test: statistical unit tests for the accumulate-vs-reset invariant.

Verification

Unit tests (unit_test, AMD cross-compile): 119/123 pass. The 4 failures are the pre-existing *AmdTest.allDeviceFindNonce progpow-family cases, unrelated to this change. The 3 new StatisticalHashrate.* tests pass.

Live, RX 9070 XT vs Octopus (de.conflux.herominers.com:1170), same binary, ~95 s each:

--internal_accumulate_hash HASHRATE board Hashrate
true (default) renders every 10 s steady 24.30 MH/s
false never appears 0 H/s (current behaviour, reproduced)

@yuzi-co yuzi-co force-pushed the feat/hashrate-time-windowed-meter branch from 2b65ff3 to 4313b96 Compare June 12, 2026 05:36
@luminousmining

Copy link
Copy Markdown
Owner

For slow kernels we already have several options available.

Collecting stats on slow kernels

The --log_interval_hash flag reduces the time interval at which we collect stats from each GPU.
For example, if a job is received every 10 seconds, --log_interval_hash must be smaller than the interval between two mining.notify messages.

The --internal_kernel_count flag has a default value of 100 — there is an error in the documentation which I will fix.
This indicates that the kernel must have been executed at least 100 times to compute the total hashrate sum, but if the kernel is too slow this value can be reduced significantly, for example --internal_kernel_count=5.

To summarize, the user needs to correctly configure --log_interval_hash and --internal_kernel_count depending on the algorithm, to ensure that stats are displayed on the dashboard.

Test

Please redo your tests by adjusting these values, and if you still get bad results could you share your configuration.

@yuzi-co

yuzi-co commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

I re-ran both algos on the unmodified meter (this PR's changes reverted) on an RX 9070 XT vs live HeroMiners pools, sweeping --internal_kernel_count and --log_interval_hash.

Setup: Xelis → de.xelis.herominers.com:1225; Octopus → de.conflux.herominers.com:1170. Measured occupancy/timing: Xelis 64×96 = 6 144 nonces/launch ≈ 1.3 s/launch; Octopus 8192×256 = 2 097 152 nonces/launch ≈ 0.16 s/launch; both stream ~1–1.5 jobs/s.

Results (stock meter):

Config Xelis Octopus
default (internal_kernel_count=100, log_interval_hash=10000) 0 H/s — board never shown (even with valid shares) 0 H/s — board never shown
--internal_kernel_count=1 --log_interval_hash=500 shown continuously, jittery 4.0–6.1 KH shown continuously, jittery 12–25 MH
--internal_kernel_count=5 --log_interval_hash=500 shown continuously, smooth 4.5–5.0 KH flickers — shown on ~57% of stats ticks

You're right that --internal_kernel_count=1 recovers a reading on the stock meter for both algos — my "no value recovers it" was wrong. Only the default =100 shows 0.

Why it behaves this way. The stock meter is a count-gated accumulator tied to job arrival, not a wall clock: getHashrate() only recomputes once kernelExecuted >= internal_kernel_count since the last reset (as nonces / elapsed); below the threshold it re-publishes the last stored value, and the board is hidden whenever that value is 0. updateBatchNonce() zeroes kernelExecuted and the stored hashrate on job/constant updates — so the window is bounded by job cadence, not time. A memory-hard kernel completes far fewer than 100 launches per job window, so the threshold is never met → 0 → board hidden (the default case, even while valid shares land).

Lowering the threshold lets it publish, but each sample is then computed from very few launches over a short, ragged window — so the "good-looking" rows aren't actually good:

  • =1 is live but jittery — Xelis recomputed 52×/66 s (≈ once per launch), swinging ±20%; Octopus ±35%.
  • =5 on Xelis recomputed only ~9×/60 s, so that "steady 4.5–5.0 KH" is a stale value frozen ~7 s at a time; on Octopus it instead flickers — the board blanks on ~43% of ticks because the faster job stream zeroes it before it re-reaches 5.

--log_interval_hash can't change any of this — it only sets how often the gated value is sampled, not whether the gate opens. And there's no single internal_kernel_count that's right across algos, because the workable threshold depends on each algo's launch time vs. its job/reset cadence.

So this PR isn't a strict prerequisite — but a hashrate readout is a basic function that shouldn't need per-algorithm flag tuning to show a number at all, and even when tuned the number is jittery, stale, or flickering. A wall-clock window (nonces over elapsed time, decoupled from job resets) removes the dependency: correct out of the box for fast and slow kernels alike, reading 0 only when the GPU is genuinely idle.

If the zero-config smoothing isn't something you want here, I'm happy to:

  1. Re-scope to the minimal fix — keep the count-gate as-is but stop updateBatchNonce() from zeroing the stored hashrates every job. That alone removes the Octopus flicker, no EWMA, much smaller diff; or
  2. Close this and add a short docs note: slow/memory-hard algos need --internal_kernel_count=1 (Octopus) or ~5 (Xelis).

Which direction do you prefer? Happy to share the raw logs/configs.

@luminousmining

Copy link
Copy Markdown
Owner

This PR raises an important point regarding the stats display.
I propose a completely different direction!

We can add a new flag --internal_accumulate_hash in the Kernel section.
By default the value is false.
If this option is false then job updates reset the hash count accumulation between jobs, otherwise we do not reset — except in the case of an updateMemory where we always reset!

So everything is resolved with a simple flag --internal_accumulate_hash=true, otherwise the current behavior is preserved.

@yuzi-co

yuzi-co commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

A couple of thoughts on the --internal_accumulate_hash direction:

Default value. I'd argue it should default to true, not false. With it off, the out-of-the-box experience for any memory-hard algo (Xelis, Octopus, Ergo) is an unchanging 0 H/s on the dashboard even while valid shares are landing — the exact bug this PR set out to fix. A user has to already know to flip --internal_accumulate_hash=true and drop --internal_kernel_count before they see a number at all. Defaulting off preserves today's behavior, but today's behavior is the broken case. Defaulting on (with the updateMemory reset you described) keeps fast algos correct and makes the slow ones work without flag archaeology.

EWMA vs raw window. I still think the smoothed meter is the better mechanism, and not just cosmetically. Even with accumulation on, the raw nonces / elapsed over a count-gated window is computed from very few launches on a slow kernel, so each published sample is a short, ragged slice — the readings in my earlier table swing ±20–35% at internal_kernel_count=1, and go stale (frozen ~7 s) as you raise it. A wall-clock EWMA decouples the displayed number from job/launch cadence entirely: it reads correctly for fast and slow kernels alike and only falls to 0 when the GPU is genuinely idle. The accumulate flag fixes whether a number shows; EWMA fixes whether that number is steady and trustworthy.

Your call on the project, but default-on + EWMA is where I think the better UX lands.

@luminousmining

Copy link
Copy Markdown
Owner

If we add --internal_accumulate_hash then we will have this accumulation of nonces / elapsed since there will no longer be any data fetch on job updates — so a true accumulation of the hash count, meaning the number of nonce attempts, and therefore the hashrate.

We can therefore set --internal_accumulate_hash=true without defining internal_kernel_count or log_interval_hash.

The purpose of --internal_accumulate_hash is precisely to no longer fetch data when a job is updated (except in the case of a memory update).

And yes, setting --internal_accumulate_hash to true by default seems like a good compromise and simpler for the end user.

…cumulate_hash)

Slow / memory-hard kernels (Xelis, Octopus, Ergo) complete far fewer than
--internal_kernel_count launches per job window, so the count-gated meter never
recomputes and the dashboard reads 0 H/s even while valid shares land:
updateBatchNonce() zeroes the stored hashrate on every job update.

Add a Kernel-section flag --internal_accumulate_hash (default true). When set,
job and constant updates no longer reset the hash count or the elapsed window;
only a memory rebuild (e.g. DAG) resets. The meter then accumulates a true
nonces / elapsed average that spans jobs. Setting it false restores the previous
per-job reset behaviour.

- config/CLI: new --internal_accumulate_hash bool wired into DeviceOccupancy.
- device: gate the epoch/period reset and the updateBatchNonce reset behind
  (!accumulate || memory update); memory rebuilds always reset.
- docs: document the flag in PARAMETERS.md.
- test: statistical unit tests for the accumulate-vs-reset invariant.
@yuzi-co yuzi-co force-pushed the feat/hashrate-time-windowed-meter branch from 4313b96 to 6b26a40 Compare June 13, 2026 13:09
@yuzi-co yuzi-co changed the title feat(stats): time-windowed hashrate meter feat(stats): keep the hashrate meter alive across jobs (--internal_accumulate_hash) Jun 13, 2026
@yuzi-co

yuzi-co commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Reworked this PR to your --internal_accumulate_hash proposal, defaulting to true as we agreed.

With it on, job and constant updates no longer fetch/reset the meter — the hash count and elapsed window accumulate across jobs, and only an updateMemory (DAG rebuild) resets — so the dashboard publishes a true nonces / elapsed average instead of sticking at 0 on slow / memory-hard kernels. --internal_accumulate_hash=false restores the exact previous per-job reset behaviour.

One addition beyond the plain flag: the work loop opens the meter window once at start-up, because a non-epoch algorithm's first job is a constant update (not a memory update), so under accumulation its clock would otherwise never start.

Verified live on an RX 9070 XT against Octopus (de.conflux.herominers.com:1170), same binary, ~95 s each:

  • --internal_accumulate_hash=true (default): the HASHRATE board renders every 10 s with a steady 24.30 MH/s.
  • --internal_accumulate_hash=false: the board never appears (0 H/s) — today's behaviour, reproduced.

unit_test stays green (the 4 allDeviceFindNonce progpow failures are pre-existing and unrelated). The EWMA meter from the previous revision is preserved on feat/hashrate-ewma-meter if you'd ever like to revisit smoothing. Force-pushed onto current main (the old revision had gone stale/conflicting).

Comment thread sources/device/device.cpp Outdated
if (nextjobInfo.epoch != currentJobInfo.epoch || nextjobInfo.period != currentJobInfo.period)
{
miningStats.reset();
// When accumulating, the meter window spans jobs; only a memory rebuild

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment

Comment thread sources/device/device.cpp Outdated

////////////////////////////////////////////////////////////////////////////
updateBatchNonce();
// Reset the meter on memory rebuilds always; on plain job/constant updates

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment

Comment thread sources/device/device.cpp Outdated
updateBatchNonce();
// Reset the meter on memory rebuilds always; on plain job/constant updates
// only when not accumulating, so the window can span jobs.
updateBatchNonce(false == config.occupancy.accumulateHash || true == needUpdateMemory);

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move false == config.occupancy.accumulateHash || true == needUpdateMemory in variable resetStats

Comment thread sources/device/device.cpp
miningStats.resetHashrate();
miningStats.reset();

////////////////////////////////////////////////////////////////////////////

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remve comment

Comment thread sources/device/device.cpp Outdated
computing.store(true, boost::memory_order::seq_cst);

////////////////////////////////////////////////////////////////////////////
// Open the hashrate window once at start-up. When accumulating, a job that

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment

#include <gtest/gtest.h>


using statistical::Statistical;

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never declare using for namespace out of scope.

@@ -0,0 +1,82 @@
#include "statistical/statistical.hpp"

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use <> instead of ""
Respect ordre include

Comment thread sources/statistical/tests/statistical.cpp
…test style

- device.cpp: remove the explanatory comments flagged in review; hoist the
  reset predicate into a local 'bool const resetStats' before updateBatchNonce
- statistical test: includes use <> in STL -> third-party -> internal order,
  drop the file-scope 'using statistical::Statistical', and rename the TEST
  cases to lowerCamelCase per the naming convention
@yuzi-co

yuzi-co commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

Pushed c9c947e addressing the review.

device.cpp

  • Removed the four explanatory comment blocks you flagged (596 / 645 / 668 / 710).
  • Hoisted the reset predicate into a local before the call:
    bool const resetStats{ false == config.occupancy.accumulateHash || true == needUpdateMemory };
    updateBatchNonce(resetStats);

statistical/tests/statistical.cpp

  • Includes now use <> in STL → third-party → internal order (<chrono>/<thread><gtest/gtest.h><statistical/statistical.hpp>).
  • Dropped the file-scope using statistical::Statistical;; the type is now spelled statistical::Statistical at each use.
  • Renamed all three test cases to lowerCamelCase (workedWindowPublishesPositiveHashrate, resetHashrateZeroesDisplayedValue, accumulateAcrossJobUpdatePreservesHashrate).

Line-scoped clang-format-15 over the changed lines is clean.

@luminousmining luminousmining merged commit 3065b80 into luminousmining:main Jun 14, 2026
12 checks passed
@yuzi-co yuzi-co deleted the feat/hashrate-time-windowed-meter branch June 19, 2026 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants