Commit 2b65ff3

feat(stats): time-windowed hashrate meter
Replace the count-gated, per-job-reset hashrate with a free-running nonce accumulator sampled over a wall-clock window (EWMA-smoothed) -- the model used by ethminer/XMRig. Fixes the dashboard reporting 0 H/s for slow / memory-hard kernels (XelisHash V3, Octopus, large-DAG) under fast pool job streams, where kernelExecuted was zeroed on every mining.notify before it could reach the 100-launch publish threshold. The displayed value is also smoother than before.

- Statistical: nonceAccumulator (std::atomic<uint64_t>, relaxed) counts the actual per-launch nonces, correct across occupancy changes; the mining thread fetch_adds and the stats thread exchange()s it to 0, so the two threads never race. sampleWindow() folds one sample into the EWMA per window: a working window contributes its measured rate, an empty/stalled window contributes 0 H/s so the value decays toward 0 instead of holding a stale rate (a hung GPU stays visible). A slow but working kernel still completes >= 1 launch per window, so it never produces an empty window and never decays. Pure computeHashrate/smoothHashrate helpers.
- Device: getHashrate() is a pure const read; new sampleHashrate() (stats thread) owns the window; updateBatchNonce() no longer resets the window per job; loopDoWork() opens the first window before the loop; updateJob() skips the redundant pre-rebuild reset when memory is being rebuilt.
- Stats thread samples via sampleHashrate(); REST API and benchmark unchanged.
- --internal_kernel_count no longer gates the display (kept for compatibility).
- Unit tests for the meter (sources/statistical/tests).
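The meter described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the code from this commit: `WindowedMeter`, `addLaunch` and `sample` are hypothetical names, and the window length is passed in as seconds rather than read from the project's chrono so the behaviour is easy to follow.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative sketch only: WindowedMeter, addLaunch and sample are
// hypothetical names, not the identifiers used in this commit.
class WindowedMeter
{
public:
    // Mining thread: record the nonces covered by one kernel launch.
    void addLaunch(std::uint64_t const nonces)
    {
        accumulator.fetch_add(nonces, std::memory_order_relaxed);
    }

    // Stats thread: close a window that lasted elapsedSeconds, fold its rate
    // into the EWMA, and return the smoothed H/s.
    double sample(double const elapsedSeconds, double const factor = 0.5)
    {
        assert(factor >= 0.0 && factor <= 1.0);

        // Drain the counter in one atomic step, so a launch recorded
        // concurrently lands in either this window or the next, never both.
        std::uint64_t const windowNonce{ accumulator.exchange(0, std::memory_order_relaxed) };
        if (elapsedSeconds <= 0.0)
        {
            return smoothed; // zero-length window: carries no information
        }

        double const rate{ static_cast<double>(windowNonce) / elapsedSeconds };

        // Seed from the first sample, then blend; an empty window has
        // rate == 0 and pulls the smoothed value toward 0.
        smoothed = (smoothed <= 0.0) ? rate : (factor * rate) + ((1.0 - factor) * smoothed);
        return smoothed;
    }

private:
    std::atomic<std::uint64_t> accumulator{ 0 };
    double smoothed{ 0.0 };
};
```

With three 1000-nonce launches in a 10-second window, `sample(10.0)` returns 300 H/s; a following empty 10-second window halves that to 150 H/s rather than holding the stale rate.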
1 parent 076dada commit 2b65ff3

10 files changed

Lines changed: 250 additions & 33 deletions

documentation/ARCHITECTURE.md

Lines changed: 1 addition & 1 deletion
@@ -458,7 +458,7 @@ Loaded from CLI by `common::Cli`, supports per-device pool overrides:
 
 **Statistics** (`statistical/statistical.hpp`):
 - Tracks kernel executions, valid/invalid shares, elapsed time
-- `getHashrate()` computes MH/s from kernel count × batch nonce / time
+- `sampleHashrate()` (stats thread) computes H/s from accumulated nonces over a wall-clock window, EWMA-smoothed; `getHashrate()` is a pure read of the last sample
 - Printed to console every ~10 seconds by the stats thread
 
 **REST API** (`api/api.hpp`):

documentation/PARAMETERS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -97,7 +97,7 @@ N/A : No default value is set.
9797
| `--blocks` || N/A | Set occupancy blocks. | `--blocks=128` |
9898
| `--occupancy` || false | System will define the best occupancy for kernel. | `--occupancy=<true\|false>` |
9999
| `--internal_loop` || 1 | Set internal loop for kernel. | `--internal_loop=1` |
100-
| `--internal_kernel_count` || 1 | Set internal loop for kernel. This defines the minimum number of times the kernel must be called to display statistics | `--internal_kernel_count=1` |
100+
| `--internal_kernel_count` || 1 | Deprecated: no longer affects the hashrate display (the dashboard now uses a time-windowed meter). Still accepted for backwards compatibility. | `--internal_kernel_count=1` |
101101
| `--cuda_context` || auto | Set CUDA context. | `--cuda_context=<auto\|blocking\|yield\|spin>` |
102102

103103
## Smart Mining

sources/device/device.cpp

Lines changed: 27 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -507,18 +507,20 @@ void device::Device::increaseShare(bool const isValid)
507507
}
508508

509509

510-
double device::Device::getHashrate()
510+
double device::Device::getHashrate() const
511511
{
512-
uint32_t const executeCount{ miningStats.getKernelExecutedCount() };
513-
common::Config& config{ common::Config::instance() };
512+
// Pure read of the last sampled value. The stats thread drives the actual
513+
// measurement through sampleHashrate(); other consumers (REST API,
514+
// benchmark display) must not reset the window.
515+
return miningStats.getHashrate();
516+
}
514517

515-
if (config.occupancy.kernelMinimunExecuteNeeded <= executeCount)
516-
{
517-
miningStats.stop();
518-
miningStats.updateHashrate();
519-
miningStats.reset();
520-
}
521518

519+
double device::Device::sampleHashrate()
520+
{
521+
// Owned by the stats thread: close the current wall-clock window, fold it
522+
// into the smoothed value, and open the next one.
523+
miningStats.sampleWindow();
522524
return miningStats.getHashrate();
523525
}
524526

@@ -590,7 +592,12 @@ bool device::Device::updateJob()
590592
uint64_t const currentAtomicMemory{ synchronizer.memory.get() };
591593

592594
////////////////////////////////////////////////////////////////////////////
593-
if (nextjobInfo.epoch != currentJobInfo.epoch || nextjobInfo.period != currentJobInfo.period)
595+
// A period/epoch change restarts the hashrate window. When this change also
596+
// rebuilds memory, the reset below (after the rebuild) is the meaningful one
597+
// -- it excludes DAG-build time from the window -- so skip the redundant
598+
// pre-rebuild reset here in that case.
599+
if ((nextjobInfo.epoch != currentJobInfo.epoch || nextjobInfo.period != currentJobInfo.period)
600+
&& false == needUpdateMemory)
594601
{
595602
miningStats.reset();
596603
}
@@ -655,9 +662,12 @@ void device::Device::updateBatchNonce()
655662
}
656663

657664
////////////////////////////////////////////////////////////////////////////
665+
// updateJob() calls this on every pool job. Only the per-launch nonce stride
666+
// is refreshed here -- the hashrate window (accumulator + chrono) must survive
667+
// job updates, otherwise a slow kernel under a fast job stream never measures a
668+
// full window. The window is reset only on a memory rebuild or a period change;
669+
// the stats thread rolls it forward (sampleWindow) at each sample.
658670
miningStats.setBatchNonce(resolver->getBlocks() * resolver->getThreads() * internalLoop);
659-
miningStats.resetHashrate();
660-
miningStats.reset();
661671
}
662672

663673

@@ -691,6 +701,11 @@ void device::Device::loopDoWork()
691701
////////////////////////////////////////////////////////////////////////////
692702
computing.store(true, boost::memory_order::seq_cst);
693703

704+
////////////////////////////////////////////////////////////////////////////
705+
// Open the first hashrate window now that a job is in hand. From here the
706+
// window is owned by the stats thread (sampleHashrate) and never reset per job.
707+
miningStats.reset();
708+
694709
////////////////////////////////////////////////////////////////////////////
695710
deviceDebug() << "Start working!";
696711
while (true == isAlive() && nullptr != resolver)

sources/device/device.hpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,8 @@ namespace device
6161
bool isComputing() const;
6262
void update(bool const memory, bool const constants, stratum::StratumJobInfo const& newJobInfo);
6363
void increaseShare(bool const isValid);
64-
double getHashrate();
64+
double getHashrate() const;
65+
double sampleHashrate();
6566
stratum::Stratum* getStratum();
6667
stratum::StratumSmartMining* getStratumSmartMining();
6768
statistical::Statistical::ShareInfo getShare();

sources/device/device_manager_loop_statistical.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@ void device::DeviceManager::loopStatistical()
9595
}
9696

9797
///////////////////////////////////////////////////////////////////
98-
auto const hashrate{ device->getHashrate() };
98+
auto const hashrate{ device->sampleHashrate() };
9999
statistical::Statistical::ShareInfo shareInfo{ device->getShare() };
100100

101101
///////////////////////////////////////////////////////////////////

sources/statistical/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,4 +22,7 @@ if (BUILD_EXE_UNIT_TEST)
2222
)
2323
endif()
2424

25+
add_subdirectory(tests)
26+
2527
set(SOURCES_STRATISTICAL ${HEADERS} ${SOURCES} PARENT_SCOPE)
28+
set(SOURCES_STATISTICAL_TESTS ${SOURCES_STATISTICAL_TESTS} PARENT_SCOPE)

sources/statistical/statistical.cpp

Lines changed: 60 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,4 @@
11
#include <common/cast.hpp>
2-
#include <common/log/log.hpp>
32
#include <statistical/statistical.hpp>
43

54

@@ -10,7 +9,7 @@ void statistical::Statistical::setChronoUnit(common::CHRONO_UNIT newUnit)
109
{
1110
case common::CHRONO_UNIT::SEC:
1211
{
13-
chronoTime = 1;
12+
chronoTime = 1.0;
1413
break;
1514
}
1615
case common::CHRONO_UNIT::MS:
@@ -47,13 +46,15 @@ void statistical::Statistical::stop()
4746
void statistical::Statistical::reset()
4847
{
4948
kernelExecuted = 0u;
49+
nonceAccumulator.store(0ull, std::memory_order_relaxed);
5050
start();
5151
}
5252

5353

5454
void statistical::Statistical::increaseKernelExecuted()
5555
{
5656
++kernelExecuted;
57+
nonceAccumulator.fetch_add(batchNonce, std::memory_order_relaxed);
5758
}
5859

5960

@@ -75,13 +76,17 @@ uint64_t statistical::Statistical::getBatchNonce() const
7576
}
7677

7778

79+
uint64_t statistical::Statistical::getNonceAccumulator() const
80+
{
81+
return nonceAccumulator.load(std::memory_order_relaxed);
82+
}
83+
84+
7885
void statistical::Statistical::updateHashrate()
7986
{
8087
///////////////////////////////////////////////////////////////////////////
8188
elapsed = chrono.elapsed(chronoUnit);
82-
double const diffTime{ chronoTime / elapsed };
83-
uint64_t const totalNonce{ batchNonce * kernelExecuted };
84-
double const values{ totalNonce * diffTime };
89+
double const values{ computeHashrate(batchNonce * kernelExecuted, elapsed, chronoTime) };
8590

8691
///////////////////////////////////////////////////////////////////////////
8792
if (values > 0.0)
@@ -91,10 +96,57 @@ void statistical::Statistical::updateHashrate()
9196
}
9297

9398

94-
void statistical::Statistical::resetHashrate()
99+
double statistical::Statistical::computeHashrate(
100+
uint64_t const totalNonce,
101+
uint64_t const elapsedTicks,
102+
double const ticksPerSecond)
95103
{
96-
kernelExecuted = 0u;
97-
hashrates = 0.0;
104+
if (0ull == elapsedTicks)
105+
{
106+
return 0.0;
107+
}
108+
return (castDouble(totalNonce) * ticksPerSecond) / castDouble(elapsedTicks);
109+
}
110+
111+
112+
double statistical::Statistical::smoothHashrate(double const previous, double const sample, double const factor)
113+
{
114+
///////////////////////////////////////////////////////////////////////////
115+
// Seed directly from the first sample so the meter does not slowly ramp up
116+
// from 0; afterwards blend to keep the displayed value stable.
117+
if (0.0 >= previous)
118+
{
119+
return sample;
120+
}
121+
return (factor * sample) + ((1.0 - factor) * previous);
122+
}
123+
124+
125+
void statistical::Statistical::sampleWindow()
126+
{
127+
///////////////////////////////////////////////////////////////////////////
128+
stop();
129+
elapsed = chrono.elapsed(chronoUnit);
130+
131+
///////////////////////////////////////////////////////////////////////////
132+
// Fold one sample into the EWMA per window. A working window contributes its
133+
// measured rate; an empty window (a stall, or a kernel slower than the whole
134+
// sampling interval) contributes 0 H/s, so the displayed value decays toward
135+
// 0 instead of holding a stale value -- a dead GPU must stay visible. A
136+
// slow-but-working kernel still completes >= 1 launch per window, so it never
137+
// produces an empty window and never decays. Zero-length windows carry no
138+
// information and are skipped.
139+
uint64_t const windowNonce{ nonceAccumulator.exchange(0ull, std::memory_order_relaxed) };
140+
if (0ull < elapsed)
141+
{
142+
double const sample{ computeHashrate(windowNonce, elapsed, chronoTime) };
143+
hashrates = smoothHashrate(hashrates, sample, HASHRATE_SMOOTHING_FACTOR);
144+
}
145+
146+
///////////////////////////////////////////////////////////////////////////
147+
// Open the next window. kernelExecuted is owned by the mining (device) thread
148+
// and is not touched here, so the stats thread never races it.
149+
start();
98150
}
99151

100152

sources/statistical/statistical.hpp

Lines changed: 19 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,15 @@
11
#pragma once
22

33

4+
#include <atomic>
5+
46
#include <common/chrono.hpp>
57

68

79
namespace statistical
810
{
11+
constexpr double HASHRATE_SMOOTHING_FACTOR{ 0.5 };
12+
913
struct Statistical
1014
{
1115
public:
@@ -24,22 +28,28 @@ namespace statistical
2428
uint32_t getKernelExecutedCount() const;
2529
void setBatchNonce(uint64_t const newBatchNonce);
2630
uint64_t getBatchNonce() const;
31+
uint64_t getNonceAccumulator() const;
2732
void updateHashrate();
28-
void resetHashrate();
33+
void sampleWindow();
2934
double getHashrate() const;
3035
ShareInfo& getShares();
3136
ShareInfo getShares() const;
3237
uint64_t getElapsed() const;
3338
common::CHRONO_UNIT getChronoUnit() const;
3439

40+
static double
41+
computeHashrate(uint64_t const totalNonce, uint64_t const elapsedTicks, double const ticksPerSecond);
42+
static double smoothHashrate(double const previous, double const sample, double const factor);
43+
3544
private:
36-
common::CHRONO_UNIT chronoUnit{ common::CHRONO_UNIT::US };
37-
common::Chrono chrono{};
38-
double chronoTime{ common::SEC_TO_US };
39-
ShareInfo shares{};
40-
uint64_t batchNonce{ 0ull };
41-
uint64_t elapsed{ 0ull };
42-
double hashrates{ 0.0 };
43-
uint32_t kernelExecuted{ 0u };
45+
common::CHRONO_UNIT chronoUnit{ common::CHRONO_UNIT::US };
46+
common::Chrono chrono{};
47+
double chronoTime{ common::SEC_TO_US };
48+
ShareInfo shares{};
49+
uint64_t batchNonce{ 0ull };
50+
std::atomic<uint64_t> nonceAccumulator{ 0ull };
51+
uint64_t elapsed{ 0ull };
52+
double hashrates{ 0.0 };
53+
uint32_t kernelExecuted{ 0u };
4454
};
4555
}
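
To make the decay behaviour concrete, here is a short derivation, assuming the blend in `smoothHashrate()` and the `HASHRATE_SMOOTHING_FACTOR` of 0.5 declared in this header. The symbols $h_n$ (displayed value after window $n$), $s_n$ (rate measured in window $n$) and $\alpha$ (smoothing factor) are notation introduced here, not names from the code.

```latex
h_n = \alpha\, s_n + (1 - \alpha)\, h_{n-1}, \qquad \alpha = 0.5

% k consecutive empty windows (s_n = 0) after a steady value h_0:
h_k = (1 - \alpha)^k\, h_0 = \frac{h_0}{2^k}

% e.g. k = 3 empty windows:
h_3 = \frac{h_0}{8} = 0.125\, h_0
```

A stalled device therefore falls to about 12.5% of its last steady value after three empty windows. If the stats thread samples at the ~10 second print cadence mentioned in ARCHITECTURE.md (an assumption; the sampling interval is not shown in this diff), that is roughly 30 seconds.
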
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+file(GLOB HEADERS "*.hpp")
+file(GLOB SOURCES "*.cpp")
+
+if (BUILD_EXE_UNIT_TEST)
+    target_sources(${UNIT_TEST_EXE} PUBLIC
+        ${HEADERS}
+        ${SOURCES}
+    )
+endif()
+
+set(SOURCES_STATISTICAL_TESTS ${HEADERS} ${SOURCES} PARENT_SCOPE)
