feat(stats): Add 1% low FPS tracking #2661

Open
githubawn wants to merge 8 commits into TheSuperHackers:main from githubawn:feature/fps-1percent-low-hud

Conversation

@githubawn githubawn commented Apr 29, 2026

This PR adds a 1% low FPS metric to the existing FPS counter HUD, displayed in parentheses next to the average FPS. The 1% low is a standard performance metric used to surface frame time spikes that the average FPS hides. Inspired by #1942.

Added 1% low FPS display to HUD counter
Added m_renderFpsLowString and supporting UI members
Added RenderFpsLowColor configuration to InGameUI INI
Increased history to 5,000 time-bounded frames
Implemented rolling 0.5s window for average FPS
Implemented rolling 3.0s window for 1% lows
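
The rolling time-bounded average listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `averageFpsOverWindow` and its `std::vector` input are hypothetical stand-ins for the ring-buffer members, and the sketch assumes per-frame durations are stored in seconds, oldest first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the PR's implementation): average FPS over the
// most recent `windowSeconds` of frames. `durations` holds per-frame times
// in seconds, oldest first.
float averageFpsOverWindow(const std::vector<float>& durations, float windowSeconds)
{
    float elapsed = 0.0f;
    std::size_t frames = 0;

    // Walk backwards from the newest frame until the window is covered.
    for (std::size_t i = durations.size(); i > 0 && elapsed < windowSeconds; --i)
    {
        elapsed += durations[i - 1];
        ++frames;
    }

    // Frames divided by elapsed time is a time-weighted mean: a long frame
    // contributes more time than a short one.
    return elapsed > 0.0f ? static_cast<float>(frames) / elapsed : 0.0f;
}
```

Dividing the frame count by the elapsed time, rather than averaging the per-frame FPS values, keeps short bursts of very fast frames from inflating the result.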

The following screenshot from AOD Cobalt Rush shows the 1% low FPS overlay compared to CapFrameX (an external benchmarking tool, centered right), demonstrating the value of surfacing this metric separately from the average.

[Screenshot: lowfps]

This change was generated with AI assistance. All generated code has been reviewed, tested, and verified for correctness. The implementation went through multiple iterations, including fundamental changes to the underlying approach, as well as passes to apply simplifications, fix inconsistencies, and optimize performance. Both Generals and GeneralsMD implementations are included in this PR with identical code.

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

Adds a 1% low FPS metric to the in-game HUD for both Generals and GeneralsMD. The metric is computed using a rolling 3-second time-bounded sample window with std::nth_element to find the bottom 1% of frames, displayed alongside the existing average FPS counter.

  • Core metric engine (W3DDisplay): replaces the old 30-frame static history with 5000-slot instance-member ring buffers; average FPS switches to a 0.5s rolling window; 1% low FPS is recalculated once per second via calculateLow1PercentFPS.
  • HUD display (InGameUI): adds m_renderFpsLowString with full lifecycle management (alloc, font, free) and inserts it between the average and cap labels with half-gap spacing.
  • Interface (Display.h): new pure virtual getLow1PercentFPS() satisfied by W3DDisplay and stub overrides in GUIEditDisplay.
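
As a rough illustration of the selection step described in the summary, the sketch below computes a 1% low with `std::nth_element`. It is an assumption-laden example rather than the PR's `calculateLow1PercentFPS`: the function name `low1PercentFps` is hypothetical, the sample count is rounded up so at least one sample is always used, and the 1% low is taken here as the mean of the slowest 1% of FPS samples (other tools define it as a percentile instead).

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: mean FPS of the slowest 1% of samples.
// The samples are taken by value because std::nth_element reorders them.
float low1PercentFps(std::vector<float> fpsSamples)
{
    if (fpsSamples.empty())
        return 0.0f;

    // Integer ceiling of 1% so a small window still yields one sample.
    const std::size_t count = (fpsSamples.size() + 99) / 100;

    // Partition so the `count` lowest values sit at the front (unsorted).
    std::nth_element(fpsSamples.begin(),
                     fpsSamples.begin() + (count - 1),
                     fpsSamples.end());

    const float sum = std::accumulate(fpsSamples.begin(),
                                      fpsSamples.begin() + count,
                                      0.0f);
    return sum / static_cast<float>(count);
}
```

`std::nth_element` runs in linear time on average, which is why it is preferable to a full sort when only the bottom slice of the window matters.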

Confidence Score: 5/5

Safe to merge; the feature is self-contained, the algorithm is correct, and the only findings are minor consistency nits that do not affect correctness.

The FPS history ring buffers are properly bounded, std::nth_element is used correctly, all new display strings are allocated and freed symmetrically, and the virtual interface is fully satisfied by both W3DDisplay and the GUIEdit stub. The residual static throttle timer is a consistency gap but causes no wrong data or crashes.

Both W3DDisplay.cpp files (Generals and GeneralsMD) warrant a second look for the static lastLowUpdate variable; GeneralsMD/InGameUI.cpp also includes an undocumented format change to the FPS cap label.

Important Files Changed

| Filename | Overview |
|---|---|
| Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp | Core FPS metric implementation: migrates history tracking from function-local statics to instance members, adds time-bounded rolling average and 1% low calculation. One static throttle timer (lastLowUpdate) was not migrated with the rest. |
| GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp | Identical implementation to Generals build with the same static lastLowUpdate issue; also contains bundled FPS cap format change (▲ symbols) not present in Generals. |
| Generals/Code/GameEngine/Source/GameClient/InGameUI.cpp | Adds m_renderFpsLowString display string lifecycle (alloc, free, font, draw) and formats the 1% low as (%u). Layout adjustments add half-gap spacing between FPS elements. |
| GeneralsMD/Code/GameEngine/Source/GameClient/InGameUI.cpp | Same as Generals InGameUI but uses ▼%u format for 1% low and also reformats the FPS cap label to ▲%u/▲X, diverging from the Generals implementation. |
| Core/GameEngine/Include/GameClient/Display.h | Adds pure virtual getLow1PercentFPS() to the Display interface; straightforward and correct. |
| Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h | Declares new methods and the 5000-element history/sort buffers as instance members; replaces the old 30-frame static approach. |
| GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h | Mirror of Generals W3DDisplay.h changes; identical and correct. |
| Generals/Code/Tools/GUIEdit/Include/GUIEditDisplay.h | Adds stub getLow1PercentFPS() { return 0; } override to satisfy the new pure virtual; correct. |
| GeneralsMD/Code/Tools/GUIEdit/Include/GUIEditDisplay.h | Same stub override as the Generals GUIEditDisplay; correct. |

Sequence Diagram

sequenceDiagram
    participant Draw as W3DDisplay::draw()
    participant UPM as updatePerformanceMetrics()
    participant AFS as addFpsSample()
    participant CAFPS as calculateAverageFPS(0.5s)
    participant CL1 as calculateLow1PercentFPS(3.0s)
    participant IGUI as InGameUI::updateRenderFpsString()
    participant HUD as HUD (m_renderFpsString / m_renderFpsLowString)

    Draw->>UPM: called each frame
    UPM->>AFS: elapsedSeconds
    AFS-->>UPM: updates m_fpsHistory / m_durationHistory
    UPM->>CAFPS: "windowSeconds=0.5"
    CAFPS-->>UPM: m_averageFPS
    UPM->>CL1: every 1000ms via timeGetTime()
    CL1-->>UPM: m_low1PercentFPS (nth_element on m_sortBuffer)
    IGUI->>Draw: getAverageFPS() / getLow1PercentFPS()
    Draw-->>IGUI: m_averageFPS, m_low1PercentFPS
    IGUI->>HUD: setText on m_renderFpsString / m_renderFpsLowString
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:1079-1085
The `lastLowUpdate` throttle timer is declared as a function-local `static`, while the rest of the PR specifically migrated `lastUpdateTime64`, `historyOffset`, and `fpsHistory` from statics to instance members for exactly this reason. Leaving this one as a static means it persists across display resets (e.g., map reloads) and would be shared across multiple `W3DDisplay` instances. It should be an instance member like `m_lastLowUpdateMs` initialized in the constructor and reset in `reset()`.

```suggestion
	UnsignedInt now = timeGetTime();
	if (now - m_lastLowUpdateMs >= 1000)
	{
		m_lastLowUpdateMs = now;
		m_low1PercentFPS = calculateLow1PercentFPS(3.0f);
	}
```
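
To illustrate why per-instance throttle state is easier to reason about than a function-local `static`, here is a small self-contained sketch. `LowFpsThrottle` is a hypothetical class, not part of the PR; it assumes a 32-bit millisecond tick counter of the kind `timeGetTime()` returns and relies on unsigned subtraction staying correct across counter wraparound.

```cpp
#include <cstdint>

// Hypothetical stand-in for throttle state owned by the display object
// instead of a function-local static.
class LowFpsThrottle
{
public:
    // Returns true at most once per `intervalMs`. `nowMs` is a 32-bit
    // millisecond tick counter (assumed to behave like timeGetTime()).
    bool shouldUpdate(std::uint32_t nowMs, std::uint32_t intervalMs = 1000)
    {
        // Unsigned subtraction remains valid when the counter wraps.
        if (nowMs - m_lastUpdateMs < intervalMs)
            return false;

        m_lastUpdateMs = nowMs;
        return true;
    }

    // Intended to be called from the owner's reset path so no timing
    // state survives a display reset.
    void reset() { m_lastUpdateMs = 0; }

private:
    std::uint32_t m_lastUpdateMs = 0;
};
```

Each instance carries its own timer, so two displays (or one display across a reset) cannot influence each other's update cadence.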

### Issue 2 of 3
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:1130-1136
Same `static UnsignedInt lastLowUpdate` issue as in the Generals build: this should be an instance member (`m_lastLowUpdateMs`) to stay consistent with the rest of the refactoring and to reset cleanly on display recreation.

```suggestion
	UnsignedInt now = timeGetTime();
	if (now - m_lastLowUpdateMs >= 1000)
	{
		m_lastLowUpdateMs = now;
		m_low1PercentFPS = calculateLow1PercentFPS(3.0f);
	}
```

### Issue 3 of 3
GeneralsMD/Code/GameEngine/Source/GameClient/InGameUI.cpp:6176-6191
**Divergent format strings between Generals and GeneralsMD**

The PR description states both implementations are "identical code," but they differ in two ways unique to GeneralsMD: (1) the 1% low FPS label uses `▼%u` (`\x25BC`) here versus `(%u)` in Generals, and (2) GeneralsMD also reformats the FPS cap display from `[%u]` to `▲%u`/`▲X`. The Unicode symbol change for the cap (`▲`) is an undocumented behavioral change bundled into this PR. If the differing formats are intentional design choices for each build, the PR description should reflect that.

Reviews (7): Last reviewed commit: "add icons and slightly brighten lowfps d..."

githubawn added 4 commits May 1, 2026 01:13
Move FPS history state into W3DDisplay members.
Implement accurate time-based windowing for frame metrics.
Use ceiling logic for improved 1% low accuracy.
Optimize percentile calculation using efficient selection algorithm.
Rename and centralize performance update call sites.
Increase history buffer for stable high-FPS monitoring.
Update average FPS math to use time-weighted mean.
Move sortBuffer to class members for consistency.
@githubawn githubawn requested a review from Skyaero42 April 30, 2026 23:53
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Real m_low1PercentFPS; ///<1% low fps.
Real m_currentFPS; ///<current fps value.

enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond

I think this number needs evaluation but I would like input from others.

As it stands, 5000 samples in three Real arrays require 60 kB of memory, just for the FPS trackers.
I think the question should be: given a 3-second timeframe for the 1% low FPS, at what 3-second average FPS is the low FPS no longer relevant? Say you average 300 FPS: what is the chance that the 1% low is so low that it is still relevant as a performance metric? If 300 FPS is the upper bound, only 900 samples need to be stored. That is only 18% of the memory needed by the current setting.
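
The arithmetic behind these figures can be checked with a short compile-time sketch. It assumes `Real` is a 4-byte float (which the 60 kB figure implies); `historyBytes` and `samplesForWindow` are hypothetical helper names used only for illustration.

```cpp
#include <cstddef>

// Assumption: Real is a 4-byte float, as the 60 kB figure implies.
using Real = float;

// Bytes used by `arrays` parallel history arrays of `samples` entries each.
constexpr std::size_t historyBytes(std::size_t samples, std::size_t arrays = 3)
{
    return samples * arrays * sizeof(Real);
}

// Samples needed to cover a window at a given sustained frame rate.
constexpr std::size_t samplesForWindow(std::size_t fps, std::size_t windowSeconds)
{
    return fps * windowSeconds;
}

static_assert(sizeof(Real) == 4, "sketch assumes a 4-byte Real");
static_assert(historyBytes(5000) == 60000, "current setting: 60 kB");
static_assert(samplesForWindow(300, 3) == 900, "300 FPS over 3 s");
static_assert(historyBytes(900) == 10800, "reduced setting: 10.8 kB");
static_assert(historyBytes(900) * 100 / historyBytes(5000) == 18, "18% of current");
```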

@githubawn githubawn May 1, 2026

> Say you average 300 fps, what is the chance that the 1% is so low that it is still relevant as a performance metric.

It's pretty common to see 300 average fps and 30 fps lows in this game in large skirmish matches even on vs2022 non-retail.

But definitely would like to hear multiple inputs for this number.

The implementation approach is fine, but the data size requirements for the fps counters are outrageous.

I inquired a bit with Chat Gippy and it would be possible to reduce size requirements by quantizing samples and moving intermediate samples into timed buckets.

Here is a sample FrameBucket with avg and max frame time, stored in 2 bytes each, with a resolution of 16 microseconds.

#include <cstdint>

// Compact frametime statistics bucket.
//
// Stores average and maximum frametime values using 16-bit integers
// with fixed-point quantization.
//
// ------------------------------------------------------------------
// Encoding
// ------------------------------------------------------------------
//
// Values are stored in units of 16 microseconds:
//
//     stored_value = frametime_us / 16
//
// This allows a uint16_t to represent:
//
//     65535 * 16 us = 1,048,560 us
//                   = ~1048 ms
//
// which comfortably covers extremely slow frames (~1 FPS).
//
// ------------------------------------------------------------------
// Precision
// ------------------------------------------------------------------
//
// Quantization step:
//
//     16 us = 0.016 ms
//
// Maximum quantization error:
//
//     +/- 8 us
//
// This is effectively negligible for FPS telemetry.
//
// Example:
//
//     16.667 ms frame (60 FPS)
//
// Encoded as:
//
//     16667 us / 16 = 1042
//
// Decoded:
//
//     1042 * 16 = 16672 us
//
// Error:
//
//     +5 us = 0.005 ms
//
// ------------------------------------------------------------------
// Why fixed-point quantization?
// ------------------------------------------------------------------
//
// Advantages:
//
// - extremely compact (4 bytes total per bucket)
// - deterministic runtime cost
// - cache friendly
// - no floating-point storage
// - sufficient precision for telemetry
// - supports >1 second frametimes
//
// ------------------------------------------------------------------
// Intended usage
// ------------------------------------------------------------------
//
// Typical bucket generation:
//
//     avgFrameUs = accumulatedFrameUs / frameCount
//     maxFrameUs = worstFrameUsSeen
//
// Then:
//
//     bucket.SetAvgUs(avgFrameUs);
//     bucket.SetMaxUs(maxFrameUs);
//
// ------------------------------------------------------------------
// Recommended usage pattern
// ------------------------------------------------------------------
//
// Build buckets over fixed time windows:
//
//     e.g. 50 ms or 100 ms
//
// rather than fixed frame counts.
//
// This keeps statistical resolution stable across varying FPS.
//
// ------------------------------------------------------------------

struct FrameBucket
{
    uint16_t avg16us;
    uint16_t max16us;

    static constexpr uint32_t QUANTUM_US    = 16;
    static constexpr uint32_t QUANTUM_SHIFT = 4;

    static constexpr uint32_t MAX_US =
        0xFFFFu * QUANTUM_US;

    // Encode microseconds into 16 us fixed-point units.
    //
    // Rounds to nearest unit.
    //
    // Input is clamped to representable range.
    static uint16_t EncodeUs(uint32_t us)
    {
        if (us > MAX_US)
            us = MAX_US;

        return static_cast<uint16_t>(
            (us + (QUANTUM_US / 2)) >> QUANTUM_SHIFT);
    }

    // Decode fixed-point units back into microseconds.
    static uint32_t DecodeUs(uint16_t v)
    {
        return static_cast<uint32_t>(v)
            << QUANTUM_SHIFT;
    }

    void SetAvgUs(uint32_t us)
    {
        avg16us = EncodeUs(us);
    }

    void SetMaxUs(uint32_t us)
    {
        max16us = EncodeUs(us);
    }

    uint32_t GetAvgUs() const
    {
        return DecodeUs(avg16us);
    }

    uint32_t GetMaxUs() const
    {
        return DecodeUs(max16us);
    }

    float GetAvgMs() const
    {
        return static_cast<float>(GetAvgUs()) * 0.001f;
    }

    float GetMaxMs() const
    {
        return static_cast<float>(GetMaxUs()) * 0.001f;
    }

    float GetAvgFPS() const
    {
        const uint32_t us = GetAvgUs();

        return (us > 0)
            ? (1000000.0f / static_cast<float>(us))
            : 0.0f;
    }

    float GetMinFPS() const
    {
        const uint32_t us = GetMaxUs();

        return (us > 0)
            ? (1000000.0f / static_cast<float>(us))
            : 0.0f;
    }
};
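
The worked example in the comments above (16667 us encoding to 1042 and decoding to 16672 us) can be verified at compile time. The sketch below uses standalone copies of the encode and decode steps; `encodeUs`, `decodeUs`, `kQuantumUs` and `kMaxUs` are illustrative names, not part of the proposed struct.

```cpp
#include <cstdint>

constexpr std::uint32_t kQuantumUs = 16;
constexpr std::uint32_t kMaxUs = 0xFFFFu * kQuantumUs; // 1,048,560 us

// Round to the nearest 16 us unit, clamping to the representable range.
constexpr std::uint16_t encodeUs(std::uint32_t us)
{
    return static_cast<std::uint16_t>(
        ((us > kMaxUs ? kMaxUs : us) + kQuantumUs / 2) / kQuantumUs);
}

// Expand a stored unit count back into microseconds.
constexpr std::uint32_t decodeUs(std::uint16_t v)
{
    return static_cast<std::uint32_t>(v) * kQuantumUs;
}

static_assert(encodeUs(16667) == 1042, "60 FPS frame encodes to 1042 units");
static_assert(decodeUs(1042) == 16672, "and decodes to 16672 us");
static_assert(decodeUs(encodeUs(16667)) - 16667 == 5, "round-trip error is +5 us");
static_assert(encodeUs(2000000) == 65535, "a 2 s frame clamps to the maximum");
```

The last assertion shows the clamping behaviour: any frame longer than about 1.05 s is stored as the maximum value rather than wrapping.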

And then we move these FrameBuckets into a time based array.

Sample implementation:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <iterator>

// ============================================================================
// Rolling 3-second FPS statistics
// ============================================================================
//
// Stores:
//
//     3 seconds @ 10 ms resolution
//
// = 300 buckets
//
// Each bucket summarizes:
//
//     - average frametime
//     - worst frametime
//
// over a 10 ms interval.
//
// ============================================================================

class FpsHistory
{
public:

    static constexpr uint32_t BUCKET_INTERVAL_US = 10000; // 10 ms
    static constexpr uint32_t HISTORY_SECONDS    = 3;

    static constexpr uint32_t BUCKET_COUNT =
        (HISTORY_SECONDS * 1000000) / BUCKET_INTERVAL_US;

    // ------------------------------------------------------------------------

    void Reset()
    {
        m_writeIndex = 0;
        m_bucketCount = 0;

        m_accumulatedUs = 0;
        m_accumulatedFrames = 0;
        m_maxFrameUs = 0;
        m_bucketElapsedUs = 0;

        std::fill(
            std::begin(m_buckets),
            std::end(m_buckets),
            FrameBucket{});
    }

    // ------------------------------------------------------------------------
    // Add a frame
    //
    // frameUs:
    //     frametime in microseconds
    //
    // Example:
    //
    //     16.667 ms = 16667 us
    //
    // ------------------------------------------------------------------------

    void AddFrame(uint32_t frameUs)
    {
        // Accumulate stats for current bucket.

        m_accumulatedUs += frameUs;
        m_accumulatedFrames++;

        if (frameUs > m_maxFrameUs)
            m_maxFrameUs = frameUs;

        m_bucketElapsedUs += frameUs;

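        // Emit a single bucket once the interval has elapsed. Emitting one
        // bucket per elapsed interval would store empty (zero) buckets after
        // the accumulators reset, which would skew GetAverageFPS(). As a
        // consequence, a frame longer than the interval fills one bucket and
        // the history may span more than HISTORY_SECONDS.

        if (m_bucketElapsedUs >= BUCKET_INTERVAL_US)
        {
            EmitBucket();

            m_bucketElapsedUs %= BUCKET_INTERVAL_US;
        }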
    }

    // ------------------------------------------------------------------------

    float GetAverageFPS() const
    {
        if (m_bucketCount == 0)
            return 0.0f;

        uint64_t totalUs = 0;

        for (uint32_t i = 0; i < m_bucketCount; ++i)
        {
            totalUs += m_buckets[i].GetAvgUs();
        }

        const float avgUs =
            static_cast<float>(totalUs)
            / static_cast<float>(m_bucketCount);

        return avgUs > 0.0f
            ? (1000000.0f / avgUs)
            : 0.0f;
    }

    // ------------------------------------------------------------------------
    // Approximate 1% low FPS
    //
    // Uses the single worst bucket frametime, i.e. the minimum FPS seen in
    // the window, as a stand-in for a true 1st-percentile value.
    //
    // This is intentionally approximate and designed for:
    //
    // - low memory use
    // - deterministic runtime
    // - good stutter detection
    //
    // ------------------------------------------------------------------------

    float GetOnePercentLowFPS() const
    {
        if (m_bucketCount == 0)
            return 0.0f;

        uint32_t worstUs = 0;

        // Find worst bucket maximum frametime.

        for (uint32_t i = 0; i < m_bucketCount; ++i)
        {
            worstUs = std::max(
                worstUs,
                m_buckets[i].GetMaxUs());
        }

        return worstUs > 0
            ? (1000000.0f / static_cast<float>(worstUs))
            : 0.0f;
    }

private:

    // ------------------------------------------------------------------------

    void EmitBucket()
    {
        FrameBucket& bucket =
            m_buckets[m_writeIndex];

        // Compute average frametime for bucket.

        const uint32_t avgUs =
            (m_accumulatedFrames > 0)
            ? static_cast<uint32_t>(
                m_accumulatedUs / m_accumulatedFrames)
            : 0;

        bucket.SetAvgUs(avgUs);
        bucket.SetMaxUs(m_maxFrameUs);

        // Advance ring buffer.

        m_writeIndex =
            (m_writeIndex + 1) % BUCKET_COUNT;

        if (m_bucketCount < BUCKET_COUNT)
            ++m_bucketCount;

        // Reset accumulators.

        m_accumulatedUs = 0;
        m_accumulatedFrames = 0;
        m_maxFrameUs = 0;
    }

private:

    FrameBucket m_buckets[BUCKET_COUNT];

    uint32_t m_writeIndex = 0;
    uint32_t m_bucketCount = 0;

    uint64_t m_accumulatedUs = 0;
    uint32_t m_accumulatedFrames = 0;
    uint32_t m_maxFrameUs = 0;

    uint32_t m_bucketElapsedUs = 0;
};

// ============================================================================
// Example usage
// ============================================================================

int main()
{
    FpsHistory history;

    history.Reset();

    // Simulate ~60 FPS.

    for (int i = 0; i < 500; ++i)
    {
        uint32_t frameUs = 16667;

        // Simulate occasional stutter.

        if ((i % 120) == 0)
        {
            frameUs = 50000; // 50 ms hitch
        }

        history.AddFrame(frameUs);
    }

    printf(
        "Average FPS: %.2f\n",
        history.GetAverageFPS());

    printf(
        "Approx 1%% Low FPS: %.2f\n",
        history.GetOnePercentLowFPS());

    return 0;
}

This data organization approach uses at most 1200 bytes (300 buckets of 4 bytes each). Bucket intervals could also be longer for an even smaller size.

It loses value accuracy. It optimizes for speed over accuracy.

However, your current implementation would perhaps be more efficient at low frame rates, because fewer values need reading at small frame counts. I do like that aspect, because low-FPS runtimes need to do less.

I suggest looking further into how to find a good balance between size, speed and accuracy. There are many options here.

xezon commented May 5, 2026

I do not like the visuals of the new value. Can this look better?

xezon added the Enhancement, GUI, Minor, Gen, and ZH labels May 7, 2026

xezon commented May 7, 2026

Suggestion:

image

Triangle down code is: \x25BC
Triangle up code is: \x25B2

Uncapped FPS:

image

X is ASCII X

X because 0 makes no sense as a value for no cap.
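
A minimal sketch of how the suggested labels could be formatted, assuming wide-character strings. The helper names `formatLowFps` and `formatFpsCap` and the `uncapped` flag are hypothetical and do not correspond to functions in the PR.

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: 1% low label, a down triangle followed by the value.
std::wstring formatLowFps(unsigned low)
{
    wchar_t buf[32];
    std::swprintf(buf, sizeof(buf) / sizeof(buf[0]), L"\x25BC%u", low);
    return buf;
}

// Hypothetical helper: FPS cap label, an up triangle followed by the cap,
// or by an ASCII 'X' when no cap is active.
std::wstring formatFpsCap(unsigned cap, bool uncapped)
{
    if (uncapped)
        return L"\x25B2X";

    wchar_t buf[32];
    std::swprintf(buf, sizeof(buf) / sizeof(buf[0]), L"\x25B2%u", cap);
    return buf;
}
```

One detail worth keeping in mind with `\x` escapes: they consume every following hexadecimal digit, so a literal digit placed directly after the escape would be absorbed into it. The format strings above are safe because `%` and `X` are not hexadecimal digits.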

xezon commented May 7, 2026

Maybe make the 1% low value a bit brighter. It can be difficult to read in game.

updateRenderFpsString();
}

UnsignedInt renderFpsLimit = 0u;

Better change the init value to RenderFpsPreset::UncappedFpsValue. Then the code can be simplified.

void W3DDisplay::addFpsSample(Real elapsedSeconds)
{
constexpr const Int FPS_HISTORY_SIZE = 30;
if (elapsedSeconds <= 0.0f)

How is this possible? If this is called with 0 seconds, then it indicates an error.


Real W3DDisplay::calculateLow1PercentFPS(Real windowSeconds)
{
if (m_historyCount == 0)

this test is redundant

m_currentFPS = 1.0f/elapsedSeconds;
fpsHistory[historyOffset++] = m_currentFPS;
addFpsSample(elapsedSeconds);
m_averageFPS = calculateAverageFPS(0.5f);

This is just half of the fps text update rate of 1 second.

m_averageFPS = sum / FPS_HISTORY_SIZE;
static UnsignedInt lastLowUpdate = 0;
UnsignedInt now = timeGetTime();
if (now - lastLowUpdate >= 1000)

This interval is a bit unfortunate here. Is this for performance reasons? Can we make calculateLow1PercentFPS cheaper instead?


// convert elapsed time to seconds
Real elapsedSeconds = (Real)timeDiff/(Real)freq64;
if (m_lastUpdateTime64 == 0)

Can we get rid of this condition? Looks like it will almost never be true.


Real W3DDisplay::calculateAverageFPS(Real windowSeconds)
{
if (m_historyCount == 0)

Can we get rid of this condition? It looks like it will almost never be true. Perhaps init m_historyCount with 1.

3 participants