Skip to content

Logging performance: off-thread console writer, buffered file sinks, and quieter defaults#402

Merged
MadMaxMangos merged 5 commits into
masterfrom
fix/logging-perf-speedfix
Jun 28, 2026
Merged

Logging performance: off-thread console writer, buffered file sinks, and quieter defaults#402
MadMaxMangos merged 5 commits into
masterfrom
fix/logging-perf-speedfix

Conversation

@MadMaxMangos

@MadMaxMangos MadMaxMangos commented Jun 28, 2026

Copy link
Copy Markdown
Member

Summary

Logging was a significant source of startup/runtime stalls on Windows: large
spawn bursts (LivingWorld creature loads, playerbot spawn-in) could make the
world thread crawl, and the effect was worst at LogLevel = 3 (debug) and when
run from a console window. Root cause is that every console line is flushed by
the CRT per call, and the file sinks were fflush-ed per line — all on the hot
world/map-update threads.

This PR removes that cost in three layers, plus a follow-up commit addressing an
adversarial code review. It is core-side only — no submodules
(src/realmd, src/modules/Bots) are touched — and the public sLog API is
unchanged.

What changed

  • 4a6248f4 — quieter defaults (config only, no recompile). Dist
    LogLevel 3 → 1; suppress the highest-volume debug filters
    (LogFilter_AIAndMovegens, LogFilter_SpellCast) by default.
  • 43a2f2a8 — buffered, flush-free file sinks. setvbuf(64 KiB) on file
    sinks; the hot loggers stop fflush-ing per line and write under a dedicated
    file mutex; a periodic Log::Flush() runs on the world tick and at shutdown;
    Initialize() is now idempotent (clean close/reopen); the world packet dump
    is rewritten from per-byte fprintf to a single buffered fwrite
    (byte-identical output).
  • 0477eb46 — off-thread console writer. A dedicated ConsoleLogWriter
    thread performs color + OEM transform + console write off the world/map
    threads; producers format the line and enqueue it (bounded, with drop+count
    on overflow). This is the one sink Windows cannot batch inline, so it's what
    makes live LogLevel = 3 affordable.
  • 8d3d684b — adversarial-review fixes. Shutdown ordering so the SOAP
    listener thread is joined before the console writer is torn down; atomic stop
    flag; console writer started only after fallible init; full (untruncated)
    formatting on POSIX; restored exact legacy newline/color byte-ordering; misc
    portability/doc hygiene.

Behavior & config changes (please note when upgrading)

  • PacketLoggingEnabled (new, default 0). World packet logging used to
    turn on implicitly whenever WorldLogFile was set. It is now off unless
    this flag is explicitly 1, so an existing config with WorldLogFile (even
    at LogLevel 3) no longer pays the packet-dump cost by accident.
  • Default LogLevel is now 1 (Basic & Error) and
    LogFilter_AIAndMovegens / LogFilter_SpellCast now ship suppressed.
    Set them back if you want the verbose output; mangosd.conf.dist documents
    all of this.
  • Error console output is now consistently on stderr (timestamp + message).
    Previously the optional LogTime timestamp leaked to stdout while the message
    went to stderr. With the default LogTime = 0 the console output is
    byte-identical to before; only observable with LogTime = 1.

Performance (validated, warm runs, LivingWorld spawn load)

LoadContinents startup metric, same machine/workload:

Mode LoadContinents
Baseline (LogLevel 3, inline) ~3879 ms
This PR, LogLevel 1 (quiet) ~1112–1224 ms
This PR, LogLevel 3 (off-thread console) ~1247–1302 ms

~2.6–3× faster at the same LogLevel 3, and full-debug console is now on
par with quiet mode. 0 console-line drops and a clean drain+join shutdown on
every run, with SOAP both enabled and disabled.

Design notes / scope

  • Errors are intentionally kept off the file mutex so the AntiFreeze
    watchdog can never block on logging (rare error/debug interleave accepted, no
    corruption).
  • The console writer never calls sLog (no re-entrancy) and is drained+joined
    at shutdown; lines lost on a hard crash are harmless because the file sink is
    durable.
  • src/realmd (submodule) is untouched and uses the synchronous console
    fallback. Optional submodule follow-ups (realmd flush hook, playerbot log
    re-leveling) are deferred to separate PRs.

Testing

  • Built clean on Windows (MSVC / VS2026, RelWithDebInfo, 0 warnings) — mangosd
    and realmd.
  • Runtime-validated on a real Classic (1.12.x) server under a LivingWorld
    spawn-in load (numbers above); clean drain+join shutdown verified with SOAP
    both enabled and disabled, 0 console-line drops.
  • Relies on CI for the GCC/Clang builds (C++17); the diff was reviewed for
    cross-toolchain portability but compiled locally only on MSVC.

Design and full review history: docs/logging-perf/SPEC.md.

🤖 Generated with Claude Code


This change is Reviewable

MadMaxMangos and others added 4 commits June 27, 2026 19:58
…ebug filters

Stage 1 / Change A increment 1 (config only, no recompile needed).

- LogLevel 3 -> 1: the dist shipped full DEBUG to the console; on Windows the
  console is flushed per stdio call, so at level 3 every per-spawn DEBUG/DETAIL
  line is a synchronous console write on the world/map-update threads -> the
  LivingWorld / playerbot spawn-burst hang.
- LogFilter_AIAndMovegens 0 -> 1 and LogFilter_SpellCast 0 -> 1: these two
  filters shipped enabled; in a LivingWorld bitmask-7 baseline they are ~50% of
  a LogFileLevel=3 dump (~11k "used AI is" + ~2.7k spell/aura + waypoint). Each
  stays re-enablable for targeted debugging.

Recommended runtime mode is now LogLevel=1 + LogFileLevel=3 (quiet console,
buffered DEBUG to file). Spec: docs/logging-perf/SPEC.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/ Change A)

Increment 2 - make LogFileLevel=3 debug fast/buffered/un-torn so devs can run
full DEBUG without the world thread stalling on per-line file I/O.

- setvbuf(_IOFBF, 64KB) on all file sinks (openLogFile). The four hot loggers
  (outString/outBasic/outDetail/outDebug) drop their per-line fflush and write
  the timestamp+body+newline under a new m_fileMtx so concurrent map-update
  worker threads cannot tear lines.
- Log::Flush() drains the buffered main + packet logs; called ~1/sec from
  World::Update and once at shutdown (after "Bye!"). Errors still flush at emit
  time and are deliberately NOT under m_fileMtx, so the AntiFreeze watchdog can
  never be blocked by it and error lines stay crash-durable (accepted cost: a
  rare error/debug line interleave, no corruption).
- Initialize() is now idempotent via CloseLogFiles(), fixing the realmd
  double-init handle leak; ~Log() shares the same CloseLogFiles().
- outWorldPacketDump rewritten from one fprintf PER BYTE to a single buffered
  fwrite (byte-identical output).
- PacketLoggingEnabled (default off) is the single gate for packet logging:
  worldLogfile is opened only when set, IsPacketLoggingEnabled() keys off it,
  and both WorldSocket call sites gate on it -> packet logging is off by default
  even on legacy configs with WorldLogFile set + LogLevel=3.

Recommended dev mode: LogLevel=1 + LogFileLevel=3 (quiet console, buffered
DEBUG file). Reviewed by a 3-lens adversarial pass (compile / concurrency /
byte-fidelity): SHIP, no compile blockers. Spec: docs/logging-perf/SPEC.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the console emit off the world + map-update threads so live LogLevel=3
console debug no longer stalls them. On Windows the CRT flushes the console per
stdio call, so console writes can't be batched inline - a dedicated thread does
them.

- ConsoleLogWriter (ACE_Based::Runnable, modeled on SqlDelayThread): producers
  format the line and Enqueue a by-value ConsoleLogRecord into a LockedQueue; one
  writer thread drains it and does Log::SetColor (now static) + OEM transform +
  fwrite + ResetColor, per line, off the hot threads. Bounded with drop+count on
  overflow (the file sink is durable, so dropped/crash-lost console lines are
  harmless). The writer never calls sLog (no reentrancy).
- Every logger's CONSOLE block now routes through Log::ConsoleEmit /
  ConsoleEmitBlank (async when the thread runs, synchronous inline fallback
  otherwise - startup + shutdown tail). The increment-2 m_fileMtx FILE blocks are
  untouched and the public sLog API is unchanged.
- Util::vutf8format returns the formatted OEM/UTF-8 bytes as std::string (same
  vsnprintf + Utf8toWStr + CharToOemBuffW chain as vutf8printf; byte-identical).
- mangosd starts the writer after config/InitColors and stops+drains+joins it at
  shutdown before the final Flush. realmd (a submodule) is intentionally NOT wired
  - it uses the synchronous fallback; its console-thread wiring is a deferred
  submodule follow-up.

Built clean (RelWithDebInfo: mangosd + realmd, 0 errors/warnings) and reviewed by
a 3-lens adversarial pass (concurrency/lifecycle, console-fidelity, style),
SHIP_WITH_FIXES: the no-arg error variants now route through the writer, the
drop-notice fwrite is clamped, and the StopConsoleThread quiescence invariant is
documented. Spec: docs/logging-perf/SPEC.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adversarial pre-PR review (Codex) flagged correctness/portability issues in
the off-thread console + buffered-file rework. Fixes (no public sLog API
change, core-side only, no submodule edits):

BLOCKING
- SOAP UAF on shutdown: the SOAP listener is a std::thread, not an ACE task,
  so ACE_Thread_Manager::wait() never joined it; its shared_ptr deleter joined
  only at main() scope exit, AFTER StopConsoleThread() freed the writer -- a
  late SOAP log could enqueue into a freed ConsoleLogWriter. Now soapThread is
  reset() (joined) before StopConsoleThread(), upholding the quiescence
  invariant for SOAP too. (mangosd.cpp)
- ConsoleLogWriter::m_running is now std::atomic<bool>, not volatile bool, so
  Stop()/run() synchronise correctly (volatile is not a memory-model fence;
  matters on weakly-ordered archs). (ConsoleLogWriter.h)

MAJOR
- Console writer started too early: it was launched before the OpenSSL / PID /
  start_db init, whose early-return error paths left the writer thread running
  into stdio teardown. Moved StartConsoleThread() to just after start_db()
  succeeds -- still before SetInitialWorldSettings() (the spawn-burst hot path)
  -- so no fatal early-return leaks a running writer. (mangosd.cpp)
- POSIX truncation regression: vutf8format() clamped to 32 KiB whereas the
  legacy POSIX path used unbounded vfprintf. Oversize messages are now rendered
  in full via a sized second pass (va_copy). (Util.cpp)
- Console byte-fidelity: the trailing newline was being written INSIDE the
  SetColor/ResetColor span; restored the legacy ordering (body -> ResetColor ->
  "\n") by keeping the newline off rec.text and appending it after reset in the
  writer and the sync fallback. (Log.cpp, ConsoleLogWriter.cpp, Log.h)

MINOR / hygiene
- Writer now owns the stdout flush (one fflush per drain batch, off the world
  thread) so redirected/piped consoles stay prompt. (ConsoleLogWriter.cpp)
- Added <utility> for std::move (was relying on a transitive include that GCC/
  Clang may not provide). (Log.cpp)
- Refreshed stale config comments (LogLevel now defaults to 1; AIAndMovegens /
  SpellCast now ship suppressed) and documented PacketLoggingEnabled.
  (mangosd.conf.dist.in)

Note (intentional, not restored): all error console output (timestamp +
message + newline) now goes to stderr consistently. Previously the optional
LogTime timestamp leaked to stdout via outTime() while the message went to
stderr; with the default LogTime=0 the console output is byte-identical to
master.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codacy-production

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 17 complexity · -19 duplication

Metric Results
Complexity 17
Duplication -19

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

The logging-performance change adds PacketLoggingEnabled and flips the
default LogLevel / LogFilter_AIAndMovegens / LogFilter_SpellCast values in
mangosd.conf.dist, so bump MANGOS_WORLD_VER (drives both the generated
ConfVersion and the compiled MANGOSD_CONFIG_VERSION) to today's date. Servers
running an older mangosd.conf now get the "conf out of date" warning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AppVeyorBot

Copy link
Copy Markdown

@MadMaxMangos MadMaxMangos merged commit c718ef1 into master Jun 28, 2026
8 checks passed
@MadMaxMangos MadMaxMangos deleted the fix/logging-perf-speedfix branch June 28, 2026 14:26
MadMaxMangos added a commit that referenced this pull request Jun 28, 2026
…e filters for runtime debug spam (#403)

* fix(logging): route progress bars through the console writer (no more torn console)

PR #402 moved console log emit onto a dedicated writer thread, but BarGoLink
still printf()+fflush()ed its progress bars straight to stdout on the
loading/world thread. Two unsynchronized writers on one stdout made the
"=== N%" bars and the writer thread's log lines interleave character-by-
character -> the torn live console. The file log was unaffected (the bar never
writes to the file and file writes are m_fileMtx-serialized). The race also
recurs at runtime via .reload/.ahbot/playerbot commands that step bars on the
world thread while the writer is live, so a writer-start-timing fix alone is
insufficient.

Make stdout single-owner: the bar now builds one full redraw string and hands
it to the writer as a single atomic "raw" record (verbatim - no time prefix,
no color, no appended newline), so bar redraws and log lines flow through one
FIFO in program order, exactly as before the writer existed.

- ProgressBarRender.h (new): pure, dependency-free byte builders, byte-identical
  to the legacy printf sequences (offline unit test, incl. an n>indic_len guard).
- ProgressBar.{h,cpp}: init/step/dtor build one string and emit via a swappable
  sink; the default sink is a synchronous fwrite(stdout)+fflush, so offline tools
  and realmd stay byte-for-byte unchanged.
- Log.{h,cpp}: ConsoleEmitRaw() enqueues a raw record when the writer runs, else
  writes verbatim synchronously. ConsoleLogRecord gains isRaw.
- ConsoleLogWriter.cpp: Emit honours isRaw (verbatim, no '\n'); the queue-full
  drop notice gets a leading '\n' so it can no longer land mid-bar.
- mangosd.cpp: install the async bar sink right after StartConsoleThread().

Colored output (including the ScriptDev banner) is untouched - only the bar,
which was never colored, changed path. Built clean (RelWithDebInfo, mangosd +
realmd, 0 errors) and confirmed torn-free in a live smoke test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(logging): suppressible filters for runtime debug spam + console-writer review fixes

Two related logging improvements on top of the off-thread console writer.

1) LOG-FILTER SWEEP -- make a busy LogLevel/LogFileLevel=3 run usable.
   - New filters LogFilter_GridAdd (0x100000) and LogFilter_DbScripts (0x200000);
     LOG_FILTER_COUNT -> 22. ALL log filters now ship default-on (suppressed) so a
     fresh level-3 run is quiet out of the box; set any LogFilter_* = 0 to restore
     that category. ConfVersion bumped so stale configs get the out-of-date notice.
   - Gated the highest-volume bare DEBUG_LOG/DETAIL_LOG/outString runtime lines onto
     EXISTING filters (no new bits beyond the two above):
       SPELL_CAST       : "Holder of spell .. in use" (Unit), "Spell Effect DUMMY" x2
       COMBAT           : SMSG_ATTACKSTART (now logs attacker -> victim via GetGuidStr)
       MAP_LOADING      : grid load-content count, grid IDLE/REMOVAL + differed-unload
                          transitions, both vmap GetHitPosition LoS corrections,
                          AddPersistentState, and the startup "Active Flag" line
       DB_STRICTED_CHECK: the four "Unused (> MaxPlayerLevel) level" notices
       GRID_ADD         : "X enters grid[x,y]"
       DB_SCRIPTS       : db_scripts process traces
     Error variants (outError/outErrorDb) and one-time/info lines stay visible.
   - Fixed a latent config-key typo: logFilterData read "LogFilter_MapLoading" but
     every config ships "LogFilter_MapsLoading", so the key never bound and the
     filter only worked by code default. Code now reads "LogFilter_MapsLoading".
   - GridStates: corrected the mislabeled "moved to IDLE state" -> "REMOVAL state"
     (IdleState::Update actually transitions to GRID_STATE_REMOVAL).

2) CONSOLE-WRITER REVIEW FIXES (adversarial review of the bar-tearing change).
   - Single-owner stdout: the bar was made async, but the CLI prompt, CLI command
     output (utf8print), the beep, and the loglevel-change notices still wrote to
     stdout directly -- so they could overtake queued bar frames (e.g. the prompt
     reprinted after .reload landing before/over the bar). All now route through
     Log::ConsoleEmitRaw, so every non-log-line console write shares the one FIFO
     and drains in program order. (Log::outTime's printf is dead -- no callers.)
   - ProgressBarRender.h: include <cstddef> and use std::size_t (do not rely on
     <string> to expose ::size_t globally -- non-portable on GCC/Clang).

Built clean (RelWithDebInfo, mangosd + realmd, 0 errors); filters and the .reload
prompt-ordering fix confirmed in live level-3 smoke tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants