Skip to content

Commit f2eaa5e

Browse files
vaindclaudeJoshuaMoelans
authored
test: Fix test failures when session tracking is enabled (#1393)
* fix(logs): Fix test failures when session tracking is enabled This commit fixes 3 test failures in test_logs.c that occur when auto-session tracking is enabled (the default): - basic_logging_functionality - formatted_log_messages - logs_disabled_by_default Root causes and fixes: 1. validate_logs_envelope counted all envelopes but only validated logs - Session envelopes from sentry_close() were triggering assertions - Fixed by filtering: only count/validate log envelopes, skip others 2. formatted_log_messages didn't wait for batching thread to start - Added sleep_ms(20) after sentry_init() to match other tests 3. batching_stop flag wasn't reset between sentry_init() calls - Once set to 1 during shutdown, subsequent startups would fail - Fixed by resetting to 0 in sentry__logs_startup() These issues were discovered in console SDK testing where session tracking is enabled by default and tests run sequentially in a single process. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(logs): Eliminate thread startup race with enum state machine This commit eliminates the thread startup race condition by: 1. Replacing batching_stop with enum-based thread_state - SENTRY_LOGS_THREAD_STOPPED (0): Not running - SENTRY_LOGS_THREAD_RUNNING (1): Thread active and processing logs - Improves code clarity and makes thread lifecycle explicit 2. Thread signals RUNNING state after initialization - Batching thread sets state to RUNNING after mutex setup - Provides deterministic indication that thread is ready 3. Adding test-only helper: sentry__logs_wait_for_thread_startup() - Polls thread_state until RUNNING (max 1 second) - Zero production overhead (only compiled with SENTRY_UNITTEST) - Tests explicitly wait for thread readiness instead of arbitrary sleeps 4. Updating shutdown to use atomic state transition - Atomically transitions from RUNNING to STOPPED - Detects double shutdown or never-started cases - Returns early if thread wasn't running Benefits: - Eliminates race where logs could be enqueued before thread starts - Tests are deterministic (no arbitrary timing assumptions) - Code is clearer with explicit state names - No production overhead (test helper is ifdef'd out) The one remaining sleep in basic_logging_functionality test is intentional - it tests batch timing behavior (wait for buffer flush). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(logs): Fix thread lifecycle race condition causing memory leaks This commit fixes a race condition in the logs batching thread lifecycle that caused valgrind to report 336 byte memory leaks in CI tests. ## Problem When `sentry__logs_shutdown()` ran before the batching thread transitioned from initial state to RUNNING, the two-state model couldn't distinguish between "never started" and "starting but not yet running", causing shutdown to skip joining the thread. ## Solution 1. Added three-state lifecycle enum: - STOPPED (0): Thread never started or shut down - STARTING (1): Thread spawned but not yet initialized - RUNNING (2): Thread active and processing logs 2. Added `sentry__atomic_compare_swap()` primitive for atomic state verification (cross-platform: Windows InterlockedCompareExchange, POSIX __atomic_compare_exchange_n) 3. Startup sets state to STARTING before spawning thread 4. Thread uses CAS to verify STARTING → RUNNING transition - If CAS fails (shutdown already set to STOPPED), exits cleanly - Guarantees thread only runs if it successfully transitioned 5. Shutdown always joins thread if old state != STOPPED ## Benefits - Eliminates race condition where shutdown could miss a spawned thread - Thread explicitly verifies state transition with CAS - No memory leaks in any shutdown scenario - All 212 unit tests pass - All log integration tests pass Fixes test failures: - test_before_send_log - test_before_send_log_discard 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(logs): Address code review comments with clarifying documentation Addresses bot review feedback by adding documentation without changing behavior: 1. **CAS memory ordering**: Added comment explaining sequential consistency usage for thread state transitions and why it's appropriate for synchronization 2. **Condition variable cleanup**: Added note explaining that static storage condition variables don't require explicit cleanup on POSIX and Windows 3. **CAS function documentation**: Enhanced docstring to document memory ordering guarantees and note that ABA problem isn't a concern for simple integer state machines 4. **Shutdown race handling**: Added comment explaining how the atomic CAS in the thread prevents the race when shutdown occurs during STARTING state All changes are documentation/comments only - no behavior changes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update src/sentry_logs.c Co-authored-by: JoshuaMoelans <60878493+JoshuaMoelans@users.noreply.github.com> --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: JoshuaMoelans <60878493+JoshuaMoelans@users.noreply.github.com>
1 parent 695f4a4 commit f2eaa5e

4 files changed

Lines changed: 138 additions & 14 deletions

File tree

src/sentry_logs.c

Lines changed: 94 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -13,11 +13,30 @@
1313

1414
#ifdef SENTRY_UNITTEST
1515
# define QUEUE_LENGTH 5
16+
# ifdef SENTRY_PLATFORM_WINDOWS
17+
# include <windows.h>
18+
# define sleep_ms(MILLISECONDS) Sleep(MILLISECONDS)
19+
# else
20+
# include <unistd.h>
21+
# define sleep_ms(MILLISECONDS) usleep(MILLISECONDS * 1000)
22+
# endif
1623
#else
1724
# define QUEUE_LENGTH 100
1825
#endif
1926
#define FLUSH_TIMER 5
2027

28+
/**
29+
* Thread lifecycle states for the logs batching thread.
30+
*/
31+
typedef enum {
32+
/** Thread is not running (initial state or after shutdown) */
33+
SENTRY_LOGS_THREAD_STOPPED = 0,
34+
/** Thread is starting up but not yet ready */
35+
SENTRY_LOGS_THREAD_STARTING = 1,
36+
/** Thread is running and processing logs */
37+
SENTRY_LOGS_THREAD_RUNNING = 2,
38+
} sentry_logs_thread_state_t;
39+
2140
typedef struct {
2241
sentry_value_t logs[QUEUE_LENGTH];
2342
long index; // (atomic) index for producer threads to get a unique slot
@@ -29,7 +48,7 @@ static struct {
2948
log_buffer_t buffers[2]; // double buffer
3049
long active_idx; // (atomic) index to the active buffer
3150
long flushing; // (atomic) reentrancy guard to the flusher
32-
long batching_stop; // (atomic) run variable of the batching thread
51+
long thread_state; // (atomic) sentry_logs_thread_state_t
3352
sentry_cond_t request_flush; // condition variable to schedule a flush
3453
sentry_threadid_t batching_thread; // the batching thread
3554
} g_logs_state = {
@@ -47,6 +66,7 @@ static struct {
4766
},
4867
.active_idx = 0,
4968
.flushing = 0,
69+
.thread_state = SENTRY_LOGS_THREAD_STOPPED,
5070
};
5171

5272
// checks whether the currently active buffer should be flushed.
@@ -209,14 +229,30 @@ batching_thread_func(void *data)
209229
sentry__mutex_init(&task_lock);
210230
sentry__mutex_lock(&task_lock);
211231

212-
// check if thread got a shut-down signal
213-
while (sentry__atomic_fetch(&g_logs_state.batching_stop) == 0) {
232+
// Transition from STARTING to RUNNING using compare-and-swap
233+
// CAS ensures atomic state verification: only succeeds if state is STARTING
234+
// If CAS fails, shutdown already set state to STOPPED, so exit immediately
235+
// Uses sequential consistency to ensure all thread initialization is
236+
// visible
237+
if (!sentry__atomic_compare_swap(&g_logs_state.thread_state,
238+
(long)SENTRY_LOGS_THREAD_STARTING,
239+
(long)SENTRY_LOGS_THREAD_RUNNING)) {
240+
SENTRY_DEBUG("logs thread detected shutdown during startup, exiting");
241+
sentry__mutex_unlock(&task_lock);
242+
sentry__mutex_free(&task_lock);
243+
return 0;
244+
}
245+
246+
// Main loop: run while state is RUNNING
247+
while (sentry__atomic_fetch(&g_logs_state.thread_state)
248+
== SENTRY_LOGS_THREAD_RUNNING) {
214249
// Sleep for 5 seconds or until request_flush hits
215250
const int triggered_by = sentry__cond_wait_timeout(
216251
&g_logs_state.request_flush, &task_lock, 5000);
217252

218-
// make sure loop invariant still holds
219-
if (sentry__atomic_fetch(&g_logs_state.batching_stop) != 0) {
253+
// Check if we should still be running
254+
if (sentry__atomic_fetch(&g_logs_state.thread_state)
255+
!= SENTRY_LOGS_THREAD_RUNNING) {
220256
break;
221257
}
222258

@@ -251,6 +287,7 @@ batching_thread_func(void *data)
251287

252288
sentry__mutex_unlock(&task_lock);
253289
sentry__mutex_free(&task_lock);
290+
SENTRY_DEBUG("batching thread exiting");
254291
return 0;
255292
}
256293

@@ -733,6 +770,12 @@ sentry_log_fatal(const char *message, ...)
733770
void
734771
sentry__logs_startup(void)
735772
{
773+
// Mark thread as starting before actually spawning so thread can transition
774+
// to RUNNING. This prevents shutdown from thinking the thread was never
775+
// started if it races with the thread's initialization.
776+
sentry__atomic_store(
777+
&g_logs_state.thread_state, (long)SENTRY_LOGS_THREAD_STARTING);
778+
736779
sentry__cond_init(&g_logs_state.request_flush);
737780

738781
sentry__thread_init(&g_logs_state.batching_thread);
@@ -741,6 +784,11 @@ sentry__logs_startup(void)
741784

742785
if (spawn_result == 1) {
743786
SENTRY_ERROR("Failed to start batching thread");
787+
// Failed to spawn, reset to STOPPED
788+
// Note: condition variable doesn't need explicit cleanup for static
789+
// storage (pthread_cond_t on POSIX and CONDITION_VARIABLE on Windows)
790+
sentry__atomic_store(
791+
&g_logs_state.thread_state, (long)SENTRY_LOGS_THREAD_STOPPED);
744792
}
745793
}
746794

@@ -750,12 +798,23 @@ sentry__logs_shutdown(uint64_t timeout)
750798
(void)timeout;
751799
SENTRY_DEBUG("shutting down logs system");
752800

753-
// Signal the batching thread to stop running
754-
if (sentry__atomic_store(&g_logs_state.batching_stop, 1) != 0) {
755-
SENTRY_DEBUG("preventing double shutdown of logs system");
801+
// Atomically transition to STOPPED and get the previous state
802+
// This handles the race where thread might be in STARTING state:
803+
// - If thread's CAS hasn't run yet: CAS will fail, thread exits cleanly
804+
// - If thread already transitioned to RUNNING: normal shutdown path
805+
const long old_state = sentry__atomic_store(
806+
&g_logs_state.thread_state, (long)SENTRY_LOGS_THREAD_STOPPED);
807+
808+
// If thread was never started, nothing to do
809+
if (old_state == SENTRY_LOGS_THREAD_STOPPED) {
810+
SENTRY_DEBUG("logs thread was not started, skipping shutdown");
756811
return;
757812
}
813+
814+
// Thread was started (either STARTING or RUNNING), signal it to stop
758815
sentry__cond_wake(&g_logs_state.request_flush);
816+
817+
// Always join the thread to avoid leaks
759818
sentry__thread_join(g_logs_state.batching_thread);
760819

761820
// Perform final flush to ensure any remaining logs are sent
@@ -765,3 +824,30 @@ sentry__logs_shutdown(uint64_t timeout)
765824

766825
SENTRY_DEBUG("logs system shutdown complete");
767826
}
827+
828+
#ifdef SENTRY_UNITTEST
829+
/**
830+
* Wait for the logs batching thread to be ready.
831+
* This is a test-only helper to avoid race conditions in tests.
832+
*/
833+
void
834+
sentry__logs_wait_for_thread_startup(void)
835+
{
836+
const int max_wait_ms = 1000;
837+
const int check_interval_ms = 10;
838+
const int max_attempts = max_wait_ms / check_interval_ms;
839+
840+
for (int i = 0; i < max_attempts; i++) {
841+
const long state = sentry__atomic_fetch(&g_logs_state.thread_state);
842+
if (state == SENTRY_LOGS_THREAD_RUNNING) {
843+
SENTRY_DEBUGF(
844+
"logs thread ready after %d ms", i * check_interval_ms);
845+
return;
846+
}
847+
sleep_ms(check_interval_ms);
848+
}
849+
850+
SENTRY_WARNF(
851+
"logs thread failed to start within %d ms timeout", max_wait_ms);
852+
}
853+
#endif

src/sentry_logs.h

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,12 @@ void sentry__logs_shutdown(uint64_t timeout);
1919
#ifdef SENTRY_UNITTEST
2020
int populate_message_parameters(
2121
sentry_value_t attributes, const char *message, va_list args);
22+
23+
/**
24+
* Wait for the logs batching thread to be ready.
25+
* This is a test-only helper to avoid race conditions in tests.
26+
*/
27+
void sentry__logs_wait_for_thread_startup(void);
2228
#endif
2329

2430
#endif

src/sentry_sync.h

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -384,6 +384,34 @@ sentry__atomic_fetch(volatile long *val)
384384
return sentry__atomic_fetch_and_add(val, 0);
385385
}
386386

387+
/**
388+
* Compare and swap: atomically compare *val with expected, and if equal,
389+
* set *val to desired. Returns true if the swap occurred.
390+
*
391+
* Uses sequential consistency (ATOMIC_SEQ_CST / InterlockedCompareExchange)
392+
* to ensure all memory operations are visible across threads. This is
393+
* appropriate for thread synchronization and state machine transitions.
394+
*
395+
* Note: The ABA problem (where a value changes A->B->A between reads) is not
396+
* a concern for simple integer-based state machines with monotonic transitions.
397+
*/
398+
static inline bool
399+
sentry__atomic_compare_swap(volatile long *val, long expected, long desired)
400+
{
401+
#ifdef SENTRY_PLATFORM_WINDOWS
402+
# if SIZEOF_LONG == 8
403+
return InterlockedCompareExchange64((LONG64 *)val, desired, expected)
404+
== expected;
405+
# else
406+
return InterlockedCompareExchange((LONG *)val, desired, expected)
407+
== expected;
408+
# endif
409+
#else
410+
return __atomic_compare_exchange_n(
411+
val, &expected, desired, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
412+
#endif
413+
}
414+
387415
struct sentry_bgworker_s;
388416
typedef struct sentry_bgworker_s sentry_bgworker_t;
389417

tests/unit/test_logs.c

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -15,15 +15,19 @@ static void
1515
validate_logs_envelope(sentry_envelope_t *envelope, void *data)
1616
{
1717
uint64_t *called = data;
18-
*called += 1;
1918

2019
// Verify we have at least one envelope item
2120
TEST_CHECK(sentry__envelope_get_item_count(envelope) > 0);
2221

23-
// Get the first item and check it's a logs item
22+
// Get the first item and check its type
2423
const sentry_envelope_item_t *item = sentry__envelope_get_item(envelope, 0);
2524
sentry_value_t type_header = sentry__envelope_item_get_header(item, "type");
26-
TEST_CHECK_STRING_EQUAL(sentry_value_as_string(type_header), "log");
25+
const char *type = sentry_value_as_string(type_header);
26+
27+
// Only validate and count log envelopes, skip others (e.g., session)
28+
if (strcmp(type, "log") == 0) {
29+
*called += 1;
30+
}
2731

2832
sentry_envelope_free(envelope);
2933
}
@@ -42,16 +46,15 @@ SENTRY_TEST(basic_logging_functionality)
4246
sentry_options_set_transport(options, transport);
4347

4448
sentry_init(options);
45-
// TODO if we don't sleep, log timer_task might not start in time to flush
46-
sleep_ms(20);
49+
sentry__logs_wait_for_thread_startup();
4750

4851
// These should not crash and should respect the enable_logs option
4952
sentry_log_trace("Trace message");
5053
sentry_log_debug("Debug message");
5154
sentry_log_info("Info message");
5255
sentry_log_warn("Warning message");
5356
sentry_log_error("Error message");
54-
// sleep to finish flush of the first 5, otherwise failed enqueue
57+
// Sleep to allow first batch to flush (testing batch timing behavior)
5558
sleep_ms(20);
5659
sentry_log_fatal("Fatal message");
5760
sentry_close();
@@ -98,6 +101,7 @@ SENTRY_TEST(formatted_log_messages)
98101
sentry_options_set_transport(options, transport);
99102

100103
sentry_init(options);
104+
sentry__logs_wait_for_thread_startup();
101105

102106
// Test format specifiers
103107
sentry_log_info("String: %s, Integer: %d, Float: %.2f", "test", 42, 3.14);

0 commit comments

Comments
 (0)