
Commit 062a005

Server thread safety (#275)
* WOLFHSM_CFG_THREADSAFE: Adds framework for internal server thread safety, serializing access to shared global resources like NVM and the global key cache

* Address code review feedback.

  API/Error handling:
  - Add initialized flag to whLock structure to distinguish init states
  - Enhance error handling: acquire/release check initialized flag
  - Make wh_Lock_Cleanup zero the structure for a clear post-cleanup state
  - Document that init/cleanup must be single-threaded (no atomics)
  - Document cleanup preconditions (no active contention)
  - Update all API docs with precise return codes and error conditions
  - Change blocking acquire failure from ERROR_LOCKED to ERROR_ABORTED
  - Add comment explaining why non-blocking acquire is not provided

  POSIX port improvements:
  - Enhanced errno mapping in posix_lock.c (EINVAL→BADARGS, etc.)
  - Trap PTHREAD_MUTEX_ERRORCHECK errors (EDEADLK, EPERM)

  Test coverage:
  - Add testUninitializedLock to validate error handling
  - Enhance testLockLifecycle with post-cleanup validation tests

  Misc:
  - Apply consistent critical section style pattern in wh_nvm.c
  - Update copyright years to 2026
  - Rename stress test files to wh_test_posix_threadsafe_stress.*

* Address review feedback from rizlik. Extends NVM locking to the cert, counter, img_mgr, and nvm modules.

  Adds proper thread-safety locking discipline to additional server modules that perform compound NVM operations. This prevents TOCTOU (Time-Of-Check-Time-Of-Use) issues where metadata could become stale between check and use/writeback.

  Changes:
  - wh_server_cert.c: Add NVM locking for atomic GetMetadata + Read operations in certificate read and export paths
  - wh_server_counter.c: Add NVM locking for atomic read-modify-write counter increment operations
  - wh_server_img_mgr.c: Add NVM locking for atomic signature load operations
  - wh_server_keystore.c: Refactor to use unlocked internal variants for compound operations (GetUniqueId + CacheKey, policy check + erase, freshen + export). Add locking discipline documentation.
  - wh_server_nvm.c: Add NVM locking for DMA read operations to ensure metadata remains valid throughout the transfer. Add locking discipline documentation.
  - wh_test_posix_threadsafe_stress.c: Add new stress test phases for counter concurrent increment, counter increment vs read, NVM read vs resize, NVM concurrent resize, and NVM read DMA vs resize. Add counter atomicity validation.

  All compound operations now follow the pattern:
  1. Acquire server->nvm->lock
  2. Use only *Unlocked() variants internally
  3. Keep the lock held for the entire operation, including DMA
  4. Release the lock after all metadata-dependent operations complete

* Massive refactor of locking integration. Pull locking out of lower-level server module APIs (keystore, NVM, counter, etc.) and acquire the lock in request handling functions (e.g. wh_Server_HandleXXXRequest())

* Cleanups, formatting, and test housekeeping fixes surrounding macro protection; TSAN options to fail fast in CI on error

* Update wh_nvm.h doxygen

* Add thread-safety docs

* Housekeeping: pull some logic out of locked critical sections

* Simplify comment

* Fix return code value

* Document NVM list concurrency issues

* Move client initialization before threads are spawned to prevent a race on wolfCrypt init and cryptoCb registration

* Simplify
1 parent 6692649 commit 062a005

31 files changed: +4684 −352 lines
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
name: Multi-threaded Stress Test

on:
  push:
    branches: [ 'master', 'main', 'release/**' ]
  pull_request:
    branches: [ '*' ]

jobs:
  build:

    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
    - uses: actions/checkout@v4

    # List host CPU info
    - name: Host CPU info
      run: cat /proc/cpuinfo

    # List compiler version
    - name: List compiler version
      run: gcc --version

    # pull and build wolfssl
    - name: Checkout wolfssl
      uses: actions/checkout@v4
      with:
        repository: wolfssl/wolfssl
        path: wolfssl

    # Build and run stress tests with TSAN
    - name: Build and run stress tests with TSAN
      run: cd test && make clean && make -j THREADSAFE=1 TSAN=1 STRESS=1 DMA=1 SHE=1 WOLFSSL_DIR=../wolfssl && make TSAN=1 run

.github/workflows/build-and-test.yml

Lines changed: 12 additions & 0 deletions
@@ -64,3 +64,15 @@ jobs:
    # Build and test with DEBUG_VERBOSE=1 (includes DEBUG)
    - name: Build and test with DEBUG_VERBOSE
      run: cd test && make clean && make -j DEBUG_VERBOSE=1 WOLFSSL_DIR=../wolfssl && make run

    # Build and test in multithreaded mode with everything enabled
    - name: Build and test with THREADSAFE and everything
      run: cd test && make clean && make -j THREADSAFE=1 DMA=1 SHE=1 ASAN=1 WOLFSSL_DIR=../wolfssl && make run

    # Build and test in multithreaded mode with everything enabled and wolfCrypt tests
    - name: Build and test with THREADSAFE and TESTWOLFCRYPT and everything
      run: cd test && make clean && make -j THREADSAFE=1 TESTWOLFCRYPT=1 DMA=1 SHE=1 ASAN=1 WOLFSSL_DIR=../wolfssl && make run

    # Build and test in multithreaded mode with everything enabled and wolfCrypt tests with DMA
    - name: Build and test with THREADSAFE and TESTWOLFCRYPT with DMA
      run: cd test && make clean && make -j THREADSAFE=1 TESTWOLFCRYPT=1 TESTWOLFCRYPT_DMA=1 DMA=1 SHE=1 ASAN=1 WOLFSSL_DIR=../wolfssl && make run

docs/draft/README.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# Draft Documentation

This directory holds documentation for individual features added to wolfHSM.

Eventually these documents should be refined and incorporated into the "V2"
wolfHSM documentation and the online manual. For now we leave them here so they
can inform code review and still be helpful to users, despite being in "draft"
state.

docs/draft/THREADSAFE.md

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
# Thread Safety in wolfHSM

## Overview

wolfHSM supports server-side thread safety via the `WOLFHSM_CFG_THREADSAFE` build flag. When enabled, the server can safely process requests from multiple clients concurrently, with each request processing loop running on a separate thread and communicating through its own server context.

The thread safety model serializes access to shared resources like NVM storage and the global key cache using a single lock embedded in the NVM context. Multiple server contexts can safely share a single NVM context, with the lock ensuring atomic access to shared state.

Note that thread safety does NOT mean a single server context can be shared across threads: the lock only serializes the internal access to shared resources that occurs when multiple server contexts use the server API concurrently. Concurrent access to a single server context is not currently supported.

## Build Configuration

Thread safety is enabled when building with `WOLFHSM_CFG_THREADSAFE` defined.

To enable thread safety in the POSIX test harness, build with:

```bash
make -C test THREADSAFE=1
```

When `WOLFHSM_CFG_THREADSAFE` is not defined, all locking operations compile to no-ops with zero overhead.

## What Is Protected

Currently only the global NVM context is protected by a lock, guaranteeing thread-safe access to:

- **NVM operations**: All operations on non-volatile storage
- **Global key cache**: Keys marked as global (shared across clients) via `WOLFHSM_CFG_GLOBAL_KEYS`
- **Local key cache**: Per-server key cache operations that must synchronize with NVM

The lock does **not** protect:

- Transport layer operations (each server has its own comm context)
- Per-request scratch memory (allocated per-handler)
- Server context fields that are not shared

## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Server 1   │     │  Server 2   │     │  Server 3   │
│ (localCache)│     │ (localCache)│     │ (localCache)│
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │

                 ┌───────────────────┐
                 │    NVM Context    │
                 │  ┌─────────────┐  │
                 │  │    Lock     │  │
                 │  ├─────────────┤  │
                 │  │ Global Keys │  │
                 │  ├─────────────┤  │
                 │  │ NVM Backend │  │
                 │  └─────────────┘  │
                 └───────────────────┘
```

## Lock Lifecycle

The lock is initialized and cleaned up with the NVM context:

```c
whNvmConfig nvmConfig = {
    .cb      = &nvmFlashCb,
    .context = &flashContext,
    .config  = &flashConfig,
#ifdef WOLFHSM_CFG_THREADSAFE
    .lockConfig = &lockConfig, /* Platform-specific lock config */
#endif
};

wh_Nvm_Init(&nvmContext, &nvmConfig); /* Initializes lock */
/* ... use NVM ... */
wh_Nvm_Cleanup(&nvmContext); /* Cleans up lock */
```

## Request Handler Locking

Request handlers in the server acquire the lock around compound operations. The pattern is:

```c
case SOME_ACTION: {
    /* Translate request (no lock needed) */
    wh_MessageFoo_TranslateRequest(magic, req_packet, &req);

    /* Acquire lock for atomic compound operation */
    ret = WH_SERVER_NVM_LOCK(server);
    if (ret == WH_ERROR_OK) {
        /* Perform work while holding lock */
        ret = wh_Server_KeystoreXxx(server, ...);
        if (ret == WH_ERROR_OK) {
            ret = wh_Nvm_Xxx(server->nvm, ...);
        }

        /* Release lock */
        (void)WH_SERVER_NVM_UNLOCK(server);
    }
    resp.rc = ret;

    /* Translate response (no lock needed) */
    wh_MessageFoo_TranslateResponse(magic, &resp, resp_packet);
}
```

Key points:
- Lock acquired **after** request translation
- Lock released **before** response translation
- All NVM and keystore operations within the lock are atomic
- The lock ensures multi-step operations (e.g., check-then-modify) are not interleaved

## Server-Side Development

When developing server-side code that accesses shared resources outside the request handling pipeline, you must manually acquire the lock.

### Using the Server Lock Macros

```c
int my_server_function(whServerContext* server)
{
    int ret;

    ret = WH_SERVER_NVM_LOCK(server);
    if (ret != WH_ERROR_OK) {
        return ret;
    }

    /* Access NVM or global keystore while holding lock */
    ret = wh_Nvm_Read(server->nvm, id, offset, len, buffer);
    if (ret == WH_ERROR_OK) {
        ret = wh_Server_KeystoreCacheKey(server, &meta, keyData);
    }

    (void)WH_SERVER_NVM_UNLOCK(server);
    return ret;
}
```

### Using the NVM Lock Macros Directly

If you only have access to the NVM context:

```c
int my_nvm_function(whNvmContext* nvm)
{
    int ret;

    ret = WH_NVM_LOCK(nvm);
    if (ret != WH_ERROR_OK) {
        return ret;
    }

    /* Access NVM while holding lock */
    ret = wh_Nvm_GetMetadata(nvm, id, &meta);
    if (ret == WH_ERROR_OK) {
        ret = wh_Nvm_Read(nvm, id, 0, meta.len, buffer);
    }

    (void)WH_NVM_UNLOCK(nvm);
    return ret;
}
```

### Lock Macro Behavior

| Macro | THREADSAFE defined | THREADSAFE not defined |
|-------|--------------------|------------------------|
| `WH_SERVER_NVM_LOCK(server)` | Calls `wh_Server_NvmLock()` | Returns `WH_ERROR_OK` |
| `WH_SERVER_NVM_UNLOCK(server)` | Calls `wh_Server_NvmUnlock()` | Returns `WH_ERROR_OK` |
| `WH_NVM_LOCK(nvm)` | Calls `wh_Nvm_Lock()` | Returns `WH_ERROR_OK` |
| `WH_NVM_UNLOCK(nvm)` | Calls `wh_Nvm_Unlock()` | Returns `WH_ERROR_OK` |

Using these macros ensures code compiles and runs correctly regardless of whether thread safety is enabled.

## Platform-Specific Lock Implementation

The lock abstraction is a generic interface that delegates to callbacks, allowing platform-specific implementations. wolfHSM provides a reference implementation using pthreads for the POSIX port:

```c
#include "port/posix/posix_lock.h"

static posixLockContext lockCtx;
static const whLockCb   lockCb = POSIX_LOCK_CB;

whLockConfig lockConfig = {
    .cb      = &lockCb,
    .context = &lockCtx,
    .config  = NULL, /* Use default mutex attributes */
};
```

To support another platform, implement your own callbacks matching the `whLockCb` interface:

```c
typedef struct whLockCb_t {
    whLockInitCb    init;    /* Initialize lock resources */
    whLockCleanupCb cleanup; /* Free lock resources */
    whLockAcquireCb acquire; /* Acquire exclusive lock (blocking) */
    whLockReleaseCb release; /* Release exclusive lock */
} whLockCb;
```

## Testing Thread Safety on the POSIX port

Run the standard test suite with DMA and thread safety enabled:

```bash
make -C test clean && make -j -C test DMA=1 THREADSAFE=1 && make -C test run
```

Run the multithreaded stress test with the same functionality under ThreadSanitizer to detect data races:

```bash
make -C test clean && make -j -C test STRESS=1 TSAN=1 DMA=1 SHE=1 THREADSAFE=1 && make -C test run TSAN=1
```

The stress test runs multiple client threads against multiple server contexts sharing a single NVM context, exercising contention patterns across the keystore, NVM, counter, and certificate APIs.

0 commit comments
