
Commit 91e4f1a

Added copilot instructions and copilot generated doc around locking
1 parent 5fdef27 commit 91e4f1a

2 files changed

Lines changed: 277 additions & 0 deletions

File tree

.github/copilot-instructions.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
1+
# ESPAsyncWebServer — Copilot Instructions
2+
3+
## Project Overview
4+
5+
ESP32Async fork of ESPAsyncWebServer — an async HTTP and WebSocket server library for ESP32 (Arduino and ESP-IDF) and ESP8266. Built on top of AsyncTCP, it handles requests and WebSocket frames in the TCP task context without blocking the main loop.
6+
7+
- **License**: LGPL-3.0-or-later
8+
- **Build systems**: PlatformIO (primary), Arduino IDE, ESP-IDF component
9+
- **Documentation site**: <https://ESP32Async.github.io/ESPAsyncWebServer/>
10+
11+
## Folder Structure
12+
13+
```
src/                            # Library source code
  ESPAsyncWebServer.h           # Main header (includes, config, mutex types, base classes)
  AsyncWebSocket.{h,cpp}        # WebSocket server + per-client state machine
  AsyncEventSource.{h,cpp}      # Server-Sent Events (SSE) implementation
  AsyncJson.{h,cpp}             # ArduinoJson request/response helpers
  AsyncMessagePack.h            # MessagePack response helper
  Middleware.cpp                # Middleware chain (auth, CORS, rate-limit, etc.)
  WebServer.cpp                 # AsyncWebServer core (routing, rewrites, handlers)
  WebRequest.cpp                # HTTP request parsing (low-level, called from TCP callbacks)
  AsyncWebServerRequest.cpp     # AsyncWebServerRequest public API
  WebResponses.cpp              # Response implementations (basic, chunked, file, stream)
  WebHandlers.cpp               # Built-in handlers (static file, upload, body)
  WebHandlerImpl.h              # Handler class definitions
  WebResponseImpl.h             # Response class definitions
  WebAuthentication.{h,cpp}     # Digest/Basic auth helpers + SHA1/Base64
  BackPort_SHA1Builder.{h,cpp}  # SHA1 for platforms without it
  ChunkPrint.{h,cpp}            # Print adapter for chunked output
  AsyncWebServerLogging.h       # Log macros (async_ws_log_v, etc.)
  AsyncWebServerVersion.h       # Version constants
  literals.h                    # Shared string literals

docs/                           # MkDocs documentation site (user-facing)
  index.md                      # Landing page
  installation.md               # Install & dependency setup
  setup.md                      # Server setup and configuration
  configuration.md              # Compile-time options
  routing.md                    # URL routing, rewrites, path params
  requests.md                   # Request object API
  responses.md                  # Response types and API
  websockets.md                 # WebSocket API and events
  eventsource.md                # SSE API
  middleware.md                 # Middleware system
  filters.md                    # Request filters
  static-files.md               # Serving files from SPIFFS/LittleFS
  concepts.md                   # Architecture and design concepts
  LOCKING_RULES_FOR_CLIENT.md   # ⚠ Internal: _client pointer ownership & locking rules

examples/
  arduino/                      # Arduino sketches (one folder per example)
  idf_component/                # ESP-IDF component examples
  pioarduino/                   # pioarduino-specific examples
  arduino_emulator/             # Host-side build for CI (PosixAsyncTCP, no real hardware)

.github/
  workflows/                    # CI pipelines (PlatformIO builds, host emulator build)
  scripts/                      # CI helper scripts
  ISSUE_TEMPLATE/               # GitHub issue templates
```
62+
63+
## Key Architecture Concepts
64+
65+
### Threading Model
66+
67+
All AsyncTCP callbacks (`onData`, `onAck`, `onPoll`, `onDisconnect`, `onTimeout`, `onError`) run on the **TCP task** (typically core 0 on ESP32). User application code calling the WebSocket/HTTP API runs on **a different task** (typically `loop()` on core 1). This means shared state is accessed from two different contexts.
68+
69+
### Mutex Configuration
70+
71+
Defined in `src/ESPAsyncWebServer.h` inside namespace `asyncsrv`:
72+
73+
- **ESP32 / HOST**: `ASYNCWEBSERVER_USE_MUTEX=1` → `mutex_type = std::recursive_mutex`
74+
- **ESP8266**: `ASYNCWEBSERVER_USE_MUTEX=0` → `mutex_type = null_mutex` (no-op, single-threaded)
75+
76+
Lock/guard types: `asyncsrv::lock_guard_type`, `asyncsrv::unique_lock_type`.
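
As an illustration of the configuration above, here is a minimal sketch of how a compile-time switch can select between a recursive mutex and a no-op mutex. The `sketch` namespace, the `SKETCH_USE_MUTEX` macro, and the `Counter` class are hypothetical stand-ins; the real definitions in `src/ESPAsyncWebServer.h` may differ.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the library's compile-time switch.
#ifndef SKETCH_USE_MUTEX
#define SKETCH_USE_MUTEX 1
#endif

namespace sketch {

// No-op mutex for single-threaded targets; it only has to satisfy the
// Lockable requirements so std::lock_guard / std::unique_lock accept it.
struct null_mutex {
  void lock() {}
  void unlock() {}
  bool try_lock() { return true; }
};

#if SKETCH_USE_MUTEX
using mutex_type = std::recursive_mutex;
#else
using mutex_type = null_mutex;
#endif

using lock_guard_type = std::lock_guard<mutex_type>;
using unique_lock_type = std::unique_lock<mutex_type>;

// Small guarded counter; bump_twice() re-enters the lock on the same thread.
class Counter {
public:
  void bump() {
    lock_guard_type lock(_lock);
    ++_value;
  }
  void bump_twice() {
    lock_guard_type lock(_lock);  // outer acquisition
    bump();                       // inner acquisition: fine with a recursive or no-op mutex
    bump();
  }
  int value() {
    lock_guard_type lock(_lock);
    return _value;
  }

private:
  mutex_type _lock;
  int _value = 0;
};

}  // namespace sketch
```

Because calling code only names `lock_guard_type` and `unique_lock_type`, the same source compiles for both settings of the switch, which is presumably why the conventions discourage raw `std::mutex`.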
77+
78+
### WebSocket Locking (see [docs/LOCKING_RULES_FOR_CLIENT.md](../docs/LOCKING_RULES_FOR_CLIENT.md))
79+
80+
Two lock scopes exist:
81+
82+
| Lock | Scope | Guards |
|------|-------|--------|
| `_queue_lock` | Per `AsyncWebSocketClient` | `_controlQueue`, `_messageQueue`, and writes to `_client` pointer |
| `_ws_clients_lock` | Per `AsyncWebSocket` (server) | `_clients` list (add/remove/iterate) |
86+
87+
**Critical rules**:
88+
89+
1. `_queue_lock` guards **queues** primarily. The `_client` pointer is nulled under `_queue_lock` in `_onDisconnect()` so that queue methods that check `_client` see a consistent value.
90+
2. Never hold `_queue_lock` when invoking user callbacks (`_handleEvent`, `_handleDisconnect`). Release first.
91+
3. Before unlocking and calling `_client->close()`, capture the pointer into a local variable — `_client->close()` triggers `_onDisconnect → _handleDisconnect → erase → ~AsyncWebSocketClient`, destroying `this`.
92+
4. User-facing methods (`close`, `remoteIP`, `remotePort`, `_onTimeout`) use a **local-capture pattern**: `AsyncClient *c = _client; if (!c) return; c->method();` to avoid TOCTOU races on the member.
93+
5. `_status` is a plain `AwsClientStatus` enum, currently not guarded by a mutex. Reads may be stale but downstream operations have their own locks.
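
The local-capture pattern from rule 4 can be sketched in isolation as follows. `FakeClient` and `SessionSketch` are hypothetical stand-ins, not the library's types. The sketch only shows the check and the use operating on the same value; it does not extend the lifetime of the pointed-to object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AsyncClient.
struct FakeClient {
  uint16_t port;
};

// Hypothetical stand-in for AsyncWebSocketClient.
class SessionSketch {
public:
  void attach(FakeClient *c) {
    assert(c != nullptr);
    _client = c;
  }
  void detach() { _client = nullptr; }

  uint16_t remotePort() const {
    FakeClient *c = _client;  // read the member exactly once
    if (!c) {
      return 0;               // already disconnected
    }
    return c->port;           // use the local copy, never re-read _client
  }

private:
  FakeClient *_client = nullptr;
};
```

Reading the member twice (once for the check, once for the call) would leave a window in which the second read observes a null pointer even though the check passed.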
94+
95+
### Host Emulator Build
96+
97+
`examples/arduino_emulator/` contains a CMake project that builds the library against [Arduino-Emulator](https://github.com/mathieucarbou/Arduino-Emulator) and [PosixAsyncTCP](https://github.com/MitchBradley/PosixAsyncTCP) for host-side compilation and CI testing without real hardware. CI clones dependencies into `.ci/`.
98+
99+
## Coding Conventions
100+
101+
- C++17 assumed on ESP32; C++11 on ESP8266
102+
- Use `async_ws_log_v` / `async_ws_log_w` / `async_ws_log_e` for logging (maps to `log_v`/`log_w`/`log_e` or no-op)
103+
- Use `asyncsrv::lock_guard_type` / `asyncsrv::unique_lock_type` for locking — never raw `std::mutex`
104+
- Prefer `std::deque` for queues, `std::list` for client collections
105+
- Format specifiers: use `PRIu32`, `PRIu64`, `PRIu8`, `PRIi8` — cross-platform safe
106+
- All examples live in `examples/arduino/<Name>/<Name>.ino` (one `.ino` per folder)
107+
- PlatformIO envs: `arduino-2`, `arduino-3`, `esp8266`, `raspberrypi`
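
To illustrate the format-specifier convention, here is a small sketch using the `<cinttypes>` macros with `std::snprintf`. The function name `formatClientLine` and the message layout are made up for this example.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a log-style line with fixed-width integers.
// PRIu32 / PRIu64 expand to the correct length modifier on each platform,
// so the same format string works where uint32_t is `unsigned int`
// and where it is `unsigned long`.
std::string formatClientLine(const char *url, uint32_t clientId, uint64_t bytes) {
  assert(url != nullptr);
  char buf[96];
  std::snprintf(buf, sizeof(buf), "[%s][%" PRIu32 "] sent %" PRIu64 " bytes", url, clientId, bytes);
  return std::string(buf);
}
```

Writing `%u` or `%lu` directly compiles on one toolchain and triggers format warnings (or wrong output) on another, which is the problem the macros avoid.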

docs/LOCKING_RULES_FOR_CLIENT.md

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
1+
# AsyncWebSocketClient Safe Usage Rules for `_client` Pointer
2+
3+
## Current Ownership Model
4+
- `AsyncClient* _client` is acquired in constructor via `AsyncWebServerRequest::clientRelease()`
5+
- Ownership is transferred to AsyncWebSocketClient
6+
- Destruction is triggered by AsyncClient's onDisconnect callback: `delete c;` in lambda at src/AsyncWebSocket.cpp:270
7+
- After destruction signal, `_client` is nulled in `_onDisconnect()` at src/AsyncWebSocket.cpp:561
8+
- **Key**: AsyncWebSocketClient is NOT the sole owner; the AsyncClient destructor is externally managed
9+
10+
## Current Lock Architecture
11+
- `_queue_lock` (recursive_mutex): Guards **only** `_controlQueue` and `_messageQueue`
12+
- **NOT** intended to guard `_client` pointer itself (per comment in locking branch)
13+
- Each execution context (onData, onPoll, onAck, onTimeout, user API calls) may run on different threads
14+
15+
## Safe Access Rules
16+
17+
### 1. Construction Phase (Constructor)
18+
**Context**: Called from AsyncWebSocket::_newClient under `_ws_clients_lock`
19+
**Rule**: Safe to set up all callback handlers on `_client` without synchronization, as no other thread can yet access this object
20+
```cpp
// src/AsyncWebSocket.cpp:248-295
_client->setRxTimeout(0);
_client->onError(...);
_client->onAck(...);
_client->onDisconnect(...);  // <-- This lambda will call delete c; at line 270
_client->onTimeout(...);
_client->onData(...);
_client->onPoll(...);
```
30+
31+
### 2. AsyncClient Callback Context (_onData, _onPoll, _onAck, _onTimeout, _onError)
32+
**Context**: Runs from AsyncClient's internal task (typically AsyncTCP/same core)
33+
**Rule**:
34+
- These methods are called FROM AsyncClient; they are NOT called from application/user context
35+
- Only AsyncClient callbacks can change `_client` state to null (via onDisconnect callback)
36+
- **Within these methods**, if you read `_client`, it won't become null mid-execution (single-threaded from client perspective)
37+
- **BUT**: If you release `_queue_lock` and then dereference `_client` again, or call out to user callbacks while holding the lock, the risk of re-entrancy or a callback-triggered disconnect increases
38+
39+
**Current violations in PR #424**:
40+
- `_onDisconnect()` at line 556-562: Holds `_queue_lock`, calls `_server->_handleDisconnect(this)` which invokes user event callback
41+
- If user callback calls back into WebSocket API, lock-order inversion or re-entrancy can occur
42+
- **Fix needed**: Release `_queue_lock` before invoking user callback
43+
44+
- `_onData()` at line 565+: Captures `const AwsClientStatus client_status = status()` at line 569
45+
- `status()` method **currently unguarded** (returns `_status` directly without lock)
46+
- Later at line 691-700 acquires lock and uses `_client`
47+
- **Fix needed**: Ensure `_client` null-checks are done under the same lock that prevents concurrent null-ing
48+
49+
### 3. User-Facing Methods (ping, text, binary, close)
50+
**Context**: Called from application code (arbitrary thread)
51+
**Rule**:
52+
- **MUST NOT** directly dereference `_client` without synchronization
53+
- **MUST NOT** read `_client` under the lock, release the lock, and then dereference the now possibly stale value
54+
- **Pattern**: Only through `find_connected_client_locked()` helper which is called under `_ws_clients_lock`
55+
56+
**Current violations in PR #424**:
57+
- `close()` at src/AsyncWebSocket.cpp:500-536:
58+
- Line 502-507: Acquires `_queue_lock`, checks and sets `_status`
59+
- Line 532: **Direct dereference `_client->abort()` OUTSIDE any lock after releasing `_queue_lock`**
60+
- Risk: `_onDisconnect()` could run concurrently and null `_client`, causing UAF
61+
- **Fix needed**: Either re-lock before `_client->abort()`, or capture `_client` pointer under lock and null-check outside
62+
63+
### 4. Queue Operations (_queueControl, _queueMessage, _runQueue)
64+
**Context**: Called from both AsyncClient callbacks and user API
65+
**Rule**:
66+
- `_queueControl()` at line 448: Acquires `_queue_lock`
67+
- Line 457: Reads `_client && _client->canSend()`
68+
- **Risk**: `_client` can be nulled by concurrent `_onDisconnect()`
69+
- **Fix needed**: `_client` null check must be atomic with state update, or re-check after lock re-acquire
70+
71+
- `_runQueue()` at line 404+: Called from within locked context
72+
- Line 366 and 457: Dereferences `_client->canSend()`
73+
- Assumes caller holds `_queue_lock`
74+
- **Safe IF** caller ensures no unlock between lock acquire and `_runQueue()` call
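
The "caller holds the lock" convention for `_runQueue()` can be sketched as below, under the assumption that the connection check, the enqueue, and the drain all happen inside one critical section. `QueueSketch` and its members are hypothetical; a `bool` stands in for the `_client` pointer and a second deque stands in for the socket.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

class QueueSketch {
public:
  bool queueMessage(const std::string &msg) {
    std::lock_guard<std::recursive_mutex> lock(_queue_lock);
    if (!_connected) {
      return false;  // check and enqueue happen under the same lock
    }
    _messages.push_back(msg);
    runQueueLocked();  // no unlock between the check and the drain
    return true;
  }

  void onDisconnect() {
    std::lock_guard<std::recursive_mutex> lock(_queue_lock);
    _connected = false;  // stands in for "_client = nullptr"
    _messages.clear();
  }

  std::size_t sentCount() {
    std::lock_guard<std::recursive_mutex> lock(_queue_lock);
    return _sent.size();
  }

private:
  // Precondition: the caller holds _queue_lock.
  void runQueueLocked() {
    while (_connected && !_messages.empty()) {
      _sent.push_back(_messages.front());  // stands in for writing to the socket
      _messages.pop_front();
    }
  }

  std::recursive_mutex _queue_lock;
  bool _connected = true;
  std::deque<std::string> _messages;
  std::deque<std::string> _sent;
};
```

The `Locked` suffix is just one way to make the precondition visible at call sites; the source does not prescribe a naming scheme.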
75+
76+
### 5. Disconnect Path (_onDisconnect, _handleDisconnect)
77+
**Context**: Called from AsyncClient::onDisconnect callback (AsyncTCP task context)
78+
**Rule**:
79+
- `_onDisconnect()` at line 556-562:
80+
- Acquires `_queue_lock`
81+
- Sets `_status = WS_DISCONNECTED` (the write should happen under the lock; reads through `status()` stay unguarded until a lock is added there)
82+
- Nulls `_client`
83+
- Calls `_server->_handleDisconnect(this)` which fires user event under the lock
84+
- **Problem**: User callback can call back into WebSocket APIs, causing re-entrancy and potential deadlock
85+
- **Fix needed**: Unlock before invoking `_handleDisconnect()`
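
A minimal sketch of the "unlock before the callback" fix, assuming the state change is done in a scoped critical section and the user callback runs only after that scope ends. `ClientSketch`, `on_disconnect`, and the `_lock_depth` counter are hypothetical; the counter exists purely so the example can observe whether the lock is held.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

enum class Status { Connected, Disconnected };

class ClientSketch {
public:
  std::function<void(ClientSketch &)> on_disconnect;  // user-registered callback

  void handleDisconnect() {
    {
      std::lock_guard<std::recursive_mutex> lock(_queue_lock);
      ++_lock_depth;
      _status = Status::Disconnected;
      _has_client = false;  // stands in for "_client = nullptr"
      --_lock_depth;
    }  // lock released here
    assert(_lock_depth == 0);  // the callback must not run inside the critical section
    if (on_disconnect) {
      on_disconnect(*this);  // user code may call back into the API
    }
  }

  Status status() {
    std::lock_guard<std::recursive_mutex> lock(_queue_lock);
    return _status;
  }

  // Instrumentation for the example only; meaningful on the calling thread.
  bool lockHeld() const { return _lock_depth > 0; }

private:
  std::recursive_mutex _queue_lock;
  int _lock_depth = 0;
  Status _status = Status::Connected;
  bool _has_client = true;
};
```

With a recursive mutex, re-entering from the same thread would not deadlock by itself; the concern is a callback that takes another lock (for example a server-level one) while this one is still held.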
86+
87+
### 6. Cleanup Context (cleanupClients)
88+
**Context**: Called from application event loop (user's cleanup cadence)
89+
**Rule**:
90+
- Acquires `_ws_clients_lock` (server-level lock)
91+
- Calls `close()` on connected clients at src/AsyncWebSocket.cpp:1073
92+
- Splices deleted clients into temp list to destroy after releasing lock (good pattern)
93+
- But `close()` method has the UAF issue at line 532
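
The splice-then-destroy pattern mentioned above can be sketched with `std::list::splice`. `ClientStub` and `ServerSketch` are hypothetical stand-ins; the point is only that dead entries are moved out of the shared list under the lock and destroyed after the lock is released.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <memory>
#include <mutex>

// Hypothetical stand-in for AsyncWebSocketClient.
struct ClientStub {
  int id;
  bool connected;
};

class ServerSketch {
public:
  void add(int id, bool connected) {
    std::lock_guard<std::recursive_mutex> lock(_clients_lock);
    _clients.push_back(std::unique_ptr<ClientStub>(new ClientStub{id, connected}));
  }

  // Removes disconnected clients and returns how many were removed.
  std::size_t cleanup() {
    std::list<std::unique_ptr<ClientStub>> dead;  // outlives the lock scope below
    {
      std::lock_guard<std::recursive_mutex> lock(_clients_lock);
      const std::size_t before = _clients.size();
      for (auto it = _clients.begin(); it != _clients.end();) {
        auto next = std::next(it);
        if (!(*it)->connected) {
          dead.splice(dead.end(), _clients, it);  // relinks the node; no destructor runs here
        }
        it = next;
      }
      assert(dead.size() + _clients.size() == before);
    }  // lock released here
    return dead.size();  // the stubs are destroyed when `dead` goes out of scope
  }

  std::size_t size() {
    std::lock_guard<std::recursive_mutex> lock(_clients_lock);
    return _clients.size();
  }

private:
  std::recursive_mutex _clients_lock;
  std::list<std::unique_ptr<ClientStub>> _clients;
};
```

Keeping destructors out of the critical section matters when a destructor can itself trigger callbacks or take other locks.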
94+
95+
## Correct Safe Pattern for _client Dereference
96+
97+
```cpp
// Pattern 1: Short-lived operation under lock (AsyncClient callback context)
void AsyncWebSocketClient::_onData_SAFE(...) {
  // We already run inside an AsyncClient callback, so the _onData path itself cannot null _client.
  // _onDisconnect can still run if the network tears down, so check under the lock.
  asyncsrv::lock_guard_type lock(_queue_lock);
  if (_client) {
    // Safe to dereference once the check passes: nulling only happens in _onDisconnect
    _client->send_something();  // placeholder for the actual AsyncClient call
  }
}

// Pattern 2: Capture under lock, decide outside lock (user API context)
bool AsyncWebSocketClient::ping_SAFE(const uint8_t *data, size_t len) {
  AsyncClient *client_snapshot = nullptr;
  {
    asyncsrv::lock_guard_type lock(_queue_lock);
    // Check both pointer and state under lock
    if (_client && _status == WS_CONNECTED) {
      client_snapshot = _client;
    }
  }
  // Outside the lock the snapshot may already be stale, so it is only
  // null-checked here and never dereferenced; the ping is queued instead.
  if (client_snapshot) {
    return _queuePing(...);  // safer to queue it
  }
  return false;
}

// Pattern 3: Change state and capture the pointer under lock, act outside the lock
void AsyncWebSocketClient::close_SAFE(uint16_t code, const char *message) {
  AsyncClient *client_to_close = nullptr;
  {
    asyncsrv::lock_guard_type lock(_queue_lock);
    if (_status != WS_CONNECTED) {
      return;
    }
    _status = WS_DISCONNECTING;
    client_to_close = _client;
    // Do NOT null _client here; let _onDisconnect do it
  }

  // Outside the lock:
  // 1. client_to_close was captured under the lock
  // 2. It may be freed by the onDisconnect callback, but that's AsyncClient's responsibility
  // 3. So do NOT dereference it unless _client is re-checked under the lock

  async_ws_log_w("[%s][%" PRIu32 "] CLOSE", _server->url(), _clientId);

  if (code) {
    // ... queue close frame ...
  }
}
```
152+
153+
## Summary of Current PR #424 Issues
154+
155+
| Issue | Location | Risk | Fix |
|-------|----------|------|-----|
| `_client` deref outside lock | `close()` L532 | UAF if `_onDisconnect` happens | Capture under lock or re-check |
| User callback in `_onDisconnect` under lock | L561 -> `_handleDisconnect` | Re-entrancy/deadlock | Unlock before callback |
| `status()` call in `_onData` | L569 | Unguarded read of `_status` | Guard with per-client lock or atomic |
| `_client` reads in `_queueControl` | L457 | Concurrent null from `_onDisconnect` | Add null-check pattern or atomic |
161+
162+
## Recommended Next Steps
163+
164+
1. **Separate concerns**: Use `_queue_lock` ONLY for queues, not for `_client` state
165+
2. **Add per-client coordination**: Either:
166+
- A second mutex for `_client` and `_status` safety, or
167+
- Make `_status` a `std::atomic<AwsClientStatus>` where the platform supports it
168+
3. **Unlock critical sections before user callbacks**: Always release locks before invoking `_handleEvent()` and user-registered callbacks
169+
4. **Document ownership explicitly**: Add comments stating `_client` is borrowed, not owned, and lifecycle assumptions
170+
5. **Consider async-safe patterns**: Queue disconnect events and process them later outside callback context, rather than invoking user code from within stack of AsyncClient callbacks
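
As one possible reading of step 2, here is a sketch of an atomic status field where the transition to "disconnecting" is a compare-exchange, so only one concurrent caller of a close-like method wins. The `Status` enum and `StatusSketch` class are hypothetical and are not the library's `AwsClientStatus`.

```cpp
#include <atomic>
#include <cassert>

enum class Status : int { Connected, Disconnecting, Disconnected };

class StatusSketch {
public:
  Status get() const { return _status.load(std::memory_order_acquire); }

  // Returns true only for the single caller that moves Connected -> Disconnecting.
  bool beginClose() {
    Status expected = Status::Connected;
    return _status.compare_exchange_strong(expected, Status::Disconnecting, std::memory_order_acq_rel);
  }

  // Called from the disconnect path once the connection is gone.
  void markDisconnected() { _status.store(Status::Disconnected, std::memory_order_release); }

private:
  std::atomic<Status> _status{Status::Connected};
};
```

An atomic status removes the stale-read problem for `_status` alone; it does not by itself make `_client` safe to dereference, so it complements rather than replaces the locking rules above.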
