Summary
OlmMachine occasionally emits a KeysUploadRequest containing a one-time-key whose signed_curve25519:<id> collides with an OTK ID already published earlier in the same session. Synapse rejects the duplicate with HTTP 400:
{ "errcode": "M_INVALID_PARAM", "error": "signed_curve25519:AAAAAAAAAA0 already exists" }
matrix-js-sdk's requestWithRetry does not retry 4xx other than 429, so the bootstrap call surfaces the 400 directly. bootstrapCrossSigning() then fails outright and the affected account is permanently stuck — every retry deterministically reproduces the same colliding ID.
Environment
| Component | Version / details |
| --- | --- |
| @matrix-org/matrix-sdk-crypto-nodejs | 0.5.1 |
| @matrix-org/matrix-sdk-crypto-wasm | 18.2.0 |
| matrix-js-sdk | 41.4.0-rc.0 |
| Node | 24.x (also reproduces on 22.x) |
| Homeserver | Synapse 1.145.0+ess.1 (ESS, MAS-fronted) |
| Auth | Personal Access Token issued by MAS |
Reproduction
- Bring up a bot account using matrix-sdk-crypto-nodejs with cross-signing.
- Run a cross-signing bootstrap that completes far enough to publish OTKs.
- Re-run bootstrap (e.g. after a transient failure earlier in the chain — for us it was a MAS UIA stage we now route around). The second run hits the collision deterministically.
We see the same OTK ID (AAAAAAAAAA0) re-emitted across attempts, suggesting the OTK ID counter inside the persisted account state is being seeded from saved state but not advanced past previously-published-but-not-yet-claimed IDs. (Speculative — happy to capture tracing=debug logs of the colliding KeysUploadRequest body alongside the prior session's /keys/upload response if useful.)
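To make the hypothesis concrete, here is a toy model of one way the collision could arise. It is not matrix-sdk-crypto code: ToyAccount, ToyServer, PersistedState and the id<n> key format are all made up for illustration, and the assumption that the persisted counter only advances when the request is marked as sent is exactly the part I have not verified.

```typescript
// Toy model only. NOT matrix-sdk-crypto code; every name here is hypothetical.

interface PersistedState {
  // Assumed: the next OTK ID is derived from a counter kept in persisted state.
  nextKeyId: number;
}

class ToyAccount {
  private state: PersistedState;
  private pending: string[] = [];

  constructor(state: PersistedState) {
    this.state = state;
  }

  // Builds the IDs for an upload request. The counter is read but NOT advanced.
  keysUploadRequest(count: number): string[] {
    this.pending = Array.from(
      { length: count },
      (_, i) => `id${this.state.nextKeyId + i}`,
    );
    return this.pending;
  }

  // Assumed: this is the only place where the persisted counter moves forward.
  markRequestAsSent(): void {
    this.state.nextKeyId += this.pending.length;
    this.pending = [];
  }
}

class ToyServer {
  private published = new Set<string>();

  // Returns 400 if any ID was already published, mirroring the Synapse rejection.
  upload(ids: string[]): number {
    for (const id of ids) {
      if (this.published.has(id)) return 400;
    }
    ids.forEach((id) => this.published.add(id));
    return 200;
  }
}
```

In this model, if the first run uploads its keys but aborts before markRequestAsSent() is called, a second run restored from the same saved state builds the same IDs and the server answers 400 on every attempt. That matches the deterministic behaviour we see, but it is only one possible explanation.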
Why this matters
POST /keys/upload is on the critical path for every E2EE bootstrap. The 400 is not retried by the upstream SDK and bootstrapCrossSigning does not regenerate keys on its own, so affected bot accounts cannot complete cross-signing — the room device list never sees a properly-signed device, and decryption falls back to UTD on every encrypted event.
Workaround we shipped
In openclaw we ship a transport-layer mitigation while the long-term fix is figured out: rewrite the precise signed_curve25519:<id> already exists 400 on POST /keys/upload to a synthetic 200 {"one_time_key_counts":{}}. The empty counts cause OlmMachine to mint fresh OTK IDs on the next outgoing-request tick, which then upload successfully. Filed as openclaw/openclaw#74529 — happy to link / share the patch if it's useful for reproduction.
The workaround unblocks the user-visible failure but doesn't fix the underlying ID-tracking issue. The right place for the long-term fix is here.
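For reference, the shape of the rewrite is roughly the following. This is a simplified sketch rather than the actual openclaw patch: HttpResult, isOtkCollision and rewriteOtkCollision are hypothetical names, and it assumes the transport layer already has the request method, the path and the parsed JSON body in hand.

```typescript
// Simplified sketch of the transport-layer rewrite; names are hypothetical.

interface HttpResult {
  status: number;
  body: unknown;
}

// Matches only the exact error text we observed from Synapse, e.g.
// "signed_curve25519:AAAAAAAAAA0 already exists".
const OTK_COLLISION = /^signed_curve25519:\S+ already exists$/;

function isOtkCollision(method: string, path: string, res: HttpResult): boolean {
  // Ignore any query string before checking the endpoint.
  const cleanPath = path.split("?")[0];
  if (method.toUpperCase() !== "POST" || !cleanPath.endsWith("/keys/upload")) {
    return false;
  }
  if (res.status !== 400) return false;

  const body = res.body as { errcode?: unknown; error?: unknown } | null;
  return (
    typeof body === "object" &&
    body !== null &&
    body.errcode === "M_INVALID_PARAM" &&
    typeof body.error === "string" &&
    OTK_COLLISION.test(body.error)
  );
}

// Rewrites the collision 400 to a synthetic 200 with empty key counts.
// Every other response is passed through untouched.
function rewriteOtkCollision(method: string, path: string, res: HttpResult): HttpResult {
  return isOtkCollision(method, path, res)
    ? { status: 200, body: { one_time_key_counts: {} } }
    : res;
}
```

The match is deliberately narrow (method, endpoint, status, errcode and error text all have to line up) so that unrelated 400s on /keys/upload still surface as errors.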
What I'd want from upstream
A recheck of the OTK ID generation/tracking path inside OlmMachine — specifically whether the counter state survives session restart correctly and whether there's a window where the SDK can re-emit an already-uploaded ID before mark_request_as_sent advances state. If there's a known issue I missed, happy to close this as a duplicate.
I can capture tracing=debug logs and the colliding request body on request — let me know what would be most useful.