Session Management: add server-side storage subsection (#1153) #2133
0xBassia wants to merge 4 commits into
Conversation
Adds a 'Server-Side Session ID Storage' subsection covering the issue raised in OWASP#1153: how to store the session ID server-side so that a database disclosure does not directly yield active session tokens. Aligns with the Argon2id position from the discussion and cross-links to the Password Storage Cheat Sheet.
| - The cookie carries the random session ID (CSPRNG, at least 128 bits, as covered above). | ||
| - The server-side store keeps a one-way derivative such as `hash(session_id)` keyed alongside the session metadata. On each request, the server hashes the presented ID and looks up the matching record using a constant-time comparison. | ||
| - Use a memory-hard KDF such as Argon2id when the threat model includes full database disclosure (see [Password Storage Cheat Sheet](Password_Storage_Cheat_Sheet.md)). For most deployments, an HMAC over the session ID with a per-server secret achieves a similar property at far lower per-request cost. |
There was a problem hiding this comment.
- Consider reversing the order - lead with HMAC as the practical default for most deployments, and present Argon2id as the elevated-threat-model option, since listing Argon2id first may cause readers to treat it as the baseline recommendation despite its significant per-request cost
- I think we need to clarify when HMAC is defeated compared with Argon2id, with a note such as: "This protects against database-level exfiltration only; it does not protect if the application server itself is compromised."
There was a problem hiding this comment.
Thanks for the careful read. Pushed 32663f0:
- Reordered: HMAC is now the practical default. Argon2id is the elevated-threat-model option.
- Added the limit clarification you suggested on the HMAC bullet. It protects against database-level exfiltration only (backups, replicas, log dumps, snapshots), and does not protect if the application server itself is compromised, since the attacker can read the HMAC key and forge derivatives.
- Reframed Argon2id around the case where you want defence-in-depth against an attacker who has both the DB and the HMAC key, and called out the per-request cost so readers do not pick it for high-volume apps by default.
Address review feedback on OWASP#2133: lead with HMAC as the practical default and call out that it protects against database-level exfiltration only, not application-server compromise. Move Argon2id to the elevated-threat-model bullet so readers do not treat the high-cost option as the baseline.
Before we approve this I would politely like to get a review from @Sc00bz who has been the lead scientist and maintainer on this topic. Thank you!
| ### Server-Side Session ID Storage | ||
| The random session ID sent to the client should not be stored verbatim in the server-side session repository. If that store is exposed through a backup, replica, log, or stolen database snapshot, every active session ID becomes immediately usable to impersonate the corresponding user. Treat session IDs in storage the same way you treat password material: persist only a value that lets the server *verify* the presented ID without recovering it. |
There was a problem hiding this comment.
Agree to that.
However, there is an underlying assumption (in the whole cheat sheet, actually): the session ID is assumed to be all that is needed to access the session. If this does not hold (e.g. the session identifier is combined with some signature or some verifier), then the session identifier can be stored verbatim. In other words, it can be useful to have a session identifier which only identifies the session but does not grant access to it.
| Recommended pattern: | ||
| - The cookie carries the random session ID (CSPRNG, at least 128 bits, as covered above). | ||
| - The server-side store keeps a one-way derivative such as `hash(session_id)` keyed alongside the session metadata. On each request, the server hashes the presented ID and looks up the matching record using a constant-time comparison. |
There was a problem hiding this comment.
Say explicitly that a cryptographically secure hash function should be used.
There was a problem hiding this comment.
looks up the matching record using a constant-time comparison
Can we really make sure this lookup is constant-time?
There was a problem hiding this comment.
No and it is not needed. (see my main comment)
| - For most deployments, use an HMAC over the session ID with a per-server secret as the practical default. This protects against database-level exfiltration only (backups, replicas, log dumps, stolen snapshots); it does not protect if the application server itself is compromised, since the attacker can read the HMAC key and forge derivatives directly. |
There was a problem hiding this comment.
use an HMAC over the session ID with a per-server secret as the practical default.
Not sure what you are saying here. Are you saying (A) that actually, I should store HMAC(session_id,server_key) server-side instead of hash(session_id)?
Or are you saying (B) that, actually I should store session_id || HMAC(session_id,server_key) in the cookie? (and session_id on the server-side)
it does not protect if the application server itself is compromised, since the attacker can read the HMAC key and forge derivatives directly.
I believe this only makes sense in (B).
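The two readings in this comment can be put side by side. The sketch below is only an illustration of the question, not something the PR proposes: `SERVER_KEY`, the function names, and the `.` separator in the cookie value are all hypothetical, and the session ID is assumed to be a base64url string as produced by `secrets.token_urlsafe`.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical per-server secret; a real deployment would load it from secure storage.
SERVER_KEY = secrets.token_bytes(32)


def _tag(session_id: str) -> str:
    """HMAC-SHA-256 over the encoded session ID, hex encoded."""
    return hmac.new(SERVER_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def derivative_for_store(session_id: str) -> str:
    """Reading (A): the cookie carries session_id, the store keeps HMAC(session_id, key)."""
    return _tag(session_id)


def signed_cookie_value(session_id: str) -> str:
    """Reading (B): the cookie carries session_id plus a tag, the store keeps session_id."""
    return f"{session_id}.{_tag(session_id)}"


def verify_signed_cookie(value: str) -> Optional[str]:
    """Return the session ID if the tag in a reading-(B) cookie is valid, else None."""
    session_id, _, tag = value.partition(".")
    expected = _tag(session_id)
    # Compare as bytes so that arbitrary client input cannot raise a TypeError.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return session_id
    return None
```

In reading (A) the key only changes what is written to the store; in reading (B) the key is what makes a cookie value acceptable at all, which is why a leaked key lets an attacker forge values in that reading.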
| - For elevated threat models, where you want defence-in-depth against an attacker who has both the database and the HMAC key, use a memory-hard KDF such as Argon2id (see [Password Storage Cheat Sheet](Password_Storage_Cheat_Sheet.md)). It removes the single-key shortcut at a significant per-request cost, so it is not the right default for high-volume applications. |
There was a problem hiding this comment.
This is going to be slow as hell, right? Anyone doing that?
You can not do a constant-time compare when looking up a hash of a session ID in a database without the cookie also having an ID (like user ID+device ID or a global autoincrement ID) that specifies the row. This leaks info about number of sessions and might cause bugs with just looking up that ID and skipping the hash check. Also this attack is harder than the following attack on multiple session ID hashes.
HMAC is overkill. You cannot crack a 128 bit random value. This only helps to avoid the situation where every website in the world uses the same method and all of the world's session databases leak. An attacker will get like 2^48 session ID hashes thus it cost like 2^80 to crack just one random session ID. Which is beyond all but the most funded organizations and the payout is just a random session among the 2^48 sessions.
Key stretching is overkill. You cannot crack a 128 bit random value. If you are worried about attacking multiple session ID hashes, then increase the session ID to 192 bits. This gives you 112 bit security for the case of everyone on earth has 128 trillion sessions and all session databases are publicly readable.

This should mention that you should hash the encoded session ID instead of decoding and hashing. Also optionally use a constant-time encoding for the session ID when it is generated. Note that the hash of the session ID does not need to be constant-time encoded because the session ID is a secret but the hash of it can be considered public. The session ID is only encoded once so an attacker's ability is super limited. I only mention constant-time encoding because this attack is more feasible than above attacks. But you really don't need to worry about it.

Oh one last thing the hash output can be truncated to the same bit size of the session ID. There's no point in storing a full SHA-512 hash in a database. But the wording will need to be precise to avoid overly truncating because hash functions in some languages return hex encodings and people can get confused. Also issues with double encoding and there is no real damage in storing 128 hex digits instead of 32 hex digits.
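The multi-target figures in this comment follow from simple arithmetic: with N leaked hashes of uniformly random b-bit values, an attacker expects to hit one of them after roughly 2^b / N guesses. The sketch below reproduces the two numbers quoted above. The function name is hypothetical, and the population and per-person session counts are rounded to powers of two (2^33 people, 2^47 sessions each) as an assumption to match the quoted totals.

```python
import math


def multi_target_security_bits(token_bits: int, leaked_hashes: int) -> float:
    """log2 of the expected work to recover any one of `leaked_hashes` random tokens."""
    return token_bits - math.log2(leaked_hashes)


# 128-bit session IDs with about 2^48 leaked hashes: roughly 2^80 work for one hit.
print(multi_target_security_bits(128, 2**48))  # 80.0

# 192-bit session IDs, about 2^33 people with about 2^47 sessions each (2^80 hashes).
world_sessions = 2**33 * 2**47
print(multi_target_security_bits(192, world_sessions))  # 112.0
```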
Hi @Sc00bz, thanks for the detailed read. You're right on every point. The hash-like-a-password framing assumes session IDs and passwords have similar entropy, and 128-bit random ones don't, so the math doesn't work the same way. HMAC and Argon2id don't add real security on top of a 128-bit CSPRNG ID, and the "hash + constant-time lookup" pattern is structurally awkward since it needs a separate row-ID column anyway, which leaks session count as you point out. Couple of options on direction, your call:
Either is fine, and if there's a third direction I'm missing, name it.

AI disclosure: I wrote the content. AI was used only as a review pass on the wording.

Mohamed
| - Cookie carries the random ID (CSPRNG, 128 bits minimum, per the section above). | ||
| - Server stores `H(session_id)` alongside the session record, and on each request it hashes the cookie value and does a constant-time compare. | ||
| For the choice of `H`, an HMAC over the session ID with a per-server secret is the practical default. It's cheap, it covers the database-exfil paths most teams actually face (backups, replicas, dumps, log captures), and it doesn't add latency at scale. What it doesn't cover is a full application-server compromise, because at that point the attacker reads the key and recomputes the derivative directly. |
There was a problem hiding this comment.
As I asked before, is it really useful to use an HMAC and not a secure hash?
Can we just suggest using a secure hash in order to keep things simple? I would suggest focusing on a secure hash (and mentioning HMAC as an alternative solution if you wish).
If you want to use an HMAC, you then have to talk about secret rotation, secure secret storage, etc., and I am not sure that the HMAC really adds anything useful.
There was a problem hiding this comment.
"database-exfil" -> "database exfiltration"?
There was a problem hiding this comment.
at that point the attacker reads the key and recomputes the derivative directly
I still don't get it.
| If your threat model includes that case (host compromise leaking both DB and key), swap the HMAC for a memory-hard KDF like Argon2id and follow the [Password Storage Cheat Sheet](Password_Storage_Cheat_Sheet.md) for parameter selection. The per-request cost is high enough this is rarely the right default for high-traffic sites, but it removes the single-key shortcut. |
There was a problem hiding this comment.
a memory-hard KDF like Argon2id
This does not seem practical for sessions. It looks like a good way to inflict a denial of service on yourself, doesn't it? (The KDF is the session verification, so it is going to be a huge bottleneck for your application.)
| The shape: | ||
| - Cookie carries the random ID (CSPRNG, 128 bits minimum, per the section above). | ||
| - Server stores `H(session_id)` alongside the session record, and on each request it hashes the cookie value and does a constant-time compare. |
There was a problem hiding this comment.
constant-time compare
So about "constant-time" …

- You receive a "session ID" from the client.
- You compute h = HMAC(session ID, secret).
- Now, how do you actually do a constant-time compare? Against what?

Do you:

1. Search for a session by `h` in your session table / session hash map? This is not really going to be constant-time (and is that really a problem anyway?).
2. Or do you first get a session entry (using what? you need some additional session identifier to make that lookup) and then make a constant-time compare against the hash stored in the session entry?

Pseudo code for case 1:

```python
session_id = res.cookies[cookie_name]
h = hmac.digest(secret_key, session_id.encode("UTF-8"), "sha256")
session = sessions.get(h)  # <- not constant-time?
return session
```

Variant of (1):

```python
session_id = res.cookies[cookie_name]
h = hmac.digest(secret_key, session_id.encode("UTF-8"), "sha256")
session = sessions.get(h)  # <- not constant-time?
# OK, constant-time, but why would you need to do that?
if session is None or not compare_digest(session.hash, h):
    return None
return session
```

Pseudo code for case 2:

```python
session_id = res.cookies[cookie_name]
session_key = ...  # ??? where does this lookup key come from?
h = hmac.digest(secret_key, session_id.encode("UTF-8"), "sha256")
session = sessions[session_key]  # <- still not constant-time
if not compare_digest(session.hash, h):
    return None
return session
```

Shall we actually suggest this (robust against timing attacks):

```python
(session_id, session_secret) = parse_session_token(res.cookies[cookie_name])
session = sessions.get(session_id)  # <- still not constant-time, but whatever: this is not really a secret
if session is None:
    return None
h = sha256(session_secret)
# Or if you want: h = hmac.digest(secret_key, session_secret, "sha256")
if not compare_digest(session.hash, h):
    return None
return session
```
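The last variant in this comment (look up by a non-secret identifier, then compare a hash of the secret part in constant time) can be made runnable with only the standard library. This is a minimal sketch under assumptions, not the text the PR adopts: the in-memory `sessions` dict stands in for the session table, and `create_session`, `load_session`, and the `identifier.secret` cookie layout are hypothetical names and choices.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# In-memory stand-in for the session table: identifier -> session record.
sessions = {}


def create_session(user: str) -> str:
    """Create a session and return the cookie value "<identifier>.<secret>"."""
    session_id = secrets.token_urlsafe(16)      # lookup key only, not a secret
    session_secret = secrets.token_urlsafe(16)  # 128-bit secret part
    sessions[session_id] = {
        "user": user,
        # Only a hash of the encoded secret is stored, never the secret itself.
        "hash": hashlib.sha256(session_secret.encode("utf-8")).digest(),
    }
    return f"{session_id}.{session_secret}"


def load_session(token: str) -> Optional[dict]:
    """Return the session record for a valid cookie value, else None."""
    session_id, separator, session_secret = token.partition(".")
    if not separator:
        return None
    session = sessions.get(session_id)  # ordinary lookup on a non-secret key
    if session is None:
        return None
    presented = hashlib.sha256(session_secret.encode("utf-8")).digest()
    if not hmac.compare_digest(session["hash"], presented):
        return None
    return session
```

A quick check: `load_session(create_session("alice"))` returns the record for `alice`, while a value with the right identifier and a wrong secret returns `None`.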
| Wherever you land, don't write the raw session ID outside of the cookie. Not in app logs, not in error pages, not in distributed-tracing payloads. |
There was a problem hiding this comment.
I would suggest saying "session token" instead of "session ID" (throughout the Cheat Sheet). As I said, you can actually use a session ID which is only an identifier, and is not secret, as a way to identify the session.
Thanks @0xBassia for the careful work, and thanks @Sc00bz and @randomstuff for the technical pushback. I'm with Sc00bz on most points. A 128-bit CSPRNG session ID isn't crackable, so HMAC and Argon2id aren't adding meaningful security on top of it. It was a poor suggestion. And Argon2id on every authenticated request is a real DoS problem. I'd like to take direction (1) from your May 1 comment, but reframed around the patterns in #1153 rather than the password-hash analogy:
Does that sound reasonable to everyone?
…8-bit token

Reframes the subsection along the consensus from @Sc00bz, @randomstuff, and @jmanico:

- Default is a 128-bit CSPRNG token stored verbatim, with operational hygiene (no logs/traces, restricted DB access, encrypted backups, regenerate on auth events).
- Multi-target leak threat model widens the token to 192 bits instead of reaching for a KDF.
- Optional (identifier, verifier) split for one-way storage: lookup by identifier, constant-time compare SHA-256 of the encoded verifier truncated to verifier width.
- HMAC and Argon2id removed from the section.
- Terminology shifted to "session token" to match @randomstuff's point about keeping a non-secret identifier separate from the secret.
Thanks all. Pushed a rewrite along the direction in your reviews:
@randomstuff on encoded vs decoded: went with encoded per @Sc00bz so what gets hashed is exactly the string the cookie carried, which sidesteps double-encoding bugs. If you'd rather the AI disclosure: the content is mine. AI was used as a wording review pass. |
| ### Server-Side Session Token Storage | ||
| A 128-bit CSPRNG session token cannot be brute-forced from a database leak, so storing it verbatim in the session store is the sensible default. Most of the protection comes from operational hygiene rather than from any storage transformation: |
There was a problem hiding this comment.
A 128-bit CSPRNG session token cannot be brute-forced from a database leak, so storing it verbatim in the session store is the sensible default.
I don't believe I understand this sentence.
| - Restrict direct database access. Encrypt backups and replicas at rest. | ||
| - Regenerate the token on authentication and privilege-level changes. | ||
| If your threat model includes simultaneous leaks across many independent session stores, widen the token to 192 bits rather than reaching for a KDF. |
There was a problem hiding this comment.
I don't understand this part. When is it necessary to widen the token, and why?
| Hash the encoded form of the verifier (whatever string the cookie carried), not a decoded one. This keeps verification deterministic and avoids double-encoding bugs. The stored hash can be truncated to the verifier's bit width. | ||
| Either approach assumes the token is generated by a CSPRNG with at least 128 bits of entropy. Hashing does not rescue a weak generator. |
There was a problem hiding this comment.
In the second approach, it's not clear from this text what should be 128 bits. The identifier? Identifier + verifier? It's certainly the verifier.
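The quoted guidance about hashing the encoded verifier and truncating the stored hash can be sketched as follows. The names `VERIFIER_BYTES`, `new_verifier`, and `stored_hash` are hypothetical, and a 128-bit verifier is assumed. The truncation is applied to the raw digest bytes rather than to a hex string, since an earlier comment in this thread warns that hex output makes it easy to truncate too far.

```python
import hashlib
import secrets

VERIFIER_BYTES = 16  # assumed verifier width: 128 bits


def new_verifier() -> str:
    """Generate the verifier in its encoded (cookie) form."""
    return secrets.token_urlsafe(VERIFIER_BYTES)


def stored_hash(encoded_verifier: str) -> bytes:
    """SHA-256 over the string exactly as the cookie carried it, truncated to the verifier width."""
    digest = hashlib.sha256(encoded_verifier.encode("utf-8")).digest()  # raw bytes, not hex
    return digest[:VERIFIER_BYTES]
```

Because the input is the cookie string itself, no base64 decoding step sits between receipt and hashing, so there is only one representation that could be hashed.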
There was a problem hiding this comment.
I would humbly suggest "MUST have at least 128 bits of entropy but SHOULD have at least 160 bits of entropy". This would align with requirements such as:
https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-15#section-7.7
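The "MUST have at least 128 bits, SHOULD have at least 160 bits" wording suggested in this comment could look like the sketch below in code. The constants and the function name are hypothetical; the only library call is `secrets.token_urlsafe`, which draws the requested number of bytes from the operating system's CSPRNG and base64url-encodes them.

```python
import secrets

MIN_TOKEN_BYTES = 16        # 128 bits: the floor from the suggested wording
PREFERRED_TOKEN_BYTES = 20  # 160 bits: the suggested default


def new_session_token(nbytes: int = PREFERRED_TOKEN_BYTES) -> str:
    """Return a base64url session token carrying `nbytes` bytes of CSPRNG output."""
    if nbytes < MIN_TOKEN_BYTES:
        raise ValueError("session tokens need at least 128 bits of entropy")
    return secrets.token_urlsafe(nbytes)
```

With the default of 20 bytes the encoded token is 27 characters long; 16 bytes gives 22 characters.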
Thanks, the wording here still isn't landing. Let me fix it.

@randomstuff @jmanico the opening sentence is just badly phrased. What I meant is that a 128-bit CSPRNG token has no offline guessing attack, so unlike a password it gains almost nothing from being hashed at rest. The real risk is the raw token leaking through a backup, replica or log. "Cannot be brute-forced from a database leak" reads backwards anyway, since if the store leaks you already have the token. I'll rewrite the intro around "no offline attack, so protect the store instead of transforming the value."

On the 192-bit line: that's carried over from @Sc00bz's earlier point and I trimmed it down to where it stopped making sense on its own. The only case it's for is the one he raised, where everyone hashes the same way and session stores leak at scale, an attacker batches the hashes, and widening the token helps while a KDF doesn't. Standalone it just raises questions, so I'll drop it and leave at most a one-line pointer to that scenario.

@randomstuff on the split scheme: right, it's the verifier that needs the 128 bits. The identifier is only a lookup key and can be public. I'll say that directly instead of using "token" for both. I'll also switch the entropy line to "MUST have at least 128 bits, SHOULD have at least 160 bits" and cite the OAuth 2.1 draft (section 7.7). That lines up with the section above this one.

Revision will be: reworded intro, 192-bit aside cut, verifier called out explicitly, and MUST/SHOULD 128/160. Let me know if that misses anything.
This seems like a somewhat weird/roundabout/circular argument for not hashing it. 😄 A CSPRNG token has no offline guessing attack because you can directly use it so there is nothing to guess. I completely oppose suggesting this solution, but the argument feels weak. Arguments for not hashing it:
Arguments for hashing it:
I feel that hashing would benefit from being recommended as a good baseline because it protects against a database leak (a leaked database dump, or a leak of the session database entry through some other mechanism). @jmanico ? (Sorry for the recurrent nitpicking 😄)
Closes #1153.

Adds a `### Server-Side Session ID Storage` subsection under Session Management Implementation. The current cheat sheet says the session repository "must be secure" but does not say what that actually means. This fills it in.

The angle is per the discussion on the issue, in particular @jmanico's point that the session ID shouldn't be stored raw and a verifiable derivative should be stored instead. The new subsection writes that out and gives the HMAC vs Argon2id trade-off so readers can pick a default based on their threat model. KDF parameter detail cross-links to the Password Storage Cheat Sheet rather than restating that content here.

One file changed: `cheatsheets/Session_Management_Cheat_Sheet.md`. ~12 added lines, no removals. No code, just guidance.

Linting: `npm run lint-markdown` and `npm run lint-terminology` both clean locally.

Note: the previous attempt on this issue (#1841) was closed for `MD004`/`MD030` violations on its bullet list. This PR uses `-` markers with single spacing which lint-markdown accepts.

AI disclosure: the content, structure and technical guidance here are 100% my own. AI was used only as a review pass on the wording.