THREESCALE-12408: Access tokens protection#4236
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4236 +/- ##
==========================================
+ Coverage 88.23% 88.30% +0.07%
==========================================
Files 1765 1765
Lines 44344 44361 +17
Branches 686 686
==========================================
+ Hits 39126 39175 +49
+ Misses 5202 5170 -32
Partials 16 16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
b2037c0 to
9d65d4f
Compare
This is not true, those lines are covered by https://github.com/3scale/porta/blob/513293b4e74515fced4aaef41a07e5e442fb6720/features/provider/admin/user/access_tokens.feature. But that's somehow ignored by Codecov. Probably a misconfiguration. |
| def self.find_from_value(plaintext_value) | ||
| return nil if plaintext_value.blank? | ||
|
|
||
| scrubbed = plaintext_value.to_s.scrub | ||
| digest = compute_digest(scrubbed) | ||
|
|
||
| def self.find_from_value(value) | ||
| find_by(value: value.to_s.scrub) | ||
| # Fast path: find by digest (new/migrated tokens) | ||
| token = find_by(value: digest) | ||
| return token if token | ||
|
|
||
| # Legacy tokens can't be exactly 96 chars (that's our hash length) | ||
| # This prevents using a leaked hash directly as a token | ||
| return nil if scrubbed.length == HASHED_TOKEN_LENGTH | ||
|
|
||
| # Slow path: find by plaintext (legacy tokens) | ||
| token = find_by(value: scrubbed) | ||
| return nil unless token | ||
|
|
||
| # Migrate on use: replace plaintext with hash | ||
| token.migrate_to_hashed!(scrubbed) | ||
| token |
There was a problem hiding this comment.
Why not make a database migration instead of having another piece of crazy logic to handle legacy and new data nad never to be sure that the data was eventually fully converted or not and have to notify customers when we finally decide to remove the legacy support, wait for them, check they actually stopped using this, wait more, etc.
There was a problem hiding this comment.
I agree on adding a DB migration or a migration script, that was my plan. But I don't think this is crazy logic or this can be replaced by a migration. Backwards compatibility is mandatory because otherwise we couldn't deploy without downtime, and the logic is really simple, just a few additional lines in a one and single method.
The safe path to proceed is to deploy code that is compatible with pre and post migration state, then run the migration at any moment, then in the next deploy, remove the code to support legacy tokens.
It's as simple as creating a jira issue to remove the legacy support and schedule it to the release after the one including this PR. I don't see any drama, just regular work.
There was a problem hiding this comment.
Backwards compatibility is mandatory because otherwise we couldn't deploy without downtime
if we start with this, then pure database migration will not be really feasible as far as I can tell.
I'm not talking about ruby code migration but pure query migration that can execute very fast on the DB server side.
A short downtime is always acceptable if it is short. For example a couple of minutes is totally fine.
While a pure ruby migration might be super slow. On the other hand it can be lazily executed.
There was a problem hiding this comment.
But this lazy execution can only reliably be done on clusters we maintain, not sure we can rely customers running this. That's why I think for on-prem much better would be a one-step migration without gimmicks.
There was a problem hiding this comment.
A short downtime is always acceptable if it is short. For example a couple of minutes is totally fine.
Well, I think no down time is even better.
I'm not talking about ruby code migration but pure query migration that can execute very fast on the DB server side.
Like an SQL query that hashes the values? Is there a way to do it reliably on all the DBs, including Internet Explor... I mean Oracle? Is it really so fast?
Everything is trade-off. If I did it this way is because for me, providing a easy and less risky way to migrate is worth the super smaller extra complexity. Forcing the client to have a down time and execute a migration written in SQL that must work in all DBs and versions we support is more risky and complicated for them I think. And the only advantage would be that it's allegedly less work for us, and maybe not even that.
@mayorova @madnialihussain Please untie.
There was a problem hiding this comment.
From our discussion, I suggest this approach:
- New tokens are created hashed
- Hashed tokens are identified by the prefix with the algorithm
- When receiving a request
- Find by hashed
- If it fails, find by clear
- If it fails, reject
- Clear tokens are not migrated on the fly
Deploy paths:
- SaaS:
- Deploy code
- Leave it running to ensure we don't rollback
- Run the migration
- Two queries will only happen in the lapse between deploying code and running the migration
- On premises:
- Have downtime
- Update the code
- Run the migration
- Start server
- They will never run two queries per hash
Result:
| One time deploy | Backwards compatibility | No downtime | Rollback | Leak-safe | No DDL Migration | No Corner cases |
|---|---|---|---|---|---|---|
| ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
WDYT?
There was a problem hiding this comment.
I fact, it's almost one time deploy, bc the second phase is only a cleanup of dead code.
There was a problem hiding this comment.
I'm fine with whatever you find yourself comfortable with. I'm only not sure about the 2 queries, but provided on first request the key will be migrated and 2 queries will consistently run only for missing keys, perhaps should be fine.
There was a problem hiding this comment.
In my last suggestion, I added:
Clear tokens are not migrated on the fly
This is to provide a way back. But two queries will only happen until they run the migration.
On the other hand, I think we can actually deploy in two phases for SaaS and a single phase for on premises, before we can release the two phases altogether in one release if they are in fact having downtime anyway.
There was a problem hiding this comment.
It is not ultimately down-gradable still, because new tokens may be created while in the new version. Still much higher downgradeablibity.
| def create | ||
| @presenter = AccessTokensNewPresenter.new(current_account) | ||
| create! do |success, _failure| | ||
| create! do |success, failure| |
There was a problem hiding this comment.
This syntax belongs to inherited_resources 😅
As we're trying to get rid of it, I'd suggest to use a standard Rails approach to avoid conflicts.
There was a problem hiding this comment.
Exactly, this is how I learnt about that gem.
As we're trying to get rid of it, I'd suggest to use a standard Rails approach to avoid conflicts.
Sure!
|
|
||
| def self.random_id | ||
| SecureRandom.hex(32) | ||
| SecureRandom.hex(48) |
There was a problem hiding this comment.
Hm,the existing 64-characters tokens are represented as 96-character long hash, it seems. Why is it necessary to also increase the generated plain value size? 🤔
While we can do it, but I think it's better if we can avoid it - maybe some customers also have some assumptions based on the token size (even though it might not be wise of them).
There was a problem hiding this comment.
Because it's more resistant to quantum computers, if those actually work some day.
There was a problem hiding this comment.
+1 to generate tokens that contain at least as many random bits as the hash used to store them. This is not expensive so should be noticeable in performance.
c3be649 to
e58951c
Compare
❌ 69 blocking issues (69 total)
|
|
I made a few changes, my last proposal: Changes:
Deploy paths:
Support table:
Performance tests:
Results:
|
akostadinov
left a comment
There was a problem hiding this comment.
Awesome work!
I have a few nitpick comments but overall very solid, thanks!
|
|
||
| def self.find_from_value(value) | ||
| find_by(value: value.to_s.scrub) | ||
| scrubbed = plaintext_value.to_s.scrub |
There was a problem hiding this comment.
do we need scrub?
Just a nitpick for waste of compute resources.
There was a problem hiding this comment.
yeah we need it because we re-use it for the legacy comparison. We can remove it in the follow-up PR.
| section id="access-tokens" | ||
| h2 Access Tokens | ||
| p | ||
| ' Access tokens are personal tokens that let you authenticate against the Account Management API, the Analytics API and the Billing API through HTTP Basic Auth. You can create multiple access tokens with custom scopes and permissions. We suggest you create tokens with the minimal scopes & permissions needed for the task at hand. Use Access Tokens from within the |
There was a problem hiding this comment.
Can we use i18n for all strings? 👼
But not blocking merge can be done later.
|
|
||
| def new | ||
| @presenter = AccessTokensNewPresenter.new(current_account) | ||
| @access_token = access_tokens.build |
There was a problem hiding this comment.
I assume we can't easily avoid instantiating an object here generating a key, but at the same time generating random data should be fast enough nowadays...
There was a problem hiding this comment.
Can we do this in the model?
- after_initialize :generate_if_missing, if: :new_record?
+ before_validation :generate_if_missing, on: :createThere was a problem hiding this comment.
I'm not sure what would be the implications from the top of m head, I guess if tests pass it should be good enough.
There was a problem hiding this comment.
I think @josemigallas meant that if we have it in before_validation then the token will not be generated just because we have created the object. As a tiny optimization addressing my complaining.
There was a problem hiding this comment.
But I wouldn't push for this. Only if you feel looking at it.
There was a problem hiding this comment.
Ah OK, fine. I was having a parallel conversation with @josemigallas about resolving conflicts in his PR and though he was talking about that.
| "WHERE value NOT LIKE '#{DIGEST_PREFIX}%' LIMIT #{BATCH_SIZE}" | ||
| elsif System::Database.postgres? | ||
| "UPDATE access_tokens SET value = '#{DIGEST_PREFIX}' || encode(sha384(value::bytea), 'hex') " \ | ||
| "WHERE id IN (SELECT id FROM access_tokens WHERE value NOT LIKE '#{DIGEST_PREFIX}%' LIMIT #{BATCH_SIZE})" |
There was a problem hiding this comment.
Now I'm thinking, postgres may potentially not have the crypto extension enabled. Maybe we should have some failsafe solution for this case 😬
But can be discussed after merge to see if we need one and what exactly.
There was a problem hiding this comment.
We don't use the extension, that's for old postgres versions, in PG 14 there's a builtin alternative, the sha384() function I'm using here.
https://www.postgresql.org/docs/14/functions-binarystring.html
There was a problem hiding this comment.
Awesome, AI apparently doesn't know about it.
| assert_equal legacy_value, @token.reload.read_attribute(:value) | ||
| end | ||
|
|
||
| def test_find_from_value_rejects_leaked_hash_as_token |
There was a problem hiding this comment.
this one seems to be a dup with authentication with leaked database hash fails
There was a problem hiding this comment.
Not dupped, one is unit and the other is integration test. They test different layers
| assert @token.reload.read_attribute(:value).start_with?(AccessToken::DIGEST_PREFIX) | ||
| end | ||
|
|
||
| def test_find_from_value_finds_legacy_token |
There was a problem hiding this comment.
this seems to be a dup with authentication with legacy unmigrated token succeeds
There was a problem hiding this comment.
Same. Different level testing.
| @@ -1,10 +1,14 @@ | |||
| class AccessToken < ApplicationRecord | |||
| DIGEST_PREFIX = 'SHA384|'.freeze | |||
There was a problem hiding this comment.
Overnight I had an insight that this prefix is not ideal. It will be a constant nuisance to use it on command line:
$ echo SHA384|asdasd
bash: asdasd: command not found...
The | character needs to always be quoted or escaped. It will be much more user friendly to have a shorter prefix without special characters, for example s2_
$ echo s2_asdasd
s2_asdasd
There was a problem hiding this comment.
The _ isn't good neither, because it's a wildcard in mysql:
https://dev.mysql.com/doc/refman/8.4/en/string-comparison-functions.html#operator_like
Let's use $ which is what bcrypt uses and we are already using in our DB for passwords. 65a75ea
I hope you can sleep now.
| @@ -1,5 +1,5 @@ | |||
| class AccessToken < ApplicationRecord | |||
| DIGEST_PREFIX = 'SHA384|'.freeze | |||
| DIGEST_PREFIX = 'SHA384$'.freeze | |||
There was a problem hiding this comment.
Wait, my request was to have some normal non-special character 🤔
$ echo SHA384$adsasdas
SHA384
Can we just go with something non-weirs like underscore or whatever?
There was a problem hiding this comment.
just wrap it in single quotes?
$ echo 'SHA384$adsasdas'
SHA384$adsasdas
I hope this is not something you do every day 🙃
There was a problem hiding this comment.
Maybe add # frozen_string_literal: true and remove .freeze for a specific string?
But nevermind.
c298d44 to
dfaf55e
Compare
|
Rebased and solved conflicts |
| def generate_value | ||
| self.value ||= self.class.random_id | ||
| return if persisted? | ||
| return if @plaintext_value.present? |
There was a problem hiding this comment.
I don't think there's any reasonable scenario when any of these two conditions can be true. It's a safeguard for very unlikely situations. Probably superfluous code, want me to remove it?
There was a problem hiding this comment.
Well, no strong opinion on this one. Indeed, currently this will never happen, because it's only called under if new_record? condition. But I guess it might be useful is someone accidentally calls .generate_value explicitly on an existing object 🤷
There was a problem hiding this comment.
might be nice to rename the method to #generate_if_missing to make it super clear.
|
Oopsie... conflict detected... |
| DIGEST_PREFIX = 'SHA384$'.freeze | ||
| DIGEST_PREFIX = 'SHA384$' | ||
|
|
||
| TIMESTAMP_FORMAT = '%FT%T%:z'.freeze |
There was a problem hiding this comment.
this .freeze can now be removed too
There was a problem hiding this comment.
I removed this, I rewrote the commit log so no particular commit for this, but you can see the change is applied.
9e216ee to
364982b
Compare
Implements SHA384 hashing for access token storage to prevent plaintext token values from being stored in the database. New tokens are generated with 96 hex characters (48 bytes) and immediately hashed. The model stores hashes with a 'SHA384$' prefix to distinguish them from legacy plaintext tokens. The find_from_value method implements a two-phase lookup strategy: - Fast path: attempts to find by computed digest (new hashed tokens) - Slow path: falls back to plaintext lookup for legacy tokens This approach ensures backward compatibility with existing plaintext tokens until the migration runs, allowing gradual rollout of the security enhancement without breaking existing token authentication. Co-Authored-By: Claude <noreply@anthropic.com>
Changes the UI flow to display the generated access token immediately after creation instead of redirecting to the index page. This is necessary because tokens are now hashed and can never be retrieved again after initial creation. Key changes: - Controller renders new show view instead of redirecting after create - New show.html.slim template displays the token with copy functionality - Index view refactored to remove inline token display logic - Added I18N strings for the new show view and updated messaging The token value is only available in memory during the create action via the @plaintext_value attribute, ensuring it cannot be accessed later from the database. Co-Authored-By: Claude <noreply@anthropic.com>
Updates worker, API representer, seeds, and factory to work with the new hashed token storage mechanism. - RestoreApicastMasterTokenWorker: Uses plaintext_value getter - AccessTokenRepresenter: Returns plaintext_value when available - Seeds: Adapts token generation for hashed storage - Factory: Updates to use new token generation flow These changes ensure all token creation paths properly handle the separation between plaintext generation and hashed storage.
Adds comprehensive test coverage for the new SHA384 hashing functionality including: - Token value hashing on creation - Digest computation with proper prefix - Two-phase lookup (hashed and legacy plaintext) - Plaintext value visibility logic - Hash prefix rejection in find_from_value These tests verify both the security enhancement and backward compatibility with existing plaintext tokens. Co-Authored-By: Claude <noreply@anthropic.com>
Updates all existing tests across functional, integration, and acceptance suites to work with hashed token values. Tests now use the plaintext_value accessor when they need to verify or use the actual token value, since the stored value field now contains SHA384 hashes instead of plaintext. Changes span 90+ test files covering: - API authentication test helpers - Admin API controller tests - Finance API tests - Stats API tests - User management API tests - Multi-tenant enforcement tests - Acceptance specs No test logic changes, only adaptations to the new hashed storage mechanism.
Introduces a data migration that converts existing plaintext access token values to SHA384 hashes in batches. This migration completes the security enhancement by ensuring all tokens in the database are hashed, not just newly created ones. The migration processes tokens in batches of 1000 with a small delay between batches to avoid overwhelming the database during the upgrade. After this migration runs, all tokens will be hashed and the slow path plaintext lookup in find_from_value will only be used if the migration hasn't run yet (useful for development/testing scenarios).
Adds critical test verifying that the migration's SQL-based SHA384 computation produces identical output to Ruby's compute_digest method. This is essential because the migration is irreversible and if the SQL digest differs from Ruby (due to encoding, casing, or implementation differences), all migrated tokens would become permanently unreachable. The test also verifies: - Already-hashed tokens are not double-hashed (idempotency) - Batch processing correctly hashes multiple tokens This addresses the highest-risk untested change in the PR, ensuring the migration will not break authentication for existing tokens. Co-Authored-By: Claude <noreply@anthropic.com>
364982b to
c526100
Compare
|
I rebased this onto I rewrote the git log so the changes are integrated in existing commits, but you can see the diff from the previously approved state to now: https://github.com/3scale/porta/compare/364982bfc2756ab034da23ced2870dbf911a7617..c526100c290ee59c3bb0aeeff2023c4069180a35 |
| class HashAccessTokenValues < ActiveRecord::Migration[7.1] | ||
| disable_ddl_transaction! if System::Database.postgres? | ||
|
|
||
| BATCH_SIZE = 1000 |
There was a problem hiding this comment.
I would like this to be measured. I'm not sure doing in batches will be faster or safer because RDBMS will scan the table multiple times instead of once. So I suggest trying it in staging to see if there is any meaningful difference.
There was a problem hiding this comment.
I think we shouldn't include this test because migration runs once and is over. Regressions are irrelevant because migration will not run again. So we have some needless tests.
akostadinov
left a comment
There was a problem hiding this comment.
While I prefer my last comments addressed, looks great overall, a lot of hard work!
|
|
||
| def show_value?(*) | ||
| saved_changes.include?(:value) | ||
| def show_plaintext_value?(*) |
There was a problem hiding this comment.
This method name is a little confusing. perhaps
| def show_plaintext_value?(*) | |
| def plaintext_value_known?(*) |
But that's a minor complaint.
What this PR does / why we need it:
This adds some protection to access tokens in our DB, so tokens are not visible in the case the DB is leaked.
This implements two measures to increase protection:
Since we want to make this backwards compatible, the current code still allows plain text tokens to exist and be used to authenticate.
The code is designed to be super simple. The day we want to remove the migration code and the support for plain text tokens, we'll only have to remove a few lines in the
find_from_valuemethod.A brief explanation of the new design:
valueattribute will always unconditionally hold the value in the DB column. That is: plain text if unmigrated, hash if migratedplaintext_valueholds the plain text value of a hashed token. It's not persisted so it's lost when the instance is removed. This attribute is designed to provide a way to return the original value once in the request response, be it UI or API. The user will have that only opportunity to store it somewhere else. After that, we won't hold the original value anywhere.Which issue(s) this PR fixes
https://issues.redhat.com/browse/THREESCALE-12408
Verification steps
Special notes for your reviewer:
How to review:
There are 90+ files edited in this PR but most of them are just the result of replacing
access_token.valuebyaccess_token.plaintext_valuein tests. This PR is better reviewed commit by commit: