feat(selector): subscribe to token DB notifications to invalidate cache#1613
feat(selector): subscribe to token DB notifications to invalidate cache#1613atharrva01 wants to merge 3 commits into
Conversation
|
Hey @adecaro, I noticed the |
ea96a12 to
c8f3782
Compare
|
Hi @atharrva01 , thanks for picking this up 🙏 So, in a running system, we can have tokens being continuously added and this will trigger a refresh of the cache that will prevent the selector any read. I see a possible substantial degradation of performance here. But what if instead we take the input passed to the listener and surgically update the cache with the new entry? We would only need to keep the lock to add the new token for a very short period. What do you think? It might be good to check the benchmarks here What do you think? 😄 |
|
Hey @adecaro, you're right, I missed that. The tricky part is that Still much cheaper than a full table scan under the write lock for high-write scenarios, so the direction makes sense to me. One thing worth flagging, the cache value is currently an immutable Also the existing benchmarks only cover read throughput on a static cache, they wouldn't catch the write-contention case you're describing. I can add a concurrent-write benchmark to make the improvement visible. Does this sound like the right direction? |
|
Hi @atharrva01 , if more is needed we can modify what is put inside |
|
Hey @adecaro, had a look at the storage layer. The reference gets populated from a Postgres LISTEN/NOTIFY trigger and right now it only emits |
|
HI @atharrva01 , yes, let's try that. There is a limit to the amount of data that Postgres can send. Let's see if we are in the boundaries 🤞 |
|
@atharrva01 , if you have cycles, please, think also about a benchmark that we can run against Postgres under different loads to see how the selector react. The current benchmarks work against a fake DB. What do you think? |
|
Hey @adecaro, good news, the infrastructure is already there. |
|
Hi @atharrva01 , I was double checking and we have an 8,000 bytes limit on the payload Postgres would send back with the current approach. We need to be carefully. I was also checking the performance of this mechanism under high load. It is not design to be efficient under high load so each use case must evaluate if it makes sense to relay on this feature. So, it might make sense also to have the possibility to disable completely the token notifier in case it starts being a bottleneck. |
|
Hey @adecaro, good catches. On the 8KB limit, individual token fields should fit fine but I'll add a safeguard so the trigger fails loudly rather than silently truncating anything unexpected. For the disable option, the fetcher already falls back to time-based polling when the notifier is unavailable, I'll just wire an explicit config flag into the selector config so operators can turn it off cleanly if it becomes a bottleneck under load. |
c8f3782 to
1587f6b
Compare
|
Hey @adecaro, addressed all three points! For the surgical update, I extended For the 8KB limit, the trigger now raises an explicit exception before calling For the disable option, added Also added Let me know if anything needs tweaking! |
99af372 to
3cbd485
Compare
|
Hi @atharrva01 , thanks for the effort and patience. I will review ASAP 🙏 |
|
hey @adecaro , a gentle ping on this pr , whenever you get time , thanks :) |
8eaa2b4 to
67dbc86
Compare
|
Hi @EvanYan1024 , Many thanks |
Wire driver.TokenNotifier into cachedFetcher so the cache is marked stale immediately when a token is inserted or deleted from the DB, eliminating the polling staleness window caused by freshnessInterval. The dirty flag (atomic.Int32) is set by the subscription callback and checked in isCacheStale(), ensuring the next query forces a DB refresh. The flag is cleared before fetching and restored if the DB call fails, so transient errors retry on the next query. Closes hyperledger-labs#884 Signed-off-by: atharrva01 <atharvaborade568@gmail.com>
- Extend TokenRecordReference with WalletID, Type, Quantity - Update Postgres trigger to include owner_wallet_id, token_type, quantity in NOTIFY payload - Replace dirty-flag invalidation with surgical insert/delete in onTokenChange - Add 8KB payload guard to trigger (RAISE EXCEPTION on overflow) - Add TokenNotifierDisabled config flag for operators to opt out - Add BenchmarkSelectorWit Signed-off-by: Atharva Borade <atharvaborade568@gmail.com> Signed-off-by: atharrva01 <atharvaborade568@gmail.com>
Signed-off-by: Atharva Borade <atharvaborade568@gmail.com> Signed-off-by: atharrva01 <atharvaborade568@gmail.com>
67dbc86 to
1ae9618
Compare
Closes #884
While working in this area for the sherdlock metrics PR (#1459), I noticed the
cachedFetcheronly goes stale through two mechanisms: a time-basedfreshnessIntervaland a query-count cap. That means a token written to the DB can sit invisible to the selector for up to a full freshness interval, even though the infrastructure to know about it immediately was already there.driver.TokenNotifier(with itsSubscribecallback) was already implemented and tested in the storage layer. It just wasn't wired anywhere.What I did:
dirty atomic.Int32flag tocachedFetcher. The subscription callback sets it on any DB write;isCacheStale()checks it so the next query forces an immediate refresh.MixedfetchFunc obtains the notifier fromdb.Notifier()and passes it down. If the notifier is unavailable, we log a warning and fall back gracefully to the existing time-based behavior , nothing breaks.Tests added:
TestCachedFetcher_NotifierMarksDirtyOnInsert, Insert notification immediately makes cache staleTestCachedFetcher_NotifierMarksDirtyOnDelete, Delete notification immediately makes cache staleTestCachedFetcher_DirtyFlagClearedAfterUpdate, flag is gone after a successful refreshTestCachedFetcher_DirtyFlagRestoredOnDBError, flag survives a failed DB fetch so retries work