fix: Do not clear the schema cache during retries#4869
Conversation
c468e80 to
36be579
Compare
|
Clarifying the motivation on #4873 |
5201ea9 to
82cf2e7
Compare
|
Follow up question on #4873 (comment), not sure if it's correct to do the fix like this. |
02f8fbc to
4d6a87d
Compare
@steve-chavez - So what's the conclusion after #4873 (comment), #4873 (comment) and all discussions in #4873 ? Are we going to proceed with this PR? |
|
@mkleczek We need to clarify what are we going to do with the request "waiting" (not sure how to call this) mentioned on #4873 (comment) because there's also a scenario here where we void the schema cache and we force clients to wait while the schema cache is reloaded again. If that "waiting" is useless, let's settle it on #4873 (comment). |
350e946 to
f92122f
Compare
retryingSchemaCacheLoad should not clear existing schema cache upon failure - there is no reason to do that. If there is a communication issue with the database server or db is down, clients are going to get 502 anyway. If it was a glitch when loading the schema cache - the clients are going to use old (stale) schema cache for some time until next retry re-loads it successfully.
f92122f to
3b60b73
Compare
|
Now that we clarified the "waiting" was ineffective on #4937, let's proceed with reviewing this. |
| @@ -763,7 +763,7 @@ def test_admin_ready_includes_schema_cache_state(defaultenv, metapostgrest): | |||
| assert response.status_code == 503 | |||
|
|
|||
| response = postgrest.session.get("/projects", timeout=1) | |||
| assert response.status_code == 503 | |||
| assert response.status_code == 200 | |||
There was a problem hiding this comment.
@mkleczek One thing that is confusing, this is the full test we're changing here:
Lines 740 to 768 in 86d6ed1
So the /ready endpoint will give 503 but a request to /projects will give 200? An admin will not be able to correlate the time the health check was failing with the users' requests failing. I don't think this behavior is correct.
There was a problem hiding this comment.
Hmm... but what is there to correlate if a request returns 200? There is no error in this case. Schema cache is stale but we still can handle the query.
But this is a good question: what should admin server return from /ready during schema cache reloading? 200 or 503?
There was a problem hiding this comment.
But this is a good question: what should admin server return from /ready during schema cache reloading? 200 or 503?
Under this change, it looks we should return 200 on /ready once the startup scache load is done, no matter if it fails later.
And perhaps only return 503 if the db is down (detected by connection error or perhaps by pool metric).
But TBH not sure if any of the above are right.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.
retryingSchemaCacheLoadshould not clear existing schema cache upon failure - there is no reason to do that.If there is a communication issue with the database server or db is down, clients are going to get 502 anyway. If it was a glitch when loading the schema cache - the clients are going to use old (stale) schema cache for some time until next retry re-loads it successfully.
Fixes #4873