fix: Do not clear the schema cache during retries by mkleczek · Pull Request #4869 · PostgREST/postgrest

mkleczek · 2026-05-04T07:28:49Z

DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

retryingSchemaCacheLoad should not clear existing schema cache upon failure - there is no reason to do that.
If there is a communication issue with the database server or db is down, clients are going to get 502 anyway. If it was a glitch when loading the schema cache - the clients are going to use old (stale) schema cache for some time until next retry re-loads it successfully.

Fixes #4873

steve-chavez · 2026-05-04T18:02:40Z

Clarifying the motivation on #4873

steve-chavez · 2026-05-04T18:14:49Z

Follow up question on #4873 (comment), not sure if it's correct to do the fix like this.

mkleczek · 2026-05-12T07:32:19Z

Follow up question on #4873 (comment), not sure if it's correct to do the fix like this.

@steve-chavez - So what's the conclusion after #4873 (comment), #4873 (comment) and all discussions in #4873 ?

Are we going to proceed with this PR?

steve-chavez · 2026-05-12T18:18:36Z

@mkleczek We need to clarify what are we going to do with the request "waiting" (not sure how to call this) mentioned on #4873 (comment) because there's also a scenario here where we void the schema cache and we force clients to wait while the schema cache is reloaded again. If that "waiting" is useless, let's settle it on #4873 (comment).

retryingSchemaCacheLoad should not clear existing schema cache upon failure - there is no reason to do that. If there is a communication issue with the database server or db is down, clients are going to get 502 anyway. If it was a glitch when loading the schema cache - the clients are going to use old (stale) schema cache for some time until next retry re-loads it successfully.

steve-chavez · 2026-05-20T16:21:51Z

Now that we clarified the "waiting" was ineffective on #4937, let's proceed with reviewing this.

steve-chavez · 2026-05-20T16:25:46Z

@@ -763,7 +763,7 @@ def test_admin_ready_includes_schema_cache_state(defaultenv, metapostgrest):
        assert response.status_code == 503

        response = postgrest.session.get("/projects", timeout=1)
-        assert response.status_code == 503
+        assert response.status_code == 200


@mkleczek One thing that is confusing, this is the full test we're changing here:

postgrest/test/io/test_io.py

Lines 740 to 768 in 86d6ed1

def test_admin_ready_includes_schema_cache_state(defaultenv, metapostgrest):

"Should get a failed response from the admin server ready endpoint when the schema cache is not loaded"

role = "timeout_authenticator"

env = {

**defaultenv,

"PGUSER": role,

"PGRST_DB_ANON_ROLE": role,

"PGRST_INTERNAL_SCHEMA_CACHE_QUERY_SLEEP": "500",

}

with run(env=env) as postgrest:

# The schema cache query takes at least 500ms, due to PGRST_INTERNAL_SCHEMA_CACHE_QUERY_SLEEP above.

# Make it impossible to load the schema cache, by setting statement timeout to 400ms.

set_statement_timeout(metapostgrest, role, 400)

# force a reconnection so the new role setting is picked up

postgrest.process.send_signal(signal.SIGUSR1)

postgrest.wait_until_scache_starts_loading()

response = postgrest.admin.get("/ready", timeout=1)

assert response.status_code == 503

response = postgrest.session.get("/projects", timeout=1)

assert response.status_code == 503

reset_statement_timeout(metapostgrest, role)

So the /ready endpoint will give 503 but a request to /projects will give 200? An admin will not be able to correlate the time the health check was failing with the users' requests failing. I don't think this behavior is correct.

Hmm... but what is there to correlate if a request returns 200? There is no error in this case. Schema cache is stale but we still can handle the query.

But this is a good question: what should admin server return from /ready during schema cache reloading? 200 or 503?

But this is a good question: what should admin server return from /ready during schema cache reloading? 200 or 503?

Under this change, it looks we should return 200 on /ready once the startup scache load is done, no matter if it fails later.

And perhaps only return 503 if the db is down (detected by connection error or perhaps by pool metric).

But TBH not sure if any of the above are right.

mkleczek requested a review from steve-chavez May 4, 2026 07:49

mkleczek force-pushed the push-olktzqvzkonu branch from c468e80 to 36be579 Compare May 4, 2026 17:35

steve-chavez reviewed May 4, 2026

View reviewed changes

Comment thread CHANGELOG.md Outdated

mkleczek force-pushed the push-olktzqvzkonu branch 2 times, most recently from 5201ea9 to 82cf2e7 Compare May 4, 2026 18:08

wolfgangwalther reviewed May 5, 2026

View reviewed changes

Comment thread test/io/test_io.py

Comment thread test/io/fixtures/roles.sql

mkleczek force-pushed the push-olktzqvzkonu branch 3 times, most recently from 02f8fbc to 4d6a87d Compare May 12, 2026 06:29

mkleczek mentioned this pull request May 12, 2026

A statement timeout can void the schema cache #4873

Open

mkleczek force-pushed the push-olktzqvzkonu branch 7 times, most recently from 350e946 to f92122f Compare May 19, 2026 10:56

mkleczek force-pushed the push-olktzqvzkonu branch from f92122f to 3b60b73 Compare May 20, 2026 10:35

steve-chavez reviewed May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Do not clear the schema cache during retries#4869

fix: Do not clear the schema cache during retries#4869
mkleczek wants to merge 1 commit into
PostgREST:mainfrom
mkleczek:push-olktzqvzkonu

mkleczek commented May 4, 2026 •

edited by steve-chavez

Loading

Uh oh!

steve-chavez commented May 4, 2026

Uh oh!

Uh oh!

steve-chavez commented May 4, 2026

Uh oh!

Uh oh!

Uh oh!

mkleczek commented May 12, 2026

Uh oh!

steve-chavez commented May 12, 2026

Uh oh!

steve-chavez commented May 20, 2026 •

edited

Loading

Uh oh!

steve-chavez May 20, 2026 •

edited

Loading

Uh oh!

mkleczek May 20, 2026 •

edited

Loading

Uh oh!

steve-chavez May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

	def test_admin_ready_includes_schema_cache_state(defaultenv, metapostgrest):
	"Should get a failed response from the admin server ready endpoint when the schema cache is not loaded"

	role = "timeout_authenticator"

	env = {
	**defaultenv,
	"PGUSER": role,
	"PGRST_DB_ANON_ROLE": role,
	"PGRST_INTERNAL_SCHEMA_CACHE_QUERY_SLEEP": "500",
	}

	with run(env=env) as postgrest:
	# The schema cache query takes at least 500ms, due to PGRST_INTERNAL_SCHEMA_CACHE_QUERY_SLEEP above.
	# Make it impossible to load the schema cache, by setting statement timeout to 400ms.
	set_statement_timeout(metapostgrest, role, 400)

	# force a reconnection so the new role setting is picked up
	postgrest.process.send_signal(signal.SIGUSR1)

	postgrest.wait_until_scache_starts_loading()

	response = postgrest.admin.get("/ready", timeout=1)
	assert response.status_code == 503

	response = postgrest.session.get("/projects", timeout=1)
	assert response.status_code == 503

	reset_statement_timeout(metapostgrest, role)

Uh oh!

Conversation

mkleczek commented May 4, 2026 • edited by steve-chavez Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steve-chavez commented May 4, 2026

Uh oh!

Uh oh!

steve-chavez commented May 4, 2026

Uh oh!

Uh oh!

Uh oh!

mkleczek commented May 12, 2026

Uh oh!

steve-chavez commented May 12, 2026

Uh oh!

steve-chavez commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steve-chavez May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkleczek May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steve-chavez May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

mkleczek commented May 4, 2026 •

edited by steve-chavez

Loading

steve-chavez commented May 20, 2026 •

edited

Loading

steve-chavez May 20, 2026 •

edited

Loading

mkleczek May 20, 2026 •

edited

Loading