fix: Start listening after schema cache load#4880
Conversation
ca17636 to
5bb48c5
Compare
| mainSocket <- initServerSocket conf | ||
| atomicWriteIORef mainSocketRef $ Just mainSocket |
There was a problem hiding this comment.
Q: Can this change help lift the limitation we have with the --ready check? (#4269 (comment)) i.e. can the admin server know the address that is resolved for the API server when using special hosts? (like !4)
|
Previous discussion on the motivation of the change on #4703 (comment). |
|
@mkleczek As per #4703 (comment), this would clearly benefit the case of But let's consider the scenario of a single instance managed by systemd behind a proxy:
With this change, now we'll not respond and clients will get a So under this scenario, it looks like this new behavior is worse? Thinking more what we need is zero-downtime restarts, which I guess is easier under this new behavior since we could rely on systemd socket activation? |
Not really. From the point of view of the client there is not much gain from these |
| response = postgrest.session.get("/projects") | ||
| assert response.status_code == 503 | ||
| with pytest.raises(requests.ConnectionError): | ||
| postgrest.session.get("/projects") |
There was a problem hiding this comment.
With this change, the test description does not fit anymore.
But anyway, with the change in this PR, the test case does not make sense anymore at all, I think. A connection error does not even make it to PostgREST, so testing its non-effect is useless. I'd remove that test-case as well, if we decide to go on with this PR.
There was a problem hiding this comment.
Not really sure about that - we want to show that schema cache is loaded once during startup.
We also verify the behavior during schema cache load (before it was 503 now it is connection error).
I also added another assertion that once schema cache is loaded we return 200 from /projects request.
Test description updated - it was indeed wrong.
5bb48c5 to
99554cb
Compare
Thought about removing the
So it's a minimum, not exact time. I think it should be fine to be clear about this on the docs and recommend jitter. |
|
@mkleczek The direction here is good, make sure to address the comments and then we can merge this. |
99554cb to
6a927aa
Compare
|
I am marking this PR as draft to address concerns related to handling schema cache loading errors during start-up. It seems the right course of action cannot be any of these two extremes:
The first one forces clients to handle normal conditions as errors. It seems like the best (ultimate?) startup sequence should be:
That way we achieve both:
The above requires wider refactoring - today the whole schema cache loading loop is implemented in a single function without any means to introspect the state of the loading process. Clients can only trigger asynchronous schema load and wait for it to finish. @steve-chavez WDYT? |
f577ea6 to
3eed89b
Compare
I wrote up two different proposals but threw them away, because I always came to the conclusion that this is the sensible thing to do. So 👍 |
Looks much better. Also 👍 from me. |
So between these two steps, we'd still return the connection error, however after that we'd retry and get the 503. I agree with this. @steve-chavez Not sure if it was discussed elsewhere, but this would mean that the proposal to wait until the schema cache is loaded on startup is no longer desired, right? |
|
@laurenceisla The waiting is being discussed on #4873 (comment). #4129 won't be solved here. |
e653c00 to
55c3e8c
Compare
Updated the code to implemented the above. |
55c3e8c to
e8964e8
Compare
| @@ -168,9 +171,6 @@ querySchemaCache conf@AppConfig{..} = do | |||
| tzones <- if configDbTimezoneEnabled | |||
| then sqlTimedStmt gucTzones mempty timezones | |||
| else pure S.empty | |||
| _ <- | |||
| let sleepCall = SQL.Statement "select pg_sleep($1 / 1000.0)" (param HE.int4) HD.noResult True in | |||
| for_ configInternalSCQuerySleep (`SQL.statement` sleepCall) -- only used for testing | |||
There was a problem hiding this comment.
Why is this moved here? Change looks like it should be done independently. I remember we had some tests relying on the fact this sleep was after all the schema cache queries.
There was a problem hiding this comment.
It is needed - otherwise test_log_error_when_schema_cache_load_error_on_startup_to_stderr fails. The reason is that this test sets two env variables: PGRST_INTERNAL_SCHEMA_CACHE_QUERY_SLEEP=1000 and PGRST_DB_SCHEMAS=non_existent_schema_aaaa (ie. we want to fail schema cache query but after waiting 1s.
If waiting is after queries then it fails immediately.
There was a problem hiding this comment.
From what I remember some tests depend on this sleep being last, so if we change it those tests would be ineffective and misleading. We should do a git blame for finding those tests and tune them (or remove) if necessary.
Or as an alternative, introduce another sleep before the schema cache queries and leave this one in place.
There was a problem hiding this comment.
Hmm... I've searched through the tests for usage of this variable and cannot see anything that would require it after queries. It would only be relevant for tests that don't want to wait in case of errors in queries - but we don't have such tests, I think.
There was a problem hiding this comment.
- So this sleep was added here Fix dropping schema cache reload notifications #2810
- This has the original test
- At some point the sleep was switched to the beginning (here)
- But the above was reverted on test: Fix internal_schema_cache_sleep after 747c78f6 #3510
Now, the original test was modified on 92ac7e5. This leaves us with this current test:
Lines 1336 to 1367 in ed0d9dc
Which IIRC it's kind of tricky because it involves concurrent notifications.
@mkleczek Can you ensure this test is still effective? I would also be fine if it's modified to not depend on this sleep if possible (if done should be another PR).
9c21cbc to
93d2726
Compare
93d2726 to
0077e6f
Compare
0077e6f to
37e2338
Compare
This change ensures PostgREST starts listening on a server socket only after it loaded the schema cache and is ready to handle requests. It is no longer going to return 503 errors during startup until the schema cache is loaded.
37e2338 to
9016fd8
Compare
| @@ -33,6 +35,7 @@ All notable changes to this project will be documented in this file. From versio | |||
| - Build the minimal docker image for aarch64-linux by @wolfgangwalther in #4193 | |||
| - The name of an embedded table can no longer be used in filters if it has an alias by @laurenceisla in #4075 | |||
| + e.g. `?select=alias:table(*)&table.id=eq.1` is not possible anymore, use `?select=alias:table(*)&alias.id=eq.1` instead. | |||
| - Do not listen on main socket until schema cache is loaded by @mkleczek in #4880 | |||
There was a problem hiding this comment.
Hm, both these entries should be switched places. We've always put the more descriptive change in Changed.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.
This change ensures PostgREST starts listening on a server socket only after it loaded the schema cache and is ready to handle requests. It is no longer going to return 503 errors during startup until the schema cache is loaded.