Skip to content

Add connection check to Trilogy#134

Closed
dbussink wants to merge 1 commit into
trilogy-libraries:mainfrom
dbussink:add-connection-check
Closed

Add connection check to Trilogy#134
dbussink wants to merge 1 commit into
trilogy-libraries:mainfrom
dbussink:add-connection-check

Conversation

@dbussink
Copy link
Copy Markdown
Contributor

This adds a very cheap connection check to Trilogy. This cheap check can be used before a connection for example is checked out from a connection pool.

It allows for making the client more robust in the case of the server having disconnected (like a failover, an intermediate proxy like ProxySQL or VTGate restarted etc) but it can be reconnected safely.

The implementation here matches what is also done in the Go driver, see also https://github.blog/2020-05-20-three-bugs-in-the-go-mysql-driver/ where this is decribed.

It works by doing a non-blocking recv to peek if there's any data. If there is, or if the function would otherwise block it means the connection is still safe.

If the TCP connection has been closed, recv immediately returns with zero bytes read. This indicates the connection is no longer safe to use. We do this directly on the file descriptor to be consistent regardless of things like TLS being used.

With this addition, I also want to propose a change then in Rails to start using this there as well on connection pool checkout.

This adds a very cheap connection check to Trilogy. This cheap check can
be used before a connection for example is checked out from a connection
pool.

It allows for making the client more robust in the case of the server
having disconnected (like a failover, an intermediate proxy like
ProxySQL or VTGate restarted etc) but it can be reconnected safely.

The implementation here matches what is also done in the Go driver, see
also https://github.blog/2020-05-20-three-bugs-in-the-go-mysql-driver/
where this is decribed.

It works by doing a non-blocking `recv` to peek if there's any data. If
there is, or if the function would otherwise block it means the
connection is still safe.

If the TCP connection has been closed, `recv` immediately returns with
zero bytes read. This indicates the connection is no longer safe to use.
We do this directly on the file descriptor to be consistent regardless
of things like TLS being used.
@kirs
Copy link
Copy Markdown

kirs commented Nov 3, 2023

This is great and we hacked something similar on our private mysql2 fork some time ago. Would love this to ship and to be used by Rails ❤️

@nvasilevski
Copy link
Copy Markdown
Contributor

Hey @dbussink, I'm extremely curious about this change! Coincidentally we are dealing with an issue where applications that use trilogy are unable to recover after a proxysql restart as application keeps trying to use broken connections. We haven't found the root cause just yet but I'm eager to try this to see if it helps with our issue.

I was wondering if you already have an idea of how the change in Rails may look like? Or perhaps you have a draft branch. It's okay if not, I was thinking to at least trying to use the new check as an addition (or a substitution?) to the .ping call inside the active? check

as this is what Rails uses to verify! the connection before making the query

I'm going to try this feature on Monday against our issue and get back with results. Thank you!

@composerinteralia
Copy link
Copy Markdown
Collaborator

composerinteralia commented Nov 4, 2023

an issue where applications that use trilogy are unable to recover after a proxysql restart as application keeps trying to use broken connections.

rails/rails#49811 could be a cause (although that would probably be any Rails 7.1 app, not just ones using trilogy)

@dbussink
Copy link
Copy Markdown
Contributor Author

dbussink commented Nov 4, 2023

I was wondering if you already have an idea of how the change in Rails may look like? Or perhaps you have a draft branch. It's okay if not, I was thinking to at least trying to use the new check as an addition (or a substitution?) to the .ping call inside the active? check

The idea is not to use this in the same way a .ping. That is an expensive call that goes to the server and back to check if things are alive. The goal of this check is that it is much cheaper, so it can be easily used in cases where .ping is too expensive. This can be for example when a connection is checked out from the connection pool.

I don't have anything here yet, but my first idea is to add this to the connection pool logic then but it's all up for debate of course. Maybe @composerinteralia has ideas for this already though.

@matthewd
Copy link
Copy Markdown
Collaborator

matthewd commented Nov 6, 2023

This sounds like a valuable API to have in the library.

Rails [now] already defers the verification, such that it's not done on pool checkout, but only when necessary (about to run a non-retryable query, could safely reconnect now, have not yet used the connection since it was checked out). And in that situation, I think the full ping remains appropriate: "server has mysteriously disappeared without warning" is exactly the sort of thing we're trying to protect against.

So I don't think this can actually save any round-trips we'd currently be doing. I do think it could be interesting, though, as an additional check we could use when running additional queries on an already-used connection -- it would be ridiculous to re-ping before every query, but a quick pre-query peek could well save some in-flight requests from trouble.

@dbussink
Copy link
Copy Markdown
Contributor Author

dbussink commented Nov 6, 2023

So I don't think this can actually save any round-trips we'd currently be doing.

That's not the goal here either I think. The goal is to provide a cheap check that can be done on pool checkout.

I do think it could be interesting, though, as an additional check we could use when running additional queries on an already-used connection -- it would be ridiculous to re-ping before every query, but a quick pre-query peek could well save some in-flight requests from trouble.

Right, that's the whole idea here. It's cheap enough to do on pool checkout, so we can avoid errors on the case that the server restarted for connections that weren't doing mid-flight queries. That can significantly reduce the number of exceptions for example during a primary failover, or when using something like Vitess, when vtgates are restarted.

@matthewd
Copy link
Copy Markdown
Collaborator

matthewd commented Nov 7, 2023

we can avoid errors on the case that the server restarted for connections that weren't doing mid-flight queries

Yes, Rails already handles that, and has for a long time. Historically we did it with a ping on checkout; we now do it by detecting the problem and reconnecting + recovering mid request.

@dbussink
Copy link
Copy Markdown
Contributor Author

dbussink commented Nov 7, 2023

Yes, Rails already handles that, and has for a long time. Historically we did it with a ping on checkout; we now do it by detecting the problem and reconnecting + recovering mid request.

I think that we also have rails/rails#49811 in the mix here for us internally, which is causing also a problem for us that it's never reconnecting. With the details in that PR I can easily reproduce a broken reconnect logic in our case (if the first query after a broken connection does a query which uses quote_string it never seems to recover). So that confused me here for a bit.

it would be ridiculous to re-ping before every query, but a quick pre-query peek could well save some in-flight requests from trouble.

How do you think we should go about this one in Rails itself? I'm not really sure how that would be used then there, you probably know better than I do.

@matthewd
Copy link
Copy Markdown
Collaborator

matthewd commented Nov 7, 2023

I'm picturing that we could re-check it here, where we otherwise breeze through just trusting the previously verified [within request] connection is still good: https://github.com/rails/rails/blob/d53ef2c18073a3a662edddd3fbbce32805e1135a/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L1012-L1014

@jhawthorn
Copy link
Copy Markdown
Collaborator

jhawthorn commented Nov 7, 2023

I had a similar idea: main...jhawthorn:trilogy:connected_p I was also checking getsockopt w/ SO_ERROR but didn't get around to testing how useful that was.

So I'm in favour of having an API like this but have two thoughts I'd like to know others opinions on:

  1. For naming I don't care for check?. I went with connected? in my prototype, which I still like other than it sounding like too much of a guarantee. I also considered available?. Maybe likely_connected?, likely_available?. Curious of others thoughts.
  2. In my prototype I avoided raising exceptions at all, as I think that's one of the things which has been challenging for users to deal with, and in this case you have to check both the return value and rescue for an exception (ie. to a user calling this they are going to want to do the same thing for the EOF 0 as for an ECONNRESET). I would prefer we never raise an exception and just return true/false. If others do prefer exceptions (I can see the argument that it would allow reuse of existing error handling and would be a drop-in replacement for ping, it also provides the most information) then I think we should raise for TRILOGY_CLOSED_CONNECTION as well and name the method check (without the predicate).

@dbussink
Copy link
Copy Markdown
Contributor Author

dbussink commented Nov 7, 2023

So I'm in favour of having an API like this but have two thoughts I'd like to know others opinions on:

@jhawthorn I don't really have strong opinions on either really. The bit I prefer about the implementation here is that code lives also in the C driver bits, seems to be long a bit more there than purely in the Ruby extension code. Naming wise I'm up for changing it to whatever is preferred really.

2. I would prefer we never raise an exception and just return true/false. If others do prefer exceptions (I can see the argument that it would allow reuse of existing error handling and would be a drop-in replacement for ping, it also provides the most information)

The reason I kept exceptions here is that in case of any other unexpected errors (so it's not a regular closed connection or an open one), it seems useful to know something unexpected is broken / wrong, vs. hiding that if it's purely true / false.

@jhawthorn
Copy link
Copy Markdown
Collaborator

jhawthorn commented Feb 5, 2024

Thank you! Merged as #154 using the client.check API.

@jhawthorn jhawthorn closed this Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants