Asserting for DB failovers #35

@jacobbednarz

I'm using Toxiproxy for our external services, and we're now getting ready to do a bunch of DB failover work. To ride out failovers without dropping queries, we've patched ActiveRecord to catch MySQL errors, reconnect, and then retry the query. I can confirm manually that this works by running the script below and, while it runs, toggling the availability of either the Toxiproxy proxy or the DB server itself.

require 'securerandom'

ATTEMPT_COUNT = 300

puts "==> Truncating the users test_db table"
ActiveRecord::Base.connection.execute("truncate test_db.user")

puts "==> Starting to send MySQL queries"
ATTEMPT_COUNT.times do
  sleep 0.1
  hash = SecureRandom.uuid
  begin
    puts "    [#{Time.now.strftime("%T.%L")}] Inserting #{hash}"
    ActiveRecord::Base.connection.execute("INSERT INTO user (first_name, last_name) VALUES ('test', '#{hash}')")
    puts "    Success!"
  rescue StandardError => e
    # Log and carry on; a dropped write shows up in the variance below.
    # StandardError (not Exception) so Ctrl-C can still stop the script.
    puts "    [#{Time.now.strftime("%T.%L")}] #{e.class}: #{e.message}"
  end
end

# Let MySQL count the rows instead of loading every row into Ruby.
row_count = ActiveRecord::Base.connection.select_value("SELECT COUNT(*) FROM user").to_i
puts
puts "Attempted writes: #{ATTEMPT_COUNT}"
puts "DB row count:     #{row_count}"
puts "Variance:         #{ATTEMPT_COUNT - row_count}"
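The ActiveRecord patch itself isn't shown here. Purely as an illustration, a bounded retry wrapper of the kind described above might look like the following. The RetryOnDisconnect module, its parameters, and the decision to cap the number of attempts are all assumptions, not the actual patch. The cap matters for testing: a retry loop with no limit will wait forever if the database never comes back.

```ruby
# Hypothetical sketch, not the actual ActiveRecord patch: run the block and,
# on an error listed in `retry_on`, sleep, reconnect, and try again, giving
# up after `max_attempts` attempts in total.
module RetryOnDisconnect
  def self.with_retry(reconnect:, max_attempts: 5, backoff: 0.1, retry_on: [StandardError])
    attempts = 0
    begin
      attempts += 1
      # Reconnect inside the begin block, so a failed reconnect during the
      # failover is retried and counted like any other failed attempt.
      reconnect.call if attempts > 1
      yield
    rescue *retry_on
      # Re-raise once the budget is spent, so a database that never comes
      # back surfaces as an error instead of hanging forever.
      raise if attempts >= max_attempts
      sleep(backoff)
      retry
    end
  end
end
```

With ActiveRecord, this could be called as `RetryOnDisconnect.with_retry(reconnect: -> { ActiveRecord::Base.connection.reconnect! }, retry_on: [ActiveRecord::StatementInvalid, ActiveRecord::ConnectionNotEstablished]) { User.first }`. Which exception classes your adapter actually raises when the connection drops is an assumption worth checking.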

However, I'm stuck on how to use Toxiproxy to emulate the failover completing, i.e. the database coming back. I first tried:

Toxiproxy[:mysql_master].down do
  User.first
end

It seems our patch works a little too well. Inside the block, the query keeps retrying while it waits for MySQL to come back. MySQL never comes back, though, because down only re-enables the proxy once the block returns. I then tried splitting the disable and enable into separate calls, but got the same result:

Toxiproxy[:mysql_master].disable
User.first
Toxiproxy[:mysql_master].enable

Which leads me to the following questions:

  • Could you share how you're using Toxiproxy for things like DB failovers? Are you able to test them the way I'm trying to, or do you handle it on a per-model basis? Essentially, I need the proxy to be down for only a short period and then come back up on its own once that time has passed.
  • The only way I can think of to make this work would be to pass another argument to down (and later disable) so that the proxy is disabled only for a given period. Is a non-blocking timeout like that something you'd consider useful for the library?
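Until the library offers something like that, the timed re-enable from the second question can probably be approximated in user code. The idea is to disable the proxy, schedule the enable on a background thread, and run the blocking query on the calling thread. This is a minimal sketch. down_for is a hypothetical helper, not part of toxiproxy-ruby, and it assumes only that the proxy object responds to disable and enable.

```ruby
# Hypothetical helper (not part of toxiproxy-ruby): takes `proxy` down,
# re-enables it from a background thread after `seconds`, and runs the
# block on the calling thread. Any object with #disable and #enable works.
def down_for(proxy, seconds)
  proxy.disable
  restorer = Thread.new do
    sleep(seconds)
    proxy.enable # the simulated failover "completes" here
  end
  # The block is free to block (e.g. a query retrying its reconnect)
  # until the restorer thread brings the proxy back.
  yield
ensure
  # Always wait for the re-enable, so the proxy is back up afterwards even
  # if the block raised or finished before the timer fired.
  restorer&.join
end
```

Usage against the proxy from the examples above would be something like `down_for(Toxiproxy[:mysql_master], 2) { User.first }`. The query should then stall for about two seconds and succeed once the proxy is re-enabled, instead of waiting forever.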

Thanks!
