vitess: add vschemas with vindexes#8634
Conversation
|
@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values. |
1 similar comment
|
@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values. |
|
One thing worth noting about this approach: while it's kind of nice that the |
aarongable
left a comment
There was a problem hiding this comment.
It would be great for this PR to come with some explicit documentation pointing towards the mariadb and vitess docs around loading schemas from disk. I had to dig for a bit until I found mariadb's docker-entrypoint process_init_files and finally understood how the schemas were getting loading into mariadb.
Overall LGTM, it took me a minute to wrap my head around how things work in this new world but the result is much cleaner and simpler than the status quo. I like it.
|
Hmm, I thought the integration failure was a flake, but the exact same failure happened on my retry. Looks like it might be real. |
|
The error is from revocation_test:297: This is testing that when we cause CT to reject a certificate (so we never create a final certificate), we can still revoke it. As part of the setup, we check that (a) the certificate issuance really did fail, and (b) the error message said something about SCT embeddeding. In fact we got some errors like these: and these: Also, with the addition of some improved error returns in the failOrder path I saw: I'm able to reproduce locally, intermittently. Still needs more investigation. Also I'm noticing a couple of things in the output from TestRevocation:
|
It was causing timeouts under vitess.
|
After a bunch of running in a loop, I concluded that this bug was triggered by the parallelism in TestRevocation. Two kinds of certs (precert, final cert), times three revocation reasons, times four methods of revocation, resulted in 24 parallel issuances. I don't understand why 24 parallel issuances caused the symptoms we saw. After adding some debug prints, it appears that on certain runs, several concurrent requests to AddCertificate would error out with I'd still like to understand this better, but given that only this test with its high parallelism runs into the problem, I decided to reduce the parallelism for now. |
|
Okay, now I think I understand the cause if not the mechanism. As @beautifulentropy previously found, The issue can be more reliably reproduced with the below sa unittest. This fails if the parallelism is 4, passes if the parallelism is 3. And passes, even with high parallelism, on What I still don't understand is exactly what is supposed to happen, when we hit the transaction cap. What happens in practice, if you run that unittest with parallelism 4, is that it opens all four transactions, does the selects, and then hangs on the inserts until the queries time out. All that said, I am reasonably satisfied with the understanding I've reached here, and consider this PR ready for review. |
beautifulentropy
left a comment
There was a problem hiding this comment.
Overall the change looks great, just a couple nits and one actual issue.
In order to exercise sharding logic in Vitess, we need our integration test environment to use sharded keyspaces, which means we need to use vindexes. This PR enables vindexes and changes DB initialization in support. It does not yet shard the keyspaces since doing so causes some consistency issues I still need to debug.
DB initialization changes:
boulder_sa/boulder_sa_next. This allows the two to coexist, which means we don't need todown --volumesthebmariadbcontainer when running locally and switching from./tn.shto./t.sh.bmariadbimage has support for applying SQL schemas from a directory. Use that. Thevttestserverhas support for applying SQL schemas and vschemas from a directory.Foreign keys: we had one foreign key constraint (
serials.registrationID->registrations), but we probably don't actually want this in the Vitess world. It makes things more complicated with no benefit. I removed it from the SQL schemas.test/db.go: Invttestserverwe can't delete more than 10k rows, so I changeddeleteEverythingInAllTablesto delete in a loop. Incidentally I cleaned up some stuff about foreign keys and1 = 1that is no longer needed. Also, I changed deletion calls insa_test.goto use the shared deletion functions.Deletion from
overrideswas failing under this setup withError 1105 (HY000): VT09015: schema tracking required. I concluded this was because the table had no primary key. I changed itsUNIQUE KEYto aPRIMARY KEY.test/vars/vars.go: calculate DSNs based on theBOULDER_CONFIG_DIRvariables, to select betweenboulder_saandboulder_sa_next(etc.).