CASSANDRA-20975 Add capability to load pluggable compression service providers#4804
CASSANDRA-20975 Add capability to load pluggable compression service providers#4804shyla226 wants to merge 4 commits into
Conversation
2) Added a new API serializedAs() in ICompressor 3) Added new distributed tests and unit tests for CompressorRegistry & CompressionParams
There was a problem hiding this comment.
Overall I like the approach. I've left a few interface questions but nothing too big.
I would like to see a little more safety around runtime construction and test coverage of the various fallback paths and such. I'm slightly worried about the cases where someone sideloads an accelerator that throws or doesn't respect the new serializedAs contract and then the operator doesn't know it's unsafe to upgrade to hardware that doesn't have that plugin supported.
Also what do we think about a quick sanity check of the ICompressor interface when we load the provder? When there is a custom provider at startup time, just roundtrip some basic smoketest bytes through the ICompressor interface to make sure that at least the provider passes a sniff test, for example initialCompressedBufferLength, compress, uncompress, supports, recommendedUses etc ... all match the default provider's implementation?
| # Cassandra will attempt to load the provider with the specified class_name for each compressor key. | ||
| # Parameters: | ||
| # - fallback_to_default_provider: If "true", falls back to DefaultCompressionProvider | ||
| # when the plugin fails to load. If "false", throws an exception on failure. |
There was a problem hiding this comment.
So just a note on boolean stringly types, anything other than "true" will be false, so for example "fallback_to_default_provider": "yes" will also be false. What do we think about an enum type with explicit values, something like?:
# If the provider fails to load, either warn (the default) or prevent startup via error
on_failure: [fallback, exception]
Also curious of Stefan's opinion. I generally prefer enums for behavior similar to how disk_failure_policy works because it a) is very clear as an operator what you're getting, and b) allows you to have alternatives later.
There was a problem hiding this comment.
@jolynch in general, I prefer enum as it is more expressive over trying to cover all possible "trues" (yes, Yes, YES, true, True, 1 ...). I do not think there is a single case in whole codebase which would do it by the latter approach. The former, the one you offer, is more viable, however by introducing that, we are deviating from the crypto_provider approach where it is "true / false" too.
crypto_provider:
- class_name: org.apache.cassandra.security.DefaultCryptoProvider
parameters:
- fail_on_missing_provider: "false"
I just realized that we should rename this to "fail_on_missing_provider" instead of "fallback_to_default_provider" to be aligned with crypto_provider.
So, I am slightly leaning towards true / false for now and then we might look at this more robustly in a separate ticket and we can try to support at least non-case-sensitive "yes" and "no" together with "true" / "false" for all other cases where applicable. Trying to introduce "one-off" seems a little bit less optimal instead of being consistent over the whole configuration file.
For example, all guardrails use "true / false" for now, we can indeed make it support "yes / no" but as said, I would delegate the investigation to a follow-up ticket.
| */ | ||
| default Class<? extends ICompressor> serializedAs() | ||
| { | ||
| return getClass(); |
There was a problem hiding this comment.
If I understand correctly if this yields the non C* name we can no longer deserialize if the provider is removed or unavailable (e.g. change instance types to something without the accelerator)? Can we check serializedAs as early as we can, maybe in createCompressor (before we write any data that a reader can't read), just check that the short name matches the key?
I think a unit test where a compressor author of the plugin compressor forgets to implement serializedAs but the operator registers it as an override for a valid class (e.g. LZ4Compressor) would demonstrate the problem, which I believe will lead to sstables we can't decompress later if we upgrade the hardware to something that cannot use the accelerator (e.g. a different architecture).
There was a problem hiding this comment.
Yes good point, we can make this more tight and check.
There was a problem hiding this comment.
I added the fix but have not added a test.
| } | ||
|
|
||
| @Test | ||
| public void testCustomProviderRegisteredAndUsedForTableCompression() throws Throwable |
There was a problem hiding this comment.
I think a useful variant of this test would be that you can open the sstable once you don't have the provider any more, ideally across all supported compressors. If that's too slow, a single register -> encode -> remove registration -> decode loop might be sufficient.
Basically test the property that data written with the provider on one node can be read on another.
| { | ||
| throw new ConfigurationException("Cannot initialize class " + compressorClass.getName()); | ||
| } | ||
| return registry.getProvider(compressorClass).createCompressor(compressorClass, compressionOptions); |
There was a problem hiding this comment.
Can the fallback happen here as well? If the provider fails to create the compressor and fallback is enabled we use the default provider? We could also just encapsulate that in the provider and not throw a runtime exception here.
There was a problem hiding this comment.
yes we can do that, added.
| * @return true if the provider is healthy and ready to use, false otherwise | ||
| * @throws Exception if an error occurs during the health check | ||
| */ | ||
| public abstract boolean isHealthy() throws Exception; |
There was a problem hiding this comment.
Can you add a test of fallback on isHealthy throwing, I think there is a test that excercises false but not exception (I believe both should fallback).
| } | ||
| } | ||
|
|
||
| public static class UnhealthyTestProvider extends AbstractCompressionProvider |
There was a problem hiding this comment.
You could have variants of this that throw exceptions out of healthy or init or the other places we support fallback. (I'd just make one class that has different ways of being unhealthy, e.g. can throw, or fail to create instances when called etc ...), then have the test explore all the ways this interface can fail and test fallback in those cases.
Adds support for plugin compression.