Skip to content

CASSANDRA-20975 Add capability to load pluggable compression service providers#4804

Open
shyla226 wants to merge 4 commits into
apache:trunkfrom
intel-staging:CASSANDRA-20975-plugin-compressor
Open

CASSANDRA-20975 Add capability to load pluggable compression service providers#4804
shyla226 wants to merge 4 commits into
apache:trunkfrom
intel-staging:CASSANDRA-20975-plugin-compressor

Conversation

@shyla226
Copy link
Copy Markdown
Contributor

Adds support for plugin compression.

shyla226 and others added 2 commits May 12, 2026 11:24
2) Added a new API serializedAs() in ICompressor
3) Added new distributed tests and unit tests for CompressorRegistry & CompressionParams
Copy link
Copy Markdown
Contributor

@jolynch jolynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the approach. I've left a few interface questions but nothing too big.

I would like to see a little more safety around runtime construction and test coverage of the various fallback paths and such. I'm slightly worried about the cases where someone sideloads an accelerator that throws or doesn't respect the new serializedAs contract and then the operator doesn't know it's unsafe to upgrade to hardware that doesn't have that plugin supported.

Also what do we think about a quick sanity check of the ICompressor interface when we load the provder? When there is a custom provider at startup time, just roundtrip some basic smoketest bytes through the ICompressor interface to make sure that at least the provider passes a sniff test, for example initialCompressedBufferLength, compress, uncompress, supports, recommendedUses etc ... all match the default provider's implementation?

Comment thread src/java/org/apache/cassandra/config/Config.java
Comment thread conf/cassandra.yaml
# Cassandra will attempt to load the provider with the specified class_name for each compressor key.
# Parameters:
# - fallback_to_default_provider: If "true", falls back to DefaultCompressionProvider
# when the plugin fails to load. If "false", throws an exception on failure.
Copy link
Copy Markdown
Contributor

@jolynch jolynch May 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just a note on boolean stringly types, anything other than "true" will be false, so for example "fallback_to_default_provider": "yes" will also be false. What do we think about an enum type with explicit values, something like?:

# If the provider fails to load, either warn (the default) or prevent startup via error
on_failure: [fallback, exception]

Also curious of Stefan's opinion. I generally prefer enums for behavior similar to how disk_failure_policy works because it a) is very clear as an operator what you're getting, and b) allows you to have alternatives later.

Copy link
Copy Markdown
Contributor

@smiklosovic smiklosovic May 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jolynch in general, I prefer enum as it is more expressive over trying to cover all possible "trues" (yes, Yes, YES, true, True, 1 ...). I do not think there is a single case in whole codebase which would do it by the latter approach. The former, the one you offer, is more viable, however by introducing that, we are deviating from the crypto_provider approach where it is "true / false" too.

crypto_provider:
  - class_name: org.apache.cassandra.security.DefaultCryptoProvider
    parameters:
      - fail_on_missing_provider: "false"

I just realized that we should rename this to "fail_on_missing_provider" instead of "fallback_to_default_provider" to be aligned with crypto_provider.

So, I am slightly leaning towards true / false for now and then we might look at this more robustly in a separate ticket and we can try to support at least non-case-sensitive "yes" and "no" together with "true" / "false" for all other cases where applicable. Trying to introduce "one-off" seems a little bit less optimal instead of being consistent over the whole configuration file.

For example, all guardrails use "true / false" for now, we can indeed make it support "yes / no" but as said, I would delegate the investigation to a follow-up ticket.

Comment thread conf/cassandra.yaml
Comment thread src/java/org/apache/cassandra/io/compress/CompressorRegistry.java
*/
default Class<? extends ICompressor> serializedAs()
{
return getClass();
Copy link
Copy Markdown
Contributor

@jolynch jolynch May 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly if this yields the non C* name we can no longer deserialize if the provider is removed or unavailable (e.g. change instance types to something without the accelerator)? Can we check serializedAs as early as we can, maybe in createCompressor (before we write any data that a reader can't read), just check that the short name matches the key?

I think a unit test where a compressor author of the plugin compressor forgets to implement serializedAs but the operator registers it as an override for a valid class (e.g. LZ4Compressor) would demonstrate the problem, which I believe will lead to sstables we can't decompress later if we upgrade the hardware to something that cannot use the accelerator (e.g. a different architecture).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point, we can make this more tight and check.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the fix but have not added a test.

}

@Test
public void testCustomProviderRegisteredAndUsedForTableCompression() throws Throwable
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a useful variant of this test would be that you can open the sstable once you don't have the provider any more, ideally across all supported compressors. If that's too slow, a single register -> encode -> remove registration -> decode loop might be sufficient.

Basically test the property that data written with the provider on one node can be read on another.

{
throw new ConfigurationException("Cannot initialize class " + compressorClass.getName());
}
return registry.getProvider(compressorClass).createCompressor(compressorClass, compressionOptions);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the fallback happen here as well? If the provider fails to create the compressor and fallback is enabled we use the default provider? We could also just encapsulate that in the provider and not throw a runtime exception here.

Copy link
Copy Markdown
Contributor

@smiklosovic smiklosovic May 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we can do that, added.

* @return true if the provider is healthy and ready to use, false otherwise
* @throws Exception if an error occurs during the health check
*/
public abstract boolean isHealthy() throws Exception;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test of fallback on isHealthy throwing, I think there is a test that excercises false but not exception (I believe both should fallback).

}
}

public static class UnhealthyTestProvider extends AbstractCompressionProvider
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have variants of this that throw exceptions out of healthy or init or the other places we support fallback. (I'd just make one class that has different ways of being unhealthy, e.g. can throw, or fail to create instances when called etc ...), then have the test explore all the ways this interface can fail and test fallback in those cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants