Add allow_message_ttl migration option#85
Open
lukebakken wants to merge 11 commits into
Open
Conversation
Add an `allow_message_ttl` boolean field to the `migration_opts` record and parse it from the start-migration request body, defaulting to false. This commit only wires up parsing; nothing reads the field yet. Honoring it in the suitability checks and forcing tolerance follow in later commits for gh-80.
Thread an options map through the queue suitability and compatibility checks so that `message_ttl` is no longer treated as a blocking issue when `allow_message_ttl` is true. The flag is applied in both places that detect message TTL: `collect_all_suitability_issues` (readiness and eligibility) and `check_critical_message_ttl` (per-queue compatibility). The migration pre-flight passes the flag from the `migration_opts` record, and the readiness check passes it from the request options map. Internal helper arities were changed in place rather than preserving the old signatures. This commit makes message-TTL queues eligible for migration. Forcing the message-count tolerance to 100 when the flag is set is a separate commit.
Set the `allow_message_ttl` field on the `migration_opts` record from the request options, and when it is true force the message-count `tolerance` to 100.0 in both directions, overriding any explicit tolerance the caller passed. Opting in means accepting that any or all messages in message-TTL queues may expire during migration, so verification must not fail on the resulting count difference. This tolerance override is migration-wide: it relaxes message-count verification for every queue in the run, not only the message-TTL queues. That tradeoff is intentional, because per-message TTL cannot be detected at the queue level, and will be documented on the option.
Move the coupled `allow_message_ttl` and `tolerance` resolution out of `build_migration_opts` into a dedicated `message_ttl_opts/1`. The flag determines the tolerance, so both are resolved together and returned as a tuple, dispatched via function heads rather than an inline case.
Make every start-migration option validate strictly: a present-but-invalid
value now returns a 400 with a per-option reason instead of being silently
ignored. An absent option still uses its default. The option parsers return
`{ok, Opts} | {error, Reason}` and `parse_all_options` stops at the first
invalid value.
Reuse the same parser for the compatibility check endpoint, replacing the
minimal parser that only read `skip_unsuitable_queues`. The check and start
endpoints now parse options identically, so a compatibility check passes or
fails exactly as the actual migration would, including honoring
`allow_message_ttl`.
Also give `batch_order` an explicit absent-value clause so an omitted value
is not treated as invalid.
Document the `allow_message_ttl` option on both the start and check endpoints, including that it forces the message-count tolerance to 100% migration-wide and that the migrated quorum queue retains its `x-message-ttl`. Document the strict-validation behavior: an invalid request body option value now returns a 400 instead of being silently ignored. Fix correctness issues found while auditing the document: the `tolerance` default was wrongly described as `0.0` exact-match (it is 5.0% over / 0.0% under), and the check endpoint was described as accepting only `skip_unsuitable_queues` when it now parses the same option set as start. Note in the status fields that `allow_message_ttl` is not persisted.
Add a "Mnesia Schema Compatibility" section near the persisted record definitions explaining why changing the `queue_migration` / `queue_migration_status` records is upgrade-sensitive: rows are bare tuples validated by arity, field names are frozen in schema metadata at table-creation time, and adding or renaming a field needs `mnesia:transform_table` (awkward across a live cluster) or, preferably, storing extensible data in a single map field. Includes the `allow_message_ttl` decision as a worked example.
Wire the `allow_message_ttl` migration option through the Java integration harness so end-to-end runs can exercise it: a config field with accessors, a `--allow-message-ttl` CLI flag, a `QueueMigrationClient` overload that sends the option in the start request, and the end-to-end flow passing it through. The existing three-argument startMigration overload delegates with `false`, so current callers are unaffected.
Add an `allow_message_ttl_options` group to unit_SUITE covering the deterministic option logic: `allow_message_ttl` parses as a boolean, defaults are applied (the two boolean options are always materialized; value-typed options stay absent), invalid values for any option are rejected, and `message_ttl_opts/1` forces tolerance to 100.0 when the flag is set (overriding any supplied tolerance) and otherwise passes the supplied tolerance (or undefined) through. Expose `rqm_mgmt:parse_all_options/2` and `rqm:message_ttl_opts/1` under `-ifdef(TEST)` for the suite, matching the existing rqm_checks pattern.
Add an integration test (`allow-message-ttl-test`) that creates mirrored classic queues with a subset carrying a queue-level message TTL and verifies the deterministic behavior: migration fails without the flag (the TTL queues are unsuitable), succeeds with `allow_message_ttl=true`, all queues become quorum, and the TTL queues retain `x-message-ttl`. A long TTL is used so no messages expire during migration, keeping the test deterministic. Register it in the Main dispatcher. Document the lossy path (messages expiring mid-migration, with the forced 100% tolerance accepting the difference) as a manual reproduction in docs/SKIP_UNSUITABLE_QUEUES.md, since making messages expire inside the migration window is timing-dependent and would be flaky to assert. The test references that section for why the lossy case is not automated.
Add an `integration-allow-message-ttl` target and include it in the `integration-test` aggregate so the new test runs as part of the integration suite (registering the command in Main alone does not make CI run it). Also list all per-test integration targets in `.PHONY`, which previously named only a subset.
5a2a870 to
adf7155
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an opt-in
allow_message_ttloption to the queue migration start (and check) endpoints. When set, queues with a queue-level message TTL (thex-message-ttlqueue argument ormessage-ttlpolicy key) are migrated instead of being treated as unsuitable. Resolves #80.Without the option, these queues are blocked with the
message_ttlsuitability reason because messages can expire during migration and fail message-count verification. Opting in means accepting that loss: the option forces the message-counttoleranceto 100% for the migration so verification does not fail on the resulting difference. The migrated quorum queue keeps itsx-message-ttl.Behavior
allow_message_ttl(boolean, defaultfalse) onPOST /queue-migration/start[/:vhost].POST /queue-migration/check/:vhost, so a compatibility check passes or fails exactly as the real migration would. The check endpoint now uses the same option parser as start.toleranceto 100% in both directions, overriding any explicittolerance. This is migration-wide (it relaxes count verification for every queue in the run, not only the message-TTL queues); the tradeoff is documented on the option.queue_migrationrecord. A migration run with the option reflects it indirectly via the persistedtoleranceof 100. Persisting migration options properly (as a map) is tracked in Evaluate persisting migration options as a map (e.g. allow_message_ttl in status) #81.Validation hardening
All start-migration options now validate strictly: a present-but-invalid value returns a 400 with a per-option reason instead of being silently ignored. An absent option still uses its default. The check endpoint shares this parser, so the two endpoints cannot drift.
Tests
unit_SUITE): option parsing, strict-validation rejection per option, defaults, and the tolerance-forcing logic.AllowMessageTtlTest, wired into theintegration-testworkflow): a mixed set of queues where a subset has a queue-level TTL; migration fails without the flag, succeeds with it, all queues become quorum, and the TTL queues retainx-message-ttl. Verified against a 3-node cluster.docs/SKIP_UNSUITABLE_QUEUES.mdrather than asserted, to avoid a flaky test.Docs
docs/HTTP_API.md: documentsallow_message_ttlon both endpoints and the new invalid-option 400 behavior. Also corrects pre-existing inaccuracies found while auditing (thetolerancedefault was wrongly described as 0.0 exact-match; it is 5.0% over / 0.0% under, and the check endpoint was described as accepting onlyskip_unsuitable_queues).docs/SKIP_UNSUITABLE_QUEUES.md: addsallow_message_ttlas a resolution formessage_ttland the manual reproduction recipe.AGENTS.md: documents Mnesia schema-compatibility constraints for the persisted records, explaining whyallow_message_ttlis not persisted and what a future schema change would require.Follow-ups filed
EndToEndMigrationTestmessage-count validation tolerance- andallow_message_ttl-aware.