fix(opensearch): repair bare orphan index on bootstrap without discarding populated indices (#36237)#36353
Conversation
…ding populated indices (#36237) The idempotent-bootstrap reuse path from #36238 re-asserted the custom mapping via putMapping against an orphaned cluster index. On a bare orphan (no settings) this failed with HTTP 400 because the content mapping references the custom analyzer `my_analyzer`, which is a static index setting that can only be applied at creation time — leaving the index half-mapped (QA TC-003). Decide the orphan's fate by document count so a populated index is never discarded: - empty orphan (0 docs): delete and recreate from scratch, restoring full settings + base mapping + custom mapping. An empty index has no data and no reindex progress, so this costs nothing operationally. - populated orphan (>0 docs) or unknown count: reuse in place, untouched. A dotCMS-created index already carries the full mapping; deleting it would force a full reindex (hours, degraded search). On any uncertainty (the count probe fails) we err toward reuse. Covered by unit tests (empty->recreate, populated->reuse-untouched, probe-fails->reuse, delete-fails->still-creates, OS tag propagation) and a regression IT that recreates a bare orphan against a real OpenSearch cluster and asserts the recreated index carries the dynamic templates and my_analyzer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Claude finished @fabrizzio-dotCMS's task in 2m 22s —— View job Rollback Safety Analysis
Result: ✅ Safe To RollbackThis PR changes 3 files, all scoped to OpenSearch bootstrap logic and tests:
Analysis Against Every Unsafe Category
ReasoningThe entire diff is contained within the bootstrap seam of |
🤖 dotBot Review (Bedrock)Reviewed 3 file(s); 2 candidate(s) → 0 confirmed, 0 uncertain (unverified, kept for review). ✅ No issues found after verification. us.deepseek.r1-v1:0 · Run: #28390053907 · tokens: in: 28972 · out: 11674 · total: 40646 · calls: 7 · est. ~$0.102 |
…cknowledged (#36237) Addresses PR review: if the empty-orphan delete is not acknowledged the index may still exist, so the follow-up create would throw resource_already_exists — which bootstrapAndPointOS does not catch (it only handles IOException), aborting bootstrap (the original #36237 failure mode). Now, when the delete is not acknowledged, the empty orphan is reused in place instead of attempting a doomed create. The index is empty so nothing is lost, and a later clean restart recreates it properly once the cluster is healthy. Unit test updated accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the 🟡 Medium finding (failed orphan deletion → create failure) in 47b6899. The empty-orphan branch no longer falls through to a create after a non-acknowledged delete. If the delete isn't confirmed (so the index may still exist), the orphan is now reused in place instead of attempting a create that would throw Unit test |
…n when delete is stuck (#36237) Addresses the 🔴 Critical review finding: the previous revision returned true (reused the bare orphan) when its delete was not acknowledged, which would register an un-repairable half-mapped index in the store — bare orphans have no custom analyzer and their mapping cannot be repaired in place. Now, when the empty-orphan delete is not acknowledged, re-probe existence: - index gone (delete took effect without an ack) -> recreate cleanly; - index still present -> throw, failing bootstrap loudly with a clear message rather than recreating (would throw resource_already_exists) or registering a half-mapped index. This is an abnormal cluster state, not the orphan-name collision the method otherwise resolves. Unit tests: split the unacked-delete case into index-gone->recreate and index-still-present->fail-loud. 9/9 green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the 🔴 Critical finding (failed orphan deletion leaves broken index state) in 8b8da22. You're right — the previous revision's New behavior when the empty-orphan delete is not acknowledged — re-probe existence:
Unit tests split into |
|
Tick the box to add this pull request to the merge queue (same as
|
Problem
Re-fix for #36237 (QA failed the prior PRs #36238/#36240 on TC-003).
The idempotent-bootstrap reuse path introduced in #36238 re-asserted the custom mapping via
putMappingagainst an orphaned cluster index (one that exists in the cluster but is missing from the dotCMS index store). On a bare orphan this failed:Root cause: the content mapping references the custom analyzer
my_analyzer, defined inos-content-settings.json. Analyzers are static index settings that can only be applied at index creation — so aputMapping-only re-assert against a bare index is rejected (analyzer [my_analyzer] not found) and the index is left half-mapped, with the dotCMS dynamic templates missing.How the orphan arises
In the migration catch-up path the OS index name is derived deterministically by mirroring the ES name (
working_T0→cluster_X.working_T0.os), not generated with a fresh timestamp. If a prior bootstrap created that physical index but crashed before committing itsVersionedIndicesstore pointer, the next restart re-derives the same name, finds it already in the cluster, and the create fails withresource_already_exists.Fix
Decide the orphan's fate by document count, so a populated index is never discarded:
putMapping-reuse could not repair.The delete only ever fires against a demonstrably empty orphan, and only in the bootstrap path for a store slot that is not registered — never against the active production index.
Tests
ContentletIndexAPIImplBootstrapTest, 8/8): empty→recreate, populated→reuse-untouched, doc-count-probe-fails→reuse, empty-orphan-delete-fails→still-creates, missing→create, create-fails→no-mapping, existence-probe-fails→create, OS-tag propagation.ContentletIndexAPIImplMigrationIntegrationTest): new regression IT creates a bare orphan against a real OpenSearch cluster, runs the bootstrap seam, and asserts the recreated index carries the dotCMS dynamic templates (template_1) and themy_analyzersetting. Verified green against OpenSearch:Closes #36237
🤖 Generated with Claude Code
This PR fixes: #36237