Improvement/s3 utils 233/script to manually repair md primary backup from secondary#386
Conversation
Hello fredmnl,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Incorrect fix versionThe
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the |
| expectedBseq = bseq + 1 | ||
| } | ||
|
|
||
| log.Printf(" scanned copy 0: %d entries (max bseq %d)", primary.count, primary.maxBseq) |
There was a problem hiding this comment.
If copy 0 is completely empty (e.g. a full primary failure), the iterator returns nothing, expectedBseq stays at minBseq, and the function reports zero inconsistencies. This silently masks the worst-case scenario.
Consider adding a check after the loop: if primary.count == 0, query at least one secondary to see if it has entries, and warn or return an error.
— Claude Code
| endpoint: endpoint, | ||
| path: path, | ||
| client: &http.Client{ | ||
| Timeout: 5 * time.Minute, |
There was a problem hiding this comment.
http.Client.Timeout covers the entire request lifecycle including body streaming. Since Get() returns resp.Body and the caller streams it into Put(), the 5-minute clock from the GET is still ticking during the copy. For large backup segments this could fire mid-transfer, leaving a partial object at the destination key.
Consider using per-request context timeouts for the initial response only (e.g. via http.NewRequestWithContext) and letting body streaming run without a hard cap, or at least bumping this significantly and documenting the maximum supported segment size.
— Claude Code
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## development/1 #386 +/- ##
==============================================
Coverage 44.12% 44.12%
==============================================
Files 88 88
Lines 6448 6448
Branches 1348 1348
==============================================
Hits 2845 2845
Misses 3555 3555
Partials 48 48 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
7f04f2c to
4015265
Compare
| stdinReader.ReadByte() | ||
| } | ||
| fmt.Fprint(os.Stderr, "press enter to continue (or ctrl-c to abort)... ") | ||
| stdinReader.ReadString('\n') |
There was a problem hiding this comment.
confirmStep silently proceeds when stdin is closed (e.g. cron job, Docker without TTY). ReadString returns io.EOF which is discarded, so the tool runs all destructive repair steps without real user confirmation. Check the error and abort on EOF.
| stdinReader.ReadString('\n') | |
| if _, err := stdinReader.ReadString('\n'); err != nil { | |
| log.Fatalf("stdin closed unexpectedly (non-interactive environment?): %v — use -y to skip prompts", err) | |
| } |
— Claude Code
| return nil, err | ||
| } | ||
| if !ok { | ||
| break |
There was a problem hiding this comment.
When maxBseq is set and the primary iterator is exhausted before reaching maxBseq, bseqs between the primary's last entry and maxBseq are silently skipped. The existing MaxBseq and MinAndMaxBseq tests only pass because the primary fixture includes bseq=8 beyond maxBseq, acting as a sentinel. If the primary's last entry were below maxBseq with no entries beyond it, those trailing gaps would be missed.
Consider scanning expectedBseq..maxBseq after the loop exits when maxBseq > 0.
— Claude Code
| req.ContentLength = info.ContentLength | ||
| if info.UserMD != "" { | ||
| req.Header.Set("X-Scal-Usermd", info.UserMD) | ||
| } |
There was a problem hiding this comment.
Unlike Get and getBackupIndexPage, Put discards the response body before checking the status code, losing diagnostic info from the server on failure.
| } | |
| body, _ := io.ReadAll(resp.Body) | |
| if resp.StatusCode != http.StatusOK { | |
| return fmt.Errorf("PUT %s returned %d: %s", url, resp.StatusCode, string(body)) | |
| } |
— Claude Code
Review by Claude Code |
4015265 to
bd328ba
Compare
|
|
||
| const maxStallPolls = 10 | ||
|
|
||
| func (a *AdminClient) WaitForReindex(pollInterval time.Duration) error { |
There was a problem hiding this comment.
WaitForReindex has no overall wall-clock timeout. The stall detector only fires when ProcessingBseq stops advancing, so a reindex that advances by 1 bseq every 30 seconds will keep this loop alive indefinitely. Consider adding a maxDuration parameter (e.g. time.After) so the script cannot block forever in production.
— Claude Code
| return fmt.Errorf("POST %s: %w", url, err) | ||
| } | ||
| defer resp.Body.Close() | ||
| body, _ := io.ReadAll(resp.Body) |
There was a problem hiding this comment.
io.ReadAll error is silently discarded. If the read fails, the error message on line 172 will show an empty/partial body, hiding the actual server response.
— Claude Code
|
|
||
| // referenceKey reimplements the Node.js computeKey logic as a direct translation, | ||
| // to cross-check against the KeyGenerator implementation. | ||
| func referenceKey(raftSessionID string, installID int, bseq int, copyId int) string { |
There was a problem hiding this comment.
The test helper parameter is named raftSessionID but TestGenerateKeyWithBackupID passes a full backupID (bucket/7). Since the production code hashes cluster/raftSessionId via Config.BackupID(), the parameter name here is misleading and could mask a mismatch if someone modifies the hash input. Rename to backupID to match NewKeyGenerator.
— Claude Code
Review by Claude Code |
No description provided.