Skip to content

fix: add missing wg.Wait() in flaky storage cache test#2151

Merged
dobrac merged 2 commits into
mainfrom
dobrac/investigate-flaky-test
Mar 18, 2026
Merged

fix: add missing wg.Wait() in flaky storage cache test#2151
dobrac merged 2 commits into
mainfrom
dobrac/investigate-flaky-test

Conversation

@dobrac
Copy link
Copy Markdown
Contributor

@dobrac dobrac commented Mar 17, 2026

Summary

The TestCachedSeekable_ReadAt_PreservesEOF/EOF_from_inner_is_returned_to_caller_unchanged subtest was missing a c.wg.Wait() call, causing a race between async cache writeback and t.TempDir() cleanup. This made the test flaky with "directory not empty" errors during removal.

The sibling subtest and all other cache-writeback tests already call c.wg.Wait() to ensure async operations finish before cleanup.


Note

Low Risk
Low risk since changes are limited to tests, but they touch concurrency/timeout coordination and could still mask real timing issues if timeouts are too generous or too strict in CI.

Overview
Stabilizes two concurrency-sensitive tests by ensuring background work completes deterministically: cachedSeekable EOF preservation now waits for async cache writeback before temp dir cleanup, and StreamingChunker full-chunk caching verification replaces a polling Eventually loop with a single blocking Slice under a timeout to avoid races with Close().

Written by Cursor Bugbot for commit bcca097. This will update automatically on new commits. Configure here.

The EOF_from_inner_is_returned_to_caller_unchanged subtest was missing a
c.wg.Wait() call, causing a race between async cache writeback and t.TempDir()
cleanup. This made the test flaky with "directory not empty" errors.
Copy link
Copy Markdown
Contributor

@levb levb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, once the tests pass

@dobrac dobrac added the bug Something isn't working label Mar 17, 2026
@dobrac dobrac enabled auto-merge (squash) March 17, 2026 22:38
@dobrac dobrac disabled auto-merge March 17, 2026 22:52
@dobrac dobrac enabled auto-merge (squash) March 18, 2026 09:14
…g chunker test

require.Eventually spawns condition goroutines that race with
defer chunker.Close(), causing intermittent failures.
@dobrac dobrac merged commit e1fc1cc into main Mar 18, 2026
35 checks passed
@dobrac dobrac deleted the dobrac/investigate-flaky-test branch March 18, 2026 09:39
levb added a commit that referenced this pull request Mar 18, 2026
  Incorporate main commits: #2150 (validate specified IPs in egress),
  #2156 (disable request timeout), #2154 (envd init logs), #2151
  (storage cache test fix).

  Resolve conflict in sandbox_create.go by consolidating validation:

  Egress validation — matches main #2150 + port rejection:
  - validateEgressRules stays in sandbox_create.go (same location as main)
  - Uses IsSpecifiedIPOrCIDR from #2150 to reject unspecified addresses
  (0.0.0.0, ::, 0.0.0.0/24, etc.) while allowing 0.0.0.0/0
  - Uses ParseAddressesAndDomains to separate IPs from domains (same as main)
  - Adds port rejection (egress doesn't support port-specific rules)

  Ingress validation — new, added next to validateEgressRules:
  - validateIngressRules + validateIngressEntry in sandbox_create.go
  - Uses SplitHostPort to separate CIDR from port, validates each part
  - Uses IsSpecifiedIPOrCIDR for IPv4 + allows ::/0 for IPv6 block-all
  - Uses ParsePortRange for port/port-range validation
  - Rejects domains (ingress is IP/CIDR only)
  - Requires deny-all when allow rules are present

  Simplify rule.go — remove ParseRule/ParseRules:
  - Egress validation uses IsSpecifiedIPOrCIDR + ParseAddressesAndDomains directly
  - Ingress validation uses SplitHostPort + IsIPOrCIDR + ParsePortRange directly
  - Keep: Rule/ACL structs (hot-path matching), SplitHostPort, ParsePortRange (public)
make[1]: Entering directory '/home/lev/dev/infra/iac/provider-gcp'
  - Remove: ParseRule, ParseRules (tried to be one-size-fits-all, added complexity)

  Simplify orchestrator ACL building:
  - newEgressACL: uses parseCIDRs (direct net.ParseCIDR, no ParseRules)
  - newIngressACL/parseIngressRules: unchanged (builds from proto fields)

  Update create_instance.go:
  - buildEgressConfig: uses IsIPOrCIDR for domain detection (was ParseRule)
  - parseIngressRules: uses SplitHostPort + ParsePortRange (was ParseRule)

  Test updates:
  - Error messages updated to match main's IsSpecifiedIPOrCIDR style
  - Integration tests: replace ::/1/0.0.0.0/1 with ::/0/0.0.0.0/0
  (unspecified network addresses now rejected)
  - TestIsSpecifiedIPOrCIDR preserved from #2150
  - TestSplitHostPort, TestParsePortRange replace TestParseRule/TestParseRules
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants