Skip to content

Fix getdeps fallback mirror downloads#14763

Open
xingbowang wants to merge 2 commits into
facebook:mainfrom
xingbowang:2026_05_20_nightly_run_failure
Open

Fix getdeps fallback mirror downloads#14763
xingbowang wants to merge 2 commits into
facebook:mainfrom
xingbowang:2026_05_20_nightly_run_failure

Conversation

@xingbowang
Copy link
Copy Markdown
Contributor

@xingbowang xingbowang commented May 20, 2026

Summary:

  • parse folly getdeps manifests with bare package entries so fallback prefetching actually runs
  • validate and remove bad cached/downloaded archives before trying fallback mirrors
  • download through temporary files and include libiberty in the GNU toolchain fallback set

Context:

Nightly test failed with dependency download failure in folly.

Assessing autoconf...
Download with https://ftpmirror.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz -> /tmp/fbcode_builder_getdeps-Z__wZrocksdbZrocksdbZthird-partyZfollyZbuildZfbcode_builder-root/downloads/autoconf-autoconf-2.69.tar.gz ...
 [Complete in 136.616022 seconds]
    raise Exception(
Exception: https://ftpmirror.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz: expected sha256 954bd69b391edc12d6a4a51a2dd1476543da5c6bbf05a95b59dc0dd6fd4c2969 but got e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
make: *** [folly.mk:152: build_folly] Error 1
##[error]Process completed with exit code 2.

Test Plan:

  1. Connection failure:
  2. Empty file / bad hash:
    • Ran a local HTTP server returning a zero-byte autoconf-2.69.tar.gz
    • Script logged mismatch with actual=e3b0c442... size=0
    • It removed the bad download and fell back to mirrors.kernel.org
    • Download succeeded with the expected SHA.
  3. Existing zero-byte cache:
    • Seeded cache with an empty tarball.
    • Script removed invalid cache and downloaded a verified copy.

@meta-cla meta-cla Bot added the CLA Signed label May 20, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

⚠️ clang-tidy: 13 warning(s) on changed lines

Completed in 568.7s.

Summary by check

Check Count
concurrency-mt-unsafe 6
cppcoreguidelines-avoid-non-const-global-variables 4
cppcoreguidelines-pro-type-member-init 1
performance-inefficient-string-concatenation 2
Total 13

Details

db/db_wal_test.cc (1 warning(s))
db/db_wal_test.cc:117:5: warning: uninitialized record type: 'fs_stat' [cppcoreguidelines-pro-type-member-init]
db_stress_tool/db_stress_common.cc (2 warning(s))
db_stress_tool/db_stress_common.cc:22:25: warning: variable 'raw_env' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:22:25: warning: variable 'raw_env' provides global access to a non-const object; consider making the pointed-to data 'const' [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_test_base.cc (8 warning(s))
db_stress_tool/db_stress_test_base.cc:253:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:3974:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:4438:52: warning: string concatenation results in allocation of unnecessary temporary strings; consider using 'operator+=' or 'string::append()' instead [performance-inefficient-string-concatenation]
db_stress_tool/db_stress_test_base.cc:4494:52: warning: string concatenation results in allocation of unnecessary temporary strings; consider using 'operator+=' or 'string::append()' instead [performance-inefficient-string-concatenation]
db_stress_tool/db_stress_test_base.cc:4511:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:4545:11: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:4583:11: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_test_base.cc:5324:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc (2 warning(s))
db_stress_tool/db_stress_tool.cc:39:49: warning: variable 'fault_fs_for_crash_report' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:39:49: warning: variable 'fault_fs_for_crash_report' provides global access to a non-const object; consider making the pointed-to data 'const' [cppcoreguidelines-avoid-non-const-global-variables]

@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented May 20, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D105859558.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 556211b


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 556211b


Summary

This PR fixes real CI failures in the folly getdeps fallback mirror script by addressing manifest parsing, atomic downloads, cache validation, and adding libiberty. The changes are well-targeted and improve reliability and security. Most agent-flagged issues were resolved as false positives or theoretical concerns during debate.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None after debate resolution. Three initially-flagged HIGH findings were withdrawn:

  • ConfigParser interpolation attack -- false positive; the PR adds interpolation=None, fixing the vulnerability rather than introducing it.
  • os.path.getsize() crash after failed download -- false positive; code flow analysis shows download_url() exceptions hit continue, so getsize() only executes after successful downloads.
  • get_fallback_mirrors() returning [] -- intentional design; the script handles only known-unreliable mirrors, and packages without fallback patterns are left to getdeps.py's own download logic.

🟡 MEDIUM

M1. Verify manifest files only contain GNU package URLs -- build_tools/getdeps_fallback_mirror.py:28
  • Issue: get_fallback_mirrors() now returns [] for URLs not matching known GNU mirror patterns, causing those packages to be skipped. If any manifest for autoconf/automake/libtool/libiberty uses a non-GNU URL, downloads would silently fail.
  • Root cause: Intentional design change (script is specifically for unreliable GNU mirrors), but not validated against actual manifest contents.
  • Suggested fix: Verify that all four package manifests use GNU URLs. Consider adding a debug log line when a package is skipped due to no fallback mirrors for easier future debugging.
M2. prepare_download() has no top-level try/except -- build_tools/getdeps_fallback_mirror.py:85
  • Issue: If shutil.copy2(), os.remove(), or other filesystem operations raise unexpected exceptions (e.g., permission errors), the script crashes and remaining packages are not processed.
  • Root cause: The old code had the same gap, so this is not a regression, but the refactored structure makes it more apparent.
  • Suggested fix: Wrap the prepare_download() call in main() with a try/except to print a warning and continue to the next package.
M3. No automated test coverage
  • Issue: No unit or integration tests exist for this script. The PR includes good manual testing, but regressions won't be caught automatically.
  • Root cause: Build utilities in this codebase generally lack test coverage.
  • Suggested fix: Acceptable for now. Monitor CI builds post-merge; consider adding basic tests if issues arise.

🟢 LOW / NIT

L1. Redundant os.makedirs() calls -- build_tools/getdeps_fallback_mirror.py:168
  • Issue: folly.mk already creates both directories with mkdir -p before calling the script. The os.makedirs(exist_ok=True) calls are redundant.
  • Suggested fix: Harmless defensive coding; no change needed.
L2. Multiple SHA256 computations per file -- build_tools/getdeps_fallback_mirror.py
  • Issue: SHA256 may be computed up to 3 times per package (existing file, cache, download). For ~2MB files this is negligible.
  • Suggested fix: Not worth optimizing for a cold-path build script.
L3. 120s timeout is per-connection, not per-download
  • Issue: urllib.request.urlopen(timeout=120) sets socket-level timeout. For a slow connection, large files could take longer without triggering the timeout.
  • Suggested fix: Acceptable for ~2MB tarballs. No change needed.

Cross-Component Analysis

Context Applicable? Assessment
folly.mk integration Yes Compatible. Script output/exit code behavior preserved.
getdeps.py interaction Yes File naming {package}-{filename} matches existing convention.
wget removal Yes Net improvement. urllib.request has better defaults for SSL, redirects, proxies.
Concurrent execution No Script called from single make target; never runs concurrently.
Windows/macOS No folly build in this Makefile is Linux-only.

Assumption stress-test results:

  • "Atomic downloads" claim: Holds for single-process POSIX usage. SIGKILL can leave .tmp files, but these are cleaned up on next run.
  • "Bad files are validated and removed" claim: Holds. sha256_file() returning None on I/O error means the file is treated as non-existent (not validated), which is safe -- the download path will be attempted.
  • "urllib is suitable wget replacement" claim: Holds. urllib handles redirects (important for ftpmirror.gnu.org), validates SSL, and respects proxy env vars.

Positive Observations

  • Fixes the actual root cause of CI failures (empty files from wget + ConfigParser parsing).
  • Atomic temp-file download pattern prevents partial file persistence.
  • Cache validation removes bad cached files that caused repeated failures.
  • Removes external wget dependency in favor of Python stdlib.
  • Fixes ConfigParser interpolation vulnerability (interpolation=None).
  • Better error messages with SHA256 values and file sizes for debugging.
  • Adding libiberty to the package list addresses a known missing dependency.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented May 23, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D105859558.

@github-actions
Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 075b120


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase

@github-actions
Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 075b120


Summary

Solid bug fix that correctly addresses the CI failure (empty file accepted due to manifest parse failure + no invalid file cleanup). The core changes -- ConfigParser fix, atomic temp-file downloads, SHA256 validation with cleanup -- are well-engineered. A few medium-severity issues around edge cases and one behavioral subtlety worth noting.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🟡 MEDIUM

M1. No download size limit in download_url() -- getdeps_fallback_mirror.py:83
  • Issue: shutil.copyfileobj(response, output) reads unbounded data. A compromised mirror could serve an extremely large file, exhausting disk before the 120s timeout fires.
  • Root cause: No max-size guard on the download loop.
  • Suggested fix: Read in a loop with a cumulative size check, e.g.:
    MAX_DOWNLOAD_BYTES = 50 * 1024 * 1024  # 50 MB
    copied = 0
    while True:
        chunk = response.read(65536)
        if not chunk:
            break
        copied += len(chunk)
        if copied > MAX_DOWNLOAD_BYTES:
            raise Exception(f"Download exceeds {MAX_DOWNLOAD_BYTES} bytes")
        output.write(chunk)
  • Practical risk: Low -- mirrors are hardcoded reputable GNU mirrors and there's a 120s timeout. But defense-in-depth is good practice for CI infrastructure.
M2. Inconsistent SHA256 requirement between main() and prepare_download() -- getdeps_fallback_mirror.py:165,97
  • Issue: main() only checks if not info or not info["url"] (line 165), allowing packages with empty sha256 through. prepare_download() separately checks if not expected_sha256 (line 97) and returns False. This means a package with a URL but no SHA256 will increment checked but never ready, producing a misleading ready/checked count.
  • Root cause: Validation split across two functions.
  • Suggested fix: Either check sha256 in main() before incrementing checked, or have prepare_download() not count as "checked" when sha256 is missing.
M3. os.path.getsize() after failed SHA256 could raise if download was cleaned up -- getdeps_fallback_mirror.py:124-125
  • Issue: After download_url() succeeds, the code calls sha256_file(filepath) then os.path.getsize(filepath). If sha256_file() returns None (exception during hashing), os.path.getsize() could still succeed but actual_sha256 would be None, so the == check fails correctly. However, if sha256_file() returns None and the file was removed between the hash call and os.path.getsize(), the getsize() call would raise FileNotFoundError.
  • Root cause: Minor -- extremely unlikely race, but the flow is subtle.
  • Suggested fix: Consider storing sha256_file() result and checking for None before calling os.path.getsize().
M4. Race condition in shared cache directory on self-hosted runners -- getdeps_fallback_mirror.py:108-113
  • Issue: Multiple builds sharing /tmp/rocksdb-getdeps-cache could race on cache reads/writes. Process A validates and removes invalid cache; process B reads the cache path between validation and removal.
  • Root cause: No file locking on cache operations.
  • Practical risk: Low on GitHub Actions (isolated VMs) but medium on self-hosted runners or local parallel builds.
  • Suggested fix: Acceptable for now. Document the single-build assumption. If this becomes an issue, use fcntl.flock() or atomic cache writes.

🟢 LOW / NIT

L1. get_fallback_mirrors() returns [] for unknown URLs -- behavioral change
  • Issue: Old code returned [url] (try original URL), new code returns [] (skip entirely). This is correct for the current PACKAGES_TO_CHECK (all GNU), but if a non-GNU package is added to the tuple, it will be silently skipped.
  • Suggested fix: Add a brief comment on PACKAGES_TO_CHECK noting that entries must have corresponding MIRROR_FALLBACKS patterns.
L2. Duplicate directory creation with folly.mk
  • Issue: Both folly.mk (mkdir -p) and the script (os.makedirs(..., exist_ok=True)) create download/cache directories. Harmless but redundant.
  • Suggested fix: Fine as defensive programming -- the script should be self-contained.
L3. Missing comment on allow_no_value=True -- getdeps_fallback_mirror.py:52
  • Issue: The ConfigParser(allow_no_value=True, interpolation=None) is the key fix for parsing folly manifests that have bare package entries, but there's no comment explaining why these flags are needed.
  • Suggested fix: Add a one-line comment: e.g., # folly manifests contain bare keys (no =value) in some sections
L4. urllib.request vs wget behavioral differences
  • Issue: Switching from wget to urllib.request changes HTTP client behavior (redirect handling, proxy settings, User-Agent). The new code sets a custom User-Agent. urllib.request respects http_proxy/https_proxy env vars by default, similar to wget.
  • Practical risk: Very low -- this is pre-downloading from well-known mirrors in CI.
L5. Adding libiberty could be a separate change
  • Issue: Adding a new package to PACKAGES_TO_CHECK is logically separate from the bug fixes. Could be its own commit for clarity.
  • Suggested fix: Minor -- acceptable to bundle since it's a one-line change.

Cross-Component Analysis

Context Affected? Assessment
folly.mk cache restore (lines 126-128) Yes cp -n (no-clobber) won't overwrite files the script downloaded. Compatible.
folly.mk cache update (lines 134-139) Yes cp -n won't overwrite cache files the script wrote. Compatible.
getdeps.py fetch (line 132) Indirect Script pre-populates download dir; getdeps finds files by name. Filename format {package}-{filename} preserved. Compatible.
GitHub Actions CI Primary Isolated VMs, no concurrent access. All issues are mitigated.
Self-hosted runners Possible Shared /tmp could cause cache races (M4). Low practical risk.

Positive Observations

  • Atomic downloads via temp files (download_url()) is a significant reliability improvement -- partial downloads can no longer corrupt the download directory.
  • Invalid file cleanup directly fixes the root cause of the CI failure (stale empty file persisted across retries).
  • shutil.copy2 instead of subprocess.run(['cp', ...]) removes the cp binary dependency, improving portability.
  • urllib.request instead of wget removes an external tool dependency, making the script self-contained Python.
  • Good error messages with SHA256 values and file sizes aid debugging.
  • Graceful degradation -- individual package failures are warnings, not fatal errors.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant