chore: pin deps and security fixes#2124
Conversation
WalkthroughThis PR pins external dependencies and action versions to specific commits or SHA digests, updates GitHub Actions workflow permissions for release tasks, and replaces MD5 hashing with SHA-256 in file cache path generation. Changes
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
BenchstatBase: 📊 1 minor regression(s) (all within 5% threshold)
Full benchstat output |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/workflows/benchmark.yml (1)
32-32: Use the exact release tag instead of the moving major tag in the inline comment.The SHA pin to
3a2844b7e9c422d3c10d287c895573f7108da1b3is correct, but# v9refers to a moving major tag. Change the comment to# v9.0.0(the exact release corresponding to this commit) to make audits clear which release was reviewed.Also applies to: 141-141
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml at line 32, Update the inline comment on the actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 usages so it references the exact release tag instead of the moving major tag: replace the trailing comment text "# v9" with "# v9.0.0" for the occurrences associated with the actions/github-script SHA (the lines that contain "uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/release.yml:
- Around line 69-71: Remove the unnecessary OIDC permission by deleting the
"id-token: write" entry from the permissions block used by the docker job (the
block that currently contains "contents: read" and "id-token: write"); since
authentication is done via ECR_AWS_ACCESS_KEY / ECR_AWS_SECRET_ACCESS_KEY,
ensure only "contents: read" (or other minimal needed permissions) remain, and
also remove the same "id-token: write" entry in the duplicate permissions block
referenced later (the lines applying to the same docker job / release workflow).
In `@scrapers/file/file.go`:
- Line 60: The cache key is being computed from stripSecrets(uri) causing
different credentialed URIs to collide; change the hash input to the raw uri
(use uri, not stripSecrets(uri)) when computing hash (the line with hash :=
sha256.Sum256(...)) so tempDir is unique per exact URL, and keep
stripSecrets(uri) only where used for logging/debug output; ensure any
references to tempDir or getter.GetAny remain unchanged so writes stay isolated
per hashed raw URI.
---
Nitpick comments:
In @.github/workflows/benchmark.yml:
- Line 32: Update the inline comment on the
actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 usages so it
references the exact release tag instead of the moving major tag: replace the
trailing comment text "# v9" with "# v9.0.0" for the occurrences associated with
the actions/github-script SHA (the lines that contain "uses:
actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3").
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8380b1aa-ba4d-4bb5-8172-9c8704a96e58
📒 Files selected for processing (7)
.github/workflows/benchmark.yml.github/workflows/release.yml.github/workflows/test.ymlbuild/Dockerfilebuild/Dockerfile.debugscrapers/file/file.goscrapers/file/file_test.go
| permissions: | ||
| contents: read | ||
| id-token: write |
There was a problem hiding this comment.
Drop OIDC permission while using static AWS keys.
Line 71 grants id-token: write, but the AWS step still authenticates with ECR_AWS_ACCESS_KEY / ECR_AWS_SECRET_ACCESS_KEY. That gives every step in the docker job OIDC token access without using it. Remove it unless this job is switching to role-to-assume.
🔒 Proposed least-privilege fix
permissions:
contents: read
- id-token: writeAlso applies to: 104-109
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/release.yml around lines 69 - 71, Remove the unnecessary
OIDC permission by deleting the "id-token: write" entry from the permissions
block used by the docker job (the block that currently contains "contents: read"
and "id-token: write"); since authentication is done via ECR_AWS_ACCESS_KEY /
ECR_AWS_SECRET_ACCESS_KEY, ensure only "contents: read" (or other minimal needed
permissions) remain, and also remove the same "id-token: write" entry in the
duplicate permissions block referenced later (the lines applying to the same
docker job / release workflow).
| return uri | ||
| } | ||
| hash := md5.Sum([]byte(uri)) | ||
| hash := sha256.Sum256([]byte(stripSecrets(uri))) |
There was a problem hiding this comment.
Behavior change: cache key now collapses distinct credentials to the same path.
Hashing stripSecrets(uri) (instead of the raw uri) means two config entries pointing at the same host/path but with different embedded credentials now resolve to the same tempDir. Since getter.GetAny writes into that directory, concurrent or sequential scrapes for different credentialed URLs can share/overwrite each other's cached content. If that collision is intentional (dedup identical sources), consider calling it out; otherwise hash the raw uri and only use stripSecrets for logging.
Also worth noting: existing on-disk caches under the old MD5-suffixed directories will be orphaned after this change (harmless, but may warrant a one-time cleanup or a release note).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scrapers/file/file.go` at line 60, The cache key is being computed from
stripSecrets(uri) causing different credentialed URIs to collide; change the
hash input to the raw uri (use uri, not stripSecrets(uri)) when computing hash
(the line with hash := sha256.Sum256(...)) so tempDir is unique per exact URL,
and keep stripSecrets(uri) only where used for logging/debug output; ensure any
references to tempDir or getter.GetAny remain unchanged so writes stay isolated
per hashed raw URI.
Summary by CodeRabbit