Commit 6d2e49e

feat: support archive input for opcode_source (#204)
## Summary

Adds an optional `archive` field to `tests.source.opcode_source` alongside the existing `file`. When set, benchmarkoor downloads the archive (local path, HTTP(S) URL, or GitHub Actions artifact URL), extracts it, and looks up the configured `file` inside the extracted tree by basename.

Supported archive formats:

- `.zip`
- `.tar.gz`
- GitHub Actions artifact zips containing an inner `.tar.gz` (auto-unpacked)

Example config:

```yaml
runner:
  benchmark:
    tests:
      opcode_source:
        archive: https://github.com/NethermindEth/gas-benchmarks/actions/runs/24460911828/artifacts/6456466898
        file: opcodes_tracing.json
```

### Reuse

- The archive download goes through the existing `fetchCached` pipeline, inheriting ETag / Last-Modified revalidation, the GitHub artifact URL rewrite with bearer-token auth (`runner.github_token`), and the per-URL on-disk cache.
- The extracted directory is cached separately (keyed on `sha256(archivePath)[:16]`) and refreshed only when `fetchCached` reports the archive was re-downloaded.
- Reuses `detectArchiveFormat`, `extractZipFile`, `extractTarGzFile`, `extractInnerTarballs`.

### Docs

- `docs/configuration.md`: new subsections, and the options table is updated with the archive mode.
- `config.example.yaml`: commented archive-mode example.

## Test plan

- [x] `go vet` + `go test ./pkg/executor/ ./pkg/config/` pass
- [x] Manual: configure `opcode_source.archive` with the GitHub Actions artifact URL from the example, verify the opcode JSON is extracted and opcodes appear on the heatmap
- [x] Manual: configure `opcode_source.archive` with a local `.tar.gz`, verify the same
- [x] Manual: existing `opcode_source.file` (direct JSON URL) still works unchanged
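The extraction cache key mentioned under Reuse can be sketched roughly as below. This is an illustration rather than the committed code: `extractDirFor` is a hypothetical helper name, while the `opcode-archive-extract-` prefix and the "first 8 bytes of the SHA-256, hex-encoded" rule are taken from the `pkg/executor/opcode.go` diff in this commit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// extractDirFor is a hypothetical helper that derives a stable extraction
// directory from the archive's local path. The key hashes the path string,
// not the archive contents.
func extractDirFor(cacheDir, archivePath string) string {
	sum := sha256.Sum256([]byte(archivePath))
	// 8 bytes of digest become 16 hex characters.
	key := hex.EncodeToString(sum[:8])

	return filepath.Join(cacheDir, "opcode-archive-extract-"+key)
}

func main() {
	// Example paths are placeholders for illustration only.
	fmt.Println(extractDirFor("/tmp/cache", "/tmp/cache/artifact.zip"))
}
```

Since the key depends only on the path, a local archive replaced in place at the same path maps to the same directory. As far as the diff shows, local paths always report `changed = false`, so in that case the earlier extraction would likely be reused until the directory is removed.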
1 parent f94c75d commit 6d2e49e

4 files changed

Lines changed: 238 additions & 10 deletions

config.example.yaml

Lines changed: 8 additions & 1 deletion
@@ -205,9 +205,16 @@ runner:
 #
 # # Optional: External opcode metadata for the test suite.
 # # A JSON file mapping test names to opcode counts: {"test_name": {"OPCODE": count, ...}}
-# # Can be a local path or a URL (including GitHub Actions artifact URLs).
+# # Two modes:
+# # 1. Direct JSON file (local path or URL):
 # # opcode_source:
 # #   file: opcodes_tracing.json
+# #
+# # 2. Archive (.zip / .tar.gz / GitHub Actions artifact) containing the JSON file.
+# # `file` is the filename to look up inside the extracted archive.
+# # opcode_source:
+# #   archive: https://github.com/NethermindEth/gas-benchmarks/actions/runs/24460911828/artifacts/6456466898
+# #   file: opcodes_tracing.json
 
 # Optional: API server for authentication and user management.
 # When configured, the UI can integrate with the API for login, admin, and role-based access.

docs/configuration.md

Lines changed: 19 additions & 3 deletions
@@ -377,7 +377,9 @@ tests:
 
 ##### Opcode Source
 
-Optional external opcode metadata can be configured alongside the test source:
+Optional external opcode metadata can be configured alongside the test source. Two modes are supported.
+
+**Direct JSON file** — `file` is a local path or URL to the JSON file:
 
 ```yaml
 runner:
@@ -387,13 +389,27 @@ runner:
         file: opcodes_tracing.json # Local path or URL to a JSON file
 ```
 
+**Archive mode** — `archive` is a `.zip` / `.tar.gz` (or a GitHub Actions artifact URL) that contains the JSON file; `file` is the filename to look up inside the extracted archive:
+
+```yaml
+runner:
+  benchmark:
+    tests:
+      opcode_source:
+        archive: https://github.com/NethermindEth/gas-benchmarks/actions/runs/24460911828/artifacts/6456466898
+        file: opcodes_tracing.json # Filename inside the archive
+```
+
+`archive` can also be a plain URL to a `.zip` / `.tar.gz`, or a local path to one. When `archive` is set, `file` is interpreted as a filename inside the extracted tree (matched by basename, so nested folders are walked automatically).
+
 | Option | Type | Required | Description |
 |--------|------|----------|-------------|
-| `file` | string | Yes | Local path or URL to a JSON file mapping test names to opcode counts: `{"test_name": {"OPCODE": count, ...}}` |
+| `file` | string | Yes | When `archive` is unset: local path or URL to the JSON file. When `archive` is set: filename to look up inside the extracted archive |
+| `archive` | string | No | Optional local path or URL to a `.zip` / `.tar.gz` / GitHub Actions artifact containing the opcode JSON file. When set, `file` names the entry inside the archive |
 
 **GitHub Actions artifacts:** Browser URLs like `https://github.com/{owner}/{repo}/actions/runs/{run_id}/artifacts/{artifact_id}` are automatically converted to the GitHub API download endpoint. A GitHub token is required for artifact downloads (set via `runner.github_token` or `BENCHMARKOOR_RUNNER_GITHUB_TOKEN`).
 
-**Archive extraction:** ZIP archives are extracted and any inner tarballs (common in GitHub Actions artifacts) are automatically extracted as well.
+**Archive extraction:** ZIP archives are extracted and any inner tarballs (common in GitHub Actions artifacts) are automatically extracted as well. Both direct-file and archive downloads are cache-validated on each run via HTTP `ETag` / `Last-Modified` — the archive (and its extraction) is refreshed automatically when the origin changes.
 
 ##### EEST Fixtures Source
 
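The browser-URL to API-URL conversion described in the documentation above might look roughly like this. The regular expression is an assumption, since `ghArtifactURLPattern` is referenced but not defined in this commit; `toAPIDownloadURL` is a hypothetical name. The target format string matches the one used in `resolveArchiveRef`.

```go
package main

import (
	"fmt"
	"regexp"
)

// artifactURL is an assumed pattern for GitHub Actions artifact browser URLs.
// Group 1 captures "owner/repo", group 2 captures the artifact ID.
var artifactURL = regexp.MustCompile(
	`^https://github\.com/([^/]+/[^/]+)/actions/runs/\d+/artifacts/(\d+)$`,
)

// toAPIDownloadURL rewrites a browser URL to the REST API download endpoint.
// The second return value is false when the input is not an artifact URL.
func toAPIDownloadURL(browserURL string) (string, bool) {
	m := artifactURL.FindStringSubmatch(browserURL)
	if m == nil {
		return "", false
	}

	return fmt.Sprintf(
		"https://api.github.com/repos/%s/actions/artifacts/%s/zip",
		m[1], m[2],
	), true
}

func main() {
	u, ok := toAPIDownloadURL(
		"https://github.com/NethermindEth/gas-benchmarks/actions/runs/24460911828/artifacts/6456466898",
	)
	fmt.Println(u, ok)
}
```

Note that in the committed `resolveArchiveRef` the rewrite is applied only when a GitHub token is configured; without one, the browser URL is passed to `fetchCached` unchanged.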

pkg/config/config.go

Lines changed: 8 additions & 1 deletion
@@ -412,8 +412,15 @@ type ArchiveSourceConfig struct {
 
 // OpcodeSourceConfig defines an external opcode metadata file.
 // The file is a JSON map of test name → opcode → count.
+//
+// Two modes are supported:
+//   - Direct file: set File to a local path or URL pointing at the JSON file.
+//   - Archive: set Archive to a .zip / .tar.gz (or GitHub Actions artifact URL).
+//     The archive is downloaded + extracted, and File is then the filename
+//     to look up inside the extracted tree.
 type OpcodeSourceConfig struct {
-	File string `yaml:"file" mapstructure:"file"` // Local path or URL to a JSON file.
+	File    string `yaml:"file" mapstructure:"file"`                 // JSON file path — inside the archive when Archive is set, otherwise a local path or URL.
+	Archive string `yaml:"archive,omitempty" mapstructure:"archive"` // Optional local path or URL to a .zip / .tar.gz archive containing File.
 }
 
 // StepsConfig defines glob patterns for each step type.
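How the two fields select a mode can be sketched as below. This is a simplified stand-in, not the project's type: `opcodeSource` and `mode` are hypothetical names, and rejecting a config with both fields empty is an assumption (the commit only shows the check that `file` is required when `archive` is set).

```go
package main

import (
	"errors"
	"fmt"
)

// opcodeSource is a simplified stand-in for OpcodeSourceConfig.
type opcodeSource struct {
	File    string
	Archive string
}

// mode reports which resolution path a config would take.
func (c opcodeSource) mode() (string, error) {
	switch {
	case c.Archive != "" && c.File == "":
		// Mirrors the check added to loadOpcodes in this commit.
		return "", errors.New("opcode_source.file is required when opcode_source.archive is set")
	case c.Archive != "":
		return "archive", nil
	case c.File != "":
		return "direct", nil
	default:
		// Assumption: an entirely empty source is treated as an error here.
		return "", errors.New("opcode_source.file is empty")
	}
}

func main() {
	for _, c := range []opcodeSource{
		{File: "opcodes_tracing.json"},
		{File: "opcodes_tracing.json", Archive: "artifact.zip"},
		{Archive: "artifact.zip"},
	} {
		m, err := c.mode()
		fmt.Printf("%+v -> mode=%q err=%v\n", c, m, err)
	}
}
```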

pkg/executor/opcode.go

Lines changed: 203 additions & 5 deletions
@@ -2,6 +2,8 @@ package executor
 
 import (
 	"context"
+	"crypto/sha256"
+	"encoding/hex"
 	"encoding/json"
 	"fmt"
 	"os"
@@ -11,12 +13,28 @@ import (
 	"github.com/sirupsen/logrus"
 )
 
-// loadOpcodes resolves the opcode source file (local path or URL),
-// reads the JSON map, and attaches opcode counts to prepared tests.
+// loadOpcodes resolves the opcode source file (local path, URL, or a file
+// inside an archive), reads the JSON map, and attaches opcode counts to
+// prepared tests.
 func (e *executor) loadOpcodes(ctx context.Context) error {
 	file := e.cfg.OpcodeSource.File
+	archive := e.cfg.OpcodeSource.Archive
+
+	var (
+		resolved string
+		err      error
+	)
+
+	if archive != "" {
+		if file == "" {
+			return fmt.Errorf("opcode_source.file (filename inside the archive) is required when opcode_source.archive is set")
+		}
+
+		resolved, err = resolveOpcodeFromArchive(ctx, archive, file, e.cfg.CacheDir, e.cfg.GitHubToken, e.log)
+	} else {
+		resolved, err = resolveFile(ctx, file, e.cfg.CacheDir, e.cfg.GitHubToken, e.log)
+	}
 
-	resolved, err := resolveFile(ctx, file, e.cfg.CacheDir, e.cfg.GitHubToken, e.log)
 	if err != nil {
 		return fmt.Errorf("resolving opcode file: %w", err)
 	}
@@ -61,12 +79,17 @@ func (e *executor) loadOpcodes(ctx context.Context) error {
 		}
 	}
 
-	e.log.WithFields(logrus.Fields{
+	logFields := logrus.Fields{
 		"file":          file,
 		"total_entries": filtered,
 		"matched_tests": matched,
 		"total_tests":   len(e.prepared.Tests),
-	}).Info("Loaded external opcode data")
+	}
+	if archive != "" {
+		logFields["archive"] = archive
+	}
+
+	e.log.WithFields(logFields).Info("Loaded external opcode data")
 
 	if unmatched := filtered - matched; unmatched > 0 {
 		e.log.WithField("unmatched", unmatched).Warn(
@@ -120,3 +143,178 @@ func resolveFile(ctx context.Context, file, cacheDir, githubToken string, log lo
 
 	return file, nil
 }
+
+// resolveOpcodeFromArchive downloads (with cache validation) an archive
+// containing the opcode JSON file and returns the path to the extracted
+// file within it. Supports local paths, HTTP(S) URLs, and GitHub Actions
+// artifact browser URLs.
+func resolveOpcodeFromArchive(
+	ctx context.Context,
+	archive, file, cacheDir, githubToken string,
+	log logrus.FieldLogger,
+) (string, error) {
+	// Step 1 — get a local path to the archive.
+	archivePath, changed, err := resolveArchiveRef(ctx, archive, cacheDir, githubToken, log)
+	if err != nil {
+		return "", fmt.Errorf("resolving opcode archive: %w", err)
+	}
+
+	// Step 2 — extract into a stable, per-archive cache dir. Re-extract
+	// whenever the underlying archive was (re-)downloaded.
+	extractDir, err := extractOpcodeArchive(archivePath, cacheDir, changed, log)
+	if err != nil {
+		return "", fmt.Errorf("extracting opcode archive: %w", err)
+	}
+
+	// Step 3 — try the requested file as an exact relative path under the
+	// extraction root, then fall back to a basename match.
+	found, err := findFileInDir(extractDir, file)
+	if err != nil {
+		return "", fmt.Errorf("opcode file %q not found in archive: %w", file, err)
+	}
+
+	return found, nil
+}
+
+// resolveArchiveRef resolves an archive reference (local path or URL) to a
+// local file path. The `changed` return value is true when the call caused
+// a fresh download.
+func resolveArchiveRef(
+	ctx context.Context,
+	ref, cacheDir, githubToken string,
+	log logrus.FieldLogger,
+) (string, bool, error) {
+	if strings.HasPrefix(ref, "http://") || strings.HasPrefix(ref, "https://") {
+		downloadURL := ref
+
+		var token string
+
+		if ghArtifactURLPattern.MatchString(ref) && githubToken != "" {
+			m := ghArtifactURLPattern.FindStringSubmatch(ref)
+			downloadURL = fmt.Sprintf(
+				"https://api.github.com/repos/%s/actions/artifacts/%s/zip",
+				m[1], m[2],
+			)
+			token = githubToken
+		}
+
+		res, err := fetchCached(ctx, log, ref, downloadURL, token, cacheDir, "opcode-archive")
+		if err != nil {
+			return "", false, err
+		}
+
+		return res.Path, res.Changed, nil
+	}
+
+	// Local file path.
+	if !filepath.IsAbs(ref) {
+		absPath, err := filepath.Abs(ref)
+		if err != nil {
+			return "", false, fmt.Errorf("resolving path %q: %w", ref, err)
+		}
+
+		ref = absPath
+	}
+
+	if _, err := os.Stat(ref); os.IsNotExist(err) {
+		return "", false, fmt.Errorf("archive %q does not exist", ref)
+	}
+
+	return ref, false, nil
+}
+
+// extractOpcodeArchive extracts archivePath into a stable cache directory
+// keyed by sha256(archivePath). The extraction is reused on subsequent
+// runs unless `forceRefresh` is true (set when fetchCached reported the
+// archive was re-downloaded). GitHub Actions artifact zips that contain
+// an inner tarball are auto-extracted as well.
+func extractOpcodeArchive(archivePath, cacheDir string, forceRefresh bool, log logrus.FieldLogger) (string, error) {
+	if cacheDir == "" {
+		cacheDir = os.TempDir()
+	}
+
+	hash := sha256.Sum256([]byte(archivePath))
+	extractDir := filepath.Join(cacheDir, "opcode-archive-extract-"+hex.EncodeToString(hash[:8]))
+
+	if forceRefresh {
+		_ = os.RemoveAll(extractDir)
+	}
+
+	if _, err := os.Stat(extractDir); err == nil {
+		log.WithField("path", extractDir).Info("Using cached opcode archive extraction")
+		return extractDir, nil
+	}
+
+	if err := os.MkdirAll(extractDir, 0755); err != nil {
+		return "", fmt.Errorf("creating extraction directory: %w", err)
+	}
+
+	format, err := detectArchiveFormat(archivePath)
+	if err != nil {
+		return "", err
+	}
+
+	switch format {
+	case archiveFormatZip:
+		if err := extractZipFile(archivePath, extractDir); err != nil {
+			return "", fmt.Errorf("extracting zip: %w", err)
+		}
+		// GitHub Actions artifacts are zips containing an inner tar.gz;
+		// unpack those too so findFileInDir can see the actual JSON.
+		if err := extractInnerTarballs(extractDir, log); err != nil {
+			return "", fmt.Errorf("extracting inner tarballs: %w", err)
+		}
+	case archiveFormatTarGz:
+		if err := extractTarGzFile(archivePath, extractDir); err != nil {
+			return "", fmt.Errorf("extracting tar.gz: %w", err)
+		}
+	default:
+		return "", fmt.Errorf("unsupported archive format: %s", format)
+	}
+
+	log.WithField("path", extractDir).Info("Extracted opcode archive")
+
+	return extractDir, nil
+}
280+
// findFileInDir walks dir recursively and returns the absolute path of the
281+
// first file whose basename matches name. If name contains a path
282+
// separator, it is treated as a relative path from dir and matched exactly.
283+
func findFileInDir(dir, name string) (string, error) {
284+
// Exact relative match first.
285+
if strings.ContainsAny(name, "/\\") {
286+
candidate := filepath.Join(dir, name)
287+
if _, err := os.Stat(candidate); err == nil {
288+
return candidate, nil
289+
}
290+
}
291+
292+
var found string
293+
294+
err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
295+
if err != nil {
296+
return nil
297+
}
298+
299+
if info.IsDir() {
300+
return nil
301+
}
302+
303+
if filepath.Base(path) == filepath.Base(name) {
304+
found = path
305+
306+
return filepath.SkipAll
307+
}
308+
309+
return nil
310+
})
311+
if err != nil {
312+
return "", err
313+
}
314+
315+
if found == "" {
316+
return "", fmt.Errorf("%q not found under %q", name, dir)
317+
}
318+
319+
return found, nil
320+
}
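
The lookup order used by `findFileInDir` (exact relative path when the name contains a separator, then the first basename match in walk order) can be tried in isolation with the sketch below. It is a standalone approximation, not the committed function: `findByName` and `makeTree` are hypothetical names, and it uses `filepath.WalkDir` where the commit uses `filepath.Walk`. `filepath.SkipAll` requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// makeTree creates a temp directory holding empty files at the given
// slash-separated relative paths. Demo helper only.
func makeTree(files ...string) (string, error) {
	root, err := os.MkdirTemp("", "opcode-demo-")
	if err != nil {
		return "", err
	}
	for _, f := range files {
		p := filepath.Join(root, filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(p, nil, 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

// findByName tries name as an exact relative path under root when it contains
// a separator, then falls back to the first file whose basename matches.
func findByName(root, name string) (string, error) {
	if strings.ContainsAny(name, `/\`) {
		candidate := filepath.Join(root, name)
		if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
			return candidate, nil
		}
	}

	var found string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil // skip entries that cannot be read
		}
		if !d.IsDir() && d.Name() == filepath.Base(name) {
			found = path
			return filepath.SkipAll // stop at the first match
		}
		return nil
	})
	if err != nil {
		return "", err
	}
	if found == "" {
		return "", fmt.Errorf("%q not found under %q", name, root)
	}
	return found, nil
}

func main() {
	root, err := makeTree("artifact/inner/opcodes_tracing.json", "artifact/README.md")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(root)

	path, err := findByName(root, "opcodes_tracing.json")
	fmt.Println(path, err)
}
```

Because the walk visits entries in lexical order, an archive containing several files with the same basename resolves to the lexically first one; giving `file` as a relative path is one way to disambiguate.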
