Commit 4c3b7db (parent 9bf6533)

Add 50 benchmarks for performance-critical CLI operations with CI integration (#3778)

11 files changed: 1535 additions, 2 deletions

.github/workflows/ci.yml

Lines changed: 30 additions & 0 deletions

```diff
@@ -84,6 +84,36 @@ jobs:
         run: cd pkg/workflow/js && npm ci
       - name: Run tests
         run: cd pkg/workflow/js && npm test
+  bench:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    concurrency:
+      group: ${{ github.workflow }}-${{ github.ref }}-bench
+      cancel-in-progress: true
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v5
+
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+          cache: true
+
+      - name: Verify dependencies
+        run: go mod verify
+
+      - name: Run benchmarks
+        run: make bench
+
+      - name: Save benchmark results
+        uses: actions/upload-artifact@v4
+        with:
+          name: benchmark-results
+          path: bench_results.txt
+          if-no-files-found: ignore
+
   lint:
     runs-on: ubuntu-latest
     permissions:
```

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -51,6 +51,9 @@ coverage.html
 coverage/
 logs/
 
+# Benchmark results
+bench_results.txt
+
 node_modules/
 gh-aw-test/
```

Makefile

Lines changed: 13 additions & 0 deletions

```diff
@@ -58,6 +58,19 @@ test-perf:
 	rm -f /tmp/gh-aw/test-output.log; \
 	exit $$EXIT_CODE
 
+# Run benchmarks for performance testing
+.PHONY: bench
+bench:
+	@echo "Running benchmarks..."
+	go test -bench=. -benchmem -benchtime=3x -run=^$$ ./pkg/... | tee bench_results.txt
+
+# Run benchmarks with comparison output
+.PHONY: bench-compare
+bench-compare:
+	@echo "Running benchmarks and saving results..."
+	go test -bench=. -benchmem -benchtime=100x -run=^$$ ./pkg/... | tee bench_results.txt
+	@echo "Benchmark results saved to bench_results.txt"
+
 # Test JavaScript files
 .PHONY: test-js
 test-js: build-js
```

TESTING.md

Lines changed: 56 additions & 1 deletion

````diff
@@ -10,6 +10,60 @@ The testing framework implements **Phase 6 (Quality Assurance)** of the Go reimp
 
 ### 1. Unit Tests (`pkg/*/`)
 
+### 2. Benchmarks (`pkg/*/*_benchmark_test.go`)
+
+Performance benchmarks measure the speed of critical operations. Run benchmarks to:
+- Detect performance regressions
+- Identify optimization opportunities
+- Track performance trends over time
+
+**Running Benchmarks:**
+```bash
+# Run all benchmarks with make (optimized for CI, runs in ~6 seconds)
+make bench
+
+# Run all benchmarks manually
+go test -bench=. -benchtime=3x -run=^$ ./pkg/...
+
+# Run benchmarks with more iterations for comparison
+make bench-compare
+
+# Run benchmarks for a specific package
+go test -bench=. -benchtime=3x -run=^$ ./pkg/workflow/
+
+# Run a specific benchmark
+go test -bench=BenchmarkCompileWorkflow -benchtime=3x -run=^$ ./pkg/workflow/
+
+# Run with custom iterations (the default -benchtime is 1 second per benchmark)
+go test -bench=. -benchtime=100x -run=^$ ./pkg/workflow/
+
+# Run with memory profiling
+go test -bench=. -benchmem -benchtime=3x -run=^$ ./pkg/...
+
+# Compare benchmark results over time
+go test -bench=. -benchtime=3x -run=^$ ./pkg/... > bench_baseline.txt
+# ... make changes ...
+go test -bench=. -benchtime=3x -run=^$ ./pkg/... > bench_new.txt
+benchstat bench_baseline.txt bench_new.txt
+```
+
+**Note**: Benchmarks use `-benchtime=3x` (3 iterations) for fast CI execution. For more accurate measurements, use `-benchtime=100x` or longer durations.
+
+**Benchmark Coverage:**
+- **Workflow Compilation**: Basic, with MCP, with imports, with validation, complex workflows
+- **Frontmatter Parsing**: Simple, complex, minimal, with arrays, schema validation
+- **Expression Validation**: Single expressions, complex expressions, full markdown validation, parsing
+- **Log Processing**: Claude, Copilot, and Codex log parsing, aggregation, JSON metrics extraction
+- **MCP Configuration**: Playwright config, Docker args, expression extraction
+- **Tool Processing**: Simple and complex tool configurations, safe outputs, network permissions
+
+**Performance Baselines** (approximate, machine-dependent):
+- Workflow compilation: ~100μs - 2ms depending on complexity
+- Frontmatter parsing: ~10μs - 250μs depending on complexity
+- Expression validation: ~700ns - 10μs per expression
+- Log parsing: ~50μs - 1ms depending on log size
+- Schema validation: ~35μs - 130μs depending on complexity
+
 ### 3. Test Validation Framework (`test_validation.go`)
 
 Comprehensive validation system that ensures:
@@ -73,6 +127,7 @@ As the Go implementation develops:
 - CLI interface structure and stability
 - Basic workflow compilation interface
 - Error handling for malformed inputs
+- **Performance benchmarks** for critical operations (62+ benchmarks)
 
 ### 🔄 Interface Testing (Ready for Implementation)
 - CLI command execution (stubs tested)
@@ -81,7 +136,7 @@ As the Go implementation develops:
 
 ### 📋 Ready for Enhancement
 - Bash-Go output comparison (when compiler is complete)
-- Performance benchmarking
+- **Performance regression tracking** (baseline established)
 - Cross-platform compatibility testing
 - Real workflow execution testing
````

pkg/cli/logs_benchmark_test.go

Lines changed: 231 additions & 0 deletions (new file)

```go
package cli

import (
	"testing"

	"github.com/githubnext/gh-aw/pkg/workflow"
)

// Sample log content for benchmarking
const (
	sampleClaudeLog = `[{"type":"session_created","timestamp":"2024-01-15T10:00:00.000Z"}]
[{"type":"message","timestamp":"2024-01-15T10:00:01.000Z","message":"Starting analysis"}]
[{"type":"tool_use","timestamp":"2024-01-15T10:00:02.000Z","tool":"github.get_issue"}]
[{"type":"tool_result","timestamp":"2024-01-15T10:00:03.000Z"}]
[{"type":"usage","timestamp":"2024-01-15T10:00:04.000Z","input_tokens":1000,"output_tokens":500}]
[{"type":"message","timestamp":"2024-01-15T10:00:05.000Z","message":"Analysis complete"}]
[{"type":"result","timestamp":"2024-01-15T10:00:06.000Z","total_input_tokens":1000,"total_output_tokens":500,"cost":0.015}]`

	sampleCopilotLog = `2024-01-15T10:00:00.123Z [INFO] Copilot started
2024-01-15T10:00:01.456Z [INFO] Processing request
2024-01-15T10:00:02.789Z [DEBUG] Tool call: github.get_issue
2024-01-15T10:00:03.012Z [DEBUG] Tool result received
2024-01-15T10:00:04.345Z [INFO] Token usage: 1500 total
2024-01-15T10:00:05.678Z [ERROR] Minor issue detected
2024-01-15T10:00:06.901Z [INFO] Request completed`

	sampleCodexLog = `] tool github.search_issues(...)
tool result: [{"id": 123, "title": "Issue 1"}]
] exec ls -la in /tmp
exec result: total 8
] tool github.get_issue(...)
tool result: {"id": 123, "body": "Issue content"}
] success in 2.5s`

	largeClaudeLog = sampleClaudeLog + "\n" + sampleClaudeLog + "\n" + sampleClaudeLog + "\n" + sampleClaudeLog + "\n" + sampleClaudeLog

	largeCopilotLog = sampleCopilotLog + "\n" + sampleCopilotLog + "\n" + sampleCopilotLog + "\n" + sampleCopilotLog + "\n" + sampleCopilotLog
)

// BenchmarkParseClaudeLog benchmarks Claude log parsing
func BenchmarkParseClaudeLog(b *testing.B) {
	engine := &workflow.ClaudeEngine{}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = engine.ParseLogMetrics(sampleClaudeLog, false)
	}
}

// BenchmarkParseClaudeLog_Large benchmarks parsing a large Claude log file
func BenchmarkParseClaudeLog_Large(b *testing.B) {
	engine := &workflow.ClaudeEngine{}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = engine.ParseLogMetrics(largeClaudeLog, false)
	}
}

// BenchmarkParseCopilotLog benchmarks Copilot log parsing
func BenchmarkParseCopilotLog(b *testing.B) {
	engine := &workflow.CopilotEngine{}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = engine.ParseLogMetrics(sampleCopilotLog, false)
	}
}

// BenchmarkParseCopilotLog_Large benchmarks parsing a large Copilot log file
func BenchmarkParseCopilotLog_Large(b *testing.B) {
	engine := &workflow.CopilotEngine{}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = engine.ParseLogMetrics(largeCopilotLog, false)
	}
}

// BenchmarkParseCodexLog benchmarks Codex log parsing
func BenchmarkParseCodexLog(b *testing.B) {
	engine := &workflow.CodexEngine{}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = engine.ParseLogMetrics(sampleCodexLog, false)
	}
}

// BenchmarkParseCodexLog_WithErrors benchmarks Codex log parsing with errors
func BenchmarkParseCodexLog_WithErrors(b *testing.B) {
	logWithErrors := sampleCodexLog + `
] error: connection timeout
] warning: retry attempt
] error: max retries exceeded
] tool github.get_repository(...)
] success in 1.2s`

	engine := &workflow.CodexEngine{}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = engine.ParseLogMetrics(logWithErrors, false)
	}
}

// BenchmarkAggregateWorkflowStats benchmarks log aggregation across multiple runs
func BenchmarkAggregateWorkflowStats(b *testing.B) {
	// Create sample workflow runs
	runs := []WorkflowRun{
		{
			DatabaseID:    12345,
			WorkflowName:  "test-workflow-1",
			Status:        "completed",
			Conclusion:    "success",
			TokenUsage:    1500,
			EstimatedCost: 0.015,
			Turns:         3,
			ErrorCount:    0,
			WarningCount:  1,
		},
		{
			DatabaseID:    12346,
			WorkflowName:  "test-workflow-2",
			Status:        "completed",
			Conclusion:    "failure",
			TokenUsage:    2500,
			EstimatedCost: 0.025,
			Turns:         5,
			ErrorCount:    2,
			WarningCount:  3,
		},
		{
			DatabaseID:    12347,
			WorkflowName:  "test-workflow-1",
			Status:        "completed",
			Conclusion:    "success",
			TokenUsage:    1800,
			EstimatedCost: 0.018,
			Turns:         4,
			ErrorCount:    0,
			WarningCount:  0,
		},
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Simulate aggregation logic
		totalTokens := 0
		totalCost := 0.0
		totalTurns := 0
		totalErrors := 0
		totalWarnings := 0

		for _, run := range runs {
			totalTokens += run.TokenUsage
			totalCost += run.EstimatedCost
			totalTurns += run.Turns
			totalErrors += run.ErrorCount
			totalWarnings += run.WarningCount
		}

		_ = totalTokens
		_ = totalCost
		_ = totalTurns
		_ = totalErrors
		_ = totalWarnings
	}
}

// BenchmarkAggregateWorkflowStats_Large benchmarks aggregation with many runs
func BenchmarkAggregateWorkflowStats_Large(b *testing.B) {
	// Create 100 sample workflow runs
	runs := make([]WorkflowRun, 100)
	for i := 0; i < 100; i++ {
		runs[i] = WorkflowRun{
			DatabaseID:    int64(12345 + i),
			WorkflowName:  "test-workflow",
			Status:        "completed",
			Conclusion:    "success",
			TokenUsage:    1500 + i*10,
			EstimatedCost: 0.015 + float64(i)*0.001,
			Turns:         3 + i%5,
			ErrorCount:    i % 3,
			WarningCount:  i % 2,
		}
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		totalTokens := 0
		totalCost := 0.0
		totalTurns := 0
		totalErrors := 0
		totalWarnings := 0

		for _, run := range runs {
			totalTokens += run.TokenUsage
			totalCost += run.EstimatedCost
			totalTurns += run.Turns
			totalErrors += run.ErrorCount
			totalWarnings += run.WarningCount
		}

		_ = totalTokens
		_ = totalCost
		_ = totalTurns
		_ = totalErrors
		_ = totalWarnings
	}
}

// BenchmarkExtractJSONMetrics benchmarks JSON metrics extraction
func BenchmarkExtractJSONMetrics(b *testing.B) {
	jsonLine := `{"type":"usage","input_tokens":1000,"output_tokens":500,"cost":0.015}`

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = workflow.ExtractJSONMetrics(jsonLine, false)
	}
}

// BenchmarkExtractJSONMetrics_Complex benchmarks complex JSON metrics extraction
func BenchmarkExtractJSONMetrics_Complex(b *testing.B) {
	jsonLine := `{"type":"result","total_input_tokens":5000,"total_output_tokens":2500,"cost":0.075,"metadata":{"tool_calls":["github.get_issue","github.add_comment"],"duration_ms":1500}}`

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = workflow.ExtractJSONMetrics(jsonLine, false)
	}
}
```
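The two aggregation benchmarks time an inline summation loop. Factored out as a plain function (with hypothetical names and a trimmed-down stand-in for `WorkflowRun`, neither part of the package), the logic being measured is a single pass over the runs:

```go
package main

import "fmt"

// RunStats is a hypothetical, trimmed-down stand-in for the WorkflowRun
// fields that the aggregation benchmarks sum over.
type RunStats struct {
	TokenUsage              int
	EstimatedCost           float64
	Turns, Errors, Warnings int
}

// aggregate accumulates totals across runs in one pass, mirroring the
// inline loop in BenchmarkAggregateWorkflowStats.
func aggregate(runs []RunStats) (tokens int, cost float64, turns, errs, warns int) {
	for _, r := range runs {
		tokens += r.TokenUsage
		cost += r.EstimatedCost
		turns += r.Turns
		errs += r.Errors
		warns += r.Warnings
	}
	return
}

func main() {
	runs := []RunStats{
		{TokenUsage: 1500, EstimatedCost: 0.015, Turns: 3, Warnings: 1},
		{TokenUsage: 2500, EstimatedCost: 0.025, Turns: 5, Errors: 2, Warnings: 3},
	}
	tokens, cost, turns, errs, warns := aggregate(runs)
	fmt.Println(tokens, cost, turns, errs, warns)
	// prints: 4000 0.04 8 2 4
}
```

Because the loop is O(n) with no allocations, the `_Large` variant with 100 runs mostly measures per-element addition cost.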
