Commit f1d8675

Merge pull request #3355 from geremyCohen/gcohen_go_gc
Measure Go GC behavior on AWS Graviton
2 parents 6f65a4e + 7054cee commit f1d8675

8 files changed

Lines changed: 946 additions & 0 deletions

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
---
2+
title: Measure Go GC behavior on AWS Graviton
3+
draft: true
4+
cascade:
5+
draft: true
6+
7+
description: Learn how to measure and observe Go garbage collection metrics on AWS Graviton instances.
8+
9+
minutes_to_complete: 75
10+
11+
who_is_this_for: This Learning Path is for engineers interested in learning more about Go garbage collection (GC) behavior on Arm.
12+
13+
learning_objectives:
14+
- Select an AWS Graviton instance for repeatable Go GC measurements
15+
- Install Go and Benchstat on an Arm Linux server
16+
- Run a Go benchmark that reports allocation, GC, and pause-time metrics
17+
- Capture CPU and heap profiles without changing GC behavior
18+
19+
prerequisites:
20+
- An [AWS account](https://aws.amazon.com/) with permission to launch AWS Graviton EC2 instances
21+
- The [AWS CLI](/install-guides/aws-cli/) installed and configured on your local machine
22+
- An AWS Graviton instance running Ubuntu 24.04 LTS or another Arm Linux distribution
23+
- Basic familiarity with Go benchmarks and Linux shell commands
24+
25+
author: Geremy Cohen
26+
27+
### Tags
28+
skilllevels: Introductory
29+
subjects: Performance and Architecture
30+
cloud_service_providers:
31+
- AWS
32+
armips:
33+
- Neoverse
34+
tools_software_languages:
35+
- AWS
36+
- Go
37+
operatingsystems:
38+
- Linux
39+
40+
further_reading:
41+
- resource:
42+
title: Amazon EC2 M8g instances
43+
link: https://aws.amazon.com/ec2/instance-types/m8g/
44+
type: documentation
45+
- resource:
46+
title: Go GC guide
47+
link: https://go.dev/doc/gc-guide
48+
type: documentation
49+
- resource:
50+
title: Go runtime package
51+
link: https://pkg.go.dev/runtime
52+
type: documentation
53+
- resource:
54+
title: Go testing package
55+
link: https://pkg.go.dev/testing
56+
type: documentation
57+
- resource:
58+
title: Graviton Performance Runbook
59+
link: https://github.com/aws/aws-graviton-getting-started/blob/main/perfrunbook/README.md
60+
type: documentation
61+
- resource:
62+
title: Benchmark Go performance with Sweet and Benchstat
63+
link: /learning-paths/servers-and-cloud-computing/go-benchmarking-with-sweet/
64+
type: learning path
65+
66+
### FIXED, DO NOT MODIFY
67+
# ================================================================================
68+
weight: 1 # _index.md always has weight of 1 to order correctly
69+
layout: "learningpathall" # All files under learning paths have this same wrapper
70+
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
71+
---
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
---
2+
# ================================================================================
3+
# FIXED, DO NOT MODIFY THIS FILE
4+
# ================================================================================
5+
weight: 21 # The weight controls the order of the pages. _index.md always has weight 1.
6+
title: "Next Steps" # Always the same, html page title.
7+
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
8+
---
9+
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
---
2+
title: Choose an AWS Graviton instance
3+
weight: 2
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
## What is garbage collection (GC)?
9+
Memory management is a critical aspect of application performance, and garbage collection (GC) plays a central role in automating it. The garbage collector continuously identifies objects that are no longer needed and frees their memory for reuse.
10+
11+
While this automation improves productivity and application safety, inefficient garbage collection can lead to increased CPU usage, longer response times, and unexpected application pauses.
12+
13+
Tracking GC metrics provides a window into an application's memory health, helping engineers optimize performance and ensure the system scales efficiently under load.
14+
15+
## Measuring default Go GC behavior on Arm servers
16+
17+
Go is one language that uses garbage collection. Because Go applications can spend meaningful time allocating memory and running garbage collection, it is important to understand how the Go runtime behaves under its default settings.
18+
19+
In this Learning Path, you'll run Go benchmarks on an AWS Graviton instance. The goal is to build a clean baseline, measuring operation time, allocation rate, GC frequency, and GC pause cost.
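As a rough illustration of what those four baseline numbers mean, the sketch below derives them from the Go runtime's own counters around a loop. The `measure` helper, the `Baseline` type, and the 1 KiB workload are hypothetical names chosen for this example, not part of the Learning Path's benchmark; only `runtime.ReadMemStats`, `runtime.MemStats`, and the `time` package come from the standard library.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Baseline holds the four per-operation values a GC baseline tracks.
// The type and its field names are illustrative assumptions.
type Baseline struct {
	NsPerOp      float64 // wall-clock time per operation
	BytesPerOp   float64 // heap bytes allocated per operation
	GCPerOp      float64 // completed GC cycles per operation
	PauseNsPerOp float64 // stop-the-world pause time per operation
}

// sink keeps each allocation reachable so it is placed on the heap.
var sink []byte

// measure runs work n times and derives the baseline from the
// difference between two runtime.MemStats snapshots.
func measure(n int, work func()) Baseline {
	if n <= 0 {
		return Baseline{}
	}
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	start := time.Now()
	for i := 0; i < n; i++ {
		work()
	}
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)

	ops := float64(n)
	return Baseline{
		NsPerOp:      float64(elapsed.Nanoseconds()) / ops,
		BytesPerOp:   float64(after.TotalAlloc-before.TotalAlloc) / ops,
		GCPerOp:      float64(after.NumGC-before.NumGC) / ops,
		PauseNsPerOp: float64(after.PauseTotalNs-before.PauseTotalNs) / ops,
	}
}

func main() {
	// Hypothetical workload: allocate 1 KiB on the heap per operation.
	b := measure(100000, func() { sink = make([]byte, 1024) })
	fmt.Printf("%.1f ns/op  %.0f B/op  %.5f gc/op  %.1f stw-ns/op\n",
		b.NsPerOp, b.BytesPerOp, b.GCPerOp, b.PauseNsPerOp)
}
```

This hand-rolled loop is only meant to show where the numbers come from. The Go `testing` package chooses the iteration count for you and handles timing, which is why the Learning Path uses a real benchmark function.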
20+
21+
## Selecting an instance for Go GC measurements
22+
23+
An AWS Graviton `m8g.xlarge` instance has enough CPU and memory to make Go runtime behavior visible while keeping costs low. It's a good starting point because it provides 4 vCPUs and 16 GiB of memory on AWS Graviton4. If you run this Learning Path on a different instance, make sure it has at least 4 vCPUs and 16 GiB of memory so the benchmark runs smoothly and produces meaningful GC metrics.
24+
25+
Avoid burstable `t4g` instances, because CPU credits can affect benchmark repeatability and make GC measurements harder to explain.
26+
27+
{{% notice Note %}}
28+
You can use larger instances, such as `m8g.2xlarge`, when you want more CPU width or more memory headroom. Start with `m8g.xlarge` so the first benchmark run is easy to reproduce and inexpensive.
29+
{{% /notice %}}
30+
31+
32+
## Checking instance availability
33+
34+
Use the AWS CLI to check whether `m8g.xlarge` is available in your selected Region.
35+
36+
Replace `us-east-1` with the Region you want to use.
37+
38+
```console
39+
aws ec2 describe-instance-type-offerings \
40+
--region us-east-1 \
41+
--location-type availability-zone \
42+
--filters Name=instance-type,Values=m8g.xlarge \
43+
--query 'InstanceTypeOfferings[].Location' \
44+
--output table
45+
```
46+
47+
If the command returns one or more Availability Zones, you can use `m8g.xlarge` in that Region. If `m8g.xlarge` is not available in your Region, you can try a different Region or fall back to an `m7g.xlarge` instance, which is based on the previous-generation AWS Graviton3:
48+
49+
```console
50+
aws ec2 describe-instance-type-offerings \
51+
--region us-east-1 \
52+
--location-type availability-zone \
53+
--filters Name=instance-type,Values=m7g.xlarge \
54+
--query 'InstanceTypeOfferings[].Location' \
55+
--output table
56+
```
57+
58+
After you choose an instance type, provision it with Ubuntu 24.04 LTS for Arm64. When the instance is running and you have connected to it over SSH, you can proceed to the next step.
Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
1+
---
2+
title: Create a Go GC benchmark
3+
weight: 4
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Creating a benchmark module
10+
11+
You'll first create a small Go benchmark module. The high-level flow is:
12+
13+
1. Generate a large input string.
14+
2. Repeatedly parse it and create new objects/strings.
15+
3. Force memory allocations so the garbage collector has work to do.
16+
4. Measure how long the workload takes.
17+
5. Measure how much GC activity occurred during the benchmark.
18+
6. Report both performance metrics and GC-related metrics.
19+
20+
Paste the commands below into your terminal to create the module and the benchmark file:
21+
22+
```bash
23+
24+
# Create the module directory and initialize it.
25+
26+
mkdir -p $HOME/go-gc-default/parsebench
27+
cd $HOME/go-gc-default
28+
go mod init example.com/go-gc-default
29+
30+
# Create the benchmark file:
31+
32+
cat > parsebench/parsebench_test.go <<'EOF'
33+
package parsebench
34+
35+
import (
36+
37+
"runtime"
38+
"strconv"
39+
"strings"
40+
"testing"
41+
42+
)
43+
44+
// Global variable used to store benchmark results.
45+
46+
var sink []string
47+
48+
func BenchmarkParseAndAllocate(b *testing.B) {
49+
50+
// Simulate a large payload by building a test string that repeats
51+
// the same key=value record many times.
52+
//
53+
// Example:
54+
// name=arm&runtime=go&gc=default&value=12345;
55+
//
56+
57+
payload := strings.Repeat("name=arm&runtime=go&gc=default&value=12345;",2048)
58+
59+
// Next, we tell the benchmark framework to track memory allocations.
60+
//
61+
// This will show metrics such as allocations per operation, and bytes allocated per operation
62+
63+
b.ReportAllocs()
64+
65+
// Capture runtime memory statistics before the benchmark starts. We will later compare these
66+
// values to see:
67+
// - how many garbage collections occurred
68+
// - how much pause time was spent in GC
69+
70+
var before runtime.MemStats
71+
runtime.ReadMemStats(&before)
72+
73+
// Reset benchmark timing so that any setup work performed above will not be included
74+
// in the benchmark measurements.
75+
76+
b.ResetTimer()
77+
78+
// The benchmark loop is where the actual work is done. The number of times this loop is
79+
// executed is controlled by the b.N variable. The value of b.N is automatically chosen by
80+
// the Go benchmark framework to obtain stable and statistically useful measurements.
81+
82+
// The reason for this design is that timing a single operation is often unreliable; running
83+
// it many times reduces noise from:
84+
// * OS scheduling
85+
// * CPU frequency changes
86+
// * background processes
87+
88+
for i := 0; i < b.N; i++ {
89+
// Split the large payload into individual records.
90+
// Example:
91+
// "a=1;b=2;c=3;" becomes: ["a=1", "b=2", "c=3", ""]
92+
parts := strings.Split(payload, ";")
93+
// Create a new slice to store parsed output. This allocation is intentional because we want
94+
// the benchmark to generate memory pressure and trigger garbage collection activity.
95+
96+
out := make([]string, 0, len(parts))
97+
98+
// Process each record.
99+
100+
for _, part := range parts {
101+
// Ignore the empty string created by the trailing semicolon.
102+
if part == "" {
103+
continue
104+
}
105+
// Split the string into key and value.
106+
107+
fields := strings.SplitN(part, "=", 2)
108+
109+
// Make sure both key and value exist.
110+
if len(fields) == 2 {
111+
// Build a new string containing: key:length_of_value
112+
// This creates additional allocations and string objects, increasing GC activity.
113+
out = append(out, fields[0]+":"+strconv.Itoa(len(fields[1])))
114+
}
115+
}
116+
// Save the result so the compiler cannot eliminate the work as unused.
117+
sink = out
118+
}
119+
// Stop benchmark timing.
120+
//
121+
// Everything below is measurement/reporting logic and should not affect benchmark performance results.
122+
b.StopTimer()
123+
124+
// Capture memory statistics after the benchmark completes.
125+
126+
var after runtime.MemStats
127+
runtime.ReadMemStats(&after)
128+
129+
// Number of benchmark operations executed.
130+
ops := float64(b.N)
131+
132+
// Total number of garbage collection cycles that occurred while the benchmark was running:
133+
134+
gcCycles := after.NumGC - before.NumGC
135+
136+
// Total "stop-the-world" pause time spent in GC. During these pauses, application execution
137+
// is temporarily halted while the runtime performs parts of garbage collection.
138+
139+
pauseNs := after.PauseTotalNs - before.PauseTotalNs
140+
141+
// Report GC events per benchmark operation. Example: 0.002 gc/op means one GC cycle
142+
// every 500 operations.
143+
144+
if ops > 0 {
145+
b.ReportMetric(float64(gcCycles)/ops, "gc/op")
146+
147+
// Report average GC pause time per operation.
148+
b.ReportMetric(float64(pauseNs)/ops, "stw-ns/op")
149+
}
150+
// If at least one GC occurred, report the average stop-the-world pause duration for each GC cycle.
151+
if gcCycles > 0 {
152+
b.ReportMetric(
153+
float64(pauseNs)/float64(gcCycles),
154+
"stw-ns/GC",
155+
)
156+
}
157+
158+
}
159+
EOF
160+
```
161+
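To see what one pass of the benchmark loop produces, the sketch below lifts the parsing logic into a standalone function and runs it on a tiny input. The function name `parseRecords` and the sample input are assumptions made for illustration; the parsing steps mirror the loop body in the benchmark above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRecords mirrors the benchmark's inner loop: split the payload on
// ";", skip empty records, split each record at the first "=", and emit
// "key:length_of_value". The function name is a hypothetical choice.
func parseRecords(payload string) []string {
	parts := strings.Split(payload, ";")
	out := make([]string, 0, len(parts))
	for _, part := range parts {
		if part == "" {
			continue // trailing semicolon produces an empty record
		}
		fields := strings.SplitN(part, "=", 2)
		if len(fields) == 2 {
			out = append(out, fields[0]+":"+strconv.Itoa(len(fields[1])))
		}
	}
	return out
}

func main() {
	// "novalue" has no "=", so it is dropped; "c=" has an empty value.
	fmt.Println(parseRecords("a=1;b=22;novalue;c=;")) // prints [a:1 b:2 c:0]
}
```

Because `strings.SplitN` splits only at the first `=`, each benchmark record yields the key `name` and treats the rest of the record as the value, so every record becomes `name:37`.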
162+
The benchmark code is now ready to run! Give it a try by running the following command:
163+
164+
```bash
165+
cd $HOME/go-gc-default
166+
go test ./parsebench -run '^$' -bench BenchmarkParseAndAllocate -benchmem -count 1 -benchtime=2s
167+
```
168+
169+
You should see output similar to the following:
170+
171+
```output
172+
goos: linux
173+
goarch: arm64
174+
pkg: example.com/go-gc-default/parsebench
175+
BenchmarkParseAndAllocate-4 14014 170814 ns/op 0.04553 gc/op 102956 stw-ns/GC 4687 stw-ns/op 163840 B/op 4098 allocs/op
176+
PASS
177+
ok example.com/go-gc-default/parsebench 4.127s
178+
```
179+
180+
Your exact numbers will differ by instance type, Go version, operating system, and system load. If this test run yields results with no errors, you're ready to move on to the next step.
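The custom metrics in the example output are related to each other, and checking that relationship is a quick way to sanity-check a run. The sketch below plugs in the values from the example output line; the `Sample` type and helper names are hypothetical, and the figures apply only to that example run, not to your own results.

```go
package main

import "fmt"

// Sample holds values copied from the example output line above.
// The type and its field names are illustrative assumptions.
type Sample struct {
	NsPerOp    float64 // ns/op
	GCPerOp    float64 // gc/op
	StwNsPerGC float64 // stw-ns/GC
	StwNsPerOp float64 // stw-ns/op
	BytesPerOp float64 // B/op
}

var example = Sample{
	NsPerOp:    170814,
	GCPerOp:    0.04553,
	StwNsPerGC: 102956,
	StwNsPerOp: 4687,
	BytesPerOp: 163840,
}

// opsPerGC is the average number of operations between GC cycles.
func opsPerGC(s Sample) float64 { return 1 / s.GCPerOp }

// stwPerOpFromCycles recomputes stw-ns/op as gc/op times stw-ns/GC.
func stwPerOpFromCycles(s Sample) float64 { return s.GCPerOp * s.StwNsPerGC }

// stwShare is the fraction of each operation's time spent paused.
func stwShare(s Sample) float64 { return s.StwNsPerOp / s.NsPerOp }

// allocBytesPerSecond converts B/op and ns/op into an allocation rate.
func allocBytesPerSecond(s Sample) float64 { return s.BytesPerOp / s.NsPerOp * 1e9 }

func main() {
	fmt.Printf("operations per GC cycle: %.1f\n", opsPerGC(example))
	fmt.Printf("stw-ns/op from gc/op x stw-ns/GC: %.0f\n", stwPerOpFromCycles(example))
	fmt.Printf("share of op time in STW pauses: %.2f%%\n", stwShare(example)*100)
	fmt.Printf("allocation rate: %.2f GB/s\n", allocBytesPerSecond(example)/1e9)
}
```

For the example run, this works out to roughly 22 operations between GC cycles, a recomputed pause cost of about 4,688 ns per operation (matching the reported `stw-ns/op` within rounding), and an allocation rate a little under 1 GB/s. The reported 4098 allocs/op is also consistent with two slice allocations per iteration plus two allocations for each of the 2048 records, although the exact breakdown can vary with the Go version.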
181+
182+
183+
