Commit 405e35c

committed
final draft
1 parent bd81c10 commit 405e35c

6 files changed

Lines changed: 257 additions & 118 deletions

content/learning-paths/servers-and-cloud-computing/go-gc-default-settings/_index.md

Lines changed: 3 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -1,15 +1,14 @@
11
---
2-
title: Measure Go GC behavior on AWS Graviton with default runtime settings
3-
description: Learn how to run a Go allocation benchmark on AWS Graviton and measure garbage collection behavior without changing Go runtime settings.
2+
title: Measure Go GC behavior on AWS Graviton
3+
description: Learn how to measure and observe Go garbage collection metrics on AWS Graviton instances.
44

55
minutes_to_complete: 75
66

7-
who_is_this_for: This Learning Path is for Go developers and performance engineers who want to measure garbage collection behavior on Arm servers without changing Go runtime GC settings.
7+
who_is_this_for: This Learning Path is for engineers interested in learning more about Go garbage collection (GC) behavior on Arm.
88

99
learning_objectives:
1010
- Select an AWS Graviton instance for repeatable Go GC measurements
1111
- Install Go and Benchstat on an Arm Linux server
12-
- Confirm that Go runtime tuning variables are unset
1312
- Run a Go benchmark that reports allocation, GC, and pause-time metrics
1413
- Capture CPU and heap profiles without changing GC behavior
1514

@@ -66,11 +65,3 @@ weight: 1 # _index.md always has weight of 1 to order corr
6665
layout: "learningpathall" # All files under learning paths have this same wrapper
6766
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
6867
---
69-
70-
## Measure default Go GC behavior on Arm servers
71-
72-
Go applications can spend meaningful time allocating memory and running garbage collection (GC). You should measure that behavior before you change runtime settings.
73-
74-
In this Learning Path, you run Go benchmarks on an AWS Graviton instance and keep the Go runtime in its default GC mode. You do not set `GOGC`, `GOMEMLIMIT`, `GODEBUG`, or `GOMAXPROCS`.
75-
76-
The goal is to build a clean baseline. You will measure operation time, allocation rate, GC frequency, GC pause cost, and profiles before making tuning decisions.
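
The removed introduction above says the baseline is measured without setting `GOGC`, `GOMEMLIMIT`, `GODEBUG`, or `GOMAXPROCS`. One way to check that on a test machine is sketched below. The helper name `setTuningVars` and the injected lookup function are illustrative assumptions and are not part of the Learning Path.

```go
package main

import (
	"fmt"
	"os"
)

// tuningVars lists the runtime tuning variables named in the text above.
var tuningVars = []string{"GOGC", "GOMEMLIMIT", "GODEBUG", "GOMAXPROCS"}

// setTuningVars returns the names of the tuning variables that lookup reports
// as present in the environment. The lookup parameter is a hypothetical
// injection point so the check can be exercised without touching the real
// environment; pass os.LookupEnv for normal use.
func setTuningVars(lookup func(string) (string, bool)) []string {
	var set []string
	for _, name := range tuningVars {
		if _, ok := lookup(name); ok {
			set = append(set, name)
		}
	}
	return set
}

func main() {
	set := setTuningVars(os.LookupEnv)
	if len(set) == 0 {
		fmt.Println("no Go runtime tuning variables are set")
		return
	}
	fmt.Println("set in environment:", set)
}
```

A variable that is present but empty is still reported, because `os.LookupEnv` distinguishes an empty value from an unset one.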

content/learning-paths/servers-and-cloud-computing/go-gc-default-settings/choose_aws_instance.md

Lines changed: 13 additions & 19 deletions
@@ -5,35 +5,31 @@ weight: 2
55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
8+
## What is garbage collection (GC)?
9+
Memory management is a critical aspect of application performance, and garbage collection (GC) plays a central role in automating it. GC continuously identifies objects that are no longer needed and reclaims their memory for reuse.
810

9-
## Select an instance for Go GC measurements
11+
While this automation improves productivity and application safety, inefficient garbage collection can lead to increased CPU usage, longer response times, and unexpected application pauses.
1012

11-
Use an AWS Graviton instance that has enough CPU and memory to make Go runtime behavior visible, while keeping the Learning Path inexpensive to run.
13+
Tracking GC metrics provides a window into an application's memory health, helping engineers optimize performance and confirm that the system can scale efficiently under load.
1214

13-
For the first prototype, use `m8g.xlarge`.
15+
## Measuring default Go GC behavior on Arm servers
1416

15-
`m8g.xlarge` is a good starting point because it provides four vCPUs and 16 GiB of memory on AWS Graviton4. Four vCPUs are enough to observe default Go CPU parallelism and GC worker behavior without requiring a large benchmark host. The 16 GiB memory size is enough for allocation-heavy benchmarks without immediately making the lab memory-bound.
17+
Go is one language that uses GC. Because Go applications can spend meaningful time allocating memory and running garbage collection, it is important to understand how the Go runtime behaves under default settings.
1618

17-
Avoid burstable `t4g` instances for this Learning Path. CPU credits can affect benchmark repeatability and make GC measurements harder to explain.
19+
In this Learning Path, you'll run Go benchmarks on an AWS Graviton instance. The goal is to build a clean baseline, measuring operation time, allocation rate, GC frequency, and GC pause cost.
1820

19-
If `m8g.xlarge` is not available in your AWS Region or Availability Zone, use `m7g.xlarge` as the fallback. It has the same vCPU and memory shape on an earlier Graviton generation, so the commands and benchmark workflow remain the same.
21+
## Selecting an instance for Go GC measurements
2022

21-
## Recommended prototype machine
23+
An AWS Graviton `m8g.xlarge` instance has enough CPU and memory to make Go runtime behavior visible, while keeping costs minimal. It's a good starting point as it provides four vCPUs and 16 GiB of memory on AWS Graviton4. If you choose to run this Learning Path on a different instance, make sure it has at least 4 vCPUs and 16 GiB of memory to ensure the benchmark runs smoothly and provides meaningful GC metrics.
2224

23-
Use this instance shape for the first version of the Learning Path:
24-
25-
| Purpose | Instance type | Processor | vCPUs | Memory |
26-
| --- | --- | --- | ---: | ---: |
27-
| Default prototype | `m8g.xlarge` | AWS Graviton4 | 4 | 16 GiB |
28-
| Fallback | `m7g.xlarge` | AWS Graviton3 | 4 | 16 GiB |
25+
Avoid burstable `t4g` instances as CPU credits can affect benchmark repeatability and make GC measurements harder to explain.
2926

3027
{{% notice Note %}}
3128
You can use larger instances, such as `m8g.2xlarge`, when you want more CPU width or more memory headroom. Start with `m8g.xlarge` so the first benchmark run is easy to reproduce and inexpensive.
3229
{{% /notice %}}
3330

34-
The commands in this Learning Path were validated on an `m8g.xlarge` instance running Ubuntu 24.04 LTS Arm64 and Go 1.26.3.
3531

36-
## Check instance availability
32+
## Checking instance availability
3733

3834
Use the AWS CLI to check whether `m8g.xlarge` is available in your selected Region.
3935

@@ -48,9 +44,7 @@ aws ec2 describe-instance-type-offerings \
4844
--output table
4945
```
5046

51-
If the command returns one or more Availability Zones, you can use `m8g.xlarge` in that Region.
52-
53-
Run the same command for `m7g.xlarge` if `m8g.xlarge` is not available:
47+
If the command returns one or more Availability Zones, you can use `m8g.xlarge` in that Region. If `m8g.xlarge` is not available in your Region, you can try a different Region or fall back to an `m7g.xlarge` instance, which is based on the previous-generation AWS Graviton3:
5448

5549
```console
5650
aws ec2 describe-instance-type-offerings \
@@ -61,4 +55,4 @@ aws ec2 describe-instance-type-offerings \
6155
--output table
6256
```
6357

64-
You have now selected a repeatable AWS Graviton test machine. You will confirm the default Go runtime environment before running the benchmark.
58+
After you choose an instance type, provision it with Ubuntu 24.04 LTS Arm64. When the instance is running and you have connected to it over SSH, proceed to the next step.

content/learning-paths/servers-and-cloud-computing/go-gc-default-settings/create_gc_benchmark.md

Lines changed: 106 additions & 32 deletions
@@ -6,96 +6,167 @@ weight: 4
66
layout: learningpathall
77
---
88

9-
## Create a benchmark module
9+
## Creating a benchmark module
1010

11-
Create a small Go module for the benchmark:
11+
You'll first create a small Go benchmark module. The high-level flow is:
12+
13+
1. Generate a large input string.
14+
2. Repeatedly parse it and create new objects/strings.
15+
3. Force memory allocations so the garbage collector has work to do.
16+
4. Measure how long the workload takes.
17+
5. Measure how much GC activity occurred during the benchmark.
18+
6. Report both performance metrics and GC-related metrics.
19+
20+
Paste the commands below to create the module and benchmark file:
21+
22+
```bash
23+
24+
# Create the module directory and initialize it.
1225

13-
```console
1426
mkdir -p $HOME/go-gc-default/parsebench
1527
cd $HOME/go-gc-default
1628
go mod init example.com/go-gc-default
17-
```
1829

19-
Create the benchmark file:
30+
# Create the benchmark file:
2031

21-
```console
2232
cat > parsebench/parsebench_test.go <<'EOF'
2333
package parsebench
2434
2535
import (
36+
2637
"runtime"
2738
"strconv"
2839
"strings"
2940
"testing"
41+
3042
)
3143
44+
// Global variable used to store benchmark results.
45+
3246
var sink []string
3347
3448
func BenchmarkParseAndAllocate(b *testing.B) {
35-
payload := strings.Repeat("name=arm&runtime=go&gc=default&value=12345;", 2048)
3649
50+
// Simulate a large payload by building a test string that
51+
// repeats the same key=value data many times.
52+
//
53+
// Example:
54+
// name=arm&runtime=go&gc=default&value=12345;
55+
//
56+
57+
payload := strings.Repeat("name=arm&runtime=go&gc=default&value=12345;", 2048)
58+
59+
// Next, we tell the benchmark framework to track memory allocations.
60+
//
61+
// This will show metrics such as allocations per operation, and bytes allocated per operation
62+
3763
b.ReportAllocs()
38-
64+
65+
// Capture runtime memory statistics before the benchmark starts. We will later compare these
66+
// values to see:
67+
// - how many garbage collections occurred
68+
// - how much pause time was spent in GC
69+
3970
var before runtime.MemStats
4071
runtime.ReadMemStats(&before)
41-
72+
73+
// Reset benchmark timing so that any setup work performed above will not be included
74+
// in the benchmark measurements.
75+
4276
b.ResetTimer()
77+
78+
// The benchmark loop is where the actual work is done. The number of times this loop is
79+
// executed is controlled by the b.N variable. The value of b.N is automatically chosen by
80+
// the Go benchmark framework to obtain stable and statistically useful measurements.
81+
82+
// The reason for this design is that timing a single operation is often unreliable; running
83+
// it many times reduces noise from:
84+
// * OS scheduling
85+
// * CPU frequency changes
86+
// * background processes
87+
4388
for i := 0; i < b.N; i++ {
89+
// Split the large payload into individual records.
90+
// Example:
91+
// "a=1;b=2;c=3;" becomes: ["a=1", "b=2", "c=3", ""]
4492
parts := strings.Split(payload, ";")
93+
// Create a new slice to store parsed output. This allocation is intentional because we want
94+
// the benchmark to generate memory pressure and trigger garbage collection activity.
95+
4596
out := make([]string, 0, len(parts))
46-
97+
98+
// Process each record.
99+
47100
for _, part := range parts {
101+
// Ignore the empty string created by the trailing semicolon.
48102
if part == "" {
49103
continue
50104
}
105+
// Split the string into key and value.
106+
51107
fields := strings.SplitN(part, "=", 2)
108+
109+
// Make sure both key and value exist.
52110
if len(fields) == 2 {
53-
out = append(out, fields[0]+":"+strconv.Itoa(len(fields[1])))
111+
// Build a new string containing: key:length_of_value
112+
// This creates additional allocations and string objects, increasing GC activity.
113+
out = append(out, fields[0]+":"+strconv.Itoa(len(fields[1])))
54114
}
55115
}
56-
116+
// Save the result so the compiler cannot eliminate the work as unused.
57117
sink = out
58118
}
119+
// Stop benchmark timing.
120+
//
121+
// Everything below is measurement/reporting logic and should not affect benchmark performance results.
59122
b.StopTimer()
60-
123+
124+
// Capture memory statistics after the benchmark completes.
125+
61126
var after runtime.MemStats
62127
runtime.ReadMemStats(&after)
63-
128+
129+
// Number of benchmark operations executed.
64130
ops := float64(b.N)
131+
132+
// Total number of garbage collection cycles that occurred while the benchmark was running:
133+
65134
gcCycles := after.NumGC - before.NumGC
135+
136+
// Total "stop-the-world" pause time spent in GC. During these pauses, application execution
137+
// is temporarily halted while the runtime performs parts of garbage collection.
138+
66139
pauseNs := after.PauseTotalNs - before.PauseTotalNs
67-
140+
141+
// Report GC events per benchmark operation. Example: 0.002 gc/op means one GC cycle
142+
// every 500 operations.
143+
68144
if ops > 0 {
69145
b.ReportMetric(float64(gcCycles)/ops, "gc/op")
146+
147+
// Report average GC pause time per operation.
70148
b.ReportMetric(float64(pauseNs)/ops, "stw-ns/op")
71149
}
150+
// If at least one GC occurred, report the average stop-the-world pause duration for each GC cycle.
72151
if gcCycles > 0 {
73-
b.ReportMetric(float64(pauseNs)/float64(gcCycles), "stw-ns/GC")
152+
b.ReportMetric(
153+
float64(pauseNs)/float64(gcCycles),
154+
"stw-ns/GC",
155+
)
74156
}
157+
75158
}
76159
EOF
77160
```
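
The benchmark's comments note that splitting a payload ending in `;` yields a trailing empty string, which the loop then skips. The standalone sketch below isolates that behavior. The helper name `splitRecords` is an illustrative assumption and does not appear in the benchmark file.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecords splits payload on ";" and drops empty elements, mirroring the
// `part == ""` check in the benchmark loop. The function name is hypothetical.
func splitRecords(payload string) []string {
	parts := strings.Split(payload, ";")
	records := make([]string, 0, len(parts))
	for _, part := range parts {
		if part == "" {
			continue
		}
		records = append(records, part)
	}
	return records
}

func main() {
	payload := "a=1;b=2;c=3;"
	// The raw split keeps the empty element after the final semicolon.
	fmt.Println(len(strings.Split(payload, ";"))) // prints 4
	// The filtered result contains only the three real records.
	fmt.Println(splitRecords(payload)) // prints [a=1 b=2 c=3]
}
```

Sizing the output slice with `len(parts)` reserves one slot more than needed, which is the same trade-off the benchmark makes with `make([]string, 0, len(parts))`.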
78161

79-
This benchmark repeatedly parses and allocates strings. It reports the default Go benchmark metrics plus three GC-specific metrics:
80-
81-
| Metric | Meaning |
82-
| --- | --- |
83-
| `gc/op` | GC cycles per completed benchmark operation |
84-
| `stw-ns/op` | GC stop-the-world pause nanoseconds per completed operation |
85-
| `stw-ns/GC` | GC stop-the-world pause nanoseconds per GC cycle |
162+
The benchmark code is now ready to run! Give it a try by running the following command:
86163

87-
The benchmark reads `runtime.MemStats` before and after the timed loop. It does not set Go runtime tuning variables.
88-
89-
## Confirm the benchmark builds
90-
91-
Run one short benchmark pass:
92-
93-
```console
164+
```bash
94165
cd $HOME/go-gc-default
95166
go test ./parsebench -run '^$' -bench BenchmarkParseAndAllocate -benchmem -count 1 -benchtime=2s
96167
```
97168

98-
You should see output with `ns/op`, `B/op`, `allocs/op`, and the GC-specific metrics:
169+
You should see output similar to the following:
99170

100171
```output
101172
goos: linux
@@ -106,4 +177,7 @@ PASS
106177
ok example.com/go-gc-default/parsebench 4.127s
107178
```
108179

109-
Your exact numbers will differ by instance type, Go version, operating system, and system load.
180+
Your exact numbers will differ by instance type, Go version, operating system, and system load. If this test run yields results with no errors, you're ready to move on to the next step.
181+
182+
183+
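
To make the three custom metrics easier to interpret, the sketch below repeats the benchmark's arithmetic on plain numbers. The type `gcMetrics`, the function `computeGCMetrics`, and the input values are illustrative assumptions. In the benchmark itself, the inputs are the `runtime.MemStats` deltas and `b.N`.

```go
package main

import "fmt"

// gcMetrics holds the three custom metrics the benchmark reports.
// The type is hypothetical and exists only for this illustration.
type gcMetrics struct {
	GCPerOp    float64 // corresponds to gc/op
	PausePerOp float64 // corresponds to stw-ns/op
	PausePerGC float64 // corresponds to stw-ns/GC; stays 0 when no GC cycle ran
}

// computeGCMetrics applies the same divisions as the benchmark to the
// operation count and the NumGC and PauseTotalNs deltas.
func computeGCMetrics(ops int, gcCycles uint32, pauseNs uint64) gcMetrics {
	var m gcMetrics
	if ops > 0 {
		m.GCPerOp = float64(gcCycles) / float64(ops)
		m.PausePerOp = float64(pauseNs) / float64(ops)
	}
	if gcCycles > 0 {
		m.PausePerGC = float64(pauseNs) / float64(gcCycles)
	}
	return m
}

func main() {
	// Made-up deltas: 1000 operations, 2 GC cycles, 50000 ns of total pause.
	m := computeGCMetrics(1000, 2, 50000)
	fmt.Printf("gc/op=%g stw-ns/op=%g stw-ns/GC=%g\n",
		m.GCPerOp, m.PausePerOp, m.PausePerGC)
}
```

With these made-up inputs, two cycles over 1000 operations correspond to one GC cycle every 500 operations, which matches the `0.002 gc/op` example in the benchmark comments.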
