Memory Layout and Alignment in Go Structs
In Go programming, we often view structs as simple collections of fields—logical groupings of data that make our code readable and organised. You define a User struct with an ID, a flag for active status, and a timestamp. It works perfectly. But underneath the clean syntax, the hardware is playing a rigid game of Tetris with your data.
If you’ve ever wondered why your application consumes more memory than the sum of its parts or why a concurrent counter isn't scaling across cores as expected, the answer often lies in memory layout and alignment.
This article explores the invisible costs of struct padding, how to reclaim wasted memory, and advanced techniques for optimising CPU cache usage to squeeze every drop of performance from your Go applications.
The Problem Space: Why Hardware Cares About Shape
Memory is not a seamless, byte-perfect container. Modern CPUs read memory in "words"—chunks of 4 bytes (32-bit) or 8 bytes (64-bit). To access data efficiently, the hardware requires that data be "aligned" at specific offsets.
For example, on a 64-bit architecture, the CPU prefers to read an int64 from an address that is a multiple of 8. If your data straddles the boundary of two words, the CPU might need to perform two read cycles instead of one, or in some architectures, it might raise a hardware exception (crash).
To prevent this, the Go compiler inserts padding—invisible bytes that push subsequent fields to the next aligned address. While this ensures safety and speed, it creates "holes" in your structs, wasting memory.
The Granularity of Access
Consider the physical reality:
- Byte: The smallest addressable unit.
- Word: The natural unit of data for the processor (8 bytes on 64-bit).
- Cache Line: The unit of data transferred between main memory and the CPU cache (typically 64 bytes).
Your code defines the logic; the alignment defines the physics. When these mismatch, you pay a tax in RAM usage and CPU cycles.
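To make these units concrete, `unsafe.Sizeof` and `unsafe.Alignof` report the size and alignment guarantee the compiler applies to any type. A minimal sketch (the exact numbers assume a 64-bit platform; the helper functions are illustrative, not a standard API):

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignOfInt64 reports the compiler's alignment guarantee for int64.
func alignOfInt64() uintptr { return unsafe.Alignof(int64(0)) }

// alignOfBool reports the alignment guarantee for bool.
func alignOfBool() uintptr { return unsafe.Alignof(false) }

func main() {
	fmt.Println("int64: size", unsafe.Sizeof(int64(0)), "align", alignOfInt64())
	fmt.Println("bool:  size", unsafe.Sizeof(false), "align", alignOfBool())
	fmt.Println("ptr:   size", unsafe.Sizeof((*int)(nil)), "align", unsafe.Alignof((*int)(nil)))
}
```

On amd64 this typically shows `int64` and pointers with size and alignment 8, while `bool` needs only a single byte.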
Investigating Layouts: The Unsafe Detective
Before we optimise, we must see the invisible. Go’s unsafe package allows us to inspect the size and offsets of struct fields, revealing exactly where the compiler is injecting padding.
Let's look at a "naive" struct where fields are ordered based on logical context rather than size:
```go
package main

import (
	"fmt"
	"unsafe"
)

type BadStruct struct {
	isActive bool  // 1 byte
	id       int64 // 8 bytes
	isAdmin  bool  // 1 byte
}

func main() {
	var s BadStruct
	fmt.Printf("Size of BadStruct: %d bytes\n", unsafe.Sizeof(s))
	fmt.Printf("Offset of isActive: %d\n", unsafe.Offsetof(s.isActive))
	fmt.Printf("Offset of id: %d\n", unsafe.Offsetof(s.id))
	fmt.Printf("Offset of isAdmin: %d\n", unsafe.Offsetof(s.isAdmin))
}
```
Output Analysis
On a 64-bit architecture, you might expect the size to be $1 + 8 + 1 = 10$ bytes. However, the output tells a different story:
```
Size of BadStruct: 24 bytes
Offset of isActive: 0
Offset of id: 8
Offset of isAdmin: 16
```
What happened?
- `isActive` takes 1 byte at offset 0.
- `id` requires 8-byte alignment, so it cannot start at offset 1. The compiler adds 7 bytes of padding.
- `id` starts at offset 8 and takes 8 bytes (ending at offset 16).
- `isAdmin` takes 1 byte at offset 16.
- The total struct size must be a multiple of the largest field alignment (8 bytes), so the compiler adds 7 more bytes at the end.
The tally: 10 bytes of data, 14 bytes of padding. You are wasting 58% of the allocated memory.
Optimisation Level 1: Field Reordering
The simplest and most effective way to reduce struct size is field reordering. The rule of thumb is simple: Order fields from largest to smallest.
By grouping larger types (pointers, int64, float64) at the top, you ensure they align naturally without needing "gap" bytes before them. Smaller types (int32, bool, byte) can then fill the remaining space.
The GoodStruct
Let's apply this to our previous example:
```go
type GoodStruct struct {
	id       int64 // 8 bytes
	isActive bool  // 1 byte
	isAdmin  bool  // 1 byte
}
```
New Layout:
- `id`: offset 0, takes 8 bytes.
- `isActive`: offset 8, takes 1 byte.
- `isAdmin`: offset 9, takes 1 byte.
- Padding: to reach a multiple of 8, the compiler adds 6 bytes at the end.
Total Size: 16 bytes. Savings: 8 bytes per struct (33% reduction).
While 8 bytes seems trivial, if you are allocating a slice of 10 million such structs, you just saved 80 MB of RAM—potentially the difference between fitting in memory or crashing with an OOM (Out of Memory) error.
Automating with fieldalignment
You don't need to manually calculate offsets. The Go tooling ecosystem provides a linter specifically for this.
Installation:
```shell
go install golang.org/x/tools/go/analysis/passes/fieldalignment/cmd/fieldalignment@latest
```
Usage: Run it against your package to see suggestions:
```shell
fieldalignment ./...
```
The tool will report structs that can be compacted. It can even rewrite your files automatically using the -fix flag, though you should review changes to ensure logical grouping isn't sacrificed too heavily for minor gains.
Optimisation Level 2: CPU Cache Line Optimisation
Reordering saves memory, but sometimes we need to add memory to save time. This brings us to Cache Line Optimisation.
Modern CPUs load data in lines (usually 64 bytes). If two independent variables sit on the same cache line and are updated by different processor cores simultaneously, the cores fight over the cache line. This phenomenon is called False Sharing.
*(Diagram placeholder: two cores repeatedly invalidating each other's copy of the same cache line, i.e. false sharing.)*
The Scenario: Concurrent Counters
Imagine a struct tracking metrics for two distinct services, processed by separate goroutines:
```go
type Metrics struct {
	ServiceACount uint64
	ServiceBCount uint64
}
```
These two fields are adjacent in memory.
1. Core 1 updates `ServiceACount`, invalidating the cache line held by Core 2.
2. Core 2 updates `ServiceBCount`, invalidating the cache line held by Core 1.
3. The cache coherence protocol (MESI) forces the cores to communicate, drastically slowing down write operations.
The Solution: Padding for Isolation
To fix this, we force the fields onto separate cache lines by inserting padding.
```go
type OptimizedMetrics struct {
	ServiceACount uint64
	_             [56]byte // padding to fill the 64-byte cache line (8 + 56 = 64)
	ServiceBCount uint64
}
```
Note: In reality, cache line sizes can vary, but 64 bytes is the standard on x86-64. The `_ [56]byte` field is a manual way to ensure `ServiceBCount` starts 64 bytes after `ServiceACount`, so the two counters can never share a cache line.
A more robust approach used in high-performance libraries is the `cpu.CacheLinePad` type from `golang.org/x/sys/cpu`, which sizes the padding per architecture. Alternatively, you can define a constant:
```go
const CacheLineSize = 64

type OptimizedMetrics struct {
	ServiceACount uint64
	_             [CacheLineSize - 8]byte // padding
	ServiceBCount uint64
}
```
Benchmark: False Sharing
The performance impact is real. Here is a conceptual comparison of throughput when incrementing counters concurrently:
| Strategy | Throughput (Ops/sec) |
|---|---|
| Packed (False Sharing) | ~15,000,000 |
| Padded (No False Sharing) | ~85,000,000 |
Note: Actual numbers vary by hardware, but 5x-10x improvements are common in write-heavy concurrent scenarios.
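A comparison along these lines can be sketched with a small harness. The names below (`packed`, `padded`, `hammer`) are illustrative, not from any library, and the timing gap only appears on a multi-core machine:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

const cacheLineSize = 64

type packed struct {
	a uint64
	b uint64 // shares a cache line with a: false sharing under contention
}

type padded struct {
	a uint64
	_ [cacheLineSize - 8]byte // pushes b 64 bytes away, onto another line
	b uint64
}

// hammer runs two goroutines that each atomically increment one counter
// n times, and reports how long the contended phase took.
func hammer(a, b *uint64, n int) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			atomic.AddUint64(a, 1)
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			atomic.AddUint64(b, 1)
		}
	}()
	wg.Wait()
	return time.Since(start)
}

func main() {
	const n = 5_000_000
	var p packed
	var q padded
	fmt.Println("packed:", hammer(&p.a, &p.b, n))
	fmt.Println("padded:", hammer(&q.a, &q.b, n))
}
```

On typical multi-core x86-64 hardware the padded run finishes noticeably faster, though the ratio varies widely by CPU.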
Common Pitfalls and Trade-offs
Optimisation is rarely free. Before you start rearranging every struct in your codebase, consider the trade-offs.
1. Readability vs. Efficiency
Logical grouping aids maintainability. Separating a ZipCode field from its Address struct just to save 2 bytes might confuse future developers.
- Rule: Optimise "hot" structs (those allocated in the millions) or long-lived data structures. Ignore one-off config structs.
2. Architecture Dependency
The sizes of int and uint depend on the architecture (32-bit vs 64-bit). Hardcoding padding based on specific sizes might break alignment on different architectures (e.g., WASM or embedded ARM).
- Tip: Use `int64`/`uint64` explicitly if you need consistent sizing, or trust `unsafe.Alignof` if writing generic padding logic.
3. Pointer Alignment
Pointers are 8 bytes on 64-bit systems. If your struct contains pointers (including strings and slices), they also have alignment requirements. The Garbage Collector also scans these pointers; compacting them together can sometimes slightly improve GC scanning speed (due to locality), but this is a micro-optimization.
Summary
Memory layout in Go is a subtle art that balances hardware requirements with software abstraction.
- The Problem: The compiler adds padding to align fields to machine words, causing "holes" in your data.
- The Quick Win: Reorder fields from largest to smallest (pointers/`int64` -> `int32` -> `bool`/`byte`). Use `fieldalignment` to spot opportunities.
- The High-Performance Play: Use padding (`_ [56]byte`) to prevent false sharing on hot concurrent fields, ensuring they reside on different CPU cache lines.
By understanding these low-level details, you move from writing code that simply "works" to writing code that is sympathetic to the machine it runs on—efficient, compact, and blazingly fast.