Memory Layout and Alignment in Go Structs
In Go programming, we often view structs as simple collections of fields—logical groupings of data that make our code readable and organised. You define a User struct with an ID, a flag for active status, and a timestamp. It works perfectly. But underneath the clean syntax, the hardware is playing a rigid game of Tetris with your data.
If you’ve ever wondered why your application consumes more memory than the sum of its parts or why a concurrent counter isn't scaling across cores as expected, the answer often lies in memory layout and alignment.
This article explores the invisible costs of struct padding, how to reclaim wasted memory, and advanced techniques for optimising CPU cache usage to squeeze every drop of performance from your Go applications.
The Problem Space: Why Hardware Cares About Shape
Memory is not a seamless, byte-perfect container. Modern CPUs read memory in "words"—chunks of 4 bytes (32-bit) or 8 bytes (64-bit). To access data efficiently, the hardware requires that data be "aligned" at specific offsets.
For example, on a 64-bit architecture, the CPU prefers to read an int64 from an address that is a multiple of 8. If your data straddles the boundary of two words, the CPU might need to perform two read cycles instead of one, or in some architectures, it might raise a hardware exception (crash).
To prevent this, the Go compiler inserts padding—invisible bytes that push subsequent fields to the next aligned address. While this ensures safety and speed, it creates "holes" in your structs, wasting memory.
The Granularity of Access
Consider the physical reality:
- Byte: The smallest addressable unit.
- Word: The natural unit of data for the processor (8 bytes on 64-bit).
- Cache Line: The unit of data transferred between main memory and the CPU cache (typically 64 bytes).
Your code defines the logic; the alignment defines the physics. When these mismatch, you pay a tax in RAM usage and CPU cycles.
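To make these units concrete, `unsafe.Sizeof` and `unsafe.Alignof` report the size and alignment guarantee the compiler applies to any type. A minimal sketch (the exact numbers assume a 64-bit platform; the helper functions are illustrative, not a standard API):

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignOfInt64 reports the compiler's alignment guarantee for int64.
func alignOfInt64() uintptr { return unsafe.Alignof(int64(0)) }

// alignOfBool reports the alignment guarantee for bool.
func alignOfBool() uintptr { return unsafe.Alignof(false) }

func main() {
	fmt.Println("int64: size", unsafe.Sizeof(int64(0)), "align", alignOfInt64())
	fmt.Println("bool:  size", unsafe.Sizeof(false), "align", alignOfBool())
	fmt.Println("ptr:   size", unsafe.Sizeof((*int)(nil)), "align", unsafe.Alignof((*int)(nil)))
}
```

On amd64 this typically shows `int64` and pointers with size and alignment 8, while `bool` needs only a single byte.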
Investigating Layouts: The Unsafe Detective
Before we optimise, we must see the invisible. Go’s unsafe package allows us to inspect the size and offsets of struct fields, revealing exactly where the compiler is injecting padding.
Let's look at a "naive" struct where fields are ordered based on logical context rather than size:
```go
package main

import (
	"fmt"
	"unsafe"
)

type BadStruct struct {
	isActive bool  // 1 byte
	id       int64 // 8 bytes
	isAdmin  bool  // 1 byte
}

func main() {
	var s BadStruct
	fmt.Printf("Size of BadStruct: %d bytes\n", unsafe.Sizeof(s))
	fmt.Printf("Offset of isActive: %d\n", unsafe.Offsetof(s.isActive))
	fmt.Printf("Offset of id: %d\n", unsafe.Offsetof(s.id))
	fmt.Printf("Offset of isAdmin: %d\n", unsafe.Offsetof(s.isAdmin))
}
```
Output Analysis
On a 64-bit architecture, you might expect the size to be $1 + 8 + 1 = 10$ bytes. However, the output tells a different story:
```
Size of BadStruct: 24 bytes
Offset of isActive: 0
Offset of id: 8
Offset of isAdmin: 16
```
What happened?
- `isActive` takes 1 byte at offset 0.
- `id` requires 8-byte alignment, so it cannot start at offset 1. The compiler adds 7 bytes of padding.
- `id` starts at offset 8 and takes 8 bytes (ending at offset 16).
- `isAdmin` takes 1 byte at offset 16.
- The total struct size must be a multiple of the largest field alignment (8 bytes), so the compiler adds 7 more bytes at the end.
The tally: 10 bytes of data, 14 bytes of padding. You are wasting 58% of the allocated memory.
Optimisation Level 1: Field Reordering
The simplest and most effective way to reduce struct size is field reordering. The rule of thumb is simple: Order fields from largest to smallest.
By grouping larger types (pointers, int64, float64) at the top, you ensure they align naturally without needing "gap" bytes before them. Smaller types (int32, bool, byte) can then fill the remaining space.
The GoodStruct
Let's apply this to our previous example:
```go
type GoodStruct struct {
	id       int64 // 8 bytes
	isActive bool  // 1 byte
	isAdmin  bool  // 1 byte
}
```
New Layout:
- `id`: offset 0, takes 8 bytes.
- `isActive`: offset 8, takes 1 byte.
- `isAdmin`: offset 9, takes 1 byte.
- Padding: to reach a multiple of 8, the compiler adds 6 bytes at the end.
Total Size: 16 bytes. Savings: 8 bytes per struct (33% reduction).
While 8 bytes seems trivial, if you are allocating a slice of 10 million such structs, you just saved 80 MB of RAM—potentially the difference between fitting in memory or crashing with an OOM (Out of Memory) error.
Automating with fieldalignment
You don't need to manually calculate offsets. The Go tooling ecosystem provides a linter specifically for this.
Installation:
```shell
go install golang.org/x/tools/go/analysis/passes/fieldalignment/cmd/fieldalignment@latest
```
Usage: Run it against your package to see suggestions:
```shell
fieldalignment ./...
```
The tool will report structs that can be compacted. It can even rewrite your files automatically using the -fix flag, though you should review changes to ensure logical grouping isn't sacrificed too heavily for minor gains.
Optimisation Level 2: CPU Cache Line Optimisation
Reordering saves memory, but sometimes we need to add memory to save time. This brings us to Cache Line Optimisation.
Modern CPUs load data in lines (usually 64 bytes). If two independent variables sit on the same cache line and are updated by different processor cores simultaneously, the cores fight over the cache line. This phenomenon is called False Sharing.
*(Diagram placeholder: two cores repeatedly invalidating each other's copy of the same cache line, i.e. false sharing.)*
The Scenario: Concurrent Counters
Imagine a struct tracking metrics for two distinct services, processed by separate goroutines:
```go
type Metrics struct {
	ServiceACount uint64
	ServiceBCount uint64
}
```
These two fields are adjacent in memory.
1. Core 1 updates `ServiceACount`, invalidating the cache line held by Core 2.
2. Core 2 updates `ServiceBCount`, invalidating the cache line held by Core 1.
3. The cache coherence protocol (MESI) forces the cores to communicate, drastically slowing down write operations.
The Solution: Padding for Isolation
To fix this, we force the fields onto separate cache lines by inserting padding.
```go
type OptimizedMetrics struct {
	ServiceACount uint64
	_             [56]byte // padding to fill the 64-byte cache line (8 + 56 = 64)
	ServiceBCount uint64
}
```
Note: In reality, cache line sizes can vary, but 64 bytes is the standard on x86-64. The `_ [56]byte` field is a manual way to ensure `ServiceBCount` starts 64 bytes after `ServiceACount`, so the two counters can never share a cache line.
A more robust approach used in high-performance libraries is the `cpu.CacheLinePad` type from `golang.org/x/sys/cpu`, which sizes the padding per architecture. Alternatively, you can define a constant:
```go
const CacheLineSize = 64

type OptimizedMetrics struct {
	ServiceACount uint64
	_             [CacheLineSize - 8]byte // padding
	ServiceBCount uint64
}
```
Benchmark: False Sharing
The performance impact is real. Here is a conceptual comparison of throughput when incrementing counters concurrently:
| Strategy | Throughput (Ops/sec) |
|---|---|
| Packed (False Sharing) | ~15,000,000 |
| Padded (No False Sharing) | ~85,000,000 |
Note: Actual numbers vary by hardware, but 5x-10x improvements are common in write-heavy concurrent scenarios.
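A comparison along these lines can be sketched with a small harness. The names below (`packed`, `padded`, `hammer`) are illustrative, not from any library, and the timing gap only appears on a multi-core machine:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

const cacheLineSize = 64

type packed struct {
	a uint64
	b uint64 // shares a cache line with a: false sharing under contention
}

type padded struct {
	a uint64
	_ [cacheLineSize - 8]byte // pushes b 64 bytes away, onto another line
	b uint64
}

// hammer runs two goroutines that each atomically increment one counter
// n times, and reports how long the contended phase took.
func hammer(a, b *uint64, n int) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			atomic.AddUint64(a, 1)
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			atomic.AddUint64(b, 1)
		}
	}()
	wg.Wait()
	return time.Since(start)
}

func main() {
	const n = 5_000_000
	var p packed
	var q padded
	fmt.Println("packed:", hammer(&p.a, &p.b, n))
	fmt.Println("padded:", hammer(&q.a, &q.b, n))
}
```

On typical multi-core x86-64 hardware the padded run finishes noticeably faster, though the ratio varies widely by CPU.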
Common Pitfalls and Trade-offs
Optimisation is rarely free. Before you start rearranging every struct in your codebase, consider the trade-offs.
1. Readability vs. Efficiency
Logical grouping aids maintainability. Separating a ZipCode field from its Address struct just to save 2 bytes might confuse future developers.
- Rule: Optimise "hot" structs (those allocated in the millions) or long-lived data structures. Ignore one-off config structs.
2. Architecture Dependency
The sizes of int and uint depend on the architecture (32-bit vs 64-bit). Hardcoding padding based on specific sizes might break alignment on different architectures (e.g., WASM or embedded ARM).
- Tip: Use `int64`/`uint64` explicitly if you need consistent sizing, or trust `unsafe.Alignof` if writing generic padding logic.
3. Pointer Alignment
Pointers are 8 bytes on 64-bit systems. If your struct contains pointers (including strings and slices), they also have alignment requirements. The Garbage Collector also scans these pointers; compacting them together can sometimes slightly improve GC scanning speed (due to locality), but this is a micro-optimization.
Summary
Memory layout in Go is a subtle art that balances hardware requirements with software abstraction.
- The Problem: The compiler adds padding to align fields to machine words, causing "holes" in your data.
- The Quick Win: Reorder fields from largest to smallest (pointers/`int64` -> `int32` -> `bool`/`byte`). Use `fieldalignment` to spot opportunities.
- The High-Performance Play: Use padding (`_ [56]byte`) to prevent false sharing on hot concurrent fields, ensuring they reside on different CPU cache lines.
By understanding these low-level details, you move from writing code that simply "works" to writing code that is sympathetic to the machine it runs on—efficient, compact, and blazingly fast.