
Advanced GoLang: Generics, Optimization, CGO & Runtime Internals

The deep end of the pool. A comprehensive guide to Go's most advanced features: from writing generic data structures and dissecting memory allocation strategies to cross-compilation and reducing binary sizes.

Generics (Go 1.18+)

Type Parameters

Type parameters allow functions and types to work with any type specified at compile time, declared in square brackets before the function parameters, enabling code reuse without sacrificing type safety.

func Print[T any](value T) {
    fmt.Println(value)
}

Print[int](42) // Explicit type argument
Print("hello") // Type inferred

Type Constraints

Type constraints restrict what types can be used as type arguments, defined as interfaces that specify the required methods or types, ensuring the generic code can only use operations available on the constrained types.

type Number interface {
    int | int64 | float64
}

func Sum[T Number](a, b T) T {
    return a + b // Works because all Number types support +
}

Interface Constraints

Interfaces can now include both method signatures AND type lists, combining behavioral requirements with underlying type restrictions to create powerful constraints for generic functions.

type Stringer interface {
    ~string         // Underlying type constraint
    String() string // Method constraint
}

Type Sets

Every interface now defines a type set—the set of all types that implement it; for method-only interfaces, this is infinite, but type unions create finite sets, fundamentally changing how Go thinks about interfaces.

┌─────────────────────────────────────┐
│        interface{ int | string }    │
│  Type Set = { int, string }         │
├─────────────────────────────────────┤
│        interface{ Read([]byte) }    │
│  Type Set = { all types with Read } │
└─────────────────────────────────────┘
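
To make the distinction concrete, here is a minimal sketch (the names are illustrative) of a union interface used as a constraint; its finite type set is exactly {int, string}:

package main

import "fmt"

// IntOrString has the type set {int, string}.
// Interfaces containing type unions can only be used as constraints,
// not as ordinary variable types.
type IntOrString interface {
    int | string
}

func Describe[T IntOrString](v T) string {
    return fmt.Sprintf("%T: %v", v, v)
}

func main() {
    fmt.Println(Describe(42))      // int: 42
    fmt.Println(Describe("hello")) // string: hello
    // Describe(3.14) would not compile: float64 is not in the type set.
}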

Type Inference

Go's compiler automatically infers type arguments from function arguments in most cases, reducing verbosity while maintaining full type safety—explicit type arguments are only needed when inference is ambiguous.

func Map[T, U any](s []T, f func(T) U) []U { ... }

// Type inference in action:
result := Map([]int{1, 2, 3}, func(x int) string { return strconv.Itoa(x) })
// T=int, U=string inferred automatically

Generic Functions

Generic functions accept type parameters, enabling algorithms that work across types; the type parameter is resolved at compile time, generating specialized code with zero runtime overhead.

func Filter[T any](slice []T, predicate func(T) bool) []T {
    result := make([]T, 0)
    for _, v := range slice {
        if predicate(v) {
            result = append(result, v)
        }
    }
    return result
}
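
For instance, a minimal usage sketch of the Filter function above; both type arguments are inferred from the slice argument:

evens := Filter([]int{1, 2, 3, 4, 5, 6}, func(n int) bool { return n%2 == 0 })
// evens == []int{2, 4, 6}; T is inferred as int

nonEmpty := Filter([]string{"go", "", "generics"}, func(s string) bool { return s != "" })
// nonEmpty == []string{"go", "generics"}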

Generic Types

Structs, interfaces, and other type definitions can have type parameters, enabling type-safe containers and data structures without interface{}/any boxing overhead.

type Stack[T any] struct {
    items []T
}

func (s *Stack[T]) Push(item T) {
    s.items = append(s.items, item)
}

func (s *Stack[T]) Pop() T {
    item := s.items[len(s.items)-1]
    s.items = s.items[:len(s.items)-1]
    return item
}
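
A short usage sketch of the Stack above; note that Pop as written panics on an empty stack, so real code often returns (T, bool) instead:

s := &Stack[int]{}
s.Push(1)
s.Push(2)
fmt.Println(s.Pop()) // 2 — LIFO order
fmt.Println(s.Pop()) // 1
// Calling Pop on an empty stack would panic with an index-out-of-range error.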

Generic Methods Limitations

Methods cannot have their own type parameters beyond those declared on the receiver type—this is a deliberate design decision to avoid complexity in method sets and interface satisfaction.

type Container[T any] struct{ value T }

// ✅ Valid - uses the type's parameter
func (c Container[T]) Get() T { return c.value }

// ❌ Invalid - methods can't add new type params
// func (c Container[T]) Convert[U any]() U { }

// ✅ Workaround: use a function
func Convert[T, U any](c Container[T], f func(T) U) U {
    return f(c.value)
}

comparable Constraint

The built-in comparable constraint includes all types that support == and != operators, essential for map keys and equality-based algorithms; it's predeclared and cannot be redefined.

func Contains[T comparable](slice []T, target T) bool {
    for _, v := range slice {
        if v == target { // Requires comparable
            return true
        }
    }
    return false
}

// Works: Contains([]int{1,2,3}, 2)
// Fails: Contains([][]int{...}, target) // slices are not comparable

any Constraint

any is an alias for interface{} introduced in Go 1.18; as a constraint, its type set contains all types, so use it when you need maximum flexibility with no operation requirements.

// These two forms are equivalent:
func Process[T any](v T)         { }
func Process[T interface{}](v T) { }

// any is also useful outside generics:
var data any = 42
data = "now a string" // Replaces interface{}

Ordered Types

Ordered types support comparison operators (<, <=, >, >=) and are essential for sorting, min/max, and binary search algorithms; defined in the experimental constraints package.

type Ordered interface {
    ~int | ~int8 | ~int16 | ~int32 | ~int64 |
        ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
        ~float32 | ~float64 | ~string
}

func Min[T Ordered](a, b T) T {
    if a < b {
        return a
    }
    return b
}

constraints Package

The golang.org/x/exp/constraints package provides common constraint definitions like Ordered, Signed, Unsigned, Integer, and Float; it is still experimental but widely used (Go 1.21 promoted the ordered subset to the standard library as cmp.Ordered).

import "golang.org/x/exp/constraints" func Abs[T constraints.Signed](x T) T { if x < 0 { return -x } return x } func Sum[T constraints.Integer | constraints.Float](nums ...T) T { var sum T for _, n := range nums { sum += n } return sum }

Generic Data Structures

Generics enable type-safe, reusable data structures without runtime type assertions; common implementations include linked lists, trees, heaps, and graphs that work with any type.

type Node[T any] struct {
    Value T
    Next  *Node[T]
}

type LinkedList[T any] struct {
    Head *Node[T]
    Len  int
}

func (l *LinkedList[T]) Append(value T) {
    node := &Node[T]{Value: value}
    if l.Head == nil {
        l.Head = node
    } else {
        curr := l.Head
        for curr.Next != nil {
            curr = curr.Next
        }
        curr.Next = node
    }
    l.Len++
}

Generic Algorithms

Generic algorithms implement common patterns (map, filter, reduce, sort) once and reuse across types, eliminating code duplication while maintaining type safety and performance.

func Map[T, U any](s []T, f func(T) U) []U {
    r := make([]U, len(s))
    for i, v := range s {
        r[i] = f(v)
    }
    return r
}

func Reduce[T, U any](s []T, init U, f func(U, T) U) U {
    acc := init
    for _, v := range s {
        acc = f(acc, v)
    }
    return acc
}

// Usage:
sum := Reduce([]int{1, 2, 3}, 0, func(a, b int) int { return a + b })

Workspaces (Go 1.18+)

go.work File

The go.work file defines a workspace containing multiple modules, allowing you to work on interdependent modules simultaneously without publishing or using replace directives in go.mod.

// go.work
go 1.21

use (
    ./api
    ./common
    ./services/auth
)

replace example.com/old => ./legacy

Multi-module Workspaces

Workspaces solve the multi-module development pain point where you need to modify several related modules together; changes in one module are immediately visible to others without publishing.

┌─────────────────────────────────────┐
│ my-workspace/                       │
├─────────────────────────────────────┤
│ go.work                             │
│ ├── api/                            │
│ │   ├── go.mod (module api)         │
│ │   └── api.go                      │
│ ├── common/                         │
│ │   ├── go.mod (module common)      │
│ │   └── utils.go                    │
│ └── cmd/                            │
│     ├── go.mod (requires api,common)│
│     └── main.go                     │
└─────────────────────────────────────┘

go work init

go work init creates a new go.work file in the current directory, optionally adding specified modules; this is the starting point for setting up a multi-module workspace.

# Create empty workspace
go work init

# Create workspace with modules
go work init ./moduleA ./moduleB

# Result: go.work file created
# go 1.21
# use (
#     ./moduleA
#     ./moduleB
# )

go work use

go work use adds modules to an existing workspace, updating the go.work file; it can add individual modules or recursively discover all modules in a directory tree.

# Add single module
go work use ./newmodule

# Add all modules recursively
go work use -r .

# After running, go.work is updated:
# use (
#     ./existing
#     ./newmodule
# )

go work sync

go work sync synchronizes the workspace's dependency requirements back to each module's go.mod file, ensuring consistency when modules have interdependencies.

go work sync

# What it does:
# 1. Computes minimal version requirements
# 2. Updates each module's go.mod
# 3. Ensures go.sum files are consistent
#
# ┌─────────┐     ┌─────────┐     ┌─────────┐
# │ go.work │────▶│  sync   │────▶│ go.mod  │
# └─────────┘     └─────────┘     │ go.mod  │
#                                 │ go.mod  │
#                                 └─────────┘

Workspace Benefits

Workspaces enable seamless multi-module development, eliminate replace directive hacks, improve IDE support across modules, and make monorepo-style development in Go practical and clean.

Benefits:
┌────────────────────────────────────────────┐
│ ✅ No more replace directives in go.mod    │
│ ✅ Changes visible immediately across mods │
│ ✅ Single go.work, not per-module hacks    │
│ ✅ IDE understands cross-module refs       │
│ ✅ go.work not committed (local dev only)  │
│ ✅ CI uses published modules as normal     │
└────────────────────────────────────────────┘

Performance Optimization

Profiling (CPU, memory, block, mutex)

Go provides built-in profiling for CPU time, memory allocations, goroutine blocking, and mutex contention; these profiles identify hotspots and guide optimization efforts with real data.

import _ "net/http/pprof" func main() { go func() { log.Println(http.ListenAndServe(":6060", nil)) }() // Your app... } // Profile types: // /debug/pprof/profile - CPU (30s default) // /debug/pprof/heap - Memory allocations // /debug/pprof/block - Goroutine blocking // /debug/pprof/mutex - Mutex contention // /debug/pprof/goroutine - Goroutine stacks

pprof Package

The runtime/pprof package provides programmatic control over profiling, allowing you to start/stop profiles, write them to files, and integrate profiling into test suites.

import "runtime/pprof" func main() { // CPU profile f, _ := os.Create("cpu.prof") pprof.StartCPUProfile(f) defer pprof.StopCPUProfile() // Run workload... // Heap profile (snapshot) h, _ := os.Create("heap.prof") pprof.WriteHeapProfile(h) }

go tool pprof

go tool pprof analyzes profile data interactively or generates visualizations; it can read from files, URLs, or compare profiles to measure optimization impact.

# Interactive analysis
go tool pprof cpu.prof
(pprof) top10          # Top 10 functions
(pprof) list funcName  # Source-annotated view
(pprof) web            # Open graph in browser

# One-liner visualizations
go tool pprof -http=:8080 cpu.prof    # Web UI
go tool pprof -png cpu.prof > cpu.png

# Profile a running server
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

Runtime Profiling

Runtime profiling captures data from production systems with minimal overhead; use runtime.SetBlockProfileRate, runtime.SetMutexProfileFraction to control sampling rates.

import "runtime" func init() { // Enable block profiling (1 = all events) runtime.SetBlockProfileRate(1) // Enable mutex profiling (fraction of events) runtime.SetMutexProfileFraction(5) // Control memory profiling rate runtime.MemProfileRate = 512 * 1024 // Sample every 512KB }

Continuous Profiling

Continuous profiling collects low-overhead profiles in production over time, enabling historical analysis of performance trends; tools like Google Cloud Profiler, Pyroscope, or Parca integrate with Go apps.

// Example: Google Cloud Profiler
import "cloud.google.com/go/profiler"

func main() {
    cfg := profiler.Config{
        Service:        "my-service",
        ServiceVersion: "1.0.0",
        ProjectID:      "my-project",
    }
    if err := profiler.Start(cfg); err != nil {
        log.Fatal(err)
    }
    // Profiles automatically uploaded to Cloud Console
}

Benchmark-driven Optimization

Write benchmarks first, then optimize; Go's testing package provides reliable measurement with automatic iteration count adjustment and memory allocation statistics.

var s1, s2 = "hello", "world" // package-level vars prevent constant folding

func BenchmarkConcat(b *testing.B) {
    b.Run("Plus", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            _ = s1 + s2
        }
    })
    b.Run("Builder", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            var sb strings.Builder
            sb.WriteString(s1)
            sb.WriteString(s2)
            _ = sb.String()
        }
    })
}

// Run: go test -bench=. -benchmem

Escape Analysis

Escape analysis determines whether variables can live on the stack (fast) or must escape to the heap (slower, requires GC); understanding it helps write allocation-efficient code.

go build -gcflags="-m" main.go # Output examples: # ./main.go:10: x escapes to heap ← BAD: heap alloc # ./main.go:15: y does not escape ← GOOD: stack alloc # Common escape causes: # - Returning pointer to local var # - Storing in interface{} # - Sending pointer to channel # - Captured by closure
func NoEscape() int { x := 42 // Stack allocated return x // Copied, x doesn't escape } func Escapes() *int { x := 42 // Heap allocated! return &x // Pointer escapes function }

Inlining

Inlining replaces function calls with the function body, eliminating call overhead and enabling further optimizations; the compiler inlines small, simple functions automatically.

go build -gcflags="-m" main.go # ./main.go:5: can inline add # ./main.go:10: inlining call to add # Control inlining: //go:noinline func mustNotInline() { } # Check inlining budget: go build -gcflags="-m=2" main.go
// Likely inlined (simple, small) func add(a, b int) int { return a + b } // Won't inline (too complex, has loop) func sum(nums []int) int { total := 0 for _, n := range nums { total += n } return total }

Bounds Check Elimination

The compiler eliminates redundant array/slice bounds checks when it can prove access is safe; explicit length checks before loops help the compiler optimize.

// BCE not possible - checked each iteration
func slow(s []int) int {
    sum := 0
    for i := 0; i < 100; i++ {
        sum += s[i] // Bounds check every time
    }
    return sum
}

// BCE applied - compiler knows it's safe
func fast(s []int) int {
    if len(s) < 100 {
        return 0
    }
    s = s[:100] // Hint to the compiler
    sum := 0
    for i := 0; i < 100; i++ {
        sum += s[i] // No bounds check!
    }
    return sum
}

Reducing Allocations

Minimize heap allocations by reusing objects, using value types, avoiding interface conversions, and preallocating slices/maps; fewer allocations mean less GC pressure.

// ❌ Allocates each call
func process() *Result {
    return &Result{data: make([]byte, 1024)}
}

// ✅ Reuse with sync.Pool
var resultPool = sync.Pool{
    New: func() any {
        return &Result{data: make([]byte, 1024)}
    },
}

func processPooled() *Result {
    r := resultPool.Get().(*Result)
    // Use r...
    return r
}

func done(r *Result) {
    r.Reset()
    resultPool.Put(r)
}

String Concatenation Optimization

For multiple string concatenations, use strings.Builder instead of +; it minimizes allocations by growing a single buffer, dramatically improving performance for loops.

// ❌ O(n²) - each + creates a new string
func slowConcat(parts []string) string {
    result := ""
    for _, p := range parts {
        result += p // Allocates a new string each time!
    }
    return result
}

// ✅ O(n) - single buffer, grows efficiently
func fastConcat(parts []string) string {
    var sb strings.Builder
    sb.Grow(1024) // Pre-size if the total length is known
    for _, p := range parts {
        sb.WriteString(p)
    }
    return sb.String()
}

Slice Pre-allocation

Pre-allocate slices to their expected capacity using make([]T, 0, cap) to avoid repeated reallocations and copies during append operations.

// ❌ Multiple reallocations
func slow(n int) []int {
    var result []int
    for i := 0; i < n; i++ {
        result = append(result, i) // Grows: 0→1→2→4→8→16...
    }
    return result
}

// ✅ Single allocation
func fast(n int) []int {
    result := make([]int, 0, n) // Pre-allocate capacity
    for i := 0; i < n; i++ {
        result = append(result, i) // No reallocation
    }
    return result
}

Map Pre-allocation

Pre-allocate maps with expected size using make(map[K]V, size) to reduce rehashing; this is especially important for large maps built in loops.

// ❌ Multiple rehashes as the map grows
func slow(keys []string) map[string]int {
    m := make(map[string]int) // Default small size
    for i, k := range keys {
        m[k] = i // Triggers rehashing while growing
    }
    return m
}

// ✅ Single allocation, no rehashing
func fast(keys []string) map[string]int {
    m := make(map[string]int, len(keys)) // Pre-size
    for i, k := range keys {
        m[k] = i
    }
    return m
}

sync.Pool Usage

sync.Pool provides per-CPU caches for reusing temporary objects, reducing GC pressure; objects may be garbage collected between uses, so pools are best for short-lived, frequently-allocated objects.

var bufPool = sync.Pool{
    New: func() any { return make([]byte, 4096) },
}

func processRequest(data []byte) {
    buf := bufPool.Get().([]byte)
    defer bufPool.Put(buf)
    // Use buf for temporary work...
    copy(buf, data)
    // Process...
}

// ┌─────────────────────────────────────┐
// │ sync.Pool Behavior                  │
// │ • Per-P (processor) local pools     │
// │ • Objects may be GC'd anytime       │
// │ • Best for short-lived allocs       │
// │ • Not for connection pooling!       │
// └─────────────────────────────────────┘

Object Pooling

For expensive objects (buffers, connections, compiled regexes), implement custom pools with explicit lifecycle management when sync.Pool's GC-friendly semantics don't fit.

type ConnPool struct {
    conns chan *Connection
    max   int
}

func NewConnPool(max int, factory func() *Connection) *ConnPool {
    p := &ConnPool{conns: make(chan *Connection, max), max: max}
    for i := 0; i < max; i++ {
        p.conns <- factory()
    }
    return p
}

func (p *ConnPool) Get() *Connection {
    return <-p.conns // Blocks if empty
}

func (p *ConnPool) Put(c *Connection) {
    select {
    case p.conns <- c: // Return to pool
    default: // Pool full, discard
        c.Close()
    }
}
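
A brief usage sketch, assuming a hypothetical dial function and Connection type consistent with the pool above:

pool := NewConnPool(10, func() *Connection { return dial() }) // pre-fill with 10 connections

conn := pool.Get() // blocks until a connection is available
// ... use conn ...
pool.Put(conn)     // return it (or it is closed if the pool is already full)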

Memory Management

Stack vs Heap Allocation

Stack allocation is fast (pointer bump) and automatically freed; heap allocation involves the allocator and GC. Go's compiler decides placement via escape analysis—prefer values over pointers when possible.

┌────────────────────────────────────────────────┐
│ STACK (fast)              HEAP (slower)        │
├────────────────────────────────────────────────┤
│ • Grows per goroutine (2KB up to ~1GB)         │
│ • Auto cleanup on function return              │
│ • No GC overhead                               │
│                                                │
│ func foo() {              func bar() *int {    │
│     x := 42  ← STACK          x := 42  ← HEAP  │
│     use(x)                    return &x        │
│ }                         }                    │
└────────────────────────────────────────────────┘

Escape Analysis

Escape analysis is the compiler's process of determining if a variable's lifetime exceeds its scope; if it does, the variable "escapes" to the heap. Use -gcflags="-m" to see decisions.

// View escape decisions:
// go build -gcflags="-m" .

func stackOnly() {
    data := [1000]int{} // Stays on stack
    process(data[:])
}

func escapesToHeap() *[]int {
    data := make([]int, 1000) // Escapes!
    return &data
}

// Common escape triggers:
// 1. Returning pointers to local variables
// 2. Storing in a package-level variable
// 3. Sending to a channel
// 4. Storing in an interface value
// 5. Closure capturing by reference

GC Tuning (GOGC)

GOGC controls GC frequency as a percentage of heap growth before next GC; default is 100 (double heap triggers GC). Lower values reduce memory, higher values reduce CPU overhead.

# Default: GC when heap doubles
GOGC=100 ./myapp

# Less memory, more CPU (GC at 50% growth)
GOGC=50 ./myapp

# More memory, less CPU (GC at 200% growth)
GOGC=200 ./myapp

# Disable GC (dangerous!)
GOGC=off ./myapp

import "runtime/debug"

// Set programmatically
debug.SetGCPercent(50)

// Go 1.19+: soft memory limit
debug.SetMemoryLimit(1 << 30) // 1GB

GC Trace

Enable GC tracing with GODEBUG=gctrace=1 to see detailed GC activity including pause times, heap sizes, and CPU utilization—essential for diagnosing GC-related performance issues.

GODEBUG=gctrace=1 ./myapp

# Output format:
# gc 1 @0.012s 2%: 0.026+0.42+0.005 ms clock, 0.21+0.35/0.42/0+0.041 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
#
# gc 1       - GC cycle number
# @0.012s    - Time since start
# 2%         - CPU used by GC
# 0.026+...  - STW + concurrent + STW times
# 4->4->0 MB - Heap before→after→live
# 5 MB goal  - Target heap size
# 8 P        - Number of processors

Memory Ballast

Memory ballast is a pre-allocated but unused chunk of heap that delays GC triggering; it reduces GC frequency in services with low live heap but high allocation rate—less relevant since Go 1.19's SetMemoryLimit.

func main() {
    // Old trick: allocate a 10GB ballast
    ballast := make([]byte, 10<<30)
    _ = ballast // Keep alive

    // With GOGC=100, GC triggers at ~20GB
    // instead of when the actual live heap doubles

    // Go 1.19+ preferred approach:
    debug.SetMemoryLimit(10 << 30)
}

Memory Profiling

Memory profiling captures allocation counts and sizes at sampled callsites; use go tool pprof to analyze heap profiles and identify allocation hotspots.

import "runtime/pprof" func captureHeapProfile() { f, _ := os.Create("heap.prof") defer f.Close() runtime.GC() // Get accurate live objects pprof.WriteHeapProfile(f) }
# Analyze go tool pprof heap.prof (pprof) top # Top allocators (pprof) alloc_space # Total bytes allocated (pprof) inuse_space # Currently held bytes # Live profiling go tool pprof http://localhost:6060/debug/pprof/heap

Memory Leaks Detection

Go memory leaks are typically goroutine leaks (goroutines blocked forever holding references); profile goroutines and heap over time to detect growth patterns.

// Common leak patterns:

// 1. Blocked goroutine
go func() {
    <-ch // If ch never receives, the goroutine leaks
}()

// 2. Forgotten timer
ticker := time.NewTicker(time.Second)
// Missing: defer ticker.Stop()

// Detection:
func debugGoroutines(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Goroutines: %d\n", runtime.NumGoroutine())
    pprof.Lookup("goroutine").WriteTo(w, 1)
}

// Tool: goleak in tests
func TestNoLeak(t *testing.T) {
    defer goleak.VerifyNone(t)
    // test code...
}

Finalizers (runtime.SetFinalizer)

Finalizers run when an object is about to be garbage collected; use sparingly for releasing external resources—they add GC overhead and have no timing guarantees.

type Resource struct {
    handle int
}

func NewResource() *Resource {
    r := &Resource{handle: openExternal()}
    runtime.SetFinalizer(r, func(r *Resource) {
        closeExternal(r.handle) // Cleanup
        fmt.Println("Resource finalized")
    })
    return r
}

// ⚠️ Finalizer caveats:
// • No guarantee when (or if!) it runs
// • Adds GC overhead
// • Object survives one extra GC cycle
// • Prefer explicit Close() methods

Weak References (not native)

Go doesn't have native weak references prior to Go 1.24's weak package; until then, use a mutex-guarded map with periodic cleanup as a workaround. Custom solutions risk fighting the GC.

// Workaround: manual cache with TTL
type WeakCache struct {
    mu    sync.RWMutex
    items map[string]cacheEntry
}

type cacheEntry struct {
    value     any
    expiresAt time.Time
}

func (c *WeakCache) cleanup() {
    c.mu.Lock()
    defer c.mu.Unlock()
    now := time.Now()
    for k, v := range c.items {
        if now.After(v.expiresAt) {
            delete(c.items, k)
        }
    }
}

// Note: Go 1.24 adds a weak pointer package (weak)!

Assembly in Go

Plan 9 Assembly

Go uses Plan 9 assembly syntax, which differs from Intel/AT&T conventions; it's platform-agnostic with pseudo-registers and Go-specific conventions for interoperability with the Go runtime.

┌────────────────────────────────────────────────┐
│ Plan 9 Pseudo-Registers                        │
├────────────────────────────────────────────────┤
│ FP - Frame Pointer (args)    first_arg+0(FP)   │
│ SP - Stack Pointer (locals)  local_var-8(SP)   │
│ SB - Static Base (globals)   symbol(SB)        │
│ PC - Program Counter                           │
└────────────────────────────────────────────────┘

// Example: add.s
TEXT ·Add(SB), NOSPLIT, $0-24
    MOVQ a+0(FP), AX    // Load first arg
    MOVQ b+8(FP), BX    // Load second arg
    ADDQ BX, AX         // Add
    MOVQ AX, ret+16(FP) // Store return
    RET

Function Calls in Assembly

Assembly functions follow Go's calling convention; arguments and return values are passed via the stack (referenced through FP), and functions must match their Go declarations exactly.

// add.go
package math

func Add(a, b int64) int64

// add_amd64.s
#include "textflag.h"

// func Add(a, b int64) int64
TEXT ·Add(SB), NOSPLIT, $0-24
    // Args at FP offsets: a+0, b+8, ret+16
    MOVQ a+0(FP), AX
    MOVQ b+8(FP), BX
    ADDQ BX, AX
    MOVQ AX, ret+16(FP)
    RET

Go Assembly Syntax

Go assembly uses specific syntax for sizes (B=1, W=2, L=4, Q=8 bytes), memory references, and directives; understanding TEXT, DATA, GLOBL directives is essential.

#include "textflag.h" // TEXT: define function // ·Name = package.Name (· is middle dot) // (SB) = relative to static base // NOSPLIT = no stack split check // $0-24 = 0 local bytes, 24 arg bytes TEXT ·Swap(SB), NOSPLIT, $0-16 MOVQ x+0(FP), AX // Q = quadword (8 bytes) MOVQ y+8(FP), BX MOVQ BX, x+0(FP) MOVQ AX, y+8(FP) RET // DATA: define data DATA msg<>+0(SB)/8, $"Hello, W" DATA msg<>+8(SB)/5, $"orld" GLOBL msg<>(SB), RODATA, $13

When to Use Assembly

Use assembly only when profiling proves it's necessary: SIMD operations, crypto primitives, or hot paths where the compiler generates suboptimal code; 99% of Go code never needs assembly.

┌─────────────────────────────────────────────┐
│ When to Consider Assembly:                  │
├─────────────────────────────────────────────┤
│ ✅ CPU-specific optimizations (SIMD/AVX)   │
│ ✅ Crypto (AES-NI, SHA extensions)          │
│ ✅ Hot inner loops (after profiling!)       │
│ ✅ Accessing CPU features (CPUID, etc)      │
├─────────────────────────────────────────────┤
│ ❌ Premature optimization                   │
│ ❌ Simple operations (compiler is good!)    │
│ ❌ Portability is important                 │
│ ❌ Maintainability matters                  │
└─────────────────────────────────────────────┘

Assembly File Naming

Assembly files must follow naming conventions: name_GOOS_GOARCH.s for OS/arch-specific, name_GOARCH.s for arch-only, or name.s for generic; the build system selects appropriate files automatically.

project/
├── add.go          # Go declaration
├── add_amd64.s     # AMD64 implementation
├── add_arm64.s     # ARM64 implementation
└── add_generic.go  # Fallback (with build tag)

// add_generic.go
//go:build !amd64 && !arm64

package math

func Add(a, b int64) int64 { return a + b }

CGO

CGO Basics

CGO enables calling C code from Go and vice versa; import the pseudo-package "C" with preceding C code in a comment block, but use it sparingly due to complexity and performance costs.

package main

/*
#include <stdio.h>
#include <stdlib.h>

void hello(const char* name) {
    printf("Hello, %s!\n", name);
}
*/
import "C"
import "unsafe"

func main() {
    name := C.CString("Gopher")
    defer C.free(unsafe.Pointer(name))
    C.hello(name)
}

C Code in Go

C code can be written directly in the comment block before import "C" or in separate .c files in the same package; the Go build system compiles and links C code automatically.

/*
#include <math.h>

// Define a C function inline
double circleArea(double radius) {
    return M_PI * radius * radius;
}
*/
import "C"
import "fmt"

func main() {
    area := C.circleArea(5.0)
    fmt.Printf("Area: %f\n", area)
}

Go Code in C

Export Go functions to C using //export directive; these can be called from C code in the same package or from external C code linking against the Go library.

package main import "C" import "fmt" //export GoCallback func GoCallback(x C.int) C.int { fmt.Printf("Go received: %d\n", int(x)) return x * 2 } /* #include <stdio.h> extern int GoCallback(int); void callGo() { int result = GoCallback(21); printf("C received: %d\n", result); } */ import "C" func main() { C.callGo() }

Calling C Functions

C functions are accessed through the C pseudo-package; pass Go values converted to C types, and always free memory allocated by C when you're done.

/*
#include <stdlib.h>
#include <string.h>

char* duplicate(const char* s) {
    return strdup(s);
}
*/
import "C"
import "unsafe"

func main() {
    input := C.CString("hello")
    defer C.free(unsafe.Pointer(input))

    output := C.duplicate(input)
    defer C.free(unsafe.Pointer(output))

    goString := C.GoString(output)
    println(goString)
}

C Types in Go

CGO automatically maps C types to Go types; use C.int, C.double, etc., with explicit conversions between Go and C types. Complex types require more care.

/*
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int32_t x;
    int32_t y;
} Point;
*/
import "C"
import "unsafe"

func main() {
    // Numeric types
    var i C.int = 42
    var d C.double = 3.14
    goInt := int(i)

    // Struct
    p := C.Point{x: 10, y: 20}

    // String conversion
    cs := C.CString("hello") // Go→C (must free!)
    defer C.free(unsafe.Pointer(cs))
    gs := C.GoString(cs) // C→Go

    // Bytes
    data := []byte{1, 2, 3}
    cdata := (*C.char)(unsafe.Pointer(&data[0]))

    _, _, _, _, _ = d, goInt, p, gs, cdata // keep the examples referenced
}

Memory Management with CGO

Go's GC doesn't track C memory; you must manually free C allocations, and passing Go pointers to C requires following strict rules to prevent GC from moving or collecting them.

/*
#include <stdlib.h>

static void use(char* p) { (void)p; } // stub so the example compiles
*/
import "C"
import "unsafe"

func main() {
    // C allocated - YOU must free
    cstr := C.CString("hello")
    defer C.free(unsafe.Pointer(cstr))

    cmem := C.malloc(100)
    defer C.free(cmem)

    // Go allocated - the GC manages it
    goSlice := make([]byte, 100)

    // Pass to C: only if C won't store the pointer!
    C.use((*C.char)(unsafe.Pointer(&goSlice[0])))
}

// ⚠️ Rules for passing Go pointers to C:
// 1. A Go pointer may only point to Go memory
// 2. C must not keep the Go pointer after the call returns
// 3. The Go memory passed must not itself contain Go pointers

Performance Implications

CGO calls have significant overhead (~150ns vs ~1ns for Go calls) due to stack switching, scheduler coordination, and Go→C ABI translation; batch operations when possible.

┌────────────────────────────────────────────────┐
│ CGO Call Overhead                              │
├────────────────────────────────────────────────┤
│ Pure Go call: ~1-2 ns                          │
│ CGO call:     ~100-200 ns (50-100x slower!)    │
├────────────────────────────────────────────────┤
│ Why so slow:                                   │
│ • Save/restore Go scheduler state              │
│ • Switch to system stack                       │
│ • Coordinate with GC (write barriers)          │
│ • ABI translation                              │
├────────────────────────────────────────────────┤
│ Mitigation:                                    │
│ • Batch multiple operations into one CGO call  │
│ • Do work in Go when possible                  │
│ • Consider pure Go alternatives                │
└────────────────────────────────────────────────┘
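
As an illustration of batching, the sketch below (with a hypothetical sum_array C routine) crosses the Go↔C boundary once for an entire slice instead of once per element:

package main

/*
#include <stdint.h>

// Hypothetical C routine: sums an array in a single call.
static int64_t sum_array(const int64_t* data, int n) {
    int64_t total = 0;
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
*/
import "C"
import (
    "fmt"
    "unsafe"
)

// SumViaC makes one CGO call for the whole slice, amortizing the
// ~100-200ns call overhead across all elements.
func SumViaC(nums []int64) int64 {
    if len(nums) == 0 {
        return 0
    }
    return int64(C.sum_array((*C.int64_t)(unsafe.Pointer(&nums[0])), C.int(len(nums))))
}

func main() {
    fmt.Println(SumViaC([]int64{1, 2, 3, 4})) // 10
}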

Cross-compilation with CGO

Cross-compiling with CGO requires a C cross-compiler for the target platform; this is complex and often requires Docker or target-native builds. Consider CGO-free alternatives.

# Without CGO: easy cross-compilation
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build

# With CGO: need a cross-compiler
CGO_ENABLED=1 \
CC=aarch64-linux-gnu-gcc \
GOOS=linux GOARCH=arm64 \
go build

# Easier: build on the target or use Docker
docker run --rm -v "$PWD":/app -w /app \
    golang:1.21 go build

# Check CGO status
go env CGO_ENABLED

CGO_ENABLED Flag

CGO_ENABLED controls whether CGO is available; 0 produces static binaries without C dependencies, 1 enables CGO. Some stdlib packages (net, os/user) have CGO implementations.

# Disable CGO (static, portable)
CGO_ENABLED=0 go build -o app-static

# Enable CGO (default on many systems)
CGO_ENABLED=1 go build -o app-dynamic

# Force pure Go stdlib implementations
CGO_ENABLED=0 go build -tags netgo,osusergo

# Check if a binary uses CGO
ldd ./app-static   # "not a dynamic executable"
ldd ./app-dynamic  # lists libc, etc.

Build Constraints

Use build constraints to provide CGO and non-CGO implementations; this allows your package to work with or without CGO available.

// impl_cgo.go
//go:build cgo

package mypackage

/*
#include <somelib.h>
*/
import "C"

func DoWork() {
    C.someFunction()
}

// impl_nocgo.go
//go:build !cgo

package mypackage

func DoWork() {
    // Pure Go implementation
}

Build Process

go build Flags

go build accepts numerous flags to control compilation, output, and optimization; key flags include -o (output), -v (verbose), -race (detector), -trimpath (reproducible builds).

# Common flags
go build -o myapp         # Output name
go build -v               # Verbose
go build -race            # Race detector
go build -trimpath        # Remove file paths
go build -mod=readonly    # Don't update go.mod
go build -buildvcs=false  # No VCS info

# Combine
go build -v -race -o myapp ./cmd/server

# Build all packages
go build ./...

# Build for a specific OS/arch
GOOS=linux GOARCH=amd64 go build

-ldflags

-ldflags passes flags to the linker; commonly used to inject version info at build time and strip debug symbols for smaller binaries.

# Inject version at build time
go build -ldflags "-X main.Version=1.2.3 \
    -X 'main.BuildTime=$(date)' \
    -X main.Commit=$(git rev-parse HEAD)"

# Strip debug info (smaller binary)
go build -ldflags "-s -w"

# Combined
go build -ldflags "-s -w -X main.Version=1.0.0"

package main

import "fmt"

var (
    Version   = "dev"
    BuildTime = "unknown"
    Commit    = "none"
)

func main() {
    fmt.Printf("Version: %s, Built: %s\n", Version, BuildTime)
}

-gcflags

-gcflags passes flags to the Go compiler; useful for debugging (disable optimization), analysis (escape analysis), and seeing compiler decisions.

# Escape analysis
go build -gcflags="-m"    # Basic
go build -gcflags="-m=2"  # Detailed

# Disable optimizations (for debugging)
go build -gcflags="-N -l"
# -N: disable optimizations
# -l: disable inlining

# Pass flags to specific packages
go build -gcflags="main=-m" ./...

# See generated assembly
go build -gcflags="-S" 2>&1 | head -100

-tags (Build Tags)

Build tags allow conditional compilation; files or code blocks are included only when specified tags match. Use -tags to specify which tags to include.

go build -tags="prod,metrics" go build -tags="integration"
// +build tag syntax (old, still works)
// logging_debug.go
// +build debug

// go:build syntax (Go 1.17+, preferred)
// logging_prod.go
//go:build prod

package logging

// Combined with boolean logic:
//go:build (linux && amd64) || darwin
//go:build !windows
//go:build cgo && sqlite

Build Constraints (//go:build)

Build constraints control when files are included based on OS, arch, compiler, tags, and Go version; the //go:build line must appear before the package clause (preceded only by blank lines and other comments) and be followed by a blank line.

//go:build linux && amd64

package main

// File only included for linux/amd64

//go:build ignore

// File always excluded from builds (useful for examples and generators)
Valid constraints:
┌────────────────────────────────────────────┐
│ GOOS:   linux, darwin, windows, etc.       │
│ GOARCH: amd64, arm64, wasm, etc.          │
│ Compiler: gc, gccgo                        │
│ Version: go1.21, go1.22                    │
│ cgo: cgo or !cgo                           │
│ Custom: -tags=mytag                        │
│ Boolean: && (and), || (or), ! (not)        │
└────────────────────────────────────────────┘

Cross-compilation

Go supports cross-compilation out of the box—set GOOS and GOARCH to target any supported platform. CGO_ENABLED=0 for pure Go ensures this works without C toolchains.

# Common targets
GOOS=linux GOARCH=amd64 go build -o app-linux-amd64
GOOS=darwin GOARCH=arm64 go build -o app-macos-arm64
GOOS=windows GOARCH=amd64 go build -o app.exe

# List all supported combinations
go tool dist list

# WebAssembly
GOOS=js GOARCH=wasm go build -o app.wasm

# Build script for all platforms
#!/bin/bash
for os in linux darwin windows; do
    for arch in amd64 arm64; do
        GOOS=$os GOARCH=$arch go build -o "app-$os-$arch"
    done
done

GOOS and GOARCH

GOOS specifies the target operating system, GOARCH the architecture; Go supports many combinations, and you can check valid pairs with go tool dist list.

# Current settings
go env GOOS GOARCH

# Common GOOS values:
# linux, darwin (macOS), windows, freebsd,
# android, ios, js (WebAssembly)

# Common GOARCH values:
# amd64 (x86-64), arm64 (Apple M1, AWS Graviton),
# arm, 386, wasm, riscv64

# Practical examples:
GOOS=linux GOARCH=arm64   # AWS Graviton
GOOS=darwin GOARCH=arm64  # Apple Silicon
GOOS=linux GOARCH=arm     # Raspberry Pi
GOOS=js GOARCH=wasm       # Browser

Reducing Binary Size

Combine multiple techniques to reduce Go binary size: strip symbols, disable DWARF, use UPX compression, and avoid unnecessary dependencies.

# Baseline
go build -o app                             # ~10MB

# Strip symbol table and DWARF
go build -ldflags="-s -w" -o app            # ~7MB

# Use trimpath (also helps)
go build -ldflags="-s -w" -trimpath -o app

# UPX compression (after build)
upx --best app                              # ~2-3MB

# Check what's in the binary
go tool nm app | head
go tool objdump -s main app

# Analyze size by symbol
go build -o app
go tool nm -size app | sort -n

Stripping Debug Info

Use -ldflags="-s -w" to strip symbol table (-s) and DWARF debug info (-w); this significantly reduces binary size but removes stack traces' file/line info.

# With debug info (default)
go build -o app-debug
ls -lh app-debug     # 12M

# Stripped
go build -ldflags="-s -w" -o app-stripped
ls -lh app-stripped  # 8M

# Trade-offs:
# ✅ Smaller binary (20-30% reduction)
# ✅ Faster load time
# ❌ Panic stack traces less detailed
# ❌ Harder to debug with delve
# ❌ No line numbers in profiles

UPX Compression

UPX (Ultimate Packer for eXecutables) compresses binaries with decompression at startup; useful for size-constrained deployments but adds startup latency.

# Install UPX
apt install upx   # Linux
brew install upx  # macOS

# Compress (after go build)
go build -ldflags="-s -w" -o app
upx --best app
# or for max compression:
upx --ultra-brute app

# Before: 8.0MB
# After:  2.5MB

# ⚠️ Caveats:
# • Startup time increases (~50-100ms)
# • Some antivirus tools flag UPX binaries
# • Memory usage is slightly higher
# • Can't be used with -race or profiling

go generate

go generate runs commands specified in source files; commonly used for code generation (stringer, mockgen, protobuf), embedding, or build-time processing.

//go:generate stringer -type=Status
//go:generate mockgen -source=interface.go -destination=mock.go
//go:generate protoc --go_out=. schema.proto

package main

type Status int

const (
    Pending Status = iota
    Running
    Complete
)

# Run all generate directives
go generate ./...

# Run for a specific package
go generate ./models

# Common generators:
# • stringer   - String() for constants
# • mockgen    - Mock implementations
# • protoc     - Protocol buffers
# • go-bindata - Embed files (legacy)
# • sqlc       - Type-safe SQL

Code Generation

Code generation creates Go source files programmatically; use text/template, go/ast, or dedicated tools to generate repetitive, type-specific, or derived code at build time.

// gen.go
//go:build ignore

package main

import (
    "os"
    "text/template"
)

var tmpl = `// Code generated; DO NOT EDIT.
package {{.Package}}
{{range .Types}}
func ({{.Name}}) Is{{.Name}}() {}
{{end}}
`

func main() {
    t := template.Must(template.New("").Parse(tmpl))
    f, _ := os.Create("generated.go")
    t.Execute(f, struct {
        Package string
        Types   []struct{ Name string }
    }{
        Package: "mypackage",
        Types:   []struct{ Name string }{{Name: "Foo"}, {Name: "Bar"}},
    })
}
//go:generate go run gen.go

Embedded Files (embed Package)

The embed package (Go 1.16+) embeds files into the binary at compile time; use //go:embed directives to include static assets, templates, or configuration.

package main

import (
    "embed"
    "fmt"
    "net/http"
)

//go:embed version.txt
var version string

//go:embed templates/*.html
var templates embed.FS

//go:embed static/*
var static embed.FS

func main() {
    fmt.Println("Version:", version)

    // Serve embedded files
    http.Handle("/static/", http.FileServer(http.FS(static)))

    // Read an embedded file
    data, _ := templates.ReadFile("templates/index.html")
    fmt.Println(string(data))
}
project/
├── main.go
├── version.txt
├── templates/
│   └── index.html
└── static/
    ├── style.css
    └── app.js

Compiler Directives

//go:noinline

//go:noinline prevents the compiler from inlining a function; useful for benchmarking (prevent optimization) or when inlining would cause issues with stack inspection.

//go:noinline
func mustNotInline(x int) int {
    return x * 2
}

// Use cases:
// • Accurate benchmarking (prevent dead code elimination)
// • Debugging (preserve function boundaries)
// • Stack trace requirements
// • Preventing code bloat for large hot functions

//go:inline (Hint)

Go doesn't have //go:inline—inlining is automatic based on complexity heuristics. You can encourage inlining by keeping functions small and simple; use -gcflags="-m" to verify.

// The compiler inlines automatically based on:
// • Function size (small = likely inline)
// • No loops (complex control flow = no inline)
// • No defer (usually prevents inlining)
// • Not recursive

// This WILL likely inline (simple, small)
func add(a, b int) int { return a + b }

// This WON'T inline (has a loop)
func sum(s []int) int {
    t := 0
    for _, v := range s {
        t += v
    }
    return t
}

// Check: go build -gcflags="-m"
// output: "can inline add"

//go:noescape

//go:noescape tells the compiler that a function's pointer arguments don't escape; used for assembly functions or special cases where escape analysis can't determine safety.

// Typically used for assembly functions
//go:noescape
func asmFunction(p *byte)

// The directive promises:
// • Pointers passed to this function won't be stored
// • Pointers won't be returned
// • Allows stack allocation of arguments

// ⚠️ Dangerous if misused!
// Only use it when you KNOW the implementation doesn't let pointers escape

//go:linkname

//go:linkname links a local variable/function to a symbol in another package; bypasses Go's visibility rules. Extremely dangerous, used in stdlib internals.

package main

import (
    _ "unsafe" // Required for linkname
)

// Link to the runtime's internal function
//go:linkname nanotime runtime.nanotime
func nanotime() int64

func main() {
    println(nanotime()) // Access the internal function
}

// ⚠️ DANGEROUS:
// • Bypasses API stability guarantees
// • Can break on Go version updates
// • Not for production code
// • Requires an unsafe import

//go:generate

//go:generate specifies commands to run with go generate; any text after the directive is executed as a shell command with the package directory as working directory.

//go:generate stringer -type=Pill
//go:generate mockgen -source=service.go -destination=mock_service.go
//go:generate go run scripts/gen.go

package pharmacy

type Pill int

const (
    Placebo Pill = iota
    Aspirin
    Ibuprofen
)

# Run generators
go generate ./...

# Special variables in generate commands:
# $GOFILE    - Current file
# $GOLINE    - Line number
# $GOPACKAGE - Package name
# $DOLLAR    - Literal $

//go:embed

//go:embed embeds files/directories into the binary; the directive must precede a variable of type string, []byte, or embed.FS. Patterns support glob syntax.

import _ "embed" //go:embed config.json var config []byte //go:embed version.txt var version string // Without newline //go:embed templates/* var templates embed.FS //go:embed assets/images/*.png assets/fonts/*.ttf var assets embed.FS // Pattern rules: // • No .. or absolute paths // • Patterns like *.txt, dir/*, dir/** // • Hidden files (_*) excluded by default // • Use all:dir/* to include hidden files

//go:build

//go:build specifies build constraints for the file using boolean expressions; it replaced the older // +build syntax and must appear before the package clause, preceded only by blank lines and other comments.

//go:build linux && amd64

package main

// Only compiled for linux/amd64

//go:build !windows
// Compiled for every OS except Windows

//go:build (linux || darwin) && cgo
// Compiled for Linux or macOS with CGO enabled

//go:build go1.21
// Only compiled with Go 1.21 or later

//go:build integration
// Only with: go test -tags=integration

// Valid operators: && (and), || (or), ! (not), ()
// Terms: GOOS, GOARCH, compiler, cgo, go version, custom tags