
Advanced GoLang: Generics, Optimization, CGO & Runtime Internals

The deep end of the pool. A comprehensive guide to Go's most advanced features: from writing generic data structures and dissecting memory allocation strategies to cross-compilation and reducing binary sizes.

Generics (Go 1.18+)

Type Parameters

Type parameters allow functions and types to work with any type specified at compile time, declared in square brackets before the function parameters, enabling code reuse without sacrificing type safety.

func Print[T any](value T) {
    fmt.Println(value)
}

Print[int](42) // Explicit type argument
Print("hello") // Type inferred

Type Constraints

Type constraints restrict what types can be used as type arguments, defined as interfaces that specify the required methods or types, ensuring the generic code can only use operations available on the constrained types.

type Number interface {
    int | int64 | float64
}

func Sum[T Number](a, b T) T {
    return a + b // Works because all Number types support +
}

Interface Constraints

Interfaces can now include both method signatures AND type lists, combining behavioral requirements with underlying type restrictions to create powerful constraints for generic functions.

type Stringer interface {
    ~string         // Underlying type constraint
    String() string // Method constraint
}

Type Sets

Every interface now defines a type set—the set of all types that implement it; for method-only interfaces, this is infinite, but type unions create finite sets, fundamentally changing how Go thinks about interfaces.

┌─────────────────────────────────────┐
│        interface{ int | string }    │
│  Type Set = { int, string }         │
├─────────────────────────────────────┤
│        interface{ Read([]byte) }    │
│  Type Set = { all types with Read } │
└─────────────────────────────────────┘
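
To make the distinction concrete, here is a minimal sketch (the names are illustrative) of a union interface used as a constraint; its finite type set is exactly {int, string}:

package main

import "fmt"

// IntOrString has the type set {int, string}.
// Interfaces containing type unions can only be used as constraints,
// not as ordinary variable types.
type IntOrString interface {
    int | string
}

func Describe[T IntOrString](v T) string {
    return fmt.Sprintf("%T: %v", v, v)
}

func main() {
    fmt.Println(Describe(42))      // int: 42
    fmt.Println(Describe("hello")) // string: hello
    // Describe(3.14) would not compile: float64 is not in the type set.
}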

Type Inference

Go's compiler automatically infers type arguments from function arguments in most cases, reducing verbosity while maintaining full type safety—explicit type arguments are only needed when inference is ambiguous.

func Map[T, U any](s []T, f func(T) U) []U { ... }

// Type inference in action:
result := Map([]int{1, 2, 3}, func(x int) string { return strconv.Itoa(x) })
// T=int, U=string inferred automatically

Generic Functions

Generic functions accept type parameters, enabling algorithms that work across types; the type parameter is resolved at compile time, generating specialized code with zero runtime overhead.

func Filter[T any](slice []T, predicate func(T) bool) []T {
    result := make([]T, 0)
    for _, v := range slice {
        if predicate(v) {
            result = append(result, v)
        }
    }
    return result
}
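
For instance, a minimal usage sketch of the Filter function above; both type arguments are inferred from the slice argument:

evens := Filter([]int{1, 2, 3, 4, 5, 6}, func(n int) bool { return n%2 == 0 })
// evens == []int{2, 4, 6}; T is inferred as int

nonEmpty := Filter([]string{"go", "", "generics"}, func(s string) bool { return s != "" })
// nonEmpty == []string{"go", "generics"}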

Generic Types

Structs, interfaces, and other type definitions can have type parameters, enabling type-safe containers and data structures without interface{}/any boxing overhead.

type Stack[T any] struct {
    items []T
}

func (s *Stack[T]) Push(item T) {
    s.items = append(s.items, item)
}

func (s *Stack[T]) Pop() T {
    item := s.items[len(s.items)-1]
    s.items = s.items[:len(s.items)-1]
    return item
}
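
A short usage sketch of the Stack above; note that Pop as written panics on an empty stack, so real code often returns (T, bool) instead:

s := &Stack[int]{}
s.Push(1)
s.Push(2)
fmt.Println(s.Pop()) // 2 — LIFO order
fmt.Println(s.Pop()) // 1
// Calling Pop on an empty stack would panic with an index-out-of-range error.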

Generic Methods Limitations

Methods cannot have their own type parameters beyond those declared on the receiver type—this is a deliberate design decision to avoid complexity in method sets and interface satisfaction.

type Container[T any] struct{ value T }

// ✅ Valid - uses the type's parameter
func (c Container[T]) Get() T { return c.value }

// ❌ Invalid - methods can't add new type params
// func (c Container[T]) Convert[U any]() U { }

// ✅ Workaround: use a function
func Convert[T, U any](c Container[T], f func(T) U) U {
    return f(c.value)
}

comparable Constraint

The built-in comparable constraint includes all types that support == and != operators, essential for map keys and equality-based algorithms; it's predeclared and cannot be redefined.

func Contains[T comparable](slice []T, target T) bool {
    for _, v := range slice {
        if v == target { // Requires comparable
            return true
        }
    }
    return false
}

// Works: Contains([]int{1,2,3}, 2)
// Fails: Contains([][]int{...}, target) // slices are not comparable

any Constraint

any is an alias for interface{} introduced in Go 1.18; as a constraint, its type set contains all types, so use it when you need maximum flexibility with no operation requirements.

// These two forms are equivalent:
func Process[T any](v T)         { }
func Process[T interface{}](v T) { }

// any is also useful outside generics:
var data any = 42
data = "now a string" // Replaces interface{}

Ordered Types

Ordered types support comparison operators (<, <=, >, >=) and are essential for sorting, min/max, and binary search algorithms; defined in the experimental constraints package.

type Ordered interface {
    ~int | ~int8 | ~int16 | ~int32 | ~int64 |
        ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
        ~float32 | ~float64 | ~string
}

func Min[T Ordered](a, b T) T {
    if a < b {
        return a
    }
    return b
}

constraints Package

The golang.org/x/exp/constraints package provides common constraint definitions like Ordered, Signed, Unsigned, Integer, and Float; it is still experimental but widely used (Go 1.21 promoted the ordered subset to the standard library as cmp.Ordered).

import "golang.org/x/exp/constraints" func Abs[T constraints.Signed](x T) T { if x < 0 { return -x } return x } func Sum[T constraints.Integer | constraints.Float](nums ...T) T { var sum T for _, n := range nums { sum += n } return sum }

Generic Data Structures

Generics enable type-safe, reusable data structures without runtime type assertions; common implementations include linked lists, trees, heaps, and graphs that work with any type.

type Node[T any] struct {
    Value T
    Next  *Node[T]
}

type LinkedList[T any] struct {
    Head *Node[T]
    Len  int
}

func (l *LinkedList[T]) Append(value T) {
    node := &Node[T]{Value: value}
    if l.Head == nil {
        l.Head = node
    } else {
        curr := l.Head
        for curr.Next != nil {
            curr = curr.Next
        }
        curr.Next = node
    }
    l.Len++
}

Generic Algorithms

Generic algorithms implement common patterns (map, filter, reduce, sort) once and reuse across types, eliminating code duplication while maintaining type safety and performance.

func Map[T, U any](s []T, f func(T) U) []U {
    r := make([]U, len(s))
    for i, v := range s {
        r[i] = f(v)
    }
    return r
}

func Reduce[T, U any](s []T, init U, f func(U, T) U) U {
    acc := init
    for _, v := range s {
        acc = f(acc, v)
    }
    return acc
}

// Usage:
sum := Reduce([]int{1, 2, 3}, 0, func(a, b int) int { return a + b })

Workspaces (Go 1.18+)

go.work File

The go.work file defines a workspace containing multiple modules, allowing you to work on interdependent modules simultaneously without publishing or using replace directives in go.mod.

// go.work
go 1.21

use (
    ./api
    ./common
    ./services/auth
)

replace example.com/old => ./legacy

Multi-module Workspaces

Workspaces solve the multi-module development pain point where you need to modify several related modules together; changes in one module are immediately visible to others without publishing.

┌─────────────────────────────────────┐
│ my-workspace/                       │
├─────────────────────────────────────┤
│ go.work                             │
│ ├── api/                            │
│ │   ├── go.mod (module api)         │
│ │   └── api.go                      │
│ ├── common/                         │
│ │   ├── go.mod (module common)      │
│ │   └── utils.go                    │
│ └── cmd/                            │
│     ├── go.mod (requires api,common)│
│     └── main.go                     │
└─────────────────────────────────────┘

go work init

go work init creates a new go.work file in the current directory, optionally adding specified modules; this is the starting point for setting up a multi-module workspace.

# Create empty workspace
go work init

# Create workspace with modules
go work init ./moduleA ./moduleB

# Result: go.work file created
# go 1.21
# use (
#     ./moduleA
#     ./moduleB
# )

go work use

go work use adds modules to an existing workspace, updating the go.work file; it can add individual modules or recursively discover all modules in a directory tree.

# Add single module
go work use ./newmodule

# Add all modules recursively
go work use -r .

# After running, go.work is updated:
# use (
#     ./existing
#     ./newmodule
# )

go work sync

go work sync synchronizes the workspace's dependency requirements back to each module's go.mod file, ensuring consistency when modules have interdependencies.

go work sync

# What it does:
# 1. Computes minimal version requirements
# 2. Updates each module's go.mod
# 3. Ensures go.sum files are consistent
#
# ┌─────────┐     ┌─────────┐     ┌─────────┐
# │ go.work │────▶│  sync   │────▶│ go.mod  │
# └─────────┘     └─────────┘     │ go.mod  │
#                                 │ go.mod  │
#                                 └─────────┘

Workspace Benefits

Workspaces enable seamless multi-module development, eliminate replace directive hacks, improve IDE support across modules, and make monorepo-style development in Go practical and clean.

Benefits:
┌────────────────────────────────────────────┐
│ ✅ No more replace directives in go.mod    │
│ ✅ Changes visible immediately across mods │
│ ✅ Single go.work, not per-module hacks    │
│ ✅ IDE understands cross-module refs       │
│ ✅ go.work not committed (local dev only)  │
│ ✅ CI uses published modules as normal     │
└────────────────────────────────────────────┘

Performance Optimization

Profiling (CPU, memory, block, mutex)

Go provides built-in profiling for CPU time, memory allocations, goroutine blocking, and mutex contention; these profiles identify hotspots and guide optimization efforts with real data.

import _ "net/http/pprof" func main() { go func() { log.Println(http.ListenAndServe(":6060", nil)) }() // Your app... } // Profile types: // /debug/pprof/profile - CPU (30s default) // /debug/pprof/heap - Memory allocations // /debug/pprof/block - Goroutine blocking // /debug/pprof/mutex - Mutex contention // /debug/pprof/goroutine - Goroutine stacks

pprof Package

The runtime/pprof package provides programmatic control over profiling, allowing you to start/stop profiles, write them to files, and integrate profiling into test suites.

import "runtime/pprof" func main() { // CPU profile f, _ := os.Create("cpu.prof") pprof.StartCPUProfile(f) defer pprof.StopCPUProfile() // Run workload... // Heap profile (snapshot) h, _ := os.Create("heap.prof") pprof.WriteHeapProfile(h) }

go tool pprof

go tool pprof analyzes profile data interactively or generates visualizations; it can read from files, URLs, or compare profiles to measure optimization impact.

# Interactive analysis
go tool pprof cpu.prof
(pprof) top10          # Top 10 functions
(pprof) list funcName  # Source-annotated view
(pprof) web            # Open graph in browser

# One-liner visualizations
go tool pprof -http=:8080 cpu.prof    # Web UI
go tool pprof -png cpu.prof > cpu.png

# Profile a running server
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

Runtime Profiling

Runtime profiling captures data from production systems with minimal overhead; use runtime.SetBlockProfileRate, runtime.SetMutexProfileFraction to control sampling rates.

import "runtime" func init() { // Enable block profiling (1 = all events) runtime.SetBlockProfileRate(1) // Enable mutex profiling (fraction of events) runtime.SetMutexProfileFraction(5) // Control memory profiling rate runtime.MemProfileRate = 512 * 1024 // Sample every 512KB }

Continuous Profiling

Continuous profiling collects low-overhead profiles in production over time, enabling historical analysis of performance trends; tools like Google Cloud Profiler, Pyroscope, or Parca integrate with Go apps.

// Example: Google Cloud Profiler
import "cloud.google.com/go/profiler"

func main() {
    cfg := profiler.Config{
        Service:        "my-service",
        ServiceVersion: "1.0.0",
        ProjectID:      "my-project",
    }
    if err := profiler.Start(cfg); err != nil {
        log.Fatal(err)
    }
    // Profiles automatically uploaded to Cloud Console
}

Benchmark-driven Optimization

Write benchmarks first, then optimize; Go's testing package provides reliable measurement with automatic iteration count adjustment and memory allocation statistics.

var s1, s2 = "hello", "world" // package-level vars prevent constant folding

func BenchmarkConcat(b *testing.B) {
    b.Run("Plus", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            _ = s1 + s2
        }
    })
    b.Run("Builder", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            var sb strings.Builder
            sb.WriteString(s1)
            sb.WriteString(s2)
            _ = sb.String()
        }
    })
}

// Run: go test -bench=. -benchmem

Escape Analysis

Escape analysis determines whether variables can live on the stack (fast) or must escape to the heap (slower, requires GC); understanding it helps write allocation-efficient code.

go build -gcflags="-m" main.go # Output examples: # ./main.go:10: x escapes to heap ← BAD: heap alloc # ./main.go:15: y does not escape ← GOOD: stack alloc # Common escape causes: # - Returning pointer to local var # - Storing in interface{} # - Sending pointer to channel # - Captured by closure
func NoEscape() int { x := 42 // Stack allocated return x // Copied, x doesn't escape } func Escapes() *int { x := 42 // Heap allocated! return &x // Pointer escapes function }

Inlining

Inlining replaces function calls with the function body, eliminating call overhead and enabling further optimizations; the compiler inlines small, simple functions automatically.

go build -gcflags="-m" main.go # ./main.go:5: can inline add # ./main.go:10: inlining call to add # Control inlining: //go:noinline func mustNotInline() { } # Check inlining budget: go build -gcflags="-m=2" main.go
// Likely inlined (simple, small) func add(a, b int) int { return a + b } // Won't inline (too complex, has loop) func sum(nums []int) int { total := 0 for _, n := range nums { total += n } return total }

Bounds Check Elimination

The compiler eliminates redundant array/slice bounds checks when it can prove access is safe; explicit length checks before loops help the compiler optimize.

// BCE not possible - checked each iteration
func slow(s []int) int {
    sum := 0
    for i := 0; i < 100; i++ {
        sum += s[i] // Bounds check every time
    }
    return sum
}

// BCE applied - compiler knows it's safe
func fast(s []int) int {
    if len(s) < 100 {
        return 0
    }
    s = s[:100] // Hint to the compiler
    sum := 0
    for i := 0; i < 100; i++ {
        sum += s[i] // No bounds check!
    }
    return sum
}

Reducing Allocations

Minimize heap allocations by reusing objects, using value types, avoiding interface conversions, and preallocating slices/maps; fewer allocations mean less GC pressure.

// ❌ Allocates each call
func process() *Result {
    return &Result{data: make([]byte, 1024)}
}

// ✅ Reuse with sync.Pool
var resultPool = sync.Pool{
    New: func() any {
        return &Result{data: make([]byte, 1024)}
    },
}

func processPooled() *Result {
    r := resultPool.Get().(*Result)
    // Use r...
    return r
}

func done(r *Result) {
    r.Reset()
    resultPool.Put(r)
}

String Concatenation Optimization

For multiple string concatenations, use strings.Builder instead of +; it minimizes allocations by growing a single buffer, dramatically improving performance for loops.

// ❌ O(n²) - each + creates a new string
func slowConcat(parts []string) string {
    result := ""
    for _, p := range parts {
        result += p // Allocates a new string each time!
    }
    return result
}

// ✅ O(n) - single buffer, grows efficiently
func fastConcat(parts []string) string {
    var sb strings.Builder
    sb.Grow(1024) // Pre-size if the total length is known
    for _, p := range parts {
        sb.WriteString(p)
    }
    return sb.String()
}

Slice Pre-allocation

Pre-allocate slices to their expected capacity using make([]T, 0, cap) to avoid repeated reallocations and copies during append operations.

// ❌ Multiple reallocations
func slow(n int) []int {
    var result []int
    for i := 0; i < n; i++ {
        result = append(result, i) // Grows: 0→1→2→4→8→16...
    }
    return result
}

// ✅ Single allocation
func fast(n int) []int {
    result := make([]int, 0, n) // Pre-allocate capacity
    for i := 0; i < n; i++ {
        result = append(result, i) // No reallocation
    }
    return result
}

Map Pre-allocation

Pre-allocate maps with expected size using make(map[K]V, size) to reduce rehashing; this is especially important for large maps built in loops.

// ❌ Multiple rehashes as the map grows
func slow(keys []string) map[string]int {
    m := make(map[string]int) // Default small size
    for i, k := range keys {
        m[k] = i // Triggers rehashing while growing
    }
    return m
}

// ✅ Single allocation, no rehashing
func fast(keys []string) map[string]int {
    m := make(map[string]int, len(keys)) // Pre-size
    for i, k := range keys {
        m[k] = i
    }
    return m
}

sync.Pool Usage

sync.Pool provides per-CPU caches for reusing temporary objects, reducing GC pressure; objects may be garbage collected between uses, so pools are best for short-lived, frequently-allocated objects.

var bufPool = sync.Pool{
    New: func() any { return make([]byte, 4096) },
}

func processRequest(data []byte) {
    buf := bufPool.Get().([]byte)
    defer bufPool.Put(buf)
    // Use buf for temporary work...
    copy(buf, data)
    // Process...
}

// ┌─────────────────────────────────────┐
// │ sync.Pool Behavior                  │
// │ • Per-P (processor) local pools     │
// │ • Objects may be GC'd anytime       │
// │ • Best for short-lived allocs       │
// │ • Not for connection pooling!       │
// └─────────────────────────────────────┘

Object Pooling

For expensive objects (buffers, connections, compiled regexes), implement custom pools with explicit lifecycle management when sync.Pool's GC-friendly semantics don't fit.

type ConnPool struct {
    conns chan *Connection
    max   int
}

func NewConnPool(max int, factory func() *Connection) *ConnPool {
    p := &ConnPool{conns: make(chan *Connection, max), max: max}
    for i := 0; i < max; i++ {
        p.conns <- factory()
    }
    return p
}

func (p *ConnPool) Get() *Connection {
    return <-p.conns // Blocks if empty
}

func (p *ConnPool) Put(c *Connection) {
    select {
    case p.conns <- c: // Return to pool
    default: // Pool full, discard
        c.Close()
    }
}
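
A brief usage sketch, assuming a hypothetical dial function and Connection type consistent with the pool above:

pool := NewConnPool(10, func() *Connection { return dial() }) // pre-fill with 10 connections

conn := pool.Get() // blocks until a connection is available
// ... use conn ...
pool.Put(conn)     // return it (or it is closed if the pool is already full)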

Memory Management

Stack vs Heap Allocation

Stack allocation is fast (pointer bump) and automatically freed; heap allocation involves the allocator and GC. Go's compiler decides placement via escape analysis—prefer values over pointers when possible.

┌────────────────────────────────────────────────┐
│ STACK (fast)              HEAP (slower)        │
├────────────────────────────────────────────────┤
│ • Grows per goroutine (2KB up to ~1GB)         │
│ • Auto cleanup on function return              │
│ • No GC overhead                               │
│                                                │
│ func foo() {              func bar() *int {    │
│     x := 42  ← STACK          x := 42  ← HEAP  │
│     use(x)                    return &x        │
│ }                         }                    │
└────────────────────────────────────────────────┘

Escape Analysis

Escape analysis is the compiler's process of determining if a variable's lifetime exceeds its scope; if it does, the variable "escapes" to the heap. Use -gcflags="-m" to see decisions.

// View escape decisions:
// go build -gcflags="-m" .

func stackOnly() {
    data := [1000]int{} // Stays on stack
    process(data[:])
}

func escapesToHeap() *[]int {
    data := make([]int, 1000) // Escapes!
    return &data
}

// Common escape triggers:
// 1. Returning pointers to local variables
// 2. Storing in a package-level variable
// 3. Sending to a channel
// 4. Storing in an interface value
// 5. Closure capturing by reference

GC Tuning (GOGC)

GOGC controls GC frequency as a percentage of heap growth before next GC; default is 100 (double heap triggers GC). Lower values reduce memory, higher values reduce CPU overhead.

# Default: GC when heap doubles
GOGC=100 ./myapp

# Less memory, more CPU (GC at 50% growth)
GOGC=50 ./myapp

# More memory, less CPU (GC at 200% growth)
GOGC=200 ./myapp

# Disable GC (dangerous!)
GOGC=off ./myapp

import "runtime/debug"

// Set programmatically
debug.SetGCPercent(50)

// Go 1.19+: soft memory limit
debug.SetMemoryLimit(1 << 30) // 1GB

GC Trace

Enable GC tracing with GODEBUG=gctrace=1 to see detailed GC activity including pause times, heap sizes, and CPU utilization—essential for diagnosing GC-related performance issues.

GODEBUG=gctrace=1 ./myapp

# Output format:
# gc 1 @0.012s 2%: 0.026+0.42+0.005 ms clock, 0.21+0.35/0.42/0+0.041 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
#
# gc 1       - GC cycle number
# @0.012s    - Time since start
# 2%         - CPU used by GC
# 0.026+...  - STW + concurrent + STW times
# 4->4->0 MB - Heap before→after→live
# 5 MB goal  - Target heap size
# 8 P        - Number of processors

Memory Ballast

Memory ballast is a pre-allocated but unused chunk of heap that delays GC triggering; it reduces GC frequency in services with low live heap but high allocation rate—less relevant since Go 1.19's SetMemoryLimit.

func main() {
    // Old trick: allocate a 10GB ballast
    ballast := make([]byte, 10<<30)
    _ = ballast // Keep alive

    // With GOGC=100, GC triggers at ~20GB
    // instead of when the actual live heap doubles

    // Go 1.19+ preferred approach:
    debug.SetMemoryLimit(10 << 30)
}

Memory Profiling

Memory profiling captures allocation counts and sizes at sampled callsites; use go tool pprof to analyze heap profiles and identify allocation hotspots.

import "runtime/pprof" func captureHeapProfile() { f, _ := os.Create("heap.prof") defer f.Close() runtime.GC() // Get accurate live objects pprof.WriteHeapProfile(f) }
# Analyze go tool pprof heap.prof (pprof) top # Top allocators (pprof) alloc_space # Total bytes allocated (pprof) inuse_space # Currently held bytes # Live profiling go tool pprof http://localhost:6060/debug/pprof/heap

Memory Leaks Detection

Go memory leaks are typically goroutine leaks (goroutines blocked forever holding references); profile goroutines and heap over time to detect growth patterns.

// Common leak patterns:

// 1. Blocked goroutine
go func() {
    <-ch // If ch never receives, the goroutine leaks
}()

// 2. Forgotten timer
ticker := time.NewTicker(time.Second)
// Missing: defer ticker.Stop()

// Detection:
func debugGoroutines(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Goroutines: %d\n", runtime.NumGoroutine())
    pprof.Lookup("goroutine").WriteTo(w, 1)
}

// Tool: goleak in tests
func TestNoLeak(t *testing.T) {
    defer goleak.VerifyNone(t)
    // test code...
}

Finalizers (runtime.SetFinalizer)

Finalizers run when an object is about to be garbage collected; use sparingly for releasing external resources—they add GC overhead and have no timing guarantees.

type Resource struct {
    handle int
}

func NewResource() *Resource {
    r := &Resource{handle: openExternal()}
    runtime.SetFinalizer(r, func(r *Resource) {
        closeExternal(r.handle) // Cleanup
        fmt.Println("Resource finalized")
    })
    return r
}

// ⚠️ Finalizer caveats:
// • No guarantee when (or if!) it runs
// • Adds GC overhead
// • Object survives one extra GC cycle
// • Prefer explicit Close() methods

Weak References (not native)

Go doesn't have native weak references prior to Go 1.24's weak package; until then, use a mutex-guarded map with periodic cleanup as a workaround. Custom solutions risk fighting the GC.

// Workaround: manual cache with TTL
type WeakCache struct {
    mu    sync.RWMutex
    items map[string]cacheEntry
}

type cacheEntry struct {
    value     any
    expiresAt time.Time
}

func (c *WeakCache) cleanup() {
    c.mu.Lock()
    defer c.mu.Unlock()
    now := time.Now()
    for k, v := range c.items {
        if now.After(v.expiresAt) {
            delete(c.items, k)
        }
    }
}

// Note: Go 1.24 adds a weak pointer package (weak)!

Assembly in Go

Plan 9 Assembly

Go uses Plan 9 assembly syntax, which differs from Intel/AT&T conventions; it's platform-agnostic with pseudo-registers and Go-specific conventions for interoperability with the Go runtime.

┌────────────────────────────────────────────────┐
│ Plan 9 Pseudo-Registers                        │
├────────────────────────────────────────────────┤
│ FP - Frame Pointer (args)    first_arg+0(FP)   │
│ SP - Stack Pointer (locals)  local_var-8(SP)   │
│ SB - Static Base (globals)   symbol(SB)        │
│ PC - Program Counter                           │
└────────────────────────────────────────────────┘

// Example: add.s
TEXT ·Add(SB), NOSPLIT, $0-24
    MOVQ a+0(FP), AX    // Load first arg
    MOVQ b+8(FP), BX    // Load second arg
    ADDQ BX, AX         // Add
    MOVQ AX, ret+16(FP) // Store return
    RET

Function Calls in Assembly

Assembly functions follow Go's calling convention; arguments and return values are passed via the stack (referenced through FP), and functions must match their Go declarations exactly.

// add.go
package math

func Add(a, b int64) int64

// add_amd64.s
#include "textflag.h"

// func Add(a, b int64) int64
TEXT ·Add(SB), NOSPLIT, $0-24
    // Args at FP offsets: a+0, b+8, ret+16
    MOVQ a+0(FP), AX
    MOVQ b+8(FP), BX
    ADDQ BX, AX
    MOVQ AX, ret+16(FP)
    RET

Go Assembly Syntax

Go assembly uses specific syntax for sizes (B=1, W=2, L=4, Q=8 bytes), memory references, and directives; understanding TEXT, DATA, GLOBL directives is essential.

#include "textflag.h" // TEXT: define function // ·Name = package.Name (· is middle dot) // (SB) = relative to static base // NOSPLIT = no stack split check // $0-24 = 0 local bytes, 24 arg bytes TEXT ·Swap(SB), NOSPLIT, $0-16 MOVQ x+0(FP), AX // Q = quadword (8 bytes) MOVQ y+8(FP), BX MOVQ BX, x+0(FP) MOVQ AX, y+8(FP) RET // DATA: define data DATA msg<>+0(SB)/8, $"Hello, W" DATA msg<>+8(SB)/5, $"orld" GLOBL msg<>(SB), RODATA, $13

When to Use Assembly

Use assembly only when profiling proves it's necessary: SIMD operations, crypto primitives, or hot paths where the compiler generates suboptimal code; 99% of Go code never needs assembly.

┌─────────────────────────────────────────────┐
│ When to Consider Assembly:                  │
├─────────────────────────────────────────────┤
│ ✅ CPU-specific optimizations (SIMD/AVX)   │
│ ✅ Crypto (AES-NI, SHA extensions)          │
│ ✅ Hot inner loops (after profiling!)       │
│ ✅ Accessing CPU features (CPUID, etc)      │
├─────────────────────────────────────────────┤
│ ❌ Premature optimization                   │
│ ❌ Simple operations (compiler is good!)    │
│ ❌ Portability is important                 │
│ ❌ Maintainability matters                  │
└─────────────────────────────────────────────┘

Assembly File Naming

Assembly files must follow naming conventions: name_GOOS_GOARCH.s for OS/arch-specific, name_GOARCH.s for arch-only, or name.s for generic; the build system selects appropriate files automatically.

project/
├── add.go          # Go declaration
├── add_amd64.s     # AMD64 implementation
├── add_arm64.s     # ARM64 implementation
└── add_generic.go  # Fallback (with build tag)

// add_generic.go
//go:build !amd64 && !arm64

package math

func Add(a, b int64) int64 { return a + b }

CGO

CGO Basics

CGO enables calling C code from Go and vice versa; import the pseudo-package "C" with preceding C code in a comment block, but use it sparingly due to complexity and performance costs.

package main

/*
#include <stdio.h>
#include <stdlib.h>

void hello(const char* name) {
    printf("Hello, %s!\n", name);
}
*/
import "C"
import "unsafe"

func main() {
    name := C.CString("Gopher")
    defer C.free(unsafe.Pointer(name))
    C.hello(name)
}

C Code in Go

C code can be written directly in the comment block before import "C" or in separate .c files in the same package; the Go build system compiles and links C code automatically.

/*
#include <math.h>

// Define a C function inline
double circleArea(double radius) {
    return M_PI * radius * radius;
}
*/
import "C"
import "fmt"

func main() {
    area := C.circleArea(5.0)
    fmt.Printf("Area: %f\n", area)
}

Go Code in C

Export Go functions to C using //export directive; these can be called from C code in the same package or from external C code linking against the Go library.

package main import "C" import "fmt" //export GoCallback func GoCallback(x C.int) C.int { fmt.Printf("Go received: %d\n", int(x)) return x * 2 } /* #include <stdio.h> extern int GoCallback(int); void callGo() { int result = GoCallback(21); printf("C received: %d\n", result); } */ import "C" func main() { C.callGo() }

Calling C Functions

C functions are accessed through the C pseudo-package; pass Go values converted to C types, and always free memory allocated by C when you're done.

/*
#include <stdlib.h>
#include <string.h>

char* duplicate(const char* s) {
    return strdup(s);
}
*/
import "C"
import "unsafe"

func main() {
    input := C.CString("hello")
    defer C.free(unsafe.Pointer(input))

    output := C.duplicate(input)
    defer C.free(unsafe.Pointer(output))

    goString := C.GoString(output)
    println(goString)
}

C Types in Go

CGO automatically maps C types to Go types; use C.int, C.double, etc., with explicit conversions between Go and C types. Complex types require more care.

/*
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int32_t x;
    int32_t y;
} Point;
*/
import "C"
import "unsafe"

func main() {
    // Numeric types
    var i C.int = 42
    var d C.double = 3.14
    goInt := int(i)

    // Struct
    p := C.Point{x: 10, y: 20}

    // String conversion
    cs := C.CString("hello") // Go→C (must free!)
    defer C.free(unsafe.Pointer(cs))
    gs := C.GoString(cs) // C→Go

    // Bytes
    data := []byte{1, 2, 3}
    cdata := (*C.char)(unsafe.Pointer(&data[0]))

    _, _, _, _, _ = d, goInt, p, gs, cdata // keep the examples referenced
}

Memory Management with CGO

Go's GC doesn't track C memory; you must manually free C allocations, and passing Go pointers to C requires following strict rules to prevent GC from moving or collecting them.

/*
#include <stdlib.h>

static void use(char* p) { (void)p; } // stub so the example compiles
*/
import "C"
import "unsafe"

func main() {
    // C allocated - YOU must free
    cstr := C.CString("hello")
    defer C.free(unsafe.Pointer(cstr))

    cmem := C.malloc(100)
    defer C.free(cmem)

    // Go allocated - the GC manages it
    goSlice := make([]byte, 100)

    // Pass to C: only if C won't store the pointer!
    C.use((*C.char)(unsafe.Pointer(&goSlice[0])))
}

// ⚠️ Rules for passing Go pointers to C:
// 1. A Go pointer may only point to Go memory
// 2. C must not keep the Go pointer after the call returns
// 3. The Go memory passed must not itself contain Go pointers

Performance Implications

CGO calls have significant overhead (~150ns vs ~1ns for Go calls) due to stack switching, scheduler coordination, and Go→C ABI translation; batch operations when possible.

┌────────────────────────────────────────────────┐
│ CGO Call Overhead                              │
├────────────────────────────────────────────────┤
│ Pure Go call: ~1-2 ns                          │
│ CGO call:     ~100-200 ns (50-100x slower!)    │
├────────────────────────────────────────────────┤
│ Why so slow:                                   │
│ • Save/restore Go scheduler state              │
│ • Switch to system stack                       │
│ • Coordinate with GC (write barriers)          │
│ • ABI translation                              │
├────────────────────────────────────────────────┤
│ Mitigation:                                    │
│ • Batch multiple operations into one CGO call  │
│ • Do work in Go when possible                  │
│ • Consider pure Go alternatives                │
└────────────────────────────────────────────────┘
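
As an illustration of batching, the sketch below (with a hypothetical sum_array C routine) crosses the Go↔C boundary once for an entire slice instead of once per element:

package main

/*
#include <stdint.h>

// Hypothetical C routine: sums an array in a single call.
static int64_t sum_array(const int64_t* data, int n) {
    int64_t total = 0;
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
*/
import "C"
import (
    "fmt"
    "unsafe"
)

// SumViaC makes one CGO call for the whole slice, amortizing the
// ~100-200ns call overhead across all elements.
func SumViaC(nums []int64) int64 {
    if len(nums) == 0 {
        return 0
    }
    return int64(C.sum_array((*C.int64_t)(unsafe.Pointer(&nums[0])), C.int(len(nums))))
}

func main() {
    fmt.Println(SumViaC([]int64{1, 2, 3, 4})) // 10
}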

Cross-compilation with CGO

Cross-compiling with CGO requires a C cross-compiler for the target platform; this is complex and often requires Docker or target-native builds. Consider CGO-free alternatives.

# Without CGO: easy cross-compilation
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build

# With CGO: need a cross-compiler
CGO_ENABLED=1 \
CC=aarch64-linux-gnu-gcc \
GOOS=linux GOARCH=arm64 \
go build

# Easier: build on the target or use Docker
docker run --rm -v "$PWD":/app -w /app \
    golang:1.21 go build

# Check CGO status
go env CGO_ENABLED

CGO_ENABLED Flag

CGO_ENABLED controls whether CGO is available; 0 produces static binaries without C dependencies, 1 enables CGO. Some stdlib packages (net, os/user) have CGO implementations.

# Disable CGO (static, portable)
CGO_ENABLED=0 go build -o app-static

# Enable CGO (default on many systems)
CGO_ENABLED=1 go build -o app-dynamic

# Force pure Go stdlib implementations
CGO_ENABLED=0 go build -tags netgo,osusergo

# Check if a binary uses CGO
ldd ./app-static   # "not a dynamic executable"
ldd ./app-dynamic  # lists libc, etc.

Build Constraints

Use build constraints to provide CGO and non-CGO implementations; this allows your package to work with or without CGO available.

// impl_cgo.go
//go:build cgo

package mypackage

/*
#include <somelib.h>
*/
import "C"

func DoWork() {
    C.someFunction()
}

// impl_nocgo.go
//go:build !cgo

package mypackage

func DoWork() {
    // Pure Go implementation
}

Build Process

go build Flags

go build accepts numerous flags to control compilation, output, and optimization; key flags include -o (output), -v (verbose), -race (detector), -trimpath (reproducible builds).

# Common flags
go build -o myapp         # Output name
go build -v               # Verbose
go build -race            # Race detector
go build -trimpath        # Remove file paths
go build -mod=readonly    # Don't update go.mod
go build -buildvcs=false  # No VCS info

# Combine
go build -v -race -o myapp ./cmd/server

# Build all packages
go build ./...

# Build for a specific OS/arch
GOOS=linux GOARCH=amd64 go build

-ldflags

-ldflags passes flags to the linker; commonly used to inject version info at build time and strip debug symbols for smaller binaries.

# Inject version at build time
go build -ldflags "-X main.Version=1.2.3 \
    -X 'main.BuildTime=$(date)' \
    -X main.Commit=$(git rev-parse HEAD)"

# Strip debug info (smaller binary)
go build -ldflags "-s -w"

# Combined
go build -ldflags "-s -w -X main.Version=1.0.0"

package main

import "fmt"

var (
    Version   = "dev"
    BuildTime = "unknown"
    Commit    = "none"
)

func main() {
    fmt.Printf("Version: %s, Built: %s\n", Version, BuildTime)
}

-gcflags

-gcflags passes flags to the Go compiler; useful for debugging (disable optimization), analysis (escape analysis), and seeing compiler decisions.

# Escape analysis
go build -gcflags="-m"    # Basic
go build -gcflags="-m=2"  # Detailed

# Disable optimizations (for debugging)
go build -gcflags="-N -l"
# -N: disable optimizations
# -l: disable inlining

# Pass flags to specific packages
go build -gcflags="main=-m" ./...

# See generated assembly
go build -gcflags="-S" 2>&1 | head -100

-tags (Build Tags)

Build tags allow conditional compilation; files or code blocks are included only when specified tags match. Use -tags to specify which tags to include.

go build -tags="prod,metrics" go build -tags="integration"
// +build tag syntax (old, still works)
// logging_debug.go
// +build debug

// go:build syntax (Go 1.17+, preferred)
// logging_prod.go
//go:build prod

package logging

// Combined with boolean logic:
//go:build (linux && amd64) || darwin
//go:build !windows
//go:build cgo && sqlite

Build Constraints (//go:build)

Build constraints control when files are included based on OS, arch, compiler, tags, and Go version; the //go:build line must appear before the package clause (preceded only by blank lines and other comments) and be followed by a blank line.

//go:build linux && amd64

package main

// File only included for linux/amd64

//go:build ignore

// File always excluded from builds (useful for examples and generators)
Valid constraints:
┌────────────────────────────────────────────┐
│ GOOS:   linux, darwin, windows, etc.       │
│ GOARCH: amd64, arm64, wasm, etc.          │
│ Compiler: gc, gccgo                        │
│ Version: go1.21, go1.22                    │
│ cgo: cgo or !cgo                           │
│ Custom: -tags=mytag                        │
│ Boolean: && (and), || (or), ! (not)        │
└────────────────────────────────────────────┘

Cross-compilation

Go supports cross-compilation out of the box—set GOOS and GOARCH to target any supported platform. CGO_ENABLED=0 for pure Go ensures this works without C toolchains.

# Common targets
GOOS=linux GOARCH=amd64 go build -o app-linux-amd64
GOOS=darwin GOARCH=arm64 go build -o app-macos-arm64
GOOS=windows GOARCH=amd64 go build -o app.exe

# List all supported combinations
go tool dist list

# WebAssembly
GOOS=js GOARCH=wasm go build -o app.wasm

# Build script for all platforms
#!/bin/bash
for os in linux darwin windows; do
    for arch in amd64 arm64; do
        GOOS=$os GOARCH=$arch go build -o "app-$os-$arch"
    done
done

GOOS and GOARCH

GOOS specifies the target operating system, GOARCH the architecture; Go supports many combinations, and you can check valid pairs with go tool dist list.

# Current settings
go env GOOS GOARCH

# Common GOOS values:
# linux, darwin (macOS), windows, freebsd,
# android, ios, js (WebAssembly)

# Common GOARCH values:
# amd64 (x86-64), arm64 (Apple M1, AWS Graviton),
# arm, 386, wasm, riscv64

# Practical examples:
GOOS=linux GOARCH=arm64   # AWS Graviton
GOOS=darwin GOARCH=arm64  # Apple Silicon
GOOS=linux GOARCH=arm     # Raspberry Pi
GOOS=js GOARCH=wasm       # Browser

Reducing Binary Size

Combine multiple techniques to reduce Go binary size: strip symbols, disable DWARF, use UPX compression, and avoid unnecessary dependencies.

# Baseline
go build -o app                             # ~10MB

# Strip symbol table and DWARF
go build -ldflags="-s -w" -o app            # ~7MB

# Use trimpath (also helps)
go build -ldflags="-s -w" -trimpath -o app

# UPX compression (after build)
upx --best app                              # ~2-3MB

# Check what's in the binary
go tool nm app | head
go tool objdump -s main app

# Analyze size by symbol
go build -o app
go tool nm -size app | sort -n

Stripping Debug Info

Use -ldflags="-s -w" to strip symbol table (-s) and DWARF debug info (-w); this significantly reduces binary size but removes stack traces' file/line info.

# With debug info (default)
go build -o app-debug
ls -lh app-debug     # 12M

# Stripped
go build -ldflags="-s -w" -o app-stripped
ls -lh app-stripped  # 8M

# Trade-offs:
# ✅ Smaller binary (20-30% reduction)
# ✅ Faster load time
# ❌ Panic stack traces less detailed
# ❌ Harder to debug with delve
# ❌ No line numbers in profiles

UPX Compression

UPX (Ultimate Packer for eXecutables) compresses binaries with decompression at startup; useful for size-constrained deployments but adds startup latency.

# Install UPX
apt install upx   # Linux
brew install upx  # macOS

# Compress (after go build)
go build -ldflags="-s -w" -o app
upx --best app
# or for max compression:
upx --ultra-brute app

# Before: 8.0MB
# After:  2.5MB

# ⚠️ Caveats:
# • Startup time increases (~50-100ms)
# • Some antivirus tools flag UPX binaries
# • Memory usage is slightly higher
# • Can't be used with -race or profiling

go generate

go generate runs commands specified in source files; commonly used for code generation (stringer, mockgen, protobuf), embedding, or build-time processing.

//go:generate stringer -type=Status
//go:generate mockgen -source=interface.go -destination=mock.go
//go:generate protoc --go_out=. schema.proto

package main

type Status int

const (
    Pending Status = iota
    Running
    Complete
)

# Run all generate directives
go generate ./...

# Run for a specific package
go generate ./models

# Common generators:
# • stringer   - String() for constants
# • mockgen    - Mock implementations
# • protoc     - Protocol buffers
# • go-bindata - Embed files (legacy)
# • sqlc       - Type-safe SQL

Code Generation

Code generation creates Go source files programmatically; use text/template, go/ast, or dedicated tools to generate repetitive, type-specific, or derived code at build time.

// gen.go
//go:build ignore

package main

import (
    "os"
    "text/template"
)

var tmpl = `// Code generated; DO NOT EDIT.
package {{.Package}}
{{range .Types}}
func ({{.Name}}) Is{{.Name}}() {}
{{end}}
`

func main() {
    t := template.Must(template.New("").Parse(tmpl))
    f, _ := os.Create("generated.go")
    t.Execute(f, struct {
        Package string
        Types   []struct{ Name string }
    }{
        Package: "mypackage",
        Types:   []struct{ Name string }{{Name: "Foo"}, {Name: "Bar"}},
    })
}
//go:generate go run gen.go

Embedded Files (embed Package)

The embed package (Go 1.16+) embeds files into the binary at compile time; use //go:embed directives to include static assets, templates, or configuration.

package main

import (
    "embed"
    "fmt"
    "net/http"
)

//go:embed version.txt
var version string

//go:embed templates/*.html
var templates embed.FS

//go:embed static/*
var static embed.FS

func main() {
    fmt.Println("Version:", version)

    // Serve embedded files
    http.Handle("/static/", http.FileServer(http.FS(static)))

    // Read an embedded file
    data, _ := templates.ReadFile("templates/index.html")
    fmt.Println(string(data))
}
project/
├── main.go
├── version.txt
├── templates/
│   └── index.html
└── static/
    ├── style.css
    └── app.js

Compiler Directives

//go:noinline

//go:noinline prevents the compiler from inlining a function; useful for benchmarking (prevent optimization) or when inlining would cause issues with stack inspection.

//go:noinline
func mustNotInline(x int) int {
    return x * 2
}

// Use cases:
// • Accurate benchmarking (prevent dead code elimination)
// • Debugging (preserve function boundaries)
// • Stack trace requirements
// • Preventing code bloat for large hot functions

//go:inline (Hint)

Go doesn't have //go:inline—inlining is automatic based on complexity heuristics. You can encourage inlining by keeping functions small and simple; use -gcflags="-m" to verify.

// The compiler inlines automatically based on:
// • Function size (small = likely inline)
// • No loops (complex control flow = no inline)
// • No defer (usually prevents inlining)
// • Not recursive

// This WILL likely inline (simple, small)
func add(a, b int) int { return a + b }

// This WON'T inline (has a loop)
func sum(s []int) int {
    t := 0
    for _, v := range s {
        t += v
    }
    return t
}

// Check: go build -gcflags="-m"
// output: "can inline add"

//go:noescape

//go:noescape tells the compiler that a function's pointer arguments don't escape; used for assembly functions or special cases where escape analysis can't determine safety.

// Typically used for assembly functions
//go:noescape
func asmFunction(p *byte)

// The directive promises:
// • Pointers passed to this function won't be stored
// • Pointers won't be returned
// • Allows stack allocation of arguments

// ⚠️ Dangerous if misused!
// Only use it when you KNOW the implementation doesn't let pointers escape

//go:linkname

//go:linkname links a local variable/function to a symbol in another package; bypasses Go's visibility rules. Extremely dangerous, used in stdlib internals.

package main

import (
    _ "unsafe" // Required for linkname
)

// Link to the runtime's internal function
//go:linkname nanotime runtime.nanotime
func nanotime() int64

func main() {
    println(nanotime()) // Access the internal function
}

// ⚠️ DANGEROUS:
// • Bypasses API stability guarantees
// • Can break on Go version updates
// • Not for production code
// • Requires an unsafe import

//go:generate

//go:generate specifies commands to run with go generate; any text after the directive is executed as a shell command with the package directory as working directory.

//go:generate stringer -type=Pill
//go:generate mockgen -source=service.go -destination=mock_service.go
//go:generate go run scripts/gen.go

package pharmacy

type Pill int

const (
    Placebo Pill = iota
    Aspirin
    Ibuprofen
)

# Run generators
go generate ./...

# Special variables in generate commands:
# $GOFILE    - Current file
# $GOLINE    - Line number
# $GOPACKAGE - Package name
# $DOLLAR    - Literal $

//go:embed

//go:embed embeds files/directories into the binary; the directive must precede a variable of type string, []byte, or embed.FS. Patterns support glob syntax.

import _ "embed" //go:embed config.json var config []byte //go:embed version.txt var version string // Without newline //go:embed templates/* var templates embed.FS //go:embed assets/images/*.png assets/fonts/*.ttf var assets embed.FS // Pattern rules: // • No .. or absolute paths // • Patterns like *.txt, dir/*, dir/** // • Hidden files (_*) excluded by default // • Use all:dir/* to include hidden files

//go:build

//go:build specifies build constraints for the file using boolean expressions; it replaced the older // +build syntax and must appear before the package clause, preceded only by blank lines and other comments.

//go:build linux && amd64

package main

// Only compiled for linux/amd64

//go:build !windows
// Compiled for every OS except Windows

//go:build (linux || darwin) && cgo
// Compiled for Linux or macOS with CGO enabled

//go:build go1.21
// Only compiled with Go 1.21 or later

//go:build integration
// Only with: go test -tags=integration

// Valid operators: && (and), || (or), ! (not), ()
// Terms: GOOS, GOARCH, compiler, cgo, go version, custom tags