Advanced GoLang: Generics, Optimization, CGO & Runtime Internals
The deep end of the pool. A comprehensive guide to Go's most advanced features: from writing generic data structures and dissecting memory allocation strategies to cross-compilation and reducing binary sizes.
Generics (Go 1.18+)
Type Parameters
Type parameters allow functions and types to work with any type specified at compile time, declared in square brackets before the function parameters, enabling code reuse without sacrificing type safety.
func Print[T any](value T) {
	fmt.Println(value)
}

Print[int](42) // Explicit type argument
Print("hello") // Type inferred
Type Constraints
Type constraints restrict what types can be used as type arguments, defined as interfaces that specify the required methods or types, ensuring the generic code can only use operations available on the constrained types.
type Number interface {
	int | int64 | float64
}

func Sum[T Number](a, b T) T {
	return a + b // Works because all Number types support +
}
Interface Constraints
Interfaces can now include both method signatures AND type lists, combining behavioral requirements with underlying type restrictions to create powerful constraints for generic functions.
type Stringer interface {
	~string         // Underlying type constraint
	String() string // Method constraint
}
Type Sets
Every interface now defines a type set—the set of all types that implement it; for method-only interfaces, this is infinite, but type unions create finite sets, fundamentally changing how Go thinks about interfaces.
┌─────────────────────────────────────┐
│ interface{ int | string } │
│ Type Set = { int, string } │
├─────────────────────────────────────┤
│ interface{ Read([]byte) } │
│ Type Set = { all types with Read } │
└─────────────────────────────────────┘
Type Inference
Go's compiler automatically infers type arguments from function arguments in most cases, reducing verbosity while maintaining full type safety—explicit type arguments are only needed when inference is ambiguous.
func Map[T, U any](s []T, f func(T) U) []U { ... }

// Type inference in action:
result := Map([]int{1, 2, 3}, func(x int) string {
	return strconv.Itoa(x)
})
// T=int, U=string inferred automatically
Generic Functions
Generic functions accept type parameters, enabling algorithms that work across types; the type parameter is resolved at compile time, generating specialized code with zero runtime overhead.
func Filter[T any](slice []T, predicate func(T) bool) []T {
	result := make([]T, 0)
	for _, v := range slice {
		if predicate(v) {
			result = append(result, v)
		}
	}
	return result
}
Generic Types
Structs, interfaces, and other type definitions can have type parameters, enabling type-safe containers and data structures without interface{}/any boxing overhead.
type Stack[T any] struct {
	items []T
}

func (s *Stack[T]) Push(item T) {
	s.items = append(s.items, item)
}

func (s *Stack[T]) Pop() T {
	item := s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return item
}
Generic Methods Limitations
Methods cannot have their own type parameters beyond those declared on the receiver type—this is a deliberate design decision to avoid complexity in method sets and interface satisfaction.
type Container[T any] struct{ value T }

// ✅ Valid - uses the type's parameter
func (c Container[T]) Get() T { return c.value }

// ❌ Invalid - methods can't add new type params
// func (c Container[T]) Convert[U any]() U { }

// ✅ Workaround: use a function
func Convert[T, U any](c Container[T], f func(T) U) U {
	return f(c.value)
}
comparable Constraint
The built-in comparable constraint includes all types that support == and != operators, essential for map keys and equality-based algorithms; it's predeclared and cannot be redefined.
func Contains[T comparable](slice []T, target T) bool {
	for _, v := range slice {
		if v == target { // Requires comparable
			return true
		}
	}
	return false
}

// Works: Contains([]int{1, 2, 3}, 2)
// Fails: Contains([][]int{...}, target) // slices are not comparable
any Constraint
any is an alias for interface{} introduced in Go 1.18; its type set contains all types, so it imposes no operation requirements. Use it when you need maximum flexibility.
// These are identical:
func Process[T any](v T)         {}
func Process[T interface{}](v T) {}

// any is also useful outside generics:
var data any = 42
data = "now a string" // Replaces interface{}
Ordered Types
Ordered types support the comparison operators (<, <=, >, >=) and are essential for sorting, min/max, and binary search; the constraint is defined in golang.org/x/exp/constraints and, since Go 1.21, as cmp.Ordered in the standard library.
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
		~float32 | ~float64 | ~string
}

func Min[T Ordered](a, b T) T {
	if a < b {
		return a
	}
	return b
}
constraints Package
The golang.org/x/exp/constraints package provides common constraint definitions like Ordered, Signed, Unsigned, Integer, and Float. Ordered has since been promoted into the standard library as cmp.Ordered (Go 1.21); the rest remain experimental but widely used.
import "golang.org/x/exp/constraints"

func Abs[T constraints.Signed](x T) T {
	if x < 0 {
		return -x
	}
	return x
}

func Sum[T constraints.Integer | constraints.Float](nums ...T) T {
	var sum T
	for _, n := range nums {
		sum += n
	}
	return sum
}
Generic Data Structures
Generics enable type-safe, reusable data structures without runtime type assertions; common implementations include linked lists, trees, heaps, and graphs that work with any type.
type Node[T any] struct {
	Value T
	Next  *Node[T]
}

type LinkedList[T any] struct {
	Head *Node[T]
	Len  int
}

func (l *LinkedList[T]) Append(value T) {
	node := &Node[T]{Value: value}
	if l.Head == nil {
		l.Head = node
	} else {
		curr := l.Head
		for curr.Next != nil {
			curr = curr.Next
		}
		curr.Next = node
	}
	l.Len++
}
Generic Algorithms
Generic algorithms implement common patterns (map, filter, reduce, sort) once and reuse across types, eliminating code duplication while maintaining type safety and performance.
func Map[T, U any](s []T, f func(T) U) []U {
	r := make([]U, len(s))
	for i, v := range s {
		r[i] = f(v)
	}
	return r
}

func Reduce[T, U any](s []T, init U, f func(U, T) U) U {
	acc := init
	for _, v := range s {
		acc = f(acc, v)
	}
	return acc
}

// Usage:
sum := Reduce([]int{1, 2, 3}, 0, func(a, b int) int { return a + b })
Workspaces (Go 1.18+)
go.work File
The go.work file defines a workspace containing multiple modules, allowing you to work on interdependent modules simultaneously without publishing or using replace directives in go.mod.
// go.work
go 1.21

use (
	./api
	./common
	./services/auth
)

replace example.com/old => ./legacy
Multi-module Workspaces
Workspaces solve the multi-module development pain point where you need to modify several related modules together; changes in one module are immediately visible to others without publishing.
┌─────────────────────────────────────┐
│ my-workspace/                       │
├─────────────────────────────────────┤
│ go.work                             │
│ ├── api/                            │
│ │   ├── go.mod (module api)         │
│ │   └── api.go                      │
│ ├── common/                         │
│ │   ├── go.mod (module common)      │
│ │   └── utils.go                    │
│ └── cmd/                            │
│     ├── go.mod (requires api,common)│
│     └── main.go                     │
└─────────────────────────────────────┘
go work init
go work init creates a new go.work file in the current directory, optionally adding specified modules; this is the starting point for setting up a multi-module workspace.
# Create an empty workspace
go work init

# Create a workspace with modules
go work init ./moduleA ./moduleB

# Result: go.work file created
# go 1.21
# use (
#     ./moduleA
#     ./moduleB
# )
go work use
go work use adds modules to an existing workspace, updating the go.work file; it can add individual modules or recursively discover all modules in a directory tree.
# Add a single module
go work use ./newmodule

# Add all modules recursively
go work use -r .

# After running, go.work is updated:
# use (
#     ./existing
#     ./newmodule
# )
go work sync
go work sync synchronizes the workspace's dependency requirements back to each module's go.mod file, ensuring consistency when modules have interdependencies.
go work sync

# What it does:
# 1. Computes minimal version requirements
# 2. Updates each module's go.mod
# 3. Ensures go.sum files are consistent
#
# ┌─────────┐    ┌─────────┐    ┌─────────┐
# │ go.work │───▶│  sync   │───▶│ go.mod  │
# └─────────┘    └─────────┘    │ go.mod  │
#                               │ go.mod  │
#                               └─────────┘
Workspace Benefits
Workspaces enable seamless multi-module development, eliminate replace directive hacks, improve IDE support across modules, and make monorepo-style development in Go practical and clean.
Benefits:
┌────────────────────────────────────────────┐
│ ✅ No more replace directives in go.mod    │
│ ✅ Changes visible immediately across mods │
│ ✅ Single go.work, not per-module hacks    │
│ ✅ IDE understands cross-module refs       │
│ ✅ go.work not committed (local dev only)  │
│ ✅ CI uses published modules as normal     │
└────────────────────────────────────────────┘
Performance Optimization
Profiling (CPU, memory, block, mutex)
Go provides built-in profiling for CPU time, memory allocations, goroutine blocking, and mutex contention; these profiles identify hotspots and guide optimization efforts with real data.
import _ "net/http/pprof"

func main() {
	go func() {
		log.Println(http.ListenAndServe(":6060", nil))
	}()
	// Your app...
}

// Profile types:
// /debug/pprof/profile   - CPU (30s default)
// /debug/pprof/heap      - Memory allocations
// /debug/pprof/block     - Goroutine blocking
// /debug/pprof/mutex     - Mutex contention
// /debug/pprof/goroutine - Goroutine stacks
pprof Package
The runtime/pprof package provides programmatic control over profiling, allowing you to start/stop profiles, write them to files, and integrate profiling into test suites.
import "runtime/pprof"

func main() {
	// CPU profile
	f, _ := os.Create("cpu.prof")
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()

	// Run workload...

	// Heap profile (snapshot)
	h, _ := os.Create("heap.prof")
	pprof.WriteHeapProfile(h)
}
go tool pprof
go tool pprof analyzes profile data interactively or generates visualizations; it can read from files, URLs, or compare profiles to measure optimization impact.
# Interactive analysis
go tool pprof cpu.prof
(pprof) top10         # Top 10 functions
(pprof) list funcName # Source-annotated view
(pprof) web           # Open graph in browser

# One-liner visualizations
go tool pprof -http=:8080 cpu.prof    # Web UI
go tool pprof -png cpu.prof > cpu.png

# Profile a running server
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
Runtime Profiling
Runtime profiling captures data from production systems with minimal overhead; use runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction to control sampling rates.
import "runtime"

func init() {
	// Enable block profiling (1 = record every blocking event)
	runtime.SetBlockProfileRate(1)

	// Enable mutex profiling (sample 1 in n contention events)
	runtime.SetMutexProfileFraction(5)

	// Control memory profiling rate
	runtime.MemProfileRate = 512 * 1024 // Sample every 512KB
}
Continuous Profiling
Continuous profiling collects low-overhead profiles in production over time, enabling historical analysis of performance trends; tools like Google Cloud Profiler, Pyroscope, or Parca integrate with Go apps.
// Example: Google Cloud Profiler
import "cloud.google.com/go/profiler"

func main() {
	cfg := profiler.Config{
		Service:        "my-service",
		ServiceVersion: "1.0.0",
		ProjectID:      "my-project",
	}
	if err := profiler.Start(cfg); err != nil {
		log.Fatal(err)
	}
	// Profiles are automatically uploaded to the Cloud Console
}
Benchmark-driven Optimization
Write benchmarks first, then optimize; Go's testing package provides reliable measurement with automatic iteration count adjustment and memory allocation statistics.
func BenchmarkConcat(b *testing.B) {
	s1, s2 := "hello", "world" // Variables, so + isn't constant-folded away
	b.Run("Plus", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = s1 + s2
		}
	})
	b.Run("Builder", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var sb strings.Builder
			sb.WriteString(s1)
			sb.WriteString(s2)
			_ = sb.String()
		}
	})
}

// Run: go test -bench=. -benchmem
Escape Analysis
Escape analysis determines whether variables can live on the stack (fast) or must escape to the heap (slower, requires GC); understanding it helps write allocation-efficient code.
go build -gcflags="-m" main.go

# Output examples:
# ./main.go:10: x escapes to heap  ← BAD: heap alloc
# ./main.go:15: y does not escape  ← GOOD: stack alloc

# Common escape causes:
# - Returning pointer to local var
# - Storing in interface{}
# - Sending pointer to channel
# - Captured by closure
func NoEscape() int {
	x := 42  // Stack allocated
	return x // Copied; x doesn't escape
}

func Escapes() *int {
	x := 42   // Heap allocated!
	return &x // Pointer escapes the function
}
Inlining
Inlining replaces function calls with the function body, eliminating call overhead and enabling further optimizations; the compiler inlines small, simple functions automatically.
go build -gcflags="-m" main.go
# ./main.go:5: can inline add
# ./main.go:10: inlining call to add

# Control inlining:
//go:noinline
func mustNotInline() {}

# See detailed inlining decisions and budget:
go build -gcflags="-m=2" main.go
// Likely inlined (simple, small)
func add(a, b int) int { return a + b }

// May not inline (exceeds the inlining budget)
func sum(nums []int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}
Bounds Check Elimination
The compiler eliminates redundant array/slice bounds checks when it can prove access is safe; explicit length checks before loops help the compiler optimize.
// BCE not possible - checked each iteration
func slow(s []int) int {
	sum := 0
	for i := 0; i < 100; i++ {
		sum += s[i] // Bounds check every time
	}
	return sum
}

// BCE applied - compiler can prove safety
func fast(s []int) int {
	if len(s) < 100 {
		return 0
	}
	s = s[:100] // Hint to the compiler
	sum := 0
	for i := 0; i < 100; i++ {
		sum += s[i] // No bounds check!
	}
	return sum
}
Reducing Allocations
Minimize heap allocations by reusing objects, using value types, avoiding interface conversions, and preallocating slices/maps; fewer allocations mean less GC pressure.
// ❌ Allocates on each call
func process() *Result {
	return &Result{data: make([]byte, 1024)}
}

// ✅ Reuse with sync.Pool
var resultPool = sync.Pool{
	New: func() any {
		return &Result{data: make([]byte, 1024)}
	},
}

func processPooled() *Result {
	r := resultPool.Get().(*Result)
	// Use r...
	return r
}

func done(r *Result) {
	r.Reset()
	resultPool.Put(r)
}
String Concatenation Optimization
For multiple string concatenations, use strings.Builder instead of +; it minimizes allocations by growing a single buffer, dramatically improving performance for loops.
// ❌ O(n²) - each + creates a new string
func slowConcat(parts []string) string {
	result := ""
	for _, p := range parts {
		result += p // Allocates a new string each time!
	}
	return result
}

// ✅ O(n) - single buffer, grows efficiently
func fastConcat(parts []string) string {
	var sb strings.Builder
	sb.Grow(1024) // Pre-size if known
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}
Slice Pre-allocation
Pre-allocate slices to their expected capacity using make([]T, 0, cap) to avoid repeated reallocations and copies during append operations.
// ❌ Multiple reallocations
func slow(n int) []int {
	var result []int
	for i := 0; i < n; i++ {
		result = append(result, i) // Grows: 0→1→2→4→8→16...
	}
	return result
}

// ✅ Single allocation
func fast(n int) []int {
	result := make([]int, 0, n) // Pre-allocate capacity
	for i := 0; i < n; i++ {
		result = append(result, i) // No reallocation
	}
	return result
}
Map Pre-allocation
Pre-allocate maps with expected size using make(map[K]V, size) to reduce rehashing; this is especially important for large maps built in loops.
// ❌ Multiple rehashes as the map grows
func slow(keys []string) map[string]int {
	m := make(map[string]int) // Default small size
	for i, k := range keys {
		m[k] = i // Triggers rehash when growing
	}
	return m
}

// ✅ Single allocation, no rehashing
func fast(keys []string) map[string]int {
	m := make(map[string]int, len(keys)) // Pre-size
	for i, k := range keys {
		m[k] = i
	}
	return m
}
sync.Pool Usage
sync.Pool provides per-CPU caches for reusing temporary objects, reducing GC pressure; objects may be garbage collected between uses, so pools are best for short-lived, frequently-allocated objects.
var bufPool = sync.Pool{
	// Store a pointer: boxing a bare []byte into any on
	// every Put would itself allocate (staticcheck SA6002).
	New: func() any {
		b := make([]byte, 4096)
		return &b
	},
}

func processRequest(data []byte) {
	buf := bufPool.Get().(*[]byte)
	defer bufPool.Put(buf)
	// Use *buf for temporary work...
	copy(*buf, data)
	// Process...
}

// ┌─────────────────────────────────────┐
// │ sync.Pool Behavior                  │
// │ • Per-P (processor) local pools     │
// │ • Objects may be GC'd anytime       │
// │ • Best for short-lived allocs       │
// │ • Not for connection pooling!       │
// └─────────────────────────────────────┘
Object Pooling
For expensive objects (buffers, connections, compiled regexes), implement custom pools with explicit lifecycle management when sync.Pool's GC-friendly semantics don't fit.
type ConnPool struct {
	conns chan *Connection
	max   int
}

func NewConnPool(max int, factory func() *Connection) *ConnPool {
	p := &ConnPool{conns: make(chan *Connection, max), max: max}
	for i := 0; i < max; i++ {
		p.conns <- factory()
	}
	return p
}

func (p *ConnPool) Get() *Connection {
	return <-p.conns // Blocks if empty
}

func (p *ConnPool) Put(c *Connection) {
	select {
	case p.conns <- c: // Return to pool
	default: // Pool full, discard
		c.Close()
	}
}
Memory Management
Stack vs Heap Allocation
Stack allocation is fast (pointer bump) and automatically freed; heap allocation involves the allocator and GC. Go's compiler decides placement via escape analysis—prefer values over pointers when possible.
┌────────────────────────────────────────────────┐
│ STACK (fast)              HEAP (slower)        │
├────────────────────────────────────────────────┤
│ • Grows per goroutine (starts ~2KB, max 1GB)   │
│ • Auto cleanup on function return              │
│ • No GC overhead                               │
│                                                │
│ func foo() {              func bar() *int {    │
│     x := 42  ← STACK          x := 42  ← HEAP  │
│     use(x)                    return &x        │
│ }                         }                    │
└────────────────────────────────────────────────┘
Escape Analysis
Escape analysis is the compiler's process of determining if a variable's lifetime exceeds its scope; if it does, the variable "escapes" to the heap. Use -gcflags="-m" to see decisions.
// View escape decisions:
// go build -gcflags="-m" .

func stackOnly() {
	data := [1000]int{} // Stays on stack
	process(data[:])
}

func escapesToHeap() *[]int {
	data := make([]int, 1000) // Escapes!
	return &data
}

// Common escape triggers:
// 1. Returning pointers to local variables
// 2. Storing in a package-level variable
// 3. Sending to a channel
// 4. Storing in an interface value
// 5. Closure capturing by reference
GC Tuning (GOGC)
GOGC controls GC frequency as a percentage of heap growth before next GC; default is 100 (double heap triggers GC). Lower values reduce memory, higher values reduce CPU overhead.
# Default: GC when heap doubles
GOGC=100 ./myapp

# Less memory, more CPU (GC at 50% growth)
GOGC=50 ./myapp

# More memory, less CPU (GC at 200% growth)
GOGC=200 ./myapp

# Disable GC (dangerous!)
GOGC=off ./myapp
import "runtime/debug"

// Set programmatically
debug.SetGCPercent(50)

// Go 1.19+: soft memory limit
debug.SetMemoryLimit(1 << 30) // 1GB
GC Trace
Enable GC tracing with GODEBUG=gctrace=1 to see detailed GC activity including pause times, heap sizes, and CPU utilization—essential for diagnosing GC-related performance issues.
GODEBUG=gctrace=1 ./myapp

# Output format:
# gc 1 @0.012s 2%: 0.026+0.42+0.005 ms clock, 0.21+0.35/0.42/0+0.041 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
#
# gc 1       - GC cycle number
# @0.012s    - Time since start
# 2%         - CPU used by GC
# 0.026+...  - STW + concurrent + STW times
# 4->4->0 MB - Heap before→after→live
# 5 MB goal  - Target heap size
# 8 P        - Number of processors
Memory Ballast
Memory ballast is a pre-allocated but unused chunk of heap that delays GC triggering; it reduces GC frequency in services with low live heap but high allocation rate—less relevant since Go 1.19's SetMemoryLimit.
func main() {
	// Old trick: allocate a 10GB ballast
	ballast := make([]byte, 10<<30)

	// With GOGC=100, GC triggers at ~20GB
	// instead of when the actual live heap doubles

	// Go 1.19+ preferred approach:
	debug.SetMemoryLimit(10 << 30)

	runtime.KeepAlive(ballast) // Keep the ballast reachable
}
Memory Profiling
Memory profiling captures allocation counts and sizes at sampled callsites; use go tool pprof to analyze heap profiles and identify allocation hotspots.
import "runtime/pprof"

func captureHeapProfile() {
	f, _ := os.Create("heap.prof")
	defer f.Close()
	runtime.GC() // Get up-to-date live-object data
	pprof.WriteHeapProfile(f)
}
# Analyze
go tool pprof heap.prof
(pprof) top # Top allocators

# Choose the sample type:
go tool pprof -sample_index=alloc_space heap.prof # Total bytes allocated
go tool pprof -sample_index=inuse_space heap.prof # Currently held bytes

# Live profiling
go tool pprof http://localhost:6060/debug/pprof/heap
Memory Leaks Detection
Go memory leaks are typically goroutine leaks (goroutines blocked forever holding references); profile goroutines and heap over time to detect growth patterns.
// Common leak patterns:

// 1. Blocked goroutine
go func() {
	<-ch // If ch never receives, the goroutine leaks
}()

// 2. Forgotten timer
ticker := time.NewTicker(time.Second)
// Missing: defer ticker.Stop()

// Detection:
func debugGoroutines(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "Goroutines: %d\n", runtime.NumGoroutine())
	pprof.Lookup("goroutine").WriteTo(w, 1)
}

// Tool: goleak in tests
func TestNoLeak(t *testing.T) {
	defer goleak.VerifyNone(t)
	// test code...
}
Finalizers (runtime.SetFinalizer)
Finalizers run when an object is about to be garbage collected; use sparingly for releasing external resources—they add GC overhead and have no timing guarantees.
type Resource struct {
	handle int
}

func NewResource() *Resource {
	r := &Resource{handle: openExternal()}
	runtime.SetFinalizer(r, func(r *Resource) {
		closeExternal(r.handle) // Cleanup
		fmt.Println("Resource finalized")
	})
	return r
}

// ⚠️ Finalizer caveats:
// • No guarantee when (or if!) it runs
// • Adds GC overhead
// • Object survives one extra GC cycle
// • Prefer explicit Close() methods
Weak References (not native)
Go had no native weak references until the weak package (weak.Pointer) landed in Go 1.24; before that, the usual workaround is a cache with periodic cleanup. Custom solutions risk fighting the GC.
// Workaround: manual cache with TTL
type WeakCache struct {
	mu    sync.RWMutex
	items map[string]cacheEntry
}

type cacheEntry struct {
	value     any
	expiresAt time.Time
}

func (c *WeakCache) cleanup() {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	for k, v := range c.items {
		if now.After(v.expiresAt) {
			delete(c.items, k)
		}
	}
}

// Note: Go 1.24 added the weak package (weak.Make / weak.Pointer)
Assembly in Go
Plan 9 Assembly
Go uses Plan 9 assembly syntax, which differs from Intel/AT&T conventions; it's platform-agnostic with pseudo-registers and Go-specific conventions for interoperability with the Go runtime.
┌────────────────────────────────────────────────┐
│ Plan 9 Pseudo-Registers                        │
├────────────────────────────────────────────────┤
│ FP - Frame Pointer (args)   first_arg+0(FP)    │
│ SP - Stack Pointer (locals) local_var-8(SP)    │
│ SB - Static Base (globals)  symbol(SB)         │
│ PC - Program Counter                           │
└────────────────────────────────────────────────┘

// Example: add.s
TEXT ·Add(SB), NOSPLIT, $0-24
	MOVQ a+0(FP), AX    // Load first arg
	MOVQ b+8(FP), BX    // Load second arg
	ADDQ BX, AX         // Add
	MOVQ AX, ret+16(FP) // Store return
	RET
Function Calls in Assembly
Assembly functions follow Go's calling convention; arguments and return values are passed via the stack (referenced through FP), and functions must match their Go declarations exactly.
// add.go
package math

func Add(a, b int64) int64
// add_amd64.s
#include "textflag.h"

// func Add(a, b int64) int64
TEXT ·Add(SB), NOSPLIT, $0-24
	// Args at FP offsets: a+0, b+8, ret+16
	MOVQ a+0(FP), AX
	MOVQ b+8(FP), BX
	ADDQ BX, AX
	MOVQ AX, ret+16(FP)
	RET
Go Assembly Syntax
Go assembly uses specific syntax for sizes (B=1, W=2, L=4, Q=8 bytes), memory references, and directives; understanding TEXT, DATA, GLOBL directives is essential.
#include "textflag.h"

// TEXT: define a function
// ·Name   = package.Name (· is the middle dot)
// (SB)    = relative to static base
// NOSPLIT = no stack split check
// $0-16   = 0 local bytes, 16 arg bytes
TEXT ·Swap(SB), NOSPLIT, $0-16
	MOVQ x+0(FP), AX // Q = quadword (8 bytes)
	MOVQ y+8(FP), BX
	MOVQ BX, x+0(FP)
	MOVQ AX, y+8(FP)
	RET

// DATA: define data
DATA  msg<>+0(SB)/8, $"Hello, W"
DATA  msg<>+8(SB)/4, $"orld"
GLOBL msg<>(SB), RODATA, $12
When to Use Assembly
Use assembly only when profiling proves it's necessary: SIMD operations, crypto primitives, or hot paths where the compiler generates suboptimal code; 99% of Go code never needs assembly.
┌─────────────────────────────────────────────┐
│ When to Consider Assembly: │
├─────────────────────────────────────────────┤
│ ✅ CPU-specific optimizations (SIMD/AVX) │
│ ✅ Crypto (AES-NI, SHA extensions) │
│ ✅ Hot inner loops (after profiling!) │
│ ✅ Accessing CPU features (CPUID, etc) │
├─────────────────────────────────────────────┤
│ ❌ Premature optimization │
│ ❌ Simple operations (compiler is good!) │
│ ❌ Portability is important │
│ ❌ Maintainability matters │
└─────────────────────────────────────────────┘
Assembly File Naming
Assembly files must follow naming conventions: name_GOOS_GOARCH.s for OS/arch-specific, name_GOARCH.s for arch-only, or name.s for generic; the build system selects appropriate files automatically.
project/
├── add.go         # Go declaration
├── add_amd64.s    # AMD64 implementation
├── add_arm64.s    # ARM64 implementation
└── add_generic.go # Fallback (with build tag)

// add_generic.go
//go:build !amd64 && !arm64

package math

func Add(a, b int64) int64 { return a + b }
CGO
CGO Basics
CGO enables calling C code from Go and vice versa; import the pseudo-package "C" with preceding C code in a comment block, but use it sparingly due to complexity and performance costs.
package main

/*
#include <stdio.h>
#include <stdlib.h>

void hello(const char* name) {
    printf("Hello, %s!\n", name);
}
*/
import "C"
import "unsafe"

func main() {
	name := C.CString("Gopher")
	defer C.free(unsafe.Pointer(name))
	C.hello(name)
}
C Code in Go
C code can be written directly in the comment block before import "C" or in separate .c files in the same package; the Go build system compiles and links C code automatically.
/*
#include <math.h>

// Define a C function inline
double circleArea(double radius) {
    return M_PI * radius * radius;
}
*/
import "C"
import "fmt"

func main() {
	area := C.circleArea(5.0)
	fmt.Printf("Area: %f\n", area)
}
Go Code in C
Export Go functions to C using //export directive; these can be called from C code in the same package or from external C code linking against the Go library.
// main.go
package main

/*
// A file containing //export directives may only *declare*
// C functions in its preamble; definitions go in a separate .c file.
extern void callGo(void);
*/
import "C"
import "fmt"

//export GoCallback
func GoCallback(x C.int) C.int {
	fmt.Printf("Go received: %d\n", int(x))
	return x * 2
}

func main() {
	C.callGo()
}

// bridge.c
#include <stdio.h>
#include "_cgo_export.h"

void callGo(void) {
    int result = GoCallback(21);
    printf("C received: %d\n", result);
}
Calling C Functions
C functions are accessed through the C pseudo-package; pass Go values converted to C types, and always free memory allocated by C when you're done.
/*
#include <stdlib.h>
#include <string.h>

char* duplicate(const char* s) {
    return strdup(s);
}
*/
import "C"
import "unsafe"

func main() {
	input := C.CString("hello")
	defer C.free(unsafe.Pointer(input))

	output := C.duplicate(input)
	defer C.free(unsafe.Pointer(output))

	goString := C.GoString(output)
	println(goString)
}
C Types in Go
CGO automatically maps C types to Go types; use C.int, C.double, etc., with explicit conversions between Go and C types. Complex types require more care.
/*
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
} Point;
*/
import "C"
import "unsafe"

func main() {
	// Numeric types
	var i C.int = 42
	var d C.double = 3.14
	goInt := int(i)

	// Struct
	p := C.Point{x: 10, y: 20}

	// String conversion
	cs := C.CString("hello") // Go→C (must free!)
	gs := C.GoString(cs)     // C→Go

	// Bytes
	data := []byte{1, 2, 3}
	cdata := (*C.char)(unsafe.Pointer(&data[0]))
}
Memory Management with CGO
Go's GC doesn't track C memory; you must manually free C allocations, and passing Go pointers to C requires following strict rules to prevent GC from moving or collecting them.
/*
#include <stdlib.h>

void use(char* p) {} // Illustrative C function
*/
import "C"
import "unsafe"

func main() {
	// C allocated - YOU must free
	cstr := C.CString("hello")
	defer C.free(unsafe.Pointer(cstr))

	cmem := C.malloc(100)
	defer C.free(cmem)

	// Go allocated - GC manages
	goSlice := make([]byte, 100)

	// Pass to C: only if C won't store it!
	C.use((*C.char)(unsafe.Pointer(&goSlice[0])))
}

// ⚠️ Rules for passing Go pointers to C:
// 1. A Go pointer may only point to Go memory
// 2. C must not keep the Go pointer after the call returns
// 3. The Go memory passed must not itself contain Go pointers
Performance Implications
CGO calls have significant overhead (~150ns vs ~1ns for Go calls) due to stack switching, scheduler coordination, and Go→C ABI translation; batch operations when possible.
┌────────────────────────────────────────────────┐
│ CGO Call Overhead                              │
├────────────────────────────────────────────────┤
│ Pure Go call: ~1-2 ns                          │
│ CGO call: ~100-200 ns (50-100x slower!)        │
├────────────────────────────────────────────────┤
│ Why so slow:                                   │
│ • Save/restore Go scheduler state              │
│ • Switch to system stack                       │
│ • Coordinate with GC (write barriers)          │
│ • ABI translation                              │
├────────────────────────────────────────────────┤
│ Mitigation:                                    │
│ • Batch multiple operations into one CGO call  │
│ • Do work in Go when possible                  │
│ • Consider pure Go alternatives                │
└────────────────────────────────────────────────┘
Cross-compilation with CGO
Cross-compiling with CGO requires a C cross-compiler for the target platform; this is complex and often requires Docker or target-native builds. Consider CGO-free alternatives.
# Without CGO: easy cross-compilation
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build

# With CGO: need a cross-compiler
CGO_ENABLED=1 \
CC=aarch64-linux-gnu-gcc \
GOOS=linux GOARCH=arm64 \
go build

# Easier: build on the target or use Docker
docker run --rm -v "$PWD":/app -w /app \
    golang:1.21 go build

# Check CGO status
go env CGO_ENABLED
CGO_ENABLED Flag
CGO_ENABLED controls whether CGO is available; 0 produces static binaries without C dependencies, 1 enables CGO. Some stdlib packages (net, os/user) have CGO implementations.
# Disable CGO (static, portable)
CGO_ENABLED=0 go build -o app-static

# Enable CGO (default on many systems)
CGO_ENABLED=1 go build -o app-dynamic

# Force pure Go stdlib implementations
CGO_ENABLED=0 go build -tags netgo,osusergo

# Check if a binary uses CGO
ldd ./app-static  # "not a dynamic executable"
ldd ./app-dynamic # lists libc, etc.
Build Constraints
Use build constraints to provide CGO and non-CGO implementations; this allows your package to work with or without CGO available.
// impl_cgo.go
//go:build cgo

package mypackage

/*
#include <somelib.h>
*/
import "C"

func DoWork() {
	C.someFunction()
}
// impl_nocgo.go
//go:build !cgo

package mypackage

func DoWork() {
	// Pure Go implementation
}
Build Process
go build Flags
go build accepts numerous flags to control compilation, output, and optimization; key flags include -o (output), -v (verbose), -race (detector), -trimpath (reproducible builds).
# Common flags
go build -o myapp        # Output name
go build -v              # Verbose
go build -race           # Race detector
go build -trimpath       # Remove file paths
go build -mod=readonly   # Don't update go.mod
go build -buildvcs=false # No VCS info

# Combine
go build -v -race -o myapp ./cmd/server

# Build all packages
go build ./...

# Build for a specific OS/arch
GOOS=linux GOARCH=amd64 go build
-ldflags
-ldflags passes flags to the linker; commonly used to inject version info at build time and strip debug symbols for smaller binaries.
# Inject version at build time
go build -ldflags "-X main.Version=1.2.3 \
    -X 'main.BuildTime=$(date)' \
    -X main.Commit=$(git rev-parse HEAD)"

# Strip debug info (smaller binary)
go build -ldflags "-s -w"

# Combined
go build -ldflags "-s -w -X main.Version=1.0.0"
package main

import "fmt"

var (
	Version   = "dev"
	BuildTime = "unknown"
	Commit    = "none"
)

func main() {
	fmt.Printf("Version: %s, Built: %s\n", Version, BuildTime)
}
-gcflags
-gcflags passes flags to the Go compiler; useful for debugging (disable optimization), analysis (escape analysis), and seeing compiler decisions.
# Escape analysis
go build -gcflags="-m"     # Basic
go build -gcflags="-m=2"   # Detailed

# Disable optimizations (for debugging)
go build -gcflags="-N -l"
# -N: disable optimizations
# -l: disable inlining

# Pass to specific packages
go build -gcflags="main=-m" ./...

# See assembly
go build -gcflags="-S" 2>&1 | head -100
-tags (Build Tags)
Build tags allow conditional compilation; a file is included only when its tags match those specified on the command line. Use -tags to select which tags are active.
go build -tags="prod,metrics"
go build -tags="integration"
// +build tag syntax (old, still works)
// logging_debug.go

// +build debug

// go:build syntax (Go 1.17+, preferred)
// logging_prod.go

//go:build prod

package logging

// Combined with boolean logic
//go:build (linux && amd64) || darwin
//go:build !windows
//go:build cgo && sqlite
Build Constraints (//go:build)
Build constraints control when files are included based on OS, arch, compiler, tags, and Go version; the //go:build line must be first (before package) with a blank line after.
//go:build linux && amd64

package main

// File only included for linux/amd64

//go:build ignore

// File always excluded (useful for examples, generators)
Valid constraints:
┌────────────────────────────────────────────┐
│ GOOS: linux, darwin, windows, etc. │
│ GOARCH: amd64, arm64, wasm, etc. │
│ Compiler: gc, gccgo │
│ Version: go1.21, go1.22 │
│ cgo: cgo or !cgo │
│ Custom: -tags=mytag │
│ Boolean: && (and), || (or), ! (not) │
└────────────────────────────────────────────┘
Cross-compilation
Go supports cross-compilation out of the box—set GOOS and GOARCH to target any supported platform. Set CGO_ENABLED=0 so a pure-Go build works without a C cross-toolchain for the target.
# Common targets
GOOS=linux GOARCH=amd64 go build -o app-linux-amd64
GOOS=darwin GOARCH=arm64 go build -o app-macos-arm64
GOOS=windows GOARCH=amd64 go build -o app.exe

# List all supported combinations
go tool dist list

# WebAssembly
GOOS=js GOARCH=wasm go build -o app.wasm

# Build script for all platforms
#!/bin/bash
for os in linux darwin windows; do
  for arch in amd64 arm64; do
    GOOS=$os GOARCH=$arch go build -o "app-$os-$arch"
  done
done
GOOS and GOARCH
GOOS specifies the target operating system, GOARCH the architecture; Go supports many combinations, and you can check valid pairs with go tool dist list.
# Current settings
go env GOOS GOARCH

# Common GOOS values:
#   linux, darwin (macOS), windows, freebsd,
#   android, ios, js (WebAssembly)

# Common GOARCH values:
#   amd64 (x86-64), arm64 (Apple M1, AWS Graviton),
#   arm, 386, wasm, riscv64

# Practical examples:
GOOS=linux GOARCH=arm64    # AWS Graviton
GOOS=darwin GOARCH=arm64   # Apple Silicon
GOOS=linux GOARCH=arm      # Raspberry Pi
GOOS=js GOARCH=wasm        # Browser
Reducing Binary Size
Combine multiple techniques to reduce Go binary size: strip symbols, disable DWARF, use UPX compression, and avoid unnecessary dependencies.
# Baseline
go build -o app                              # ~10MB

# Strip symbol table and DWARF
go build -ldflags="-s -w" -o app             # ~7MB

# Use trimpath (also helps)
go build -ldflags="-s -w" -trimpath -o app

# UPX compression (after build)
upx --best app                               # ~2-3MB

# Check what's in binary
go tool nm app | head
go tool objdump -s main app

# Analyze size by package
go build -o app
go tool nm -size app | sort -n
Stripping Debug Info
Use -ldflags="-s -w" to strip the symbol table (-s) and DWARF debug info (-w); this significantly reduces binary size. Panic tracebacks keep their file/line info (it lives in the runtime's pclntab), but debuggers such as delve need the DWARF data that these flags remove.
# With debug info (default)
go build -o app-debug
ls -lh app-debug      # 12M

# Stripped
go build -ldflags="-s -w" -o app-stripped
ls -lh app-stripped   # 8M

# Trade-offs:
# ✅ Smaller binary (20-30% reduction)
# ✅ Panic tracebacks unaffected (pclntab is kept)
# ❌ Harder to debug with delve (DWARF removed)
# ❌ go tool nm loses symbol information
UPX Compression
UPX (Ultimate Packer for eXecutables) compresses binaries with decompression at startup; useful for size-constrained deployments but adds startup latency.
# Install UPX
apt install upx    # Linux
brew install upx   # macOS

# Compress (after go build)
go build -ldflags="-s -w" -o app
upx --best app
# or for max compression:
upx --ultra-brute app

# Before: 8.0MB
# After:  2.5MB

# ⚠️ Caveats:
# • Startup time increases (~50-100ms)
# • Some antivirus flag UPX binaries
# • Memory usage slightly higher
# • Can't be used with -race or profiling
go generate
go generate runs commands specified in source files; commonly used for code generation (stringer, mockgen, protobuf), embedding, or build-time processing.
//go:generate stringer -type=Status
//go:generate mockgen -source=interface.go -destination=mock.go
//go:generate protoc --go_out=. schema.proto

package main

type Status int

const (
    Pending Status = iota
    Running
    Complete
)
# Run all generate directives
go generate ./...

# Run for specific package
go generate ./models

# Common generators:
# • stringer    - String() for constants
# • mockgen     - Mock implementations
# • protoc      - Protocol buffers
# • go-bindata  - Embed files (legacy)
# • sqlc        - Type-safe SQL
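For reference, the String method stringer produces for the Status constants above is roughly equivalent to this hand-written version (the real generated code uses an index table; a plain switch is shown here for clarity):

```go
package main

import "fmt"

type Status int

const (
	Pending Status = iota
	Running
	Complete
)

// Hand-written equivalent of stringer's output (sketch)
func (s Status) String() string {
	switch s {
	case Pending:
		return "Pending"
	case Running:
		return "Running"
	case Complete:
		return "Complete"
	default:
		return fmt.Sprintf("Status(%d)", int(s))
	}
}

func main() {
	fmt.Println(Running) // fmt calls the String method: prints "Running"
}
```

Because fmt checks for the Stringer interface, constants print as names instead of raw integers anywhere they are formatted.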
Code Generation
Code generation creates Go source files programmatically; use text/template, go/ast, or dedicated tools to generate repetitive, type-specific, or derived code at build time.
// gen.go

//go:build ignore

package main

import (
    "os"
    "text/template"
)

var tmpl = `// Code generated; DO NOT EDIT.
package {{.Package}}
{{range .Types}}
func ({{.Name}}) Is{{.Name}}() {}
{{end}}
`

func main() {
    t := template.Must(template.New("gen").Parse(tmpl))
    f, _ := os.Create("generated.go")
    defer f.Close()
    t.Execute(f, struct {
        Package string
        Types   []struct{ Name string }
    }{
        Package: "mypackage",
        Types:   []struct{ Name string }{{Name: "Foo"}, {Name: "Bar"}},
    })
}
//go:generate go run gen.go
Embedded Files (embed Package)
The embed package (Go 1.16+) embeds files into the binary at compile time; use //go:embed directives to include static assets, templates, or configuration.
package main

import (
    "embed"
    "fmt"
    "net/http"
)

//go:embed version.txt
var version string

//go:embed templates/*.html
var templates embed.FS

//go:embed static/*
var static embed.FS

func main() {
    fmt.Println("Version:", version)

    // Serve embedded files
    http.Handle("/static/", http.FileServer(http.FS(static)))

    // Read embedded file
    data, _ := templates.ReadFile("templates/index.html")
    fmt.Println(string(data))
}
project/
├── main.go
├── version.txt
├── templates/
│ └── index.html
└── static/
├── style.css
└── app.js
Compiler Directives
//go:noinline
//go:noinline prevents the compiler from inlining a function; useful for benchmarking (prevent optimization) or when inlining would cause issues with stack inspection.
//go:noinline
func mustNotInline(x int) int {
    return x * 2
}

// Use cases:
// • Accurate benchmarking (prevent dead code elimination)
// • Debugging (preserve function boundaries)
// • Stack trace requirements
// • Preventing code bloat for large hot functions
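A common use is keeping a trivial function from being folded away when measuring it; a sketch (the global sink variable is a standard trick to defeat dead-code elimination):

```go
package main

import "fmt"

//go:noinline
func double(x int) int { return x * 2 }

var sink int // global sink keeps the result observable

func main() {
	// Without //go:noinline the compiler could inline double and
	// optimize the loop body down to a single multiplication.
	for i := 0; i < 1000; i++ {
		sink = double(i)
	}
	fmt.Println(sink) // 1998
}
```

The same pattern applies inside testing.B benchmarks: assign the result to a package-level variable so the measured call cannot be eliminated.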
//go:inline (Hint)
Go doesn't have //go:inline—inlining is automatic based on complexity heuristics. You can encourage inlining by keeping functions small and simple; use -gcflags="-m" to verify.
// The compiler inlines automatically based on:
// • Function size (small = likely inline)
// • No loops (complex control flow = no inline)
// • No defer (usually prevents inline)
// • Not recursive

// This WILL likely inline (simple, small)
func add(a, b int) int { return a + b }

// This WON'T inline (has loop)
func sum(s []int) int {
    t := 0
    for _, v := range s {
        t += v
    }
    return t
}

// Check: go build -gcflags="-m"
// output: "can inline add"
//go:noescape
//go:noescape tells the compiler that a function's pointer arguments don't escape; used for assembly functions or special cases where escape analysis can't determine safety.
// Typically used for assembly functions

//go:noescape
func asmFunction(p *byte)

// The directive promises:
// • Pointers passed to this function won't be stored
// • Pointers won't be returned
// • Allows stack allocation of arguments

// ⚠️ Dangerous if misused!
// Only use when you KNOW the implementation doesn't escape
//go:linkname
//go:linkname links a local variable/function to a symbol in another package; bypasses Go's visibility rules. Extremely dangerous, used in stdlib internals.
package main

import (
    _ "unsafe" // Required for linkname
)

// Link to runtime's internal function
//go:linkname nanotime runtime.nanotime
func nanotime() int64

func main() {
    println(nanotime()) // Access internal function
}

// ⚠️ DANGEROUS:
// • Bypasses API stability guarantees
// • Can break on Go version updates
// • Not for production code
// • Requires unsafe import
//go:generate
//go:generate specifies commands to run with go generate; any text after the directive is executed as a shell command with the package directory as working directory.
//go:generate stringer -type=Pill
//go:generate mockgen -source=service.go -destination=mock_service.go
//go:generate go run scripts/gen.go

package pharmacy

type Pill int

const (
    Placebo Pill = iota
    Aspirin
    Ibuprofen
)
# Run generators
go generate ./...

# Special variables in generate commands:
#   $GOFILE    - Current file
#   $GOLINE    - Line number
#   $GOPACKAGE - Package name
#   $DOLLAR    - Literal $
//go:embed
//go:embed embeds files/directories into the binary; the directive must precede a variable of type string, []byte, or embed.FS. Patterns support glob syntax.
import "embed" // blank import (_ "embed") suffices if only string/[]byte vars are declared

//go:embed config.json
var config []byte

//go:embed version.txt
var version string // File content embedded verbatim

//go:embed templates/*
var templates embed.FS

//go:embed assets/images/*.png assets/fonts/*.ttf
var assets embed.FS

// Pattern rules:
// • No .. or absolute paths
// • Patterns like *.txt, dir/*, or a bare directory name (embeds the whole tree)
// • Files starting with . or _ are excluded when embedding a directory
// • Use all:dir to include them
//go:build
//go:build specifies build constraints for the file using boolean expressions; it replaced the older // +build syntax and must appear at the top of the file before the package clause, followed by a blank line.
//go:build linux && amd64

package main

// Only compiled for linux/amd64
//go:build !windows
// Compiled for all OS except Windows
//go:build (linux || darwin) && cgo
// Compiled for linux or macOS with CGO enabled
//go:build go1.21
// Only compiled with Go 1.21 or later
//go:build integration
// Only with: go test -tags=integration

// Valid operators: && (and), || (or), ! (not), ()
// Terms: GOOS, GOARCH, compiler, cgo, go version, custom tags