Go: Pointer and Value Semantics

Go has two different data semantics, pointer and value. What does this even mean? Since Go is a managed language it is kind of important to understand the difference between pointer and value semantics. In this blog post, we’ll discuss the difference between the two and poke around with some compiler commands to help us determine when variables are allocated on the heap.

Stack vs. Heap

The tale of two different memory regions, the Stack and the Heap. If you’ve been programming for sometime, you’ve probably heard of both the Stack and the Heap. The basic idea is during the boot up of your program, the program contains a pre-allocated region of memory known as the Stack and when you need more memory you can request blocks of memory from the operating system dynamically, which is often abstracted away as allocators in C this would be malloc, as blocks on the Heap. The Stack block of memory is used for local variables within functions as a temporary storage mechanism, often you’ll hear that the Stack grows from top down referring to the address range of starting at a higher address and growing to a lower address. On the other hand, you’ll hear that the Heap grows from the bottom up, meaning the address range starts lower and gets higher.

In Go and in C, the address-of operator & and the dereference operator * are provided. Obviously, there are some differences with particular, but in general the & operator will give us the address of the variable and the * operator will get the value of a pointer variable and it can be used to denote that a variable is a pointer.

The behavior of the Stack and Heap growth is system architecture and management specific. A simple example of Stack growth in C and Go would look like the following:

#include <stdio.h>

void stackGrowth(int level, void* prevAddr) {
    int x;
    void* currAddr = (void*)&x;
    printf("Level %d: Address of x = %p, Difference = %+ld\n",
            level, currAddr, (char*)currAddr - (char*)prevAddr);
    if (level < 5) {
        stackGrowth(level + 1, currAddr);
    }
}

int main() {
    printf("Stack growth demonstration:\n");
    int x;
    stackGrowth(1, (void*)&x);
    return 0;
}

package main

import (
	"fmt"
	"unsafe"
)

func stackGrowth(level int, prevAddr uintptr) {
	var x int
	currAddr := uintptr(unsafe.Pointer(&x))
	fmt.Printf("Level %d: Address of x = %p, Difference = %+d\n",
            level, unsafe.Pointer(&x), currAddr-prevAddr)
	if level < 5 {
		stackGrowth(level+1, currAddr)
	}
}

func main() {
	fmt.Println("Stack growth demonstration:")
	var x int
	stackGrowth(1, uintptr(unsafe.Pointer(&x)))
}

In order to demonstrate Heap growth, I’ll just provide an example in C, keep in mind depending on the architecture and operating system the behavior of these program could differ.

#include <stdio.h>
#include <stdlib.h>

int main() {
    printf("Heap growth demonstration:\n");
    int *ptr1 = (int *)malloc(sizeof(int));
    int *ptr2 = (int *)malloc(sizeof(int));
    int *ptr3 = (int *)malloc(sizeof(int));
    int *ptr4 = (int *)malloc(sizeof(int));
    int *ptr5 = (int *)malloc(sizeof(int));
    printf("Address of ptr1: %p\n", (void *)ptr1);
    printf("Address of ptr2: %p, Difference = %+ld bytes\n",
            (void *)ptr2, (char *)ptr2 - (char *)ptr1);
    printf("Address of ptr3: %p, Difference = %+ld bytes\n",
            (void *)ptr3, (char *)ptr3 - (char *)ptr2);
    printf("Address of ptr4: %p, Difference = %+ld bytes\n",
            (void *)ptr4, (char *)ptr4 - (char *)ptr3);
    printf("Address of ptr5: %p, Difference = %+ld bytes\n",
            (void *)ptr5, (char *)ptr5 - (char *)ptr4);
    free(ptr1);
    free(ptr2);
    free(ptr3);
    free(ptr4);
    free(ptr5);
    return 0;
}

To avoid getting too deep in to the these topics, we’ll just leave it at this high level overview.

Understanding Allocations

A goroutine is a lightweight thread managed by the Go runtime. In computer science terminology a goroutine is considered a green thread, and in Go each goroutine will have it’s own stack that is allocated and managed by the Go runtime. During compilation of a Go program, the compiler will ultimately determine whether or not a variable needs to be placed on the Stack or the Heap. Typically, this is done by a process called escape analysis, which is a method for determining the dynamic scope of pointers.

The below example has a declared variable n and a function func square(x int) int, this function just returns a value. When a function is called, a stack frame is created which uses a chunk of memory from the Stack to allocate local variables of the function. Since we’re just returning a value, once the function returns the stack frame can be placed back in to the pool of memory of the Stack for reuse. Next, the println function is called, which would subsequently reuse blocks of memory from the Stack that were previously used by the function square effectively overwrite the Stack memory; therefore, no variables have escaped to the Heap.

package main

func main() {
	n := 4
	n2 := square(n)
	println(n2)
}

func square(x int) int {
	return x * x
}

Similarly, the next example shows that sharing down pointers should typically result in the same behavior as the previous example. The main function contains a variable n which address is passed to the inc function, the pointer is dereferenced and incremented.

package main

func main() {
	n := 4
	inc(&n)
	println(n)
}

func inc(x *int) {
	*x++
}

Now this is all well and good, but how can we prove that this is actually what is happening? The Go compiler allows us to pass in garbage collector build flags, such as go build -gcflags '-m -m' main.go. This can give us some insight into what variables might be escaping to the Heap.

❯ go build -gcflags '-m -m' main.go
# command-line-arguments
./main.go:9:6: can inline inc with cost 4 as: func(*int) { *x++ }
./main.go:3:6: can inline main with cost 15 as: func() { n := 4; inc(&n); println(n) }
./main.go:5:5: inlining call to inc
./main.go:9:10: x does not escape

Now that we’ve got some output from the compiler that gives us some hints about whether or not a variable escapes to the Heap. In that output, we saw a term inline, stating that a particular function can be inline. What does it mean to inline a function? Well, it’s pretty simple actually, it’s a compiler optimization that can be done that replaces a function call with the exact body of the called function to avoid the overhead of setting up a function call. If you’d like to tell the compiler to explicitly not inline a function, Go has a special comment called a pragma that can be added above the function //go:noinline that tells the compiler to not perform the inlining optimization on that particular function.

Now, what happens when we have a function that returns a pointer? Again, it’s better to specifically check your use case since the compiler is ultimately the thing that will determine whether or not a particular variable is allocated on the Stack or the Heap. In generally, it’s probably safe to assume that if you declare a pointer value inside of a function that is returned from that function that it will most likely be escaped to the Heap. In other words, sharing up will typically escape to the Heap.

package main

func main() {
	n := answer()
	println(*n / 2)
}

func answer() *int {
	x := 42
	return &x
}

❯ go build -gcflags '-m -m' main.go
# command-line-arguments
./main.go:8:6: can inline answer with cost 8 as: func() *int { x := 42; return &x }
./main.go:3:6: can inline main with cost 19 as: func() { n := answer(); println(*n / 2) }
./main.go:4:13: inlining call to answer
./main.go:9:2: x escapes to heap:
./main.go:9:2:   flow: ~r0 = &x:
./main.go:9:2:     from &x (address-of) at ./main.go:10:9
./main.go:9:2:     from return &x (return) at ./main.go:10:2
./main.go:9:2: moved to heap: x

Another example of returning a value from a function versus return a pointer, showing that the pointer returned is escaped to the Heap.

package main

func main() {
	var a int
	a = 10

	var b int
	b = 15

	_ = AddReturnValue(a, b)

	_ = AddReturnPointer(a, b)
}

//go:noinline
func AddReturnValue(x int, y int) int {
	value := x + y
	return value
}

//go:noinline
func AddReturnPointer(x int, y int) *int {
	pointer := x + y
	return &pointer
}

❯ go build -gcflags '-m -m' main.go
# command-line-arguments
./main.go:16:6: cannot inline AddReturnValue: marked go:noinline
./main.go:22:6: cannot inline AddReturnPointer: marked go:noinline
./main.go:3:6: cannot inline main: function too complex: cost 140 exceeds budget 80
./main.go:23:2: pointer escapes to heap:
./main.go:23:2:   flow: ~r0 = &pointer:
./main.go:23:2:     from &pointer (address-of) at ./main.go:24:9
./main.go:23:2:     from return &pointer (return) at ./main.go:24:2
./main.go:23:2: moved to heap: pointer

If a variable is wrapped as an interface, the escape analysis sometimes cannot prove that it’s safe for the variable to be on the Stack.

package main

import "fmt"

func main() {
	var i interface{}
	x := 10
	i = x
	fmt.Println(i)
}

❯ go build -gcflags '-m -m' main.go
# command-line-arguments
./main.go:5:6: cannot inline main: function too complex: cost 90 exceeds budget 80
./main.go:9:13: inlining call to fmt.Println
./main.go:8:6: x escapes to heap:
./main.go:8:6:   flow: i = &{storage for x}:
./main.go:8:6:     from x (spill) at ./main.go:8:6
./main.go:8:6:     from i = x (assign) at ./main.go:8:4
./main.go:8:6:   flow: {storage for ... argument} = i:
./main.go:8:6:     from ... argument (slice-literal-element) at ./main.go:9:13
./main.go:8:6:   flow: fmt.a = &{storage for ... argument}:
./main.go:8:6:     from ... argument (spill) at ./main.go:9:13
./main.go:8:6:     from fmt.a := ... argument (assign-pair) at ./main.go:9:13
./main.go:8:6:   flow: {heap} = *fmt.a:
./main.go:8:6:     from fmt.Fprintln(os.Stdout, fmt.a...) (call parameter) at ./main.go:9:13
./main.go:8:6: x escapes to heap
./main.go:9:13: ... argument does not escape

Go Interfaces for References

If you take a look at the Go interface for io.Reader, it’s a great example of a function that shares down a pointer instead of sharing up.

type Reader interface {
	Read(p []byte) (n int, err error)
}