Go: Unsafe

As with every language, there are those odd parts and gotcha. If you’re a web developer using Go for back-end services there is a high probability that you’ve managed to avoid a majority of these odd parts of the language. One of these odd parts, which I don’t have a ton of experience using, is the unsafe package. This package gives us the ability to side step Go’s type system and do more C like operations. I know, I know, somehow in these post I always manage to find a way to mention C. Either way, I think providing some examples in C might provide us some better insight in to why this package exist and how to use it. Enough of this intro, let’s get started.

C Pointers

Nowadays, most of our PCs have 64-bit CPU’s in them; however, it is important to mention that was not always the case. Why bother even mentioning this? Whoa, slow down there skipper, what does it mean when we have a 32-bit vs 64-bit CPU? Well, one major thing is addressable memory space and the default word size of the CPU. The term word just refers to the natural size that a processor uses, typically related to the register sizes. Normally, 32-bit CPUs have 32-bit or 4 byte word sizes and 64-bit have 64-bit or 8 byte word sizes.

Well, okay Nick, I’m still not really sure why we should care about this?

Alright, let me slap ya with a bit of C enlightenment and Go knowledge. When you create a variable, in this example we’ll just say we are talking about C, as the type int what is the size of that variable? Well, instead of just talking about it, let’s just write a C program that will tell me. In C, we have a built-in sizeof that will tell us the byte size of a particular type.

#include <stdio.h>

int main()
{
    int a;
    printf("%lu\n", sizeof(a));
    return 0;
}

./a.out
4

The CPUs that I’m using on my machines are all x64 Intel chips. We wrote an example C program that creates a integer variable as an int, and it looks like the default size for an integer variable is 4 bytes on my hardware. Alright, what about the size of an int * (integer pointer)?

#include <stdio.h>

int main()
{
    int *a;
    printf("%lu\n", sizeof(a));
    return 0;
}

./a.out
8

Interesting, it appears that the size of an integer pointer is 8 bytes on my hardware. To compile these programs, I’ve been using clang and by default the compiler will use whatever architecture size you currently using unless specified otherwise. We can actually compile a 32-bit program instead of a 64-bit program by adding the flag -m32 in order to see the pointer size will indeed change based on the bit size of the program that we are compiling.

#include <stdio.h>

int main()
{
    int *a;
    printf("%u\n", sizeof(a));
    return 0;
}

clang -o main-32 -m32 main.c
./main-32
4

The Void *

In C, there is a lot of trust between the language and the programmer that is using the language. We have the ability to take any type and cast it to a different type without any strong type checking like we might have with Go’s type assertions. An interesting thing that you can do in C is type cast a pointer to be a void * (void pointer), but what does it mean to have a variable that points to the type of void? Well, it basically means that you have an untyped pointer that you can’t dereference.

#include <stdio.h>

int main() {
  void *a;
  int b = 128;
  a = &b;
  *a = 256;
  return 0;
}

clang main.c -o main-64
main.c:7:6: error: incomplete type 'void' is not assignable
  *a = 256;
  ~~ ^
1 error generated.

Well, why is this useful, Nick, you smooth brain crayon eater?

It’s used to represent an abstract type that can be explicitly or implicitly cast to another type. This can be useful when we want to give some control to the consumer of an API that we might have created. For instance, take a look at the function prototype for malloc, the return type should be a void *. This is the reason why you would have to cast the result of malloc to the expected pointer type.

Holy Func, that’s unsafe

In the simplest terms, the Go unsafe package gives us, as programmers, more control of our programs by side stepping the safety of the Go type system. Just like the sizeof built-in in C, Go offers a similar operation as unsafe.SizeOf.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var x int
	size := unsafe.Sizeof(x)
	fmt.Printf("%d\n", size)
}

Once again, the Go compiler allows us to choose the target architecture type that we are compiling to. If you’re on a x64 machine, you can target x86 (32-bits, GOARCH=386) or x64 (64-bit, GOARCH=amd64) which will change the default size of this of our pointer types. Another topic that’s important when we’re talking about dealing with lower level code is alignment, structure padding, and structure packing. Let’s start with an example in C, and then we will recreate a similar example in Go.

#include <stdint.h>
#include <stdio.h>

typedef struct _a {
  int32_t FieldOne;
  int64_t FieldTwo;
} A;

int main() {
  A a = {};
  printf("%ld\n", sizeof(a));
  return 0;
}

Before compiling this code, let’s take a look at the structure we have defined. The structure A contains two fields a int32_t (32-bit signed integer) and a int64_t (64-bit signed integer), so that should be 4 bytes plus 8 bytes for a total of 12 bytes for this structure, right? Well, turns out that in certain scenarios the compiler will choose the optimal data alignment for the compilation target because it usually more efficient when to store and load values from memory when they are properly aligned.

If we compile the C code above and run it, we should see the output of the value 16; therefore, an additional 4 bytes of padding was adding to this structure. We could actually enforce that the compiler to align our structure by packing it instead of adding padding, but this would be platform dependent. For brevity, I will just provide what it would look like using Linux and clang to pack our structure to be 12 bytes instead of 16 bytes.

#include <stdint.h>
#include <stdio.h>

typedef struct __attribute__((packed)) {
  int32_t FieldOne;
  int64_t FieldTwo;
} A;

int main() {
  A a = {};
  printf("%ld\n", sizeof(a));
  return 0;
}

A similar Go example will show us that the Go compiler will do the same thing, add padding to our structure. The below program should output the value of 16 as well. I don’t believe that Go directly supports structure packing, but there are ways around it that require manually writing to byte buffers which we can talk about another time.

package main

import (
	"fmt"
	"unsafe"
)

type A struct {
	FieldOne int32
	FieldTwo int64
}

func main() {
	var a A
	size := unsafe.Sizeof(a)
	fmt.Printf("%d\n", size)
}

In Go, we can also get the alignment of a structure or fields within a structure using the unsafe.Alignof. This reports the required alignment of its argument’s type. In our example below, we’ve defined a type A that contains three fields.

package main

import (
	"fmt"
	"unsafe"
)

// total size: 64-bit (32 bytes) | 32-bit (16 bytes)
type A struct {
	X bool  // 64-bit:  1 byte  & Alignof=1, 32-bit:  1 byte  & Alignof=1
	Y int16 // 64-bit:  2 bytes & Alignof=2, 32-bit:  2 bytes & Alignof=2
	Z []int // 64-bit: 24 bytes & Alignof=8, 32-bit: 24 bytes & Alignof=3
}

func main() {
	var a A
	fmt.Printf("size: %d\n", unsafe.Sizeof(a))
	fmt.Printf("alignment a: %d\n", unsafe.Alignof(a))
	fmt.Printf("alignment a.X: %d\n", unsafe.Alignof(a.X))
	fmt.Printf("alignment a.Y: %d\n", unsafe.Alignof(a.Y))
	fmt.Printf("alignment a.Z: %d\n", unsafe.Alignof(a.Z))
}

Looking at our program, we can see that unsafe.Alignof, just gives us the byte boundary of a particular type or field within that type. In addition to getting the field alignments, the unsafe.Offset function can be called to compute the offset of field f relative to the start of its enclosing struct x, accounting for any padding.

This segues nicely to to uintptr and unsafe.Pointer, a majority of the current functions in the unsafe package return a uintptr which is another built-it type that is supposed to represent that platform dependent default pointer size we discussed earlier; however, this just give us an integer value that would potentially represent an address in memory. We can use these both together to perform pointer arithmetic to, for example, change a value of a field.

package main

import (
	"fmt"
	"unsafe"
)

type A struct {
	X bool
	Y int16
	Z []int
}

func main() {
	var a A
	a.X = true
	a.Y = 128
	fmt.Printf("size: %d\n", unsafe.Sizeof(a))
	fmt.Printf("alignment a: %d\n", unsafe.Alignof(a))
	fmt.Printf("alignment a.X: %d\n", unsafe.Alignof(a.X))
	fmt.Printf("alignment a.Y: %d\n", unsafe.Alignof(a.Y))
	fmt.Printf("alignment a.Z: %d\n", unsafe.Alignof(a.Z))
	fmt.Printf("a.Y: %d\n", a.Y)
	var ptr *int16
	ptr = (*int16)(unsafe.Pointer(
		uintptr(unsafe.Pointer(&a)) +
			unsafe.Offsetof(a.Y)))
	*ptr = 256
	fmt.Printf("ptr: %d\n", *ptr)
	fmt.Printf("a.Y: %d\n", a.Y)
}

Because Go is a garbage collected language, we have to be careful of particular pitfalls that we might encounter working with this kind of code. It might seem safe to do the following:

package main

import (
	"fmt"
	"unsafe"
)

type A struct {
	X bool
	Y int16
	Z []int
}

func main() {
	var a A
	a.X = true
	a.Y = 128
	fmt.Printf("size: %d\n", unsafe.Sizeof(a))
	fmt.Printf("alignment a: %d\n", unsafe.Alignof(a))
	fmt.Printf("alignment a.X: %d\n", unsafe.Alignof(a.X))
	fmt.Printf("alignment a.Y: %d\n", unsafe.Alignof(a.Y))
	fmt.Printf("alignment a.Z: %d\n", unsafe.Alignof(a.Z))
	fmt.Printf("a.Y: %d\n", a.Y)
	var tmp uintptr
	tmp = uintptr(
		unsafe.Pointer(uintptr(unsafe.Pointer(&a)) +
			unsafe.Offsetof(a.Y)))
	var ptr *int16
	ptr = (*int16)(unsafe.Pointer(tmp))
	*ptr = 256
	fmt.Printf("ptr: %d\n", *ptr)
	fmt.Printf("a.Y: %d\n", a.Y)
}

The reason why this code is incorrect is because that some garbage collectors move variables around in memory to reduce fragmentation or bookkeeping. When a variable is moved, all pointers that hold the address of the old location, must be updated to point to the new one. From Go’s garbage collector’s perspective, an unsafe.Pointer is a pointer; therefore, its value must change as the variable moves, but a uintptr is just a number. In other words, the above code hides a pointer from the garbage collector in a non-pointer variable tmp and by the time the next statement executes to convert the tmp variable to a unsafe.Pointer the variable a could have been moved and the number in tmp would no longer be the address &a.Y. The next statement to write to the address would then clobber an arbitrary memory location with the value 256.

Because the unsafe.Pointer value gives us the equivalent of a void * in C, we can convert our types similarly to how you might expect it to behave in C. In the example below, when we do a type version between a float64 and a uint64 the decimal value is truncated and the whole number is stored in the uint64 variable; however, when we do our pointer conversion and then cast to a *uint64 we keep the float64 binary representation and dump it out as a uint64. Technically, Go doesn’t have type casting, but I’m just going to go with this terminology for now when we’ve converting from our unsafe.Pointer to another type.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var a float64
	a = 3.14
	var b uint64
	fmt.Printf("a: %f\n", a)
	fmt.Printf("int(a) conversion: %d\n", int(a))
	b = Float64bits(a)
	fmt.Printf("pointer cast: %d\n", b)
}

func Float64bits(f float64) uint64 {
	return *((*uint64)(unsafe.Pointer(&f)))
}

go run main.go
a: 3.140000
int(a) conversion: 3
pointer cast: 4614253070214989087

Hello Operator: syscalls

The unsafe package in Go is useful when needing to interface with lower level APIs or C code, in this example we will make Unix system calls in a C program and it’s equivalent in a Go program. These lower level calls in Go typically require the usage of the unsafe package.

#include <fcntl.h>      // For open() and O_RDONLY
#include <stdint.h>     // For uint32_t
#include <stdio.h>      // For printf() and perror()
#include <sys/ioctl.h>  // For ioctl()
#include <sys/socket.h> // For struct sockaddr and sa_family_t
#include <unistd.h>     // For close()

#include <linux/vm_sockets.h> // For IOCTL_VM_SOCKETS_GET_LOCAL_CID

int main() {
  // Open the vsock device
  int fd = open("/dev/vsock", O_RDONLY);
  if (fd < 0) {
    perror("Failed to open /dev/vsock");
    return 1;
  }

  // Variable to hold the local CID
  uint32_t cid;

  // Get the local CID using ioctl
  if (ioctl(fd, IOCTL_VM_SOCKETS_GET_LOCAL_CID, &cid) < 0) {
    perror("Failed to get local CID");
    close(fd);
    return 1;
  }

  // Print the CID
  printf("CID: %u\n", cid);

  // Close the file descriptor
  close(fd);

  return 0;
}

package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

const IOCTL_VM_SOCKETS_GET_LOCAL_CID = 0x7B9 // Placeholder for the actual IOCTL value

func main() {
	// Open the vsock device
	fd, err := os.OpenFile("/dev/vsock", os.O_RDONLY, 0)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Failed to open /dev/vsock: %v\n", err)
		return
	}
	defer fd.Close()

	// Variable to hold the local CID
	var cid uint32

	// Perform ioctl syscall
	_, _, errno := syscall.Syscall(
		syscall.SYS_IOCTL,
		fd.Fd(),
		IOCTL_VM_SOCKETS_GET_LOCAL_CID,
		uintptr(unsafe.Pointer(&cid)),
	)

	// Check for errors
	if errno != 0 {
		fmt.Fprintf(os.Stderr, "Failed to get local CID: %v\n", errno)
		return
	}

	// Print the CID
	fmt.Printf("CID: %d\n", cid)
}

SIGTERM

Well, at the least I hope that this demystified a little bit of one of the odd parts of the Go ecosystem and gave you some insight in to C. I will leave you with one fun program that does some pointer arithmetic in Go. Stay curious, friends.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	b := []uint32{1, 2, 3, 4}
	for i := 0; i < len(b); i++ {
		fmt.Printf("%d ",
			*(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(&b[0])) +
				uintptr(i)*unsafe.Sizeof(b[0]))))
	}
}

References