Unsafe Territory! Understanding Raw Bytes and Pointers in Swift

It’s rare that you juggle bits and bytes in Swift because of its neat static type system. Most of the time, you don’t care what’s under the hood – you just work with classes, structs, and enums or primitive types like Int, Float or String. But sooner or later, almost every Swift developer gets to a point where they need to get their hands dirty and access the raw data that’s normally hidden behind those types. For example, you’ll need to work with raw data when working with Bluetooth devices as explained in our article How to Read BLE Characteristics in Swift. And if you don’t know what you’re doing, things quickly get out of control…

We wrote this article to shed some light on the “dark corners” of Swift, to make you feel more secure when working with pointers and raw bytes – and most of all: We want you to know what you’re doing when accessing memory, in other words: We want you to feel safe in the unsafe territory of Swift pointers.

The Basics: Memory Layout in Swift

This section is a prerequisite for the rest of the article. However, if you know your way around the basics of memory management, feel free to skip this section and jump directly to Pointers in Swift.

From the most naive (but still accurate) perspective, memory is just a long string of bits (0s and 1s):

01001100010011110101011001000101...

Single bits usually have very little meaning on their own. In almost every case, only groups of bits make sense to us, and we usually go with groups of 8. So let’s chunk this bit string in our memory into 8-bit groups:

01001100 01001111 01010110 01000101 ...

We call each of these 8-bit groups a byte.

Nowadays our memory is usually pretty big. We’re dealing with Gigabytes or even Terabytes of data. (1 GB is 1,000,000,000 of those groups.) So when storing data in memory or reading data from memory, we need a way to label the bytes we’re referring to. The most straightforward way to do this is to simply use an incrementing integer number. We call this number a byte’s address.

Most systems access the stored data not byte-wise, but word-wise: A word is basically just multiple bytes. How long a word is (i.e. the number of bytes per word) depends on the platform. Modern platforms (like the latest iPhones and Macs) are 64-bit systems where 64 is the number of bits per word. As 1 byte = 8 bits, it follows that a word on these systems has 64/8 = 8 bytes.

So this would be a better representation for the memory on a 64-bit system:

Finally, we usually represent both addresses and the stored bytes as hexadecimal numbers:

Now comes the cool part:

The address of a byte itself can be stored in memory as well – and for that reason, the address has the same size as a word: on a 64-bit system that’s 8 bytes. So in reality, the addresses are all 8 bytes:

In other words: An address has exactly the size of a word and thus fits perfectly in memory. In the following example, the word stored under address ...08 references the word at the address ...18:

When we interpret a word in memory as a reference to another address in memory, we call that word a pointer (because it points to another address).
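
To check the claim that an address is exactly one word wide, we can ask Swift for the sizes directly. This is only a small sketch: the constant names are made up for illustration, and the numbers in the comments assume a 64-bit platform.

```swift
// Int is word-sized in Swift, so its size is the size of a machine word.
let wordSize = MemoryLayout<Int>.size                  // 8 on a 64-bit platform

// A pointer stores an address, so its size is the size of an address.
let pointerSize = MemoryLayout<UnsafeRawPointer>.size  // 8 on a 64-bit platform

print("word: \(wordSize) bytes, pointer: \(pointerSize) bytes")
print("bits per word: \(Int.bitWidth)")
```

On a 32-bit platform both sizes would be 4 instead, but they would still match each other.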


Note: There are not only written languages but also computer platforms that read “in the opposite direction”, aka from right to left. On those platforms, the word

would be reversed byte-wise and look like this:

We call the first representation big-endian (where the most-significant byte comes first) and the latter representation little-endian (where the least-significant byte comes first).
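
The two byte orders can be made visible in Swift. The following sketch takes the four example bytes from the beginning of this section as one 32-bit word and prints its bytes in both orders. It uses withUnsafeBytes, which is explained later in this article, and the constant names are chosen just for this example.

```swift
// The bits 01001100 01001111 01010110 01000101 written as one 32-bit word.
let word: UInt32 = 0x4C4F5645

// `bigEndian` and `littleEndian` return a value whose bytes are laid out in
// the requested order in memory, independent of the machine running the code.
let bigEndianBytes = withUnsafeBytes(of: word.bigEndian) { Array($0) }
let littleEndianBytes = withUnsafeBytes(of: word.littleEndian) { Array($0) }

print(bigEndianBytes)     // [76, 79, 86, 69]
print(littleEndianBytes)  // [69, 86, 79, 76]
```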


So far, so good? Great! Then let’s take the next step and see how we can work with these pointers in Swift!

Pointers in Swift

When I started working with pointers in Swift, I was very much overwhelmed by the sheer number of pointer types. The API isn’t easy, especially when you’re new to it. It used to be hard for me to figure out which pointer type or which method was the correct one to be used in the code I was working with. What made it worse was that all those pointer methods start with the word unsafe, which kind of scared me off and discouraged me using those pointers in the first place.

Turns out, it’s not as complex as it seems after all – when you look at Swift pointers in a structured way!

Pointer Types

Look at unsafe as a simple prefix for all pointers.

Yes, there are some other pointer types that don’t begin with unsafe, but unless you’re doing something really fancy (like bridging to C / Objective-C methods), you’ll most likely use unsafe pointers and nothing else. In other words: Every pointer* is unsafe. The unsafe prefix is only there to remind you that you need to know what you’re doing with pointers and that you basically leave Swift’s comfortable memory management and type safety where you practically “can’t do anything wrong” (in terms of accessing memory). But with that being said, it helps if you just drop the prefix in your mind and that’s what we’re going to do in this section.

(* you normally work with)

How Pointer Types Are Classified

There are 3 properties by which Swift pointers are classified:

  1. Single or Repeating?
  2. Typed or Untyped?
  3. Mutable or Immutable?

As every property has two possible values, that makes for 8 combinations in total, and for each of those combinations, there is a separate pointer type in Swift. Let’s quickly go through those properties one-by-one.

1. Single or Repeating?

The first property of a pointer is whether it points to a single value in storage (for example a single Int value) or to a number of values with the same memory properties (for example an [Int] array).

  • Single value pointers are simply called Pointer (without any prefix).
  • Pointers to an “array” of values of the same kind are called BufferPointer.

You can think of a Buffer as an array of equally sized memory slots.

2. Typed or Untyped?

Bytes in memory have no meaning unless you know how to interpret them. The type of a value provides that information. And that’s what the second pointer property is all about:

  • You can access memory with a typed pointer that knows the type of the bytes it points to. In Swift, these pointers have a generic parameter <T> which represents the type: Pointer<T>.
  • You can also access memory without any type information in case you’re really just interested in the raw bytes and nothing else. In Swift, these pointers are called RawPointer and don’t have a generic parameter.

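If we combine the first and the second property of a pointer, we end up with four classes of pointers:

  • Pointer<T> (typed, single value)
  • BufferPointer<T> (typed, multiple values)
  • RawPointer (untyped, single location)
  • RawBufferPointer (untyped, multiple bytes)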

3. Mutable or Immutable?

Finally, each one of these four pointer types has a mutable and an immutable version. When using the mutable version, you can write to the memory it points to. When you use the immutable version, you can only read the memory.

  • Mutable pointers are prefixed with Mutable in Swift.
  • Immutable pointers don’t have this prefix.

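To sum it up, after adding the Unsafe prefix to all of them, these are all eight pointer types you would normally use in Swift:

  • UnsafePointer<T>
  • UnsafeMutablePointer<T>
  • UnsafeBufferPointer<T>
  • UnsafeMutableBufferPointer<T>
  • UnsafeRawPointer
  • UnsafeMutableRawPointer
  • UnsafeRawBufferPointer
  • UnsafeMutableRawBufferPointer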
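
To get a feeling for how the three properties show up in real code, here is a small sketch that obtains three of these pointer types from an ordinary array. The array and the constant names are made up for this example. The typed buffer counts elements, the raw buffer counts bytes, and the single pointer refers to exactly one value.

```swift
let numbers = [10, 20, 30]

// Typed + buffer: UnsafeBufferPointer<Int>, one element per Int.
let elementCount = numbers.withUnsafeBufferPointer { buffer in
    buffer.count
}

// Untyped + buffer: UnsafeRawBufferPointer, one element per byte.
let byteCount = numbers.withUnsafeBytes { rawBuffer in
    rawBuffer.count
}

// Typed + single: UnsafePointer<Int>, taken from the buffer's base address.
// baseAddress is optional because an empty buffer may not have one.
let firstElement = numbers.withUnsafeBufferPointer { buffer in
    buffer.baseAddress?.pointee
}

print(elementCount)         // 3
print(byteCount)            // 3 * MemoryLayout<Int>.stride
print(firstElement as Any)  // Optional(10)
```

All three pointers are immutable here. The array methods withUnsafeMutableBufferPointer and withUnsafeMutableBytes hand out the Mutable counterparts.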

Working with Pointers in Swift

Getting a value’s (typed) pointer

For any value in Swift, you can access its underlying pointer with the global function withUnsafePointer (or its equivalent withUnsafeMutablePointer if you need to modify the underlying memory). So if we have an integer, for example:

let registrationNumber: Int = 74656

we can access its pointer like this:

withUnsafePointer(to: registrationNumber) { pointer in
    // access to the pointer
}

That’s a bit boring, because you can’t do much with it. Things get more interesting with a mutable pointer. But if we call the function, the Swift compiler will complain. (That’s why it’s called a comp-iler, isn’t it? 🙈)

withUnsafeMutablePointer(to: registrationNumber) { mutablePointer in
    ...
}

Why? Because we cannot change the value of constants in Swift and we defined registrationNumber as a constant above with let. So let’s fix that! (I promise, the puns won’t get worse than at this point. 😇)

var registrationNumber: Int = 74656

However, the code still doesn’t compile. And that’s because when we want to change a value that we pass as a parameter to a function, we must pass it as an inout parameter in Swift. This is done by prefixing the value with an ampersand (&):

withUnsafeMutablePointer(to: &registrationNumber) { mutablePointer in
    // modify the value behind the pointer here
}

Now we’re good! 🎉
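
As a minimal sketch of what you can do inside that closure: a typed pointer exposes the value it points to through its pointee property, and with a mutable pointer we can both read and write it. The new values are arbitrary and only serve as an illustration.

```swift
var registrationNumber: Int = 74656

withUnsafeMutablePointer(to: &registrationNumber) { mutablePointer in
    mutablePointer.pointee = 1701   // overwrite the stored Int
    mutablePointer.pointee += 1     // read-modify-write works as well
}

print(registrationNumber)  // 1702
```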

The above method gives us typed pointers to work with. But most of the time, we’re more interested in the actual bytes of the value. In order to get those bytes, we need a different view on the memory, an untyped view, in other words: a RawPointer or to be precise, an UnsafeRawPointer.

Accessing a value’s bytes in memory

For any value in Swift, we can access its underlying raw pointer with the global function withUnsafeBytes (or its equivalent withUnsafeMutableBytes if you need to modify the underlying memory). The naming here is a bit confusing, because these functions don’t hand us the bytes directly, but a pointer to them. To be consistent with the names of the pointer types, they would have to be called withUnsafeRawBufferPointer and withUnsafeMutableRawBufferPointer respectively, but that naming is quite verbose and there might be some other good reasons why the Swift team went with the other names.

withUnsafeBytes(of: registrationNumber) { rawPointerBuffer in
    // access to the value's raw pointer buffer
}

Why does this method give us a raw pointer buffer and not just a raw pointer?

Well, we said above that raw pointers are without type information. But that’s only a very top-level way to look at the picture. A raw pointer buffer does have an element type: it presents the memory as a collection of bytes, and each of those bytes is always a UInt8.

UInt8 has a size of 8 bits and thus exactly the size of a single byte. Thus, it’s the perfect container type for single bytes.

Tip: You might want to consider adding a type alias to your Swift project, so you have a more expressive name for UInt8s:

typealias Byte = UInt8

With that, you can always write Byte instead of UInt8.

Now most types in Swift occupy more than a single byte in memory. For example, Int occupies 8 bytes on a 64-bit system. So if the withUnsafeBytes function gave us only a raw pointer instead of a raw pointer buffer, we would only get a pointer to the very first byte of an instance. And that’s not what we want 99% of the time. Normally, we really want to have access to all the bytes that make up the value, and if it’s multiple Bytes (aka multiple UInt8s), we need a buffer, of course.
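
A quick way to convince yourself of this is to compare the number of elements in the raw pointer buffer with the size Swift reports for the type. This is just a sketch, and the byte counts in the comments assume a 64-bit platform.

```swift
let registrationNumber: Int = 74656

let byteCount = withUnsafeBytes(of: registrationNumber) { rawPointerBuffer in
    rawPointerBuffer.count   // number of UInt8 elements in the buffer
}

print(byteCount)                 // 8 on a 64-bit platform
print(MemoryLayout<Int>.size)    // the same number
print(MemoryLayout<UInt8>.size)  // 1
```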

Finally, if we want to modify the underlying bytes of a value in memory, we need a mutable raw pointer buffer:

withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    // modify the pointer buffer (the bytes) here
}

(Note that we need to mark the registrationNumber as an inout parameter and make it a var just like before.)

Working with Raw Bytes in Swift

That was a lot of theory! Now let’s get practical and see how we can actually work with the bytes in memory:

The good thing is that just like an array, each pointer buffer is a Sequence. So we can use for ... in loops and all that stuff we know from arrays:

withUnsafeBytes(of: registrationNumber) { pointerBuffer in
    for byte in pointerBuffer {
        print(byte)
    }
}

This code prints the values of the registrationNumber’s underlying bytes as UInt8 values:

160
35
1
0
0
0
0
0

You might be surprised that the zeros are at the end and not in the beginning. That’s because the Mac I was running this code on is a little-endian machine, meaning that the least-significant byte comes first.

Remember that registrationNumber was 74656. Let’s modify the first three bytes of this value for fun …

withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    mutablePointerBuffer[0] = 165
    mutablePointerBuffer[1] = 6
    mutablePointerBuffer[2] = 0
}

… and leave the remaining zero bytes untouched. When we now print the value of registrationNumber, we see that it has changed:

print(registrationNumber) // 1701

So this is how we manipulate values on the byte level.
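
The same technique works in the other direction when raw bytes arrive from outside, for example from a Bluetooth device as mentioned in the introduction. The following sketch copies four bytes into the memory of a UInt32. The payload and its little-endian byte order are assumptions made up for this example, not something a particular device guarantees.

```swift
// Hypothetical payload: a 32-bit unsigned integer sent in little-endian order.
let payload: [UInt8] = [165, 6, 0, 0]

var rawValue: UInt32 = 0
withUnsafeMutableBytes(of: &rawValue) { mutablePointerBuffer in
    // Copies the payload into the bytes of `rawValue`.
    // The payload must not be longer than the buffer (4 bytes here).
    mutablePointerBuffer.copyBytes(from: payload)
}

// Interpret the stored bytes as little-endian, whatever the host's byte order is.
let decoded = UInt32(littleEndian: rawValue)
print(decoded)  // 1701
```

Copying the bytes into an existing variable avoids any alignment concerns, because the destination already is a properly aligned UInt32.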

At this point, you can already see one reason why pointers are unsafe: If we’re not careful, we could access the mutablePointerBuffer at an index out of bounds. For example, the following code would crash in a debug build, as registrationNumber is an Int and as such only consists of eight bytes (indices 0-7). In an optimized build, the bounds check is skipped and the write is undefined behavior:

withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    mutablePointerBuffer[8] = 1 // 💥 crashes
}   
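
One way to stay on the safe side is to check the index against the buffer’s indices before writing. The helper setByte below is hypothetical, written only for this example, and the expected output assumes a 64-bit platform where an Int has eight bytes.

```swift
var registrationNumber: Int = 74656

// Hypothetical helper: writes `byte` at `index` only if the index is valid.
// Returns false instead of touching memory outside the buffer.
func setByte(_ byte: UInt8, at index: Int, in buffer: UnsafeMutableRawBufferPointer) -> Bool {
    guard buffer.indices.contains(index) else { return false }
    buffer[index] = byte
    return true
}

let wroteInside = withUnsafeMutableBytes(of: &registrationNumber) { buffer in
    setByte(1, at: 7, in: buffer)   // last valid index of an 8-byte Int
}
let wroteOutside = withUnsafeMutableBytes(of: &registrationNumber) { buffer in
    setByte(1, at: 8, in: buffer)   // rejected: one past the end
}

print(wroteInside, wroteOutside)  // true false (on a 64-bit platform)
```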

You can also do more fancy stuff. For example, the following code sets all bytes to 0 which are not valid printable ASCII characters:

withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    for (index, byte) in mutablePointerBuffer.enumerated() {
        // 32...126 is the range of printable ASCII characters
        if !(32...126).contains(byte) {
            mutablePointerBuffer[index] = 0
        }
    }
}

Of course, there are high-level functions in Swift that you should use instead for making sure that a value is valid ASCII code. (We’re even dealing with an Int here, so treating that as a string doesn’t really make sense in the first place.) But this function shows how you can do operations on the byte-level easily.

Making Computations With Bytes

Last but not least, these withUnsafe(Mutable)Bytes functions also have a return value. You can use it for making computations on the byte-level and then hand the result to your code in the calling scope.

For example, we could simply add up all the bytes to get something like a (very basic) checksum:

let checksum = withUnsafeBytes(of: registrationNumber) { pointerBuffer -> UInt8 in
    // &+ wraps around on overflow; plain + would trap once the sum exceeds 255
    pointerBuffer.reduce(0, &+)
}

or we could check whether all bytes of the registrationNumber are greater than zero:

let allBytesNonZero = withUnsafeBytes(of: registrationNumber) { pointerBuffer -> Bool in
    pointerBuffer.reduce(true) { (intermediateResult, nextByte) -> Bool in
        intermediateResult && (nextByte > 0)
    }
}
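
Since a pointer buffer is a Sequence, the same check can probably be written a bit shorter with the standard library’s allSatisfy method. This is a sketch of an equivalent formulation, not a replacement for the version above:

```swift
let registrationNumber: Int = 74656

let allBytesNonZero = withUnsafeBytes(of: registrationNumber) { pointerBuffer in
    pointerBuffer.allSatisfy { $0 > 0 }
}

print(allBytesNonZero)  // false, because the upper bytes of 74656 are zero
```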

Why “withUnsafeBytes” Works With Closures

One of the first questions I had when I started working with pointers in Swift was: Why does Swift have this verbose syntax with a closure when all I want to do is get the bytes of an instance?

Maybe you can answer that question yourself already? A closure has a well-defined scope and you only have access to the pointer within that scope. The reason is that the memory behind a value is managed by Swift’s memory management system, meaning that it might be moved, deallocated or otherwise invalidated at any time. Within the closure of these withUnsafeBytes methods, you have a guarantee that the pointer is valid. The moment the program flow leaves that closure, that guarantee is gone.

If you just want to get a copy of the bytes of any type in Swift (without modifying them) and you want to continue working with those bytes in your program scope, here’s a simple way to get them:

let bytesArray = withUnsafeBytes(of: registrationNumber, Array.init)


bytesArray is now an array of UInt8 values representing the bytes. You can safely work with this array as it’s “out of the unsafe territory”.
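
If you need this in several places, you could wrap the pattern in a small generic function. The helper bytes(of:) below is hypothetical and not part of the standard library. Note that for types that contain references (such as classes or String) the bytes show the internal representation, not the content you might expect.

```swift
// Hypothetical helper: copies the bytes of any value into an array.
// The copy happens inside the closure, so no pointer escapes its scope.
func bytes<T>(of value: T) -> [UInt8] {
    withUnsafeBytes(of: value) { Array($0) }
}

let registrationNumber: Int = 74656
let copied = bytes(of: registrationNumber)

print(copied.count)                          // MemoryLayout<Int>.size
print(bytes(of: UInt16(1701).littleEndian))  // [165, 6]
```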

Wrapping It Up

In this article, you’ve learned what pointers are, which pointer types exist in Swift, and how you can access the pointer for any given value to work with the individual bytes in memory. There are some wonderful articles on the web that you might wanna check out if you want to learn more about data and pointers in Swift:

If you enjoyed reading this article, please consider sharing it with others. Thanks for reading! 💙
