Unsafe Territory! Understanding Raw Bytes and Pointers in Swift
In Swift, you normally work with high-level types like Int, Float or String. But sooner or later, almost every Swift developer gets to a point where they need to get their hands dirty and access the raw data that’s normally hidden behind those types. For example, you’ll need to work with raw data when working with Bluetooth devices as explained in our article How to Read BLE Characteristics in Swift. And if you don’t know what you’re doing, things quickly get out of control…
We wrote this article to shed some light on the “dark corners” of Swift and to make you feel more secure when working with pointers and raw bytes. Most of all, we want you to know what you’re doing when accessing memory. In other words: We want you to feel safe in the unsafe territory of Swift pointers.
The Basics: Memory Layout in Swift
This section is a prerequisite for the rest of the article. However, if you know your way around the basics of memory management, feel free to skip this section and jump directly to Pointers in Swift.
From the most naive (but still accurate) perspective, memory is just a long string of bits (0s and 1s):
01001100010011110101011001000101
...
Single bits usually have very little meaning on their own. In almost every case, only groups of bits make sense to us, and we usually go with groups of 8. So let’s chunk this bit string in our memory into 8-bit groups:
01001100 01001111 01010110 01000101 ...
We call each of these 8-bit groups a byte.
Nowadays our memory is usually pretty big. We’re dealing with gigabytes or even terabytes of data. (1 GB is 1,000,000,000 of those groups.) So when storing data in memory or reading data from memory, we need a way to label the bytes we’re referring to. The most straightforward way to do this is to simply use an incrementing integer number. We call this number a byte’s address.
Most systems access the stored data not byte-wise, but word-wise: A word is basically just multiple bytes. How long a word is (i.e. the number of bytes per word) depends on the platform. Modern platforms (like the latest iPhones and Macs) are 64-bit systems, where 64 is the number of bits per word. As 1 byte = 8 bits, it follows that a word on these systems is 64/8 = 8 bytes long.
So this would be a better representation for the memory on a 64-bit system:
Finally, we usually represent both addresses and the stored bytes as hexadecimal numbers:
Now comes the cool part:
The address of a byte itself can be stored in memory as well – and for that reason, the address has the same size as a word: on a 64-bit system that’s 8 bytes. So in reality, the addresses are all 8 bytes:
In other words: An address has exactly the size of a word and thus fits perfectly in memory. In the following example, the word stored under address ...08 references the word at the address ...18:
When we interpret a word in memory as a reference to another address in memory, we call that word a pointer (because it points to another address).
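To make the relationship between words and addresses tangible, here is a minimal sketch that asks Swift for the size of an Int and the size of a pointer. The variable names are made up for this example, and the sizes in the comments assume a 64-bit platform.

```swift
// Sizes are reported in bytes; the values in the comments assume a 64-bit platform.
let wordSize = MemoryLayout<Int>.size                  // 8 on 64-bit
let pointerSize = MemoryLayout<UnsafeRawPointer>.size  // 8 on 64-bit

print("Int: \(wordSize) bytes, pointer: \(pointerSize) bytes")

var number = 42
withUnsafePointer(to: &number) { pointer in
    // Printing a pointer shows the address it stores as a hexadecimal number.
    print("number is stored at address \(pointer)")
}
```

The address printed at the end will differ from run to run, so don't rely on a specific value.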
Note: There are not only written languages but also computer platforms that read “in the opposite direction”, aka from right to left. On those platforms, the word would be reversed byte-wise and look like this:
We call the first representation big-endian (where the most-significant byte comes first) and the latter representation little-endian (where the least-significant byte comes first).
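If you want to see both byte orders side by side, the following sketch may help. It takes the 32 bits from the bit string at the top of this section (0x4C4F5645 in hexadecimal) and lists its bytes in both representations. It uses withUnsafeBytes, which is explained later in this article; the constant names are chosen just for this illustration.

```swift
// The 32 bits from the bit string at the top of this section.
let value: UInt32 = 0x4C4F5645

// .bigEndian and .littleEndian return the value with its bytes arranged in
// that order, whatever the byte order of the machine running the code.
let bigEndianBytes = withUnsafeBytes(of: value.bigEndian) { Array($0) }
let littleEndianBytes = withUnsafeBytes(of: value.littleEndian) { Array($0) }

print(bigEndianBytes)     // [76, 79, 86, 69]
print(littleEndianBytes)  // [69, 86, 79, 76]
```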
So far, so good? Great! Then let’s take the next step and see how we can work with these pointers in Swift!
Pointers in Swift
When I started working with pointers in Swift, I was very much overwhelmed by the sheer number of pointer types. The API isn’t easy, especially when you’re new to it. It used to be hard for me to figure out which pointer type or which method was the correct one to use in the code I was working with. What made it worse was that all those pointer types and methods carry the word unsafe in their names, which kind of scared me off and discouraged me from using those pointers in the first place.
Turns out, it’s not as complex as it seems after all – when you look at Swift pointers in a structured way!
Pointer Types
Look at Unsafe as a simple prefix for all pointers.
Yes, there are some other pointer types that don’t begin with Unsafe, but unless you’re doing something really fancy (like bridging to C / Objective-C methods), you’ll most likely use unsafe pointers and nothing else. In other words: Every pointer* is unsafe. The Unsafe prefix is only there to remind you that you need to know what you’re doing with pointers and that you’re leaving Swift’s comfortable memory management and type safety, where you practically “can’t do anything wrong” (in terms of accessing memory). With that being said, it helps if you just drop the prefix in your mind, and that’s what we’re going to do in this section.
(* you normally work with)
How Pointer Types Are Classified
There are 3 properties by which Swift pointers are classified:
- Single or Repeating?
- Typed or Untyped?
- Mutable or Immutable?
As every property has two possible values, that makes for 8 combinations in total, and for each of those combinations, there is a separate pointer type in Swift. Let’s quickly go through those properties one-by-one.
1. Single or Repeating?
The first property of a pointer is whether it points to a single value in storage (for example a single Int value) or to a number of values with the same memory properties (for example an [Int] array).
- Single value pointers are simply called Pointer (without any prefix).
- Pointers to an “array” of values of the same kind are called BufferPointer.
You can think of a Buffer as a contiguous run of equally sized memory slots, much like an array.
2. Typed or Untyped?
Bytes in memory have no meaning unless you know how to interpret them. The type of a value provides that information. And that’s what the second pointer property is all about:
- You can access memory with a typed pointer that knows the type of the bytes it points to. In Swift, these pointers have a generic parameter <T> which represents the type: Pointer<T>.
- You can also access memory without any type information in case you’re really just interested in the raw bytes and nothing else. In Swift, these pointers are called RawPointer and don’t have a generic parameter.
If we combine the first and the second property of a pointer, we end up with four classes of pointers: Pointer<T>, BufferPointer<T>, RawPointer and RawBufferPointer.
3. Mutable or Immutable?
Finally, each one of these four pointer types has a mutable and an immutable version. When using the mutable version, you can write to the memory it points to. When you use the immutable version, you can only read the memory.
- Mutable pointers are prefixed with Mutable in Swift.
- Immutable pointers don’t have this prefix.
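To sum it up, after adding the Unsafe prefix to all of them, these are all eight pointer types you would normally use in Swift:
- UnsafePointer<T>
- UnsafeMutablePointer<T>
- UnsafeBufferPointer<T>
- UnsafeMutableBufferPointer<T>
- UnsafeRawPointer
- UnsafeMutableRawPointer
- UnsafeRawBufferPointer
- UnsafeMutableRawBufferPointer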
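To see a few of these types in action, here is a small sketch based on an array. The array and the variable names are assumptions made for this example; the methods withUnsafeMutableBufferPointer and withUnsafeBytes are part of Swift's Array type.

```swift
var numbers: [Int] = [10, 20, 30]

// Typed + buffer + mutable: UnsafeMutableBufferPointer<Int>
numbers.withUnsafeMutableBufferPointer { buffer in
    buffer[0] = 11

    // Typed + single + mutable: UnsafeMutablePointer<Int>
    if let first: UnsafeMutablePointer<Int> = buffer.baseAddress {
        first.pointee += 1
    }
}

// Untyped + buffer + immutable: UnsafeRawBufferPointer
let byteCount = numbers.withUnsafeBytes { rawBuffer in
    rawBuffer.count
}

print(numbers)    // [12, 20, 30]
print(byteCount)  // 3 * MemoryLayout<Int>.stride, i.e. 24 on a 64-bit platform
```

The typed views count elements (three Ints), while the raw view counts bytes.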
Working with Pointers in Swift
Getting a value’s (typed) pointer
For any value in Swift, you can access its underlying pointer with the global function withUnsafePointer (or its equivalent withUnsafeMutablePointer if you need to modify the underlying memory). So if we have an integer, for example:
let registrationNumber: Int = 74656
we can access its pointer like this:
withUnsafePointer(to: registrationNumber) { pointer in
    // access to the pointer
}
That’s a bit boring, because you can’t do much with it. Things get more interesting with a mutable pointer. But if we call the function, the Swift compiler will complain. (That’s why it’s called a comp-iler, isn’t it? 🙈)
withUnsafeMutablePointer(to: registrationNumber) { mutablePointer in ... }
Why? Because we cannot change the value of constants in Swift and we defined registrationNumber as a constant above with let. So let’s fix that! (I promise, the puns won’t get worse than at this point. 😇)
var registrationNumber: Int = 74656
However, the code still doesn’t compile. And that’s because when we want a function to change a value that we pass to it, the function declares that parameter as inout and we must mark the argument accordingly at the call site. This is done by prefixing the value with an ampersand (&):
withUnsafeMutablePointer(to: &registrationNumber) { mutablePointer in
    // modify the value through the pointer here
}
Now we’re good! 🎉
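What can you actually do inside those closures? A typed pointer exposes the value it points to through its pointee property. The following sketch reads the value through an immutable pointer and then changes it through a mutable one; the name copy is just an illustrative choice.

```swift
var registrationNumber: Int = 74656

// Read the value through an immutable typed pointer.
let copy = withUnsafePointer(to: registrationNumber) { pointer in
    pointer.pointee
}

// Write a new value through a mutable typed pointer.
withUnsafeMutablePointer(to: &registrationNumber) { mutablePointer in
    mutablePointer.pointee += 1
}

print(copy)                // 74656
print(registrationNumber)  // 74657
```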
The above methods give us typed pointers to work with. But most of the time, we’re more interested in the actual bytes of the value. In order to get those bytes, we need a different view on the memory, an untyped view, in other words: a RawPointer or to be precise, an UnsafeRawPointer.
Accessing a value’s bytes in memory
For any value in Swift, we can access its underlying raw pointer with the global function withUnsafeBytes (or its equivalent withUnsafeMutableBytes if you need to modify the underlying memory). The naming here is a bit confusing, because we’re not handed the value’s bytes directly, but a pointer to them, just like before. To be consistent with the names of the pointer types, those functions would have to be named withUnsafeRawBufferPointer and withUnsafeMutableRawBufferPointer respectively, but that naming is quite verbose and there might be some other good reasons why the Swift team went with the shorter names.
withUnsafeBytes(of: registrationNumber) { rawPointerBuffer in
    // access to the value's raw pointer buffer
}
Why does this method give us a raw pointer buffer and not just a raw pointer?
Well, we said above that raw pointers come without type information. But that’s only a very top-level way to look at the picture. In practice, a raw pointer buffer presents the memory as a collection of UInt8 values. UInt8 has a size of 8 bits and thus exactly the size of a single byte, which makes it the perfect container type for single bytes.
Tip: You might want to consider adding a type alias to your Swift project, so you have a more expressive name for UInt8s:
typealias Byte = UInt8
With that, you can always write Byte instead of UInt8.
Now most values in Swift occupy more than one byte in memory. For example, Int occupies 8 bytes on a 64-bit system. So if the withUnsafeBytes function gave us only a raw pointer instead of a raw pointer buffer, we would only get a pointer to the very first byte of the value, without knowing how many bytes follow. And that’s not what we want 99% of the time. Normally, we want to have access to all the bytes that make up the value, and if there are multiple bytes (aka multiple UInt8s), we need a buffer, of course.
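A quick way to convince yourself that the buffer covers the whole value is to ask it for its count. This is a minimal sketch; the constant byteCount is introduced just for this example.

```swift
let registrationNumber: Int = 74656

// The raw pointer buffer spans every byte of the value, not just the first one.
let byteCount = withUnsafeBytes(of: registrationNumber) { rawPointerBuffer in
    rawPointerBuffer.count
}

print(byteCount)               // 8 on a 64-bit platform
print(MemoryLayout<Int>.size)  // the same number
```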
Finally, if we want to modify the underlying bytes of a value in memory, we need a mutable raw pointer buffer:
withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    // modify the pointer buffer (the bytes) here
}
(Note that we need to mark the registrationNumber as an inout parameter and make it a var just like before.)
Working with Raw Bytes in Swift
That was a lot of theory! Now let’s get practical and see how we can actually work with the bytes in memory:
The good thing is that just like an array, each pointer buffer is a Sequence. So we can use for ... in loops and all that stuff we know from arrays:
withUnsafeBytes(of: registrationNumber) { pointerBuffer in
    for byte in pointerBuffer {
        print(byte)
    }
}
This code prints the values of registrationNumber’s underlying bytes as UInt8 values:
160 35 1 0 0 0 0 0
You might be surprised that the zeros are at the end and not in the beginning. That’s because the Mac I was running this code on is a little-endian machine, meaning that the least-significant byte comes first.
Remember that registrationNumber was 74656. Let’s modify the first three bytes of this value for fun …
withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    mutablePointerBuffer[0] = 165
    mutablePointerBuffer[1] = 6
    mutablePointerBuffer[2] = 0
}
… and leave the remaining zero bytes untouched. When we now print the value of registrationNumber, we see that it has changed:
print(registrationNumber) // 1701
So this is how we manipulate values on the byte level.
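Why 1701? On a little-endian machine, the byte at position n contributes its value times 256 to the power of n, so the three bytes we wrote add up to 165 + 6 · 256 + 0 · 65536 = 1701. The following sketch redoes that arithmetic in plain Swift without any pointers; the names bytes and value are chosen for illustration only.

```swift
// The bytes of the modified value, least-significant byte first.
let bytes: [UInt8] = [165, 6, 0, 0, 0, 0, 0, 0]

// Each byte contributes byte * 256^position,
// which is the same as shifting it left by 8 * position bits.
var value = 0
for (position, byte) in bytes.enumerated() {
    value += Int(byte) << (8 * position)
}

print(value)  // 1701
```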
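At this point, you can already see one reason why pointers are unsafe: If we’re not careful, we could access the mutablePointerBuffer at an index out of bounds. For example, the following code would crash in a debug build – as registrationNumber is an Int and as such only consists of eight bytes (indices 0-7). In an optimized build, the bounds check may be skipped, in which case the write could silently corrupt neighboring memory instead: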
withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    mutablePointerBuffer[8] = 1 // 💥 crashes
}
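One way to guard against such mistakes is to check the index against the buffer's indices before writing. The helper setByte below is hypothetical (it is not part of the standard library) and only sketches the idea.

```swift
var registrationNumber: Int = 74656

// Hypothetical helper (not part of the standard library): writes a byte only
// if the index lies inside the buffer and reports whether it did so.
func setByte(_ byte: UInt8, at index: Int, in buffer: UnsafeMutableRawBufferPointer) -> Bool {
    guard buffer.indices.contains(index) else { return false }
    buffer[index] = byte
    return true
}

let wroteInside = withUnsafeMutableBytes(of: &registrationNumber) { buffer in
    setByte(165, at: 0, in: buffer)
}
let wroteOutside = withUnsafeMutableBytes(of: &registrationNumber) { buffer in
    // One past the last valid index: the helper refuses to write.
    setByte(1, at: MemoryLayout<Int>.size, in: buffer)
}

print(wroteInside)   // true
print(wroteOutside)  // false
```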
You can also do more fancy stuff. For example, the following code sets all bytes to 0 that are not valid printable ASCII characters:
withUnsafeMutableBytes(of: &registrationNumber) { mutablePointerBuffer in
    for (index, byte) in mutablePointerBuffer.enumerated() {
        // 32...126 is the range of printable ASCII characters
        if !(32...126).contains(byte) {
            mutablePointerBuffer[index] = 0
        }
    }
}
Of course, there are high-level functions in Swift that you should use instead for making sure that a value is valid ASCII code. (We’re even dealing with an Int here, so treating that as a string doesn’t really make sense in the first place.) But this example shows how easily you can do operations on the byte level.
Making Computations With Bytes
Last but not least, these withUnsafe(Mutable)Bytes functions also have a return value. You can use it for making computations on the byte-level and then hand the result to your code in the calling scope.
For example, we could simply add up all the bytes to get something like a (very basic) checksum:
let checksum = withUnsafeBytes(of: registrationNumber) { pointerBuffer -> UInt8 in
    // &+ wraps around on overflow; a plain + would crash once the sum exceeds 255
    pointerBuffer.reduce(0, &+)
}
or we could check if all bytes of the registrationNumber are greater than zero:
let allBytesNonZero = withUnsafeBytes(of: registrationNumber) { pointerBuffer -> Bool in
    pointerBuffer.reduce(true) { (intermediateResult, nextByte) -> Bool in
        intermediateResult && (nextByte > 0)
    }
}
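As a possible alternative to the reduce call, the standard library's allSatisfy expresses the same check more directly and stops at the first byte that fails it. This is a sketch with its own constants; allBytesNonZeroForMinusOne is a name made up for this example.

```swift
let registrationNumber: Int = 74656

// 74656 fits into three bytes, so the remaining bytes of the Int are zero.
let allBytesNonZero = withUnsafeBytes(of: registrationNumber) { pointerBuffer in
    pointerBuffer.allSatisfy { $0 > 0 }
}

// -1 is stored with all bits set, so every byte is 255.
let allBytesNonZeroForMinusOne = withUnsafeBytes(of: -1) { pointerBuffer in
    pointerBuffer.allSatisfy { $0 > 0 }
}

print(allBytesNonZero)             // false
print(allBytesNonZeroForMinusOne)  // true
```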
Why “withUnsafeBytes” Works With Closures
One of the first questions I had when I started working with pointers in Swift was: Why does Swift have this verbose syntax with a closure when all I want to do is get the bytes of an instance?
Maybe you can answer that question yourself already? A closure has a well-defined scope and you only have access to the pointer within that scope. The reason is that the memory behind a value is managed by Swift, which means it might be moved, modified or deallocated at any time. Within the closure of these withUnsafeBytes functions, you have a guarantee that the pointer is valid. The moment the program flow leaves that closure, that guarantee is gone, so never store the pointer or let it escape from the closure.
If you just want to get a copy of the bytes of any type in Swift (without modifying them) and you want to continue working with those bytes in your program scope, here’s a simple way to get them:
let bytesArray = withUnsafeBytes(of: registrationNumber, Array.init)
bytesArray is now an array of UInt8 values representing the bytes. You can safely work with this array as it’s “out of the unsafe territory”.
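Going the other way is possible, too. The sketch below copies the bytes out into an array and then back into a fresh Int, using the copyBytes(from:) method of the mutable raw buffer pointer. The variable restored is an assumption made for this example, and the array must not contain more bytes than the target value can hold.

```swift
let registrationNumber: Int = 74656

// Copy the bytes out of the closure into an ordinary array.
let bytesArray = withUnsafeBytes(of: registrationNumber) { Array($0) }

// Copy the bytes back into a fresh Int.
var restored: Int = 0
withUnsafeMutableBytes(of: &restored) { mutablePointerBuffer in
    mutablePointerBuffer.copyBytes(from: bytesArray)
}

print(restored)  // 74656
```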
Wrapping It Up
In this article, you’ve learned what pointers are, which pointer types exist in Swift, and how you can access the pointer for any given value to work with the individual bytes in memory. There are some wonderful articles on the web that you might wanna check out if you want to learn more about data and pointers in Swift:
- swift unboxed: Size, Stride, Alignment by Greg Heo: A must-read and a logical next step after finishing this article. Explains very important concepts of memory management in Swift in a very intuitive way.
- Swift Pointers Overview: Unsafe, Buffer, Raw and Managed Pointers by Vadim Bulavin: Gives a quick overview on all pointer types in Swift, including those we didn’t mention in this article.
- Unsafe Swift: Using Pointers and Interacting With C by Brody Eller: A tutorial showing some nice applications of Swift pointers.
- UnsafePointer: Apple’s documentation on the UnsafePointer type has some really nice examples and provides further information.
If you enjoyed reading this article, please consider sharing it with others. Thanks for reading! 💙