Swift RegexBuilder – A New Way to Build Regular Expressions in iOS

Remember the last time you were trying to validate an email address? Or you struggled to create a RegEx to find all the matches for a specified pattern in a string? Chances are high that you ended up just copying a RegEx (that you didn’t understand) from the internet.

iOS 16 and Swift 5.7 try to solve this with RegexBuilder, a new and intuitive way to write and understand regular expressions.


Introduction

Regular expressions (regexes) haven’t changed much since their introduction in the 1950s, and native language support for them has been similarly stagnant in its design. As useful as they are, they can require memorizing a lot of esoteric symbols and escaped characters, which makes them difficult to read, write, and understand.

These expressions can quickly become large and difficult to work through, especially in cases where the data being matched doesn’t have an established standard or where the standard is vague. Email addresses and non-standardized date formats are two classic examples.

All of these problems can lead to a substandard experience and developers often choose to use expensive and complicated string processing and transformation APIs instead.

Apple and the Swift community are looking to alleviate some of that pain for us in Swift 5.7 and through improvements in Xcode 14 by introducing RegexBuilder and making regexes feel like more of a modern first-class citizen in the language than they ever have before.

We Need an Example

What does it look like to “build” a regex pattern now and what are some of the new ways you might choose to work with that same expression in Swift 5.7? Let’s look at an example.

Imagine that we are very close with the neighbors on our street and want to build an application for ourselves. We decide the best way for our users to verify that they belong to our neighborhood is for them to enter their address so we can use a regular expression to check whether they meet our criteria. An example of a valid address may look like this:

let address = """
123 Fake St.
Imaginary, USA 55555
"""

Working step-by-step, we can define the rules we want to use to validate.

  • A house number that is 2 or 3 digits long and ends with an odd digit. On our street, odd numbers are always on the same side of the road, and Fake St is a very busy street, so we rarely get to cross and meet the neighbors on the other side.
  • Fake St, Fake Blvd, or Fake Ave should all be accepted. While we refer to Fake St for simplicity, some of our neighbors call it Fake Blvd or Fake Ave since there are no street signs to be sure. We don’t want to exclude them. Some of them may also include a “.” after St.
  • Either a newline or a comma followed by a space, but not both.
  • Imaginary, USA either with a comma or without it.
  • And since Fake St crosses through a border between zip codes, we need to accept either zip code 55555 or 55556.
  • All of it should be case insensitive, with nothing before or after it in the string.

Did They Make the Old Way Better Too?

With all of that work, we now have the basis to write our pattern. It might end up looking like this:

func isAddressValid(_ address: String) -> Bool {
  let regex = /^(?:\d){1,2}[13579] Fake (?:St|Ave|Blvd)\.?(?:\n|, )(?:Imaginary),? USA 5555[56]$/.ignoresCase()

  return address.contains(regex)
}

If you’ve used regular expressions in Swift before, you might notice that our pattern is no longer defined using a String. That’s because Swift now offers support for regex literals. They look similar but have a few nice features in their favor.

Even for compact regular expressions like this one, Apple has given us some nice quality of life improvements in Xcode. Xcode will now syntax highlight the expression, giving you a way to catch whole categories of errors. Because the compiler understands regex literals, the pattern is checked at build time instead of being parsed from a string at runtime (although you can still do that). Errors in the syntax will cause a build failure and ensure those issues never make it into production.
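As a rough illustration of that difference, the sketch below contrasts a literal, which the compiler checks, with a regex built from a string at runtime, which can fail. The helper name `makeRegex` is made up for this example, and the literal uses the extended `#/…/#` delimiters so that it also compiles where bare `/…/` literals are not enabled.

```swift
// Hypothetical helper (not from the article): builds a regex from a string at runtime.
// Regex(_:) throws if the pattern is malformed, so the failure only shows up when this runs.
func makeRegex(_ pattern: String) -> Regex<AnyRegexOutput>? {
  try? Regex(pattern)
}

// Hypothetical helper: uses a regex literal, which is validated when the code is compiled.
func containsZip(_ text: String) -> Bool {
  text.contains(#/5555[56]/#)
}

// A well-formed pattern produces a usable regex at runtime.
let runtimeZip = makeRegex("5555[56]")

// An unterminated character class is only detected at runtime and yields nil here.
// Written as a literal, the same mistake would be a build error.
let brokenZip = makeRegex("5555[56")
```

The runtime form is still useful when the pattern comes from user input or configuration, but for patterns known up front the literal moves the failure from production to the build.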

This specific format has one thing going for it over any other. There’s only one little line. It’s compact, but that advantage comes with issues. For one thing, even our relatively simple example has a lot of parentheses and numbers whose meaning we will not necessarily remember when we return to it in a few months.

Converting Existing Expressions to RegexBuilder

So what if we could get the same behavior, but use a more verbose, natural-looking syntax? This is where Swift 5.7’s RegexBuilder APIs come in. Using a custom domain-specific language built on top of Swift’s result builders, rules can be combined to create a pattern, just like the one we made with our literal expression.

Even better, if we already have that literal, Xcode can convert it for us! It’s as simple as selecting the literal expression, right-clicking it, and choosing Refactor -> Convert to Regex Builder.

Xcode will expand it into its new builder syntax which looks like this:

import RegexBuilder

func isAddressValidWithBuilder(_ address: String) -> Bool {
  let regex = Regex {
    Anchor.startOfLine
    Repeat(1...2) {
      One(.digit)
    }
    One(.anyOf("13579"))
    " Fake "
    ChoiceOf {
      "St"
      "Ave"
      "Blvd"
    }
    Optionally {
      "."
    }
    ChoiceOf {
      "\u{A}"
      ", "
    }
    "Imaginary"
    Optionally {
      ","
    }
    " USA 5555"
    One(.anyOf("56"))
    Anchor.endOfLine
  }.ignoresCase()

  return address.contains(regex)
}

For each block of the builder, we can see natural and understandable groupings, like groups of strings where one string must match, or groups of optional characters. Additionally, we have quantifiers, allowing us to specify how many of the characters we want. There are also several type properties like .digit, making it as easy as ever to specify a common set of characters.
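To show how these pieces compose, here is a small sketch that builds only the first line of our address from two smaller regexes. The names `isStreetLine`, `houseNumber`, and `streetSuffix` are invented for illustration and are not part of RegexBuilder, and case-insensitivity is left out to keep it short.

```swift
import RegexBuilder

// Hypothetical helper: checks only the first line of the address, e.g. "123 Fake St."
func isStreetLine(_ line: String) -> Bool {
  // Quantifier plus character classes: one or two digits, then an odd digit.
  let houseNumber = Regex {
    Repeat(1...2) {
      One(.digit)
    }
    One(.anyOf("13579"))
  }

  // Exactly one of the alternatives must match, followed by an optional period.
  let streetSuffix = Regex {
    ChoiceOf {
      "St"
      "Ave"
      "Blvd"
    }
    Optionally {
      "."
    }
  }

  // A Regex is itself a component, so smaller patterns can be reused inside larger ones.
  let streetLine = Regex {
    houseNumber
    " Fake "
    streetSuffix
  }

  return line.wholeMatch(of: streetLine) != nil
}
```

Splitting a pattern into named parts like this is a design choice rather than a requirement, but it lets each rule from our list be read and changed on its own.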

Looking at this, I can feel reasonably sure that when I return to this code or share it with a teammate I’ll be able to recognize what this expression is matching without needing to remember esoteric escape sequences or consult a cheat sheet.

Captures

Going back to our example, let’s say we want to store our verified user’s address to prevent people from registering multiple times from the same location. Extracting portions of a string based on the defined pattern is a very common use case for regex. Let’s see what that looks like with RegexBuilder. To start, we’ll get the street name by just wrapping it in a Capture block.

func getStreet(_ address: String) -> String? {
  let regex = Regex {
    Anchor.startOfLine
    Repeat(1...2) {
      One(.digit)
    }
    One(.anyOf("13579"))
    // Keep the leading space outside the capture so the street name is returned without it.
    " "
    Capture {
      "Fake "
      ChoiceOf {
        "St"
        "Ave"
        "Blvd"
      }
    }
    Optionally {
      "."
    }
    ChoiceOf {
      "\u{A}"
      ", "
    }
    "Imaginary"
    Optionally {
      ","
    }
    " USA 5555"
    One(.anyOf("56"))
    Anchor.endOfLine
  }.ignoresCase()

  if let matches = try? regex.wholeMatch(in: address) {
    return String(matches.1)
  }
  return nil
}

This is a start, but it still has a problem. Getting back an unnamed tuple isn’t really what we want in a perfect world. There’s a good chance we’ll come back to this expression later, decide we need to capture something else, and forget to update our tuple indices. It’s also just not very clear what data we are pulling from that tuple. Regex has a solution for that with named groups, and RegexBuilder makes it more apparent than ever what is happening by giving Capture initializers to do just that.

func getStreet(_ address: String) -> String? {
  let streetNameReference = Reference(Substring.self)

  let regex = Regex {
    Anchor.startOfLine
    Repeat(1...2) {
      One(.digit)
    }
    One(.anyOf("13579"))
    // Keep the leading space outside the capture so the street name is returned without it.
    " "
    Capture(as: streetNameReference) {
      "Fake "
      ChoiceOf {
        "St"
        "Ave"
        "Blvd"
      }
    }
    Optionally {
      "."
    }
    ChoiceOf {
      "\u{A}"
      ", "
    }
    "Imaginary"
    Optionally {
      ","
    }
    " USA 5555"
    One(.anyOf("56"))
    Anchor.endOfLine
  }.ignoresCase()

  if let matches = try? regex.wholeMatch(in: address) {
    return String(matches[streetNameReference])
  }
  return nil
}

All we need to do is create a reference and pass it to our capture block and then we can use it when we retrieve the data. Not a bad way to go, and anybody looking at our code can tell at a glance what information we are returning.
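For comparison, regex literals support the same idea through named capture groups. This is a minimal sketch, assuming we only care about the first line of the address; the function name `streetName(in:)` and the group name `street` are our own choices, and the literal uses the extended `#/…/#` delimiters.

```swift
// Hypothetical helper: extracts the street name using a named group in a regex literal.
func streetName(in line: String) -> String? {
  // (?<street>...) names the capture, so the match output is a labeled tuple
  // of type (Substring, street: Substring).
  let pattern = #/\d{1,2}[13579] (?<street>Fake (?:St|Ave|Blvd))/#

  guard let match = line.firstMatch(of: pattern) else { return nil }

  // The label replaces the positional index (match.1), so adding another
  // capture later does not silently change what this line returns.
  return String(match.street)
}
```

A Reference plays the same role for builder-based patterns that the group name plays in the literal.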

It Gets Even Safer

As Swift developers, we love our type-safety and Apple knows that. Sometimes, we use regular expressions to get data that we need to work with as something other than a String. Previously, that might require that we capture the value we wanted, try to convert it to the type we want, carefully guard against any thrown errors, make decisions about what to do with the rest of the data we had matched, and decide if we should even keep it. Well, now that’s a lot easier. Let’s return the street number, if it exists, along with our street name.

func getStreet(_ address: String) -> (Int?, String?) {
  let streetNumberReference = Reference(Int.self)
  let streetNameReference = Reference(Substring.self)

  let regex = Regex {
    Anchor.startOfLine
    TryCapture(as: streetNumberReference) {
      Repeat(1...2) {
        One(.digit)
      }
      One(.anyOf("13579"))
    } transform: { streetNumber in
      Int(streetNumber)
    }
    // Keep the leading space outside the capture so the street name is returned without it.
    " "
    Capture(as: streetNameReference) {
      "Fake "
      ChoiceOf {
        "St"
        "Ave"
        "Blvd"
      }
    }
    Optionally {
      "."
    }
    ChoiceOf {
      "\u{A}"
      ", "
    }
    "Imaginary"
    Optionally {
      ","
    }
    " USA 5555"
    One(.anyOf("56"))
    Anchor.endOfLine
  }.ignoresCase()

  if let matches = try? regex.wholeMatch(in: address) {
    return (matches[streetNumberReference], String(matches[streetNameReference]))
  }
  return (nil, nil)
}

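After creating a reference for the street number in the same way we did for the street name, we can use a TryCapture block just like we did with Capture. TryCapture works similarly, except that its transform closure is allowed to fail: if the closure returns nil, the capture is rejected and the pattern does not match at that point. An error thrown from the closure stops matching and is passed on to the caller, which our try? also turns into nil. That is a much simpler way to recognize that something has gone wrong than in the past.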

We also pass a closure to the transform parameter of TryCapture's initializer, allowing us to return whatever type we want (as long as it matches the reference we created).
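To make the failure behavior concrete, here is a small sketch in which the transform itself rejects even house numbers. The function `oddHouseNumber(in:)` and its simplified pattern are assumptions made for this example and are not part of the code above.

```swift
import RegexBuilder

// Hypothetical helper: returns the house number only if it is odd.
func oddHouseNumber(in line: String) -> Int? {
  let numberReference = Reference(Int.self)

  let regex = Regex {
    TryCapture(as: numberReference) {
      OneOrMore(.digit)
    } transform: { (digits: Substring) -> Int? in
      // Returning nil rejects this capture, so an even or unparsable number
      // makes the overall match fail instead of producing a bad value.
      guard let number = Int(digits), number % 2 == 1 else { return nil }
      return number
    }
    " Fake St"
  }

  guard let match = line.wholeMatch(of: regex) else { return nil }
  // The value is already an Int; no conversion is needed at the call site.
  return match[numberReference]
}
```

Putting the validation inside the transform keeps the rule next to the part of the pattern it applies to, at the cost of hiding some logic in the regex definition.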

Conclusion

There’s a whole lot to love about regex support all around in Xcode and Swift 5.7 and working with it has never been easier. Still, RegexBuilder takes it a step above and beyond by providing first-class support, readability, and maintainability to the process. Having unprecedented safety in regular expressions makes that low entry barrier feel even better and makes RegexBuilder an obvious choice.

This is one sample of what RegexBuilder provides and it’s worthwhile to check out Apple’s Documentation on RegexBuilder. There were also two regex-specific talks at WWDC22 this year called “Meet Swift Regex” and “Swift Regex: Beyond the basics” that dig even deeper into the new framework. And finally, keep an eye out for all of the new versions of old APIs that now accept a Regex parameter.
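As one example of those APIs, the sketch below uses the Regex-accepting overloads of `split(separator:)` and `replacing(_:with:)` from the standard library. The helper names and the `<zip>` placeholder are made up for illustration.

```swift
import RegexBuilder

// Hypothetical helper: splits an address on either a newline or a comma followed by a space.
func addressFields(_ address: String) -> [String] {
  let separator = ChoiceOf {
    "\n"
    ", "
  }
  return address.split(separator: separator).map { String($0) }
}

// Hypothetical helper: hides either of our two zip codes behind a placeholder.
func maskZip(_ address: String) -> String {
  address.replacing(#/5555[56]/#, with: "<zip>")
}
```

Both calls accept any regex component, so a literal and a builder expression can be used interchangeably.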

