The complexity of Strings
One of the things that most programmers new to Swift find confusing, or even irritating, is how different Swift strings are from strings in other languages.
Let's start with an example:
Code:
String input = "a quick movement";
In C# and Java, we can extract the word quick respectively as follows:
Code:
String output = input.Substring(2, 5);  // C#: start index 2, length 5
String output = input.substring(2, 7);  // Java: begin index 2, end index 7 (exclusive)
In Swift:
Code:
let input = "a quick movement"
let start = input.startIndex.advancedBy(2)
let end = start.advancedBy(5)
let range = start..<end
let output = input[range]
Or all in one line, like this:
Code:
let output2 = input[input.startIndex.advancedBy(2)..<input.startIndex.advancedBy(2).advancedBy(5)]
This should immediately bring about a WTF moment: why so complicated?
The short answer is that Swift doesn't treat characters the same way that C# and Java do. The String type in Swift is a collection of Character values. A Swift Character represents one perceived character (what a person thinks of as a single character, called a grapheme). Unicode often needs two or more code points to form one perceived character; such a sequence is called a grapheme cluster, so a single Character can be composed of multiple Unicode scalar values. (A Unicode scalar is any Unicode code point except the surrogate code points, which are reserved for UTF-16 surrogate pairs.)
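To make the idea of a grapheme cluster concrete, here is a minimal sketch. It assumes a current Swift 5 toolchain, where `count` on a String replaces the `characters.count` spelling used elsewhere in this post; the variable names are purely illustrative.

```swift
// One perceived character built from two Unicode scalars:
// U+0065 (e) followed by U+0301 (combining acute accent).
let accentedE: Character = "e\u{301}"

// The single Character holds two scalars.
let scalarsInCharacter = accentedE.unicodeScalars.count   // 2

// A String containing only that Character has a length of one.
let text = String(accentedE)
let perceivedLength = text.count                           // 1
let scalarLength = text.unicodeScalars.count               // 2
```

The counts differ because the String is measured in perceived characters, while the scalar view exposes the pieces each character is built from.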
OK, that's a mouthful; show me an example:
C#
Code:
string nfc = "\u03D4"; // equals ϔ
string nfd = "\u03D2\u0308"; // equals ϔ
var b = nfc == nfd; // false
Swift
Code:
var nfc: String = "\u{03D4}" // equals ϔ
var nfd: String = "\u{03D2}\u{0308}" // equals ϔ
var b = nfc == nfd // true
In the above example, we assign a Greek upsilon with diaeresis and hook symbol (ϔ) to the variables nfc and nfd. We do this using different Unicode code points. Both examples do the same thing, but as you can see the results differ.
In .NET, the == operator performs an ordinal comparison, code unit by code unit, so it is important either to normalize both strings before comparing them or to use the IsNormalized method to check that both strings use the same normalization form. In Swift, the result is true because “their extended grapheme clusters are canonically equivalent”.
An extended grapheme cluster is a sequence of one or more Unicode scalars, as illustrated by the variable nfd in both examples. So when are two of them canonically equivalent? Apple provides the following explanation: “they have the same linguistic meaning and appearance, even if they are composed from different Unicode scalars behind the scenes.” In short, Swift Character indexes are consistent irrespective of the method used to construct an extended grapheme cluster. That's why a Swift index is not a plain integer code-unit offset, as it is in C# and Java.
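The difference between canonical equivalence and the underlying scalars can be sketched as follows. This assumes a current Swift 5 toolchain and reuses the nfc and nfd values from the example above; `sameString` and `sameScalars` are illustrative names.

```swift
let nfc = "\u{03D4}"          // precomposed: a single scalar
let nfd = "\u{03D2}\u{0308}"  // decomposed: base scalar plus combining diaeresis

// String equality is based on canonical equivalence.
let sameString = (nfc == nfd)                                            // true

// Comparing the scalar sequences directly exposes the difference.
let sameScalars = nfc.unicodeScalars.elementsEqual(nfd.unicodeScalars)   // false

// Both strings are one Character long, however they were built.
let lengths = (nfc.count, nfd.count)                                     // (1, 1)
```

Comparing the unicodeScalars views is roughly what a code-unit comparison in another language sees, which is why the two approaches disagree.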
A Swift String exposes its contents through several views:
- characters is a collection of Character values, or extended grapheme clusters.
- unicodeScalars is a collection of Unicode scalar values.
- utf8 is a collection of UTF-8 code units.
- utf16 is a collection of UTF-16 code units.
If we take the word “café”, made up of the decomposed characters [ c, a, f, e ] and [ ´ ], here's what the various string views would consist of:
As you can see, depending on your point of view (or your language's default), the above is not the same. In Swift, you can easily break a String down into its extended grapheme clusters (Character values), its UTF-8 or UTF-16 code units, or even its Unicode scalars; each of these is a simple property on a String.
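The four views of the decomposed “café” can be inspected with a short sketch like the one below. It assumes a current Swift 5 toolchain, where the String itself plays the role of the characters view; the constant names are illustrative.

```swift
let cafe = "cafe\u{301}"  // decomposed: e followed by U+0301 (combining acute accent)

let characters = Array(cafe)                        // 4 Character values: c, a, f, é
let scalars = cafe.unicodeScalars.map { $0.value }  // [99, 97, 102, 101, 769]
let utf8 = Array(cafe.utf8)                         // [99, 97, 102, 101, 204, 129]
let utf16 = Array(cafe.utf16)                       // [99, 97, 102, 101, 769]

print(characters.count, scalars.count, utf8.count, utf16.count)  // prints "4 5 6 5"
```

The same text has four, five, or six elements depending on the view, which is exactly why a single integer offset is ambiguous.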
Now for some code fun:
With Swift's consistent support of extended grapheme clusters, you can even use these as part of your Swift code, for example:

...and that's how you load an ark.
OK, as a final bit, let's see whether Swift really has to be so complex when cutting up a String. In short, no: you can extend the language to simplify this. First, we're going to extend String to support an Int range, e.g. 2...5.
Code:
extension String
{
    subscript (range: Range<Int>) -> String?
    {
        // Return nil rather than trapping when the range falls outside the string
        guard range.startIndex >= 0 &&
            range.endIndex <= self.characters.count else { return nil }
        let subStart = self.startIndex.advancedBy(range.startIndex)
        let subEnd = self.startIndex.advancedBy(range.endIndex)
        // Range<Int> is half-open, so endIndex is already one past the last character
        return self[subStart..<subEnd]
    }
}
OK, whew. What does that give us? Let's take the first example and see how the substring now works in Swift:
Code:
let input = "a quick movement"
let output = input[2...6] // this is now our substring command; an optional containing "quick"
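For readers on a newer toolchain, the same idea might look like the sketch below. It assumes Swift 5, where `advancedBy` has been replaced by `index(_:offsetBy:)` and slicing returns a Substring. The `offsets:` label is a hypothetical name chosen here to keep the helper clearly separate from the standard subscripts; it is not part of the standard library.

```swift
extension String {
    /// Hypothetical helper: returns the characters at the given integer
    /// offsets, or nil when the range falls outside the string.
    subscript(offsets range: ClosedRange<Int>) -> String? {
        guard range.lowerBound >= 0, range.upperBound < count else { return nil }
        // Walk from the start of the string to each offset.
        let start = index(startIndex, offsetBy: range.lowerBound)
        let end = index(startIndex, offsetBy: range.upperBound)
        // Slicing yields a Substring; copy it into a new String.
        return String(self[start...end])
    }
}

let sentence = "a quick movement"
let word = sentence[offsets: 2...6]        // Optional("quick")
let outOfBounds = sentence[offsets: 10...40]  // nil
```

Each call walks the string from its start, so this convenience costs time proportional to the offsets, which is part of the trade-off discussed next.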
OK, if it's that easy, you might ask why Apple doesn't just include it in the standard library. The short answer is that, because of the differences between Swift's String and strings in other languages, they want programmers to make an informed choice by working at index level.
In the long run this will most likely be included, but for now we're all just building our own custom standard library extensions.