8 Episodes · 3h02min
- Parsing 31:37
- Parse Errors 20:05
- Evaluation 23:05
- Evaluation Errors 22:11
- Attribute Escaping 22:16
- Parsing For Loops 21:51
- Evaluating For Loops 16:52
- Member Expressions 24:58
Swift Talk # 203
We start building an HTML template language, implementing the parser in a test-driven way.
00:06 In this episode, we start working on a new project. We recently needed an HTML template language for something we were working on, and we thought it might be interesting to build this language together.
00:45 Building a custom template language is usually a bad idea because many ready-to-use solutions already exist, and it takes a lot of work to get the whole process right: from parsing input strings and evaluating the results, to making sure useful error messages are generated whenever anything goes wrong.
01:25 But sometimes you just can't avoid it. In our case, we want to ship a simple template language along with an app because we don't want to ask our users to know Swift, and because we want control over the error messaging.
01:49 More specifically, we want to use variables in an HTML template, and depending on where a variable is used, its value needs to be escaped differently: inside body text, we need to escape ampersands and angle brackets, but inside an attribute, we only need to escape quote characters. Because of these specific requirements, we can't use existing template languages like Stencil or Mustache.
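The two escaping rules can be sketched as a pair of small functions. The names escapeBody and escapeAttribute are hypothetical and not part of the episode. The attribute version assumes the value sits inside quotes and follows the rule exactly as stated above; a stricter escaper might also encode ampersands there.

```swift
/// Hypothetical helper: escapes text that appears between tags
/// (ampersands and angle brackets).
func escapeBody(_ text: String) -> String {
    var result = ""
    for character in text {
        switch character {
        case "&": result += "&amp;"
        case "<": result += "&lt;"
        case ">": result += "&gt;"
        default: result.append(character)
        }
    }
    return result
}

/// Hypothetical helper: escapes a value placed inside a quoted attribute
/// (only the quote characters, per the rule described above).
func escapeAttribute(_ value: String) -> String {
    var result = ""
    for character in value {
        switch character {
        case "\"": result += "&quot;"
        case "'": result += "&#39;"
        default: result.append(character)
        }
    }
    return result
}

let title = "Fish & Chips <Special>"
print(escapeBody(title))       // Fish &amp; Chips &lt;Special&gt;
print(escapeAttribute(title))  // Fish & Chips <Special>
```

The same variable value produces different output depending on where it lands, which is why the template language needs to know the context of each variable.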
02:46 To start things off, let's look at a sample of what we want to parse at the end of this project. In a test function — which we prefix with an underscore because we won't be able to make it pass for a while — we write an HTML string that includes statements such as variables, loops, and conditions:
import XCTest

final class ParserTests: XCTestCase {
    func _testSyntax() {
        let input = """
        <head><title>{ title }</title></head>
        <body>
            <ul>
            { for post in posts }
                { if post.published }
                <li>{ post.title }</li>
                { end }
            { end }
            </ul>
        </body>
        """
    }
}
05:12 In a later step, we'll also cover the evaluation logic that's needed to support dynamic values for attributes, such as the URL used in the following <a> tag:
<li><a href={post.url}>{ post.title }</a></li>
05:55 The above sample requires a lot of features, but we won't build them all at once. As a first step, we want to parse a single variable:
final class ParserTests: XCTestCase {
    func testVariable() throws {
        let input = "{ foo }"
    }
    // ...
}
06:49 We define an enum for the various types of expressions that we parse. This enum will have more cases later, but for now, its only case is a variable expression:
enum Expression {
    case variable(name: String)
}
07:07 In an extension of String, we write a parsing method that returns an expression:
extension String {
    func parse() throws -> Expression {
    }
}
07:31 In order to avoid expensive string copying while we're parsing, we will actually work with the Substring type, which is basically a view on the base string defined by a start index and an end index. Because of this representation, removing a character from the beginning of the substring comes down to mutating the start index, which is a very lightweight operation:
extension String {
    func parse() throws -> Expression {
        var remainder = self[...]
        return try remainder.parse()
    }
}

extension Substring {
    mutating func parse() throws -> Expression {
    }
}
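A minimal illustration of the Substring claim above, using only the standard library: dropping the first character by re-slicing moves the start index, while the end index and the base string stay the same. The slice is written out explicitly here to make the index movement visible; calling removeFirst() on the substring leaves the same remaining text.

```swift
let template = "{ foo }"
var remainder = template[...]   // a Substring spanning the whole string

// Drop the first character by moving the start index one position forward.
// No characters are copied; the substring still refers to `template`.
let oldStart = remainder.startIndex
remainder = remainder[remainder.index(after: oldStart)...]

print(remainder == " foo }")                                    // true
print(remainder.startIndex == template.index(after: oldStart))  // true
print(remainder.endIndex == template.endIndex)                  // true
print(remainder.base == template)                               // true
```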
We'll implement proper error throwing in a later phase, but for now, we'll just call fatalError wherever an error should be thrown.
08:50 In order to make our first test pass, we check that the string starts with an opening curly brace, and if it does, we remove the first character and continue parsing the rest of the string:
extension Substring {
    mutating func parse() throws -> Expression {
        guard let f = first else { fatalError("TODO") }
        if f == "{" {
        } else {
            fatalError("Unexpected token")
        }
    }
}
09:46 After the curly brace, we want to skip over any whitespace that follows, and then we want to parse an identifier:
extension Substring {
    mutating func parse() throws -> Expression {
        guard let f = first else { fatalError("TODO") }
        if f == "{" {
            removeFirst()
            skipWS()
            let name = try parseIdentifier()
        } else {
            fatalError("Unexpected token")
        }
    }
}
10:19 We also need to make sure that the variable is followed by a closing curly brace, possibly after more whitespace characters. We're already seeing a pattern of inspecting the beginning of the string and removing it if it's a match, so what if we could use a helper like this:
extension Substring {
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else {
                fatalError()
            }
            return .variable(name: name)
        } else {
            fatalError("Unexpected token")
        }
    }
}
12:33 That looks much better, so let's go ahead and write the helper method that checks whether or not the substring starts with the given prefix, and if it does, removes the prefix:
extension Substring {
    mutating func remove(prefix: String) -> Bool {
        guard hasPrefix(prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }
    // ...
}
14:50 We also need the helper that removes characters from the start of the substring as long as they're whitespace characters:
extension Substring {
    // ...
    mutating func skipWS() {
        while first?.isWhitespace == true {
            removeFirst()
        }
    }
    // ...
}
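To see the two helpers in isolation, here is a small standalone check that repeats them and runs them on a sample input. It also shows that Character.isWhitespace covers tabs and newlines, which matters for multi-line templates like the sample at the start.

```swift
extension Substring {
    /// Removes `prefix` if the substring starts with it and reports whether it did.
    mutating func remove(prefix: String) -> Bool {
        guard hasPrefix(prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }

    /// Drops leading whitespace (spaces, tabs, and newlines).
    mutating func skipWS() {
        while first?.isWhitespace == true {
            removeFirst()
        }
    }
}

let source = "{ \t\n foo }"
var input = source[...]

print(input.remove(prefix: "<"))  // false: no match, nothing is consumed
print(input.remove(prefix: "{"))  // true: the brace is consumed
input.skipWS()
print(input)                      // foo }
```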
15:46 The last missing piece is the parseIdentifier method, which returns a string of letter characters:
extension Substring {
    // ...
    mutating func parseIdentifier() throws -> String {
        var result = ""
        // Consume characters as long as they can be part of an identifier.
        while first?.isIdentifier == true {
            result.append(removeFirst())
        }
        guard !result.isEmpty else { fatalError() }
        return result
    }
}
extension Character {
    var isIdentifier: Bool {
        isLetter
    }
}
By writing a separate isIdentifier property, we can later change our definition of an identifier without having to change the parseIdentifier method. Using this property also makes our parsing method easier to read.
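As an illustration of that flexibility, here is a sketch with a hypothetical broader definition that also allows digits and underscores. The parsing loop is unchanged; only the Character property differs. To keep the sketch runnable on its own, this parseIdentifier returns nil instead of calling fatalError, which is an assumption, not the episode's code.

```swift
extension Character {
    // Hypothetical broader rule (not the episode's): letters, digits, and underscores.
    var isIdentifier: Bool {
        isLetter || isNumber || self == "_"
    }
}

extension Substring {
    // Same loop as the episode's parseIdentifier, but it returns nil
    // for an empty identifier instead of calling fatalError.
    mutating func parseIdentifier() -> String? {
        var result = ""
        while first?.isIdentifier == true {
            result.append(removeFirst())
        }
        return result.isEmpty ? nil : result
    }
}

var rest = Substring("post_2.title }")
print(rest.parseIdentifier() ?? "none")  // post_2
print(rest)                              // .title }
```

Because the dot is not an identifier character under either definition, parsing stops right before it.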
17:37 The parse method is now ready to be used in our test:
final class ParserTests: XCTestCase {
    func testVariable() throws {
        let input = "{ foo }"
        XCTAssertEqual(try input.parse(), .variable(name: "foo"))
    }
    // ...
}
18:02 In order to test them, we need to make the String.parse method and the Expression enum public. It will also be useful to conform Expression to Hashable, which implies the Equatable conformance that XCTAssertEqual requires:
public enum Expression: Hashable {
    case variable(name: String)
}

extension String {
    public func parse() throws -> Expression {
        var remainder = self[...]
        return try remainder.parse()
    }
}
18:43 We run the test and see that it passes. We can also add variants with different amounts of whitespace, all of which should be parsed with the same result:
final class ParserTests: XCTestCase {
    func testVariable() throws {
        for input in ["{ foo }", "{foo}"] {
            XCTAssertEqual(try input.parse(), .variable(name: "foo"))
        }
    }
    // TODO: test that identifier is not an empty string
    // ...
}
19:24 We add a note that we should still test that the parsing fails if the input string doesn't contain an identifier. We wrote our parser that way but — because of the fatal errors — we can't yet verify that it works correctly.
19:57 Next, let's work on parsing tags:
final class ParserTests: XCTestCase {
    // ...
    func testTag() throws {
        let input = "<p></p>"
        XCTAssertEqual(try input.parse(), .tag(name: "p"))
    }
    // ...
}
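20:36 We add an Expression.tag case with associated values to hold the tag's name, its attributes, and its body. Because the attributes and body values are themselves Expressions, the enum is recursive, and we mark it as indirect (strictly speaking, the dictionary and array already store their elements out of line, so the compiler would also accept the enum without the keyword):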
public indirect enum Expression: Hashable {
    case variable(name: String)
    case tag(name: String, attributes: [String: Expression] = [:], body: [Expression] = [])
}
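Since tags can now contain other expressions, code that consumes the tree is naturally recursive as well. The helper below, tagNames(in:), is hypothetical and not part of the episode; it collects every tag name with parents listed before their children.

```swift
indirect enum Expression: Hashable {
    case variable(name: String)
    case tag(name: String, attributes: [String: Expression] = [:], body: [Expression] = [])
}

/// Hypothetical helper: returns every tag name in the tree, parent before children.
func tagNames(in expression: Expression) -> [String] {
    switch expression {
    case .variable:
        return []
    case let .tag(name, _, body):
        // Recurse into each child and flatten the results.
        return [name] + body.flatMap { tagNames(in: $0) }
    }
}

let tree = Expression.tag(name: "p", body: [
    .tag(name: "span", body: [.variable(name: "foo")]),
    .tag(name: "div")
])
print(tagNames(in: tree))  // ["p", "span", "div"]
```

The default values for attributes and body let tests write .tag(name: "div") without spelling out empty collections.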
21:28 Now we can extend the parse method to look for an opening angle bracket:
extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else {
                fatalError()
            }
            return .variable(name: name)
        } else if remove(prefix: "<") {
        } else {
            fatalError("Unexpected token")
        }
    }
}
22:11 In order to parse the tag name, we copy the identifier parsing method:
extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            // ...
        } else if remove(prefix: "<") {
            let name = try parseTagName()
        } else {
            fatalError("Unexpected token")
        }
    }

    mutating func parseTagName() throws -> String {
        var result = ""
        while first?.isTagName == true {
            result.append(removeFirst())
        }
        guard !result.isEmpty else { fatalError() }
        return result
    }

    mutating func parseIdentifier() throws -> String {
        // ...
    }
}
extension Character {
    var isIdentifier: Bool {
        isLetter
    }

    var isTagName: Bool {
        isLetter
    }
}
23:04 After the tag name, we expect to see a closing angle bracket, followed by a closing tag. Finally, we can return the parsed tag:
extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            // ...
        } else if remove(prefix: "<") {
            let name = try parseTagName()
            guard remove(prefix: ">") else { fatalError() }
            // The closing tag includes the slash, e.g. "</p>".
            let closingTag = "</\(name)>"
            guard remove(prefix: closingTag) else { fatalError() }
            return .tag(name: name)
        } else {
            fatalError("Unexpected token")
        }
    }
    // ...
}
24:32 We can refactor one piece of this in order to remove some duplication from the parsing of tag names and identifiers. This is done by writing a method that removes leading characters as long as a given condition is met.
We can write this method in such a way that it works not only for substrings, but for any collection that is its own subsequence (such as Substring or ArraySlice), by advancing an index past each leading element that satisfies the given condition and then slicing the collection at that index:
extension Collection where SubSequence == Self {
    /// Removes and returns the longest prefix whose elements satisfy `cond`.
    mutating func remove(while cond: (Element) -> Bool) -> SubSequence {
        var current = startIndex
        while current < endIndex, cond(self[current]) {
            formIndex(after: &current)
        }
        let result = self[startIndex..<current]
        self = self[current...]
        return result
    }
}
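Because the method is constrained only on SubSequence == Self, the same code runs on array slices as well as substrings. A small standalone check, repeating the extension so the snippet runs on its own:

```swift
extension Collection where SubSequence == Self {
    /// Removes and returns the longest prefix whose elements satisfy `cond`.
    mutating func remove(while cond: (Element) -> Bool) -> SubSequence {
        var current = startIndex
        while current < endIndex, cond(self[current]) {
            formIndex(after: &current)
        }
        let result = self[startIndex..<current]
        self = self[current...]
        return result
    }
}

var numbers = ArraySlice([1, 2, 3, 10, 4])
let small = numbers.remove(while: { $0 < 5 })
print(Array(small), Array(numbers))  // [1, 2, 3] [10, 4]

var text = Substring("abc123")
let letters = text.remove(while: { $0.isLetter })
print(letters, text)                 // abc 123
```

Both the returned prefix and the remaining collection are slices of the original storage, so no elements are copied.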
27:10 Now we can update parseTagName and parseIdentifier to make use of this method:
extension Substring {
    // ...
    mutating func parseTagName() throws -> String {
        let result = remove(while: { $0.isTagName })
        guard !result.isEmpty else { fatalError() }
        return String(result)
    }

    mutating func parseIdentifier() throws -> String {
        let result = remove(while: { $0.isIdentifier })
        guard !result.isEmpty else { fatalError() }
        return String(result)
    }
}
27:59 Next, we want to be able to parse a tag with a body so that we can have nested expressions:
final class ParserTests: XCTestCase {
    // ...
    func testTagBody() throws {
        let input = "<p><span>{ foo }</span></p>"
        XCTAssertEqual(try input.parse(), .tag(name: "p", body: [
            .tag(name: "span", body: [
                .variable(name: "foo")
            ])
        ]))
    }
    // ...
}
29:26 In the parse method, we now need to try parsing expressions in between the opening and closing tags. We can do this by recursively calling parse inside a while loop that repeats until we encounter the closing tag:
extension Substring {
    // ...
    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else {
                fatalError()
            }
            return .variable(name: name)
        } else if remove(prefix: "<") {
            let name = try parseTagName()
            guard remove(prefix: ">") else { fatalError() }
            let closingTag = "</\(name)>"
            var body: [Expression] = []
            while !remove(prefix: closingTag) {
                body.append(try parse())
            }
            return .tag(name: name, body: body)
        } else {
            fatalError("Unexpected token")
        }
    }
    // ...
}
30:39 And that's all we need to do to make the test pass. We can even make it more interesting by having multiple expressions inside the outer tag:
final class ParserTests: XCTestCase {
    // ...
    func testTagBody() throws {
        let input = "<p><span>{ foo }</span><div></div></p>"
        XCTAssertEqual(try input.parse(), .tag(name: "p", body: [
            .tag(name: "span", body: [
                .variable(name: "foo")
            ]),
            .tag(name: "div")
        ]))
    }
    // ...
}
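For reference, here are the snippets from this episode collected into a single file that runs as a script, with the closing-tag fix applied. This is an assembled sketch rather than the episode's own project layout: access modifiers are dropped because everything lives in one file, and fatalError still stands in for the error handling that comes next.

```swift
indirect enum Expression: Hashable {
    case variable(name: String)
    case tag(name: String, attributes: [String: Expression] = [:], body: [Expression] = [])
}

extension Character {
    var isIdentifier: Bool { isLetter }
    var isTagName: Bool { isLetter }
}

extension Collection where SubSequence == Self {
    mutating func remove(while cond: (Element) -> Bool) -> SubSequence {
        var current = startIndex
        while current < endIndex, cond(self[current]) {
            formIndex(after: &current)
        }
        let result = self[startIndex..<current]
        self = self[current...]
        return result
    }
}

extension Substring {
    mutating func remove(prefix: String) -> Bool {
        guard hasPrefix(prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }

    mutating func skipWS() {
        while first?.isWhitespace == true { removeFirst() }
    }

    mutating func parseTagName() throws -> String {
        let result = remove(while: { $0.isTagName })
        guard !result.isEmpty else { fatalError("Expected a tag name") }
        return String(result)
    }

    mutating func parseIdentifier() throws -> String {
        let result = remove(while: { $0.isIdentifier })
        guard !result.isEmpty else { fatalError("Expected an identifier") }
        return String(result)
    }

    mutating func parse() throws -> Expression {
        if remove(prefix: "{") {
            skipWS()
            let name = try parseIdentifier()
            skipWS()
            guard remove(prefix: "}") else { fatalError("Expected }") }
            return .variable(name: name)
        } else if remove(prefix: "<") {
            let name = try parseTagName()
            guard remove(prefix: ">") else { fatalError("Expected >") }
            let closingTag = "</\(name)>"
            var body: [Expression] = []
            // Parse child expressions until the matching closing tag appears.
            while !remove(prefix: closingTag) {
                body.append(try parse())
            }
            return .tag(name: name, body: body)
        } else {
            fatalError("Unexpected token")
        }
    }
}

extension String {
    func parse() throws -> Expression {
        var remainder = self[...]
        return try remainder.parse()
    }
}

let tree = try! "<p><span>{ foo }</span><div></div></p>".parse()
print(tree == .tag(name: "p", body: [
    .tag(name: "span", body: [.variable(name: "foo")]),
    .tag(name: "div")
]))  // true
```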
31:10 The next step is making sure we report proper errors. As it is now, we hit a fatal error inside the parser's code whenever something unexpected happens, and this makes it difficult to see what's wrong with the input string.
Written in Swift 5