I'm implementing a parser using Swift's RegexBuilder for the lexer stage. I would like to understand whether my ChoiceOf statement could be improved.
Specifically, in the example code below, when tokenRegex matches it produces a tuple of type (Substring, Token?, Token?, ...), where the number of Token? members equals the number of regex terms in the ChoiceOf construct. Assuming all the regex terms in the ChoiceOf block capture a value of type Token, is there any way to express it so that the output type is (Substring, Token?)?
At present I deal with this by collapsing the match output tuple using Mirror, but I would like to skip this step if possible:
let token = Mirror(reflecting: match.output).children.compactMap({ $0.value as? Token }).first
Here's the full example code.
import Foundation
import RegexBuilder

enum Token {
    case number(Double)
    case text(String)
    case error(String)

    static let tokenRegex = Regex {
        ChoiceOf {
            numberRegex
            textRegex
            // Further regex patterns, all returning type Token...
        }
    }

    static let numberRegex = Regex {
        Capture {
            .localizedDouble(locale: Locale(identifier: "en-US"))
        } transform: {
            Token.number($0)
        }
    }

    static let textRegex = Regex {
        Capture {
            OneOrMore(.word)
        } transform: {
            Token.text(String($0))
        }
    }

    static func tokenise(_ text: String) -> [Token] {
        var stringToParse = text
        var tokens: [Token] = []
        while !stringToParse.isEmpty {
            let (token, matchEndIndex) = findNextToken(in: stringToParse)
            tokens.append(token)
            stringToParse = String(stringToParse[matchEndIndex...])
        }
        return tokens
    }

    static func findNextToken(in string: String) -> (Token, String.Index) {
        do {
            return if
                let match = try tokenRegex.firstMatch(in: string),
                let token = Mirror(reflecting: match.output).children.compactMap({ $0.value as? Token }).first
            {
                (token, match.0.endIndex)
            } else {
                (.error("Parse error"), string.endIndex)
            }
        } catch {
            return (.error(error.localizedDescription), string.endIndex)
        }
    }
}

Token.tokenise("123MyDogIsHappy") // Produces -> [Token.number(123.0), Token.text("MyDogIsHappy")]
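
For comparison, here is a rough sketch of one fallback I can think of that avoids Mirror. It does not change what ChoiceOf returns; it sidesteps ChoiceOf entirely by keeping each alternative as its own Regex<(Substring, Token)> and trying them in order with prefixMatch(in:). The names makeAlternatives and nextToken are hypothetical, the Token enum is a trimmed copy with Equatable added only for checking, and a digits-only number pattern stands in for .localizedDouble to keep the sketch small:

```swift
import RegexBuilder

// Trimmed copy of the Token enum; Equatable is an addition for easy comparison.
enum Token: Equatable {
    case number(Double)
    case text(String)
    case error(String)
}

// Hypothetical helper: every alternative has exactly one Token capture,
// so all of them share the type Regex<(Substring, Token)>.
func makeAlternatives() -> [Regex<(Substring, Token)>] {
    // Assumption: plain digits stand in for .localizedDouble(locale:).
    let number = Regex {
        TryCapture {
            OneOrMore(.digit)
        } transform: { digits -> Token? in
            Double(digits).map { Token.number($0) }
        }
    }
    let text = Regex {
        Capture {
            OneOrMore(.word)
        } transform: { word -> Token in
            Token.text(String(word))
        }
    }
    // Order matters, as in ChoiceOf: .word also matches digits.
    return [number, text]
}

// Hypothetical helper: try each alternative at the start of the input
// and return the first token found plus the index where it ends.
func nextToken(
    in input: Substring,
    using alternatives: [Regex<(Substring, Token)>]
) -> (Token, String.Index) {
    for regex in alternatives {
        if let match = try? regex.prefixMatch(in: input) {
            return (match.output.1, match.range.upperBound)
        }
    }
    return (.error("Parse error"), input.endIndex)
}

func tokenise(_ text: String) -> [Token] {
    let alternatives = makeAlternatives()
    var rest = text[...]   // Substring view, so no re-copying per token
    var tokens: [Token] = []
    while !rest.isEmpty {
        let (token, end) = nextToken(in: rest, using: alternatives)
        tokens.append(token)
        rest = rest[end...]
    }
    return tokens
}
```

Unlike firstMatch, prefixMatch only matches at the start of the remaining input, so unrecognised characters surface as an error token instead of being skipped. I would still prefer a single ChoiceOf with output (Substring, Token?) if that can be expressed.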