Using Swift RegexBuilder in a lexer - collapsing ChoiceOf match output tuple to (Substring, TypeOfMatch) type


I'm implementing a parser using Swift's RegexBuilder for the lexer stage. I would like to understand if my regex ChoiceOf statement could be improved.

Specifically, in the example code below, when tokenRegex matches it produces a tuple of type (Substring, Token?, Token?, ...), where the number of Token? members equals the number of regex terms in the ChoiceOf construct. Assuming all the regex terms in the ChoiceOf block return type Token, is there any way to express it so that the return type is (Substring, Token?)?

At present I deal with this by collapsing the match output tuple using Mirror reflection, but I would like to skip this step if possible:

let token = Mirror(reflecting: match.output).children.compactMap({ $0.value as? Token }).first
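
To make the tuple shape concrete, here is a minimal sketch with two alternatives. The names Tok and firstTok are hypothetical stand-ins (not part of my real code), and the digit pattern is simplified. With a known, small number of alternatives the optionals can be collapsed by hand with ??, but every position has to be named, which is what I'd like to avoid as the ChoiceOf grows.

```swift
import RegexBuilder

// Hypothetical stand-in for the Token type, kept small for illustration.
enum Tok: Equatable {
    case digits(String)
    case word(String)
}

// Returns the token captured by whichever alternative matched, or nil if nothing matched.
func firstTok(in input: String) -> Tok? {
    // Each alternative contributes one optional capture to the output,
    // so the output tuple here has the shape (Substring, Tok?, Tok?).
    let choice = Regex {
        ChoiceOf {
            Capture { OneOrMore(.digit) } transform: { Tok.digits(String($0)) }
            Capture { OneOrMore(.word) } transform: { Tok.word(String($0)) }
        }
    }
    guard let match = input.firstMatch(of: choice) else { return nil }
    // Only the capture of the alternative that matched is non-nil.
    let (_, asDigits, asWord) = match.output
    // Manual collapse: workable for two alternatives, tedious for many.
    return asDigits ?? asWord
}
```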

Here's the full example code.

import Foundation
import RegexBuilder

enum Token {
    case number(Double)
    case text(String)
    case error(String)
    
    static let tokenRegex = Regex {
        ChoiceOf {
            numberRegex
            textRegex
            // Further regex patterns, all returning type Token...
        }
    }
    
    static let numberRegex = Regex {
        Capture {
            .localizedDouble(locale: Locale(identifier: "en-US"))
        } transform: {
            Token.number($0)
    }}
    
    static let textRegex = Regex {
        Capture {
            OneOrMore(.word)
        } transform: {
            Token.text(String($0))
    }}
    
    static func tokenise(_ text: String) -> [Token] {
        var stringToParse = text
        var tokens: [Token] = []
        
        while !stringToParse.isEmpty {
            let (token, matchEndIndex) = findNextToken(in: stringToParse)
            tokens.append(token)
            stringToParse = String(stringToParse[matchEndIndex...])
        }
        return tokens
    }
    
    static func findNextToken(in string: String) -> (Token, String.Index) {
        do {
            return if
                let match = try tokenRegex.firstMatch(in: string),
                let token = Mirror(reflecting: match.output).children.compactMap({ $0.value as? Token }).first
            {
                (token, match.0.endIndex)
            } else {
                (.error("Parse error"), string.endIndex)
            }
        } catch {
            return (.error(error.localizedDescription), string.endIndex)
        }
    }
}

Token.tokenise("123MyDogIsHappy") // Produces -> [Token.number(123.0), Token.text("MyDogIsHappy")]
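
For comparison, one alternative I have considered is dropping ChoiceOf entirely: since every pattern has the same output type, (Substring, Token), the patterns can sit in an array and be tried in order with prefixMatch. This is only a sketch under assumptions: Lexeme, makePatterns, nextLexeme and lex are hypothetical names, the number pattern is simplified to plain digits, and prefixMatch anchors at the start of the remaining input, whereas the code above uses firstMatch.

```swift
import RegexBuilder

// Hypothetical stand-in for the Token enum above.
enum Lexeme: Equatable {
    case number(Double)
    case text(String)
    case error(String)
}

// Every pattern shares the output type (Substring, Lexeme), so they fit in one array.
// Order matters: earlier patterns win, as with ChoiceOf.
func makePatterns() -> [Regex<(Substring, Lexeme)>] {
    let number = Regex {
        // Simplified: digits only; an unparsable value falls back to 0.
        Capture { OneOrMore(.digit) } transform: { Lexeme.number(Double($0) ?? 0) }
    }
    let text = Regex {
        Capture { OneOrMore(.word) } transform: { Lexeme.text(String($0)) }
    }
    return [number, text]
}

// Tries each pattern at the start of the input and returns the first hit.
func nextLexeme(in input: Substring,
                using patterns: [Regex<(Substring, Lexeme)>]) -> (Lexeme, Substring.Index) {
    for pattern in patterns {
        if let match = try? pattern.prefixMatch(in: input) {
            // output.1 is the transformed capture; no optionals to collapse.
            return (match.output.1, match.range.upperBound)
        }
    }
    // Nothing matched: report an error and consume the rest of the input.
    return (.error("Parse error"), input.endIndex)
}

func lex(_ text: String) -> [Lexeme] {
    let patterns = makePatterns()
    var rest = text[...]
    var lexemes: [Lexeme] = []
    while !rest.isEmpty {
        let (lexeme, end) = nextLexeme(in: rest, using: patterns)
        lexemes.append(lexeme)
        rest = rest[end...]
    }
    return lexemes
}
```

This avoids Mirror, but it gives up the single combined regex, so I would still prefer a way to keep ChoiceOf with a (Substring, Token?) output.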
