How to create a Swift Regex that outputs a custom type?

In the WWDC videos, it was shown that you can do something like this with Captures/TryCaptures in the Regex Builder:

let regex = Regex {
  // ...

  TryCapture {
    OneOrMore(.digit)
  } transform: {
    Int($0)
  }

  // ...
}

And the output of the Regex will be type safe. The Regex will output an Int for that group, instead of a Substring like it normally does.
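
For reference, a self-contained version of that pattern might look like the sketch below. The surrounding parentheses and the name ageRegex are made up for illustration and are not from the WWDC sample.

```swift
import RegexBuilder

// Hypothetical example: matches text such as "(30)" and yields the digits as an Int.
let ageRegex = Regex {
    "("
    TryCapture {
        OneOrMore(.digit)
    } transform: {
        Int($0)   // returning nil here makes the regex fail to match at this point
    }
    ")"
}

// The output is (Substring, Int): the whole match first, then the transformed capture.
if let match = "(30)".wholeMatch(of: ageRegex) {
    let (whole, age) = match.output
    print(whole, age + 1)
}
```

The capture is typed, but the overall output is still a tuple whose first element is the whole match.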

However, what I would like to do is to change the entire output type of the whole Regex, like applying a transform: at the end of the Regex closure. For example, to parse a line containing the name, age and date of birth of a person:

John (30) 1992-09-22

I would like to do something like:

// this doesn't work and is just for illustration - there is no such Regex.init
let regex = Regex {
    Capture(/\w+/)
    " ("
    TryCapture(/\d+/) { Int($0) }
    ") "
    Capture(.iso8601Date(timeZone: .gmt))
} transform: { (_, name, age, dob) in
    Person(name: String(name), age: age, dob: dob)
}

And I would expect regex to be of type Regex<Person>, and not Regex<(Substring, Substring, Int, Date)>. That is, someString.wholeMatch(of: regex).output would be a Person, not a tuple.

I'm basically just trying to reduce the occurrence of tuples, because I find it very inconvenient to work with them, especially unnamed ones. Since RegexComponent is parameterised by the unconstrained RegexOutput type, and there are built-in types where RegexOutput is Date and Decimal, surely doing this for arbitrary types using regex is not impossible, right?

My attempt was:

struct Person {
    let name: String
    let age: Int
    let dob: Date
}
let line = "John (30) 1992-09-22"
let regex = Regex {
    Capture {
        Capture(/\w+/)
        " ("
        TryCapture(/\d+/) { Int($0) }
        ") "
        Capture(.iso8601Date(timeZone: .gmt))
    } transform: { (_, name, age, dob) in
        Person(name: String(name), age: age, dob: dob)
    }
}
line.wholeMatch(of: regex)

but this crashed at runtime, giving the message:

Could not cast value of type 'Swift.Substring' (0x7ff865e3ead8) to '(Swift.Substring, Swift.Substring, Swift.Int, Foundation.Date)' (0x7ff863f2e660).

Another attempt of mine, using CustomConsumingRegexComponent, is shown in this answer, but it has quite a large caveat: it doesn't backtrack properly.
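
For context, the CustomConsumingRegexComponent route can be sketched roughly as follows. This is not necessarily the code from the linked answer: PersonComponent is a hypothetical name, and the inner regex simply reuses the components from the attempt above. The component hands the engine a single end position, so the engine has nothing to retry inside it, which is where the backtracking caveat comes from.

```swift
import Foundation
import RegexBuilder

struct Person {
    let name: String
    let age: Int
    let dob: Date
}

// Hypothetical wrapper: a regex component whose RegexOutput is Person.
struct PersonComponent: CustomConsumingRegexComponent {
    typealias RegexOutput = Person

    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: Person)? {
        // The inner regex still produces a tuple, but it never leaves this function.
        let inner = Regex {
            Capture { OneOrMore(.word) }
            " ("
            TryCapture { OneOrMore(.digit) } transform: { Int($0) }
            ") "
            Capture(.iso8601Date(timeZone: .gmt))
        }
        // Match only at the requested position, within the allowed bounds.
        guard let match = input[index..<bounds.upperBound].prefixMatch(of: inner) else {
            return nil
        }
        let (_, name, age, dob) = match.output
        return (match.range.upperBound, Person(name: String(name), age: age, dob: dob))
    }
}

let sampleLine = "John (30) 1992-09-22"
if let person = sampleLine.wholeMatch(of: PersonComponent())?.output {
    print(person.name, person.age)   // person is a Person, not a tuple
}
```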

How can I create a Regex that outputs my own type?

There is 1 best solution below

From what I have read and seen in samples (e.g. swift-regex), it might be a good idea to create a custom regex component similar to .word or .digit. Nesting captures, however, does not seem to work easily.

Here is an example run in the playground to create a Person struct instance:

import Foundation
import RegexBuilder

func regexBuilderMatching(string: String = "John (30) 1992-09-22") {

    struct Person: CustomStringConvertible {
        let name: String
        let age: Int
        let dob: Date

        public func dobToFormatterString() -> String {
            let dateFormatter = DateFormatter()
            // 1992-09-22 04:00:00 +0000
            dateFormatter.dateFormat = "yyyy-MM-dd"
            return dateFormatter.string(from: self.dob)
        }
        
        var description: String {
            return "\(name), age: \(age), has dob: \(dobToFormatterString())"
        }
    }

    func dateFromString(dateString: String) -> Date? {
        let formatter = DateFormatter()
        // Fixed-format parsing: pin the locale so user settings cannot change the result.
        formatter.locale = Locale(identifier: "en_US_POSIX")
        formatter.dateFormat = "yyyy-MM-dd" // 1992-09-22
        return formatter.date(from: dateString)
    }

    let regexWithBasicCapture = Regex {
        /* 1. */ Capture { OneOrMore(.word) }
        /* 2. */ " ("
        /* 3. */ TryCapture { OneOrMore(.digit) }
                    transform: { match in
                        Int(match)
                    }
        /* 4. */ ") "
        /* 5. */ TryCapture { OneOrMore(.iso8601Date(timeZone: .gmt)) }
                    transform: { match in
                        dateFromString(dateString: String(match))
                    }
    }

    let matches = string.matches(of: regexWithBasicCapture)
    for match in matches {
        // shorthand syntax using match output
        // https://developer.apple.com/documentation/swift/regex/match
        let (_, name, age, date) = match.output
        let person = Person(name: String(name), age: age, dob: date)
        print(person)
    }
}

The above code will output:

John, age: 30, has dob: 1992-09-22
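
If the main goal is to keep the tuple away from call sites, the destructuring can be confined to one small function. The following is only a sketch under a few assumptions: parsePerson is a hypothetical helper name, and it uses Capture(.iso8601Date(timeZone: .gmt)) from the question instead of the DateFormatter helper above. It does not produce a Regex<Person>; it only hides the tuple behind a function that returns Person?.

```swift
import Foundation
import RegexBuilder

struct Person {
    let name: String
    let age: Int
    let dob: Date
}

// Hypothetical helper: parses one line such as "John (30) 1992-09-22".
func parsePerson(_ line: String) -> Person? {
    let regex = Regex {
        Capture { OneOrMore(.word) }
        " ("
        TryCapture { OneOrMore(.digit) } transform: { Int($0) }
        ") "
        Capture(.iso8601Date(timeZone: .gmt))
    }
    guard let match = line.wholeMatch(of: regex) else { return nil }
    // The tuple is unpacked here and nowhere else.
    let (_, name, age, dob) = match.output
    return Person(name: String(name), age: age, dob: dob)
}

let people = ["John (30) 1992-09-22", "not a person"].compactMap(parsePerson)
print(people.map(\.name))
```

Callers then work with Person values only, and lines that do not match are dropped by compactMap.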