I'm trying to parse iCalendar (RFC2445) input using a regex.
Here's a [simplified] example of what the input looks like:
BEGIN:VEVENT
abc:123
def:456
END:VEVENT
BEGIN:VEVENT
ghi:789
END:VEVENT
I'd like to get an array of matches: the "outer" match is each VEVENT block and the inner matches are each of the field:value pairs.
I've tried variants of this:
BEGIN:VEVENT\n((?<field>(?<name>\S+):\s*(?<value>\S+)\n)+?)END:VEVENT
But given the input above, the result seems to have only ONE field for each matching VEVENT, despite the +? on the capture group:
**Match 1**
field def:456
name def
value 456
**Match 2**
field ghi:789
name ghi
value 789
In the first match, I would have expected TWO fields: the abc:123 and the def:456 matches...
I'm sure this is a newbie mistake (since I seem to perpetually be a newbie when it comes to regexes...), but maybe you can point me in the right direction?
Thanks!
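In most regex engines (Ruby's included), a repeated capture group keeps only its last iteration, which is why you get a single field per VEVENT. You need to split your regex up into one matching a `VEVENT` and one matching the name/value pairs. You can then use nested `scan` calls to find all occurrences: the outer `scan` over your input `str` yields each VEVENT block, and the inner `scan` over each block yields its name/value pairs.

If you want to make the code more readable, I suggest you `require 'English'` and replace `$~` with `$LAST_MATCH_INFO`.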
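For illustration, here is a minimal sketch of the nested `scan` idea in Ruby. It is not the original snippet: the names `event_re`, `field_re`, the `body` group, and the hash layout are assumptions made for this example, and it uses block parameters rather than `$~`.

```ruby
# Sample input from the question; `str` is the input string.
str = <<~ICS
  BEGIN:VEVENT
  abc:123
  def:456
  END:VEVENT
  BEGIN:VEVENT
  ghi:789
  END:VEVENT
ICS

# Outer pattern (hypothetical name): one VEVENT block, body captured lazily.
# /m lets `.` cross newlines; ^ and $ match at line boundaries in Ruby.
event_re = /^BEGIN:VEVENT\n(?<body>.*?)^END:VEVENT$/m

# Inner pattern (hypothetical name): a single name:value line.
field_re = /^(?<name>[^:\s]+):\s*(?<value>\S+)$/

# Outer scan yields each block's body; inner scan yields each field in it.
events = str.scan(event_re).map do |(body)|
  body.scan(field_re).map { |name, value| { name: name, value: value } }
end

events.each_with_index do |fields, i|
  puts "Match #{i + 1}"
  fields.each { |f| puts "  #{f[:name]} = #{f[:value]}" }
end
```

With the sample input, the first element of `events` holds both `abc` and `def`, and the second holds `ghi`. Real iCalendar data also has folded lines and values containing spaces, which this sketch does not handle.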