I'm trying to build a regular expression in Rubular that matches either of these lines:
On Feb 23, 2011, at 10:22 , James Bond wrote:
OR
On Feb 23, 2011, at 10:22 AM , James Bond wrote:
Here's what I have so far, but for some reason it doesn't match. Any ideas?
(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)
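A quick Ruby check of the pattern above shows that it matches neither sample line. The constant names and the third, made-up line are hypothetical. The fragment `(?:AM|PM),` requires AM or PM followed immediately by a comma, but the first sample has no AM/PM at all and both samples have a space before the comma.

```ruby
# The question's pattern, copied unchanged.
ORIGINAL = /(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)/

CASES = {
  "sample 1 (no AM/PM)"          => "On Feb 23, 2011, at 10:22 , James Bond wrote:",
  "sample 2 (AM, space + comma)" => "On Feb 23, 2011, at 10:22 AM , James Bond wrote:",
  # Hypothetical line, not from the question: AM followed directly by a comma.
  "made-up (AM, comma)"          => "On Feb 23, 2011, at 10:22 AM, James Bond wrote:"
}.freeze

# Print whether the unchanged pattern matches each line.
CASES.each do |label, line|
  puts "#{label}: #{line.match?(ORIGINAL)}"
end
```

Only the made-up line matches, which points at two problems: the required AM/PM and the space before the comma.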
How can I make the AM/PM text optional, so the pattern matches whether or not AM/PM is present?
This seems to catch the date info. I purposely captured the parts in groups to make it easier to build a real date:
I purposely didn't try to match specific month names. Your sample strings look like quote headers from email messages. Those are very standard and generated by software, so you should see a lot of consistency in the format, which allows some simplification in the regex. If you can't trust that, then match on the month name abbreviations to help avoid false-positive matches. The same applies to the day, year, and time values.
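A hedged Ruby sketch of that trade-off: a loose pattern that accepts any three word characters where the month should be, next to a strict one that accepts only English month abbreviations. Both patterns and the bogus line are hypothetical, made up for illustration.

```ruby
# Loose (hypothetical): any three word characters stand in for the month,
# trusting the software-generated header format to be consistent.
LOOSE = /On \w{3} \d{1,2}, \d{4}, at \d{1,2}:\d{2}(?: [AP]M)?\s*,\s*.+ wrote:/

# Strict (hypothetical): only the twelve English month abbreviations are accepted.
STRICT = /On (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4}, at \d{1,2}:\d{2}(?: [AP]M)?\s*,\s*.+ wrote:/

GENUINE = "On Feb 23, 2011, at 10:22 AM , James Bond wrote:"
# Made-up line where the "month" is not a month at all.
BOGUS = "On Foo 23, 2011, at 10:22 , James Bond wrote:"

{ "genuine" => GENUINE, "bogus" => BOGUS }.each do |label, line|
  puts "#{label}: loose=#{line.match?(LOOSE)} strict=#{line.match?(STRICT)}"
end
```

Both patterns accept the genuine header; only the strict one rejects the bogus month, at the cost of a longer regex. The day, year, and time fields could be tightened the same way.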
The important thing in the regex is how to deal with the AM/PM when it's missing.
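A minimal Ruby sketch of handling the missing AM/PM, under stated assumptions. FIXED is the question's pattern with ` (?:AM|PM)` wrapped in an optional group and `\s*` allowed before the comma. HEADER, `header_time`, and the rule that a time without AM/PM is read as 24-hour time are hypothetical; the date is built with `Time.strptime` from Ruby's standard `time` library.

```ruby
require "time"

SAMPLES = [
  "On Feb 23, 2011, at 10:22 , James Bond wrote:",
  "On Feb 23, 2011, at 10:22 AM , James Bond wrote:"
].freeze

# The question's pattern with two changes:
#   (?: (?:AM|PM))?  makes the space plus AM/PM optional as one unit
#   \s*,             allows the space both samples have before the comma
FIXED = /(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2}(?: (?:AM|PM))?\s*,.*wrote:)/

# Hypothetical grouped variant: one named group per date part;
# ampm is nil when AM/PM is absent.
HEADER = /On (?<mon>[A-Z][a-z]{2}) (?<day>\d{1,2}), (?<year>\d{4}), at (?<hour>\d{1,2}):(?<min>\d{2})(?: (?<ampm>[AP]M))?\s*,\s*.+ wrote:/

# Hypothetical helper: a Time for a quote header, or nil when there is no match.
def header_time(line)
  m = HEADER.match(line)
  return nil unless m

  stamp = "#{m[:mon]} #{m[:day]} #{m[:year]} #{m[:hour]}:#{m[:min]}"
  if m[:ampm]
    # 12-hour clock: %I reads the hour, %p reads AM/PM.
    Time.strptime("#{stamp} #{m[:ampm]}", "%b %d %Y %I:%M %p")
  else
    # Assumption: with no AM/PM, the hour is read as 24-hour time.
    Time.strptime(stamp, "%b %d %Y %H:%M")
  end
end

SAMPLES.each do |line|
  puts "matches=#{line.match?(FIXED)} time=#{header_time(line)}"
end
```

Grouping the leading space together with AM/PM in one optional group means neither is required when the meridian is missing. Branching on whether `ampm` captured anything then picks the 12-hour (`%I %p`) or 24-hour (`%H`) format.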