I am using regular expressions to parse a file for some patterns. However, if there is extra whitespace in the middle of my data, I end up with the wrong results. My data has this format:
067 000100 A
067 000200 B
067 000300 C
067 000400 D
067 000500 E
067 000600 F
I am trying to get the first two strings, the middle two digits of the second string, and the value, like this (in some cases the second string has 7 digits, so here it is fine for the regex to consume one extra digit at the end):
('67 000100 ', '01', 'A')
I am using the following regular expression:
import re

qnum = r'067'
subq = r' .00' #using . because I am not sure if there's one space or two!
fmt = r'(?sm)^(' + qnum + subq + r'(..)...)\s*(.*?)\s*$'
# data is a string containing all of the lines above, separated by \n
result = re.findall(fmt,data, re.I)
but I end up with the following:
('67 000100 ', '01', 'A')
('67 000200 ', '02', 'B')
('67 000300 ', '30', 'C')
How can I get a proper header with only one space in the middle, and also the correct middle digits?
`.` doesn't mean an optional character; it means exactly one character (any character). So instead of a space followed by `.`, you want `\s+`, which matches one or more whitespace characters.
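A minimal sketch of that fix follows. It assumes each record sits on its own line and that the second field is all digits starting with "00"; the sample `data` string is hypothetical, with two spaces on some lines and one on others. Because a capture group always returns the text exactly as it appears in the input, the header is captured in pieces and rejoined with a single space rather than taken as one group.

```python
import re

# Hypothetical sample: some rows use two spaces between the fields, some use one.
data = (
    "067  000100 A\n"
    "067  000200 B\n"
    "067 000300 C\n"
    "067 000400 D\n"
)

qnum = r'067'

# The original idea: ' .00' assumes the '.' soaks up an optional second space.
old_fmt = r'(?sm)^(' + qnum + r' .00(..)...)\s*(.*?)\s*$'
# With only one space, '.' consumes the first '0' of the number instead,
# so the (..) group is shifted one place to the right.
print(re.findall(old_fmt, "067 000300 C")[0][1])  # 30 (should be 03)

# Fixed: \s+ absorbs any run of whitespace between the fields.
# Group 1: the qnum, group 2: the whole number, group 3: its middle two digits,
# group 4: the value. \d{2,3} allows the 6- or 7-digit variants
# (assumption: the number is always "00" + 2 digits + 2 or 3 more digits).
fmt = r'(?m)^(' + qnum + r')\s+(00(\d\d)\d{2,3})\s+(.*?)\s*$'

# Rebuild the header with exactly one space, whatever the input used.
results = [
    (f"{q} {num}", mid, value)
    for q, num, mid, value in re.findall(fmt, data)
]

for row in results:
    print(row)
# ('067 000100', '01', 'A')
# ('067 000200', '02', 'B')
# ('067 000300', '03', 'C')
# ('067 000400', '04', 'D')
```

One caveat: `\s` also matches newlines, so on a line with a missing value the last `\s+` could run on into the next line. If such lines can occur, `[ \t]+` is the safer separator.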