I'm trying to split a string whose fields are delimited by multiple spaces, e.g.:
import re
string1 = "abcd  efgh  a. abcd  b efgh"
print(re.findall(r"[\w.]+", string1))
As expected, the result is:
['abcd', 'efgh', 'a.', 'abcd', 'b', 'efgh']
However, I would like 'a.' and 'abcd' to end up in the same group, and likewise 'b' and 'efgh', since the words in each pair are separated by only a single space. So the result I want would look something like:
['abcd', 'efgh', 'a. abcd', 'b efgh']
My current approach is to write two kinds of expression: the first for tokens with no internal space, i.e. 'abcd' and 'efgh', and the second for tokens joined by a single space, i.e. 'a.' + 'abcd'.
So r'[\w.]+' can handle the first kind and r'[\w.]+ [\w.]+' can handle the second kind, but I don't know how to combine them into a single expression using '|'.
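A minimal sketch of combining the two patterns with '|', assuming groups are separated by two or more spaces and words within a group by exactly one. The names pair_or_single and run_of_tokens are only illustrative. Python's re tries alternatives left to right and keeps the first one that matches, so the two-word pattern has to come first; with the order reversed, [\w.]+ would match 'a.' on its own. Note also that [\w.] (not just [\w]) is needed for 'a.' to match at all.

```python
import re

# Assumption: groups are separated by 2+ spaces, words in a group by 1 space.
string1 = "abcd  efgh  a. abcd  b efgh"

# Two-word alternative first, otherwise the single-token branch always wins.
pair_or_single = r"[\w.]+ [\w.]+|[\w.]+"
print(re.findall(pair_or_single, string1))
# ['abcd', 'efgh', 'a. abcd', 'b efgh']

# More general: one token followed by zero or more " token" repetitions.
# (?:...) is non-capturing, so findall returns the whole match, not a group.
run_of_tokens = r"[\w.]+(?: [\w.]+)*"
print(re.findall(run_of_tokens, string1))
# ['abcd', 'efgh', 'a. abcd', 'b efgh']
```

The second pattern also handles groups of three or more single-spaced words, which the two-branch alternation would split.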
As always, other approaches are welcome, e.g. splitting on two spaces and removing extraneous spaces from the result (using strip). Thanks for your time!
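A minimal sketch of the split-based alternative, under the same assumption about spacing (2+ spaces between groups, 1 space inside a group). It shows plain str.split with strip(), and re.split with a " {2,}" separator, which treats any run of two or more spaces as a single delimiter.

```python
import re

string1 = "abcd  efgh  a. abcd  b efgh"

# Option 1: split on exactly two spaces, then strip each piece and drop
# empties (a run of 3+ spaces would leave a stray space or "" behind).
parts = [p.strip() for p in string1.split("  ") if p.strip()]
print(parts)
# ['abcd', 'efgh', 'a. abcd', 'b efgh']

# Option 2: any run of 2+ spaces is one separator; strip() the input first
# so leading/trailing spaces don't produce empty strings at the ends.
parts_re = re.split(r" {2,}", string1.strip())
print(parts_re)
# ['abcd', 'efgh', 'a. abcd', 'b efgh']
```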