Python: parse a string with a regex to build a dictionary


I need to parse the following string in Python into a dictionary:

2014:02:02-12:24:17 NAMETEST ulogd[4834]: id="xxxx" severity="xxxx" sys="xxxx" sub="xxxx" name="xxxx aaaa" action="xxxx" fwrule="xxxx" outitf="xxxx" srcmac="xxxx" srcip="xxxx" dstip="xxxx" proto="x" length="xxxx" tos="xxxx" prec="xxxx" ttl="xx" srcport="xxxx" dstport="xxxx" tcpflags="xxxx"

I can't use split(' '), because some fields can contain a space, for example name="xxxx aaaa".
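A quick illustration of the problem, using a shortened, hypothetical sample that keeps only three of the fields (the value 1111 is made up for the example):

```python
# Shortened, hypothetical sample that keeps the field whose value has a space.
sample = 'id="1111" name="xxxx aaaa" action="xxxx"'

# Splitting on single spaces cuts the quoted value of name into two pieces.
parts = sample.split(' ')
print(parts)  # ['id="1111"', 'name="xxxx', 'aaaa"', 'action="xxxx"']
```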

First, I extracted only the values with the following regex:

re.findall('"([^"]*)"', line)

But now I need the data as a dictionary, so that, for example, line['id'] returns 1111.

What regex should I use? Any ideas?




You can use re.findall() to find the key-value pairs and build the dictionary from them:

>>> import re
>>> groups = re.findall(r'(\w+)="(.*?)"', s)
>>> line = dict(groups)
>>> from pprint import pprint
>>> pprint(line)
{'action': 'xxxx',
 'dstip': 'xxxx',
 'dstport': 'xxxx',
 'fwrule': 'xxxx',
 'id': 'xxxx',
 'length': 'xxxx',
 'name': 'xxxx aaaa',
 'outitf': 'xxxx',
 'prec': 'xxxx',
 'proto': 'x',
 'severity': 'xxxx',
 'srcip': 'xxxx',
 'srcmac': 'xxxx',
 'srcport': 'xxxx',
 'sub': 'xxxx',
 'sys': 'xxxx',
 'tcpflags': 'xxxx',
 'tos': 'xxxx',
 'ttl': 'xx'}
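Put together as a self-contained script, the same approach might look like the minimal sketch below. The helper name parse_log_line and the compiled PAIR_RE are illustrative choices, not part of the original answer; the sample line is the one from the question.

```python
import re

# Matches key="value"; the value may contain spaces but stops at the first closing quote.
PAIR_RE = re.compile(r'(\w+)="(.*?)"')


def parse_log_line(text):
    """Return a dict of the key="value" fields in one log line (hypothetical helper)."""
    # findall returns (key, value) tuples because the pattern has two groups,
    # and dict() accepts an iterable of 2-tuples directly.
    return dict(PAIR_RE.findall(text))


# The log line from the question (values are placeholders).
s = ('2014:02:02-12:24:17 NAMETEST ulogd[4834]: id="xxxx" severity="xxxx" '
     'sys="xxxx" sub="xxxx" name="xxxx aaaa" action="xxxx" fwrule="xxxx" '
     'outitf="xxxx" srcmac="xxxx" srcip="xxxx" dstip="xxxx" proto="x" '
     'length="xxxx" tos="xxxx" prec="xxxx" ttl="xx" srcport="xxxx" '
     'dstport="xxxx" tcpflags="xxxx"')

line = parse_log_line(s)
print(line['name'])  # xxxx aaaa
print(line['ttl'])   # xx
print(len(line))     # 19
```

The header before the first key="value" pair (timestamp, host, process) contains no =" sequence, so it is simply ignored. Because dict() keeps the last value for a repeated key, a line that contained the same field twice would keep only its final occurrence.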

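(\w+)="(.*?)" matches one or more word characters (letters, digits or underscore; the \w+ part), followed by =", followed by any characters other than a newline, as few as possible (.*?, non-greedy), followed by the closing ". The parentheses define two capturing groups, so re.findall() returns a list of (key, value) tuples, which dict() turns directly into a dictionary.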
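To see why the non-greedy .*? matters, compare it with a greedy .* on a short, hypothetical two-field sample. The negated character class [^"]* from the question's own regex is included as well, since it also stops at the first closing quote:

```python
import re

# Hypothetical two-field sample.
sample = 'id="1111" name="xxxx aaaa"'

# Non-greedy: .*? stops at the first closing quote.
lazy = re.findall(r'(\w+)="(.*?)"', sample)
# Greedy: .* runs to the last quote in the string and swallows the next field.
greedy = re.findall(r'(\w+)="(.*)"', sample)
# Negated class: [^"]* cannot cross a quote, so it behaves like the lazy version here.
negated = re.findall(r'(\w+)="([^"]*)"', sample)

print(lazy)     # [('id', '1111'), ('name', 'xxxx aaaa')]
print(greedy)   # [('id', '1111" name="xxxx aaaa')]
print(negated)  # [('id', '1111'), ('name', 'xxxx aaaa')]
```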