I have a text file that looks like this (but is 132k lines long):
********
name : one
Place : city
Initial: none
********
name : two
Place : city2
Initial: none
********
name : three
Place : city3
Initial: none
Limits : some
I'm trying to move it into a more friendly format (Excel/database records). Each 'record' is separated by the ********. The fields are the same for 90% of the records, but some have additional fields, like the Limits in the 3rd record.
I would like a csv, or similar output like:
name,place,initial,limit
one,city,none,n/a
two,city2,none,n/a
three,city3,none,some
Is Python better suited for parsing and manipulating this?
A Notepad++ regex replace of
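([^*\r\n])\R([^*\r\n])
with
\1,\2
will change the input example text to be:
********
name : one,Place : city,Initial: none
********
name : two,Place : city2,Initial: none
********
name : three,Place : city3,Initial: none,Limits : some
This can be followed by marking (use menu => Search => Mark..., with "Bookmark line" ticked) with a regex of
^\*\*\*\*\*\*\*\*$
and finally removing the marked lines (use menu => Search => Bookmark => Remove Bookmarked Lines). You may need to tidy up the very start and end of the text, including adding the line of column titles.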
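Since the question asks about Python, here is a minimal sketch of the same two steps using Python's re module instead of Notepad++. It is an illustration under assumptions, not a drop-in tool: the sample text is the one from the question, and because Python's re has no \R, the group (?:\r\n|\r|\n) is swapped in to stand for "any line break".

```python
import re

# Sample text copied from the question.
sample = """********
name : one
Place : city
Initial: none
********
name : two
Place : city2
Initial: none
********
name : three
Place : city3
Initial: none
Limits : some
"""

# Step 1: replace a line break with a comma when the characters on both
# sides are neither an asterisk nor a line-break character.
# (?:\r\n|\r|\n) is a stand-in for Notepad++'s \R.
joined = re.sub(r"([^*\r\n])(?:\r\n|\r|\n)([^*\r\n])", r"\1,\2", sample)

# Step 2: drop the lines that consist of exactly eight asterisks,
# the equivalent of marking them and removing the bookmarked lines.
rows = [line for line in joined.splitlines()
        if not re.fullmatch(r"\*{8}", line)]

print("\n".join(rows))
```

As with the Notepad++ steps, each output line still carries the field labels, and the line of column titles has to be added separately.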
Variations:
Whitespace at the start or end of lines may lead to unwanted changes, so it might be best to remove it before replacing line-breaks with commas. Use menu => Edit => Blank Operations => Trim Leading and Trailing Space.
The number of asterisks may be different on some lines. So perhaps change the marking regex to be
^\*\*\*\*\*\*\**$
As written this matches six or more asterisks. Adjust the number of \* to match the minimum in the source text.
The Regular Expressions
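The search regex
([^*\r\n])\R([^*\r\n])
looks for a line break (\R) that has, on each side, a character that is neither an asterisk nor a line-break character, and captures those two characters. Line breaks next to a separator line or a blank line are therefore left alone.
The replacement is
\1,\2
meaning insert the two captured characters separated by a comma.
The marking regex of
^\*\*\*\*\*\*\*\*$
means start of line ^ then eight \* meaning actual asterisks, and finally the $ means end-of-line. The variation of
^\*\*\*\*\*\*\**$
has an unescaped * just before the $, meaning zero-or-more occurrences of the last actual asterisk.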
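If the goal is the exact CSV layout from the question (bare values, with n/a where a field is missing), the regex approach leaves some tidying to do. The following is a rough Python sketch of one way to get there with the standard csv module. The function names parse_records and records_to_csv are made up for this example, and it assumes every field line has the form "label : value" and every separator line consists only of asterisks.

```python
import csv
import io


def parse_records(lines):
    """Yield one dict per record; a line of asterisks starts a new record."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if set(line) == {"*"}:
            if record:
                yield record
            record = {}
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip().lower()] = value.strip()
    if record:
        yield record  # last record has no trailing separator


def records_to_csv(lines, missing="n/a"):
    """Return CSV text; columns appear in order of first occurrence."""
    records = list(parse_records(lines))
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    # restval fills in fields that a record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            restval=missing, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


sample = """********
name : one
Place : city
Initial: none
********
name : two
Place : city2
Initial: none
********
name : three
Place : city3
Initial: none
Limits : some
"""

print(records_to_csv(sample.splitlines()))
```

For a real file, an open file object can be passed in place of sample.splitlines(). Note that the column titles come from the labels in the data, so the last column is named limits rather than limit.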