Access tab-separated rows with quoted fields containing newlines and more quotes


I have a long file whose rows end in newlines and whose fields are separated by tabs. Fields are quoted using "". A single quoted field may itself contain newlines and, as an added twist, may also contain quoted strings.

Here is an example illustrating all cases:

"FieldA"    "FieldB"    "FieldC"
"AnotherOne"    "May contain
newlines"   "FieldC"
"Here is one more row"  "FieldB"    "FieldC"
"And here is a twist"   "Some fields with newlines may contain or end with "quotes and"
continue on next line"  "FieldC"

I tried the csv module like this:

import csv
import sys

with open(sys.argv[1], newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t', quotechar='"')
    for row in reader:
        print(len(row))  # number of fields the reader found in this row

...but this gives me variable row lengths, so I cannot access a field reliably. How can I access the values in such a file reliably from Python?
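
For illustration, here is a minimal sketch of one possible approach, not a definitive solution. It skips the csv module and splits on the quote boundaries directly with regular expressions. It rests on assumptions the file may not satisfy: every field is wrapped in double quotes, a field boundary is always a closing quote, a tab, and an opening quote, and a row boundary is always a closing quote, a newline, and an opening quote. Quotes inside a field must never form either pattern. The names parse_rows and SAMPLE are hypothetical and exist only for this example.

```python
import re

# Assumed boundaries (these hold for the sample, not necessarily for every file):
#   field boundary: closing quote, TAB, opening quote
#   row boundary:   closing quote, newline, opening quote
FIELD_SEP = re.compile(r'(?<=")\t(?=")')
ROW_SEP = re.compile(r'(?<=")\r?\n(?=")')


def parse_rows(text):
    """Split text into rows of fields and strip each field's outer quotes."""
    text = text.rstrip("\r\n")  # drop the newline that ends the last row
    if not text:
        return []
    rows = []
    for raw_row in ROW_SEP.split(text):
        fields = FIELD_SEP.split(raw_row)
        # Remove only the outermost quote pair; inner quotes are kept as-is.
        rows.append([field[1:-1] for field in fields])
    return rows


# The sample from the question, with tabs and newlines written explicitly.
SAMPLE = (
    '"FieldA"\t"FieldB"\t"FieldC"\n'
    '"AnotherOne"\t"May contain\nnewlines"\t"FieldC"\n'
    '"Here is one more row"\t"FieldB"\t"FieldC"\n'
    '"And here is a twist"\t"Some fields with newlines may contain '
    'or end with "quotes and"\ncontinue on next line"\t"FieldC"\n'
)

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(len(row), row)
```

On the sample above, each of the four rows comes back with three fields, and the inner "quotes and" text stays inside the second field of the last row. For a real file, the text could be read with open(path, newline='').read() so that embedded line breaks are passed through unchanged. If a field can contain a quote directly followed by a tab, or by a newline and another quote, the format is ambiguous and this sketch will split in the wrong place.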
