I have a long file in which rows end with newlines and fields are separated by tabs. Every field is enclosed in double quotes. A quoted field may also contain newlines and, as an added twist, may itself contain quoted strings.
Here is an example illustrating all cases:
"FieldA" "FieldB" "FieldC"
"AnotherOne" "May contain
newlines" "FieldC"
"Here is one more row" "FieldB" "FieldC"
"And here is a twist" "Some fields with newlines may contain or end with "quotes and"
continue on next line" "FieldC"
I tried the csv module like this:
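import csv
import sys

# Read the tab-separated file and report how many fields each row yields.
with open(sys.argv[1], 'rU') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t', quotechar='"')
    for row in reader:
        print(len(row))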
...but this gives me variable row lengths, so I cannot access a field reliably. How can I access the values in such a file reliably from Python?
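For reference, here is a minimal sketch of one possible direction, not a definitive solution. It relies on assumptions that the example suggests but does not guarantee: every field is wrapped in double quotes, fields are separated by exactly one tab, and the three-character sequences quote-tab-quote and quote-newline-quote never occur inside a field. Under those assumptions the boundaries can be split on directly, without the csv module. The names `parse_quoted_tsv` and `sample` are hypothetical, and the tabs in `sample` are my reconstruction of the example above.

```python
import re


def parse_quoted_tsv(text):
    """Split text into rows of fields, assuming every field is quoted."""
    # Drop trailing line breaks so the text ends with the last closing quote.
    text = text.strip("\r\n")
    if not text:
        return []
    if not (text.startswith('"') and text.endswith('"')):
        raise ValueError("expected every field to be wrapped in double quotes")
    # Remove the opening quote of the first field and the closing quote of the last.
    body = text[1:-1]
    # Assumed row boundary: closing quote, newline, opening quote.
    rows = re.split(r'"\r?\n"', body)
    # Assumed field boundary: closing quote, tab, opening quote.
    return [row.split('"\t"') for row in rows]


# The example from the question, with tabs written out explicitly.
sample = (
    '"FieldA"\t"FieldB"\t"FieldC"\n'
    '"AnotherOne"\t"May contain\nnewlines"\t"FieldC"\n'
    '"Here is one more row"\t"FieldB"\t"FieldC"\n'
    '"And here is a twist"\t"Some fields with newlines may contain '
    'or end with "quotes and"\ncontinue on next line"\t"FieldC"\n'
)

rows = parse_quoted_tsv(sample)
print([len(row) for row in rows])  # [3, 3, 3, 3]
```

For a real file, the text would be read in one go, for example with `open(path, newline='').read()` on Python 3 so that line endings inside fields are preserved (`path` being a placeholder). That loads the whole file into memory, which may matter for a very long file.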