-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
I want to iterate through both the years and the fields under each year (e.g. 1, 2, 3, 4, and 5). The letters (a, b, and so on) that follow some field numbers are subfields.
The lines with dashes in my data indicate the year of the entry. Each record group starts at a ---year--- line and ends at the line before the next ---year--- line.
Also, fields is a list:
fields = ["1", "2", "3", "4", "5"]
I'm eventually trying to retrieve the values next to the fields for each entry/year. For example, if my current field is 1, which is equivalent to fields[0], I would iterate through all the years (2000, 2001, and 2002) to get the values for the field 1. The output would be
17824
3224
(Blank space for Year 2002)
How can I iterate through the years (indicated by the dashes)? I can't work out code that generates the desired output.
You can first use a regex to split your text into blocks, then use itertools.izip_longest within a nested list comprehension to get your expected columns. Each sub list represents a specific line position in each block; for example, the first sub list holds the first line of each block:
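A minimal sketch of that approach, assuming Python 3 (where izip_longest is named itertools.zip_longest) and assuming the whole file has been read into a string called text. Note that zipping aligns lines by position, not by field number, so the first row shows field 2 for year 2002, which has no field 1. The helper field_values is a hypothetical addition, not part of the approach described above, that looks values up by field number instead:

```python
import re
from itertools import zip_longest  # called izip_longest in Python 2

# Sample data from the question, assumed to be loaded into one string.
text = """\
-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
"""

# Split on the dashed header lines. The capture group keeps each year,
# so the result alternates: '', year, block, year, block, ...
parts = re.split(r"^-+(\d{4})-+[ \t]*\n", text, flags=re.M)
years = parts[1::2]
blocks = [block.splitlines() for block in parts[2::2]]

# Positional columns: rows[i] holds the i-th line of every block,
# padded with '' where a block has fewer lines.
rows = list(zip_longest(*blocks, fillvalue=""))


def field_values(field):
    """Hypothetical helper: map each year to the values of one field."""
    result = {}
    for year, lines in zip(years, blocks):
        result[year] = [
            rest
            for tag, _, rest in (line.partition(" ") for line in lines)
            if tag == field
        ]
    return result


if __name__ == "__main__":
    print(rows[0])
    for year, values in field_values("1").items():
        # An empty list prints as a blank value for that year.
        print(year, " ".join(values))
```

If the goal is one value per year for a given field, the field-keyed lookup is probably the closer fit, since a year with a missing field yields an empty list rather than a line from a different field.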