-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
I want to iterate through both the years and the fields under each year (e.g. 1, 2, 3, 4, and 5). The letters a, b, and so on that follow some field numbers are subfields.
The lines with dashes in my data indicate the year of the entry. Each record group starts at a ---year--- line and ends at the line before the next ---year--- line.
Also, fields is a list: fields = ["1", "2", "3", "4", "5"].
I'm eventually trying to retrieve the values next to the fields for each entry/year. For example, if my current field is 1, which is equivalent to fields[0], I would iterate through all the years (2000, 2001, and 2002) to get the values for field 1. The output would be:
17824
3224
(Blank space for Year 2002)
How can I iterate through the years (indicated by the dashes)? I can't work out code that generates the desired output.
This is a fairly involved answer that relies on a helper function, but I think you'll find it flexible. It uses an itertools-style helper function that I wrote called groupby. The groupby function accepts a key function that specifies which group each item belongs to. In your case the key function is a little fancy because it has to maintain state to know which year each line belongs to. The code below is fully runnable. Just copy and paste it into a script and let me know what you think.
EDIT
Turns out the groupby function is already implemented in the itertools module and I've been missing it forever. I edited the code to use the itertools version.
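Here is a minimal sketch of that approach, not necessarily the exact code from the original answer. It assumes the data is available as one string (SAMPLE below copies the question's data) and that each year appears only once. The names YearKey, parse_records, and values_for_field are made up for illustration. The key function is a small callable object that remembers the most recent ---year--- header, so itertools.groupby puts every following line in that year's group.

```python
import re
from itertools import groupby

# Sample data copied from the question.
SAMPLE = """\
-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
"""

# A header line is dashes, digits, dashes (assumed format).
YEAR_LINE = re.compile(r"^-+(\d+)-+$")


class YearKey:
    """Stateful key function: returns the year of the latest header seen."""

    def __init__(self):
        self.year = None

    def __call__(self, line):
        match = YEAR_LINE.match(line)
        if match:
            self.year = match.group(1)
        return self.year


def parse_records(text):
    """Return {year: {field: [values, ...]}} in file order."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = {}
    for year, group in groupby(lines, key=YearKey()):
        if year is None:
            continue  # lines before the first header belong to no year
        fields = {}
        for line in group:
            if YEAR_LINE.match(line):
                continue  # skip the header line itself
            # Split off the field number; subfield letters stay in the value.
            tag, _, value = line.partition(" ")
            fields.setdefault(tag, []).append(value)
        records[year] = fields
    return records


def values_for_field(records, field):
    """One entry per year; an empty string where the field is missing."""
    return [" | ".join(rec.get(field, [])) for rec in records.values()]


if __name__ == "__main__":
    records = parse_records(SAMPLE)
    for value in values_for_field(records, "1"):
        print(value)
```

Run as a script, this prints 17824, 3224, and then an empty line for 2002. To walk every field, loop over the fields list and call values_for_field(records, field) for each one. Repeated fields such as 4 are joined with " | " here, which is only one possible choice.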