-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
I want to iterate over both the years and the fields under each year (1, 2, 3, 4, and 5). The letters that follow some field numbers (a, b, and so on) are subfields.
The dashed lines in my data indicate the year of each entry. Each record group starts at a ---year--- line and ends at the line before the next ---year--- line.
Also, fields is a list:
fields=["1", "2", "3", "4", "5"]
Ultimately I want to retrieve the values next to each field for every entry/year. For example, if my current field is 1, which is fields[0], I would iterate through all the years (2000, 2001, and 2002) to collect the values for field 1. The output would be:
17824
3224
(Blank space for Year 2002)
How can I iterate through the years (indicated by the dashed lines)? I can't work out code that generates the desired output.
This is a fairly involved answer built around a helper function, but I think you'll find it flexible. It uses an itertools-style helper function that I wrote called groupby. The groupby function accepts a key function that specifies which group each item belongs to. In your case the key function is a little unusual, because it has to maintain state to know which year each line belongs to. The code below is runnable as-is: copy and paste it into a script and let me know what you think.
EDIT
It turns out the groupby function is already implemented in the itertools module and I had simply overlooked it. I have edited the code to use the itertools version.
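Here is a minimal sketch of that approach using itertools.groupby with a stateful key function. The names SAMPLE, YEAR_LINE, make_year_key, parse_records and values_for_field are made up for this illustration, and the sketch assumes every year header is a run of dashes around a four-digit year and that the field number is the first space-separated token on each data line.

```python
import itertools
import re

# Sample data copied from the question.
SAMPLE = """\
-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
"""

# Assumption: a header is dashes, a four-digit year, then dashes.
YEAR_LINE = re.compile(r"^-+(\d{4})-+$")


def make_year_key():
    """Build a key function that remembers the most recent year header."""
    state = {"year": None}

    def key(line):
        match = YEAR_LINE.match(line)
        if match:
            state["year"] = match.group(1)
        return state["year"]

    return key


def parse_records(lines):
    """Return {year: {field: [values]}}, keeping years in file order."""
    records = {}
    cleaned = (line.strip() for line in lines)
    non_empty = (line for line in cleaned if line)
    for year, group in itertools.groupby(non_empty, key=make_year_key()):
        if year is None:
            continue  # lines that appear before the first year header
        record = records.setdefault(year, {})
        for line in group:
            if YEAR_LINE.match(line):
                continue  # the header itself carries no field data
            field, _, value = line.partition(" ")
            record.setdefault(field, []).append(value)
    return records


def values_for_field(records, field):
    """One entry per year; an empty string when the year lacks the field."""
    return [" | ".join(record.get(field, [])) for record in records.values()]


records = parse_records(SAMPLE.splitlines())
fields = ["1", "2", "3", "4", "5"]
for field in fields:
    print("field", field)
    for year, value in zip(records, values_for_field(records, field)):
        print(" ", year, value)
```

For field 1 this yields 17824 for 2000, 3224 for 2001, and an empty string for 2002. The subfield letter is kept as part of the value (for example "a WA 750"), and repeated fields within one year are joined with " | "; both are choices made for this sketch, so split the value further if you need the subfields separately.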