I am reading a file with Python. The file is divided into sections, each of which starts with a header line beginning with the '#' character:
```
#HEADER1, SOME EXTRA INFO
data first section
1 2
1 233
...
// THIS IS A COMMENT
#HEADER2, SECOND SECTION
452
134
// ANOTHER COMMENT
...
#HEADER3, THIRD SECTION
```
I wrote the following code to read the file:
```python
with open(filename) as fh:
    enumerated = enumerate(iter(fh.readline, ''), start=1)
    for lino, line in enumerated:
        # handle special section
        if line.startswith('#'):
            print("="*40)
            print(line)
            while True:
                start = fh.tell()
                lino, line = next(enumerated)
                if line.startswith('#'):
                    fh.seek(start)
                    break
                print("[{}] {}".format(lino, line))
```
The output is:
```
========================================
#HEADER1, SOME EXTRA INFO
[2] data first section
[3] 1 2
[4] 1 233
[5] ...
[6] // THIS IS A COMMENT
========================================
#HEADER2, SECOND SECTION
[9] 452
[10] 134
[11] // ANOTHER COMMENT
[12] ...
========================================
#HEADER3, THIRD SECTION
```
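As you can see, the line counter `lino` is no longer valid because I'm using `seek()`: each section header is read twice, so the numbers drift by one more with every section (`452` is reported as line 9, but it is actually line 8). Decrementing `lino` before breaking out of the loop won't help either, because `enumerate()` keeps its own counter and increments it on every call to `next()`. Is there an elegant way to solve this in Python 3.x? Also, is there a better way to deal with the `StopIteration` raised at the end of the file than catching it in an `except` block that only contains `pass`?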
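Here is the same drift in isolation, with `io.StringIO` standing in for the real file (an assumption made only to keep the snippet self-contained). Rewinding the file does not rewind `enumerate()`:

```python
import io

fh = io.StringIO("line one\nline two\nline three\n")
numbered = enumerate(iter(fh.readline, ''), start=1)

first = next(numbered)    # (1, 'line one\n')
start = fh.tell()         # remember where 'line two' begins
second = next(numbered)   # (2, 'line two\n')
fh.seek(start)            # rewind the file...
again = next(numbered)    # ...but the counter keeps going: (3, 'line two\n')
print(first, second, again)
```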
UPDATE
So far I have adopted an implementation based on @Dunes's suggestion, modified slightly so that I can peek ahead to see whether a new section is starting. I don't know whether there is a better way to do this, so comments are welcome:
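```python
class EnumeratedFile:
    """Iterate over (line number, line) pairs, with mark/recall and a peek."""

    def __init__(self, fh, lineno_start=1):
        self.fh = fh
        self.lineno = lineno_start

    def __iter__(self):
        return self

    def __next__(self):
        # readline() instead of iterating fh, so tell()/seek() stay usable
        result = self.lineno, self.fh.readline()
        if result[1] == '':
            raise StopIteration
        self.lineno += 1
        return result

    def mark(self):
        # remember both the line counter and the file position
        self.marked_lineno = self.lineno
        self.marked_file_position = self.fh.tell()

    def recall(self):
        # rewind to the last mark; the counter is restored together with it
        self.lineno = self.marked_lineno
        self.fh.seek(self.marked_file_position)

    def section(self):
        # peek at the next character without consuming it; True while still
        # inside the current section (no new '#' header and not at end of file)
        pos = self.fh.tell()
        char = self.fh.read(1)
        self.fh.seek(pos)
        return char not in ('#', '')
```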
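For comparison, a peek can also be done without `tell()`/`seek()` by holding one line of lookahead in memory. This is a sketch under the assumption that one line of lookahead is all that is needed; the class `PeekableLines` and its `peek()` method are hypothetical, not part of the standard library:

```python
class PeekableLines:
    """Hypothetical wrapper yielding (lineno, line) with one line of lookahead.

    The lookahead pair is kept in memory, so no tell()/seek() calls are made
    and the line counter never has to be rewound.
    """

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, lines, lineno_start=1):
        self._numbered = enumerate(lines, start=lineno_start)
        self._buffered = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # hand out the buffered pair first, if a peek() stored one
        if self._buffered is not self._EMPTY:
            item, self._buffered = self._buffered, self._EMPTY
            return item
        return next(self._numbered)  # raises StopIteration at end of input

    def peek(self, default=None):
        """Return the next (lineno, line) pair without consuming it."""
        if self._buffered is self._EMPTY:
            self._buffered = next(self._numbered, self._EMPTY)
        return default if self._buffered is self._EMPTY else self._buffered

    def section(self):
        """True while a next line exists and it is not a '#' header."""
        nxt = self.peek()
        return nxt is not None and not nxt[1].startswith('#')


p = PeekableLines(['#A\n', 'x\n', 'y\n', '#B\n'])
header = next(p)
body = []
while p.section():
    body.append(next(p))
print(header, body)
```

Because it only needs an iterable of lines, it also works with `for line in fh`-style iteration and with non-seekable inputs, and it can stand in for `EnumeratedFile` in a loop such as `while e.section(): lineno, line = next(e)`.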
And then the file is read and each section is processed as follows:
```python
with open(filename) as fh:
    # create enumerated object
    e = EnumeratedFile(fh)
    header = ""
    for lineno, line in e:
        print("[{}] {}".format(lineno, line))
        header = line.rstrip()
        # HEADER1
        if header.startswith("#HEADER1"):
            # process header 1 lines
            while e.section():
                # get node line
                lineno, line = next(e)
                # do whatever needs to be done with the line
        elif header.startswith("#HEADER2"):
            ...  # etc.
```
You cannot alter the counter of the `enumerate()` iterable, no. You don't need to at all here, nor do you need to seek. Instead, use a nested loop and buffer the section header:
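A minimal sketch of that approach follows. The function name `print_sections` and its `out` parameter are hypothetical; the output format mirrors the question's, and the only assumption about the input is that it is an iterable of lines, such as an open file:

```python
import io


def print_sections(lines, out=None):
    """Print each '#' section with correct line numbers, without seek().

    Hypothetical helper: one enumerate() iterator is shared by an outer and
    an inner loop, and only the current header line is buffered.
    print(file=None) writes to sys.stdout.
    """
    numbered = enumerate(lines, start=1)
    header = None
    for lineno, line in numbered:
        if line.startswith('#'):
            # buffer the header; its section starts with the next line
            header = line.rstrip('\n')
            continue
        if header is None:
            continue  # ignore anything before the first header
        # first line of a section: print the buffered header, then the line
        print('=' * 40, file=out)
        print(header, file=out)
        print('[{}] {}'.format(lineno, line.rstrip('\n')), file=out)
        # the inner loop advances the same iterator, so numbering stays in step
        for lineno, line in numbered:
            if line.startswith('#'):
                header = line.rstrip('\n')  # store new header, end this section
                break
            print('[{}] {}'.format(lineno, line.rstrip('\n')), file=out)


sample = io.StringIO(
    '#HEADER1, SOME EXTRA INFO\n'
    'data first section\n'
    '1 2\n'
    '1 233\n'
    '...\n'
    '// THIS IS A COMMENT\n'
    '#HEADER2, SECOND SECTION\n'
    '452\n'
    '134\n'
    '// ANOTHER COMMENT\n'
    '...\n'
    '#HEADER3, THIRD SECTION\n'
)
print_sections(sample)
```

On the question's sample this numbers `452` as line 8, and `#HEADER3` produces no output because its section has no lines. With a real file, call `print_sections(fh)` inside `with open(filename) as fh:`.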
This buffers the header line only; every time we come across a new header, it is stored and the current section loop is ended.
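Neither `seek()` nor a `try`/`except StopIteration` is needed because both loops pull from the same `enumerate()` object: anything the inner loop consumes is never seen by the outer loop, and once the iterator is exhausted both `for` loops simply end. The small illustration below, with a plain list standing in for the file, also shows why the header has to be stored: the line that ends the inner loop has already been consumed. For a single manual step, the built-in `next()` accepts a default that is returned instead of raising `StopIteration`.

```python
# A plain list stands in for the file's lines.
numbered = enumerate(['#A', 'a1', 'a2', '#B', 'b1'], start=1)
trace = []

for lineno, item in numbered:
    trace.append(('outer', lineno, item))
    # the inner loop advances the same iterator as the outer loop
    for lineno, item in numbered:
        if item.startswith('#'):
            # '#B' (line 4) is consumed here and not stored, so the outer
            # loop never sees it; this is why the header must be buffered
            break
        trace.append(('inner', lineno, item))

# Exhaustion ends both loops quietly; no StopIteration escapes.
print(trace)

# next() with a default instead of try/except StopIteration:
leftover = next(numbered, None)  # the iterator is already exhausted, so None
```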
Demo:
The third section remains unprocessed because there are no lines in it, but had there been, the `header` variable has already been set in anticipation.