I am looking to pull certain groups of lines from large (~870,000,000 line / ~4 GB) text files. As a small example, in a 50 line file I might want lines 3-6, 18-27, and 39-45. Using SO to start, and writing some programs to benchmark with my data, it seems that Fortran 90 has given me the best results (as compared with Python, shell commands (bash), etc.).
My current scheme is simply to open the file and use a series of loops to move the read pointer to where I need it, writing the wanted lines to an output file.
With the above small example this would look like:
character(len=1024) :: line
integer :: i

open(unit=1, file=fileName, status='old', action='read')
open(unit=2, file=outFile, status='replace', action='write')

! skip lines 1-2
do i = 1, 2
   read(1,*)
end do
! copy lines 3-6
do i = 3, 6
   read(1,'(A)') line
   write(2,'(A)') trim(line)
end do
! skip lines 7-17
do i = 7, 17
   read(1,*)
end do
! copy lines 18-27
do i = 18, 27
   read(1,'(A)') line
   write(2,'(A)') trim(line)
end do
! skip lines 28-38
do i = 28, 38
   read(1,*)
end do
! copy lines 39-45
do i = 39, 45
   read(1,'(A)') line
   write(2,'(A)') trim(line)
end do

close(1)
close(2)
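For comparison, the same skip/copy scheme generalizes to a single pass driven by a list of (start, end) ranges instead of hand-written loop pairs. Here is a minimal Python sketch of that idea; the file names and the `ranges` list are illustrative, not from the original post:

```python
def extract_ranges(in_path, out_path, ranges):
    """Copy the given 1-based, inclusive, sorted line ranges in one pass.

    This is a sketch of the same skip/copy approach as the Fortran code
    above, generalized over an arbitrary list of ranges.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        it = iter(ranges)
        start, end = next(it)
        for lineno, line in enumerate(src, start=1):
            if lineno < start:
                continue            # still skipping toward the next range
            dst.write(line)         # inside the current range: copy the line
            if lineno == end:       # range done: advance to the next, or stop
                try:
                    start, end = next(it)
                except StopIteration:
                    break           # stop reading once the last range is done

# Usage (illustrative): the 50-line example from the question.
# extract_ranges("input.txt", "output.txt", [(3, 6), (18, 27), (39, 45)])
```

Stopping after the last range matters here: with wanted lines near the front of an 870-million-line file, there is no need to read the remainder at all.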
*It should be noted that I am assuming buffered I/O when compiling, although this seems to speed things up only minimally.
I am curious whether this is the most efficient way to accomplish my task. If the above is in fact the best way to do this in Fortran 90, is there another language more suited to the task?
*Update: I made sure I was using buffered I/O, manually finding the most efficient blocksize/blockcount. That increased speed by about 7%. I should note that the files I am working with do not have a fixed record length.
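As a point of comparison for the buffering experiment, most languages also let you set the I/O buffer size explicitly rather than relying on compiler or runtime defaults. In Python, for instance, the built-in open() takes a buffering argument; the sketch below counts lines with an explicit buffer, and the 1 MiB value is purely illustrative (the best size has to be found by benchmarking, as in the update above):

```python
def count_lines(path, buf_size=1 << 20):
    """Read a file line by line with an explicit I/O buffer size.

    buf_size (1 MiB here) is an illustrative guess, not a tuned value;
    as the post notes, the best blocksize has to be found by benchmarking.
    """
    n = 0
    with open(path, "rb", buffering=buf_size) as src:
        for _ in src:
            n += 1
    return n
```

Because the records have no fixed length, there is still no way to seek directly to line N; any approach has to read (or at least scan) every byte up to the last wanted line.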
One should be able to do this in most any language, so sticking with the theme, here is something that should be close to working if you fix up the typos. (If I had a Fortran compiler on an iPad, that would make this more useful.)