I'm working on a log file reader that will parse files and display fields from each line in a nicely formatted table in a Node/Electron app.
If the files were small, I could just read them line by line, parse each line, store the extracted fields in a data structure, and let clients scroll back and forth through the whole file.
Since these files can be several gigabytes in size, I need to do something more complex.
My current thinking is to either:
- Read the whole file via the readline package.
- Keep track of line-ending offsets.
- Once the file has been read, parse the bottom (hence most recent) 50 or so lines so I can extract the relevant data and display it visually.
- If the client wants to scroll beyond those 50 lines, use the offsets to jump to earlier lines via fs.read(..); a rough sketch of the index-building pass follows this list.
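Something like this untested sketch is what I have in mind for the indexing pass. I'm scanning the raw byte stream for `\n` rather than using readline, since readline emits decoded strings without byte offsets:

```js
const fs = require('fs');

// Untested sketch: build an index of line-start byte offsets by scanning
// the raw stream for '\n' (0x0A). This should be safe for ASCII and UTF-8,
// since 0x0A never occurs inside a multi-byte UTF-8 sequence, and '\r\n'
// lines still end with 0x0A. Note: if the file ends with '\n', the final
// offset points at EOF (an empty "line").
function buildLineIndex(path) {
  return new Promise((resolve, reject) => {
    const offsets = [0]; // byte offset at which each line starts
    let pos = 0;
    const stream = fs.createReadStream(path); // no encoding: raw Buffers
    stream.on('data', (chunk) => {
      for (let i = 0; i < chunk.length; i++) {
        if (chunk[i] === 0x0a) offsets.push(pos + i + 1);
      }
      pos += chunk.length;
    });
    stream.on('end', () => resolve({ offsets, fileSize: pos }));
    stream.on('error', reject);
  });
}
```

With that index, fetching line n is a single fs.read() of `offsets[n + 1] - offsets[n]` bytes at position `offsets[n]`.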
Another method:
- Use fs.read() to go straight to the end.
- Work backwards until newline characters are found.
- If the client wants to scroll around the file, figure out line offsets on demand (rough sketch after this list).
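For this second approach, the backwards read would look something like the following untested sketch. The chunk size is arbitrary, and splitting happens at the byte level on 0x0A so the decoder never sees a multi-byte UTF-8 character cut in half:

```js
const fs = require('fs');

// Untested sketch: grab the last `maxLines` lines by walking backwards
// from EOF in fixed-size chunks.
function readLastLines(path, maxLines, chunkSize = 64 * 1024) {
  const fd = fs.openSync(path, 'r');
  try {
    let pos = fs.fstatSync(fd).size;
    let tail = Buffer.alloc(0); // bytes not yet assigned to a complete line
    const lines = [];
    let first = true;
    while (pos > 0 && lines.length < maxLines) {
      const readSize = Math.min(chunkSize, pos);
      pos -= readSize;
      const buf = Buffer.alloc(readSize);
      fs.readSync(fd, buf, 0, readSize, pos);
      tail = Buffer.concat([buf, tail]);
      if (first) { // ignore a trailing newline at EOF
        first = false;
        if (tail[tail.length - 1] === 0x0a) tail = tail.slice(0, -1);
      }
      // Peel complete lines off the end of the accumulated buffer.
      let nl;
      while (lines.length < maxLines && (nl = tail.lastIndexOf(0x0a)) !== -1) {
        lines.unshift(tail.slice(nl + 1).toString('utf8').replace(/\r$/, ''));
        tail = tail.slice(0, nl);
      }
    }
    if (pos === 0 && lines.length < maxLines && tail.length > 0) {
      lines.unshift(tail.toString('utf8').replace(/\r$/, '')); // first line of file
    }
    return lines;
  } finally {
    fs.closeSync(fd);
  }
}
```

The same chunked backwards walk could fill in offsets lazily as the user scrolls, instead of indexing the whole file up front.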
This doesn't even take into account building tail -f style functionality.
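If I do end up needing that, my rough idea is to poll for growth and read only the appended bytes. Sketch below; the 500 ms interval and the onData callback are placeholders:

```js
const fs = require('fs');

// Untested sketch of tail -f style following: remember the last size we've
// read and, on change, read only the newly appended bytes. fs.watchFile
// polls, which is less efficient than fs.watch but (as I understand it)
// behaves more consistently across platforms.
function follow(path, onData) {
  let lastSize = fs.statSync(path).size;
  fs.watchFile(path, { interval: 500 }, (curr) => {
    if (curr.size < lastSize) lastSize = 0; // file truncated or rotated
    if (curr.size > lastSize) {
      const stream = fs.createReadStream(path, {
        start: lastSize,
        end: curr.size - 1, // end is inclusive
        encoding: 'utf8',
      });
      stream.on('data', onData);
      lastSize = curr.size;
    }
  });
  return () => fs.unwatchFile(path); // call to stop following
}
```

fs.watch would avoid polling, but its platform quirks are exactly the kind of low-level detail I'd rather not own.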
I'll have to account for at least ASCII and UTF-8 encodings, as well as Windows- vs. Linux-style line endings. This is a LOT of low-level work.
Is there a library that already provides this functionality?
If I do this myself, are there any major caveats I haven't mentioned here? I haven't done low-level, random-access programming in 20 years.