I have some log files that is written by log4cpp format
--By the nature of log4cpp, this file is sorted by the datetime at the beginning of each line
Assuming the format is like
2012-09-02 17:17:36.891 This is line 1 in file 2
...
2013-08-05 14:17:35.344 This is line 607082 in file 2
2013-08-05 14:17:36.891 This is line 607083 in file 2
...
2013-09-05 14:27:36.891 This is line 934594 in file 2
Now I am writing a program to parse these files and try to quickly locate a line.
For example, if I run
./my_program -start_time "2013-08-05 14:17:36" file_2.txt
I am expecting this program can return 607083 as a result.
Also, the -start_time can be based upon other granularity like "2013-08-05 14:17:35.899" or "2013-08-15" But I am expecting the nearest result.
I can traverse this file line by line, and compare the timestamp at the beginning of each line(just use string comparison) , but it will take O(N) time. I already implemented that and found it's really slow if there are millions of lines at the beginning to skip.
I am wondering if we can use Binary Search for this. I think it is the best way to return the nearest result and only takes O(lgN) time
Yes you can. It's an ordered log by date. Why not take the first and last line which should be the most recent and last recent date.
You can make a function that converts date into seconds. in the first call go to the middle of your log and check wether your date is bigger or smaller and so on... (Binary Search)
Hope this helps and hope my explanation on how this will work is clear