I have 5,000,000 unordered strings formatted this way (Name.Name.Day-Month-Year 24hrTime):
"John.Howard.12-11-2020 13:14"
"Diane.Barry.29-07-2020 20:50"
"Joseph.Ferns.08-05-2020 08:02"
"Joseph.Ferns.02-03-2020 05:09"
"Josephine.Fernie.01-01-2020 07:20"
"Alex.Alexander.06-06-2020 10:10"
"Howard.Jennings.07-07-2020 13:17"
"Hannah.Johnson.08-08-2020 00:49"
...
What is the fastest way to find all strings with a time t between some n and m, inclusive (i.e. the fastest way to remove all strings whose time < n || m < time)?
This filtering will be done multiple times with different ranges. Time ranges must always be on the same day and the starting time is always earlier than the end time.
In Java, here's my current approach, given two time strings m and n (formatted "HH:mm") and a list of 5 million strings:
// Assumes: String n (range start) and String m (range end), both "HH:mm",
// and List<String> allStrings holding the 5 million entries.
List<String> finalSolution = new ArrayList<>();
String[] startTimeArr = n.split(":");
String[] endTimeArr = m.split(":");
int startMinutes = Integer.parseInt(startTimeArr[0]) * 60 + Integer.parseInt(startTimeArr[1]);
int endMinutes = Integer.parseInt(endTimeArr[0]) * 60 + Integer.parseInt(endTimeArr[1]);
for (String combinedString : allStrings) {
    String[] arr = combinedString.split("\\.");  // "." must be escaped: split() takes a regex
    String[] subArr = arr[2].split(" ");          // e.g. ["12-11-2020", "13:14"]
    String[] timeArr = subArr[1].split(":");      // e.g. ["13", "14"]
    int hour = Integer.parseInt(timeArr[0]);
    int minute = Integer.parseInt(timeArr[1]);
    // Compare as minutes since midnight; checking hour and minute separately
    // would wrongly reject e.g. 14:05 for the range 13:30 to 15:10.
    int minutes = hour * 60 + minute;
    if (minutes >= startMinutes && minutes <= endMinutes) {
        finalSolution.add(combinedString);
    }
}
Java is my native language, but any other language works too. Better/faster logic is what I am after.
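Since the same list is filtered many times with different ranges, one option is to parse each string once and bucket it by minute of day (1,440 buckets), so that a query only has to concatenate the buckets from n to m. Below is a minimal Java sketch of that idea, not a definitive implementation. It assumes every string ends with the time as its last five characters ("HH:mm") and that the input is well formed; the class name MinuteIndex and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: buckets strings by minute of day so each range query
// only walks the buckets between the start and end minute.
class MinuteIndex {
    private static final int MINUTES_PER_DAY = 24 * 60;
    private final List<List<String>> buckets = new ArrayList<>(MINUTES_PER_DAY);

    MinuteIndex(List<String> strings) {
        for (int i = 0; i < MINUTES_PER_DAY; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String s : strings) {
            // Assumption: the last 5 characters of every string are "HH:mm".
            buckets.get(toMinuteOfDay(s, s.length() - 5)).add(s);
        }
    }

    // Parses "HH:mm" starting at the given offset, without split() or substrings.
    static int toMinuteOfDay(String s, int offset) {
        int hour = (s.charAt(offset) - '0') * 10 + (s.charAt(offset + 1) - '0');
        int minute = (s.charAt(offset + 3) - '0') * 10 + (s.charAt(offset + 4) - '0');
        return hour * 60 + minute;
    }

    // Returns all strings whose time t satisfies start <= t <= end (both "HH:mm").
    List<String> query(String start, String end) {
        int from = toMinuteOfDay(start, 0);
        int to = toMinuteOfDay(end, 0);
        List<String> result = new ArrayList<>();
        for (int minute = from; minute <= to; minute++) {
            result.addAll(buckets.get(minute));
        }
        return result;
    }

    public static void main(String[] args) {
        MinuteIndex index = new MinuteIndex(Arrays.asList(
                "John.Howard.12-11-2020 13:14",
                "Diane.Barry.29-07-2020 20:50",
                "Howard.Jennings.07-07-2020 13:17",
                "Hannah.Johnson.08-08-2020 00:49"));
        // prints [John.Howard.12-11-2020 13:14, Howard.Jennings.07-07-2020 13:17]
        System.out.println(index.query("13:00", "13:30"));
    }
}
```

The index is built once in a single pass over the list. After that, each query costs roughly the number of matching strings plus the number of minutes in the range, instead of re-splitting all 5 million strings every time. Results come back ordered by time of day, not in the original list order.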
An example in Python using an index for every minute: