How do I compare similar file names with timestamps in the names to see which is the newest in Bash?

I am new to Bash scripting, open to constructive criticism, and want to learn. I'm working on a Bash script to automate moving backup files to an archive server. The plan is to run the script monthly to copy backups from storage server 1 to storage server 2. These backups are generated on the first Sunday of every month, and I need to copy the full backups on the following Monday.

I want to loop through my directory of backup files and find the most recently created ones that match the file extension I'm looking for. The .vbi extension is an incremental backup, and I don't care about copying these; I only want to copy the most recently created .vbk files, which are full backups. There should only ever be 2 files in the parent directory whose names match apart from the timestamp and the random 4-character section.

The last 5 characters before the file extension don't matter for my purposes (I'm not really sure what they represent), and the last 22 characters in the filename before the .vbk are the section that differs in each file. To clarify, the filename is ('server name' - 'Server IP' D yyyy-mm-dd T hhmmss _ xxxx). For files with a matching ('server name' - 'Server IP'), I want to compare the time section (D yyyy-mm-dd T hhmmss).

I have most of this figured out, but I'm struggling with this one piece. This is an example of what the directory looks like:

-rw-r--r-- 1 root root    0 Jul  1 10:20 'Webserver - 10.10.0.60D2023-07-01T003026_u153.vbk'
-rw-r--r-- 1 root root    0 Jul  8 08:32 'WebServer - 10.10.0.60D2023-07-08T002832_g842.vbk'
-rw-r--r-- 1 root root    0 Jul  8 07:23 'WebServer - 10.10.0.60D2023-07-08T023216_f264.vbi'
-rw-r--r-- 1 root root    0 Jul  1 10:10 'SQLServer - 10.10.0.4D2023-07-01T021049_8fj3.vbk'
-rw-r--r-- 1 root root    0 Jul  8 05:20 'SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk'
-rw-r--r-- 1 root root    0 Jul  8 11:04 'SQLServer - 10.10.0.4D2023-07-08T042046_9ju7.vbi'

I want to grab the files on line 2 and line 5 because they are the most recently created backups, and have the .vbk extension.

I can already get a list of just the .vbk files by running this:

for i in *.vbk; do
    # Skip anything that is not a regular file (including an unmatched glob)
    [ -f "$i" ] || continue
    echo "$i"
done

and I get this list:

'Webserver - 10.10.0.60D2023-07-01T003026_u153.vbk'
'WebServer - 10.10.0.60D2023-07-08T002832_g842.vbk'
'SQLServer - 10.10.0.4D2023-07-01T021049_8fj3.vbk'
'SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk'

How can I loop through this list and create a list of only the 2 newest backups, given that the _xxxx at the end of the name appears to be random? In this example I want to grab lines 2 and 4. I can compare the timestamps in the file names or the system file times; I believe either will work.

There are 2 best solutions below

With this date and time format, ASCII order is the same as chronological order, so sort can do the comparison. The -tD -k2 options make it start sorting after the D character.
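To illustrate why a plain string comparison is enough here, below is a minimal sketch. The helper name is_newer is made up for this example and is not part of the answer's solution. It assumes both timestamps use the same fixed-width yyyy-mm-ddThhmmss layout shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name invented for this sketch):
# succeeds when timestamp $1 is later than timestamp $2.
# Assumes the fixed-width yyyy-mm-ddThhmmss layout, so that
# string order and chronological order agree.
is_newer() {
    [[ "$1" > "$2" ]]
}

if is_newer '2023-07-08T002832' '2023-07-01T003026'; then
    echo 'the 2023-07-08 backup is newer'
fi
```

Because every field is zero-padded and ordered from year down to second, the comparison never has to parse the date.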

Finally, you could iterate over the ordered filenames and use an associative array to keep only the latest backup for each server.

The solution below could break if filenames contain unusual characters such as newlines:

unset latest_server_backup
declare -A latest_server_backup
# Names arrive oldest first per server, so the last one stored is the newest.
while IFS= read -r filename ; do
  server=${filename%% *}   # server name: everything before the first space
  server=${server^^}       # upper-case it so Webserver and WebServer match
  latest_server_backup[$server]=${filename}
done < <(find . -type f -name \*.vbk 2>/dev/null | sed 's%^\./%%' | sort -tD -k2)
for server in "${!latest_server_backup[@]}" ; do
  printf "%s\n" "${latest_server_backup[$server]}"
done

Output:

SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk
WebServer - 10.10.0.60D2023-07-08T002832_g842.vbk
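
As a possible variation, here is a sketch that uses only a glob and parameter expansion, so it does not read names line by line. The function name newest_vbk_per_server and its variable names are invented for this example. It assumes the layout described in the question (a D followed by exactly 22 characters before .vbk) and Bash 4.2 or later for associative arrays and the negative substring length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest .vbk per server in directory $1.
# Assumes names end in D + 22 characters (yyyy-mm-ddThhmmss_xxxx) + .vbk.
newest_vbk_per_server() {
    local dir=${1:-.} path name base tail stamp key
    local -A newest_stamp=() newest_file=()
    for path in "$dir"/*.vbk; do
        [[ -f $path ]] || continue          # also skips an unmatched glob
        name=${path##*/}                    # strip the directory part
        base=${name%.vbk}
        (( ${#base} > 23 )) || continue     # too short to fit the layout
        tail=${base: -22}                   # yyyy-mm-ddThhmmss_xxxx
        stamp=${tail%_*}                    # yyyy-mm-ddThhmmss
        key=${base:0:${#base}-23}           # 'server name - server IP'
        key=${key^^}                        # treat Webserver and WebServer alike
        # String comparison works because the timestamp is fixed-width.
        if [[ $stamp > "${newest_stamp[$key]:-}" ]]; then
            newest_stamp[$key]=$stamp
            newest_file[$key]=$name
        fi
    done
    for key in "${!newest_file[@]}"; do
        printf '%s\n' "${newest_file[$key]}"
    done
}
```

Calling newest_vbk_per_server /path/to/backups prints one file name per server, in no particular order since associative arrays are unordered.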

This command extracts the date and time of your two latest files:

find data -type f -name "*.vbk" -print | sed 's/.*D\(.*\)_.*/\1/' | sort -n | tail -2
  • I assume all files in a directory called "data".
  • find ...: lists all files named *.vbk
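  • sed ...: extracts the portion between the D and the _. This is where you have your date and time information.
  • sort: sorts numerically. You are lucky: the file naming convention used by whatever produces these files has the date and time properly ordered for a simple sort to work.
  • tail: keeps only the last 2 lines. These are the two newest timestamps overall, which gives one per server only when each server's latest backup is newer than every older backup, as in your example.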

The result of this command is the following:

2023-07-08T002832
2023-07-08T012046
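
To see what the sed step does in isolation, here is a small sketch that runs it on a single name from the question. The wrapper name extract_stamp is invented for this example. Note that the greedy .*D matches up to the last capital D in the name, so this assumes the random suffix contains no capital D and no extra underscore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the yyyy-mm-ddThhmmss part of one file name.
# Assumes the last capital D in the name is the one before the date.
extract_stamp() {
    printf '%s\n' "$1" | sed 's/.*D\(.*\)_.*/\1/'
}

extract_stamp 'SQLServer - 10.10.0.4D2023-07-08T012046_k860.vbk'
# prints 2023-07-08T012046
```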

You can then use a while loop to list files:

#!/bin/bash

while IFS= read -r datetime
do
    /bin/ls data/*"${datetime}"*.vbk
done < <( find data -type f -name "*.vbk" -print | sed 's/.*D\(.*\)_.*/\1/' | sort -n | tail -2 )