path/mydir contains a list of directories. The names of these directories tell me which database they relate to.
Inside each directory is a bunch of files, but the filenames tell me nothing of importance.
I'm trying to write a command in linux bash that accomplishes the following:
- For each directory in path/mydir, find the max timestamp of the last modified file within that directory
- Print the last modified file's timestamp next to the parent directory's name
- Exclude any timestamps less than 30 days old
- Exclude specific directory names using regex
- Order by oldest timestamp
Given this directory structure in path/mydir:
database_1
    table_1.file (last modified 2021-11-01)
    table_2.file (last modified 2021-11-01)
    table_3.file (last modified 2021-11-05)
database_2
    table_1.file (last modified 2021-05-01)
    table_2.file (last modified 2021-05-01)
    table_3.file (last modified 2021-08-01)
database_3
    table_1.file (last modified 2020-01-01)
    table_2.file (last modified 2020-01-01)
    table_3.file (last modified 2020-06-01)
I would want to output:
database_3 2020-06-01
database_2 2021-08-01
This half works, but looks at the modified date of the parent directory instead of the max timestamp of files under the directory:
find . -maxdepth 1 -mtime +30 -type d -ls | grep -vE 'name1|name2'
I'm very much a novice with bash, so any help and guidance is appreciated!
Would you please try the following:
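A minimal sketch assembled from the step-by-step description that follows. It assumes GNU `find` (for `-printf`), and the function name `latest_per_dir` and its directory argument are hypothetical, added only so the snippet can be reused. It applies no 30-day cutoff, so every subdirectory containing at least one file is listed.

```bash
#!/bin/bash
# Sketch only: latest_per_dir and its argument are illustrative names.
# Requires GNU find (-printf is not POSIX).
latest_per_dir() {
    local base=$1 d dirname mdate
    (
        cd "$base" || exit 1
        for d in */; do
            [[ -d $d ]] || continue     # glob stays literal when nothing matches
            dirname=${d%/}              # drop the trailing slash for printing
            # date<TAB>time<TAB>path for every file; keep only the newest date
            mdate=$(find "$d" -type f -printf '%TY-%Tm-%Td\t%TT\t%p\n' \
                    | sort -rk1,2 | head -n 1 | cut -f1)
            [[ -n $mdate ]] && echo -e "$mdate\t$dirname"
        done | sort -k1,1 | sed -E $'s/^([^\t]+)\t(.+)/\\2 \\1/'
    )
}
```

It could be called as `latest_per_dir path/mydir`; the oldest directory is printed first.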
Output with the provided example:
- `for d in */; do` loops over the subdirectories in `path/mydir/`.
- `dirname=${d%/}` removes the trailing slash, just for printing purposes.
- `-printf "%TY-%Tm-%Td\t%TT\t%p\n"` prepends the modification date and time to the filename, delimited by tab characters, so each line has the form date, time, path.
- `sort -rk1,2` sorts the output by the date and time fields in descending order.
- `head -n 1` picks the line with the latest timestamp.
- `cut -f1` extracts the first field, which holds the modification date.
- `[[ -n $mdate ]]` skips an empty `mdate`.
- `sort -k1,1` just after `done` performs the global sort across the outputs of the subdirectories.
- `sed -E ...` swaps the timestamp and the dirname. It is there only to handle a dirname that may contain a tab character. If that cannot happen, you can omit the `sed` command by switching the order of timestamp and dirname in the `echo` command and changing the `sort` command to `sort -k2,2`.
As for the mentioned "Exclude specific directory names using regex", add your own logic to the `find` command or elsewhere in the loop.
[Edit]
In order to print the directory name if the last modified file in the subdirectories is older than the specified date, please try instead:
- `now=$(date +%s)` sets the variable `now` to the current time in seconds since the epoch.
- `for d in */; do` loops over the subdirectories in `path/mydir/`.
- `dirname=${d%/}` removes the trailing slash, just for printing purposes.
- `-printf "%T@\t%TY-%Tm-%Td\n"` prints the modification time as seconds since the epoch, followed by the modification date, delimited by a tab character.
- `sort -nrk1,1` sorts the output by the modification time in descending order.
- `head -n 1` picks the line with the latest timestamp.
- `read -r secs mdate < <( stuff )` assigns the two output fields of the command to `secs` and `mdate`, in that order.
- `secs=${secs%.*}` removes the fractional part.
- `(( secs < now - 3600 * 24 * 30 ))` is true if `secs` is more than 30 days older than `now`.
- `echo -e "$secs\t$dirname $mdate"` prints `dirname` and `mdate`, prefixed with `secs` for sorting.
- `sort -nk1,1` just after `done` performs the global sort across the outputs of the subdirectories.
- `cut -f2-` removes the `secs` portion.
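Putting the steps above together, a minimal sketch might look like the following. It assumes GNU `find` and Bash. The function name `stale_dirs`, the optional exclude-regex argument, and the optional day-count argument are hypothetical additions; the regex check is one possible way to handle the "exclude specific directory names" requirement, not something the steps above prescribe.

```bash
#!/bin/bash
# Sketch only: stale_dirs and its arguments are illustrative names.
# Usage: stale_dirs BASE_DIR [EXCLUDE_REGEX] [DAYS]
stale_dirs() {
    local base=$1 exclude=${2:-} days=${3:-30}
    local now d dirname secs mdate
    now=$(date +%s)                     # current time, seconds since the epoch
    (
        cd "$base" || exit 1
        for d in */; do
            [[ -d $d ]] || continue     # glob stays literal when nothing matches
            dirname=${d%/}
            # Skip names matching the extended regex, if one was given.
            [[ -n $exclude && $dirname =~ $exclude ]] && continue
            secs='' mdate=''
            # Newest file in this directory: epoch seconds and date.
            read -r secs mdate < <(find "$d" -type f -printf '%T@\t%TY-%Tm-%Td\n' \
                                   | sort -nrk1,1 | head -n 1)
            [[ -n $secs ]] || continue  # directory contains no files
            secs=${secs%.*}             # drop the fractional part
            if (( secs < now - 3600 * 24 * days )); then
                echo -e "$secs\t$dirname $mdate"
            fi
        done | sort -nk1,1 | cut -f2-
    )
}
```

For example, `stale_dirs path/mydir 'name1|name2'` would skip any directory whose name matches `name1` or `name2`. Because the cutoff is tested against the newest file only, a directory holding one recent file is left out even if its other files are old.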