Calculating the total disk usage for a set of subdirectories across a directory tree?

Here is the scenario.

Imagine we have this style of directory structure:

/snapshots/201801/users/tom/
/snapshots/201802/users/harry/
/snapshots/201803/users/chris/

and so on.

What I'm trying to get in my output is the total disk usage of each user's directory, summed across all of the "20180x" folders. The output should be sorted from highest disk usage to lowest.

As a very basic example, if I run:

du -h --max-depth=1 /snapshots/2018*/users/ | sort -n

The output lists the disk usage separately under each of the "201801", "201802", "201803" directories as du goes through them.

What I actually want is for the output to show the total disk usage across "201801", "201802", "201803" for a given user, sorted by size.

For example, the output should show totals calculated like this: "20GB" is the sum of "/tom" across all of the available "201801", "201802", "201803" folders in the directory tree. Think of "201801", "201802", "201803" as versions of a user's folder; we want a total for each user across all of those versions. We can assume the "201801", "201802", "201803" style folders will always be found at the same depth in the tree.

20GB /tom
10GB /harry
900MB /chris

Hopefully this makes sense. Please let me know if you have a better title I could use for this question as I'm not sure of the best terminology to use to explain what I'm trying to do in just a sentence.

Thanks!

There is 1 best solution below

Using GNU utilities du and sort, you can write a script something along these lines:

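#!/bin/bash

# User directory names to total across all snapshots
users=(tom harry chris)

for user in "${users[@]}"; do
    # -s: one line per argument, -c: append a grand total, -h: human-readable sizes
    usage=$(du -shc /snapshots/*/users/"$user")
    # Keep only the last line of the output (the "total" line)
    usage=${usage##*$'\n'}
    # Print the size (the text before the first blank) followed by the user name
    printf "%s\t%s\n" "${usage%%[[:blank:]]*}" "$user"
done | sort -hr    # -h compares human-readable sizes, -r puts the largest first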
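
The script above needs the user names listed by hand. If they are not known in advance, a variant along these lines might work. It is only a sketch: the function name usage_by_user is made up for illustration, it assumes GNU du, awk, sort and numfmt are installed, and it assumes user directory names contain no tabs or newlines. It asks du for sizes in 1 KiB blocks, sums them per user name in awk, and converts to human-readable sizes only at the end, so sorting is done on plain numbers.

```shell
#!/bin/bash

# usage_by_user ROOT   (hypothetical helper name, not part of any tool)
# Prints "<size><TAB><user>" for every directory matching ROOT/*/users/*/,
# summed across all snapshot folders, largest total first.
usage_by_user() {
    root=$1
    tab=$(printf '\t')
    # -s: one line per directory, -k: sizes in 1 KiB blocks so they can be added up.
    du -sk "$root"/*/users/*/ 2>/dev/null |
        awk -F'\t' '{
            path = $2
            sub(/\/+$/, "", path)    # drop the trailing slash left by the glob
            sub(/.*\//, "", path)    # keep only the last path component (the user)
            total[path] += $1
        }
        END { for (u in total) printf "%.0f\t%s\n", total[u] * 1024, u }' |
        sort -k1,1nr |
        numfmt --to=iec --field=1 --delimiter="$tab"
}
```

It would be called as: usage_by_user /snapshots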