I have a tar.gz file that contains .yang files, some of which are empty. I want to go into the tar.gz file and delete only those empty files. Currently I am using:
for f in *.tar.gz
do
echo "Processing file $f"
gzip -d "$f"
find $PWD -size 0 -print -delete
gzip -9 "${f%.*}"
echo "******************************************"
done
But this is not working, maybe because I am not in a directory but inside the tar.gz file.
Is there any other way to do this?
Your `find` command doesn't do anything useful to your tarballs, because it searches and deletes in the current directory, not inside the tarballs. So we first need to unpack each tarball (`tar -xf`), delete the empty files (`find`), and repack it (`tar -czf`). As a safety measure, we will work in temporary directories (`mktemp -d`) and create new tarballs (`*.tar.gz.new`) instead of overwriting the old ones. Because you want to delete only empty `.yang` files, we will also use some more `find` options. This assumes GNU tar; adapt it to your own tar version (or install GNU tar). Before using it, read what comes next, just in case...

But what you want is more complex than it seems, because your tarballs could contain files with metadata (owner, permissions...) that you are not allowed to use. If you run the unpack, delete, repack procedure as a regular user, tar will silently change the ownership and permissions of such files and directories, so after repacking they will have modified metadata. If that is a problem and you absolutely want to preserve the metadata, there are basically two options:
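1. Use `fakeroot` or an equivalent. To use `fakeroot`, just run the unpack, delete, repack commands inside a `fakeroot` environment. The extraction and the repacking must happen in the same `fakeroot` session, because that session is where the faked ownership is remembered.
2. Edit the tarballs in place. This second solution uses GNU tar and GNU awk.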
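Here is a sketch of the unpack, delete, repack approach of the first solution. It is a minimal sketch, not a definitive script. It assumes GNU tar, GNU find (for `-empty`, `-delete` and `-printf`) and `mktemp`, and `strip_empty_yang_copy` is a hypothetical helper name. To preserve metadata, run the whole script under `fakeroot` (for example `fakeroot bash strip-yang.sh`, where the script name is also hypothetical). That way, extraction and repacking share one session.

```shell
#!/usr/bin/env bash
# Sketch of the first solution: unpack, delete, repack.
# Assumes GNU tar, GNU find and mktemp; strip_empty_yang_copy is a
# hypothetical helper name.
set -euo pipefail
shopt -s nullglob   # the loop does nothing if there is no *.tar.gz

# Write "$1.new": a copy of tarball $1 without its empty .yang regular files.
# The original tarball is left untouched.
strip_empty_yang_copy() {
  local f=$1 out tmp
  out="$(cd "$(dirname "$f")" && pwd)/$(basename "$f").new"   # absolute path
  tmp=$(mktemp -d)                     # scratch directory for this tarball
  tar -xzf "$f" -C "$tmp"              # unpack
  # Delete (and print) only empty regular files whose name ends in .yang.
  find "$tmp" -type f -name '*.yang' -empty -print -delete
  # Repack the top-level entries by name so member paths keep their original
  # form (packing "." would prefix every member with "./").
  (cd "$tmp" && find . -mindepth 1 -maxdepth 1 -printf '%P\0' |
    tar --null -T - -czf "$out")
  rm -rf "$tmp"
}

for f in *.tar.gz; do
  echo "Processing file $f"
  strip_empty_yang_copy "$f"
  echo "******************************************"
done
```

Once you have checked a `.new` tarball, you can move it over the original.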
Explanations of the second (in-place) solution:
We use the GNU tar `--delete` option to delete files directly inside the tarball, without unpacking it. This is probably more elegant, even if it is also probably slower than a `fakeroot`-based solution. Let's first find all empty files in the tarball, starting from its verbose listing (`tar -tvf`).
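The first step is a verbose listing with `tar -tvf`. To see what it looks like, here is a small self-contained demo. It builds a throwaway tarball with made-up file names, containing a non-empty `.yang` file, an empty one, a symbolic link and their directory, and lists it. GNU tar is assumed.

```shell
#!/usr/bin/env bash
# Build a throwaway tarball (made-up file names) and list it verbosely.
# Assumes GNU tar.
set -euo pipefail
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT                      # clean up on exit
mkdir -p "$demo/models"
printf 'module a {}\n' > "$demo/models/a.yang"  # non-empty .yang file
: > "$demo/models/empty.yang"                    # empty .yang file
ln -s a.yang "$demo/models/link.yang"            # symbolic link
tar -cf "$demo/foo.tar" -C "$demo" models
# One line per member: permissions, owner/group, size, date, time, name
# (a symbolic link line also ends with "-> target").
tar -tvf "$demo/foo.tar"
```

Only the `empty.yang` line has all three properties: it starts with `-`, ends with `.yang`, and has `0` in the third column. The symbolic link line also ends with `.yang` (`-> a.yang`), but it starts with `l`.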
In the `tar -tvf` listing, the size is in the third column. Directory names have a leading `d` and a trailing `/`, and symbolic links have a leading `l`. So by keeping only lines that start with `-` (regular files) and end with `.yang`, we eliminate both. GNU awk can combine this filtering with the test on the size column.

This still prints whole listing lines, which is more than we want, so let's print only the name part. We first measure the length of the first 5 fields (permissions, owner/group, size, date, time), including the spaces, with the `match` function, which sets a variable named `RLENGTH`. Then we remove them with `substr`.

We could try to optimize a bit by calling `match` only on the first line. But I am not 100% sure that all output lines are perfectly aligned, so let's call it on each line.

We are almost done: we just pass the names to `tar -f foo.tar --delete <filename>`, one name at a time. `xargs` can do this for us, but there is one last trick. File names can contain spaces, so we must use another separator, something that cannot appear in file names, like the `NUL` character (ASCII code 0). Fortunately, GNU awk can use `NUL` as its Output Record Separator (`ORS`), and `xargs` has the `-0` option to use it as the input separator. So, let's put all this together inside your `for` loop.

Note that we must decompress the tarballs before editing them, because GNU tar cannot edit compressed tarballs.