Find and delete files that contain the same string in the filename in the Linux terminal


I want to delete all files in a folder whose filenames contain a non-unique numerical string, using the Linux terminal. E.g.:

werrt-110009.jpg => delete
asfff-110009.JPG => delete
asffa-123489.jpg => keep
asffa-111122.JPG => keep

Any suggestions?


There are 3 best solutions below

1
On BEST ANSWER

I only now understand your question, I think. You want to remove all files that contain a numeric value that is not unique (in a particular folder). If a filename contains a value that is also found in another filename, you want to remove both files, right?

This is how I would do that (it may not be the fastest way):

# put all files in your folder in a list
# nullglob makes array=(*) expand to an empty list when the folder is empty
shopt -s nullglob
array=(*)
delete=()
num_regex='([0-9]+)\.'

for elem in "${array[@]}"; do
    # for each elem in your list extract the number; skip names without one
    [[ "$elem" =~ $num_regex ]] || continue
    num="${BASH_REMATCH[1]}"
    # use the extracted number to check if it is unique: the joined list must
    # contain "<num>." twice, the second time not preceded by another digit
    dup_regex="(^|[^0-9])$num\..*[^0-9]$num\."
    # if it is not unique, put the file in the files-to-delete list
    if [[ "${array[*]}" =~ $dup_regex ]]; then
        delete+=("$elem")
    fi
done

# delete all found duplicates
for elem in "${delete[@]}"; do
    # "--" keeps filenames that start with a dash from being read as options
    rm -- "$elem"
done

In your example, array would be:

array=(werrt-110009.jpg asfff-110009.JPG asffa-123489.jpg asffa-111122.JPG)

And the result in delete would be:

delete=(werrt-110009.jpg asfff-110009.JPG)

Is this what you meant?
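
The same idea can also be sketched with a counting pass instead of a regex over the joined file list. This is only a sketch under a few assumptions: bash 4 or newer (for associative arrays), the number is the first run of digits directly followed by a dot, and remove_non_unique is a hypothetical helper name that does not appear in the answer above.

```shell
#!/usr/bin/env bash
# Sketch: delete files whose number (digits followed by a dot) occurs in
# more than one filename. remove_non_unique is a hypothetical helper name.
remove_non_unique() {
    local dir="$1" f name num
    local num_regex='([0-9]+)\.'
    local -A count=()
    # note: this leaves nullglob enabled in the calling shell
    shopt -s nullglob

    # first pass: count how many filenames carry each number
    for f in "$dir"/*; do
        name="${f##*/}"
        if [[ "$name" =~ $num_regex ]]; then
            num="${BASH_REMATCH[1]}"
            count[$num]=$(( ${count[$num]:-0} + 1 ))
        fi
    done

    # second pass: delete every file whose number was counted more than once
    for f in "$dir"/*; do
        name="${f##*/}"
        if [[ "$name" =~ $num_regex ]]; then
            num="${BASH_REMATCH[1]}"
            if (( ${count[$num]} > 1 )); then
                rm -- "$f"
            fi
        fi
    done
}
```

Calling remove_non_unique . in the folder from the question should remove the two 110009 files and keep the other two. Counting first avoids rescanning the whole file list for every file.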

4
On

Use the rm command to delete all files in a directory whose names contain the matching string:

cd <path-to-directory>/ && rm *110009* 

This command deletes all files whose names contain the string, regardless of where the string appears in the filename.

I mention the rm command as another option for deleting files with a matching string.
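
Since files removed with rm cannot be restored, it may be worth listing what the glob matches before deleting. The following is a minimal sketch; preview_matches is a hypothetical helper name, and 110009 in the usage note is just the example number from the question.

```shell
#!/usr/bin/env bash
# Sketch: print the names of the files a "*<pattern>*" glob would match
# in a directory, without deleting anything.
# preview_matches is a hypothetical helper name.
preview_matches() {
    local dir="$1" pattern="$2" f
    for f in "$dir"/*"$pattern"*; do
        # without nullglob an unmatched glob stays literal, so check existence
        [ -e "$f" ] && printf '%s\n' "${f##*/}"
    done
    return 0
}
```

For example, preview_matches . 110009 lists the candidates; if the list looks right, rm -- *110009* removes them.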

Below is a complete script that does what you asked:

#!/bin/bash -eu
# bash is required: the script uses [[ =~ ]] and BASH_REMATCH

# provide the destination folder path
DEST_FOLDER_PATH="$1"

TEMP_BUILD_DIR="/tmp/$(date +%Y%m%d-%H%M%S)_cleanup_duplicate_files"
#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
clean_up()
{
    if [ -d "$TEMP_BUILD_DIR" ]; then
        rm -rf "$TEMP_BUILD_DIR"
    fi
}
trap clean_up EXIT

mkdir -p "$TEMP_BUILD_DIR"
TEMP_FILES_LIST_FILE="$TEMP_BUILD_DIR/folder_file_names.txt"
ls -1 "$DEST_FOLDER_PATH" > "$TEMP_FILES_LIST_FILE"

# the regex must be unquoted inside [[ =~ ]], so keep it in a variable
num_regex='([0-9]+)\.'
while IFS= read -r filename
do
    # skip files that were already deleted in an earlier iteration
    [ -e "$DEST_FOLDER_PATH/$filename" ] || continue
    # check files with number pattern
    if [[ "$filename" =~ $num_regex ]]; then
        # fetch the number to find files with the same number
        matching_string="${BASH_REMATCH[1]}"

        # collect the files that contain matching_string
        # (note: the glob also matches longer numbers that contain it)
        matches=("$DEST_FOLDER_PATH"/*"$matching_string"*)
        if [ "${#matches[@]}" -gt 1 ]; then
            rm -- "${matches[@]}"
        fi
    fi
done < "$TEMP_FILES_LIST_FILE"

exit 0

How to execute this script:

  1. Save the script to a file such as {path-to-script}/delete_duplicate_files.sh (you can name it whatever you want).
  2. Make the script executable:

    chmod +x {path-to-script}/delete_duplicate_files.sh

  3. Run the script, passing the directory from which duplicate files (files with a matching number pattern) should be deleted:

    {path-to-script}/delete_duplicate_files.sh "{path-to-directory}"

0
On

You can use the Linux find command with the -regex and -delete options to do it in one command.
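
A minimal sketch of what such a command could look like follows. It assumes the duplicated number is already known, because find by itself does not determine which numbers occur more than once. delete_by_number is a hypothetical wrapper name; -maxdepth, -regex and -delete are used as supported by GNU find.

```shell
#!/usr/bin/env bash
# Sketch: delete regular files directly inside a directory whose name ends in
# "<number>.<extension>", where the number is not preceded by another digit.
# delete_by_number is a hypothetical helper name.
delete_by_number() {
    local dir="$1" num="$2"
    # -regex is matched against the whole path, hence the leading ".*"
    find "$dir" -maxdepth 1 -type f -regex ".*[^0-9]${num}\.[^./]*" -delete
}
```

For the example in the question, delete_by_number . 110009 would remove werrt-110009.jpg and asfff-110009.JPG. Replacing -delete with -print shows the matches without removing anything.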