Copy file with filename based on grep output


I have a collection of files that each contain a specific sequence. The files are named sequentially, and for each distinct sequence I want to copy only the first file (in name order) that contains it.

For example,

1.txt   Content: 1[Block]Alpha[/Block]1
2.txt   Content: 2[Block]Beta[/Block]2
3.txt   Content: 3[Block]Charlie[/Block]3
4.txt   Content: 4[Block]Alpha[/Block]4

I want the output to be

Alpha.txt     Content: 1[Block]Alpha[/Block]1
Beta.txt      Content: 2[Block]Beta[/Block]2
Charlie.txt   Content: 3[Block]Charlie[/Block]3

4.txt is skipped because it contains 'Alpha', which an earlier file (1.txt) already matched.
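Since "first" depends on the order in which the files are processed, note that a plain glob or ls sorts names as text, so 10.txt is listed before 2.txt. A minimal sketch, assuming GNU sort (the -V version-sort option is a GNU extension) and filenames without newlines, that lists hypothetical sample files in numeric order:

```shell
#!/bin/bash
# Scratch directory with hypothetical sample names
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1.txt 2.txt 10.txt

# Glob order is a text sort, so 10.txt is listed before 2.txt
printf '%s\n' *.txt

# Version sort compares the numeric parts as numbers: 1.txt 2.txt 10.txt
ordered=$(printf '%s\n' *.txt | sort -V)
printf '%s\n' "$ordered"
```

The sorted list can then be fed to xargs or a while-read loop in place of a bare ls.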

Currently, I have the following:

ls | xargs grep -oE -m 1 '\[Block\].{0,40}\[/Block\]'
#which returns:
1.txt:[Block]Alpha[/Block]
2.txt:[Block]Beta[/Block]
3.txt:[Block]Charlie[/Block]
4.txt:[Block]Alpha[/Block]

I want to split each line at the ':', take the filename on the left, and copy that file to a new name based on the right-hand side: either the whole match (including the Block tags) plus .txt, or just the inner text, e.g. Alpha.txt.
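Splitting at the first ':' can be done with bash parameter expansion, without extra processes. A minimal sketch, assuming filenames contain no ':' and the match has the [Block]...[/Block] form shown above; the variable names are illustrative:

```shell
#!/bin/bash
line='1.txt:[Block]Alpha[/Block]'

file=${line%%:*}          # everything left of the first ':'   -> 1.txt
match=${line#*:}          # everything right of the first ':'  -> [Block]Alpha[/Block]

# Quoting the patterns makes the brackets literal instead of a bracket expression
name=${match#'[Block]'}   # drop the leading tag
name=${name%'[/Block]'}   # drop the trailing tag              -> Alpha

echo "$file -> $name.txt"
```

Of the two naming options, only the inner text works as-is: the full match contains [/Block], and its '/' would be treated as a directory separator.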

cp has the -n flag to avoid overwriting, so as long as I process the files in order I should have no issue there, but I am a bit lost on how to continue.
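One way to continue from the grep output is to read it line by line and copy each file with cp -n, so a later file with the same sequence never overwrites the first copy. A minimal sketch, assuming GNU grep, cp and sort, filenames without whitespace, quotes or ':', and a hypothetical out/ directory for the copies; it recreates the sample files in a scratch directory:

```shell
#!/bin/bash
# Recreate the sample files from the question in a scratch directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1[Block]Alpha[/Block]1\n'   > 1.txt
printf '2[Block]Beta[/Block]2\n'    > 2.txt
printf '3[Block]Charlie[/Block]3\n' > 3.txt
printf '4[Block]Alpha[/Block]4\n'   > 4.txt
mkdir out

# -H: always print the filename; -o: only the match; -m 1: first matching line per file
printf '%s\n' *.txt | sort -V |
    xargs grep -H -oE -m 1 '\[Block\].{0,40}\[/Block\]' |
while IFS= read -r line; do
    file=${line%%:*}
    name=${line#*:}
    name=${name#'[Block]'}
    name=${name%'[/Block]'}
    # -n never overwrites, so the first file for each name wins; the exit
    # status of a skipped copy can differ between coreutils versions, so ignore it
    cp -n -- "$file" "out/$name.txt" || true
done
```

After this runs, out/ holds Alpha.txt, Beta.txt and Charlie.txt, and Alpha.txt has the content of 1.txt.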


In your case, you want to rename the files in a directory according to a pattern matched in their content, and remove any file whose pattern duplicates an earlier one?

I tested this in the directory /tmp/test, which contains 4 files (1.txt, 2.txt, 3.txt, 4.txt), and wrote a shell script to do it.

The shell script is as follows:

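#!/bin/bash
cd /tmp/test || exit 1
pattern_list=""
for i in *.txt; do
    # Remove the word "Block", then keep the run of letters left over (e.g. Alpha)
    pattern=$(sed 's/Block//g' "$i" | grep -o '[A-Za-z][A-Za-z]*')

    # Not seen yet? (when it IS a duplicate, grep prints the whole pattern_list)
    if ! echo "$pattern_list" | grep -w "$pattern"; then
        echo "Rename $i to ${pattern}.txt"
        mv "$i" "${pattern}.txt"
        pattern_list+="$pattern "
    else
        rm "$i"
    fi
done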

Brief explanation:

  1. List all files currently in /tmp/test
  2. Read each file to extract its pattern (Alpha, Beta, Charlie, ...)
  3. Rename the file after its pattern
  4. Remove the file if its pattern has already been seen
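The same four steps can also be written with a bash associative array to remember which patterns have been seen, in place of searching a space-separated string with grep -w, and with a single sed capture in place of sed plus grep. A minimal sketch, assuming bash 4 or later and the [Block]...[/Block] layout; like the script above it renames and deletes files, so here it runs against a scratch copy of the sample data:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1[Block]Alpha[/Block]1\n'   > 1.txt
printf '2[Block]Beta[/Block]2\n'    > 2.txt
printf '3[Block]Charlie[/Block]3\n' > 3.txt
printf '4[Block]Alpha[/Block]4\n'   > 4.txt

declare -A seen                     # pattern -> file it was first found in
for i in *.txt; do
    # Keep only the text between the tags, from the first tagged line
    pattern=$(sed -n 's/.*\[Block\]\(.*\)\[\/Block\].*/\1/p' "$i" | head -n 1)
    [ -n "$pattern" ] || continue   # no tagged line: leave the file alone
    if [ -z "${seen[$pattern]}" ]; then
        seen[$pattern]=$i
        echo "Rename $i to $pattern.txt"
        mv -- "$i" "$pattern.txt"
    else
        rm -- "$i"
    fi
done
```

The array lookup is an exact match on the whole pattern, so it does not depend on grep's notion of word boundaries.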

The result is as follows:

sh /tmp/myscript.sh 
Rename 1.txt to Alpha.txt
Rename 2.txt to Beta.txt
Rename 3.txt to Charlie.txt
Alpha Beta Charlie

ls
Alpha.txt  Beta.txt  Charlie.txt

Here is a solution that uses a single awk process to search the files and extract both the filename and the text between the Block tags. On the first matching line of each file, it checks whether the extracted text has been seen already; if not, it prints the filename and the new name, and in either case it moves on to the next file. The output is piped to xargs -n2, which runs cp once per pair.
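The "seen already" check relies on the common awk idiom !a[$0]++: an array element starts out empty (numeric 0), and the post-increment happens after the value is tested, so the expression is true the first time a value appears and false on every later occurrence. A small standalone illustration:

```shell
#!/bin/bash
# Print each line only the first time it appears
firsts=$(printf 'Alpha\nBeta\nCharlie\nAlpha\n' | awk '!seen[$0]++')
printf '%s\n' "$firsts"
```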

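#!/bin/bash
# For each file: strip everything up to [Block] and from [/Block] on,
# print "FILENAME NEWNAME" the first time a value is seen, then skip to the next file.
awk '/\[Block\].*\[\/Block\]/ {
        gsub(/^.*\[Block\]/,""); gsub(/\[\/Block\].*$/,"")
        if (!a[$0]++) print FILENAME, $0 ".txt"; nextfile
}' *.txt | xargs -n2 echo cp -n --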

Note: remove echo after you are done with testing.
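Because xargs splits its input on blanks, an extracted value containing a space (say, [Block]Alpha One[/Block]) would be broken into separate arguments. A minimal variant, assuming an awk that supports nextfile (as the script above already does) and that neither filenames nor values contain tabs or newlines, prints tab-separated pairs and reads them with a while loop instead; it writes into a hypothetical out/ directory and uses a scratch directory with a hypothetical value containing a space:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1[Block]Alpha One[/Block]1\n' > 1.txt
printf '2[Block]Beta[/Block]2\n'      > 2.txt
printf '3[Block]Alpha One[/Block]3\n' > 3.txt
mkdir out

awk '/\[Block\].*\[\/Block\]/ {
        gsub(/^.*\[Block\]/, ""); gsub(/\[\/Block\].*$/, "")
        # print "source<TAB>target" only for the first occurrence of each value
        if (!a[$0]++) print FILENAME "\t" "out/" $0 ".txt"
        nextfile
}' *.txt |
while IFS=$'\t' read -r src dst; do
    cp -n -- "$src" "$dst"
done
```

Here out/ ends up with "Alpha One.txt" (from 1.txt) and Beta.txt; 3.txt is skipped by awk before cp is ever called.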

Testing with your sample files:

> sh test.sh
cp -n -- 1.txt Alpha.txt
cp -n -- 2.txt Beta.txt
cp -n -- 3.txt Charlie.txt