Add filename before first occurrence of a character in all lines for all files in a given folder


I have a folder full of files with lines which look like this:

S149.sh

sox preaching.wav _001 trim 889.11 891.23
sox preaching.wav _002 trim 891.45 893.92
sox preaching.wav _003 trim 1599.95 1606.78

And I want to add the filename without its extension (which is S149) right before the first occurrence of the _ character in every line, so that it ends up looking like this:

sox preaching.wav S149_001 trim 889.11 891.23
sox preaching.wav S149_002 trim 891.45 893.92
sox preaching.wav S149_003 trim 1599.95 1606.78

And I want to automatically do this for every *.sh file in a given folder.

How do I achieve that with either bash (this includes awk, grep, sed, etc.) or python? Any help will be greatly appreciated.


There are 4 best solutions below


A sed version:

for i in *.sh; do
    sed -i "s/_/${i%.*}_/" "$i"
done

${i%.*} expands to the file name with its extension removed, and sed -i performs the replacement in place. This is the GNU sed form; BSD/macOS sed requires -i '' instead.
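To see what the expansion does, and how it differs from the greedy form with %%, here is a small sketch. The function names and the file name my.backup.sh are made up for illustration.

```shell
# Hypothetical helpers, used only to illustrate suffix removal.
strip_ext() {
    # ${1%.*} removes the shortest trailing match of ".*", i.e. the last extension only.
    printf '%s\n' "${1%.*}"
}
strip_all_ext() {
    # ${1%%.*} removes the longest trailing match, i.e. everything from the first dot.
    printf '%s\n' "${1%%.*}"
}
strip_ext S149.sh            # prints S149
strip_ext my.backup.sh       # prints my.backup
strip_all_ext my.backup.sh   # prints my
```

For names with a single dot, such as S149.sh, both forms give the same result.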


With GNU awk (4.1.0 or later) for in-place editing:

awk -i inplace 'FNR==1{f=gensub(/\.[^.]+$/,"",1,FILENAME)} {$3=f$3} 1' *.sh

If you're considering using a shell loop instead, see "Why is using a shell loop to process text considered bad practice?".
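If gawk is not available, or you want to check the result before editing anything, the same idea can be previewed on standard output. This is only a sketch: the function name preview_prefix is made up, and it swaps gensub() for sub() on a copy of FILENAME so that it should run on any POSIX awk.

```shell
# Preview only: prints the rewritten lines and leaves the files untouched.
preview_prefix() {
    awk 'FNR==1 { f = FILENAME; sub(/\.[^.]+$/, "", f) }  # f: file name without its extension
         { $3 = f $3 } 1' "$@"                            # prefix the third field, then print
}
# Usage, from the folder containing the scripts:
#   preview_prefix *.sh
```

Like the gawk command above, it assumes the underscore field is always the third one, and assigning to $3 rebuilds each line with single spaces between fields. Run it from inside the folder, since FILENAME would otherwise include the directory part.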


One possibility, using ed, the standard editor, and a loop:

for i in *.sh; do
    printf '%s\n' ",g/_/ s/_/${i%.sh}&/" w q | ed -s -- "$i"
done

The parameter expansion ${i%.sh} expands to the value of i with the suffix .sh removed.

The ed commands are, in the case i=S149.sh:

,g/_/ s/_/S149&/
w
q

,g/_/ marks every line containing an underscore, and s/_/S149&/ replaces the first underscore on each of those lines with S149_ (the & stands for the matched text). Then w writes the file and q quits.
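The s command in sed follows the same rules for & and for first-match-only replacement, so the effect can be checked on a single line without touching any file. This uses sed rather than ed, and the function name and the sample line with two underscores are made up for illustration.

```shell
# The same substitution as in the ed script, applied by sed to one hypothetical line.
prefix_first_underscore() {
    # & re-inserts the matched underscore; with no g flag only the first match changes.
    printf '%s\n' "$1" | sed 's/_/S149&/'
}
prefix_first_underscore 'sox preaching.wav _001 trim_a 889.11'
# prints: sox preaching.wav S149_001 trim_a 889.11
```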


@Ruran: If your awk cannot edit a file in place while reading it, the following may help.

awk 'FILENAME != P && P && Q {close(Q); system("mv " Q OFS P)} {Q=P=FILENAME; sub(/\..*/,"",Q); sub(/_/,Q"&"); print > Q} END {if (Q) {close(Q); system("mv " Q OFS P)}}' *.sh

The logic is simple: the script prefixes the first _ on each line with the file name minus its extension and writes the modified lines to a temporary file named after that prefix (S149 for S149.sh). When awk moves on to the next input file, it renames the temporary file over the previous input file.

One more point that the other answers do not mention: because we are using *.sh, a folder with thousands of files can make awk fail with a "too many open files" error if the files it writes are never closed, so the script calls close() before moving on to the next file. Let me know if this helps.

The same solution in multi-line form:

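awk 'FILENAME != P && P && Q {   # a new input file has started
        close(Q)                 # close the finished temporary file
        system("mv " Q OFS P)    # and rename it over the previous input file
     }
     {
        Q = P = FILENAME         # P: current input file
        sub(/\..*/, "", Q)       # Q: name up to the first dot (prefix and temp file name)
        sub(/_/, Q "&")          # prefix the first underscore on the line
        print > Q
     }
     END {
        if (Q) {                 # handle the last file
            close(Q)
            system("mv " Q OFS P)
        }
     }
    ' *.sh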
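The write-to-a-temporary-file-then-rename logic described above can also be sketched in plain shell with sed. This is an illustration rather than a drop-in replacement: the function name prefix_files is made up, and it assumes the file names contain no characters that are special to sed, such as /, & or \.

```shell
# Sketch: rewrite each file through a temporary file, then rename it over the original.
# Assumes file names without "/", "&", "\" or newlines, since the name is pasted into the sed script.
prefix_files() {
    for f in "$@"; do
        [ -f "$f" ] || continue                 # skip the literal pattern if nothing matched
        tmp=$(mktemp "$f.XXXXXX") || return 1   # temporary file next to the original
        if sed "s/_/${f%.*}_/" "$f" > "$tmp"; then
            mv -- "$tmp" "$f"                   # replace the original only if sed succeeded
        else
            rm -f -- "$tmp"
            return 1
        fi
    done
}
# Usage, from the folder containing the scripts:
#   prefix_files *.sh
```

Because the original is replaced by a newly created file, its permissions (for example the executable bit) are not carried over, so a chmod afterwards may be needed.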