How to save different outputs in two columns of the same file in bash


I am working on a project that requires me to take some .bed files as input, extract one column from each file, keep only the values that pass a threshold, and count how many of them there are in each file. I am extremely inexperienced with bash, so I don't know most of the commands, but this line of code seems to do the trick:

for FILE in *; do awk '$9>1.3' "$FILE" | wc -l; done > /home/parallels/Desktop/EP_Cell_Type.xls

I saved those values in a .xls file since I need to make some graphs with them. Now I would like to take the filenames with ls and save them in the first column of my .xls, with the counts in the second column. So far I have only managed to save everything in one column with this command:

ls > /home/parallels/Desktop/EP_Cell_Type.xls; for FILE in *; do awk '$9>1.3' "$FILE" | wc -l; done >> /home/parallels/Desktop/EP_Cell_Type.xls
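The command above writes the ls listing first and then appends the counts, so the two lists end up stacked in one column. A minimal sketch of one way to put them side by side instead is `paste`, which joins two inputs line by line with a tab. This assumes bash (for the `<( )` process substitution), that it runs inside the folder holding only the .bed files, and that field 9 is the value compared against 1.3 as in the question; the function name `two_columns_with_paste` is hypothetical. Both lists come from the same `*` glob so that the Nth name and the Nth count refer to the same file, and counting inside awk replaces the extra `cat` and `wc` processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put two separately generated lists side by side.
# Assumes it runs inside the folder that holds only the .bed files and that
# field 9 is the value compared against 1.3, as in the question.
two_columns_with_paste() {
    # Both lists come from the same glob (*), so the Nth name and the Nth
    # count always refer to the same file; paste joins them with a tab.
    paste <(printf '%s\n' *) \
          <(for FILE in *; do
                # count rows whose 9th field exceeds 1.3; print 0 if none
                awk '$9 > 1.3 { n++ } END { print n + 0 }' "$FILE"
            done)
}

# Usage: two_columns_with_paste > /home/parallels/Desktop/EP_Cell_Type.xls
```

The result is tab-separated text, which is what a spreadsheet program needs to split the names and counts into separate columns.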

My sample files are A549.bed, GM12878.bed, H1.bed, HeLa-S3.bed, HepG2.bed, Ishikawa.bed, K562.bed, MCF-7.bed and SK-N-SH.bed, and they are in a folder that contains only those files.

The output is a list of all the filenames followed by all the values, in the same column, like this:

Column 1
A549.bed
GM12878.bed
H1.bed
HeLa-S3.bed
HepG2.bed
Ishikawa.bed
K562.bed
MCF-7.bed
SK-N-SH.bed
4536
8846
6754
14880
25440
14905
22721
8760
28286

but what I need is something like this:

Filenames #BS
A549.bed 4536
GM12878.bed 8846
H1.bed 6754
HeLa-S3.bed 14880
HepG2.bed 25440
Ishikawa.bed 14905
K562.bed 22721
MCF-7.bed 8760
SK-N-SH.bed 28286
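A minimal sketch of a loop that produces this layout directly: each iteration prints one file name and its count on the same line, separated by a tab, so the two values land in adjacent columns. The header text, field 9 and the 1.3 threshold are taken from the question; the function name `bed_counts_table` is hypothetical, and it assumes bash and a folder whose .bed files are the only ones to be counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one output line per .bed file, name and count
# separated by a tab. Header text, field 9 and the 1.3 threshold come from
# the question.
bed_counts_table() {
    local FILE count
    printf 'Filenames\t#BS\n'                      # header row
    for FILE in *.bed; do
        # count rows whose 9th field exceeds 1.3 (prints 0 if none match)
        count=$(awk '$9 > 1.3 { n++ } END { print n + 0 }' "$FILE")
        printf '%s\t%s\n' "$FILE" "$count"         # name <TAB> count
    done
}

# Usage: bed_counts_table > /home/parallels/Desktop/EP_Cell_Type.xls
```

Printing both values in the same `printf` call is what keeps each name next to its own count; there is no second list to line up afterwards.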
Best answer

Assuming OP's awk program (correctly) finds all of the desired rows, an easier (and faster) solution can be written completely in awk.

One awk solution keeps track of the number of matching rows in each file and then prints the filename and its count:

awk '
FNR==1 { if ( count >= 1 )                       # first line of new file? if line counter > 0
             printf "%s\t%d\n", prevFN, count   # then print previous FILENAME + tab + line count
         count=0                                # then reset our line counter
         prevFN=FILENAME                        # and save the current FILENAME for later printing
       }

$9>1.3 { count++ }                              # if field #9 > 1.3 then increment line counter

END    { if ( count >= 1 )                       # flush last FILENAME/line counter to stdout
             printf "%s\t%d\n", prevFN, count
       }
' *                                             # * ==> pass all files as input to awk
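The program above only prints a file when `count >= 1`, so a file with no rows above 1.3 (or an empty file) produces no line at all. A minimal variant, sketched under the same assumptions about field 9, keeps one counter per file in an array and walks `ARGV` in the `END` block, so every file named on the command line gets a line, with 0 where nothing matched; it also prints the header row from the question. The function name `bed_counts_awk` is hypothetical, and `ARGC`/`ARGV`/`FILENAME` are standard awk variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: single awk pass, one counter per file, and a line
# for every file argument even when nothing in it matched.
bed_counts_awk() {
    awk '
    BEGIN    { print "Filenames\t#BS" }                   # header row
    $9 > 1.3 { n[FILENAME]++ }                            # per-file match count
    END {
        for (i = 1; i < ARGC; i++)                        # files in command-line order
            printf "%s\t%d\n", ARGV[i], n[ARGV[i]] + 0    # +0 turns a missing entry into 0
    }' "$@"
}

# Usage: bed_counts_awk *.bed > /home/parallels/Desktop/EP_Cell_Type.xls
```

Because the files are passed as arguments, the function should always be called with at least one file; with none, awk would wait for input on stdin.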

For testing purposes I replaced $9>1.3 with /do/ (which matches any line containing the string 'do') and ran it against a directory containing an assortment of scripts and data files. This generated the following tab-delimited output:

bigfile.txt     7
blocker_tree.sql        4
git.bash        2
hist.bash       4
host.bash       2
lines.awk       2
local.sh        3
multi_file.awk  2