I am working on a project that requires me to take some .bed files as input, extract one column from each file, keep only the values that pass a certain threshold, and count how many of them there are in each file. I am extremely inexperienced with bash, so I don't know most of the commands, but this line of code does the counting:
for FILE in *; do cat $FILE | awk '$9>1.3'| wc -l ; done>/home/parallels/Desktop/EP_Cell_Type.xls
I saved those values in an .xls file since I need to make some graphs with them. Now I would like to get the filenames with ls and save them in the first column of my .xls, while my counts should go in the second column. So far I have only managed to save everything in one column, with the command:
ls>/home/parallels/Desktop/EP_Cell_Type.xls | for FILE in *; do cat $FILE | awk '$9>1.3'-x| wc -l ; done >>/home/parallels/Desktop/EP_Cell_Type.xls
My sample files are: A549.bed, GM12878.bed, H1.bed, HeLa-S3.bed, HepG2.bed, Ishikawa.bed, K562.bed, MCF-7.bed, SK-N-SH.bed. They are in a folder that contains only those files.
The output is the list of all filenames followed by the values, all in the same column, like this:
Column 1 |
---|
A549.bed |
GM12878.bed |
H1.bed |
HeLa-S3.bed |
HepG2.bed |
Ishikawa.bed |
K562.bed |
MCF-7.bed |
SK-N-SH.bed |
4536 |
8846 |
6754 |
14880 |
25440 |
14905 |
22721 |
8760 |
28286 |
but what I need is something like this:
Filenames | #BS |
---|---|
A549.bed | 4536 |
GM12878.bed | 8846 |
H1.bed | 6754 |
HeLa-S3.bed | 14880 |
HepG2.bed | 25440 |
Ishikawa.bed | 14905 |
K562.bed | 22721 |
MCF-7.bed | 8760 |
SK-N-SH.bed | 28286 |
Assuming OP's `awk` program (correctly) finds all of the desired rows, an easier (and faster) solution can be written completely in `awk`.

One `awk` solution that keeps track of the number of matching rows and then prints the filename and line count:

For testing purposes I replaced `$9>1.3` with `/do/` (match any line containing the string `'do'`) and ran against a directory containing an assortment of scripts and data files. This generated the following tab-delimited output:
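For reference, a minimal sketch of an all-`awk` approach along the lines described above. It is an illustration under stated assumptions, not necessarily the exact program the answer used: the function name `count_matches` is hypothetical, the header row is taken from the desired table in the question, and the condition `$9 > 1.3` is OP's own filter. It uses only portable `awk` features (`FNR`, `FILENAME`, an `END` block), so it should not depend on GNU extensions.

```shell
# Hypothetical helper: print a tab-delimited "filename <TAB> count" table,
# where count is the number of rows whose 9th field is greater than 1.3.
count_matches() {
    awk -v OFS='\t' '
        # First line of each input file: remember the file (in order seen)
        # and start its count at 0 so files with no matches are still listed.
        FNR == 1 { files[++n] = FILENAME; count[FILENAME] = 0 }

        # OP condition: count rows whose 9th column exceeds 1.3.
        $9 > 1.3 { count[FILENAME]++ }

        # After all files are read: header row, then one row per file.
        END {
            print "Filenames", "#BS"
            for (i = 1; i <= n; i++) print files[i], count[files[i]]
        }
    ' "$@"
}

# Example call (run inside the folder that holds the .bed files):
#   count_matches *.bed > /home/parallels/Desktop/EP_Cell_Type.xls
```

Because a single `awk` process reads every file, there is no need for the `for` loop, `cat`, or `wc -l`. One caveat of this sketch: a completely empty file never triggers `FNR == 1`, so it would be left out of the table rather than listed with a count of 0.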