Concatenate column from many bed files into single bed file

Question


Asked by rscott on 26 September 2023 at 13:53 · 136 views

I have n bed files in the format:

n.bed


chr1	0	10000	4	331
chr1	10000	20000	6	154
chr1	20000	30000	3	12

I would like to take column 4 (4, 6, 3) from each bed file and output it as a single table file (CSV, TSV, or another format; the exact format doesn't matter). The first three columns should be chrom, start and end, and columns 4 through 3+n should each be labelled with the name of one bed file and contain that file's column 4.

For example, take two bed files:

1.bed:


chr1	0	10000	4	331
chr1	10000	20000	6	154
chr1	20000	30000	3	12

2.bed:


chr1	0	10000	2	412
chr1	10000	20000	7	14
chr1	20000	30000	2	155

I would like the output to be:

chrom	start	end	1.bed	2.bed
chr1	0	10000	4	2
chr1	10000	20000	6	7
chr1	20000	30000	3	2
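To reproduce the example above, the two input files can be created with explicit tab characters. The answer below assumes tab-delimited fields, and copying from a web page often turns tabs into spaces. A minimal sketch with printf:

```shell
# Create the two example inputs with literal tabs (\t) between fields.
printf 'chr1\t0\t10000\t4\t331\nchr1\t10000\t20000\t6\t154\nchr1\t20000\t30000\t3\t12\n' > 1.bed
printf 'chr1\t0\t10000\t2\t412\nchr1\t10000\t20000\t7\t14\nchr1\t20000\t30000\t2\t155\n' > 2.bed
```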

My current attempt has been to use bedops:

$ bedops --everything *.bed \
    | bedmap --echo-map - \
    | awk '(split($0, a, ";") == 3)' - \
    | sed 's/\;/\n/g' - \
    | sort-bed - \
    | uniq - \
    > answer.bed

However, this produces the following error:

Error: Unable to find file: 1.bed


**markp-fuso** · Accepted Answer · 2023-09-26T15:49:36.273000

Assumptions:

- none of the input files have a header record
- all input files have the same number of rows, where ...
  - the first 3 columns are chrom, start and end, and ...
  - there's at least one additional (4th) column
- all input/output field delimiters are tabs
- rows (from different input files) are joined based on the triple key of chrom + start + end
- all input files have the same set of keys (i.e., we don't have to worry about a key missing from some input files)
- the input files are already sorted by key
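Because the script depends on the files sharing the same keys in the same order, it may be worth checking that before combining them. A minimal POSIX sh sketch; `check_bed_keys` is a hypothetical helper name, not part of bedops or any other tool. It compares the first three tab-separated columns of every file against the first file using `cut` and `cmp`:

```shell
# check_bed_keys FILE...
# Hypothetical helper (not part of bedops): returns 0 if every FILE has the
# same chrom/start/end columns (1-3), in the same order and with the same
# number of rows, as the first FILE; otherwise reports the first mismatch
# on stderr and returns 1.
check_bed_keys() {
    [ "$#" -gt 0 ] || return 1
    ref=$1
    ref_keys=$(mktemp) || return 1
    cut -f1-3 "$ref" > "$ref_keys"
    shift
    status=0
    for f in "$@"; do
        # cmp -s is silent and exits non-zero on any difference, including
        # one file being shorter than the other.
        if ! cut -f1-3 "$f" | cmp -s - "$ref_keys"; then
            printf 'keys differ: %s vs %s\n' "$ref" "$f" >&2
            status=1
            break
        fi
    done
    rm -f "$ref_keys"
    return "$status"
}
```

Usage: `check_bed_keys *.bed && echo "all keys match"`. This checks key identity and order, which is what a line-by-line join needs; it does not verify that the keys are sorted.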

One awk idea:

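awk '
BEGIN  { FS=OFS="\t"                            # tab-delimited input and output
         hdr = "chrom" OFS "start" OFS "end"
       }
FNR==1 { hdr = hdr OFS FILENAME }               # first line of each file: add its name to the header
       { key = $1 OFS $2 OFS $3                 # chrom + start + end
         # FNR==NR only while reading the first file, so each row starts with
         # the key from the first file; later files append their column 4
         lines[FNR] = (FNR==NR ? key : lines[FNR]) OFS $4
       }
END    { print hdr
         for (i=1;i<=FNR;i++)                   # FNR is now the row count of the last file
             print lines[i]
       }
' *.bed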
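The script hinges on the difference between `NR` (the record number across all input files) and `FNR` (the record number within the current file): `FNR==NR` is true only while awk reads the first file, assuming that file is not empty. A small demonstration on two throwaway files (`x.txt` and `y.txt` are arbitrary names):

```shell
# Two throwaway files; the names x.txt and y.txt are arbitrary.
printf 'a\nb\n' > x.txt
printf 'c\nd\n' > y.txt

# NR keeps counting across files, FNR restarts at 1 for each file,
# so FNR == NR holds only while the first (non-empty) file is read.
awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first file" : "later file") }' x.txt y.txt
# x.txt 1 1 first file
# x.txt 2 2 first file
# y.txt 3 1 later file
# y.txt 4 2 later file
```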

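NOTES:

- this single awk script replaces the OP's current bedops | bedmap | awk | sed | sort-bed | uniq pipeline
- this assumes the *.bed files already exist and are not the output from bedops | bedmap
- rows are actually matched by line number (FNR), with chrom/start/end taken from the first file; this is equivalent to joining on the key only because all files share the same keys in the same order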

This generates:

chrom   start   end     1.bed   2.bed
chr1    0       10000   4       2
chr1    10000   20000   6       7
chr1    20000   30000   3       2
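If awk is not preferred, the same line-by-line join can be sketched with `cut` and `paste` under the same assumptions (tab-delimited files, no header lines, identical row order). This is an alternative, not the answer's method; `combine_col4` is a hypothetical helper name:

```shell
# combine_col4 FILE...
# Hypothetical helper: prints a header row, then chrom/start/end from the
# first FILE followed by column 4 of every FILE, one column per file.
# Assumes tab-delimited input, no header lines, and identical row order.
combine_col4() {
    [ "$#" -gt 0 ] || return 1
    tmpdir=$(mktemp -d) || return 1

    # Header: the three key columns, then one column named after each file.
    printf 'chrom\tstart\tend'
    for f in "$@"; do printf '\t%s' "$f"; done
    printf '\n'

    # Key columns are taken from the first file only.
    cut -f1-3 "$1" > "$tmpdir/keys"

    # Column 4 of each file goes to its own temp file; zero-padded names
    # make the col* glob below expand in argument order.
    i=0
    for f in "$@"; do
        cut -f4 "$f" > "$tmpdir/$(printf 'col%06d' "$i")"
        i=$((i + 1))
    done

    # paste joins the files line by line, separated by tabs.
    paste "$tmpdir/keys" "$tmpdir"/col*
    rm -rf "$tmpdir"
}
```

Usage: `combine_col4 *.bed > combined.tsv`. Writing the result with a non-.bed extension keeps it out of the `*.bed` glob on a rerun. Since `paste` opens every column file at once, a very large number of inputs may run into the open-file limit.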
