Convert a data.frame in R to .bed format file


I have a data.frame that looks like this.

bed <- data.frame(chrom=c(rep("Chr1",5)),
                  chromStart=c(18915152,24199229,73730,81430,89350),
                  chromEnd=c(18915034,24199347,74684,81550,89768),
                  strand=c("-","+","+","+","+"))

write.table(bed, "test_xRNA.bed", row.names = FALSE, col.names = FALSE, sep = "\t", quote = FALSE)

Created on 2022-07-29 by the reprex package (v2.0.1)

I want to convert it into a .bed file. I tried to do this with the write.table function, but I get the following error when I try to intersect the resulting file:

Error: unable to open file or unable to determine types for file test_xRNA.bed

- Please ensure that your file is TAB delimited (e.g., cat -t FILE).
- Also ensure that your file has integer chromosome coordinates in the 
  expected columns (e.g., cols 2 and 3 for BED).

Any ideas of how I can properly convert a data.frame to a .bed file in R?

I have heard about the rtracklayer package. Does anyone have experience with it?

I have tried the approach from the post "export file from R in bed format", but it does not work for me. Any help is highly appreciated.


There are 2 best solutions below

Best answer

I think it's a bit more complicated to make a bed file. Here is a solution I have been working on over the last few days:

suppressPackageStartupMessages(library(GenomicRanges))
suppressPackageStartupMessages(library(rtracklayer))

# data 
bed <- data.frame(chrom=c(rep("Chr1",5)),
                  chromStart=c(18915152,24199229,73730,81430,89350),
                  chromEnd=c(18915034,24199347,74684,81550,89768), 
                  strand=c("-","+","+","+","+"))

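# swap the coordinates where needed so that chromStart <= chromEnd in every row
# (transform() evaluates both expressions against the original columns)
bed2 <- bed |>
  transform(chromStart = ifelse(chromStart > chromEnd, chromEnd, chromStart),
            chromEnd   = ifelse(chromEnd < chromStart, chromStart, chromEnd))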

# Genomic Ranges 
bed3 <- GenomicRanges::makeGRangesFromDataFrame(bed2)
head(bed3)
#> GRanges object with 5 ranges and 0 metadata columns:
#>       seqnames            ranges strand
#>          <Rle>         <IRanges>  <Rle>
#>   [1]     Chr1 18915034-18915152      -
#>   [2]     Chr1 24199229-24199347      +
#>   [3]     Chr1       73730-74684      +
#>   [4]     Chr1       81430-81550      +
#>   [5]     Chr1       89350-89768      +
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

# rtracklayer 
bed4 <- rtracklayer::export(bed3, format="bed", ignore.strand = FALSE)
bed4
#> [1] "Chr1\t18915033\t18915152\t.\t0\t-" "Chr1\t24199228\t24199347\t.\t0\t+"
#> [3] "Chr1\t73729\t74684\t.\t0\t+"       "Chr1\t81429\t81550\t.\t0\t+"      
#> [5] "Chr1\t89349\t89768\t.\t0\t+"

# write it as a bed file
# quote = FALSE and no row/column names are essential for a valid file;
# note that append = TRUE adds to an existing test.bed instead of overwriting it
write.table(bed4, "test.bed", sep="\t", col.names=FALSE, row.names = FALSE, append = TRUE, quote = FALSE) 

Created on 2022-08-02 by the reprex package (v2.0.1)

Now you have a functional bed file to work with in bedtools.
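For comparison, the same two steps (ordering the coordinates and shifting the start to 0-based) can be sketched in base R without the Bioconductor packages. This is only a sketch: it assumes the input coordinates are 1-based and inclusive, which is how the GRanges route above treats them, and the helper name `df_to_bed6` and the output file name are made up for illustration.

```r
# Hypothetical helper: build a six-column BED-style data.frame from
# chrom/chromStart/chromEnd/strand (assumes 1-based, inclusive input).
df_to_bed6 <- function(df) {
  start <- pmin(df$chromStart, df$chromEnd)  # smaller coordinate becomes the start
  end   <- pmax(df$chromStart, df$chromEnd)  # larger coordinate becomes the end
  data.frame(
    chrom      = df$chrom,
    # sprintf("%d") avoids scientific notation such as 1e+05 in the output
    chromStart = sprintf("%d", as.integer(start - 1)),  # BED starts are 0-based
    chromEnd   = sprintf("%d", as.integer(end)),        # BED ends are exclusive
    name       = ".",
    score      = 0L,
    strand     = df$strand,
    stringsAsFactors = FALSE
  )
}

bed <- data.frame(chrom = rep("Chr1", 5),
                  chromStart = c(18915152, 24199229, 73730, 81430, 89350),
                  chromEnd   = c(18915034, 24199347, 74684, 81550, 89768),
                  strand     = c("-", "+", "+", "+", "+"))

bed6 <- df_to_bed6(bed)

# Overwrite rather than append, so re-running does not duplicate records
out_file <- file.path(tempdir(), "test_base.bed")
write.table(bed6, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out_file)
```

The first line written should match the first element of `bed4` above. If the coordinates in your data.frame are already 0-based, drop the `- 1`.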

Second answer

Check the BED format specification. The first three columns (chromosome, start, end) are obligatory. Strand is the sixth column, and if you want to use it, you need to include columns 4 (name) and 5 (score). They can be filled with "." if you have nothing to put there.

bed <- data.frame(chrom=c(rep("Chr1",5)),
                  chromStart=c(18915152,24199229,73730,81430,89350),
                  chromEnd=c(18915034,24199347,74684,81550,89768),
                  name = rep(".", 5),
                  score = rep(".", 5),
                  strand=c("-","+","+","+","+"))
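
The data frame above still has to be written to disk, and its first row has chromStart greater than chromEnd, which some tools may reject. Below is a minimal sketch of writing the six columns and reading the file back to check the points named in the error message (tab-delimited, integer coordinates in columns 2 and 3). `check_bed` is a hypothetical helper written for this example, not part of any package, and the file is written to a temporary directory.

```r
# Hypothetical helper: read a file back and report basic BED problems
check_bed <- function(path) {
  x <- read.table(path, sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE, comment.char = "")
  problems <- character(0)
  if (ncol(x) < 3) {
    problems <- c(problems, "fewer than 3 tab-separated columns")
  } else if (!is.numeric(x[[2]]) || !is.numeric(x[[3]]) ||
             any(x[[2]] != floor(x[[2]])) || any(x[[3]] != floor(x[[3]]))) {
    problems <- c(problems, "columns 2 and 3 are not integers")
  } else if (any(x[[2]] > x[[3]])) {
    problems <- c(problems, "start > end in at least one row")
  }
  problems
}

bed <- data.frame(chrom = rep("Chr1", 5),
                  chromStart = c(18915152, 24199229, 73730, 81430, 89350),
                  chromEnd   = c(18915034, 24199347, 74684, 81550, 89768),
                  name   = rep(".", 5),
                  score  = rep(".", 5),
                  strand = c("-", "+", "+", "+", "+"))

path <- file.path(tempdir(), "test_bed6.bed")
write.table(bed, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
before <- check_bed(path)   # reports the swapped coordinates in row 1

# Swap start/end where needed, then write and check again
fixed <- transform(bed,
                   chromStart = pmin(chromStart, chromEnd),
                   chromEnd   = pmax(chromStart, chromEnd))
write.table(fixed, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
after <- check_bed(path)    # empty character vector when no problem is found
```

This check only covers the basics. It does not convert between 1-based and 0-based starts, so adjust the coordinates first if your data need it.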