ChIP-seq Analysis: Extracting Sequences and Generating Bed File

28 Views Asked by At

I'm Logan who study Bioinformatics.

I have analyzed ChIP-seq data by conducting peak calling with MACS3 and annotating the peaks using HOMER. The output provides coordinates for each peak, such as those for the region CM025010.1 (Start: 40194025, End: 40194493) mentioned below. Now, my objective is to extract sequences within a ±1kb window around each peak position and subsequently generate a bed file. I would appreciate any ideas or suggestions you may have. Thank you.

PeakID (cmd=annotatePeaks.pl refGenome.fa)  Chr Start   End Strand  Peak Score  Focus Ratio/Region Size Annotation  Detailed
peakCalling_q_0.01_peak_16  CM025010.1  40194025    40194493    +   311 NA  NA  NA  NA  NA
peakCalling_q_0.01_peak_50  CM025021.1  12981866    12982368    +   279 NA  NA  NA  NA  NA
peakCalling_q_0.01_peak_27  CM025012.1  6509890 6510225 +   266 NA  NA  NA  NA  NA
peakCalling_q_0.01_peak_45  CM025021.1  12517853    12518147    +   246 NA  NA  NA  NA  NA
1

There are 1 best solutions below

1
On
import pandas as pd

df = pd.read_csv('original_peaks.bed', sep='\t', header=None, names=['your', 'headers', 'here', 'a', 'b', 'etc'])

window_size = 1000

df['Start'] = df['Start'] - window_size
df['End'] = df['End'] + window_size

#### Clip the extended regions to ensure they don't fall below 0
df['start'] = df['start'].clip(lower=0)

#### Save the extended regions to a new BED file
df.to_csv('extended_peaks.bed', sep='\t', header=False, index=False)