How to sort entries with the same ID based off their allele frequency (AF) in a vcf file

60 Views Asked by pooch At 15 May 2025 at 04:28

I have a VCF file whose multiallelic variants are expressed as multiple biallelic records. I am trying to convert the file into a PLINK bed file, so each entry in the VCF must have a unique ID. Here is an example:

[-----.-----@hydra1 data]$ tabix gnomad.genomes.v3.1.2.hgdp_tgp.chr6.vcf.bgz chr6:29440751-29440751 | cut -f 1-5
chr6    29440751    rs2074464   A   C
chr6    29440751    rs2074464   A   G
chr6    29440751    rs2074464   A   T

The first row has AF=0.000148017, the second row has AF=0.586294, and the third row has AF=0.0592066.

I would like to filter this VCF so that when there are multiple rows with the same ID, only the one with the highest "AF" is kept. In this example, rows 1 and 3 would be filtered out.
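To make the intended behaviour concrete, here is a minimal Python sketch of that filter (not a bcftools solution). It assumes that AF is stored as an `AF=` key in the INFO column, that records sharing an ID are adjacent in the file (as split multiallelic records at one position are), and that records with a missing ID (`.`) should all be kept. The names `af_of` and `keep_highest_af` are made up for illustration.

```python
import re
from itertools import groupby

# Match the INFO key "AF" exactly, not keys that merely end in "AF".
AF_RE = re.compile(r"(?:^|;)AF=([^;,]+)")


def af_of(line):
    """Return INFO/AF of a VCF data line as a float, or -1.0 if absent or unparsable."""
    info = line.rstrip("\n").split("\t", 8)[7]  # INFO is the 8th column
    match = AF_RE.search(info)
    if match is None:
        return -1.0
    try:
        return float(match.group(1))
    except ValueError:  # e.g. "AF=."
        return -1.0


def record_id(line):
    """Grouping key: None for header lines, otherwise the ID column."""
    if line.startswith("#"):
        return None
    return line.split("\t", 3)[2]


def keep_highest_af(lines):
    """Yield headers unchanged; for each run of adjacent records that share
    an ID, yield only the record with the highest AF (first one on ties)."""
    for rid, group in groupby(lines, key=record_id):
        if rid is None or rid == ".":
            # Header lines and records without an ID pass through untouched.
            yield from group
        else:
            yield max(group, key=af_of)
```

Because it only ever holds one run of same-ID records in memory, the generator can be used as a streaming filter: wrap it as `sys.stdout.writelines(keep_highest_af(sys.stdin))` in a script and place that script between `bcftools view` and `bgzip` in a pipe. If same-ID records are not adjacent in the file, this sketch would keep one record per run rather than one per ID.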

I have been looking through bcftools documentation but I find it to be very brief and can't figure out a way to do this. These vcf files I'm using are massive so I would like to use a package and not do manipulations manually on the files.
