Trouble with Phenotype File in PLINK GWAS - 0 Individuals with Non-Missing Phenotypes

Problem Description: I am running a genome-wide association study (GWAS) with PLINK. Although I specify a phenotype file and have confirmed that the phenotype column ('ChildPhenotype') is present in it, PLINK consistently reports: "0 individuals have non-missing phenotypes." None of the values in that column is -9 (PLINK's default missing-phenotype code). I cannot identify the cause and need help resolving it.

GWAS Command:

plink --bfile NewH_Children_Rho0.0_B1000_N10000_h0.3 \
      --pheno NewH_ChildrenPhenotypes_Rho0.0_B1000_N10000_h0.3_forPlink.tab \
      --pheno-name ChildPhenotype \
      --linear \
      --allow-no-sex \
      --out NewH_Children_Rho0.0_B1000_N10000_h0.3_fromR_GWAS \
      --remove ChildValidation_N10000.txt \
      --noweb

Phenotype File (excerpt):

FID IID ChildPhenotype Normalized_ChildPRS ChildNoise Normalized_MomPRS Normalized_DadPRS
1   1   -0.752957   1.06091 -1.33404    0.905608    0.473768
2   2   -0.0834629  1.26574 -0.776737   -0.0526346  0.985595
3   3   0.167607    0.952674    -0.354195   0.797956    0.66072
4   4   -0.800988   -0.400058   -0.581867   -2.46413    0.249984
5   5   1.04002 1.54299 0.194889    0.6917  0.491152
6   6   -0.310458   -1.24833    0.373281    -0.532347   -0.555342

Observations:

The specified phenotype column, 'ChildPhenotype', is present in the phenotype file. None of the values in that column is -9, as confirmed by examining the file. The PLINK command includes the necessary parameters, --pheno and --pheno-name.

Request for Assistance:

Why does PLINK report 0 individuals with non-missing phenotypes even though the file contains valid phenotype values? Any suggestions for troubleshooting and resolving this would be greatly appreciated.

Note:

I have already checked for issues related to column names, column order, and missing values in the specified phenotype column. I have also tried both a tab-separated and a space-separated version of my phenotype file.

Answer:

I did not figure out how to make the GWAS run properly using a separate phenotype file. However, I simply copied the phenotype information into the fam file and removed the specification to use an external phenotype file while running my GWAS.

Here is how I modified my FAM file:

# Define prefix of bed/bim/fam files
bed_prefix="NewH_Children_Rho0.0_B1000_N10000_h0.3"

# Define name of phenotype file
pheno_file="NewH_ChildrenPhenotypes_Rho0.0_B1000_N10000_h0.3_forPlink.tab"

# Create a file identical to the fam file, except that the 6th column holds the values from the 3rd column of the phenotype file; the 'ChildPhenotype' header line is skipped because the fam file has no header
# Note: rows are paired by position, so the fam file and the phenotype file must list individuals in the same order
awk 'BEGIN {OFS="\t"} NR==FNR {if (FNR > 1) a[FNR-1]=$3; next} {print $1, $2, $3, $4, $5, a[FNR]}' "${pheno_file}" "${bed_prefix}.fam" > modified.fam

# Remove the original fam file so you can overwrite it with the new fam file without causing any conflicts
rm "${bed_prefix}.fam"

# Copy the modified fam to a fam file with the same name as the original
cp modified.fam "${bed_prefix}.fam"

# Remove the modified fam, as it now duplicates the contents of "${bed_prefix}.fam"
rm modified.fam
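
The awk command above pairs rows by position. If the two files might not be in the same order, a variant that looks up each phenotype by its FID/IID pair may be safer. This is only a sketch, not what I ran: the function name merge_pheno_into_fam is hypothetical, and it assumes the phenotype file has one header line, whitespace-separated columns, and that individuals absent from the phenotype file should get -9.

```shell
# Hypothetical helper: print a fam file whose 6th column is looked up
# by FID/IID in the phenotype file instead of by row position.
# Usage: merge_pheno_into_fam <fam_file> <pheno_file> [pheno_column, default 3]
merge_pheno_into_fam() {
    fam_file="$1"
    pheno_file="$2"
    pheno_col="${3:-3}"
    awk -v col="$pheno_col" '
        # First file (phenotypes): skip the header, store value keyed by FID/IID
        NR == FNR { if (FNR > 1) pheno[$1 SUBSEP $2] = $col; next }
        # Second file (fam): keep columns 1-5, fill column 6 from the lookup,
        # falling back to -9 when the individual has no phenotype row
        {
            key = $1 SUBSEP $2
            print $1, $2, $3, $4, $5, ((key in pheno) ? pheno[key] : -9)
        }
    ' "$pheno_file" "$fam_file"
}
```

It writes to standard output, so it could be used as merge_pheno_into_fam "${bed_prefix}.fam" "${pheno_file}" > modified.fam in place of the awk line.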

Here is how I ran my GWAS:

plink --bfile NewH_Children_Rho0.0_B1000_N10000_h0.3 \
      --linear \
      --allow-no-sex \
      --out NewH_Children_Rho0.0_B1000_N10000_h0.3_fromR_GWAS \
      --remove ChildValidation_N10000.txt \
      --noweb

I now see 10000 individuals with non-missing phenotypes in my GWAS log output. Since there are 10000 individuals in my FAM file, I know that PLINK is no longer erroneously reading individuals' phenotypes as missing.
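
For anyone who still wants the external phenotype file to work, one thing that may be worth ruling out is an ID mismatch. PLINK pairs each row of the --pheno file with the fam file by FID and IID, so phenotype rows whose IDs do not appear in the fam file are not used. I have not confirmed that this was the cause in my case. The sketch below counts how many fam individuals have a matching FID/IID pair in the phenotype file; the function name count_matching_ids is hypothetical, and it assumes the phenotype file has one header line and whitespace-separated columns.

```shell
# Hypothetical helper: count fam individuals whose FID/IID pair
# also appears in the phenotype file.
# Usage: count_matching_ids <fam_file> <pheno_file>
count_matching_ids() {
    fam_file="$1"
    pheno_file="$2"
    awk '
        # First file (phenotypes): skip the header, remember each FID/IID pair
        NR == FNR { if (FNR > 1) seen[$1 SUBSEP $2] = 1; next }
        # Second file (fam): count rows whose FID/IID pair was remembered
        (($1 SUBSEP $2) in seen) { n++ }
        END { print n + 0 }
    ' "$pheno_file" "$fam_file"
}
```

A result of 0 from count_matching_ids "${bed_prefix}.fam" "${pheno_file}" would suggest that the IDs in the two files differ (for example, in the FID column), which would be consistent with every phenotype being read as missing.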