Uncovering complementary sets of variants for the prediction of quantitative phenotypes

2020 
Recent genome-wide association studies (GWAS) show that mutations at single genetic loci, frequently called single nucleotide polymorphisms (SNPs), are not sufficient on their own to explain the heritability of complex, quantitative phenotypes. Many methods therefore consider a set of loci that together characterize the phenotype. While state-of-the-art methods succeed in selecting subsets of SNPs that achieve high phenotype prediction performance, they are either slow to run or have hyper-parameters that require fine-tuning through cross-validation or similar techniques, which makes them inconvenient to use. In this work, we propose a fast and simple algorithm named Macarons that selects a small, complementary subset of SNPs by avoiding redundant pairs of SNPs that are likely to be in linkage disequilibrium (LD). Our method features two interpretable parameters that control the time/performance trade-off without requiring any hyper-parameter optimization procedure. In our experiments, we benchmark SNP selection methods on the 17 flowering time phenotypes of Arabidopsis thaliana. Our results consistently show that Macarons achieves similar or better phenotype prediction performance while being faster and having a simpler premise than other SNP selection methods.
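The idea of picking complementary SNPs while skipping redundant, LD-linked ones can be illustrated with a minimal sketch. This is not the Macarons algorithm itself: the abstract does not name its two parameters or its scoring function, so the function `select_complementary_snps`, the parameters `k` (subset size) and `max_r2` (redundancy cutoff), and the use of squared Pearson correlation as a stand-in for LD are all assumptions made purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_complementary_snps(genotypes, phenotype, k, max_r2):
    """Greedily pick up to k SNP indices (hypothetical helper, not the paper's method).

    genotypes: list of SNP columns, each a list of allele counts per sample.
    phenotype: list of quantitative phenotype values per sample.
    k:         assumed parameter, maximum number of SNPs to select.
    max_r2:    assumed parameter, a candidate is treated as redundant if its
               squared correlation with an already selected SNP exceeds this.
    """
    # Rank SNPs by strength of marginal association with the phenotype.
    scores = [abs(pearson(col, phenotype)) for col in genotypes]
    ranked = sorted(range(len(genotypes)), key=lambda j: scores[j], reverse=True)

    selected = []
    for j in ranked:
        if len(selected) == k:
            break
        # Skip candidates that look redundant with something already chosen.
        redundant = any(
            pearson(genotypes[j], genotypes[s]) ** 2 > max_r2 for s in selected
        )
        if not redundant:
            selected.append(j)
    return selected


# Toy data: SNP 1 duplicates SNP 0 (perfectly correlated), SNP 2 is independent of SNP 0.
toy_genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [2, 0, 1, 2, 0, 1],
]
toy_phenotype = [0.1, 1.0, 2.1, 1.2, 0.0, 1.9]
toy_selection = select_complementary_snps(toy_genotypes, toy_phenotype, k=2, max_r2=0.8)
```

On this toy input the duplicate SNP is skipped, so `toy_selection` holds indices 0 and 2. In this sketch, a smaller `k` or a stricter `max_r2` means fewer pairwise comparisons and a smaller subset, which loosely mirrors the kind of time/performance trade-off described above.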