Imputation (genetics)

Imputation in genetics refers to the statistical inference of unobserved genotypes. It is achieved by using known haplotypes in a population, for instance from the HapMap or the 1000 Genomes Project in humans. This makes it possible to test for association between a trait of interest (e.g. a disease) and genetic variants that were not typed experimentally but whose genotypes have been statistically inferred ('imputed'). Genotype imputation is usually performed on single-nucleotide polymorphisms (SNPs), the most common kind of genetic variation. Genotype imputation therefore helps considerably in narrowing down the location of probable causal variants in genome-wide association studies: the genome size remains constant while the number of genotyped or imputed variants increases, so SNP density rises and the distance between adjacent SNPs shrinks.

In genetic epidemiology and quantitative genetics, researchers aim to identify genomic locations where variation between individuals is associated with variation in traits of interest. Such studies hence require access to the genetic make-up of a set of individuals. Sequencing the whole genome of each individual in the study is often too costly, so only a subset of the genome can be measured. This often means, first, considering only SNPs and neglecting copy number variants, and second, measuring only SNPs known to be variable enough in the population that they are likely to be variable in the set of individuals under consideration as well. The most informative subset of SNPs is chosen based on the distribution of common genetic variation along the genome, for instance as produced by the HapMap or the 1000 Genomes Project in humans. These SNPs are then used to build a microarray, thereby allowing each individual in the study to be genotyped at all these SNPs simultaneously.
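
The idea of inferring untyped genotypes from known haplotypes can be illustrated with a deliberately simplified sketch. It assumes a tiny, made-up reference panel of haplotypes coded as 0/1 alleles and a study haplotype in which untyped SNPs are marked `None`. The function name `impute_haplotype`, the data and the nearest-haplotype matching rule are all hypothetical choices made for illustration; real imputation software relies on more elaborate probabilistic models, and this is not the method of any particular tool.

```python
from collections import Counter


def impute_haplotype(observed, reference_panel):
    """Fill untyped positions (None) using the closest reference haplotypes.

    Toy rule (an assumption of this sketch): pick the reference haplotypes
    with the fewest mismatches at the typed SNPs, then copy the most common
    allele among them at each untyped SNP.
    """
    # Indices of SNPs that were actually genotyped (e.g. on the microarray).
    typed = [i for i, allele in enumerate(observed) if allele is not None]

    def mismatches(ref):
        # Number of typed SNPs at which the reference haplotype disagrees.
        return sum(ref[i] != observed[i] for i in typed)

    best = min(mismatches(ref) for ref in reference_panel)
    closest = [ref for ref in reference_panel if mismatches(ref) == best]

    imputed = list(observed)
    for i, allele in enumerate(observed):
        if allele is None:
            counts = Counter(ref[i] for ref in closest)
            imputed[i] = counts.most_common(1)[0][0]
    return imputed


# Hypothetical reference panel: four haplotypes over five SNPs.
reference_panel = [
    (0, 0, 1, 0, 1),
    (0, 0, 1, 0, 1),
    (1, 1, 0, 1, 0),
    (1, 0, 0, 1, 1),
]

# Study haplotype typed at SNPs 0, 2 and 4 only.
observed = [0, None, 1, None, 1]
result = impute_haplotype(observed, reference_panel)
print(result)
```

In this example the study haplotype matches the first two reference haplotypes exactly at the three typed SNPs, so the two untyped SNPs are filled in from them. The number of SNPs available for association testing rises from three to five without any additional genotyping, which is the increase in SNP density described above.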
