Using a Random Forest proximity measure for variable importance stratification in genotypic data

Jose A. Seoane, Ian N. M. Day, Colin Campbell, Juan P. Casas, Tom R. Gaunt

2014

In this work we study variable significance in classification using the Random Forest proximity matrix and local importance matrix. We use the proximity matrix to group the samples into a number of clusters and use these clusters to stratify the importance of a variable. We apply this approach to a cardiovascular genotype dataset for sample classification based on coronary heart disease, and we found a number of variations related to cardiovascular disease phenotypes. We also used a set of phenotypes related to this genotype data to match the obtained clusters with coronary heart disease phenotypes.
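The three steps named in the abstract (proximity, clustering, stratified importance) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which clustering algorithm was used or how the local importance matrix was computed, so the sketch assumes both inputs are supplied by a forest library (for example, per-tree leaf indices such as those returned by scikit-learn's `RandomForestClassifier.apply`) and swaps in a simple proximity-threshold grouping for the clustering step. The function names `proximity_matrix`, `threshold_clusters` and `stratified_importance` are hypothetical.

```python
from collections import defaultdict


def proximity_matrix(leaves):
    """Proximity of two samples = fraction of trees in which they share a leaf.

    leaves[i][t] is the leaf index reached by sample i in tree t.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def threshold_clusters(prox, threshold):
    """Assumed clustering step: connected components of the graph that
    links every pair of samples whose proximity is >= threshold."""
    n = len(prox)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and prox[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def stratified_importance(local_importance, labels):
    """Mean local importance of each variable within each cluster.

    local_importance[i][v] is the importance of variable v for sample i.
    """
    groups = defaultdict(list)
    for row, label in zip(local_importance, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


if __name__ == "__main__":
    # Toy inputs (made up for illustration): 5 samples, 3 trees, 2 variables.
    leaves = [[0, 1, 2], [0, 1, 2], [0, 1, 3], [4, 5, 6], [4, 5, 6]]
    local_imp = [[0.2, 0.0], [0.4, 0.0], [0.3, 0.0], [0.0, 0.5], [0.0, 0.7]]
    labels = threshold_clusters(proximity_matrix(leaves), threshold=0.5)
    print(labels)
    print(stratified_importance(local_imp, labels))
```

In the toy data the first variable matters only for samples in one cluster and the second only in the other, which is the kind of pattern a single forest-wide importance score would average away.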

Keywords:

Genetics
Coronary heart diseases
Disease
Heart disease
Genotype
Random forest
Biology
Bioinformatics
proximity measure
Combinatorics
disease phenotype
sample classification
Data mining
coronary heart disease

