Linear normalised hash function for clustering gene sequences and identifying reference sequences from multiple sequence alignments
2012
Background
Comparative genomics has put additional demands on the assessment of similarity between sequences and their clustering as means for classification. However, defining the optimal number of clusters, cluster density and boundaries for sets of potentially related sequences of genes with variable degrees of polymorphism remains a significant challenge. The aim of this study was to develop a method that would identify the cluster centroids and the optimal number of clusters for a given sensitivity level and could work equally well for the different sequence datasets.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
44
References
2
Citations
NaN
KQI