Aequatus: an open-source homology browser

2018 
Background: The phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of gene families. The study of homology also plays a vital role in finding ancestral gene duplication events and in identifying regions that are under positive selection within species. Conservation of homologous loci results in syntenic blocks, and various tools are available to visualise syntenic information between species. These tools provide an overview of syntenic regions as a whole, reaching down to the gene level, but none provides any information about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes.

Findings: We present Aequatus, a standalone web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualisations. It relies on pre-calculated alignment and gene feature information held in an Ensembl database, typically generated through, but not limited to, the Ensembl Compara workflow. We also offer Aequatus.js, a reusable JavaScript module that fulfils the visualisation aspects of Aequatus.

Availability: Aequatus is an open-source tool freely available to download under the GPLv3 license at https://github.com/TGAC/Aequatus, and a demo is available at http://aequatus.tgac.ac.uk

Contact: Anil.Thanki@tgac.ac.uk and Robert.Davey@tgac.ac.uk

Introduction

Inferring the homology of genes across or within species is a commonplace technique for investigating synteny [1]. The inference process involves carrying out multiple sequence alignments, which comprise multiple steps and can be computationally intensive even for small numbers of data points [2]. Many methods are available for inferring genome-wide orthology, for example MSOAR [3], OrthoMCL [4], HomoloGene [5], TreeFam [6], and TreeBeST [7].
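To make the idea of conserved ancestral exon boundaries (raised in the Background) more concrete, the following is a minimal sketch of one way such conservation could be detected: each gene's exon boundaries are projected from its own residue coordinates onto the columns of a gapped multiple alignment, and the number of genes sharing each boundary column is counted. It is an illustration under stated assumptions, not Aequatus's implementation. The input format (gapped sequence strings plus exon lengths in residues), the functions `boundary_columns` and `shared_boundaries`, and the toy genes are all hypothetical; Aequatus itself reads pre-calculated alignments and gene features from an Ensembl database.

```python
from collections import Counter


def boundary_columns(aligned_seq: str, exon_lengths: list[int]) -> set[int]:
    """Return the alignment columns at which this gene's exons end.

    Hypothetical convention: a boundary after the k-th residue is placed at
    the (0-based) alignment column holding that k-th residue, so gaps that
    follow it do not shift the boundary.
    """
    ungapped_length = sum(1 for ch in aligned_seq if ch != "-")
    if sum(exon_lengths) != ungapped_length:
        raise ValueError("exon lengths must add up to the ungapped sequence length")

    # Internal boundaries in ungapped coordinates; the last exon's end is the
    # end of the sequence, not an exon-exon boundary.
    wanted = set()
    total = 0
    for length in exon_lengths[:-1]:
        total += length
        wanted.add(total)

    # Walk the alignment, counting residues, and record the column where each
    # boundary residue sits.
    columns = set()
    residues_seen = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            residues_seen += 1
            if residues_seen in wanted:
                columns.add(col)
    return columns


def shared_boundaries(genes: dict[str, tuple[str, list[int]]],
                      min_genes: int) -> dict[int, int]:
    """Count genes per boundary column; keep columns shared by >= min_genes."""
    counts = Counter()
    for aligned_seq, exon_lengths in genes.values():
        counts.update(boundary_columns(aligned_seq, exon_lengths))
    return {col: n for col, n in sorted(counts.items()) if n >= min_genes}


# Hypothetical toy family: three aligned proteins with exon lengths in residues.
genes = {
    "gene_a": ("MKT-LLVAGR", [3, 6]),
    "gene_b": ("MKTALL-AGR", [3, 6]),
    "gene_c": ("MK--LLVAGR", [5, 3]),
}

print(shared_boundaries(genes, min_genes=2))  # {2: 2}
```

Comparing boundaries in alignment columns rather than in each gene's own coordinates is what lets genes of different lengths be compared directly: here `gene_a` and `gene_b` share a boundary at column 2, while `gene_c` has its only boundary elsewhere.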
TreeBeST produces combined results based on species trees and on dN/dS, nucleotide, and protein measures, unlike the other methods, which typically provide clustering without considering a given species tree topology. PhyOP [8] also uses a tree-based method, but it is useful only for closely related species. For these reasons, TreeBeST is used in the Ensembl Compara pipeline [9], a computational workflow developed by the Ensembl Compara team to infer familial relationships that includes clustering, multiple alignment, and tree generation. The Ensembl Compara schema can store comparative data such as gene families, syntenic regions, and protein families, and the Ensembl Core database stores gene feature information and other genomic annotations at the species level. The Ensembl project (release 84) at EMBL-EBI houses 87 species [10] across its production and early access websites, and provides precomputed multiple alignments and gene family information for 70 vertebrate species among them. There are many ways to represent and view comparative datasets; the traditional method is phylogenetic trees, but tools such as the Ensembl Browser [11], Genomicus [12], SyMap [13], and MizBee [14] are also used. These tools provide an overview of syntenic regions as a whole, with some reaching down to the level of gene order and orientation. However, whilst retaining ancestral information, phylogenetic trees do not represent the underlying

bioRxiv preprint first posted online May 27, 2016; doi: http://dx.doi.org/10.1101/055632. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.