BPGA - an ultra-fast pan-genome analysis pipeline
2016
Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a
paradigm shift in microbial genomics, from comparisons of a few genomes to large-scale pan-genome studies
at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating
the genomic diversity of a dataset, determining the core (conserved), accessory (dispensable) and
unique (strain-specific) gene pools of a species, tracing horizontal gene flux across strains and providing
insight into species evolution. Existing pan-genome software tools suffer from various limitations,
such as limits on dataset size, difficult installation and demanding requirements, and inadequate
functional features. Here we present BPGA (Bacterial Pan Genome Analysis tool), an ultra-fast
computational pipeline with seven functional modules. In addition to routine pan-genome analyses,
BPGA introduces a number of novel features for downstream analysis, such as core/pan/MLST
(Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains,
subset analysis, atypical G+C content analysis, and KEGG and COG mapping of core, accessory and
unique genes. Other notable features include minimal running prerequisites, freedom to select the
gene clustering method, ultra-fast execution, a user-friendly command-line interface and high-quality
graphical output. The performance of BPGA has been evaluated on a dataset of complete genome
sequences of 28 Streptococcus pyogenes strains.
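To make the core/accessory/unique definitions concrete, here is a minimal Python sketch that partitions gene clusters using a binary presence/absence matrix. It assumes the clustering step (whichever method the user selects) has already produced one 0/1 vector per cluster, with one entry per strain. It applies the usual definitions: core clusters occur in every strain, unique clusters in exactly one, and accessory clusters in more than one but not all. The function name `classify_clusters` and the toy cluster IDs are hypothetical and do not reflect BPGA's own code or file formats.

```python
from typing import Dict, List


def classify_clusters(matrix: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Split gene clusters into core, accessory and unique sets.

    Hypothetical helper: `matrix` maps a cluster ID to a 0/1 presence
    vector with one entry per strain (same strain order for every cluster).
    """
    classes: Dict[str, List[str]] = {"core": [], "accessory": [], "unique": []}
    for cluster_id, presence in matrix.items():
        n_strains = len(presence)
        n_present = sum(1 for p in presence if p)
        if n_present == n_strains:
            classes["core"].append(cluster_id)       # present in all strains
        elif n_present == 1:
            classes["unique"].append(cluster_id)     # strain-specific
        elif n_present > 1:
            classes["accessory"].append(cluster_id)  # present in some strains
        # clusters present in no strain (n_present == 0) are ignored
    return classes


# Toy presence/absence matrix for three strains (illustrative data only).
pa = {
    "clusterA": [1, 1, 1],
    "clusterB": [1, 0, 1],
    "clusterC": [0, 0, 1],
}
print(classify_clusters(pa))
# {'core': ['clusterA'], 'accessory': ['clusterB'], 'unique': ['clusterC']}
```

The same matrix is also the natural starting point for the exclusive presence/absence question: a cluster present only in a chosen subset of strains can be found by comparing each presence vector against that subset.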
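The atypical G+C content analysis can be pictured with a similarly hedged sketch. The abstract does not state BPGA's criterion, so this example assumes a simple rule: flag genes whose G+C fraction lies more than `k` population standard deviations from the mean over all supplied genes. The function names `gc_content` and `atypical_gc_genes`, the default `k = 1.5` and the toy sequences are illustrative assumptions, not BPGA's actual parameters.

```python
from statistics import mean, pstdev
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Return the G+C fraction of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_genes(genes: Dict[str, str], k: float = 1.5) -> List[str]:
    """Return IDs of genes whose G+C fraction deviates from the mean of all
    genes by more than k population standard deviations.

    Assumed rule (hypothetical threshold k), not BPGA's documented criterion.
    """
    gc = {gid: gc_content(s) for gid, s in genes.items()}
    values = list(gc.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []  # all genes share the same G+C fraction; nothing is atypical
    return [gid for gid, v in gc.items() if abs(v - mu) > k * sd]


# Five genes at 50% G+C and one at 100% G+C (illustrative data only).
genes = {f"g{i}": "ATGC" for i in range(1, 6)}
genes["g6"] = "GGCC"
print(atypical_gc_genes(genes))  # ['g6']
```

A deviation-from-mean rule is only one possible choice; a fixed percentage-point cutoff relative to the genome average would be an equally simple alternative.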