Influenza Classification from Short Reads with VAPOR Facilitates Robust Mapping Pipelines and Zoonotic Strain Detection for Routine Surveillance Applications

2019 
Background: Influenza viruses are associated with a significant global public health burden. The segmented RNA genome of influenza changes continually due to mutation, and the accumulation of these changes within the antigenic recognition sites of haemagglutinin (HA) and neuraminidase (NA) in turn leads to annual epidemics. Influenza A is also zoonotic, allowing for exchange of segments between human and non-human viruses, resulting in new strains with pandemic potential. These processes necessitate a global surveillance system for influenza monitoring. To this end, whole-genome sequencing (WGS) has begun to emerge as a useful tool. However, due to the diversity and mutability of the influenza genome, and noise in short-read data, bioinformatics processing can present challenges.
Results: Conventional mapping approaches can be insufficient when a sub-optimal reference strain is chosen. For short-read datasets simulated from influenza H1N1 HA sequences, read recovery after single-reference mapping was routinely as low as 90% for human-origin influenza sequences, and often lower than 10% for those from avian hosts. To address this, we developed a de Bruijn Graph (DBG)-based classifier of influenza WGS datasets: VAPOR. In real data benchmarking using 257 WGS read sets with corresponding de novo assemblies, VAPOR provided classifications for all samples with a mean of >99.8% identity to assembled contigs. This resulted in an increase in the number of mapped reads by 6.8% on average, up to a maximum of 13.3%. Additionally, using simulations, we demonstrate that classification from reads may be applied to detection of reassorted strains.
Conclusions: VAPOR has potential to simplify bioinformatics pipelines for surveillance by providing a novel method for detecting influenza strains of human and non-human origin directly from reads, minimizing the potential data loss and bias associated with conventional mapping, and allowing visualization of alignments that would otherwise require slow de novo assembly. Whilst with expertise and time these pitfalls can largely be avoided, with pre-classification they are remedied in a single step. Furthermore, our algorithm could be adapted in future to surveillance of other RNA viruses. VAPOR is available at https://github.com/connor-lab/vapor.
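The idea of choosing a mapping reference directly from reads can be illustrated with a toy example. The sketch below is not VAPOR's DBG-based algorithm; it is a much simpler stand-in that scores each candidate reference by the fraction of read k-mers it contains. The function names, the value of k, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_reference(reads, references, k=5):
    """Score each reference by the fraction of read k-mers it contains.

    Simplified illustration only: a real classifier would also need to
    handle sequencing errors, reverse complements, and large databases.
    """
    # Count how many reads contain each k-mer.
    read_kmers = Counter()
    for read in reads:
        read_kmers.update(kmers(read, k))
    total = sum(read_kmers.values())

    scores = {}
    for name, seq in references.items():
        ref_kmers = kmers(seq, k)
        shared = sum(n for km, n in read_kmers.items() if km in ref_kmers)
        scores[name] = shared / total if total else 0.0

    # The highest-scoring reference is the suggested mapping target.
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): both reads are exact substrings of ref_a.
references = {
    "ref_a": "ACGTACGTTAGCCGATAGGCTA",
    "ref_b": "ACGTTCGTTAGGCGATTGGCTA",
}
reads = ["ACGTACGTTAGC", "TAGCCGATAGGC"]

best, scores = pick_reference(reads, references)
print(best, scores)
```

In a pipeline, the selected reference would then be passed to a conventional read mapper, which is the pre-classification step the abstract describes.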