Broad-spectrum respiratory tract pathogen identification using resequencing DNA microarrays

2006 
The need for advanced infectious disease diagnostic and surveillance systems has taken on new urgency with increased concern over bioterrorism agents as well as natural pathogens (e.g., Bacillus anthracis, coronavirus, avian influenza virus). A DNA microarray platform that can simultaneously detect and characterize many different types of human pathogens that cause similar symptoms has considerable potential for both medical use and national defense (Bodrossy and Sessitsch 2004; Cleland et al. 2004). DNA microarrays do this by simultaneously interrogating hundreds to thousands of immobilized DNA oligonucleotide probes, where each probe provides a single query for a known sequence that is unique to an organism or trait. The use of DNA microarrays for pathogen detection has gained prominence, leading to explosive growth in research (Bryant et al. 2004). The effective use of microarrays for pathogen detection requires the optimization of several factors, such as sample amplification, probe specificity, and interpretation strategy, in order to obtain unambiguous and reproducible results (Striebel et al. 2003). A major technical hurdle that limits the straightforward application of DNA microarrays to broad-spectrum pathogen diagnostics has been the requirement for specific amplification reagents and protocols (primarily PCR) to amplify chosen targets prior to microarray hybridization (Lopez et al. 2003; Striebel et al. 2003). A few random amplification strategies, used in conjunction with spotted microarrays, have been developed that apply multiple rounds of amplification to detect a broad spectrum of pathogens in complex biological samples (Wang et al. 2002; Vora et al. 2004). Another hurdle to using spotted microarrays is that the design of specific oligonucleotide probes for pathogen identification depends on assumptions regarding target sequence composition.
Long (50–70mer) oligonucleotide probes used in most prior studies have the disadvantage of decreased specificity (threshold for differentiation at 75%–87% sequence similarity), making it necessary to target multiple markers and rely on hybridization patterns for pathogen identification, which can lead to unquantifiable errors (Bodrossy and Sessitsch 2004). Nevertheless, these microarrays have provided a successful platform for screening a large number of pathogens at the viral family level via the use of highly conserved and hybridization mismatch-tolerant 70mer oligonucleotides (Wang et al. 2002, 2003). An additional problem with this format is that cross-hybridization occurs when two sequences share a high degree of similarity (Kothapalli et al. 2002). Careful data interpretation is needed to differentiate subtypes of pathogens using spotted microarrays and hybridization patterns. This approach does not produce direct genomic sequence as an output, but instead requires manual isolation and conventional DNA sequencing of captured pathogen targets (Wang et al. 2003). Thus, any incorporation of these concepts into a broad-spectrum diagnostic device for hundreds of pathogenic microorganisms and their variants will require a significant reduction in design, processing, and analysis steps. The exponentially increasing availability of microbial sequences makes it possible to envision the use of direct sequence for routine pathogen diagnostics and surveillance; however, this requires that pathogen sequence information be rapidly obtained.
“Resequencing” microarrays use “tiled” sets of 10^5 to 10^6 probes of either 25mers or 29mers, containing one perfectly matched and three mismatched probes per base for both strands of target genes (Hacia 1999). This array-based format, combined with specific PCR, has proven ideal for single nucleotide polymorphism (SNP) genotyping and phylogenetic analysis (Kozal et al. 1996; Gingeras et al. 1998; K. Wilson et al. 2002; W. Wilson et al. 2002; Wong et al. 2004). Because several types of variation in pathogen sequence (especially insertions/deletions or frequent multiple substitutions) can perturb hybridization patterns, these approaches used differential measures of specific pathogen hybridization patterns to identify individual sequence variants. That is, identification requires a priori knowledge of a differential hybridization pattern that is empirically determined in control experiments. Even when control experiments are carried out, these characteristic and conserved hybridization patterns do not always occur with highly diverse pathogen targets obtained from clinical specimens.
In this study, our overall objective was to demonstrate the utility of a resequencing microarray approach for simultaneous detection of respiratory pathogens in a format that can be used in a clinical environment without requiring the design of pathogen-specific PCR primers (W. Wilson et al. 2002) or fixed hybridization patterns (Gingeras et al. 1998). We chose to use a custom-designed Affymetrix resequencing Respiratory Pathogen Microarray (RPM v.1). Furthermore, we developed a method for automatic assembly of incomplete and disconnected pathogen sequence data into cumulative sequences amenable to similarity-based identification (e.g., Basic Local Alignment Search Tool [BLAST]; Altschul et al. 1990). The combination of a resequencing microarray with the application of statistical metrics to the raw output of the assay can allow unambiguous and reproducible sequence-based pathogen identification from clinical specimens. Our results demonstrate the feasibility of this approach for correct species- and strain-level identification, with unambiguous statistical interpretation, of adenovirus and influenza A strains at clinically relevant sensitivity levels.
This report further suggests the feasibility of using this technology for broad-spectrum surveillance of respiratory pathogens, while providing new information on the incidence of pathogen coinfection.
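The tiling scheme described above (one perfectly matched and three mismatched probes per base, on both strands) can be illustrated with a minimal sketch. It is not the array vendor's design software: the function names, the dictionary layout, and the choice to vary the central base of each probe are assumptions made for illustration, and probes are represented by the target-strand sequence they interrogate rather than by the complementary oligonucleotide that would be synthesized on the array.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_strand(seq: str, probe_len: int = 25) -> List[Dict[str, object]]:
    """Build one probe set per interrogated base of a single strand.

    Each set holds four probes that differ only at the central position
    (assumed here to be the interrogated base): one matches the reference
    perfectly, the other three carry a single mismatch.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so that a central base exists")
    half = probe_len // 2
    tiles = []
    # Only positions with a full flank on both sides can be interrogated.
    for center in range(half, len(seq) - half):
        window = seq[center - half:center + half + 1]
        probes = {
            base: window[:half] + base + window[half + 1:]
            for base in BASES
        }
        tiles.append({
            "position": center,
            "perfect_match": seq[center],
            "probes": probes,
        })
    return tiles


def tile_both_strands(seq: str, probe_len: int = 25) -> Dict[str, List[Dict[str, object]]]:
    """Tile the reference strand and its reverse complement."""
    return {
        "sense": tile_strand(seq, probe_len),
        "antisense": tile_strand(reverse_complement(seq), probe_len),
    }


if __name__ == "__main__":
    reference = "ACGT" * 10  # hypothetical 40-base reference, for illustration only
    tiles = tile_both_strands(reference)
    total = sum(len(t["probes"]) for strand in tiles.values() for t in strand)
    print(f"interrogated bases per strand: {len(tiles['sense'])}")
    print(f"total probes: {total}")
```

Under this scheme each interrogated base costs eight probes (four per strand), which is why an array of 10^5 to 10^6 probes is needed to resequence even a modest set of target genes.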