CRIS: Complete Reconstruction of Immunoglobulin V-D-J Sequences from RNA-seq data

2021 
Motivation B cells display remarkable diversity in producing B-cell receptors through recombination of immunoglobulin (Ig) V-D-J genes. Somatic hypermutation (SHM) of immunoglobulin heavy chain variable (IGHV) genes are used as a prognostic marker in B-cell malignancies. Clinically, IGHV mutation status is determined by targeted Sanger sequencing which is a resource-intensive and low-throughput procedure. Here, we describe a bioinformatic pipeline, CRIS (Complete Reconstruction of Immunoglobulin IGHV-D-J Sequences) that uses RNA sequencing (RNA-seq) datasets to reconstruct IGHV-D-J sequences and determine IGHV SHM status. Results CRIS extracts RNA-seq reads aligned to Ig gene loci, performs assembly of Ig transcripts and aligns the resulting contigs to reference Ig sequences to enumerate and classify SHMs in the IGHV gene sequence. CRIS improves on existing tools that infer the B-cell receptor repertoire from RNA-seq data using a portion IGHV gene segment by de novo assembly. We show that the SHM status identified by CRIS using the entire IGHV gene segment is highly concordant with clinical classification in three independent chronic lymphocytic leukemia patient cohorts. Availability and implementation The CRIS pipeline is available under the MIT License from https://github.com/Rashedul/CRIS. Supplementary information Supplementary data are available at Bioinformatics Advances online.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    32
    References
    0
    Citations
    NaN
    KQI
    []