Completion of HLA protein sequences by automated homology-based nearest-neighbor extrapolation of HLA database sequences

2016 
The IMGT/HLA database contains every publicly available HLA sequence. However, most of these HLA protein sequences are restricted to the alpha-1/alpha-2 domain for HLA class-I and the alpha-1/beta-1 domain for HLA class-II. Nevertheless, polymorphism outside these domains may also play a role in alloreactivity after transplantation. Several computational algorithms that aim to predict alloreactivity after transplantation, such as HLAMatchmaker and the PIRCHE algorithm, require most or all of the HLA protein sequence as input for their prediction. In this study we describe an automated homology-based nearest-neighbor method to extrapolate incomplete HLA protein sequences. To gain insight into the reliability of our extrapolation method, we performed a 10-fold cross-validation. The majority of the amino acid positions of the individual HLA class-I and -II proteins were correctly predicted. Eplets as defined by HLAMatchmaker were located both at correctly predicted and at less reliably predicted amino acid positions. Moreover, six out of seven completely sequenced HLA alleles with previously unknown exon sequences were in agreement with the extrapolated amino acid sequences. In conclusion, incomplete HLA sequences can be completed by a homology-based nearest-neighbor principle. Since this method is automated, incomplete HLA sequences submitted in the future can easily be extrapolated.
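The nearest-neighbor principle can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or alignment handling, so the sketch assumes pre-aligned sequences of equal length, an unknown-position marker `*`, and a simple mismatch count over the known positions. The function names `mismatches_on_known` and `extrapolate` and the toy sequences are hypothetical and are not real HLA data.

```python
def mismatches_on_known(partial, candidate, unknown="*"):
    """Count mismatches at positions where the partial sequence is known."""
    return sum(1 for p, c in zip(partial, candidate) if p != unknown and p != c)


def extrapolate(partial, complete_refs, unknown="*"):
    """Fill unknown positions of `partial` from its nearest complete neighbor.

    Assumption: all sequences are already aligned and have the same length.
    Returns the completed sequence and the name of the donor sequence.
    """
    if not complete_refs:
        raise ValueError("at least one complete reference sequence is required")
    for name, seq in complete_refs.items():
        if len(seq) != len(partial):
            raise ValueError(f"sequence {name!r} is not aligned to the partial sequence")

    # Nearest neighbor = fewest mismatches over the known positions.
    # Iterating over sorted names makes tie-breaking deterministic.
    donor_name = min(
        sorted(complete_refs),
        key=lambda n: mismatches_on_known(partial, complete_refs[n], unknown),
    )
    donor = complete_refs[donor_name]

    # Keep every known residue; copy the donor residue only where unknown.
    completed = "".join(d if p == unknown else p for p, d in zip(partial, donor))
    return completed, donor_name


# Toy example (made-up sequences, not real HLA alleles).
refs = {"ref_a": "MAVTRPLL", "ref_b": "MAVSRPQL"}
completed, donor = extrapolate("**VSRP**", refs)
print(completed, donor)
```

In this toy case the known region matches `ref_b` exactly and differs from `ref_a` at one position, so the unknown flanks are copied from `ref_b`. A cross-validation like the one described above could be approximated by masking known regions of complete sequences, extrapolating them, and comparing the result position by position.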