In silico saturation mutagenesis of cancer genes.

2021 
Despite the existence of good catalogues of cancer genes [1,2], identifying the specific mutations of those genes that drive tumorigenesis across tumour types is still a largely unsolved problem. As a result, most mutations identified in cancer genes across tumours are of unknown significance to tumorigenesis [3]. We propose that the mutations observed in thousands of tumours (natural experiments testing their oncogenic potential, replicated across individuals and tissues) can be exploited to solve this problem. From these mutations, features that describe the mechanism of tumorigenesis of each cancer gene and tissue may be computed and used to build machine learning models that encapsulate these mechanisms. Here we demonstrate the feasibility of this solution by building and validating 185 gene–tissue-specific machine learning models that outperform experimental saturation mutagenesis in the identification of driver and passenger mutations. The models and their assessment of each mutation are designed to be interpretable, thus avoiding a black-box prediction device. Using these models, we outline the blueprints of potential driver mutations in cancer genes, and demonstrate the role of mutation probability in shaping the landscape of observed driver mutations. These blueprints will support the interpretation of newly sequenced tumours in patients and the study of the mechanisms of tumorigenesis of cancer genes across tissues. A new computational approach to in silico mutagenesis screening allows comprehensive mapping of cancer driver mutations.
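To make the term "in silico saturation mutagenesis" concrete, the following is a minimal sketch of the enumeration step only: listing every possible single-nucleotide substitution in a coding sequence and labelling its protein-level consequence under the standard genetic code. The function name `saturate`, the consequence labels, and the toy sequence are illustrative assumptions, not taken from the paper. The gene–tissue-specific features and models that classify each such mutation as driver or passenger are not reproduced here.

```python
from itertools import product

BASES = "ACGT"

# Standard genetic code, written in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AMINO)
}


def saturate(cds):
    """Yield (position, ref, alt, consequence) for every possible
    single-nucleotide substitution in a coding sequence.

    Hypothetical helper for illustration; `cds` must be an in-frame
    DNA string over A, C, G, T.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    for pos, ref in enumerate(cds):
        offset = pos % 3                      # position within the codon
        start = pos - offset                  # first base of the codon
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for alt in BASES:
            if alt == ref:
                continue
            mutated = codon[:offset] + alt + codon[offset + 1:]
            alt_aa = CODON_TABLE[mutated]
            if alt_aa == ref_aa:
                consequence = "synonymous"
            elif alt_aa == "*":
                consequence = "nonsense"
            elif ref_aa == "*":
                consequence = "stop_lost"
            else:
                consequence = "missense"
            yield pos, ref, alt, consequence


if __name__ == "__main__":
    # Toy two-codon sequence (Met-Trp); a real analysis would use a full CDS.
    for row in saturate("ATGTGG"):
        print(row)
```

A sequence of n bases yields 3n candidate substitutions, which is the space that per-gene, per-tissue models would then need to score.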