    predictions to supporting functional evidence. We provide an interactive resource for exploring driver missense mutations identified from the TCGA and a user-friendly tool to predict whether newly observed mutations from further sequencing are likely cancer drivers. Last, we examine the diversity of driver missense mutations across various types of cancer, which leads to a refined understanding of the likely trajectory of driver missense mutation discovery with further sequencing.
    Overview of CHASMplus
    We have developed a method named CHASMplus that uses machine learning to discriminate somatic missense mutations (referred to hereafter as missense mutations) as either cancer drivers or passengers (Figure 1A; STAR Methods). In contrast to our recent analysis of TCGA mutations (Bailey et al., 2018), the method is designed so that predictions can be made in a cancer-type-specific manner (Figure 1B), as opposed to only being considered across multiple cancer types in aggregate (“pan-cancer”). To generate predictions, CHASMplus is trained using somatic mutation calls from TCGA covering 8,657 samples in 32 cancer types (Figure S1; Table S1; STAR Methods). Because there is no gold standard set of driver and passenger missense mutations, we developed a semi-supervised approach to assign class labels to missense mutations. Finally, mutation scores from CHASMplus are weighted by a driver gene score for the respective gene, producing gene-weighted (gwCHASMplus) scores (STAR Methods).
    CHASMplus Predicts Cancer-Type Specificity of Driver Missense Mutations
    CHASMplus provides a predictive model for each of 32 cancer types sequenced by TCGA. In contrast, most previous methods provide a single impact score for each missense mutation (Adzhubei et al., 2010; Carter et al., 2013; Gonzalez-Perez et al., 2012; Ioannidis et al., 2016; Jagadeesh et al., 2016; Kumar et al., 2016; Ng and Henikoff, 2001; Reva et al., 2011; Shihab et al., 2013), regardless of cancer type. However, two methods (CHASM [Carter et al., 2009] and CanDrA [Mao et al., 2013]) do provide cancer-type-specific prediction models, but this capability has not been validated. To illustrate the significant advance in cancer-specific prediction made by CHASMplus, we compared the cancer type specificity of CHASMplus to CHASM and CanDrA, along with, for reference, two additional methods (ParsSNP [Kumar et al., 2016] and REVEL [Ioannidis et al., 2016]) that are not cancer-type specific.
    First, a cancer-type-specific model should accurately predict the oncogenic effects of missense mutations in an appropriate cell line (Fröhling et al., 2007; Wan et al., 2004). We therefore compared predictions of breast-cancer-specific CHASMplus, CHASM, and CanDrA models in known breast cancer driver genes to a previous large-scale validation of 698 missense mutations in MCF10A (breast epithelium) cells that measured cell viability (Ng et al., 2018) (Figure 2A; STAR Methods). We used the area under the receiver operating characteristic curve (auROC) as a performance metric, similar to many prior studies of variant effect prediction (Adzhubei et al., 2010; Ioannidis et al., 2016; Kircher et al., 2014; Kumar et al., 2016; Mao et al., 2013). In general, auROC values range from 0.5 (random prediction performance) to 1.0 (perfect). We found that CHASMplus had a substantially higher auROC than CHASM and CanDrA (p < 2.2e-16; DeLong test; Table S2). It was also significantly higher than ParsSNP, which is not cancer-type specific, and REVEL, a general-purpose pathogenicity predictor (p < 2.2e-16; DeLong test). In fact, CanDrA and CHASM had a lower auROC than ParsSNP, suggesting that these prior methods captured only a limited amount of cancer-type specificity.
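To make the auROC metric concrete, the following is a minimal sketch of how it can be computed from a set of prediction scores and binary validation labels. It uses the standard pairwise-ranking (Mann-Whitney) identity, not the evaluation code from this study. The function name `auroc` and the toy scores and labels are hypothetical and chosen only for illustration; they are not data from the MCF10A benchmark.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) identity.

    scores: predicted scores, where higher means more driver-like
    labels: 1 for a functionally validated mutation, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # auROC equals the fraction of (positive, negative) pairs in which the
    # positive example receives the higher score; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (hypothetical, not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
toy_auroc = auroc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

In this toy case one validated mutation (score 0.3) is ranked below one negative (score 0.4), so 8 of the 9 positive/negative pairs are ordered correctly. A predictor that assigns every mutation the same score yields 0.5, matching the random-performance baseline described above. Testing whether two methods differ in auROC on the same mutations, as the DeLong test does, requires a separate procedure that this sketch does not cover.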