The Hunting of the Snark: Whither Genome-Wide Association Studies for Colorectal Cancer?

2016 
EDITORIALS

See “Identification of susceptibility loci and genes for colorectal cancer risk,” by Zeng C, Matsuda K, Jia W-H, et al, on page 1633.

In the Lewis Carroll nonsense poem, The Hunting of the Snark, a curious assembly of characters with a variety of dubious skills sets forth on a poorly defined quest to find the half-real snark. On the way, the pursuers use a number of approaches, and in turns collaborate, fall out with each other and, in at least one case, go mad. The only seeker who claims to find the snark disappears.

Genome-wide association studies (GWAS), based on thousands of cases and controls typed at thousands of single nucleotide polymorphisms (SNPs), have identified several variants that associate with gastrointestinal cancer risk. More than 30 colorectal cancer (CRC) predisposition SNPs are known, together with a smaller number of loci for other gastrointestinal cancers.¹⁻⁴ It is indisputable, however, that GWAS for other common cancers, notably breast and prostate, have been much more successful at finding larger numbers of SNPs. It sometimes feels that we will be writing the phrase, “known genetic variants explain only a small fraction of the heritability of gastrointestinal cancers,” for many years to come.

The manuscript by Zeng et al⁵ in this issue of Gastroenterology reports another very useful increase in our CRC genetics knowledge. This study, the largest carried out to date in Asian populations, identified 6 SNPs associated with CRC risk at genome-wide significance (P < 5 × 10⁻⁸), including rs4711689 at 6p21, rs2450115 and rs6469656 at 8q23, rs4919687 at 10q24, rs11064437 at 12p13, and rs6061231 at 20q13.
The most likely candidate genes affected by the functional variation at each of the 5 sites were respectively reported to be TFEB (involved in lysosomal biogenesis), EIF3H (initiation of translation), CYP17A1 (steroid synthesis), SPSB2 (proteasome), and RPS21 (ribosomal biogenesis). Several of these functions seem to be new in terms of CRC pathogenesis. Although most of these SNPs lie in noncoding regions, one of them (rs11064437) has a potential effect on protein sequence, as it falls within the intron 1 splice acceptor of SPSB2.

A consideration of 2 of the loci reported by Zeng et al illustrates some of the difficulties in pinning down the functional variation underlying tagSNP signals, especially when comparing ethnic groups. First, we address the question of how many independent CRC SNPs exist near EIF3H. Zeng et al found that 2 SNPs (rs2450115 and rs6469656), near EIF3H, mapped to a haplotype block harboring a previously reported CRC SNP in Europeans (rs16892766⁶), which happened to be monomorphic in Asians. Interestingly, rs2450115 and rs6469656, which are in mild linkage disequilibrium (LD) in Asians (r² = 0.20), remained nominally significant (P = 9.60 × 10⁻⁶ for rs2450115 and P = 8.30 × 10⁻⁴ for rs6469656) after joint association analysis by Zeng et al. These 2 SNPs were also tested individually in European case-control studies, where each was nominally associated with CRC (P = .0003 for rs2450115 and P = .02 for rs6469656). In Europeans, however, these 2 SNPs have stronger LD (r² = 0.40), and Zeng et al’s joint analysis showed that only rs2450115 remained nominally associated with CRC (P = .007), as we had found in our own previous fine-mapping study.⁷ Altogether, these observations are inconclusive, but are consistent with a scenario in which rs2450115 or a strongly correlated SNP is the most likely variant driving a single, independent chromosome 8q23 signal.
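
For readers less familiar with the LD statistic quoted here, r² between two biallelic SNPs can be computed from the four phased haplotype counts. The following is a minimal sketch in Python, not the procedure used by Zeng et al or by the 1000 Genomes Project; the function name `ld_r2`, its argument names, and the example counts are hypothetical and chosen only for illustration.

```python
def ld_r2(n_ab, n_aB, n_Ab, n_AB):
    """Return r^2 between two biallelic SNPs from phased haplotype counts.

    Each argument is the count of one haplotype; the first letter is the
    allele at SNP 1 (a or A), the second the allele at SNP 2 (b or B).
    """
    total = n_ab + n_aB + n_Ab + n_AB
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_Ab + n_AB) / total    # frequency of allele A at SNP 1
    p_B = (n_aB + n_AB) / total    # frequency of allele B at SNP 2

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # One SNP has a single allele in this sample, so r^2 is undefined.
        raise ValueError("r^2 is undefined when a SNP is monomorphic")

    d = p_AB - p_A * p_B           # LD coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Made-up counts for illustration only (not real population data).
    print(ld_r2(n_ab=60, n_aB=10, n_Ab=10, n_AB=20))
```

The check for a monomorphic SNP reflects the situation described for rs16892766 in Asians: when only one allele is present in a population, no r² with neighboring SNPs can be estimated, so the SNP cannot serve as a tag there.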
Second, Zeng et al reported a CRC SNP (rs6061231) mapping to chromosome 20q13, a region containing another SNP (rs4925386) that has previously been associated with CRC in Europeans.⁸ These 2 SNPs are in weaker LD in Asians (r² = 0.15) than in Europeans (r² = 0.44). After conditional testing in the Asian data sets, only rs6061231 remained significant, naturally leading Zeng et al to suggest that rs6061231 better captured the 20q13 signal. Interestingly, a recent study by Al Tassan⁹ found a third 20q13 CRC variant (rs2427308), which, based on 1000 Genomes (1KG) data, is in full LD with rs6061231 in Han Chinese (r² = 1.0) and in very high LD in East Asians (r² = 0.89, Figure 1). rs2427308 and rs6061231 also show strong LD in 1000 Genomes Project Europeans (r² = 0.69, Figure 1). rs6061231 may thus not be an entirely new CRC variant, and further studies are needed to assess whether it, rs2427308, and rs4925386 are tagging single or multiple functional 20q13 variants. This work is also important for the detailed functional studies required to determine the identity of the target gene in the region.

Although Zeng et al clearly identified new CRC regions on chromosomes 6p21, 10q24, and 12p13, questions remain about the novelty and number of risk alleles on chromosomes 8q23 and 20q13. Furthermore, although Zeng et al report heterogeneity between Asians and Europeans for 3 SNPs (rs4919687/10q24, rs4711689/6p21, and rs6061231/20q13), it seems unlikely that their preferred explanation, effect allele frequency, is the principal factor causing these differences.

A further inherently troublesome area in GWAS is the identity of the gene(s) that are the targets of the underlying functional variation that influences disease susceptibility. In an attempt to assign genes to SNPs, Zeng et al performed expression quantitative trait locus analysis in anatomically normal colon tissue from 188 Asian patients