M26 - The Colorectal Cancer Sequencing Project (Phase II/III)/Whole Exome Sequencing for Colorectal Cancer (Phase I)

 

Investigator Names and Contact Information

Ulrike Peters [upeters@fhcrc.org]

Introduction/Intent

Linkage analyses of pedigrees have identified idiosyncratic, high-penetrance mutations predisposing to colorectal cancer (CRC).  More recently, genome-wide association studies (GWAS) have identified common risk alleles, most of which confer only weak increases in disease risk.  However, a significant fraction of the excess familial risk of CRC remains unexplained, and the approaches used to date have limited ability to detect low-frequency alleles.  To investigate rare variants comprehensively, we propose to use next-generation sequencing technology to screen the protein-coding regions of the entire genome ("the exome") for low-frequency variants in a panel of 700 high-risk CRC cases and 700 controls, selected from our existing Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) of 12,119 cases and 12,165 controls, of which WHI is a part.  Using newly evolving statistical methods, we will identify the most promising rare variants, which will then be followed up with genotyping and targeted sequencing in the entire GECCO population.

The newly identified susceptibility loci may provide meaningful prediction of disease risk to identify high-risk individuals who can benefit from enhanced screening and early interventions, and are certain to provide insights into the molecular pathways through which CRC develops. 

Colorectal cancer (CRC) is the second leading cause of cancer death in the US.  It has been estimated that up to 35% of CRC is attributable to inherited factors, and identifying the associated genetic variants is important for elucidating the mechanisms underlying this disease.  Early results from genome-wide association studies (GWAS) have demonstrated considerable success in identifying genetic variants associated with various common complex diseases, including CRC.  Rare syndromes (mainly familial adenomatous polyposis (FAP) and Lynch syndrome) explain between three and five percent of the excess familial disease risk for CRC, while the GWAS regions identified to date explain about six percent.  We have estimated that even large GWAS of 50,000 to 100,000 individuals will capture at most about 17% of the heritable risk of complex diseases, leaving a significant fraction unexplained.  This unexplained variation has been referred to as "genetic dark matter," and its elucidation is a key next step in understanding genetic susceptibility to CRC.

Several possibilities might account for the unexplained heritability of CRC, including gene-gene interactions, gene-environment interactions and copy number variation.  Another promising explanation is the existence of risk variants that have not been identified by linkage analysis or GWAS, either because the effect size (strictly speaking, the strength of association) was not large enough to produce a peak in linkage analysis (odds ratio <4.0) or because the allele frequency was too low for the variant to be included in GWAS, which mainly focus on common variants with an allele frequency of >5%.

Next-generation sequencing methods make it feasible to sequence the whole coding genome (the "exome"), thereby allowing comprehensive screening for less common polymorphisms (allele frequency 1% to 5%) and rare variants (allele frequency <1%).  Exome sequencing of high-risk individuals, paired with newly evolving statistical methods that incorporate information on predicted function, will allow us to define a panel of promising coding variants.  These variants can then be screened rapidly and efficiently for associations with disease in much larger case-control populations.  This strategy affords an exceptional opportunity to identify novel functional variants associated with CRC.  We propose to apply this strategy within our existing Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO; U01CA137088 and R01CA059045, PI: Peters), which includes 14,856 cases and 16,548 controls (of whom 12,119 cases and 12,165 controls currently have DNA available) from well-characterized case-control studies and prospective cohorts, including WHI, with detailed information on environmental risk factors.  Within this consortium setting, we propose to identify novel variants associated with CRC by performing whole-exome sequencing with the following specific aims:

  1. To identify the most promising less common and rare variants associated with colorectal cancer by conducting whole-exome sequencing in 700 high-risk colorectal cancer cases and 700 controls, including about 350 cases and 350 controls from the WHI.  We propose to use next-generation technology to sequence the exome of each of these individuals.  High-risk cases are defined as those with one or more first-degree family members with CRC and/or early-onset disease (age at diagnosis <50 years); cases will be pre-screened to exclude those with the known familial syndromes FAP and Lynch syndrome.  We will use newly evolving statistical methods paired with computational predictions of functional effects to select the most promising variants.
  2. To follow up the most promising variants identified in aim 1 in the large GECCO study population of 12,119 cases and 12,165 controls, including about 2,200 cases and 2,200 controls from the WHI.
    1. Less common polymorphisms (allele frequency 1%-5%) will be genotyped and tested for association with CRC risk.
    2. Rare variants (allele frequency <1%) will be replicated through resequencing of the identified gene regions.
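
The selection rules stated in the aims above can be written out as a minimal Python sketch.  The function names, the handling of allele frequencies that fall exactly on the 1% and 5% boundaries, and the label returned for common variants are illustrative assumptions; they are not part of the study protocol.

```python
# Thresholds taken from the text: rare < 1%, less common 1% to 5%, common > 5%.
RARE_MAX = 0.01
LESS_COMMON_MAX = 0.05


def frequency_class(allele_frequency: float) -> str:
    """Bin a variant by allele frequency.

    Assumption: exactly 1% and exactly 5% are both treated as "less common".
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if allele_frequency < RARE_MAX:
        return "rare"
    if allele_frequency <= LESS_COMMON_MAX:
        return "less common"
    return "common"


def follow_up_strategy(allele_frequency: float) -> str:
    """Map a variant to the aim 2 follow-up approach described above."""
    variant_class = frequency_class(allele_frequency)
    if variant_class == "rare":
        return "resequencing"
    if variant_class == "less common":
        return "genotyping"
    # Assumption: common variants are not targeted by aim 2.
    return "not targeted"


def is_high_risk_case(age_at_diagnosis: float,
                      first_degree_relatives_with_crc: int,
                      has_fap_or_lynch: bool) -> bool:
    """Apply the aim 1 definition of a high-risk case."""
    if has_fap_or_lynch:
        # Known familial syndromes are excluded by pre-screening.
        return False
    return first_degree_relatives_with_crc >= 1 or age_at_diagnosis < 50


if __name__ == "__main__":
    for af in (0.005, 0.03, 0.20):
        print(af, frequency_class(af), follow_up_strategy(af))
```

Under these assumptions, a case diagnosed at age 45 with no affected relatives and no known syndrome counts as high-risk, while a case diagnosed at age 60 with no affected relatives does not.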

This project brings together a highly qualified, multi-disciplinary team of investigators, using next-generation sequencing technology to generate important sequence data.  Because the lead investigators on this project are also lead investigators in the WHI Sequencing Project (WHISP), we will gain ample experience with sequencing and with analyzing the data produced by this cutting-edge technology.  We expect that findings from this study, particularly when used as a multi-marker panel, will help to identify high-risk individuals who can benefit from enhanced screening programs and early interventions, and may identify new drug targets.