Introduction Despite its dismal prognosis and steadily increasing prevalence, little is known about the genetic alterations that drive oesophageal adenocarcinoma (OAC). As part of the International Cancer Genome Consortium: Oesophageal Adenocarcinoma (ICGC-OAC) project, we performed a pilot study, sequencing the genomes of 32 OAC samples, to assess the feasibility of initiating a large-scale project to sequence a total of 500 OAC genomes.
Methods A total of 56 genomes were selected for sequencing: 32 OAC genomes (16 chemotherapy-naive and 16 chemotherapy-treated) and 24 matched normal genomes. Whole genome sequencing (WGS) was performed on the Illumina HiSeq 2000 platform. Initial bioinformatic analysis, run by Illumina using the CASAVA pipeline, detected single nucleotide variants (SNVs), small (<50 bp) insertion and deletion events (INDELs) and large-scale structural variants (SVs). SVs were additionally analysed using a custom Perl script. To determine the specificity of the bioinformatic approach, a subset of SNVs and SVs was selected for verification by Sanger capillary sequencing and PCR, respectively.
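The verification logic implied by the Methods and Results can be sketched as a simple decision rule: a predicted somatic variant is checked in both the tumour and its matched normal, and the outcome depends on where it is detected. This is a minimal illustration, not the consortium's actual code; the names `Outcome` and `classify_call` and the boolean inputs are hypothetical, assuming each Sanger or PCR result has already been reduced to "detected / not detected" per sample.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the result of verifying one predicted somatic call."""
    SOMATIC = "somatic"        # detected in the tumour only
    GERMLINE = "germline"      # also detected in the matched normal
    UNDETECTED = "undetected"  # not detected in either sample


def classify_call(in_tumour: bool, in_normal: bool) -> Outcome:
    """Classify a predicted somatic SNV or SV from its verification result.

    in_tumour / in_normal: whether the variant allele (Sanger) or the
    breakpoint-spanning amplicon (PCR) was detected in that sample.
    Detection in the normal takes precedence: the event is then treated
    as an inherited polymorphism rather than a tumour-specific one.
    """
    if in_normal:
        return Outcome.GERMLINE
    if in_tumour:
        return Outcome.SOMATIC
    return Outcome.UNDETECTED


print(classify_call(in_tumour=True, in_normal=False).value)
```

Only calls labelled `SOMATIC` count towards the confirmation rate; germline and undetected calls both count as false positives of the somatic caller.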
Results At least 50-fold coverage of mappable sequence data was generated for each of the 56 genomes. Of 167 predicted SNVs, 161 (96%) were confirmed as somatic; two were miscalled germline variants and four were undetectable in either sample. Of 75 SVs, PCR amplicons could not be generated for 2 (3%), and for 18 (24%) an amplicon was detectable in the normal sample, showing them to be germline polymorphisms. The true positive rate for SV detection was therefore 73% (55/75). Comparison of SNV data across all 24 samples revealed many recurrently mutated genes. These include genes with previously reported mutations, such as TP53, CDKN2A and APC. No genes were significantly associated with either chemotherapy-treated or chemotherapy-naive samples.
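The rates in the Results follow from simple arithmetic on the verification counts: confirmed calls are the total minus germline and undetected calls. Note that this quantity, the fraction of predicted calls that validate, is what is usually called the positive predictive value (precision); estimating sensitivity would additionally require knowing how many true variants were missed. The helper `confirmed_rate` below is a hypothetical name used only for illustration.

```python
def confirmed_rate(total: int, germline: int, undetected: int) -> tuple[int, float]:
    """Return (confirmed count, confirmed fraction) for a set of verified calls.

    'undetected' covers calls that could not be seen in either sample
    (SNVs) or for which no PCR amplicon could be generated (SVs).
    """
    confirmed = total - germline - undetected
    if confirmed < 0:
        raise ValueError("germline + undetected exceeds total calls")
    return confirmed, confirmed / total


# Counts taken from the Results paragraph above.
snv_confirmed, snv_rate = confirmed_rate(total=167, germline=2, undetected=4)
sv_confirmed, sv_rate = confirmed_rate(total=75, germline=18, undetected=2)

print(f"SNV: {snv_confirmed}/167 confirmed ({snv_rate:.0%})")  # SNV: 161/167 confirmed (96%)
print(f"SV: {sv_confirmed}/75 confirmed ({sv_rate:.0%})")      # SV: 55/75 confirmed (73%)
```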
Conclusion Analysis of the Illumina bioinformatic pipeline suggests that it is highly specific for somatic SNVs (96% true positive rate). A true positive rate of 73% for SV detection is comparable to rates reported in recent literature. Further analysis to determine the sensitivity of this pipeline is ongoing, including resequencing of putatively non-mutated genes in the samples sent for WGS and the application of alternative bioinformatic approaches for calling SNVs, INDELs and SVs. Initial analysis of the SNV data from 32 tumour genomes has revealed several recurrently mutated genes known to be altered in OAC, validating the ability of our approach to detect candidate “driver” genes.
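Identifying "recurrently mutated genes" can be illustrated, in its simplest form, as counting how many tumours carry at least one somatic mutation in each gene. This sketch assumes a hypothetical per-sample mapping from sample ID to the set of mutated genes; the sample and gene names below are made-up placeholders, not data from this study. Raw recurrence counts are only a first pass: they do not correct for gene length or background mutation rate, which dedicated driver-detection methods take into account.

```python
from collections import Counter


def recurrent_genes(mutated: dict[str, set[str]], min_samples: int = 2) -> list[tuple[str, int]]:
    """Return (gene, n_samples) for genes mutated in at least min_samples samples.

    Each sample contributes a set, so a gene with several mutations in the
    same tumour is still counted once for that tumour.
    """
    counts = Counter(gene for genes in mutated.values() for gene in genes)
    hits = [(gene, n) for gene, n in counts.items() if n >= min_samples]
    # Most frequently mutated first; ties broken alphabetically.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical toy input, for illustration only.
toy = {
    "T01": {"GENE_A", "GENE_B"},
    "T02": {"GENE_A"},
    "T03": {"GENE_A", "GENE_B", "GENE_C"},
}
print(recurrent_genes(toy))  # [('GENE_A', 3), ('GENE_B', 2)]
```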
Competing interests None declared.