In their article in this issue of ONCOLOGY, Drs. Ma and Ellis report on the clinical implications of The Cancer Genome Atlas (TCGA) project for breast cancer,[1] a collaborative effort that resulted in the integration of DNA sequencing with genome-wide profiling of the epigenome, microRNAome, transcriptome, and proteome from more than 500 primary breast cancers.[2] Promising potential applications of this technology include target discovery, refined and more accurate disease classification, and improved therapeutic direction.[3] There remain significant pitfalls, however, that hinder direct clinical interventions, including the corroboration that few tumors are “addicted” to oncogenic mutations, high tumor heterogeneity that requires the use of combinatorial therapy, and the rapid emergence of resistance to therapy even when it is appropriately applied. In addition, there are logistical obstacles to securing effective drugs for each oncogenic alteration, and to the development of a regulatory infrastructure that would facilitate matching the right patient with the right drug or combination of drugs. As an example of the challenges in clinical application of TCGA findings in breast cancer, preliminary evidence indicates that no more than 30% of patients with metastatic breast cancer may be matched to a potentially effective agent based on mutational profiling, with only a fraction of those patients deriving clinical benefit from the intervention.[4]
Despite these challenges, there remains considerable promise for additional scientific discovery that may be clinically relevant. Whole-exome sequencing performed as part of TCGA identified 30,626 somatic mutations in 510 tumors, including 28,319 point mutations, 4 dinucleotide mutations, and 2,302 insertions/deletions (indels), a volume that makes it challenging to identify druggable mutations. To reveal potential “driver” mutations, the team identified significantly mutated genes (SMGs; those with recurrent mutations observed at a higher frequency than expected from background mutation levels across the tumors) and employed an integrated pathway approach to develop graphic models for multi-platform data analysis.[5] As Ma and Ellis point out, although only 35 genes met the criteria of SMGs, and mutations in most of them were relatively uncommon, identification of actionable mutations may have a major impact on treatment, owing to the prevalence of breast cancer. For example, despite the low frequency of activating human epidermal growth factor receptor type 2 (HER2) mutations in HER2–nonamplified/nonoverexpressed breast cancers (2%), the mutation pattern and functional studies indicate they are likely driver mutations that may be particularly sensitive to neratinib, an irreversible pan-HER inhibitor.[6] A major challenge, therefore, is to leverage all of the molecular information that is currently hidden from view.
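The SMG criterion described above (more recurrent mutations than the background rate would predict) can be illustrated with a minimal sketch. This is not the TCGA analysis pipeline: it assumes a single uniform background mutation rate per base, a simple binomial model, and a Bonferroni correction. The function names, gene names, gene lengths, mutation counts, and background rate are all hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # Log of the binomial probability mass at i, via log-gamma for large n.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        term = math.exp(log_pmf)
        total += term
        # Beyond the mean the terms only shrink, so stop once they are negligible.
        if i > n * p and term <= total * 1e-16:
            break
    return min(total, 1.0)


def significantly_mutated_genes(genes, n_tumors, background_rate, alpha=0.05):
    """Flag genes whose mutation count exceeds the background expectation.

    genes maps a gene name to (coding_length_bp, observed_mutations).
    ASSUMPTION: one uniform background rate per base per tumor.
    """
    n_tests = len(genes)
    hits = []
    for name, (length_bp, observed) in genes.items():
        opportunities = length_bp * n_tumors  # bases that could have mutated
        p_value = binomial_upper_tail(observed, opportunities, background_rate)
        if p_value * n_tests < alpha:  # Bonferroni correction
            hits.append((name, p_value))
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy input, not TCGA data.
    toy = {"GENE_A": (3000, 40), "GENE_B": (3000, 2)}
    for name, p_value in significantly_mutated_genes(toy, 510, 1e-6):
        print(name, p_value)
```

In this toy input, GENE_A carries far more mutations than the roughly 1.5 expected by chance and is flagged, whereas GENE_B is not. Real analyses refine the background model considerably, for example by gene and by mutation type.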
A first step is the development of more sophisticated computational tools for integrative analysis of TCGA data. The analyses to date, highlighted by Ma and Ellis, have examined the data from the conventional perspectives of individual mutations, and the likely effects that these mutations have within a gene’s interaction “neighborhood,” typically in the context of a specific pathway. Moreover, compromised epigenetic behavior is implied using this approach, and Ma and Ellis mention several tools, most notably PARADIGM (Pathway Recognition Algorithm Using Data Integration on Genomic Models[7]), that attempt to determine the impact of such mutations in a larger pathway context. The discussion is almost entirely framed around the fruits of “first-order” analyses, based on a census of mutated genes. However, this is only skimming the surface of a deep well of potential knowledge.
Integration of the mRNA, miRNA, and proteomic data for these breast cancers has yet to be presented. Such work would be of extraordinary value, as the subtle dynamics of RNAi in tumors have previously been shown, using TCGA data, to play a significant role in oncogenic pathway crosstalk in glioblastoma.[8] Such integrative analyses could consolidate our understanding of how to link driver mutations with aberrant pathway signaling, and provide alternative, complementary treatment opportunities based on RNAi therapeutics.
An alternative approach to studying the full spectrum of genomic and epigenomic variation for all TCGA breast cancers is to attempt to link all the data instances together, and then take advantage of these linkages to probe for associations and correlations across multiple data types and conditions. Such an approach has been successfully adopted by the “data warehouses” that employ semantic methodologies combined with databases containing similarly diverse genomics data for several model organisms, allowing the user to perform complex queries in a linguistic fashion.[9] Because these data warehouses use the same semantic “frame of reference,” it is possible to integrate differing data sources, including those hosting drug response data.[10] Such an informatics infrastructure could significantly increase the ability of clinicians to directly interact with these TCGA datasets from a therapeutic perspective. These integrative tools still need to be generated and optimized, and doing so requires a critical focus on coordinated, progressive development, so as to avoid the risk of a software “Tower of Babel.” The National Cancer Institute’s new Informatics Technology for Cancer Research (ITCR) Initiative should provide a means to guarantee such “curated” informatics innovation, expediting the development of therapeutically focused data analysis approaches, with clear benefits to the clinical community.
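The idea of linking data instances and then querying across data types can be sketched in a few lines. This is only an illustration under simplifying assumptions, not how the cited data warehouses are implemented: every record is assumed to carry a shared tumor identifier, and the function names, data-type labels, sample identifiers, and values are hypothetical.

```python
from collections import defaultdict


def link_by_sample(*tables):
    """Merge (data_type, {sample_id: record}) tables into one view per sample."""
    linked = defaultdict(dict)
    for data_type, records in tables:
        for sample_id, record in records.items():
            linked[sample_id][data_type] = record
    return dict(linked)


def query(linked, predicate):
    """Return the sorted sample IDs whose linked view satisfies the predicate."""
    return sorted(s for s, view in linked.items() if predicate(view))


if __name__ == "__main__":
    # Hypothetical toy records keyed by a shared tumor identifier.
    mutations = {"T1": {"GENE_A"}, "T2": set(), "T3": {"GENE_A"}}
    expression = {"T1": {"GENE_B": 2.5}, "T2": {"GENE_B": 3.0}, "T3": {"GENE_B": 0.1}}
    linked = link_by_sample(("mutation", mutations), ("expression", expression))

    # Cross-data-type question: GENE_A mutated AND GENE_B expression above 1.0.
    matches = query(
        linked,
        lambda view: "GENE_A" in view.get("mutation", set())
        and view.get("expression", {}).get("GENE_B", 0.0) > 1.0,
    )
    print(matches)
```

The shared identifier plays the role of the common “frame of reference”: once every data type is keyed the same way, a single question can span mutation, expression, and, in principle, drug response records.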
As next-generation sequencing technologies evolve, the potential for applications building on TCGA data is rapidly growing. For example, one could envision a multi-tier sequencing approach based on several low-cost, custom-target sequencing panels. These could offer screening tailored to specific patient subgroups. Indeed, a small 35-gene panel could be offered as an initial screening tool, targeting mutations in genes that meet the criteria for classification as SMGs. Because the amount of input DNA required for sequencing is currently on the order of a few nanograms, it is plausible to envision monitoring therapy response over time; this could be performed using the same custom panel used for initial screening. Additional panels targeting lower-frequency mutations could be developed and applied to patients who screen negative with the “standard” panel. As the throughput of sequencing improves, the achievable coverage increases, making it possible to better dissect tumor heterogeneity and more clearly understand the genetic component of clonal expansion. Extremely rare variants present in only a few cells may be important in nonresponders to standard therapies. Likewise, an area of active research is focused on the sequencing of circulating tumor cells (CTCs) and cell-free DNA. These applications are still in their infancy and can benefit enormously from the development of sequencing platforms that offer higher coverage at lower costs; however, the potential for success is great.
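The multi-tier screening logic described above can be sketched as follows. This is a hypothetical illustration, not a validated clinical workflow: the panel names and gene names are placeholders, and the tumor's mutated genes are passed in directly, standing in for what each panel would detect if it were run.

```python
from typing import Optional, Sequence, Set, Tuple


def first_informative_panel(
    mutated_genes: Set[str],
    panels: Sequence[Tuple[str, Set[str]]],
) -> Optional[Tuple[str, Set[str]]]:
    """Walk the panels in order and return the first one that finds a hit.

    panels is an ordered sequence of (panel_name, genes_covered); the
    "standard" panel comes first and lower-frequency panels follow.
    """
    for panel_name, panel_genes in panels:
        hits = mutated_genes & panel_genes
        if hits:
            return panel_name, hits
    return None  # screened negative on every panel


if __name__ == "__main__":
    # Hypothetical panels; real panel content would come from validated gene lists.
    panels = [
        ("standard", {"GENE_A", "GENE_B"}),
        ("low_frequency", {"GENE_C", "GENE_D"}),
    ]
    print(first_informative_panel({"GENE_C"}, panels))
    print(first_informative_panel({"GENE_Z"}, panels))
```

A patient who screens negative on the first panel simply falls through to the next one, which mirrors the proposal of reserving the additional panels for those not captured by the initial screen.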
In summary, as Ma and Ellis point out, TCGA’s efforts to dissect the genomic complexity found in breast cancer patients represent only the beginning of a journey toward a better understanding of the intricate events that lead to this disease. Additional efforts are required to provide tailored and effective therapeutic interventions.
Financial Disclosure: The authors have no significant financial interest or other relationship with the manufacturers of any products or providers of any service mentioned in this article.
Acknowledgment: This work is supported in part by grants from the National Institutes of Health (P30-13330).
1. Ma CX, Ellis MJ. The Cancer Genome Atlas: clinical applications for breast cancer. Oncology (Williston Park). 2013;27:1263-79.
2. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61-70.
3. Sparano JA, Ostrer H, Kenny PA. Translating genomic research into clinical practice. Am Soc Clin Oncol Educ Book. 2013:15-23.
4. Andre F, Bachelot TD, Campone M, et al. Array CGH and DNA sequencing to personalize targeted treatment of metastatic breast cancer (MBC) patients (pts): a prospective multicentric trial (SAFIR01). J Clin Oncol. 2013;31(suppl):abstr 511.
5. Vaske CJ, Benz SC, Sanborn JZ, et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics. 2010;26:i237-45.
6. Bose R, Kavuri SM, Searleman AC, et al. Activating HER2 mutations in HER2 gene amplification negative breast cancer. Cancer Discov. 2013;3:224-37.
7. Ng S, Collisson EA, Sokolov A, et al. PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis. Bioinformatics. 2012;28:i640-6.
8. Sumazin P, Yang X, Chiu HS, et al. An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell. 2011;147:370-81.
9. Contrino S, Smith RN, Butano D, et al. modMine: flexible access to modENCODE data. Nucleic Acids Res. 2012;40:D1082-8.
10. Chen YA, Tripathi LP, Mizuguchi K. TargetMine, an integrated data warehouse for candidate gene prioritization and target discovery. PLoS One. 2011;6:e17844.