IRIS-TCGA: An Information Retrieval and Integration System for Genomic Data of Cancer

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2016)

Abstract

Data integration is one of the most challenging research topics in many knowledge domains, and biology is surely one of them. However, the theory and state-of-the-art methods make this task complex for most small research centers. Fortunately, several organizations are focusing on collecting heterogeneous data, which makes it easier to design analysis tools and to test biological and medical hypotheses on integrated data. One of the most evident cases of such efforts is The Cancer Genome Atlas (TCGA), a database that contains a large variety of information related to different types of cancer. This database offers a great opportunity to those interested in analyzing integrated data; however, exploiting it is not easy, since nontrivial effort is required to extract and combine the data before they can be analyzed from an integrated perspective. In this paper we present IRIS-TCGA, an online web service developed to perform multiple queries for data integration on TCGA. Unlike other tools that have been proposed to interact with TCGA, IRIS-TCGA allows direct access to the data and enables the extraction of detailed combinations of subsets of the repository, according to filters and high-order queries. The structure of the system is simple, as it is built on two main operators, union and intersection, which are then used to construct queries of higher complexity. The first version of the system supports the extraction and integration of gene expression (RNA-sequencing, microarrays), DNA-methylation, and DNA-sequencing (mutations) data from experiments on tissues of patients, together with their related metadata, in a gene-oriented organization. The extracted data matrices are particularly suited for data mining applications (e.g., classification).
Finally, we show two application examples in which IRIS-TCGA is used to integrate genomic data from RNA-sequencing and DNA-methylation experiments, and state-of-the-art bioinformatics analysis tools are applied to the integrated data in order to extract new knowledge from them. IRIS-TCGA is freely available at http://bioinf.iasi.cnr.it/iristcga/.
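
The two operators named in the abstract can be illustrated with a minimal sketch. It is not IRIS-TCGA's implementation or API: the data layout (sample ID mapped to gene-level values), the function names `union` and `intersection`, the column labels, and the toy values are all assumptions made for illustration, and the reading of the operators as acting on the sets of samples is one plausible interpretation only.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: sample ID -> {gene symbol -> measurement}.
Matrix = Dict[str, Dict[str, Optional[float]]]


def _combine(a: Matrix, b: Matrix, samples: Set[str],
             label_a: str, label_b: str) -> Matrix:
    """Build one gene-oriented row per sample, prefixing columns by source."""
    genes_a = sorted({g for row in a.values() for g in row})
    genes_b = sorted({g for row in b.values() for g in row})
    out: Matrix = {}
    for s in sorted(samples):
        row: Dict[str, Optional[float]] = {}
        for g in genes_a:
            row[f"{label_a}:{g}"] = a.get(s, {}).get(g)  # None if missing
        for g in genes_b:
            row[f"{label_b}:{g}"] = b.get(s, {}).get(g)  # None if missing
        out[s] = row
    return out


def intersection(a: Matrix, b: Matrix,
                 label_a: str = "a", label_b: str = "b") -> Matrix:
    """Keep only samples measured in both experiments."""
    return _combine(a, b, a.keys() & b.keys(), label_a, label_b)


def union(a: Matrix, b: Matrix,
          label_a: str = "a", label_b: str = "b") -> Matrix:
    """Keep samples measured in either experiment; gaps become None."""
    return _combine(a, b, a.keys() | b.keys(), label_a, label_b)


# Toy values, invented for the example (not TCGA data).
rnaseq: Matrix = {"P1": {"TP53": 10.0, "BRCA1": 3.5},
                  "P2": {"TP53": 7.2, "BRCA1": 1.1}}
meth: Matrix = {"P2": {"TP53": 0.8}, "P3": {"TP53": 0.2}}

both = intersection(rnaseq, meth, "rnaseq", "meth")
either = union(rnaseq, meth, "rnaseq", "meth")
```

Because both functions return the same matrix shape they accept, their results can be passed to each other again, which is one way queries of higher complexity could be composed from the two basic operators.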

Acknowledgments

The results reported here are based upon the data generated by the TCGA Research Network: http://cancergenome.nih.gov/.

Funding

The work was financially supported by the SysBioNet, Italian Roadmap Research Infrastructure, and the Epigenomics Flagship Project EPIGEN [PB.P01].

Author information

Corresponding author

Correspondence to Fabio Cumbo.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Cumbo, F., Weitschek, E., Bertolazzi, P., Felici, G. (2017). IRIS-TCGA: An Information Retrieval and Integration System for Genomic Data of Cancer. In: Bracciali, A., Caravagna, G., Gilbert, D., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2016. Lecture Notes in Computer Science, vol 10477. Springer, Cham. https://doi.org/10.1007/978-3-319-67834-4_13

  • DOI: https://doi.org/10.1007/978-3-319-67834-4_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67833-7

  • Online ISBN: 978-3-319-67834-4

  • eBook Packages: Computer Science, Computer Science (R0)
