A Study on Big Cancer Data

  • Sabuzima Nayak
  • Ripon Patgiri
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 940)


Cancer research is one of the most important research fields today for human well-being. Every year, thousands of people die of cancer, so there is a high demand for cancer research. This research requires both computing and medical knowledge. Computer scientists are engaged in Big Cancer Computing for the storage, processing, and management of large sets of cancer data. The analysis and prediction of cancer data is also a prominent challenge in Big Cancer Computing. In this paper, we present an investigation report on large-scale computation of cancer data. Moreover, we focus on the relationship between Big Data and cancer data. In addition, we present in-depth insight into state-of-the-art Big Cancer Computing and its applicability in Big Data Analytics.


Keywords: Big Data · Cancer · Big Cancer Data · Big Cancer Computing · Machine learning · Big Data Analytics · Precision medicine · Privacy · Cancer data visualization



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. National Institute of Technology Silchar, Silchar, India
