Phenotype Algorithm based Big Data Analytics for Cancer Diagnose

  • Image & Signal Processing
Journal of Medical Systems

Abstract

Early diagnosis is one of the major challenges in treating cancer. A patient's survival depends on diagnosing the cancer at an early stage (stage 1 or stage 2); if the cancer is diagnosed at stage 3 or later, the patient's chances of survival become more critical. A single patient's records typically generate a huge amount of data, and if that data can be managed and analyzed, the patterns identified in it can lead to a diagnosis of the cancer. Recent work has introduced several machine learning algorithms for cancer classification. However, the classification accuracy of these algorithms is reduced when the number of samples is very large. The proposed work therefore focuses on the Hadoop Distributed File System (HDFS). In this paper, the proposed phenotype technique handles and classifies raw EHR (Electronic Health Record) and EMR (Electronic Medical Record) data. It is based on HDFS and a two-phase MapReduce. The phenotype algorithm uses an NLP (Natural Language Processing) tool to analyze and classify cancer patient data such as gene mapping, age-related data, image and ultrasonic frequency processing, identification and analysis of irregularities, and disease and personal histories. A three-factorized model is used to calculate mean score values from factors such as disease stage and pain status. This paper focuses on big data analytics for cancer diagnosis, and the simulation results show that the proposed system achieves the highest performance.
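
The abstract does not spell out how the two-phase MapReduce or the mean score calculation is implemented. As a rough illustration only, the sketch below computes mean scores per factor value in two phases using plain Python in place of Hadoop. The record fields (`stage`, `pain`, `score`), the sample values, and all function names are hypothetical assumptions, and only the two factors named in the abstract are used.

```python
from collections import defaultdict

# Hypothetical patient records; field names and values are illustrative only.
RECORDS = [
    {"stage": 1, "pain": "low", "score": 2.0},
    {"stage": 1, "pain": "high", "score": 4.0},
    {"stage": 3, "pain": "high", "score": 8.0},
    {"stage": 3, "pain": "high", "score": 6.0},
]

# Assumed factors: the abstract names disease stage and pain status.
FACTORS = ("stage", "pain")


def map_record(record):
    """Emit one ((factor, value), (score, 1)) pair per factor of a record."""
    for factor in FACTORS:
        yield (factor, record[factor]), (record["score"], 1)


def reduce_partials(pairs):
    """Sum the scores and counts that share the same key."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (score_sum, count) in pairs:
        totals[key][0] += score_sum
        totals[key][1] += count
    return {key: (s, c) for key, (s, c) in totals.items()}


def two_phase_mean(partitions):
    """Compute the mean score per (factor, value) key in two phases."""
    # Phase 1: each partition independently produces partial (sum, count) pairs.
    partials = []
    for partition in partitions:
        mapped = (pair for record in partition for pair in map_record(record))
        partials.extend(reduce_partials(mapped).items())
    # Phase 2: merge the partials across partitions, then divide to get means.
    merged = reduce_partials(partials)
    return {key: s / c for key, (s, c) in merged.items()}


# Two partitions stand in for data blocks stored on different nodes.
means = two_phase_mean([RECORDS[:2], RECORDS[2:]])
```

Carrying (sum, count) pairs through the first phase, rather than per-partition means, keeps the final averages exact when partitions hold different numbers of records.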

Author information

Corresponding author

Correspondence to K. Sivakumar.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest, and the paper has not been submitted to any other journal.

Research involving human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent is not required, as the dataset was taken from online databases.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Image & Signal Processing

Cite this article

Sivakumar, K., Nithya, N.S. & Revathy, O. Phenotype Algorithm based Big Data Analytics for Cancer Diagnose. J Med Syst 43, 264 (2019). https://doi.org/10.1007/s10916-019-1409-z

  • DOI: https://doi.org/10.1007/s10916-019-1409-z
