Abstract
Diagnosis is one of the major challenges in treating cancer. The survival of cancer patients depends on diagnosing the disease at an early stage (stage 1 or stage 2). If the cancer is diagnosed at stage 3 or later, the patient's chances of survival are considerably reduced. Even a single patient's records generate a huge amount of data; if these data can be managed and analyzed, patterns can be identified that lead to a diagnosis. Several machine learning algorithms have recently been introduced for cancer classification. However, their classification accuracy drops when the number of samples is very large. The proposed work therefore adopts an approach built on the Hadoop Distributed File System (HDFS). The proposed phenotype technique handles and classifies raw Electronic Health Record (EHR) and Electronic Medical Record (EMR) data and is based on HDFS and Two-Phase MapReduce. The phenotype algorithm uses a Natural Language Processing (NLP) tool to analyze and classify cancer patient data, including gene mapping, age-related data, image and ultrasonic frequency processing, identification and analysis of irregularities, and disease and personal histories. A three-factorized model calculates mean score values from factors such as disease stage and pain status. This paper focuses on big data analytics for cancer diagnosis, and the simulation results show that the proposed system achieves the highest performance.
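As an illustration only, the idea of computing mean score values in two MapReduce phases can be sketched in plain Python. The abstract does not specify how the paper's two phases are split, so the design here is an assumption: phase one averages each factor per patient, and phase two averages those factor means per patient. The record layout, the function names (`run_phase`, `map_factor_scores`, `map_patient_scores`, `mean_scores`), and the sample scores are all hypothetical. Only the two factors named in the abstract (disease stage and pain status) appear in the sample data, although the code accepts any number of factors.

```python
from collections import defaultdict
from statistics import mean


def run_phase(pairs, reducer):
    """Group (key, value) pairs by key (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def map_factor_scores(records):
    """Phase 1 map: emit ((patient_id, factor), score) for every recorded factor."""
    for record in records:
        for factor, score in record["factors"].items():
            yield (record["patient_id"], factor), score


def map_patient_scores(factor_means):
    """Phase 2 map: re-key each per-factor mean by patient only."""
    for (patient_id, _factor), score in factor_means.items():
        yield patient_id, score


def mean_scores(records):
    """Run both phases; the split into these two phases is an assumed design."""
    factor_means = run_phase(map_factor_scores(records), mean)
    return run_phase(map_patient_scores(factor_means), mean)


# Hypothetical records: field names and score values are invented for illustration.
records = [
    {"patient_id": "p1", "factors": {"disease_stage": 2, "pain_status": 4}},
    {"patient_id": "p1", "factors": {"disease_stage": 2, "pain_status": 6}},
    {"patient_id": "p2", "factors": {"disease_stage": 3, "pain_status": 7}},
]
patient_means = mean_scores(records)
print(patient_means)
```

On a real Hadoop cluster the grouping performed by `run_phase` would be handled by the framework's shuffle step, and each phase would run as a separate job reading from and writing to HDFS.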
Ethics declarations
Conflict of interest
The authors have no conflict of interest, and the paper has not been submitted to any other journal.
Research involving human participants and/or animals
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent is not required, as the dataset is taken from online databases.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection on Image & Signal Processing
About this article
Cite this article
Sivakumar, K., Nithya, N.S. & Revathy, O. Phenotype Algorithm based Big Data Analytics for Cancer Diagnose. J Med Syst 43, 264 (2019). https://doi.org/10.1007/s10916-019-1409-z
DOI: https://doi.org/10.1007/s10916-019-1409-z