Hybrid Optimization in Big Data: Error Detection and Data Repairing by Big Data Cleaning Using CSO-GSA

  • K. V. Rama Satish
  • N. P. Kavya
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 801)


Data cleaning is an important step in data acquisition, data storage, data management and data analytics, and it is still undergoing rapid development. Cleaning data is considered a particularly challenging task in the Big data era because of the exponential growth in the volume and variety of data in most applications. This paper focuses on providing an accurate data extraction system through two aspects of data cleaning: error detection methods and data repairing algorithms. To achieve accurate data extraction and improve data quality, the paper proposes a hybrid of Cuckoo Search Optimization and the Gravitational Search Algorithm (CSO-GSA), which detects errors in the data received from the source file and repairs the data before delivering it. Experiments on the MATLAB platform show that the proposed approach reduces the time for error detection and correction in huge data sets while maintaining acceptable error-detection accuracy.
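The abstract does not say how the two metaheuristics are combined or how error detection and repair are encoded as an objective, so the following is only an illustrative sketch of one possible hybrid, not the authors' method. It assumes a generic real-valued objective to minimize, alternates a Gravitational Search update (mass-weighted attraction with a decaying gravitational constant) with a Cuckoo Search update (Lévy-flight candidates and abandonment of the worst nests), and is written in Python rather than MATLAB. The function names `levy_step` and `hybrid_cso_gsa` and all parameter names and defaults are hypothetical.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step using Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / (abs(v) ** (1 / beta) + 1e-12)


def hybrid_cso_gsa(objective, dim, lo, hi, n_agents=15, iters=50,
                   g0=1.0, alpha=5.0, pa=0.25, seed=0):
    """Minimize `objective` over [lo, hi]^dim (assumed hybrid scheme)."""
    rng = random.Random(seed)

    def clip(x):
        return min(hi, max(lo, x))

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    fit = [objective(p) for p in pos]
    b = min(range(n_agents), key=fit.__getitem__)
    best_pos, best_val = pos[b][:], fit[b]
    history = [best_val]  # best value seen so far, one entry per iteration

    for t in range(iters):
        # Gravitational phase: better agents get larger mass and attract others.
        g = g0 * math.exp(-alpha * t / iters)
        f_best, f_worst = min(fit), max(fit)
        span = f_worst - f_best
        raw = [(f_worst - f) / span if span > 0 else 1.0 for f in fit]
        total = sum(raw)
        mass = [m / total for m in raw]
        new_pos = []
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in range(n_agents):
                if i == j:
                    continue
                dist = math.dist(pos[i], pos[j]) + 1e-12
                for d in range(dim):
                    acc[d] += rng.random() * g * mass[j] * (pos[j][d] - pos[i][d]) / dist
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + acc[d]
            new_pos.append([clip(pos[i][d] + vel[i][d]) for d in range(dim)])
        pos = new_pos
        fit = [objective(p) for p in pos]

        # Cuckoo phase: Levy-flight candidate replaces an agent only if better.
        for i in range(n_agents):
            step = 0.01 * levy_step(rng)
            cand = [clip(pos[i][d] + step * (pos[i][d] - best_pos[d])) for d in range(dim)]
            f_cand = objective(cand)
            if f_cand < fit[i]:
                pos[i], fit[i] = cand, f_cand

        # Abandon the worst fraction `pa` of nests and re-seed them at random.
        order = sorted(range(n_agents), key=fit.__getitem__, reverse=True)
        for i in order[:int(pa * n_agents)]:
            pos[i] = [rng.uniform(lo, hi) for _ in range(dim)]
            vel[i] = [0.0] * dim
            fit[i] = objective(pos[i])

        b = min(range(n_agents), key=fit.__getitem__)
        if fit[b] < best_val:
            best_pos, best_val = pos[b][:], fit[b]
        history.append(best_val)

    return best_pos, best_val, history


# Toy usage on the sphere function (a stand-in objective, not a cleaning task).
sphere = lambda x: sum(v * v for v in x)
best_pos, best_val, history = hybrid_cso_gsa(sphere, dim=2, lo=-5.0, hi=5.0)
print(best_val)
```

In a cleaning setting, the objective would have to score a candidate detection rule or repair value against the data; that encoding is not given in the abstract and is left open here.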


Big data · Data cleaning · Data repairing · Error detection · Data quality · Hybrid Cuckoo Search Optimization · Gravitational Search Algorithm



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. RNS Institute of Technology, Bengaluru, India
