DL-GSA: A Deep Learning Metaheuristic Approach to Missing Data Imputation

  • Ayush Garg
  • Deepika Naryani
  • Garvit Aggarwal
  • Swati Aggarwal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10942)

Abstract

Incomplete data has emerged as a prominent problem in machine learning, big data and various other fields of academic study. Motivated by the surge in deep learning techniques for problem-solving, this paper proposes a combined deep learning and metaheuristic approach to imputing missing data. The proposed approach (DL-GSA) combines a nature-inspired metaheuristic, the gravitational search algorithm, with a deep autoencoder, and performs better than existing methods in terms of both accuracy and time. Owing to these improvements, DL-GSA is applicable in both time-sensitive and accuracy-sensitive areas such as imputation of scientific and research datasets, data analysis, machine learning and big data.
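
The abstract does not spell out how the autoencoder and the gravitational search algorithm interact, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the search minimises the reconstruction error of a record whose missing entries are filled with candidate values. The function `reconstruct` is a hypothetical stand-in for a trained deep autoencoder, and the names `fitness` and `gsa_minimize` as well as the settings `g0`, `alpha`, `n_agents` and `n_iter` are illustrative choices that do not come from the paper. The search loop uses the commonly described GSA ingredients: fitness-derived masses, a decaying gravitational constant and a shrinking set of attracting agents.

```python
import math
import random


def reconstruct(x):
    """Hypothetical stand-in for a trained autoencoder (assumption).

    Projects a 3-feature record onto the plane x[2] = x[0] + x[1],
    which plays the role of the learned data manifold.
    """
    r = (x[0] + x[1] - x[2]) / 3.0
    return [x[0] - r, x[1] - r, x[2] + r]


def fitness(candidate, record, missing_idx):
    """Squared reconstruction error after filling the missing entries."""
    filled = list(record)
    for idx, value in zip(missing_idx, candidate):
        filled[idx] = value
    out = reconstruct(filled)
    return sum((a - b) ** 2 for a, b in zip(filled, out))


def gsa_minimize(objective, dim, bounds, n_agents=20, n_iter=200,
                 g0=1.0, alpha=20.0, seed=0):
    """Simplified gravitational search; all defaults are illustrative."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_f = None, math.inf

    for t in range(n_iter):
        fit = [objective(p) for p in pos]
        for p, f in zip(pos, fit):
            if f < best_f:  # remember the best candidate evaluated so far
                best_x, best_f = list(p), f

        # Masses: the best agent is heaviest, the worst has zero mass.
        best, worst = min(fit), max(fit)
        if worst == best:
            mass = [1.0 / n_agents] * n_agents
        else:
            raw = [(worst - f) / (worst - best) for f in fit]
            total = sum(raw)
            mass = [m / total for m in raw]

        # Gravitational constant decays; fewer agents attract over time.
        g = g0 * math.exp(-alpha * t / n_iter)
        k = max(1, round(n_agents * (1 - t / n_iter)))
        kbest = sorted(range(n_agents), key=lambda i: fit[i])[:k]

        for i in range(n_agents):
            acc = [0.0] * dim
            for j in kbest:
                if j == i:
                    continue
                dist = math.sqrt(sum((a - b) ** 2
                                     for a, b in zip(pos[i], pos[j])))
                for d in range(dim):
                    pull = g * mass[j] * (pos[j][d] - pos[i][d]) / (dist + 1e-12)
                    acc[d] += rng.random() * pull
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + acc[d]
                # Keep candidates inside the allowed feature range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

    return best_x, best_f


# Usage: the third feature of this record is missing (marked with None).
record = [0.2, 0.5, None]
missing_idx = [i for i, v in enumerate(record) if v is None]
best_x, best_f = gsa_minimize(
    lambda c: fitness(c, record, missing_idx),
    dim=len(missing_idx),
    bounds=(0.0, 1.0),
)
print("imputed value:", best_x[0], "reconstruction error:", best_f)
```

In this toy setting the reconstruction error is zero exactly when the imputed value equals the sum of the two observed features, so the search should drift towards that value. With a real autoencoder, `reconstruct` would be replaced by the trained network's forward pass and the bounds by the range of the normalised features.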

Keywords

  • Autoencoder
  • Missing at random
  • Missing data imputation
  • Gravitational search algorithm
  • Missing completely at random

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Ayush Garg (1)
  • Deepika Naryani (1)
  • Garvit Aggarwal (1)
  • Swati Aggarwal (1), corresponding author
  1. Division of Computer Engineering, Netaji Subhas Institute of Technology, University of Delhi, New Delhi, India
