Missing Data Imputation for Machine Learning

  • Conference paper
IoT as a Service (IoTaaS 2018)

Abstract

Imputing missing values is an important step in data preprocessing. Datasets often contain missing values for a variety of reasons that arise during data collection, and a good imputation algorithm can increase the reliability of the dataset and reduce the impact of the missing values on it. In this paper, we propose a missing data imputation method, based on an Artificial Neural Network (ANN), for classification datasets. For each record that contains missing values, we build a list of candidate replacement values drawn from the complete records. Our ANN model is trained on the complete records and selects the most appropriate value from the list as the final result, based on the class label of the record with the missing data. In our experiments, we compare our algorithm with the traditional single-value imputation and mean-value imputation methods on the Pima dataset. The results show that the proposed algorithm achieves better classification results when the dataset contains more missing values.
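The abstract does not give the network architecture, how the candidate list is built, or the exact selection rule, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that the candidates for a missing feature are the distinct values observed in the complete records, that the ANN is a small one-hidden-layer softmax classifier trained by gradient descent, and that the chosen candidate is the one that maximises the predicted probability of the record's known class. All function names (`train_mlp`, `impute_with_ann`, `mean_impute`) and the toy data are hypothetical; `mean_impute` stands in for the mean-value baseline.

```python
import numpy as np

def train_mlp(X, y, n_classes, hidden=8, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer softmax classifier (assumed architecture)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels

    def forward(Z):
        H = np.tanh(Z @ W1 + b1)
        logits = H @ W2 + b2
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        return H, P / P.sum(axis=1, keepdims=True)

    for _ in range(epochs):
        H, P = forward(X)
        d_logits = (P - Y) / len(X)              # gradient of mean cross-entropy
        d_hidden = d_logits @ W2.T * (1 - H**2)  # backpropagate through tanh
        W2 -= lr * H.T @ d_logits
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * X.T @ d_hidden
        b1 -= lr * d_hidden.sum(axis=0)

    return lambda Z: forward(Z)[1]  # returns class probabilities

def impute_with_ann(X, y, n_classes):
    """Fill NaNs with the candidate value that best supports the known label."""
    X = X.astype(float).copy()
    complete = ~np.isnan(X).any(axis=1)
    Xc, yc = X[complete], y[complete]
    mu, sigma = Xc.mean(axis=0), Xc.std(axis=0) + 1e-9
    predict_proba = train_mlp((Xc - mu) / sigma, yc, n_classes)
    for i in np.where(~complete)[0]:
        for j in np.where(np.isnan(X[i]))[0]:
            candidates = np.unique(Xc[:, j])  # assumed candidate list
            trials = np.tile(X[i], (len(candidates), 1))
            trials[:, j] = candidates
            # Assumption: other still-missing features are temporarily
            # replaced by the column mean while scoring this feature.
            trials = np.where(np.isnan(trials), mu, trials)
            scores = predict_proba((trials - mu) / sigma)[:, y[i]]
            X[i, j] = candidates[np.argmax(scores)]
    return X

def mean_impute(X):
    """Baseline: replace each NaN with the mean of its column."""
    X = X.astype(float).copy()
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

# Toy usage (made-up data, not the Pima dataset).
X_toy = np.array([[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
                  [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
                  [np.nan, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1, 1])
filled = impute_with_ann(X_toy, y_toy, n_classes=2)
baseline = mean_impute(X_toy)
```

Unlike the mean-value baseline, this sketch always imputes a value that actually occurs in the complete records, and the choice depends on the class label of the incomplete record.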

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61771390, 61771392, 61501373, and 61271279), the National Science and Technology Major Project (Grant Nos. 2016ZX03001018-004 and 2015ZX03002006-004), and the Fundamental Research Funds for the Central Universities (Grant No. 3102017ZY018).

Author information

Corresponding author

Correspondence to Mao Yang.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Wang, S., Li, B., Yang, M., Yan, Z. (2019). Missing Data Imputation for Machine Learning. In: Li, B., Yang, M., Yuan, H., Yan, Z. (eds) IoT as a Service. IoTaaS 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 271. Springer, Cham. https://doi.org/10.1007/978-3-030-14657-3_7

  • DOI: https://doi.org/10.1007/978-3-030-14657-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14656-6

  • Online ISBN: 978-3-030-14657-3

  • eBook Packages: Computer Science, Computer Science (R0)
