Research on Wine Analysis Based on Data Preprocessing

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1075)

Abstract

As data volumes grow explosively, data preprocessing becomes particularly important for extracting information from massive data sets. In this paper, a preprocessing pipeline consisting of missing-data imputation, duplicate removal, outlier detection, data standardization, and data reduction was built for the wine data from the UCI repository. The preprocessed data were then compared with the raw data using the K-means algorithm, a linear regression model, and a decision tree classification algorithm. The experimental results showed that, after preprocessing, the clustering error was significantly reduced, the fit of the linear regression model improved, and the classification accuracy of the decision tree increased. These results demonstrate the importance of data preprocessing and may serve as a reference for optimizing data processing.
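
The preprocessing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which imputation, outlier, or standardization methods the authors used, so the choices here (column-mean imputation, the 1.5 × IQR rule, z-score scaling) are assumptions, the function names are hypothetical, and the toy rows are invented for illustration rather than taken from the UCI wine data. The data reduction step is omitted.

```python
from statistics import mean, pstdev, quantiles


def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, col_means)] for row in rows]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_outliers(rows, k=1.5):
    """Drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR] of its column."""
    bounds = []
    for col in zip(*rows):
        q1, _, q3 = quantiles(col, n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [row for row in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))]


def standardize(rows):
    """Z-score each column: subtract the mean, divide by the population std."""
    cols = list(zip(*rows))
    # Fall back to 1.0 so a constant column does not cause division by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def preprocess(rows):
    """Apply the four steps in the order listed in the abstract."""
    return standardize(remove_outliers(drop_duplicates(impute_missing(rows))))


# Invented toy rows (two numeric features), NOT real wine measurements.
raw = [
    [13.2, 1.8],
    [13.2, 1.8],   # exact duplicate of the first row
    [12.4, None],  # missing value
    [12.9, 2.1],
    [13.6, 2.4],
    [12.7, 1.9],
    [40.0, 2.0],   # implausibly large first feature
]

clean = preprocess(raw)
```

On the toy rows, the duplicate and the row with the extreme first feature are dropped, and each remaining column is rescaled to zero mean and unit variance. The cleaned rows could then be passed to a clustering, regression, or classification model and compared against the same model fitted on the raw rows, as the abstract describes.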



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Nos. 61563044, 61866031); the National Natural Science Foundation of Qinghai Province (No. 2017-ZJ-902); the Applied Basic Research Programs of the Science and Technology Department of Sichuan Province (No. 2019YJ0110); the Youth Foundation of Qinghai University (Nos. 2017-QGY-4, 2018-QGY-7); the Teaching Research Project of Qinghai University (Nos. KC18038, SZ18015, JY201805); and the Open Research Fund Program of the State Key Laboratory of Hydroscience and Engineering (No. sklhse-2017-A-05).

Corresponding author

Correspondence to Xiaolan Zhu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Meng, X., Zhu, X., Yang, S., Wang, L., Qi, J., Yang, P. (2020). Research on Wine Analysis Based on Data Preprocessing. In: Liu, Y., Wang, L., Zhao, L., Yu, Z. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2019. Advances in Intelligent Systems and Computing, vol 1075. Springer, Cham. https://doi.org/10.1007/978-3-030-32591-6_63
