Abstract
This century has witnessed the emergence of a new branch of science, data science, which facilitates the analysis of large amounts of data and in turn supports model-based, data-driven decision making. The prelude to any successful analytical model building and implementation phase is a properly conducted initial data analysis (IDA) stage. IDA encompasses the laborious tasks of data cleansing (missing value treatment, outlier detection, checking the veracity of the data, and data transformation) that prepare data for model building. A systematic, disciplined, and non-personalized approach to IDA reduces the probability of incorrect and inaccurate model results. The volume of data presented for model building today makes IDA a crucial task that cannot be conducted manually. Machine learning can be applied to analyze large and complex data and to find patterns in it accurately; hence, it could also be used for data preparation prior to model building. This paper seeks to reduce the ad hoc nature of IDA by proposing a conceptual framework that uses machine learning.
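As a rough illustration of the IDA tasks listed above, the sketch below implements three of them on a small numeric table: missing value treatment by k-nearest-neighbour (k-NN) imputation, a simple machine-learning technique; outlier detection with the interquartile-range (IQR) rule; and min-max scaling as a data transformation. This is not the framework proposed in the paper. The choice of techniques and the function names `knn_impute`, `iqr_outliers`, and `min_max_scale` are hypothetical, and missing values are assumed to be encoded as `None` in a rectangular table.

```python
"""Hypothetical sketch of three IDA steps; not the paper's framework."""
import math
import statistics
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Replace each None with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns that both
    rows have observed (excluding the column being imputed). Assumes every
    column has at least one observed value.
    """
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # Only rows that observed column j can donate a value.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if not nearest:
                # No comparable neighbour: fall back to the observed column mean.
                nearest = [r[j] for r in rows if r[j] is not None]
            new_row[j] = statistics.mean(nearest)
        filled.append(new_row)
    return filled


def iqr_outliers(values: List[float], factor: float = 1.5) -> List[int]:
    """Return indices of values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = factor * (q3 - q1)
    return [i for i, v in enumerate(values) if v < q1 - spread or v > q3 + spread]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    table = [[1.0, 2.0], [1.1, None], [0.9, 2.2], [5.0, 10.0]]
    clean = knn_impute(table, k=2)
    print(clean[1])  # gap filled from the two rows closest on the first column
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [6]
    print(min_max_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

Because the imputer reads neighbours' original values rather than already-filled ones, the result does not depend on the order in which gaps are visited. In practice a tested library implementation, such as scikit-learn's `KNNImputer`, would normally be preferred over this hand-rolled version.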
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Smitha Rao, M.S., Pallavi, M., Geetha, N. (2019). Conceptual Machine Learning Framework for Initial Data Analysis. In: Peng, SL., Dey, N., Bundele, M. (eds) Computing and Network Sustainability. Lecture Notes in Networks and Systems, vol 75. Springer, Singapore. https://doi.org/10.1007/978-981-13-7150-9_6
DOI: https://doi.org/10.1007/978-981-13-7150-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7149-3
Online ISBN: 978-981-13-7150-9
eBook Packages: Engineering, Engineering (R0)