Abstract
Big data can generally be characterised by the 5 Vs: Volume, Velocity, Variety, Veracity and Variability. Many studies have focused on using machine learning as a powerful tool for big data processing. In a machine learning context, learning algorithms are typically evaluated in terms of accuracy, efficiency, interpretability and stability. These four dimensions can be strongly related to veracity, volume, variety and variability, and are affected by both the nature of the learning algorithms and the characteristics of the data. This chapter analyses in depth how the quality of computational models is affected by data characteristics as well as by the strategies involved in learning algorithms. It also introduces a unified framework for the control of machine learning tasks, aimed at the appropriate employment of algorithms and the efficient processing of big data. In particular, the framework is designed to support the effective selection of data pre-processing techniques for selecting relevant attributes, sampling representative training and test data, and dealing appropriately with missing values and noise. More importantly, the framework allows suitable machine learning algorithms to be employed on the basis of the training data provided by the data pre-processing stage, with the aim of building accurate, efficient and interpretable computational models.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Liu, H., Gegov, A., Cocea, M. (2017). Unified Framework for Control of Machine Learning Tasks Towards Effective and Efficient Processing of Big Data. In: Pedrycz, W., Chen, SM. (eds) Data Science and Big Data: An Environment of Computational Intelligence. Studies in Big Data, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-53474-9_6
DOI: https://doi.org/10.1007/978-3-319-53474-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53473-2
Online ISBN: 978-3-319-53474-9
eBook Packages: Engineering, Engineering (R0)