Unified Framework for Control of Machine Learning Tasks Towards Effective and Efficient Processing of Big Data

Liu, Han; Gegov, Alexander; Cocea, Mihaela

doi:10.1007/978-3-319-53474-9_6

Part of the book series: Studies in Big Data (SBD, volume 24)

Abstract

Big data can generally be characterised by five Vs: Volume, Velocity, Variety, Veracity and Variability. Many studies have focused on machine learning as a powerful tool for big data processing. In the machine learning context, learning algorithms are typically evaluated in terms of accuracy, efficiency, interpretability and stability. These four dimensions can be strongly related to veracity, volume, variety and variability, and are affected both by the nature of the learning algorithms and by the characteristics of the data. This chapter analyses in depth how the quality of computational models is affected by data characteristics and by the strategies that learning algorithms employ. It also introduces a unified framework for the control of machine learning tasks, aimed at appropriate employment of algorithms and efficient processing of big data. In particular, the framework is designed to guide the selection of data pre-processing techniques for selecting relevant attributes, sampling representative training and test data, and handling missing values and noise appropriately. More importantly, the framework allows suitable machine learning algorithms to be employed on the basis of the training data produced by the pre-processing stage, so that accurate, efficient and interpretable computational models can be built.
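
The two stages named in the abstract (data pre-processing, then algorithm selection) can be illustrated with a minimal sketch. This is not the authors' framework: the function names, the mean-imputation rule, the stratified split, the two toy learners and the toy data are all assumptions chosen only to keep the example self-contained, and attribute selection is omitted for brevity. Candidates are ranked by accuracy first and by elapsed time second, which loosely mirrors the accuracy and efficiency dimensions mentioned above.

```python
import random
import time
from collections import Counter, defaultdict

def impute_mean(rows):
    """Pre-processing (assumed rule): replace None with the column mean."""
    means = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def stratified_split(rows, labels, test_fraction=0.3, seed=0):
    """Sample training and test sets that keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:cut]
        train_idx += idx[cut:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_idx), pick(test_idx)

def fit_majority(X, y):
    """Baseline learner: always predict the most frequent training class."""
    top = Counter(y).most_common(1)[0][0]
    return lambda x: top

def fit_1nn(X, y):
    """1-nearest-neighbour learner using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict

def evaluate(fit, train, test):
    """Return (accuracy, seconds spent on training plus prediction)."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    start = time.perf_counter()
    model = fit(X_tr, y_tr)
    preds = [model(x) for x in X_te]
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    return accuracy, elapsed

def select_algorithm(candidates, train, test):
    """Pick the candidate with the highest accuracy, breaking ties on time."""
    scores = {name: evaluate(fit, train, test) for name, fit in candidates.items()}
    best = max(scores, key=lambda n: (scores[n][0], -scores[n][1]))
    return best, scores

# Hypothetical toy data: two well-separated classes with two missing values.
rows = [[0.0, 0.1], [0.2, None], [0.1, 0.0], [0.3, 0.2],
        [5.0, 5.1], [5.2, None], [4.9, 5.0], [5.1, 5.3]]
labels = ["a"] * 4 + ["b"] * 4

clean = impute_mean(rows)
train, test = stratified_split(clean, labels)
best, scores = select_algorithm({"majority": fit_majority, "1nn": fit_1nn}, train, test)
print(best, scores[best][0])
```

Interpretability and stability, the other two dimensions in the abstract, are not scored here; a fuller sketch would need an explicit, and necessarily assumed, measure for each.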

Author information

Authors and Affiliations

School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
Han Liu, Alexander Gegov & Mihaela Cocea

Corresponding author

Correspondence to Han Liu.

Editor information

Editors and Affiliations

Electrical & Computer Engineering, University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Shyi-Ming Chen

About this chapter

Cite this chapter

Liu, H., Gegov, A., Cocea, M. (2017). Unified Framework for Control of Machine Learning Tasks Towards Effective and Efficient Processing of Big Data. In: Pedrycz, W., Chen, SM. (eds) Data Science and Big Data: An Environment of Computational Intelligence. Studies in Big Data, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-53474-9_6

DOI: https://doi.org/10.1007/978-3-319-53474-9_6
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53473-2
Online ISBN: 978-3-319-53474-9
eBook Packages: Engineering, Engineering (R0)