Unified Framework for Control of Machine Learning Tasks Towards Effective and Efficient Processing of Big Data

Data Science and Big Data: An Environment of Computational Intelligence

Part of the book series: Studies in Big Data ((SBD,volume 24))

Abstract

Big data can be generally characterised by 5 Vs: Volume, Velocity, Variety, Veracity and Variability. Many studies have focused on using machine learning as a powerful tool for big data processing. In a machine learning context, learning algorithms are typically evaluated in terms of accuracy, efficiency, interpretability and stability. These four dimensions can be strongly related to veracity, volume, variety and variability, and are impacted by both the nature of learning algorithms and the characteristics of data. This chapter analyses in depth how the quality of computational models can be impacted by data characteristics as well as by the strategies involved in learning algorithms. It also introduces a unified framework for control of machine learning tasks towards appropriate employment of algorithms and efficient processing of big data. In particular, the framework is designed to achieve effective selection of data pre-processing techniques, covering selection of relevant attributes, sampling of representative training and test data, and appropriate handling of missing values and noise. More importantly, the framework allows suitable machine learning algorithms to be employed on the basis of the training data provided by the data pre-processing stage, towards building accurate, efficient and interpretable computational models.
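The two stages named in the abstract (data pre-processing, then algorithm selection on the pre-processed training data) can be illustrated with a small sketch. This is not the authors' framework, only a toy reading of the abstract under stated assumptions: every function name here (`impute_missing`, `select_attributes`, `run_framework`, and so on) is hypothetical, mean imputation and a zero-spread filter stand in for whatever pre-processing techniques the chapter actually selects, and only accuracy and elapsed time are measured, since interpretability and stability are not captured by such a simple loop.

```python
import random
import time
from statistics import mean


def impute_missing(rows):
    """Replace None values with the mean of the attribute (assumed strategy)."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def select_attributes(rows):
    """Keep only attributes whose values vary (a crude relevance filter)."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if max(col) > min(col)]
    return [[row[j] for j in keep] for row in rows], keep


def split(rows, labels, test_fraction=0.3, seed=0):
    """Randomly sample training and test instances."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def majority_learner(X, y):
    """Baseline: always predict the most frequent training label."""
    majority = max(set(y), key=y.count)
    return lambda x: majority


def nearest_neighbour_learner(X, y):
    """1-nearest-neighbour with squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def evaluate(learner, X_train, y_train, X_test, y_test):
    """Return (accuracy, seconds spent on training plus prediction)."""
    start = time.perf_counter()
    predict = learner(X_train, y_train)
    predictions = [predict(x) for x in X_test]
    elapsed = time.perf_counter() - start
    accuracy = mean(1.0 if p == t else 0.0
                    for p, t in zip(predictions, y_test))
    return accuracy, elapsed


def run_framework(rows, labels, learners):
    """Pre-process the data, then pick the most accurate candidate learner."""
    rows = impute_missing(rows)
    rows, kept = select_attributes(rows)
    X_tr, y_tr, X_te, y_te = split(rows, labels)
    results = {name: evaluate(learner, X_tr, y_tr, X_te, y_te)
               for name, learner in learners.items()}
    best = max(results, key=lambda name: results[name][0])
    return best, results, kept


def toy_dataset():
    """Two well-separated classes, one constant attribute, one missing value."""
    rows = [[(i % 10) / 10 + (10.0 if i >= 10 else 0.0), 5.0]
            for i in range(20)]
    rows[3][1] = None
    labels = [0] * 10 + [1] * 10
    return rows, labels


LEARNERS = {"majority": majority_learner,
            "nearest_neighbour": nearest_neighbour_learner}
```

Calling `run_framework(*toy_dataset(), LEARNERS)` returns the name of the selected learner, the per-learner `(accuracy, seconds)` pairs, and the indices of the attributes that survived pre-processing. Selecting on accuracy alone is a simplification; a fuller controller would weigh efficiency and interpretability as well, as the abstract suggests.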

Corresponding author

Correspondence to Han Liu.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Liu, H., Gegov, A., Cocea, M. (2017). Unified Framework for Control of Machine Learning Tasks Towards Effective and Efficient Processing of Big Data. In: Pedrycz, W., Chen, SM. (eds) Data Science and Big Data: An Environment of Computational Intelligence. Studies in Big Data, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-53474-9_6

  • DOI: https://doi.org/10.1007/978-3-319-53474-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53473-2

  • Online ISBN: 978-3-319-53474-9

  • eBook Packages: Engineering, Engineering (R0)
