Cloud Technologies: A New Level for Big Data Mining

Part of the book series: Computer Communications and Networks (CCN)

Abstract

The amount of data being collected and stored is constantly increasing. Data come from many sources, such as devices, sensors, networks, transactional applications, the web, and social media. Conventional technologies and methods cannot store and analyze such amounts of data. In this paper, a comparative analysis of existing data mining systems is performed; it shows that most existing data mining solutions are not suited to Big Data problems. To bring conventional data mining to a new level and to cope with the challenges of massive, complex data of different kinds, requirements for data mining systems suitable for Big Data are derived.

Author information

Corresponding author

Correspondence to Viktor Medvedev.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Medvedev, V., Kurasova, O. (2016). Cloud Technologies: A New Level for Big Data Mining. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_3

  • DOI: https://doi.org/10.1007/978-3-319-44881-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44880-0

  • Online ISBN: 978-3-319-44881-7

  • eBook Packages: Computer Science (R0)
