
Cloud Technologies: A New Level for Big Data Mining

  • Viktor Medvedev
  • Olga Kurasova
Chapter
Part of the Computer Communications and Networks book series (CCN)

Abstract

The amount of data being collected and stored is constantly increasing. Data come from many sources, such as devices, sensors, networks, transactional applications, the web, and social media. Conventional technologies and methods are not able to store and analyze such amounts of data. In this paper, a comparative analysis of existing data mining systems is performed; it shows that most existing data mining solutions are not suited to Big Data problems. To bring conventional data mining to a new level and to cope with the challenges of massive and complex data of different nature, requirements for data mining systems suitable for Big Data are derived.

Keywords

Cloud technologies, Data mining, Big Data, Scientific workflow, High performance computing

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Institute of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania
