Survey of Machine Learning Algorithms on Spark Over DHT-based Structures

Sioutas, Spyros; Mylonas, Phivos; Panaretos, Alexandros; Gerolymatos, Panagiotis; Vogiatzis, Dimitrios; Karavaras, Eleftherios; Spitieris, Thomas; Kanavos, Andreas

doi:10.1007/978-3-319-57045-7_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10230)

Included in the following conference series:

International Workshop of Algorithmic Aspects of Cloud Computing

Abstract

Several solutions for data storage, data management, and data retrieval have been proposed over the past few years. These solutions can process massive amounts of data stored in relational or distributed database management systems. In addition, decision-making analytics and predictive computational statistics are among the most common and well-studied fields in computer science. In this paper, we demonstrate the implementation of machine learning algorithms over an open-source distributed database management system that can run in parallel on a cluster. To accomplish this, we propose a system architecture in which a cluster computing framework (e.g. Apache Spark) runs over Apache Cassandra. This paper also presents a survey of the most common machine learning algorithms and the results of experiments performed over a Point-Of-Sales (POS) data set.
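The general idea of running an analysis in parallel over a DHT-partitioned store can be illustrated with a minimal, self-contained sketch. It is not the system described in the paper and uses no Spark or Cassandra API: the `Ring` class, the node names, the MD5-based `token` function, and the sample receipts are all assumptions made for illustration. Records are assigned to nodes by hashing their key onto a ring, each partition counts item occurrences locally, and the partial counts are then merged.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def token(key: str) -> int:
    """Map a key to a position on the hash ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring: each node owns the keys up to its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node clockwise from the key's token, wrapping around the ring.
        i = bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


def partition(records, ring):
    """Group the baskets by the node that owns each receipt id."""
    parts = {}
    for receipt_id, items in records:
        parts.setdefault(ring.owner(receipt_id), []).append(items)
    return parts


def count_items(baskets):
    """Local step: number of baskets in this partition containing each item."""
    counts = Counter()
    for items in baskets:
        counts.update(set(items))
    return counts


def item_support(records, ring):
    """Merge the per-partition counts into global item support counts."""
    total = Counter()
    for baskets in partition(records, ring).values():
        total += count_items(baskets)
    return total


# Made-up POS receipts: (receipt id, purchased items).
records = [
    ("r1", ["milk", "bread"]),
    ("r2", ["milk", "eggs"]),
    ("r3", ["bread", "eggs", "milk"]),
    ("r4", ["coffee"]),
]
ring = Ring(["node-a", "node-b", "node-c"])
support = item_support(records, ring)
```

Because the local counts are simply added together, the merged result does not depend on how the ring happens to distribute the receipts. In a real deployment the local step would run on the nodes holding the data rather than in a single process.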

Author information

Authors and Affiliations

Department of Informatics, Ionian University, Corfu, Greece
Spyros Sioutas, Phivos Mylonas, Alexandros Panaretos, Panagiotis Gerolymatos, Dimitrios Vogiatzis, Eleftherios Karavaras & Thomas Spitieris
Computer Engineering and Informatics Department, University of Patras, Patras, Greece
Andreas Kanavos

Corresponding author

Correspondence to Andreas Kanavos.

Editor information

Editors and Affiliations

Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Victoria, Australia
Timos Sellis
Informatics, Ionian University, Kerkyra, Greece
Konstantinos Oikonomou

About this paper

Cite this paper

Sioutas, S. et al. (2017). Survey of Machine Learning Algorithms on Spark Over DHT-based Structures. In: Sellis, T., Oikonomou, K. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2016. Lecture Notes in Computer Science, vol 10230. Springer, Cham. https://doi.org/10.1007/978-3-319-57045-7_9

DOI: https://doi.org/10.1007/978-3-319-57045-7_9
Published: 11 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57044-0
Online ISBN: 978-3-319-57045-7
eBook Packages: Computer Science, Computer Science (R0)
