Distributed Data Mining Methodology with Classification Model Example

Gorawski, Marcin; Płuciennik-Psota, Ewa

doi:10.1007/978-3-642-04441-0_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5796)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Distributed computing and data mining are both essential to many commercial and scientific organizations. Data mining, the process of building analytical models of data, consumes considerable time and hardware resources, and distribution is often part of an organization's structure. The authors propose a methodology for distributed data mining that combines local analytical models (built in parallel in the nodes of a distributed computer system) into a global one, without the need to construct a distributed version of the data mining algorithm. Several combining strategies are proposed, together with a method for verifying them. The proposed solutions were tested on data sets from the UCI Machine Learning Repository.
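
The general idea of combining local models into a global one can be illustrated with a minimal Python sketch. The abstract does not say which combining strategies or local algorithms the authors use, so everything here is an assumption made for illustration: the local model is a simple nearest-centroid classifier, the combining strategy is majority voting, threads stand in for the nodes of a distributed system, and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_local_model(rows):
    """Train a nearest-centroid classifier on one node's data partition.

    rows is a list of (features, label) pairs; the returned model maps
    each label to the mean feature vector of that label.
    (Assumed stand-in for whatever local algorithm a node would run.)
    """
    sums, counts = {}, Counter()
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict_local(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def build_global_model(partitions):
    """Build one local model per partition in parallel.

    The global model in this sketch is simply the list of local models;
    no distributed version of the training algorithm is needed.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(train_local_model, partitions))


def predict_global(models, features):
    """Combine local predictions by majority vote (assumed strategy)."""
    votes = Counter(predict_local(m, features) for m in models)
    return votes.most_common(1)[0][0]


# Three hypothetical nodes, each holding its own slice of the data.
partitions = [
    [((0.0, 0.0), "a"), ((1.0, 1.0), "b")],
    [((0.1, 0.2), "a"), ((0.9, 1.1), "b")],
    [((0.2, 0.0), "a"), ((1.2, 0.8), "b")],
]
models = build_global_model(partitions)
print(predict_global(models, (0.1, 0.1)))
```

Only the trained local models leave the nodes in this sketch, not the raw data. Verifying a combining strategy would then amount to comparing the global model's accuracy with that of a model trained on the full data set, though the exact verification method of the paper is not described in the abstract.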

Author information

Authors and Affiliations

Institute of Computer Science, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Marcin Gorawski & Ewa Płuciennik-Psota

Editor information

Editors and Affiliations

Institute of Informatics, Wroclaw University of Technology, Str. Janiszweskiego 11/17, 50-370, Wroclaw, Poland
Ngoc Thanh Nguyen
Centre for Complex Software Systems and Services, Swinburne University of Technology, Hawthorn, P.O. Box 218, 3122, Victoria, Australia
Ryszard Kowalczyk
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, G#43, Sec. 4, Keelung Rd, 106, Taipei, Taiwan, R.O.C.
Shyi-Ming Chen

About this paper

Cite this paper

Gorawski, M., Płuciennik-Psota, E. (2009). Distributed Data Mining Methodology with Classification Model Example. In: Nguyen, N.T., Kowalczyk, R., Chen, SM. (eds) Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems. ICCCI 2009. Lecture Notes in Computer Science, vol 5796. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04441-0_9

DOI: https://doi.org/10.1007/978-3-642-04441-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04440-3
Online ISBN: 978-3-642-04441-0
eBook Packages: Computer Science, Computer Science (R0)
