Distributed Data Mining Methodology with Classification Model Example

  • Marcin Gorawski
  • Ewa Płuciennik-Psota
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5796)


Distributed computing and data mining are both essential to many commercial and scientific organizations. Data mining, the process of building analytical models of data, is demanding in both time and hardware resources, and distribution is often inherent in an organization’s structure. The authors propose a methodology for distributed data mining that combines local analytical models (built in parallel in the nodes of a distributed computer system) into a global model, without the need to construct a distributed version of the data mining algorithm. Several combining strategies are proposed, together with a method for verifying them. The proposed solutions were tested on data sets from the UCI Machine Learning Repository.
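
The general idea of building local classification models independently and merging them into a global one can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which combining strategies are used, so the sketch assumes plain majority voting over simple one-rule (1R) classifiers. The toy weather data, the node partitions, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_one_rule(rows):
    """Fit a 1R classifier on one node's data: pick the single attribute
    whose value -> majority-label table makes the fewest local errors."""
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    best = None
    for attr in range(len(rows[0][0])):
        counts = defaultdict(Counter)
        for features, label in rows:
            counts[features[attr]][label] += 1
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(table[f[attr]] != label for f, label in rows)
        if best is None or errors < best[0]:
            best = (errors, attr, table)
    _, attr, table = best
    # Unseen attribute values fall back to the node's majority label.
    return lambda features: table.get(features[attr], default)

def combine_by_vote(local_models):
    """Assumed combining strategy: the global model returns the label
    predicted by the largest number of local models."""
    def global_model(features):
        votes = Counter(model(features) for model in local_models)
        return votes.most_common(1)[0][0]
    return global_model

def accuracy(model, rows):
    """Verification step: fraction of rows classified correctly."""
    return sum(model(f) == label for f, label in rows) / len(rows)

# Hypothetical partitions; each row is ((outlook, windy), play).
node_a = [(("sunny", "no"), "yes"), (("rain", "no"), "no"),
          (("sunny", "yes"), "yes"), (("rain", "yes"), "no")]
node_b = [(("overcast", "no"), "yes"), (("rain", "yes"), "no"),
          (("overcast", "yes"), "yes")]
node_c = [(("sunny", "no"), "yes"), (("rain", "yes"), "no"),
          (("overcast", "no"), "yes"), (("rain", "no"), "no")]
test_rows = [(("sunny", "yes"), "yes"), (("rain", "no"), "no"),
             (("overcast", "yes"), "yes")]

# Each node trains on its own data only; no raw data is exchanged.
local_models = [train_one_rule(part) for part in (node_a, node_b, node_c)]
global_model = combine_by_vote(local_models)
print("global accuracy:", accuracy(global_model, test_rows))
```

Only the trained local models are passed to the combining step, so the learning algorithm itself stays unchanged; voting is just one of many possible ways to merge them.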


Distributed data mining, data analysis, data models, analytical SQL





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Marcin Gorawski
    • 1
  • Ewa Płuciennik-Psota
    • 1
  1. Institute of Computer Science, Silesian University of Technology, Gliwice, Poland
