Grid-Based Data Mining and Knowledge Discovery

  • Mario Cannataro
  • Antonio Congiusta
  • Carlo Mastroianni
  • Andrea Pugliese
  • Domenico Talia
  • Paolo Trunfio


The increasing use of computers in all areas of human activity is producing huge collections of digital data. Databases are now ubiquitous and serve as repositories for every kind of data. Knowledge discovery techniques and tools are used today to analyze these very large data sets and identify interesting patterns and trends in them. When data is maintained at geographically distributed sites, the computational power of distributed and parallel systems can be exploited for knowledge discovery in databases. In this scenario the Grid can provide effective computational support for distributed knowledge discovery on large data sets. For this purpose we designed a system called Knowledge Grid. This chapter describes the Knowledge Grid architecture and discusses some related systems and models recently proposed for knowledge discovery on Grids. The chapter also shows how to design and implement distributed data mining applications with the Knowledge Grid tools: searching for Grid resources, composing software and data elements, and executing the resulting application on a Grid.
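The three steps named above (searching Grid resources, composing software and data elements, executing the result) can be illustrated with a toy Python sketch. It is not the Knowledge Grid API: the `Resource` record, the `search`, `compose`, and `execute` functions, and the sample catalog entries are all hypothetical. Resources are filtered by metadata, the application is described as a dependency graph of steps (a simple execution plan), and the standard-library `graphlib.TopologicalSorter` orders the steps so each one runs after its prerequisites; a local callback stands in for remote job submission.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Resource:
    """Hypothetical metadata record for a Grid resource (a data set or a tool)."""
    name: str
    kind: str  # "data" or "software"
    host: str
    tags: set = field(default_factory=set)


def search(catalog, kind, tag):
    """Step 1: return resources of the given kind whose metadata carries `tag`."""
    return [r for r in catalog if r.kind == kind and tag in r.tags]


def compose(steps):
    """Step 2: turn {step: set_of_prerequisite_steps} into an ordered execution plan."""
    return list(TopologicalSorter(steps).static_order())


def execute(plan, run):
    """Step 3: run each step in plan order; `run` stands in for remote job submission."""
    return [run(step) for step in plan]


# Hypothetical catalog; names and hosts are made up for illustration.
catalog = [
    Resource("sales.csv", "data", "node1", {"transactions"}),
    Resource("kmeans", "software", "node2", {"clustering"}),
    Resource("autoclass", "software", "node3", {"clustering"}),
]
datasets = search(catalog, "data", "transactions")
tools = search(catalog, "software", "clustering")

# Compose: move the data to the chosen tool's host, mine it, then gather results.
data, tool = datasets[0], tools[0]
copy_step = f"copy {data.name} to {tool.host}"
mine_step = f"run {tool.name} on {tool.host}"
steps = {copy_step: set(), mine_step: {copy_step}, "collect results": {mine_step}}

plan = compose(steps)
log = execute(plan, lambda step: f"done: {step}")
print(plan)
```

A topological order is only the simplest correct serial schedule; a real Grid system would choose tools from richer metadata and could dispatch independent steps concurrently.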


Keywords (machine-generated, not supplied by the authors): Data Mining; Knowledge Discovery; Grid Service; Execution Plan; Data Mining Tool




Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Mario Cannataro (1)
  • Antonio Congiusta (2)
  • Carlo Mastroianni (3)
  • Andrea Pugliese (2)
  • Domenico Talia (2)
  • Paolo Trunfio (2)

  1. Informatics and Biomedical Engineering, University Magna Græcia of Catanzaro, Italy
  2. DEIS, Università della Calabria, Italy
  3. ICAR-CNR, Italian National Research Council, Italy
