From Parallel Data Mining to Grid-Enabled Distributed Knowledge Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4482)

Abstract

Data mining is often a compute-intensive and time-consuming process. For this reason, several data mining systems have been implemented on parallel computing platforms to achieve high performance in the analysis of large data sets. Moreover, when large data repositories are coupled with the geographical distribution of data, users, and systems, more sophisticated technologies are needed to implement high-performance distributed knowledge discovery in databases (KDD) systems. Recently, computational Grids have emerged as privileged platforms for distributed computing, and a growing number of Grid-based KDD systems have been designed. In this paper, we first outline different ways to exploit parallelism in the main data mining techniques and algorithms; we then discuss Grid-based KDD systems.

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cesario, E., Talia, D. (2007). From Parallel Data Mining to Grid-Enabled Distributed Knowledge Discovery. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2007. Lecture Notes in Computer Science, vol 4482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72530-5_3

  • DOI: https://doi.org/10.1007/978-3-540-72530-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72529-9

  • Online ISBN: 978-3-540-72530-5

  • eBook Packages: Computer Science, Computer Science (R0)
