Active Mining in a Distributed Setting

  • Srinivasan Parthasarathy
  • Sandhya Dwarkadas
  • Mitsunori Ogihara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1759)

Abstract

Most current work in data mining assumes that the data is static and that a database update requires re-mining both the old and the new data. In this article, we propose an alternative approach. We outline a general strategy by which data mining algorithms can be made active, i.e., able to maintain valid mined information in the presence of user interaction and database updates. We describe a runtime framework that allows efficient caching and sharing of data among clients and servers. We then demonstrate how existing algorithms for four key mining tasks (discretization, association mining, sequence mining, and similarity discovery) can be re-architected so that they maintain valid mined information across (i) database updates and (ii) user interactions in a client-server setting, while minimizing the amount of data re-accessed.
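As a rough illustration of what "active" maintenance can mean for association mining, the sketch below keeps support counts for small itemsets and folds in each batch of new transactions without re-reading earlier data. It is not the algorithm from the paper: the class `IncrementalItemsetCounter`, its methods, and the `max_size` cap are hypothetical names and simplifications chosen for this example, and the client-server caching framework is not modeled at all.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Hypothetical helper: support counts for all itemsets up to max_size."""

    def __init__(self, max_size=2):
        self.max_size = max_size      # assumption: cap itemset size to bound memory
        self.counts = Counter()       # itemset (sorted tuple) -> absolute support
        self.num_transactions = 0

    def update(self, new_transactions):
        """Fold in a batch of new transactions; old data is never re-scanned."""
        for transaction in new_transactions:
            items = sorted(set(transaction))
            self.num_transactions += 1
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def frequent(self, min_support):
        """Return itemsets whose relative support is at least min_support."""
        if self.num_transactions == 0:
            return {}
        n = self.num_transactions
        return {
            itemset: count / n
            for itemset, count in self.counts.items()
            if count / n >= min_support
        }
```

A database update then maps to one `update` call on the new batch only, and a user interaction such as changing the support threshold maps to another `frequent` call on the stored counts. Counting every small itemset is only workable for tiny examples; it stands in here for the more selective summary structures a real incremental miner would keep.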

Keywords

Association Rule; Active Mining; Mining Association Rule; Server Load; Sequence Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Srinivasan Parthasarathy (1)
  • Sandhya Dwarkadas (1)
  • Mitsunori Ogihara (1)
  1. Department of Computer Science, University of Rochester, Rochester
