Active Mining in a Distributed Setting

Parthasarathy, Srinivasan; Dwarkadas, Sandhya; Ogihara, Mitsunori

doi:10.1007/3-540-46502-2_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1759)

Abstract

Most current work in data mining assumes that the data is static, so a database update requires re-mining both the old and the new data. In this article, we propose an alternative approach. We outline a general strategy by which data mining algorithms can be made active, i.e., able to maintain valid mined information in the presence of user interaction and database updates. We describe a runtime framework that allows efficient caching and sharing of data among clients and servers. We then demonstrate how existing algorithms for four key mining tasks (Discretization, Association Mining, Sequence Mining, and Similarity Discovery) can be re-architected so that they maintain valid mined information across (i) database updates and (ii) user interactions in a client-server setting, while minimizing the amount of data re-accessed.
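
To make the idea of "active" mining concrete, the following is a minimal sketch, not the chapter's algorithm. It assumes a toy association-mining setting in which support counts for all itemsets up to a small size are cached, so that a database update only scans the new transactions and a changed support threshold is answered from the cache. The class name `ActiveItemsetCounter`, its methods, and the `max_size` parameter are hypothetical names introduced for illustration.

```python
from collections import Counter
from itertools import combinations


class ActiveItemsetCounter:
    """Hypothetical sketch: cached support counts for itemsets of size <= max_size."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()       # itemset (sorted tuple) -> support count
        self.num_transactions = 0

    def add_transactions(self, batch):
        """Database update: fold a new batch into the cached counts.

        Only the new batch is scanned; earlier transactions are not re-read.
        """
        for transaction in batch:
            items = sorted(set(transaction))
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1
            self.num_transactions += 1

    def frequent(self, min_support):
        """User interaction: answer a query from the cache.

        min_support is a fraction of the transactions seen so far.
        """
        threshold = min_support * self.num_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    miner = ActiveItemsetCounter(max_size=2)
    miner.add_transactions([["a", "b"], ["a", "c"], ["a", "b", "c"]])
    print(miner.frequent(0.6))            # initial mining result

    miner.add_transactions([["b", "c"]])  # update: only this batch is scanned
    print(miner.frequent(0.5))            # new threshold, no data re-accessed
```

Caching every small itemset trades memory for the ability to answer any later threshold without touching the data again. A practical system would bound what is cached and would need a way to handle deletions, which this sketch does not attempt.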

Author information

Authors and Affiliations

Department of Computer Science, University of Rochester, Rochester, NY, 14627-0226
Srinivasan Parthasarathy, Sandhya Dwarkadas & Mitsunori Ogihara

Editor information

Editors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Mohammed J. Zaki
K55/B1, IBM Almaden Research Center, 650 Harry Road, San Jose, CA, 95120, USA
Ching-Tien Ho

Cite this paper

Parthasarathy, S., Dwarkadas, S., Ogihara, M. (2000). Active Mining in a Distributed Setting. In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_4

DOI: https://doi.org/10.1007/3-540-46502-2_4
Published: 17 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7
eBook Packages: Springer Book Archive
