Efficient Data Mining by Active Learning


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2281)

Abstract

An important issue in data mining and knowledge discovery is that of data scalability. We propose an approach to this problem that applies active learning as a method for data selection. In particular, we propose and evaluate a selective sampling method belonging to the general category of ‘uncertainty sampling,’ obtained by adopting and extending the ‘query by bagging’ method, proposed earlier by the authors as a query learning method. We empirically evaluate the effectiveness of the proposed method by comparing its performance against Breiman’s Ivotes, a representative sampling method for scaling up inductive algorithms. Our results show that the performance of the proposed method compares favorably against that of Ivotes, both in the predictive accuracy achieved using a fixed amount of computation time and in the final accuracy achieved. This is especially the case when the data size approaches a million examples, a typical size encountered in real-world data mining applications. We also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at larger noise levels.

Supported in part by a Grant-in-Aid for Scientific Research on Priority Areas “Discovery Science” from the Ministry of Education, Science, Sports and Culture of Japan. This work was carried out while this author was with NEC and Tokyo Institute of Technology.


References

  1. N. Abe and H. Mamitsuka. Query Learning Strategies Using Boosting and Bagging. Proceedings of the Fifteenth International Conference on Machine Learning, 1–9, 1998.

  2. R. Agrawal, T. Imielinski, and A. Swami. Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, 1993.

  3. L. Breiman. Bagging Predictors. Machine Learning, 24:123–140, 1996.

  4. L. Breiman. Pasting Small Votes for Classification in Large Databases and On-line. Machine Learning, 36:85–103, 1999.

  5. J. Catlett. Megainduction: A Test Flight. Proceedings of the Eighth International Workshop on Machine Learning, 596–599, 1991.

  6. Y. Freund and R. Schapire. A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

  7. J. Fürnkranz. Integrative Windowing. Journal of Artificial Intelligence Research, 8:129–164, 1998.

  8. J. Gehrke, V. Ganti, R. Ramakrishnan, and W.-Y. Loh. BOAT — Optimistic Decision Tree Construction. Proceedings of the ACM SIGMOD International Conference on Management of Data, 169–180, 1999.

  9. D. Michie, D. Spiegelhalter, and C. Taylor (editors). Machine Learning, Neural and Statistical Classification. Ellis Horwood, London, 1994.

  10. F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3(2):131–169, 1999.

  11. J. R. Quinlan. Learning Efficient Classification Procedures and Their Applications to Chess Endgames. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (editors), Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, San Francisco, 1983.

  12. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, 1993.

  13. R. Rastogi and K. Shim. PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning. Proceedings of the 24th International Conference on Very Large Data Bases, 404–415. Morgan Kaufmann, New York, 1998.

  14. H. S. Seung, M. Opper, and H. Sompolinsky. Query by Committee. Proceedings of the 5th Annual Workshop on Computational Learning Theory, 287–294. ACM Press, New York, 1992.

  15. S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, San Francisco, 1998.


Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mamitsuka, H., Abe, N. (2002). Efficient Data Mining by Active Learning. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science (LNAI), vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_17

  • DOI: https://doi.org/10.1007/3-540-45884-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43338-5

  • Online ISBN: 978-3-540-45884-5

