Efficient Data Mining by Active Learning


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2281)

Abstract

An important issue in data mining and knowledge discovery is that of data scalability. We propose an approach to this problem that applies active learning as a method for data selection. In particular, we propose and evaluate a selective sampling method belonging to the general category of ‘uncertainty sampling,’ obtained by adopting and extending the ‘query by bagging’ method, proposed earlier by the authors as a query learning method. We empirically evaluate the effectiveness of the proposed method by comparing its performance against Breiman’s Ivotes, a representative sampling method for scaling up inductive algorithms. Our results show that the performance of the proposed method compares favorably against that of Ivotes, both in the predictive accuracy achieved using a fixed amount of computation time and in the final accuracy achieved. This is especially the case when the data size approaches a million examples, a typical size encountered in real-world data mining applications. We also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at larger noise levels.

Supported in part by a Grant-in-Aid for Scientific Research on Priority Areas “Discovery Science” from the Ministry of Education, Science, Sports and Culture of Japan. This work was carried out while this author was with NEC and Tokyo Institute of Technology.


References

  1. N. Abe and H. Mamitsuka. Query Learning Strategies Using Boosting and Bagging. Proceedings of the Fifteenth International Conference on Machine Learning, 1–9, 1998.

  2. R. Agrawal, T. Imielinski, and A. Swami. Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, 1993.

  3. L. Breiman. Bagging Predictors. Machine Learning, 24:123–140, 1996.

  4. L. Breiman. Pasting Small Votes for Classification in Large Databases and On-line. Machine Learning, 36:85–103, 1999.

  5. J. Catlett. Megainduction: A Test Flight. Proceedings of the Eighth International Workshop on Machine Learning, 596–599, 1991.

  6. Y. Freund and R. Schapire. A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

  7. J. Fürnkranz. Integrative Windowing. Journal of Artificial Intelligence Research, 8:129–164, 1998.

  8. J. Gehrke, V. Ganti, R. Ramakrishnan, and W.-Y. Loh. BOAT — Optimistic Decision Tree Construction. Proceedings of the ACM SIGMOD International Conference on Management of Data, 169–180, 1999.

  9. D. Michie, D. Spiegelhalter, and C. Taylor (editors). Machine Learning, Neural and Statistical Classification. Ellis Horwood, London, 1994.

  10. F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3(2):131–169, 1999.

  11. J. R. Quinlan. Learning Efficient Classification Procedures and Their Applications to Chess Endgames. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (editors), Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, San Francisco, 1983.

  12. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, 1993.

  13. R. Rastogi and K. Shim. PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning. Proceedings of the 24th International Conference on Very Large Data Bases, 404–415. Morgan Kaufmann, New York, 1998.

  14. H. S. Seung, M. Opper, and H. Sompolinsky. Query by Committee. Proceedings of the 5th Annual Workshop on Computational Learning Theory, 287–294. ACM Press, New York, 1992.

  15. S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, San Francisco, 1998.


Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mamitsuka, H., Abe, N. (2002). Efficient Data Mining by Active Learning. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science (LNAI), vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_17

  • DOI: https://doi.org/10.1007/3-540-45884-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43338-5

  • Online ISBN: 978-3-540-45884-5

