
Fast Data Acquisition in Cost-Sensitive Learning

  • Conference paper
Advances in Data Mining. Applications and Theoretical Aspects (ICDM 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6870)


Abstract

Data acquisition is the first, and one of the most important, steps in many data mining applications, yet it is time-consuming and costly. Acquiring too few examples makes the learned model and its future predictions inaccurate, while acquiring more examples than necessary wastes time and money. It is therefore important to estimate the number of examples a learning algorithm needs. However, most previous learning algorithms learn from a given, fixed set of examples; to our knowledge, little prior work in machine learning can dynamically acquire examples during learning and decide the ideal number of examples needed. In this paper, we propose a simple on-line framework for fast data acquisition (FDA). FDA is an extrapolation method that estimates the number of examples needed in each acquisition round and acquires them all at once. Compared with the naïve step-by-step data acquisition strategy, FDA significantly reduces the number of data acquisition and model building rounds, and thereby the total cost of misclassification, data acquisition arrangement, computation, and acquired examples.
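The abstract describes FDA as extrapolating how many examples a learner will need and acquiring them in one batch, rather than acquiring one example per round. The sketch below is only illustrative of that idea: the power-law learning curve, the function names, and the starting set size are assumptions for the example, not the paper's actual algorithm.

```python
def accuracy(n):
    """Toy learning curve: model accuracy as a function of training-set size.
    The power-law form is an assumption made for this illustration."""
    return 1.0 - 0.5 * n ** -0.4

def naive_rounds(target, start=10):
    """Step-by-step strategy: acquire one example, rebuild, re-check."""
    n, rounds = start, 0
    while accuracy(n) < target:
        n += 1
        rounds += 1  # one acquisition round and one model build per example
    return n, rounds

def fda_rounds(target, start=10):
    """FDA-style strategy: extrapolate the curve to estimate the required
    training-set size, then acquire the whole estimated batch in one round."""
    n, rounds = start, 0
    while accuracy(n) < target:
        # Invert the assumed power law:
        # 1 - 0.5*n^-0.4 >= target  =>  n >= (0.5 / (1 - target)) ** (1/0.4)
        est = int((0.5 / (1.0 - target)) ** (1.0 / 0.4)) + 1
        n = max(est, n + 1)  # acquire the batch at once
        rounds += 1
    return n, rounds

n1, r1 = naive_rounds(0.90)
n2, r2 = fda_rounds(0.90)
print(f"naive: {n1} examples in {r1} rounds; FDA: {n2} examples in {r2} round(s)")
```

Both strategies end with the same number of examples, but the extrapolating strategy reaches it in far fewer acquisition-and-rebuild rounds, which is the cost saving the abstract claims.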






Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sheng, V.S. (2011). Fast Data Acquisition in Cost-Sensitive Learning. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2011. Lecture Notes in Computer Science, vol. 6870. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23184-1_6


  • DOI: https://doi.org/10.1007/978-3-642-23184-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23183-4

  • Online ISBN: 978-3-642-23184-1

  • eBook Packages: Computer Science (R0)
