Abstract
Data acquisition is the first and one of the most important steps in many data mining applications, and it is time-consuming and costly. Acquiring too few examples makes the learned model and its future predictions inaccurate, while acquiring more examples than necessary wastes time and money. It is therefore important to estimate the number of examples a learning algorithm needs. However, most previous learning algorithms learn from a given, fixed set of examples; to our knowledge, little prior work in machine learning can acquire examples dynamically as it learns and decide the ideal number of examples needed. In this paper, we propose a simple on-line framework for fast data acquisition (FDA). FDA is an extrapolation method that estimates the number of examples needed in each acquisition round and acquires them all at once. Compared with the naïve step-by-step data acquisition strategy, FDA significantly reduces the number of data acquisition and model building rounds, and thereby reduces the total cost of misclassification, data acquisition arrangement, computation, and example acquisition.
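The abstract describes FDA only at a high level: fit a learning curve to the performance observed so far, extrapolate it to estimate how many examples are still needed, and acquire that whole batch in one round instead of one example at a time. The sketch below illustrates that idea; it is not the paper's algorithm. The power-law learning curve, the synthetic data pool, the acquire helper, and the target_error stopping threshold are all assumptions introduced for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical pool standing in for a costly acquisition source (assumption).
X_all, y_all = make_classification(n_samples=20000, n_features=20, random_state=0)
X_pool, y_pool = X_all[:15000], y_all[:15000]
X_test, y_test = X_all[15000:], y_all[15000:]

def acquire(n, cursor):
    """Stand-in for a paid acquisition call: hand over the next n pool examples."""
    return X_pool[cursor:cursor + n], y_pool[cursor:cursor + n], cursor + n

def test_error(X_train, y_train):
    """Rebuild the model and measure its error on held-out data."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return 1.0 - model.score(X_test, y_test)

target_error = 0.12                                # assumed stopping criterion
sizes, errors = [], []
cursor = 0
X_train, y_train, cursor = acquire(200, cursor)    # small seed set

while True:
    err = max(test_error(X_train, y_train), 1e-6)  # avoid log(0) below
    sizes.append(len(X_train))
    errors.append(err)
    print(f"round {len(sizes)}: n={len(X_train):6d}  error={err:.4f}")
    if err <= target_error or cursor >= len(X_pool):
        break
    if len(sizes) < 2:
        batch = len(X_train)             # double until the curve can be fitted
    else:
        # Fit error(n) ~= a * n**(-b) in log-log space, then invert for the
        # training-set size n* at which the target error would be reached.
        slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
        a, b = np.exp(intercept), -slope
        if b <= 0:                       # curve is not improving; fall back
            batch = len(X_train)
        else:
            n_star = int(np.ceil((a / target_error) ** (1.0 / b)))
            batch = max(n_star - len(X_train), 1)
    # Acquire the whole estimated shortfall in one round, not one example at a time.
    X_new, y_new, cursor = acquire(min(batch, len(X_pool) - cursor), cursor)
    X_train = np.vstack([X_train, X_new])
    y_train = np.concatenate([y_train, y_new])
```

Because each round retrains the model once and acquires a whole batch, the loop runs only a handful of times; this is the saving in acquisition and model-building rounds over step-by-step acquisition that the abstract claims.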
Cite this paper
Sheng, V.S. (2011). Fast Data Acquisition in Cost-Sensitive Learning. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2011. Lecture Notes in Computer Science, vol. 6870. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23184-1_6