Abstract
Data acquisition is the first and one of the most important steps in many data mining applications, and it is time-consuming and costly. Acquiring too few examples makes the learned model and its future predictions inaccurate, while acquiring more examples than necessary wastes time and money. It is therefore important to estimate the number of examples a learning algorithm needs. However, most previous learning algorithms learn from a given, fixed set of examples; to our knowledge, little prior work in machine learning can acquire examples dynamically as it learns and decide the ideal number of examples needed. In this paper, we propose a simple on-line framework for fast data acquisition (FDA). FDA is an extrapolation method that estimates the number of examples needed in each acquisition round and acquires them all at once. Compared with the naïve step-by-step data acquisition strategy, FDA significantly reduces the number of data acquisition and model building rounds, and thereby reduces the total cost of misclassification, data acquisition arrangement, computation, and example acquisition.
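The abstract describes FDA only at a high level: fit a learning curve to the performance observed so far, extrapolate it to estimate how many examples are still needed, and acquire that whole batch in one round instead of one example at a time. The sketch below illustrates that idea; it is not the paper's algorithm. The power-law learning curve, the synthetic data pool, the acquire helper, and the target_error stopping threshold are all assumptions introduced for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical pool standing in for a costly acquisition source (assumption).
X_all, y_all = make_classification(n_samples=20000, n_features=20, random_state=0)
X_pool, y_pool = X_all[:15000], y_all[:15000]
X_test, y_test = X_all[15000:], y_all[15000:]

def acquire(n, cursor):
    """Stand-in for a paid acquisition call: hand over the next n pool examples."""
    return X_pool[cursor:cursor + n], y_pool[cursor:cursor + n], cursor + n

def test_error(X_train, y_train):
    """Rebuild the model and measure its error on held-out data."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return 1.0 - model.score(X_test, y_test)

target_error = 0.12                                # assumed stopping criterion
sizes, errors = [], []
cursor = 0
X_train, y_train, cursor = acquire(200, cursor)    # small seed set

while True:
    err = max(test_error(X_train, y_train), 1e-6)  # avoid log(0) below
    sizes.append(len(X_train))
    errors.append(err)
    print(f"round {len(sizes)}: n={len(X_train):6d}  error={err:.4f}")
    if err <= target_error or cursor >= len(X_pool):
        break
    if len(sizes) < 2:
        batch = len(X_train)             # double until the curve can be fitted
    else:
        # Fit error(n) ~= a * n**(-b) in log-log space, then invert for the
        # training-set size n* at which the target error would be reached.
        slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
        a, b = np.exp(intercept), -slope
        if b <= 0:                       # curve is not improving; fall back
            batch = len(X_train)
        else:
            n_star = int(np.ceil((a / target_error) ** (1.0 / b)))
            batch = max(n_star - len(X_train), 1)
    # Acquire the whole estimated shortfall in one round, not one example at a time.
    X_new, y_new, cursor = acquire(min(batch, len(X_pool) - cursor), cursor)
    X_train = np.vstack([X_train, X_new])
    y_train = np.concatenate([y_train, y_new])
```

Because each round retrains the model once and acquires a whole batch, the loop runs only a handful of times; this is the saving in acquisition and model-building rounds over step-by-step acquisition that the abstract claims.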
Cite this paper
Sheng, V.S. (2011). Fast Data Acquisition in Cost-Sensitive Learning. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2011. Lecture Notes in Computer Science, vol. 6870. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23184-1_6