Abstract
Feature selection is a powerful tool for reducing the dimensionality of datasets. Over the last decade, it has attracted growing research attention, and some researchers have begun to study feature selection from probabilistic datasets. However, existing methods for feature selection from probabilistic data neglect the distance information hidden in the data. In this paper, we design a new distance measure for selecting informative features from probabilistic databases, one that accounts for both the distance and the randomness in the data. We then propose a feature selection algorithm based on the new distance and develop two accelerated algorithms to speed up the computation. Furthermore, we introduce a parameter into the distance to reduce its sensitivity to noise. Finally, experimental results verify the effectiveness of our algorithms.
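To make the idea of a distance that reflects both distance and randomness concrete, the following is a minimal sketch only. It is not the measure proposed in the paper, which the abstract does not specify. It assumes each uncertain attribute value is a discrete distribution given as a dict from value to probability, that the two values are independent, and it uses the expected absolute difference as an illustrative distance. The function name `expected_distance` is hypothetical.

```python
from itertools import product


def expected_distance(x, y):
    """Expected |a - b| between two independent uncertain values.

    x, y: dicts mapping a numeric value to its probability
    (assumed representation; probabilities in each dict sum to 1).
    """
    return sum(
        pa * pb * abs(a - b)
        for (a, pa), (b, pb) in product(x.items(), y.items())
    )


# An uncertain value that is 1.0 or 3.0 with equal probability.
uncertain = {1.0: 0.5, 3.0: 0.5}
# A certain value: all probability mass on 2.0.
certain = {2.0: 1.0}

d_mixed = expected_distance(uncertain, certain)
d_self = expected_distance(uncertain, uncertain)
```

Under these assumptions, two independent draws from the same uncertain value have a nonzero expected distance, which is one simple way randomness can enter a distance computation.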
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, T., Pei, B., Zhao, S., Chen, H., Li, C. (2013). Distance-Based Feature Selection from Probabilistic Data. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_29
DOI: https://doi.org/10.1007/978-3-642-38562-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)