Distance-Based Feature Selection from Probabilistic Data

Zhao, Tingting; Pei, Bin; Zhao, Suyun; Chen, Hong; Li, Cuiping

doi:10.1007/978-3-642-38562-9_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7923)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Feature selection is a powerful tool for reducing the dimensionality of datasets. Over the last decade it has attracted growing attention, and some researchers have begun to study feature selection from probabilistic datasets. However, existing methods for feature selection from probabilistic data neglect the distance information hidden in the data. In this paper, we design a new distance measure for selecting informative features from probabilistic databases that accounts for both the distance and the randomness in the data. We then propose a feature selection algorithm based on the new distance and develop two accelerated algorithms to speed up the computation. Furthermore, we introduce a parameter into the distance to reduce its sensitivity to noise. Finally, experimental results verify the effectiveness of our algorithms.
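
The abstract does not define the distance itself, so the following is only an illustrative sketch of what a distance over probabilistic attribute values with a noise parameter could look like. It assumes each uncertain value is a discrete distribution stored as a dict, takes the expected absolute difference between two independent values, and treats gaps no larger than a tolerance as zero. The function name `expected_distance`, the parameter `eps`, and the thresholding rule are all hypothetical and are not taken from the paper.

```python
from itertools import product


def expected_distance(p, q, eps=0.0):
    """Expected |x - y| for two independent discrete distributions.

    p, q: dicts mapping a numeric value to its probability.
    eps:  hypothetical noise tolerance (an assumption, not the paper's
          parameter); gaps of at most eps contribute nothing.
    """
    total = 0.0
    for (x, px), (y, py) in product(p.items(), q.items()):
        gap = abs(x - y)
        if gap > eps:
            # Weight each gap by the joint probability of the pair.
            total += px * py * gap
    return total


# Two uncertain values of the same feature.
a = {1.0: 0.5, 2.0: 0.5}
b = {2.0: 1.0}

print(expected_distance(a, b))           # no tolerance
print(expected_distance(a, b, eps=1.0))  # gaps up to 1.0 are ignored
```

In this sketch a larger `eps` makes the measure less responsive to small perturbations, which is one plausible reading of "reducing the sensitivity to noise"; the paper's actual construction may differ.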

Author information

Authors and Affiliations

Key Lab of Data Engineering and Knowledge Engineering, Ministry of Education, China
Tingting Zhao, Bin Pei, Suyun Zhao, Hong Chen & Cuiping Li
Department of Computer Science, Renmin University of China, China
Tingting Zhao, Bin Pei, Hong Chen & Cuiping Li

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Jianyong Wang
Management Science and Information Systems Department, Rutgers, the State University of New Jersey, 1, Washington Park, 07102, Newark, NJ, USA
Hui Xiong
Department of Information Engineering, Nagoya University, 464-8601, Nagoya, Japan
Yoshiharu Ishikawa
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Jianliang Xu
School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Junfeng Zhou

About this paper

Cite this paper

Zhao, T., Pei, B., Zhao, S., Chen, H., Li, C. (2013). Distance-Based Feature Selection from Probabilistic Data. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_29

DOI: https://doi.org/10.1007/978-3-642-38562-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)
