New Feature Selection and Weighting Methods Based on Category Information

Liu, Gongshen; Li, Jianhua; Li, Xiang; Li, Qiang

doi:10.1007/978-3-540-30544-6_35


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3334)

Abstract

Traditional feature selection and weighting methods make full use of document information but neglect or ignore category information. The proposed feature selection and weighting methods incorporate category information as a factor, which remedies this shortcoming of the traditional methods. Under the new methods, features that are distributed evenly across the documents of a single category are given greater importance than under the traditional methods. Experiments show that four well-known classifiers built on the new feature selection and weighting methods are more effective than the same classifiers built on traditional methods.
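The paper's actual formulas are not reproduced on this page, so the following is only an illustrative sketch of the general idea and is not the authors' method. It contrasts a traditional TF-IDF weight, which uses document statistics alone, with a hypothetical category-aware factor: the largest fraction of any single category's documents that contain a term. The function names `tfidf`, `category_factor`, `category_weighted`, and `select_features`, the `1 + factor` combination rule, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

# Illustrative sketch only: these are NOT the weighting or selection formulas from the paper.

def tfidf(docs):
    """Traditional TF-IDF weights, built from document statistics alone."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()} for d in docs]

def category_factor(docs, labels):
    """Hypothetical factor: for each term, the largest fraction of documents in a
    single category that contain it. A term spread evenly over all documents of
    one category scores 1.0."""
    docs_per_cat = Counter(labels)
    df_per_cat = Counter((lab, t) for d, lab in zip(docs, labels) for t in set(d))
    factor = {}
    for (lab, t), k in df_per_cat.items():
        factor[t] = max(factor.get(t, 0.0), k / docs_per_cat[lab])
    return factor

def category_weighted(docs, labels):
    """Hypothetical weighting: TF-IDF scaled by (1 + category factor)."""
    cf = category_factor(docs, labels)
    return [{t: w * (1.0 + cf[t]) for t, w in vec.items()} for vec in tfidf(docs)]

def select_features(docs, labels, k):
    """Hypothetical selection: keep the k terms with the highest category factor
    (ties broken alphabetically)."""
    cf = category_factor(docs, labels)
    return sorted(cf, key=lambda t: (-cf[t], t))[:k]

# Toy corpus (invented for illustration): two categories, two documents each.
docs = [["goal", "team"], ["team", "report"], ["stock", "report"], ["stock", "market"]]
labels = ["sport", "sport", "finance", "finance"]

if __name__ == "__main__":
    print(tfidf(docs)[1])                      # "team" and "report" get equal TF-IDF weights
    print(category_weighted(docs, labels)[1])  # "team" now outweighs "report"
    print(select_features(docs, labels, 2))    # ['stock', 'team']
```

In the toy corpus, `team` and `report` have the same document frequency, so TF-IDF weights them identically in the second document. However, `team` appears in every sport document, while `report` is split across two categories. The hypothetical category factor therefore ranks `team` higher, which mirrors the abstract's claim that features spread evenly within one category should count for more.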




Author information

Authors and Affiliations

School of Information Security Engineering, Shanghai Jiaotong University, Shanghai, 200030, China
Gongshen Liu, Jianhua Li, Xiang Li & Qiang Li


Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, P.R. China
Zhaoneng Chen
Department of Management Information Systems, Eller College of Management, The University of Arizona, 85721, AZ, USA
Hsinchun Chen
Shanghai Library, Shanghai, P.R. China
Qihao Miao
BASICS, Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
Yuxi Fu
Digital Library Research Laboratory, Virginia Tech, USA
Edward Fox
School of Computer Engineering, Nanyang Technological University
Ee-peng Lim


About this paper

Cite this paper

Liu, G., Li, J., Li, X., Li, Q. (2004). New Feature Selection and Weighting Methods Based on Category Information. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, Ep. (eds) Digital Libraries: International Collaboration and Cross-Fertilization. ICADL 2004. Lecture Notes in Computer Science, vol 3334. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30544-6_35


DOI: https://doi.org/10.1007/978-3-540-30544-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24030-3
Online ISBN: 978-3-540-30544-6
eBook Packages: Computer Science, Computer Science (R0)
