Abstract
Traditional feature selection and weighting methods make full use of document information but neglect category information. The new feature selection and weighting methods use category information as a factor, which compensates for this shortcoming of the traditional methods. Under the new methods, features that are distributed evenly within a single category are given more importance than under the old methods. Experiments show that four well-known classifiers are more effective when based on the new feature selection and weighting methods than when based on the traditional ones.
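The abstract does not give the formulas, so the following is only an illustrative sketch of the general idea, not the authors' method. It contrasts plain inverse document frequency, which ignores labels, with a hypothetical `category_factor` that rewards terms covering most documents of one category while being rare in the others. The function names, the factor's formula, and the toy corpus `DOCS` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy labelled corpus (hypothetical data): (tokens, category) pairs.
DOCS = [
    (["goal", "match", "team"], "sport"),
    (["goal", "team", "coach"], "sport"),
    (["vote", "party", "team"], "politics"),
    (["vote", "law", "match"], "politics"),
]

def idf(docs):
    """Plain inverse document frequency; category labels are ignored."""
    n = len(docs)
    df = Counter(t for tokens, _ in docs for t in set(tokens))
    return {t: math.log(n / df[t]) for t in df}

def category_factor(docs):
    """Hypothetical category-aware factor in (0, 1] (an assumption, not the paper's formula).

    coverage(t, c) is the share of documents in category c that contain t.
    The factor is best * best / total, where best is the highest coverage
    and total is the sum of coverages over all categories. It equals 1.0
    only when t occurs in every document of one category and in no other.
    """
    cat_size = Counter(label for _, label in docs)
    cat_df = defaultdict(Counter)  # term -> {category: number of docs containing term}
    for tokens, label in docs:
        for t in set(tokens):
            cat_df[t][label] += 1
    factor = {}
    for t, per_cat in cat_df.items():
        coverage = [per_cat[c] / cat_size[c] for c in cat_size]
        best = max(coverage)
        factor[t] = best * best / sum(coverage)
    return factor

def category_aware_weights(docs):
    """Combine plain IDF with the hypothetical category factor."""
    base = idf(docs)
    factor = category_factor(docs)
    return {t: base[t] * factor[t] for t in base}

if __name__ == "__main__":
    weights = category_aware_weights(DOCS)
    for term in sorted(weights, key=weights.get, reverse=True):
        print(f"{term:6s} {weights[term]:.3f}")
```

On this toy corpus, plain IDF gives "goal" and "match" the same score because each appears in two of the four documents. The category factor separates them: "goal" occurs in every sport document and nowhere else, while "match" is split across both categories.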
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, G., Li, J., Li, X., Li, Q. (2004). New Feature Selection and Weighting Methods Based on Category Information. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, Ep. (eds) Digital Libraries: International Collaboration and Cross-Fertilization. ICADL 2004. Lecture Notes in Computer Science, vol 3334. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30544-6_35
DOI: https://doi.org/10.1007/978-3-540-30544-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24030-3
Online ISBN: 978-3-540-30544-6
eBook Packages: Computer Science, Computer Science (R0)