New Feature Selection and Weighting Methods Based on Category Information

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3334)

Abstract

Traditional feature selection and weighting methods make full use of document information but neglect or ignore category information. The new feature selection and weighting methods use category information as a factor, which compensates for this shortcoming of the traditional methods. Under the new methods, features that are distributed evenly within a single category are treated as more important than they are under the traditional methods. Experiments show that four well-known classifiers are more effective with the new feature selection and weighting methods than with the traditional ones.
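The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It contrasts a document-only weight (plain IDF) with a hypothetical category-aware score that rewards a term for being concentrated in one category and spread evenly across that category's documents. The function names, the scoring formula, and the toy corpus are all assumptions made for illustration.

```python
import math

def idf(term, docs):
    """Document-only weight: log(N / df). Ignores category labels."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def category_aware_weight(term, docs, labels, category):
    """Hypothetical category-aware score (an assumption, not the paper's formula).

    concentration: share of the term's documents that belong to `category`
    coverage:      share of `category` documents that contain the term
    """
    in_cat = [d for d, y in zip(docs, labels) if y == category]
    df_total = sum(1 for d in docs if term in d)
    df_in_cat = sum(1 for d in in_cat if term in d)
    if df_total == 0 or not in_cat:
        return 0.0
    concentration = df_in_cat / df_total
    coverage = df_in_cat / len(in_cat)
    return concentration * coverage

# Toy corpus: each document is a list of tokens with one category label.
DOCS = [
    ["goal", "team"],
    ["goal", "match"],
    ["goal", "league", "match"],
    ["stock", "market"],
    ["stock", "team"],
    ["price", "market"],
]
LABELS = ["sports", "sports", "sports", "finance", "finance", "finance"]

if __name__ == "__main__":
    for term in ("goal", "team"):
        print(term,
              round(idf(term, DOCS), 3),
              round(category_aware_weight(term, DOCS, LABELS, "sports"), 3))
```

In this toy corpus, "goal" appears in every sports document and nowhere else, while "team" appears once in each category. Plain IDF ranks "team" above "goal" because it is rarer overall, whereas the category-aware score ranks "goal" higher, which is the kind of behaviour the abstract describes.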

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, G., Li, J., Li, X., Li, Q. (2004). New Feature Selection and Weighting Methods Based on Category Information. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, Ep. (eds) Digital Libraries: International Collaboration and Cross-Fertilization. ICADL 2004. Lecture Notes in Computer Science, vol 3334. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30544-6_35

  • DOI: https://doi.org/10.1007/978-3-540-30544-6_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24030-3

  • Online ISBN: 978-3-540-30544-6

  • eBook Packages: Computer Science, Computer Science (R0)
