Using Discriminative Phrases for Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8227)

Abstract

This work aims to find discriminative multi-word features, or phrases, in a given set of text documents. We use a modified Shapley value measure to select the list of discriminative features. The feature selection algorithm is applied to the 20 Newsgroups and Wikipedia datasets, together with several existing classifiers. Based on the results obtained, we show that adding phrases to the feature list can improve classification performance compared with using words alone; further, the improvement varies with the separability of the classes.
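The abstract does not spell out the modified Shapley value measure, so the following is only a minimal sketch of the standard (unmodified) Shapley value used to rank features. The function `shapley_values`, the `toy_payoff` function, and the weights are all hypothetical and chosen for illustration; they are not taken from the paper. The toy payoff rewards two words more when they occur together, which is the kind of interaction a phrase feature is meant to capture.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    features: Sequence[str],
    payoff: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value of each feature.

    Averages each feature's marginal contribution to the payoff over
    every ordering of the features. This enumerates n! orderings, so it
    is only practical for a handful of features; larger feature sets
    would need sampling or some other approximation.
    """
    totals = {f: 0.0 for f in features}
    n_orderings = 0
    for order in permutations(features):
        coalition: FrozenSet[str] = frozenset()
        value = payoff(coalition)
        for f in order:
            grown = coalition | {f}
            new_value = payoff(grown)
            totals[f] += new_value - value  # marginal contribution of f
            coalition, value = grown, new_value
        n_orderings += 1
    return {f: t / n_orderings for f, t in totals.items()}


# Hypothetical per-word usefulness scores (assumption, not from the paper).
WEIGHTS = {"machine": 0.2, "learning": 0.2, "the": 0.0}


def toy_payoff(coalition: FrozenSet[str]) -> float:
    """Hypothetical payoff: additive word scores plus a phrase bonus."""
    score = sum(WEIGHTS[f] for f in coalition)
    if {"machine", "learning"} <= coalition:
        score += 0.4  # bonus when both words of the phrase are present
    return score


scores = shapley_values(list(WEIGHTS), toy_payoff)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy game the phrase bonus is split equally between "machine" and "learning", so both outrank the uninformative word "the". A real payoff would instead measure how well a feature subset separates the classes.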

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dey, S., Murty, M.N. (2013). Using Discriminative Phrases for Text Categorization. In: Lee, M., Hirose, A., Hou, ZG., Kil, R.M. (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42042-9_35

  • DOI: https://doi.org/10.1007/978-3-642-42042-9_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-42041-2

  • Online ISBN: 978-3-642-42042-9

  • eBook Packages: Computer Science, Computer Science (R0)
