Abstract
This work aims to find discriminative multi-word features, or phrases, in a given set of text documents. We use a modified Shapley value measure to select the list of discriminative features. The feature selection algorithm is applied to the 20-newsgroup and Wikipedia datasets and evaluated with several existing classifiers. Based on the results obtained, we show that adding phrases to the feature list can improve classification performance compared with using words alone; further, the size of the improvement depends on the separability of the classes.
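As a rough illustration of the idea of ranking features by Shapley value, the sketch below computes the standard (unmodified) Shapley value exactly for a tiny set of candidate features and sorts them by it. It is not the paper's modified measure, which the abstract does not specify. The function names, the feature list, and the toy payoff function `toy_value` are all assumptions made up for this example; a real payoff would come from some measure of class discrimination.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff of a feature subset (made-up numbers).
    The phrase is informative on its own; the two single words
    only help when both are present; the stop word adds nothing."""
    score = 0.0
    if "machine learning" in coalition:
        score += 0.6
    if "machine" in coalition and "learning" in coalition:
        score += 0.2
    return score


# Hypothetical candidate features: two words, one phrase, one stop word.
features = ["machine", "learning", "machine learning", "the"]
phi = shapley_values(features, toy_value)

# Rank features from most to least valuable under the toy payoff.
ranked = sorted(features, key=lambda f: phi[f], reverse=True)
for f in ranked:
    print(f"{f!r}: {phi[f]:.3f}")
```

Exact enumeration visits every ordering of the features, so it is only practical for a handful of them; with a realistic vocabulary one would need sampling or some other approximation.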
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dey, S., Murty, M.N. (2013). Using Discriminative Phrases for Text Categorization. In: Lee, M., Hirose, A., Hou, ZG., Kil, R.M. (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42042-9_35
DOI: https://doi.org/10.1007/978-3-642-42042-9_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-42041-2
Online ISBN: 978-3-642-42042-9
eBook Packages: Computer Science, Computer Science (R0)