Abstract
An important application of natural language processing is building vector representations of documents to serve as features in a classification task. The traditional bag-of-words approach uses one-hot vector representations of words that aggregate into a sparse document vector. This representation can be enhanced by weighting the words that contribute most to the classification task. In this paper, we propose a generalization of the Bi-Normal Separation (BNS) metric that enhances vector representations of documents and outperforms TF-IDF scaling algorithms for one-of-m classification tasks.
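As a rough illustration of how BNS weighting scales a bag-of-words vector, the sketch below uses Forman's binary definition, BNS = |F^-1(tpr) - F^-1(fpr)|, where F^-1 is the inverse standard normal CDF and both rates are clipped to [0.0005, 0.9995]. The multi-class step (taking the maximum one-vs-rest score per term), the helper names `bns`, `one_vs_rest_bns` and `weighted_vector`, and the toy data are hypothetical. This is a naive baseline for intuition, not the generalization proposed in the paper.

```python
from statistics import NormalDist

# Clip rates away from 0 and 1 so the inverse CDF stays finite (Forman's choice).
EPS = 0.0005
_STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1


def bns(tp: int, pos: int, fp: int, neg: int) -> float:
    """Binary Bi-Normal Separation score for one term.

    tp: positive documents containing the term, pos: all positive documents,
    fp: negative documents containing the term, neg: all negative documents.
    """
    tpr = min(max(tp / pos, EPS), 1 - EPS)
    fpr = min(max(fp / neg, EPS), 1 - EPS)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))


def one_vs_rest_bns(docs: list[set[str]], labels: list[str]) -> dict[str, float]:
    """HYPOTHETICAL multi-class weight: maximum one-vs-rest BNS over all classes.

    Illustrative assumption only; not the paper's generalization.
    """
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("one-vs-rest BNS needs at least two classes")
    vocab = set().union(*docs)
    weights = {}
    for term in vocab:
        scores = []
        for c in classes:
            pos = sum(1 for y in labels if y == c)
            neg = len(labels) - pos
            tp = sum(1 for d, y in zip(docs, labels) if y == c and term in d)
            fp = sum(1 for d, y in zip(docs, labels) if y != c and term in d)
            scores.append(bns(tp, pos, fp, neg))
        weights[term] = max(scores)
    return weights


def weighted_vector(doc: set[str], weights: dict[str, float], vocab: list[str]) -> list[float]:
    """Binary bag-of-words vector where each present term is scaled by its weight."""
    return [weights[t] if t in doc else 0.0 for t in vocab]


if __name__ == "__main__":
    # Toy corpus: each document is the set of terms it contains.
    docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "election"},
            {"vote", "party"}, {"rate", "bank"}]
    labels = ["sport", "sport", "politics", "politics", "finance"]
    w = one_vs_rest_bns(docs, labels)
    vocab = sorted(w)
    print(vocab)
    print([round(x, 3) for x in weighted_vector(docs[0], w, vocab)])
```

Terms that appear in every document of one class and in none of the others receive the largest weight, while terms spread evenly across classes get a weight near zero, which is the intuition behind weighting words by their contribution to the classification task.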
Notes
1. Code repository: https://github.com/jtbai/extended-bns.
Acknowledgements
The authors gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada, the Chaire d’actuariat de l’Université Laval and Intact Financial Corporation for financial support, and Véronique Barras-Fugère for her illustrations.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Baillargeon, JT., Lamontagne, L., Marceau, É. (2019). Weighting Words Using Bi-Normal Separation for Text Classification Tasks with Multiple Classes. In: Meurs, MJ., Rudzicz, F. (eds) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_41
DOI: https://doi.org/10.1007/978-3-030-18305-9_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18304-2
Online ISBN: 978-3-030-18305-9
eBook Packages: Computer Science, Computer Science (R0)