Weighting Words Using Bi-Normal Separation for Text Classification Tasks with Multiple Classes

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11489)

Abstract

An important use of natural language processing is creating vector representations of documents to serve as features in a classification task. The traditional bag-of-words approach uses one-hot vector representations of words, which aggregate into a sparse document vector. This representation can be enhanced by giving greater weight to the words that contribute most to the classification task. In this paper, we propose a generalization of the Bi-Normal Separation metric that enhances vector representations of documents and outperforms TF-IDF scaling algorithms for one-of-m classification tasks.
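
For readers unfamiliar with the metric, the sketch below illustrates the original two-class Bi-Normal Separation score (the absolute difference between the inverse normal CDF of a word's true positive rate and of its false positive rate) applied as a scaling factor on bag-of-words counts. It is not the multi-class generalization proposed in the paper, whose details are not given in this abstract. The function names, the toy corpus, and the clipping bound `eps` are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import NormalDist


def bns_weight(tp, fp, n_pos, n_neg, eps=0.0005):
    """Two-class BNS score for one word.

    tp / fp: number of positive / negative documents containing the word.
    eps (an assumed bound) clips the rates so the inverse CDF stays finite.
    """
    inv_cdf = NormalDist().inv_cdf
    tpr = min(max(tp / n_pos, eps), 1.0 - eps)
    fpr = min(max(fp / n_neg, eps), 1.0 - eps)
    return abs(inv_cdf(tpr) - inv_cdf(fpr))


def bns_table(docs, labels):
    """Compute a BNS weight for every word in a tokenized, binary-labeled corpus."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp, fp = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count document frequency, not term frequency.
        (tp if label else fp).update(set(tokens))
    vocab = set(tp) | set(fp)
    return {w: bns_weight(tp[w], fp[w], n_pos, n_neg) for w in vocab}


def weighted_bag_of_words(tokens, weights):
    """Scale raw term counts by the per-word weight (unknown words get 0)."""
    counts = Counter(tokens)
    return {w: c * weights.get(w, 0.0) for w, c in counts.items()}


# Hypothetical toy corpus: label 1 = complaint, label 0 = praise.
docs = [
    ["refund", "late", "order"],
    ["late", "delivery"],
    ["great", "order"],
    ["great", "delivery", "fast"],
]
labels = [1, 1, 0, 0]

weights = bns_table(docs, labels)
vector = weighted_bag_of_words(["late", "order", "order"], weights)
```

In this toy corpus, "order" appears in the same share of documents in both classes, so its weight is zero and it drops out of the weighted vector, while "late", which appears only in the positive class, receives the largest weight.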

Notes

  1. Code repository: https://github.com/jtbai/extended-bns.

Acknowledgements

The authors gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada, the Chaire d’actuariat de l’Université Laval and Intact Financial Corporation for financial support, and Véronique Barras-Fugère for her illustrations.

Author information

Corresponding author

Correspondence to Jean-Thomas Baillargeon.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Baillargeon, JT., Lamontagne, L., Marceau, É. (2019). Weighting Words Using Bi-Normal Separation for Text Classification Tasks with Multiple Classes. In: Meurs, MJ., Rudzicz, F. (eds) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_41

  • DOI: https://doi.org/10.1007/978-3-030-18305-9_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18304-2

  • Online ISBN: 978-3-030-18305-9

  • eBook Packages: Computer Science, Computer Science (R0)
