Multiclass Classification of Online Reviews Using NLP & Machine Learning for Non-english Language

  • Conference paper
Intelligent Human Computer Interaction (IHCI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13741)

Abstract

Reviews and comments that customers leave after shopping can be classified into a wide range of categories. Large companies such as Walmart, Tesco and Amazon serve customers all over the world, offer a broad range of products, and can receive reviews written in any language. Customers sometimes post reviews not only on the shopping platform itself but also on other platforms such as Facebook and Twitter. To get an overall picture of a product, the reviews from all these platforms need to be examined in a single place. This paper classifies comments/reviews written in Spanish into 30 product categories whose names are given in English. The purpose is to categorize products from non-English comments/reviews posted on different platforms, to gather insights about those products, and to reduce both the dependency on manual classification and the need for command of the language. The approach reduces the chance of manual errors when assigning new reviews/comments to a category. A multiclass classification model is trained using traditional machine learning algorithms and NLP and achieves an accuracy of 90%. It is envisioned that the proposed methodology can scale to other non-English languages as well.
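
The abstract does not name the specific algorithm or features used, so the following is only a minimal sketch of the general idea, assuming a bag-of-words representation and a multinomial Naive Bayes classifier (one common traditional machine learning baseline, not necessarily the authors' choice). The training reviews, the category names (`book`, `shoes`, `toy`) and the helper names are hypothetical and exist purely for illustration; the paper itself works with 30 categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters, including Spanish accents."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per category
        self.word_counts = defaultdict(Counter)   # word counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.class_counts.values())
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: Spanish reviews paired with English category names.
train = [
    ("El libro tiene una historia muy interesante", "book"),
    ("Una novela larga pero el autor escribe bien", "book"),
    ("Los zapatos son cómodos y de buena talla", "shoes"),
    ("La suela de los zapatos se rompió pronto", "shoes"),
    ("El juguete le encantó a mi hijo", "toy"),
    ("Un juguete divertido para niños pequeños", "toy"),
]
texts, labels = zip(*train)
model = NaiveBayesClassifier().fit(texts, labels)

print(model.predict("Una novela con una historia interesante"))
print(model.predict("Los zapatos son de buena calidad"))
```

With only six training sentences, common words such as "de" or "la" can easily outweigh the informative ones, which is one reason a real pipeline would likely add stop-word removal or TF-IDF weighting and a far larger labelled corpus.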

Author information

Corresponding author

Correspondence to Pritee Parwekar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sharma, P., Parwekar, P. (2023). Multiclass Classification of Online Reviews Using NLP & Machine Learning for Non-english Language. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_9

  • DOI: https://doi.org/10.1007/978-3-031-27199-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27198-4

  • Online ISBN: 978-3-031-27199-1

  • eBook Packages: Computer Science, Computer Science (R0)