Text Categorization Improvement via User Interaction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10842)

Abstract

In this paper, we propose an approach to improving text categorization through interaction with the user. The quality of categorization is defined in terms of the distribution of the objects belonging to each class when they are projected onto a self-organizing map (SOM). For the experiments, we use articles and categories from a subset of Simple Wikipedia. We test three different approaches to text representation: as a baseline, we use Bag-of-Words with Term Frequency-Inverse Document Frequency (TF-IDF) weighting, against which we evaluate two neural representations of words and documents, Word2Vec and Paragraph Vector. In each representation, we identify the subsets of features that are most useful for differentiating the classes. These features are presented to the user, and his or her selection increases the coherence of the articles that belong to the same category, so that they lie close together on the SOM.
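
The abstract outlines a pipeline (a TF-IDF Bag-of-Words baseline, selection of class-discriminating features, projection onto a SOM, and a quality measure based on how each class is distributed on the map) without giving formulas. The following is a minimal pure-Python sketch of that pipeline under stated assumptions, not the authors' implementation. The feature score (between-class variance of mean TF-IDF weights), the quality measure (`coherence_ratio`, comparing same-class and cross-class grid distances), the toy corpus, and `user_selected`, which simulates the user's choice, are all hypothetical. The Word2Vec and Paragraph Vector representations are not shown.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Bag-of-Words vectors weighted by tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vocab = sorted(df)
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vecs


def rank_features(vocab, vecs, labels):
    """Hypothetical criterion: variance of per-class mean weights (higher = more discriminative)."""
    classes = sorted(set(labels))
    scored = []
    for j, term in enumerate(vocab):
        means = []
        for c in classes:
            vals = [v[j] for v, y in zip(vecs, labels) if y == c]
            means.append(sum(vals) / len(vals))
        grand = sum(means) / len(means)
        scored.append((sum((m - grand) ** 2 for m in means) / len(means), term))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best first, ties alphabetical
    return [term for _, term in scored]


def restrict(vocab, vecs, selected):
    """Keep only the feature columns the user selected."""
    cols = [vocab.index(t) for t in selected]
    return [[v[j] for j in cols] for v in vecs]


def best_unit(nodes, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(nodes, key=lambda p: sum((w - a) ** 2 for w, a in zip(nodes[p], x)))


def train_som(vecs, rows=3, cols=3, epochs=50, seed=0):
    """Minimal online SOM with a Gaussian neighbourhood and linearly decaying rates."""
    rng = random.Random(seed)
    dim = len(vecs[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for e in range(epochs):
        lr = 0.5 * (1 - e / epochs)
        radius = 0.5 + max(rows, cols) / 2 * (1 - e / epochs)
        for x in vecs:
            br, bc = best_unit(nodes, x)
            for (r, c), w in nodes.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull node towards the sample
    return nodes


def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def coherence_ratio(nodes, vecs, labels):
    """Hypothetical quality measure: mean same-class / mean cross-class grid distance.
    Values below 1 mean each category sits more compactly on the map."""
    pos = [best_unit(nodes, x) for x in vecs]
    same, diff = [], []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            d = math.dist(pos[i], pos[j])
            (same if labels[i] == labels[j] else diff).append(d)
    return _mean(same) / _mean(diff) if _mean(diff) > 0 else 1.0


docs = [s.split() for s in ("the cat dog pet", "the dog pet fur",
                            "the stock market price", "the market trade price")]
labels = ["animals", "animals", "finance", "finance"]
vocab, vecs = tfidf(docs)
ranked = rank_features(vocab, vecs, labels)
user_selected = ["dog", "pet", "market", "price"]  # stands in for the user's choice
reduced = restrict(vocab, vecs, user_selected)
som = train_som(reduced)
print(ranked[:4], coherence_ratio(som, reduced, labels))
```

In this sketch the user only sees term names from the ranked list, so a selection is applied by column filtering and the map is retrained; comparing `coherence_ratio` before and after the selection is one way to check whether the choice made the categories more compact. A term such as "the", which occurs in every document, receives zero TF-IDF weight and therefore ranks last.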

Author information

Corresponding author

Correspondence to Julian Szymański.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Atroszko, J., Szymański, J., Gil, D., Mora, H. (2018). Text Categorization Improvement via User Interaction. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10842. Springer, Cham. https://doi.org/10.1007/978-3-319-91262-2_24

  • DOI: https://doi.org/10.1007/978-3-319-91262-2_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91261-5

  • Online ISBN: 978-3-319-91262-2

  • eBook Packages: Computer Science, Computer Science (R0)
