Text Mining, pp. 41–58

Text Encoding

  • Taeho Jo
Part of the Studies in Big Data book series (SBD, volume 45)


This chapter covers the process of encoding texts into numerical vectors that serve as their representations; Sect. 3.1 gives an overview.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Taeho Jo, School of Game, Hongik University, Seoul, Korea (Republic of)
