Internet Articles Classification by Industry Types Based on TF-IDF

  • Jonghun Cha
  • Jee-Hyong Lee
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 474)


To understand a specific industry, people usually examine the financial statements of companies in that industry. Financial statements contain diverse numerical information, but because they are usually reported quarterly, they reflect only the past financial state of a company. The need to obtain the current state of an industry in a timely manner is therefore increasing. The proposed method focuses on internet articles because they are easy to obtain and are updated with new information every day. As a preliminary study on extracting industry information from internet articles, this paper proposes a method to classify internet articles by industry type. The method computes importance values of the nouns in internet articles based on TF-IDF and uses these values to classify the articles by industry type. Experiments show that the proposed method achieves high accuracy in industry article classification.
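The abstract does not give the exact weighting formula or the classification rule, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that nouns have already been extracted from each article (for example by a morphological analyzer), that the articles of each industry are pooled into a single "document" for TF-IDF purposes, and that an article is assigned to the industry whose noun weights sum highest. The names `industry_tfidf`, `classify`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def industry_tfidf(noun_lists_by_industry):
    """Return {industry: {noun: tf-idf weight}}.

    Assumption for this sketch: all articles of one industry are pooled
    into a single document, and IDF is computed over industries.
    """
    # Term frequency: noun counts pooled over each industry's articles.
    tf = {
        industry: Counter(noun for article in articles for noun in article)
        for industry, articles in noun_lists_by_industry.items()
    }
    n_industries = len(tf)
    # Document frequency: number of industries in which each noun occurs.
    df = Counter(noun for counts in tf.values() for noun in counts)

    weights = {}
    for industry, counts in tf.items():
        total = sum(counts.values())
        weights[industry] = {
            noun: (count / total) * math.log(n_industries / df[noun])
            for noun, count in counts.items()
        }
    return weights


def classify(article_nouns, weights):
    """Pick the industry whose weights for the article's nouns sum highest."""
    scores = {
        industry: sum(w.get(noun, 0.0) for noun in article_nouns)
        for industry, w in weights.items()
    }
    return max(scores, key=scores.get)


# Toy labeled data (hypothetical): each article is a list of extracted nouns.
training = {
    "automotive": [["engine", "sedan", "export"], ["sedan", "factory", "engine"]],
    "banking": [["loan", "interest", "export"], ["interest", "deposit", "loan"]],
}
weights = industry_tfidf(training)
print(classify(["sedan", "engine", "market"], weights))
```

In this toy setup, a noun that occurs in every industry (here "export") receives a weight of zero, so only industry-specific nouns influence the decision. If no noun of an article carries any weight, `max` simply returns the first industry, so a real system would need an explicit fallback.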


Keywords: TF-IDF, Classification, Internet article, Industry



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Platform Software, Sungkyunkwan University, Suwon-si, South Korea
