Internet Articles Classification by Industry Types Based on TF-IDF
To understand a specific industry, people usually examine the financial statements of companies in that industry. Financial statements contain diverse numerical information, but because they are usually reported quarterly, they reflect only the past financial state of a company. The need to obtain the current state of an industry in a timely manner is therefore increasing. The proposed method focuses on internet articles because they are easy to obtain and are updated with new information every day. As a preliminary study on extracting industry information from internet articles, this paper proposes a method to classify internet articles by industry type. The method computes importance values of the nouns in internet articles based on TF-IDF and uses these values to classify the articles by industry type. Experiments show that the proposed method achieves high accuracy in industry article classification.
Keywords: TF-IDF, Classification, Internet article, Industry
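The abstract does not give the exact scoring formula, so the following is only a minimal sketch of how TF-IDF noun weights could drive industry classification. It assumes that nouns have already been extracted from each article (for example by a morphological analyzer), that IDF is the plain log(N / df) form, and that an article is assigned to the industry whose accumulated noun weights it matches best. The function names, the toy training data, and the industry labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def idf_table(docs):
    """Inverse document frequency log(N / df) for every noun in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter()
    for nouns in docs:
        doc_freq.update(set(nouns))  # count each noun once per article
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def industry_weights(labeled_docs):
    """Sum the TF-IDF value of each noun over the articles of each industry."""
    idf = idf_table([nouns for nouns, _ in labeled_docs])
    weights = defaultdict(lambda: defaultdict(float))
    for nouns, industry in labeled_docs:
        term_counts = Counter(nouns)
        for term, count in term_counts.items():
            tf = count / len(nouns)  # relative frequency within the article
            weights[industry][term] += tf * idf[term]
    return weights


def classify(nouns, weights):
    """Pick the industry with the largest summed weight over the article's nouns."""
    scores = {
        industry: sum(term_weights.get(term, 0.0) for term in nouns)
        for industry, term_weights in weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: (nouns of an article, industry label).
train = [
    (["steel", "furnace", "export", "steel"], "steel"),
    (["steel", "ore", "price"], "steel"),
    (["chip", "wafer", "export", "chip"], "semiconductor"),
    (["chip", "memory", "price"], "semiconductor"),
]
weights = industry_weights(train)
predicted = classify(["wafer", "chip", "price"], weights)
```

In this sketch, nouns that occur in both industries (such as "export" and "price") receive a low IDF and contribute little to either score, while industry-specific nouns dominate the decision. Nouns that never appeared in training contribute nothing.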