Abstract
Text categorization is an important task because of the rapid growth of text data available online in domains such as web search snippets and news documents. Traditional supervised methods require a large amount of training data, and labeling such data manually is time-consuming and costly. Moreover, when the text belongs to a specialized domain, only domain experts, whose time is expensive, can carry out the labeling. This thesis addresses the lack of labeled data and aims to develop a novel, generic model that categorizes text without any labeled training data. Instead, the model exploits the semantic similarity between documents and the predefined categories by leveraging graph embedding techniques.
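The idea of categorizing by semantic similarity can be illustrated with a minimal sketch. It assumes that entities mentioned in a document and the predefined categories have already been embedded into a common vector space (for example by a graph embedding model), represents a document as the mean of its entity vectors, and assigns the category with the highest cosine similarity. The toy vectors, the entity and category names, and the helper functions `embed_document` and `categorize` are hypothetical and are not taken from the thesis.

```python
import math

# Toy 3-dimensional embeddings; all values are invented for illustration.
# In practice they would come from a graph embedding model.
ENTITY_VECTORS = {
    "Lionel_Messi": [0.9, 0.1, 0.0],
    "FC_Barcelona": [0.8, 0.2, 0.1],
    "Stock_market": [0.1, 0.9, 0.2],
}
CATEGORY_VECTORS = {
    "Sports": [1.0, 0.0, 0.0],
    "Business": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed_document(entities):
    """Average the vectors of the known entities found in a document."""
    vectors = [ENTITY_VECTORS[e] for e in entities if e in ENTITY_VECTORS]
    if not vectors:
        return None  # no known entity, so no document vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def categorize(entities):
    """Return the category most similar to the document, or None."""
    doc = embed_document(entities)
    if doc is None:
        return None
    return max(CATEGORY_VECTORS, key=lambda c: cosine(doc, CATEGORY_VECTORS[c]))
```

With these toy vectors, `categorize(["Lionel_Messi", "FC_Barcelona"])` returns `"Sports"` without any labeled training documents being involved. Averaging entity vectors is only one simple way to build a document representation and is chosen here for brevity.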
Acknowledgement
This thesis is supervised by Prof. Harald Sack and Dr. Lei Zhang.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Türker, R. (2019). Knowledge-Based Dataless Text Categorization. In: Hitzler, P., et al. The Semantic Web: ESWC 2019 Satellite Events. ESWC 2019. Lecture Notes in Computer Science, vol 11762. Springer, Cham. https://doi.org/10.1007/978-3-030-32327-1_42
DOI: https://doi.org/10.1007/978-3-030-32327-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32326-4
Online ISBN: 978-3-030-32327-1
eBook Packages: Computer Science, Computer Science (R0)