Short Text Understanding Based on Conceptual and Semantic Enrichment
Because of its limited length and freely constructed sentence structure, short text differs from normal text, so traditional text representation algorithms do not work well on it. This paper proposes a model called Conceptual and Semantic Enrichment with Topic Model (CSET), which combines the Biterm Topic Model (BTM), a widely used probabilistic topic model designed for short text, with Probase, a large-scale probabilistic knowledge base. CSET captures semantic relations between words in order to enrich short text. Our model enables many applications that rely on semantic understanding of short text, including short text classification and word similarity measurement in context.
Keywords: Short text, Text enrichment, Similarity
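The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the CSET model itself. It only shows (a) how a short text is broken into biterms, the unordered word pairs that BTM models, and (b) how a text might be enriched with concepts from a probabilistic is-a knowledge base. The table `TOY_CONCEPTS`, its probabilities, and the function names `biterms` and `enrich` are hypothetical and invented for illustration; they are not taken from Probase or from the paper.

```python
from collections import defaultdict

# Hypothetical toy stand-in for a probabilistic is-a knowledge base:
# term -> {concept: P(concept | term)}. All values are invented.
TOY_CONCEPTS = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "ipad": {"device": 0.7, "product": 0.3},
    "microsoft": {"company": 0.9, "brand": 0.1},
}


def biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    return [
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens))
    ]


def enrich(tokens, top_k=2):
    """Append the top_k concepts, ranked by summed P(concept | term)."""
    scores = defaultdict(float)
    for tok in tokens:
        for concept, p in TOY_CONCEPTS.get(tok, {}).items():
            scores[concept] += p
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return tokens + ranked[:top_k]


print(biterms(["apple", "ipad", "price"]))
print(enrich(["apple", "microsoft"]))
```

In this toy setting, "apple" next to "microsoft" is enriched with "company" first, because both terms vote for that concept. A plain sum like this ignores context beyond the words present, which is the kind of gap a topic model could help fill.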
This work is supported in part by the National Natural Science Foundation of China under Grants 61170035, 61272420 and 81674099; the Six Talent Peaks Project in Jiangsu Province (Grant No. 2014 WLW-004); the Fundamental Research Funds for the Central Universities (Grant Nos. 30916011328 and 30918015103); and the Jiangsu Province special funds for transformation of science and technology achievement (Grant No. BA2013047).