Wikipedia Based Short Text Classification Method

Li, Junze; Cai, Yi; Cai, Zhiwei; Leung, Hofung; Yang, Kai

doi:10.1007/978-3-319-55705-2_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10179)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Short texts are usually concise and carry little information, which makes them difficult to classify. One remedy is to bring in information from existing knowledge bases to improve short text classification. Wikipedia [2, 13, 15] is currently the largest high-quality, human-edited knowledge base, so short text classification stands to benefit if its information can be fully exploited. This paper presents a new concept-based [22] short text representation method built on Wikipedia. The method identifies the Wikipedia concepts mentioned in a short text, expands them with related Wikipedia concepts, and then combines these concepts with the information in the short text itself to form a feature vector representation.
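The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions, and it is not the authors' implementation. `ANCHORS` (surface phrase to Wikipedia concept) and `RELATED` (concept to related concepts with relatedness scores) are hypothetical toy stand-ins for resources that would be mined from a Wikipedia dump. Concept identification uses a simple greedy longest-match over n-grams, a swapped-in technique because the abstract does not specify how mentions are detected. Expanded concepts are added with weight `alpha` times their relatedness score. `alpha`, `top_k` and the `w:`/`c:` feature prefixes are all hypothetical choices.

```python
import math
import re

# Hypothetical toy resources (assumption). In practice they would be mined
# from a Wikipedia dump: anchor texts -> article titles, and article-to-article
# relatedness (e.g. derived from shared links). The scores below are made up.
ANCHORS = {
    "apple": "Apple_Inc.",
    "iphone": "IPhone",
    "steve jobs": "Steve_Jobs",
}
RELATED = {
    "Apple_Inc.": {"IPhone": 0.8, "Steve_Jobs": 0.7, "MacOS": 0.6},
    "IPhone": {"Apple_Inc.": 0.8, "Smartphone": 0.7},
    "Steve_Jobs": {"Apple_Inc.": 0.7},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def identify_concepts(tokens, anchors, max_len=3):
    """Greedy longest-match of token n-grams against the anchor dictionary."""
    concepts = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in anchors:
                concepts.append(anchors[phrase])
                i += n
                break
        else:  # no n-gram starting at position i is a known anchor
            i += 1
    return concepts


def expand_concepts(concepts, related, top_k=2):
    """Collect the top_k most related concepts of each identified concept,
    keeping the highest relatedness score seen for each related concept."""
    expanded = {}
    for c in concepts:
        neighbours = sorted(related.get(c, {}).items(), key=lambda kv: -kv[1])
        for r, score in neighbours[:top_k]:
            expanded[r] = max(expanded.get(r, 0.0), score)
    return expanded


def feature_vector(text, anchors=ANCHORS, related=RELATED, alpha=0.5, top_k=2):
    """Sparse vector: word counts + counts of identified concepts
    + expanded concepts weighted by alpha * relatedness."""
    tokens = tokenize(text)
    vec = {}
    for t in tokens:
        vec["w:" + t] = vec.get("w:" + t, 0.0) + 1.0
    concepts = identify_concepts(tokens, anchors)
    for c in concepts:
        vec["c:" + c] = vec.get("c:" + c, 0.0) + 1.0
    for c, score in expand_concepts(concepts, related, top_k).items():
        vec["c:" + c] = vec.get("c:" + c, 0.0) + alpha * score
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    a = "Steve Jobs unveils the new iPhone"
    b = "Apple releases quarterly earnings"
    # Passing an empty anchor dictionary disables all concept features.
    print("words only:", cosine(feature_vector(a, anchors={}), feature_vector(b, anchors={})))
    print("with concepts:", cosine(feature_vector(a), feature_vector(b)))
```

The two example texts share no words, so their word-only similarity is 0. After concept identification and expansion, both vectors contain `c:Apple_Inc.`, `c:IPhone` and `c:Steve_Jobs`, so the similarity becomes positive. This illustrates how added knowledge-base concepts can reduce the sparsity of short-text vectors before the vectors are passed to a classifier.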

Notes

1. https://en.wikipedia.org/wiki/Wikipedia:Database_download

References

1. Cai, Y., Chen, W.-H., Leung, H.-F., Li, Q., Xie, H., Lau, R.Y., Min, H., Wang, F.L.: Context-aware ontologies generation with basic level concepts from collaborative tags. Neurocomputing 208, 25–38 (2016)
2. Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. EMNLP-CoNLL 7, 708–716 (2007)
3. Dai, H.K., Zhao, L., Nie, Z., Wen, J.-R., Wang, L., Li, Y.: Detecting online commercial intention (OCI). In: Proceedings of the 15th International Conference on World Wide Web, pp. 829–837. ACM (2006)
4. Davidson, D., Harman, G.: Semantics of Natural Language, vol. 40. Springer Science & Business Media, Netherlands (2012)
5. Du, Q., Xie, H., Cai, Y., Leung, H.-F., Li, Q., Min, H., Wang, F.L.: Folksonomy-based personalized search by hybrid user profiles in multiple levels. Neurocomputing 204, 142–152 (2016)
6. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
7. Faruqui, M., Dyer, C.: Improving vector space word representations using multilingual correlation. Association for Computational Linguistics (2014)
8. Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. IJCAI 7, 1606–1611 (2007)
9. Guo, A., Yang, T.: Research and improvement of feature words weight based on TFIDF algorithm. In: Information Technology, Networking, Electronic and Automation Control Conference, pp. 415–419. IEEE (2016)
10. Han, X., Zhao, J.: Named entity disambiguation by leveraging Wikipedia semantic knowledge. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 215–224. ACM (2009)
11. Hu, X., Zhang, X., Lu, C., Park, E.K., Zhou, X.: Exploiting Wikipedia as external knowledge for document clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 389–396. ACM (2009)
12. Kiela, D., Clark, S.: A systematic study of semantic vector space model parameters. In: Proceedings of the 2nd Workshop on Continuous Vector Space Models and Their Compositionality (CVSC) at EACL, pp. 21–30 (2014)
13. Mihalcea, R., Csomai, A.: Wikify!: linking documents to encyclopedic knowledge. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 233–242. ACM (2007)
14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
15. Milne, D., Witten, I.H.: Learning to link with Wikipedia. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 509–518. ACM (2008)
16. Ni, X., Sun, J.-T., Hu, J., Chen, Z.: Mining multilingual topics from Wikipedia. In: Proceedings of the 18th International Conference on World Wide Web, pp. 1155–1156. ACM (2009)
17. Phan, X.-H., Nguyen, L.-M., Horiguchi, S.: Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: Proceedings of the 17th International Conference on World Wide Web, pp. 91–100. ACM (2008)
18. Salton, G., Wong, A., Yang, C.-S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
19. Shen, D., Sun, J.-T., Yang, Q., Chen, Z.: Building bridges for web query classification. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 131–138. ACM (2006)
20. Sidorov, G., Gelbukh, A., Gómez-Adorno, H., Pinto, D.: Soft similarity and soft cosine measure: similarity of features in vector space model. Comput. Sist. 18(3), 491–504 (2014)
21. Szpektor, I., Gionis, A., Maarek, Y.: Improving recommendation for long-tail queries via templates. In: Proceedings of the 20th International Conference on World Wide Web, pp. 47–56. ACM (2011)
22. Wang, F., Wang, Z., Li, Z., Wen, J.-R.: Concept-based short text classification and ranking. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 1069–1078. ACM (2014)
23. Wu, W., Li, H., Wang, H., Zhu, K.Q.: Probase: a probabilistic taxonomy for text understanding. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 481–492. ACM (2012)

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Project No. 61300137), the Science and Technology Planning Project of Guangdong Province, China (No. 2013B010406004), the Tip-top Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015TQ01X633) and the Science and Technology Planning Major Project of Guangdong Province (No. 2015A070711001).

Author information

Authors and Affiliations

School of Software Engineering, South China University of Technology, Guangzhou, China
Junze Li, Yi Cai, Zhiwei Cai & Kai Yang
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, China
Hofung Leung

Corresponding author

Correspondence to Yi Cai.

Editor information

Editors and Affiliations

Royal Melbourne Institute of Technology, Melbourne, Australia
Zhifeng Bao
Northwestern University, Evanston, Illinois, USA
Goce Trajcevski
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

About this paper

Cite this paper

Li, J., Cai, Y., Cai, Z., Leung, H., Yang, K. (2017). Wikipedia Based Short Text Classification Method. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_22

DOI: https://doi.org/10.1007/978-3-319-55705-2_22
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer Science, Computer Science (R0)
