Abstract
Machine learning techniques such as support vector machines have recently been applied to question classification. However, these techniques depend heavily on the availability of large amounts of training data and may struggle with the variety of new questions posed by real users on the Web. To mitigate the lack of sufficient training data, we present a simple learning method that uses Web search results to collect additional training data automatically from a few seed terms (question answers). In addition, we propose a novel semantically related feature model (SRFM), which uses question focuses and their semantically related features, learned from the larger set of collected training data, to help determine the question type. Our experimental results show that the proposed learning method achieves better classification performance than the bigram language modeling (LM) approach on questions with untrained question focuses.
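To make the baseline concrete, the following is a minimal sketch of a bigram language-modeling question classifier of the general kind the abstract compares against. It is not the authors' implementation: the class name `BigramLMClassifier`, the add-one smoothing, the question-type labels, and the English toy questions (the paper itself targets Chinese) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class BigramLMClassifier:
    """Toy classifier: one add-one-smoothed bigram model per question type.

    Hypothetical illustration only; smoothing and data are assumptions.
    """

    def __init__(self):
        self.bigrams = defaultdict(Counter)   # type -> counts of (prev, cur)
        self.contexts = defaultdict(Counter)  # type -> counts of prev
        self.vocab = set()

    @staticmethod
    def _pad(tokens):
        # Sentence boundary markers let the model score first and last words.
        return ["<s>"] + list(tokens) + ["</s>"]

    def train(self, examples):
        """examples: iterable of (token_list, question_type) pairs."""
        for tokens, qtype in examples:
            padded = self._pad(tokens)
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[qtype][(prev, cur)] += 1
                self.contexts[qtype][prev] += 1

    def log_prob(self, tokens, qtype):
        """Log-probability of the question under the model for qtype."""
        padded = self._pad(tokens)
        v = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[qtype][(prev, cur)] + 1
            den = self.contexts[qtype][prev] + v
            total += math.log(num / den)
        return total

    def classify(self, tokens):
        """Return the question type whose model scores the question highest."""
        return max(self.bigrams, key=lambda t: self.log_prob(tokens, t))


# Toy training data (assumed, not from the paper).
clf = BigramLMClassifier()
clf.train([
    (["who", "wrote", "hamlet"], "PERSON"),
    (["who", "invented", "the", "telephone"], "PERSON"),
    (["where", "is", "taipei"], "LOCATION"),
    (["where", "is", "the", "louvre"], "LOCATION"),
])
print(clf.classify(["who", "wrote", "the", "telephone"]))
```

A model like this only has evidence for word pairs it has seen, so a question whose focus word never occurred in training is scored almost entirely by smoothing. That is the situation the abstract describes as "untrained question focuses".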
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, SJ., Lu, WH. (2006). Learning Question Focus and Semantically Related Features from Web Search Results for Chinese Question Classification. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_22
DOI: https://doi.org/10.1007/11880592_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)