Question Classification Based on Hadoop Platform
Statistical supervised learning models for question classification need a large number of labeled training examples. However, labeled data are difficult to collect, whereas unlabeled data are readily obtained. To address the lack of labeled data, we use transfer learning to build the learning model from both labeled and unlabeled training examples. A common space is built from the feature spaces of the source and target domains. Then, those examples from the source domain whose conditional probability is likely to be similar to that of the target domain are selected into the common space. The question classifier is thus trained on the labeled data in the source domain and the unlabeled data in the target domain. Meanwhile, Map/Reduce on the Hadoop platform is used to reduce the time complexity of the kernel mapping: the mapping process is divided into subtasks, and the final result is obtained by assembling the subtask outputs. Experiments on question classification show that the proposed method improves classification accuracy. Furthermore, the learning model based on the Hadoop platform can make use of all available computing resources to reduce the running time.
Keywords: Question answering, Question classification, Hadoop platform, Kernel mapping
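The abstract describes splitting the kernel mapping into subtasks and assembling their results. The following is a minimal single-machine sketch of that decomposition, not the authors' implementation. It assumes an RBF kernel (the abstract does not name the kernel), and the names `split_rows`, `map_block`, `reduce_blocks`, and `kernel_matrix_mapreduce` are hypothetical. On a Hadoop cluster each map task would run on a separate node; here the tasks run one after another.

```python
import math
from functools import reduce


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel value for two vectors (assumed kernel, for illustration)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def split_rows(n_rows, block_size):
    """Build the subtasks: half-open row ranges (start, end) of the kernel matrix."""
    return [(start, min(start + block_size, n_rows))
            for start in range(0, n_rows, block_size)]


def map_block(task, data, gamma=0.5):
    """Map step: compute the kernel rows of one subtask, keyed by its row offset."""
    start, end = task
    rows = [[rbf_kernel(data[i], x, gamma) for x in data]
            for i in range(start, end)]
    return start, rows


def reduce_blocks(acc, block):
    """Reduce step: place the rows of one subtask into the full matrix."""
    start, rows = block
    for offset, row in enumerate(rows):
        acc[start + offset] = row
    return acc


def kernel_matrix_mapreduce(data, block_size=2, gamma=0.5):
    """Assemble the full kernel matrix from independently computed row blocks."""
    tasks = split_rows(len(data), block_size)
    mapped = [map_block(task, data, gamma) for task in tasks]  # parallel on a cluster
    return reduce(reduce_blocks, mapped, [None] * len(data))


if __name__ == "__main__":
    examples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    for row in kernel_matrix_mapreduce(examples):
        print(row)
```

Because each row block depends only on the input examples and not on any other block, the map tasks need no communication with one another. That independence is what makes this step a natural fit for Map/Reduce.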
This work was supported by the National Natural Science Foundation of China (No. 61365010), the Yunnan Nature Science Foundation (2011FZ069), and the Yunnan Province Department of Education Foundation (2011Y387).