Question Classification Based on Hadoop Platform

  • XiangXiang Qi
  • Lei Su (Email author)
  • Bin Yang
  • Jun Chen
  • Yiyang Li
  • Junhui Liu
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 142)


Statistical supervised learning models for question classification need a large number of labeled training examples. However, labeled data are difficult to collect, whereas unlabeled data are readily obtained. To address the lack of labeled data, we use transfer learning to build the learning model from both labeled and unlabeled training examples. A common space is built from the feature spaces of the source and target domains. Then, those examples from the source domain whose conditional probability is likely to be similar to that of the target domain are selected into the common space. The question classifier is thus trained on the labeled data in the source domain and the unlabeled data in the target domain. Meanwhile, Map/Reduce on the Hadoop platform is used to reduce the time cost of kernel mapping: the mapping process is divided into subtasks, and the final result is obtained by assembling the subtask results. Experiments on question classification show that the proposed method improves classification accuracy. Furthermore, the learning model based on the Hadoop platform can make use of all available computing resources to reduce the running time.
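The split-and-assemble idea behind the kernel mapping step can be illustrated with a minimal sketch. It is not the authors' implementation: it runs in a single Python process, not on Hadoop, and the Gaussian (RBF) kernel, the `gamma` value, and all function names (`rbf_kernel`, `split_rows`, `map_task`, `reduce_task`, `kernel_matrix`) are assumptions chosen for illustration. Each map subtask computes a block of kernel-matrix rows independently, and the reduce step merges the partial results into the full matrix.

```python
import math
from functools import reduce


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice is an assumption, not from the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def split_rows(n_rows, n_tasks):
    """Split row indices 0..n_rows-1 into at most n_tasks contiguous blocks."""
    size = max(1, math.ceil(n_rows / n_tasks))
    return [range(start, min(start + size, n_rows))
            for start in range(0, n_rows, size)]


def map_task(rows, data):
    """Map step: compute the kernel rows owned by one subtask."""
    return {i: [rbf_kernel(data[i], other) for other in data] for i in rows}


def reduce_task(left, right):
    """Reduce step: merge two partial results keyed by row index."""
    merged = dict(left)
    merged.update(right)
    return merged


def kernel_matrix(data, n_tasks=2):
    """Assemble the full kernel matrix from independently computed row blocks."""
    partials = [map_task(rows, data) for rows in split_rows(len(data), n_tasks)]
    merged = reduce(reduce_task, partials, {})
    return [merged[i] for i in range(len(data))]


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    for row in kernel_matrix(points, n_tasks=2):
        print([round(value, 4) for value in row])
```

Because each row block depends only on the input data and not on other blocks, the map subtasks could be distributed across worker nodes; the result is the same regardless of how many subtasks are used.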


Question answering, Question classification, Hadoop platform, Kernel mapping



This work was supported by the National Natural Science Foundation of China (No. 61365010), Yunnan Nature Science Foundation (2011FZ069), Yunnan Province Department of Education Foundation (2011Y387).



Copyright information

© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2015

Authors and Affiliations

  • XiangXiang Qi (1)
  • Lei Su (1, Email author)
  • Bin Yang (1)
  • Jun Chen (2)
  • Yiyang Li (1)
  • Junhui Liu (2)
  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
  2. School of Software, Yunnan University, Kunming, China
