Question Classification Based on Hadoop Platform

  • Conference paper

Abstract

Statistical supervised learning models for question classification need a large number of labeled training examples. However, labeled data are difficult to collect, whereas unlabeled data are readily obtained. To address the lack of labeled data, we use transfer learning to build the learning model from both labeled and unlabeled training examples. A common space is built from the feature spaces of the source and target domains. Then, those examples from the source domain whose conditional probability is likely to be similar to that of the target domain are selected into the common space. The question classifier is therefore trained on the labeled data in the source domain and the unlabeled data in the target domain. Meanwhile, Map/Reduce on the Hadoop platform is used to reduce the time complexity of kernel mapping: the mapping process is divided into subtasks, and the final result is obtained by assembling the subtask results. Experiments on question classification show that the proposed method can improve classification accuracy. Furthermore, the learning model based on the Hadoop platform can draw on the available computing resources to reduce the running time.
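
The subtask decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Hadoop API: it imitates the map and reduce steps in plain Python on a single machine. The RBF kernel, the row-block partitioning, and every function name below are assumptions made for illustration, since the abstract does not specify them.

```python
import math
from functools import reduce


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel; an assumed stand-in, since the abstract names no kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_subtasks(n_rows, block_size):
    """Split the row indices of the kernel matrix into contiguous blocks."""
    return [range(start, min(start + block_size, n_rows))
            for start in range(0, n_rows, block_size)]


def map_block(rows, data):
    """Map step: compute the kernel rows of one block, keyed by row index."""
    return {i: [rbf_kernel(data[i], x) for x in data] for i in rows}


def reduce_blocks(acc, partial):
    """Reduce step: merge one block's rows into the accumulated result."""
    merged = dict(acc)
    merged.update(partial)
    return merged


def kernel_matrix(data, block_size=2):
    """Build the full kernel matrix by assembling independent subtasks."""
    subtasks = make_subtasks(len(data), block_size)
    partials = (map_block(rows, data) for rows in subtasks)
    rows_by_index = reduce(reduce_blocks, partials, {})
    return [rows_by_index[i] for i in range(len(data))]
```

Each block of rows depends only on the input data, so the map calls are independent of one another. That independence is what would let a cluster compute them on separate nodes before the reduce step assembles the matrix.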

Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61365010), the Yunnan Nature Science Foundation (2011FZ069), and the Yunnan Province Department of Education Foundation (2011Y387).

Author information

Corresponding author

Correspondence to Lei Su.

Copyright information

© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Qi, X., Su, L., Yang, B., Chen, J., Li, Y., Liu, J. (2015). Question Classification Based on Hadoop Platform. In: Leung, V., Lai, R., Chen, M., Wan, J. (eds) Cloud Computing. CloudComp 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 142. Springer, Cham. https://doi.org/10.1007/978-3-319-16050-4_10

  • DOI: https://doi.org/10.1007/978-3-319-16050-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16049-8

  • Online ISBN: 978-3-319-16050-4

  • eBook Packages: Computer Science, Computer Science (R0)