A Large-Scale Community Questions Classification Accounting for Category Similarity: An Exploratory Study

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 505))

Abstract

The paper reports on a large-scale topical categorization of questions from the Russian community question answering (CQA) service Otvety@Mail.Ru. We used a data set containing all the questions (more than 11 million) asked by Otvety@Mail.Ru users in 2012. This is the first study on question categorization dealing with non-English data of this size. The study focuses on adjusting the category structure to obtain more robust classification results. We investigate several approaches to measuring similarity between categories: the share of identical questions, language models, and user activity. The results show that the proposed approach is promising.
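As a rough illustration of two of the similarity measures named above, the following sketch compares categories by the share of identical questions and by unigram language models. It is not the authors' implementation: the whitespace tokenization, the add-one smoothing, the choice of Jensen-Shannon divergence, the normalization by the smaller category, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def normalize(question):
    """Lowercase and collapse whitespace (an assumed, minimal normalization)."""
    return " ".join(question.lower().split())


def identical_share(questions_a, questions_b):
    """Share of identical questions, relative to the smaller category (assumption)."""
    a = {normalize(q) for q in questions_a}
    b = {normalize(q) for q in questions_b}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def unigram_model(questions, vocabulary, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tok for q in questions for tok in normalize(q).split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two distributions on one vocabulary."""
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)

    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def language_model_similarity(questions_by_category):
    """Return 1 - JS divergence for every unordered pair of categories."""
    vocabulary = sorted(
        {tok for qs in questions_by_category.values()
         for q in qs for tok in normalize(q).split()}
    )
    models = {c: unigram_model(qs, vocabulary)
              for c, qs in questions_by_category.items()}
    cats = sorted(models)
    return {
        (a, b): 1.0 - js_divergence(models[a], models[b])
        for i, a in enumerate(cats)
        for b in cats[i + 1:]
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {
        "cars": ["how to change oil", "best car tires"],
        "auto": ["how to change oil", "car tires price"],
        "cooking": ["how to bake bread"],
    }
    for pair, sim in language_model_similarity(toy).items():
        print(pair, round(sim, 3))
```

Category pairs that score high on such measures would be candidates for merging or for relaxed evaluation; a user-activity measure could be built the same way by replacing word counts with counts of users active in each category.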

This work is partially supported by the Russian Foundation for Basic Research, project #14-07-00589 “Data Analysis and User Modelling in Narrow-Domain Social Media”.

Notes

  1. http://yanswersblog.com/index.php/archives/2010/05/03/1-billion-answers-served.

  2. http://otvet.mail.ru/news/#hbd2012 – accessed in July 2013.

  3. http://www.answerbag.com.

  4. http://www.dmoz.org.

  5. See http://otvet.mail.ru/categories for a full list of categories.

  6. http://otvet.mail.ru/question/167517346.

  7. http://otvet.mail.ru/question/83696264.

  8. http://otvet.mail.ru/question/69108691.

  9. http://otvet.mail.ru/question/69166385.

  10. http://otvet.mail.ru/question/69656908.

  11. http://otvet.mail.ru/question/69709407.

  12. http://otvet.mail.ru/api/v2/question?qid=24141950.

  13. http://www.aot.ru/.

  14. http://otvet.mail.ru/profile/id9112629.

  15. http://otvet.mail.ru/question/76074787.

  16. http://otvet.mail.ru/question/75570807.

  17. http://otvet.mail.ru/question/167836262.

  18. http://otvet.mail.ru/question/167848364.

Corresponding author

Correspondence to Galina Lezina.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lezina, G., Braslavski, P. (2015). A Large-Scale Community Questions Classification Accounting for Category Similarity: An Exploratory Study. In: Braslavski, P., Karpov, N., Worring, M., Volkovich, Y., Ignatov, D.I. (eds) Information Retrieval. RuSSIR 2014. Communications in Computer and Information Science, vol 505. Springer, Cham. https://doi.org/10.1007/978-3-319-25485-2_13

  • DOI: https://doi.org/10.1007/978-3-319-25485-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25484-5

  • Online ISBN: 978-3-319-25485-2

  • eBook Packages: Computer Science, Computer Science (R0)
