Abstract
Query performance prediction (QPP) is a fundamental task in information retrieval, which concerns predicting the effectiveness of a ranking model for a given query in the absence of relevance information. Although QPP is an active research area, the task has not yet been explored in the context of automatic text classification. In this paper, we study the task of predicting the effectiveness of a classifier for a given document, which we refer to as document performance prediction (DPP). Our experiments on several text classification datasets, covering both categorization and sentiment analysis, attest to the effectiveness and complementarity of several DPP predictors inspired by related QPP approaches. Finally, we explore the usefulness of these predictors for improving classification itself, by using them as additional features in a classification ensemble.
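The abstract does not spell out the individual predictors, so the following is only a rough illustration of the general idea, not the paper's method. It assumes a classifier that outputs a class-probability vector for each document and derives a few per-document confidence signals from it. These signals are loosely analogous to score-based QPP predictors such as the standard deviation of retrieval scores. The function names `dpp_features` and `augment` and the particular choice of signals are hypothetical.

```python
import math
from typing import List, Sequence


def dpp_features(probs: Sequence[float]) -> List[float]:
    """Hypothetical document-level confidence signals computed from one
    classifier's class-probability vector for a single document.

    Illustrative choices only (not the paper's predictors):
      - top:     highest class probability
      - margin:  gap between the two highest probabilities
      - std:     standard deviation of the probabilities (score-variance analogue)
      - entropy: Shannon entropy in nats (lower means a more peaked distribution)
    """
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    p = [x / total for x in probs]  # normalise defensively so the values sum to 1
    ranked = sorted(p, reverse=True)
    top = ranked[0]
    margin = ranked[0] - ranked[1]
    mean = 1.0 / len(p)  # mean of a normalised distribution
    std = math.sqrt(sum((x - mean) ** 2 for x in p) / len(p))
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return [top, margin, std, entropy]


def augment(base_features: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Hypothetical helper: append the confidence signals to a document's
    existing feature vector, e.g. as extra inputs to a second-level classifier
    in a stacked ensemble."""
    return list(base_features) + dpp_features(probs)


if __name__ == "__main__":
    # A confident prediction versus a maximally uncertain one.
    print(dpp_features([0.7, 0.2, 0.1]))
    print(dpp_features([0.5, 0.5]))
```

Feeding such signals to a meta-level learner lets it weight a base classifier's vote differently depending on how reliable that classifier appears to be for a particular document. The ensemble experiments summarized in the abstract follow this general intuition.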
Acknowledgements
Work partially funded by project MASWeb (FAPEMIG APQ-01400-14) and by the authors’ individual grants from CNPq and FAPEMIG.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Penha, G., Campos, R., Canuto, S., Gonçalves, M.A., Santos, R.L.T. (2019). Document Performance Prediction for Automatic Text Classification. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11438. Springer, Cham. https://doi.org/10.1007/978-3-030-15719-7_17
DOI: https://doi.org/10.1007/978-3-030-15719-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15718-0
Online ISBN: 978-3-030-15719-7
eBook Packages: Computer Science, Computer Science (R0)