Abstract
While many studies examine information retrieval models over full-text, there are presently no studies that compare full-text retrieval with retrieval over only the titles of documents. On the one hand, the full-text of documents such as scientific papers is not always available, e.g., due to the copyright policies of academic publishers. On the other hand, searching over titles alone has strong limitations: titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only the titles of documents. We use several datasets, including three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models whose results are competitive with full-text retrieval. On average, the evaluation results of the best title-based retrieval models are only 3% lower than those of the best full-text-based retrieval models.
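To make the comparison setup concrete, the following is a minimal sketch of ranking the same documents once over titles only and once over title plus body. It assumes a plain TF-IDF cosine ranker as a stand-in; the abstract does not name the retrieval models used, so this is not the method evaluated in the paper. The toy corpus, the queries, and all function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict) per text, plus the IDF table."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency
    # Smoothed IDF, chosen here for simplicity.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, texts):
    """Return (document index, score) pairs sorted by descending score."""
    vectors, idf = tfidf_vectors(texts)
    # Query terms unseen in the collection are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy corpus, not taken from EconBiz, IREON, or PubMed.
documents = [
    {"title": "Monetary policy and inflation",
     "body": "central banks adjust interest rates to control inflation and stabilise prices"},
    {"title": "Gene expression in yeast",
     "body": "we measure gene expression levels in yeast cells under heat stress"},
    {"title": "Labour market dynamics",
     "body": "unemployment and wages respond to interest rates and monetary policy shocks"},
]
titles = [d["title"] for d in documents]
full_texts = [d["title"] + " " + d["body"] for d in documents]

for query in ("inflation", "interest rates"):
    print(query, "| titles:", rank(query, titles))
    print(query, "| full-text:", rank(query, full_texts))
```

In this toy setting, a query whose terms appear in a title is served equally well by both indexes, while a query whose terms occur only in the body receives no title match at all, which mirrors the limitation of short titles described above.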
Acknowledgement
This work was supported by the EU’s Horizon 2020 programme under grant agreement H2020-693092 MOVING.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Saleh, A., Beck, T., Galke, L., Scherp, A. (2018). Performance Comparison of Ad-Hoc Retrieval Models over Full-Text vs. Titles of Documents. In: Dobreva, M., Hinze, A., Žumer, M. (eds) Maturity and Innovation in Digital Libraries. ICADL 2018. Lecture Notes in Computer Science, vol 11279. Springer, Cham. https://doi.org/10.1007/978-3-030-04257-8_30
DOI: https://doi.org/10.1007/978-3-030-04257-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04256-1
Online ISBN: 978-3-030-04257-8
eBook Packages: Computer Science, Computer Science (R0)