Performance Comparison of Ad-Hoc Retrieval Models over Full-Text vs. Titles of Documents

Saleh, Ahmed; Beck, Tilman; Galke, Lukas; Scherp, Ansgar

doi:10.1007/978-3-030-04257-8_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11279)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

While there are many studies on information retrieval models using full-text, there are presently no studies that compare full-text retrieval with retrieval over only the titles of documents. On the one hand, the full-text of documents like scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations: titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only the titles of documents. We use different datasets, including the three digital library datasets EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models whose results are competitive with full-text retrieval. On average, the evaluation results of the best title-based retrieval models are only 3% lower than those of the best full-text-based retrieval models.

Notes

1. https://bitbucket.org/a_saleh/icadl2018

Acknowledgement

This work was supported by the EU’s Horizon 2020 programme under grant agreement H2020-693092 MOVING.

Author information

Authors and Affiliations

Kiel University, Kiel, Germany
Ahmed Saleh, Tilman Beck, Lukas Galke & Ansgar Scherp
ZBW – Leibniz Information Centre for Economics, Kiel, Germany
Ahmed Saleh & Lukas Galke
University of Stirling, Stirling, UK
Ansgar Scherp

Corresponding author

Correspondence to Ahmed Saleh.

Editor information

Editors and Affiliations

University College London Qatar, Doha, Qatar
Milena Dobreva
University of Waikato, Hamilton, New Zealand
Annika Hinze
University of Ljubljana, Ljubljana, Slovenia
Maja Žumer

About this paper

Cite this paper

Saleh, A., Beck, T., Galke, L., Scherp, A. (2018). Performance Comparison of Ad-Hoc Retrieval Models over Full-Text vs. Titles of Documents. In: Dobreva, M., Hinze, A., Žumer, M. (eds) Maturity and Innovation in Digital Libraries. ICADL 2018. Lecture Notes in Computer Science, vol 11279. Springer, Cham. https://doi.org/10.1007/978-3-030-04257-8_30

DOI: https://doi.org/10.1007/978-3-030-04257-8_30
Published: 15 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04256-1
Online ISBN: 978-3-030-04257-8
eBook Packages: Computer Science, Computer Science (R0)
