Performance Comparison of Ad-Hoc Retrieval Models over Full-Text vs. Titles of Documents

  • Ahmed Saleh
  • Tilman Beck
  • Lukas Galke
  • Ansgar Scherp
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11279)

Abstract

While there are many studies on information retrieval models using the full-text of documents, there are presently no comparative studies of full-text retrieval vs. retrieval over only the titles of documents. On the one hand, the full-text of documents such as scientific papers is not always available, e.g., due to the copyright policies of academic publishers. On the other hand, searching over titles alone has strong limitations: titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only the titles of documents. We use several datasets, including three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models whose results are competitive with full-text retrieval: on average, the evaluation results of the best title-based retrieval models are only 3% lower than those of the best full-text-based models.
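The core of the comparison can be reproduced in miniature: index the same collection twice, once over titles and once over full-text, issue identical queries against both indexes, and compare a rank-based metric. The sketch below is illustrative only and is not the authors' code; it assumes the third-party rank_bm25 Python package, and the corpus, query, relevance labels, and the use of precision-at-k stand in for the paper's datasets and evaluation setup.

    # Minimal, illustrative sketch (not the authors' code): run the same
    # queries against a title-only index and a full-text index and compare
    # a simple rank metric. Assumes the `rank_bm25` package; the corpus,
    # query, and relevance labels below are toy examples.
    from rank_bm25 import BM25Okapi

    docs = [
        {"title": "Ad-hoc retrieval over document titles",
         "fulltext": "We compare neural ranking models for ad-hoc retrieval "
                     "over titles and over the full-text of documents."},
        {"title": "Learning to rank with gradient boosting",
         "fulltext": "Boosting-based learning-to-rank methods are trained on "
                     "query-document features to produce a ranking."},
    ]
    queries = [("neural ranking retrieval", {0})]  # (query, relevant doc ids)

    def tokenize(text):
        return text.lower().split()

    def precision_at_k(index, query, relevant, k=1):
        # Score all documents for the query, rank them, and measure the
        # fraction of the top-k documents that are relevant.
        scores = index.get_scores(tokenize(query))
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return len(set(ranked[:k]) & relevant) / k

    # Build one BM25 index per field and evaluate with the same queries.
    for field in ("title", "fulltext"):
        index = BM25Okapi([tokenize(d[field]) for d in docs])
        mean_p = sum(precision_at_k(index, q, rel) for q, rel in queries) / len(queries)
        print(f"{field:8s} P@1 = {mean_p:.2f}")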

Keywords

Information retrieval · Learning to rank · Deep learning

Acknowledgement

This work was supported by the EU’s Horizon 2020 programme under grant agreement H2020-693092 MOVING.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Ahmed Saleh (1, 2)
  • Tilman Beck (1)
  • Lukas Galke (1, 2)
  • Ansgar Scherp (1, 3)

  1. Kiel University, Kiel, Germany
  2. ZBW – Leibniz Information Centre for Economics, Kiel, Germany
  3. University of Stirling, Stirling, UK