An automatic methodology to evaluate personalized information retrieval systems

  • Original Paper
  • Published in: User Modeling and User-Adapted Interaction

Abstract

Due to the information overload we face nowadays, personalization services are becoming almost essential for finding relevant information tailored to each individual or to groups of people with common interests. It is therefore very important to be able to build efficient and robust personalization techniques as part of these services. Evaluation is a crucial stage in their development and improvement, and much more research is needed on this issue. We have proposed an automatic evaluation methodology for personalized information retrieval systems (ASPIRE), which combines the advantages of system-centred (repeatable, comparable and generalizable results) and user-centred (taking the user into account) evaluation approaches, and makes the evaluation process easy and fast. Its reliability and robustness have been assessed by means of a user-oriented evaluation. ASPIRE may be considered an interesting alternative to costly and difficult user studies, able to discriminate between different personalization techniques or between different parameter configurations of a given personalization method.

Notes

  1. A Kendall τ correlation always below 0.5.

  2. Focusing on XML information retrieval requires adapting some search engine components. For example, the retrievable elements are not only complete documents but also document components (called structural units), which may overlap. However, this does not pose any problem when using ASPIRE.

  3. Lucene is a popular open-source search library. It provides indexing and search technology and is used by many applications all over the world, ranging from mobile devices to sites such as Twitter, Apple and Wikipedia. This search engine is designed to work with plain (non-structured) documents (http://lucene.apache.org/).

  4. It should be noted that the user did not judge whether a given retrieved result was the best possible one, but only whether or not its content was relevant to the given query and profile (binary assessments).

  5. The source of the problem is the limitation of judging only the first 50 results retrieved by the IRS; this restriction was necessary because evaluating a larger number of results would require too much time and effort from the users.

  6. As Hard reranking only considers the list of results of the original non-personalized query, it does not introduce any relevance assessments that are not present in the original results list.

  7. The correlation values in this case are greater than those in Fig. 2 because here we correlate the averaged NDCG values for ASPIRE and the user study, rather than the underlying and more diverse 126 evaluation triplets of each of these combinations (a sketch of how such values can be computed follows these notes).
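
Notes 1 and 7 refer to Kendall τ correlations between system rankings and to NDCG values averaged over evaluation triplets. The following Python sketch, not the authors' code, illustrates how such quantities are typically computed with NumPy and SciPy; the configuration names and relevance judgments below are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau


def ndcg_at_k(relevances, k=10):
    """NDCG@k for a single ranked result list with binary relevance judgments."""
    rels = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    dcg = float(np.sum(rels * discounts))
    idcg = float(np.sum(np.sort(rels)[::-1] * discounts))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical binary judgments per evaluation triplet for three
# personalization configurations; the numbers are illustrative only.
aspire_judgments = {
    "config_a": [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]],
    "config_b": [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]],
    "config_c": [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1]],
}
user_judgments = {
    "config_a": [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1]],
    "config_b": [[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]],
    "config_c": [[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0]],
}

# Average NDCG over the triplets of each configuration (cf. Note 7).
aspire_avg = {c: np.mean([ndcg_at_k(r) for r in runs])
              for c, runs in aspire_judgments.items()}
user_avg = {c: np.mean([ndcg_at_k(r) for r in runs])
            for c, runs in user_judgments.items()}

# Kendall tau between the two rankings of configurations (cf. Note 1).
configs = sorted(aspire_avg)
tau, p_value = kendalltau([aspire_avg[c] for c in configs],
                          [user_avg[c] for c in configs])
print(f"Kendall tau between ASPIRE and user-study rankings: {tau:.3f}")
```

With real data, each inner list would hold the binary assessments for the top results returned for one evaluation triplet, and the correlation would compare the ranking of configurations produced by ASPIRE with the one produced by the user study.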

Acknowledgments

This paper has been supported by the Spanish “Consejería de Innovación, Ciencia y Empresa de la Junta de Andalucía” and the “Ministerio de Ciencia e Innovación” under the Projects P09-TIC-4526 and TIN2011-28538-C02-02, respectively.

Author information

Corresponding author

Correspondence to Luis M. de Campos.

About this article

Cite this article

Vicente-López, E., de Campos, L.M., Fernández-Luna, J.M. et al. An automatic methodology to evaluate personalized information retrieval systems. User Model User-Adap Inter 25, 1–37 (2015). https://doi.org/10.1007/s11257-014-9148-9
