
Information Retrieval, Volume 8, Issue 4, pp 655–681

A Bayesian Framework for XML Information Retrieval: Searching and Learning with the INEX Collection

  • Benjamin Piwowarski
  • Patrick Gallinari
Article

Abstract

Most recent document standards like XML rely on structured representations. On the other hand, current information retrieval systems have been developed for flat document representations and cannot be easily extended to cope with more complex document types. The design of such systems is still an open problem. We present a new model for structured document retrieval which allows computing scores of document parts. This model is based on Bayesian networks whose conditional probabilities are learnt from a labelled collection of structured documents—which is composed of documents, queries and their associated assessments. Training these models is a complex machine learning task and is not standard. This is the focus of the paper: we propose here to train the structured Bayesian Network model using a cross-entropy training criterion. Results are presented on the INEX corpus of XML documents.

Keywords

Bayesian Networks, structured information retrieval, XML, machine learning for structured retrieval

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. Center for Web Research, DCC, Universidad de Chile, Santiago, Chile
  2. LIP6, Paris, France
