Combining Web Document Representations in a Bayesian Inference Network Model Using Link and Content-Based Evidence

  • Conference paper
Advances in Information Retrieval (ECIR 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2291)

Abstract

This paper introduces an expressive formal Information Retrieval (IR) model developed for the Web. The model is based on the Bayesian inference network model and views IR as an evidential reasoning process. It supports the explicit combination of multiple Web document representations within a single framework. Information extracted from the content of Web documents and derived from analysis of the Web link structure serves as a source of evidence for the ranking algorithm. This content-based and link-based evidence is used to generate the multiple Web document representations that are combined.
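
As a rough illustration of what combining several document representations into one ranking score can look like, the sketch below uses a weighted-sum operator of the kind commonly found in inference network retrieval systems. It is not the model defined in the paper: the representation names (`content`, `anchor_text`), the belief values, the weights, and the function names are all hypothetical and chosen only for the example.

```python
from typing import Dict, List


def weighted_sum_belief(beliefs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-representation beliefs into one score.

    Assumption: a normalised weighted sum stands in for the combination
    step; the paper's actual combination method is not given in the abstract.
    """
    total_weight = sum(weights[name] for name in beliefs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * beliefs[name] for name in beliefs) / total_weight


def rank(documents: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Return document ids ordered by decreasing combined belief."""
    scored = {doc: weighted_sum_belief(b, weights) for doc, b in documents.items()}
    return sorted(scored, key=lambda doc: scored[doc], reverse=True)


# Hypothetical beliefs that each representation supports the query.
documents = {
    "doc_a": {"content": 0.8, "anchor_text": 0.2},  # strong content evidence
    "doc_b": {"content": 0.5, "anchor_text": 0.9},  # strong link-based evidence
}

equal_weights = {"content": 1.0, "anchor_text": 1.0}
content_heavy = {"content": 3.0, "anchor_text": 1.0}

print(rank(documents, equal_weights))
print(rank(documents, content_heavy))
```

With equal weights the link-derived evidence lifts `doc_b` above `doc_a`; weighting content three times as heavily reverses the order. The point of the sketch is only that the relative trust placed in each representation changes the final ranking.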




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsikrika, T., Lalmas, M. (2002). Combining Web Document Representations in a Bayesian Inference Network Model Using Link and Content-Based Evidence. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_4

  • DOI: https://doi.org/10.1007/3-540-45886-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43343-9

  • Online ISBN: 978-3-540-45886-9

  • eBook Packages: Springer Book Archive
