Abstract
We present a probabilistic learning model for information retrieval and information filtering that is based on the concept of “uncertainty sampling”. Uncertainty sampling is a technique that exploits user relevance feedback on both relevant and non-relevant documents. In particular, uncertainty sampling uses those documents whose relevance is most uncertain to speed up the learning of the user's relevance criteria. We extend uncertainty sampling to multiple levels of relevance, and we show how this new learning model for information retrieval and filtering could be evaluated using collections with non-binary relevance assessments.
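As a rough illustration of the selection step behind uncertainty sampling with several relevance levels, the sketch below picks the document whose predicted relevance distribution is closest to uniform. It is not the model developed in the chapter: the use of Shannon entropy as the uncertainty measure, the function names `entropy` and `most_uncertain`, and the example distributions are all assumptions made for illustration.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a distribution over relevance levels."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def most_uncertain(predictions):
    """Return the id of the document with the highest-entropy prediction.

    `predictions` maps a document id to a list of probabilities,
    one per relevance level (an assumed representation).
    """
    return max(predictions, key=lambda doc_id: entropy(predictions[doc_id]))


# Hypothetical predicted distributions over three relevance levels
# (non-relevant, partially relevant, relevant).
predictions = {
    "d1": [0.90, 0.05, 0.05],
    "d2": [0.34, 0.33, 0.33],
    "d3": [0.10, 0.10, 0.80],
}

# The document the user would be asked to judge next.
selected = most_uncertain(predictions)
```

Here `d2` is selected, because its prediction is nearly uniform over the three levels. With only two relevance levels, the same rule reduces to choosing the document whose estimated probability of relevance is closest to 0.5.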
© 2000 Springer-Verlag Berlin Heidelberg
Cite this chapter
Amati, G., Crestani, F. (2000). Probabilistic Learning by Uncertainty Sampling with Non-Binary Relevance. In: Crestani, F., Pasi, G. (eds) Soft Computing in Information Retrieval. Studies in Fuzziness and Soft Computing, vol 50. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1849-9_12
DOI: https://doi.org/10.1007/978-3-7908-1849-9_12
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2473-5
Online ISBN: 978-3-7908-1849-9
eBook Packages: Springer Book Archive