Bpref is a preference-based information retrieval measure that considers whether relevant documents are ranked above irrelevant ones. It is designed to be robust to missing relevance judgments: computed over incomplete judgments, it is intended to give the same experimental outcome that Mean Average Precision (MAP) would give with complete judgments.
In a test collection where all relevant documents have been identified, experiments using bpref and MAP should give the same outcome; for example, both measures should agree that system A is better than system B. However, if the relevance judgments are incomplete, for example where only half the pool has been judged, MAP becomes unstable and may incorrectly show that system B is better than system A. The bpref measure was developed to maintain the correct ordering of systems (A better than B) even with incomplete judgments.
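As a rough illustration, the per-topic computation can be sketched in Python. The sketch assumes the formulation commonly attributed to Buckley and Voorhees, bpref = (1/R) Σ_r (1 − |n ranked above r| / R), where R is the number of judged relevant documents, r is a retrieved relevant document, and n ranges over the first R judged irrelevant documents retrieved. The function name `bpref` and its arguments are illustrative only, and evaluation toolkits may normalise slightly differently.

```python
def bpref(ranking, relevant, nonrelevant):
    """Sketch of bpref for a single topic (assumed formulation, see lead-in).

    ranking     -- list of document ids, best first
    relevant    -- set of ids judged relevant
    nonrelevant -- set of ids judged irrelevant
    Documents in neither set are unjudged and are ignored.
    """
    R = len(relevant)
    if R == 0:
        return 0.0  # no judged relevant documents: nothing to score

    nonrel_seen = 0  # judged irrelevant documents ranked so far
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            # Only the first R judged irrelevant documents are counted.
            total += 1.0 - min(nonrel_seen, R) / R
        elif doc in nonrelevant:
            nonrel_seen += 1
        # Unjudged documents fall through and do not affect the score.
    return total / R


# Hypothetical judgments: d3 and d5 are unjudged.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4"}
nonrelevant = {"d2"}
score = bpref(ranking, relevant, nonrelevant)

# Dropping the unjudged documents from the ranking leaves the score unchanged.
score_judged_only = bpref(["d1", "d2", "d4"], relevant, nonrelevant)
```

Because unjudged documents are skipped rather than treated as irrelevant, the score depends only on the relative order of judged documents, which is the property that makes the measure tolerant of incomplete judgments.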
- 1. Buckley C, Voorhees EM. Retrieval evaluation with incomplete information. In: Proceedings of 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2004. p. 25–32.
- 2. Yilmaz E, Aslam JA. Estimating average precision with incomplete and imperfect judgments. In: Proceedings of International Conference on Information and Knowledge Management; 2006. p. 102–11.