Abstract
Although document filtering is simple to define, a wide range of evaluation measures has been proposed in the literature, all of which have been subject to criticism. We present a unified, comparative view of the strengths and weaknesses of proposed measures based on two formal constraints (which should be satisfied by any suitable evaluation measure) and various properties (which help differentiate measures according to their behaviour). We conclude that (i) some smoothing process is necessary to satisfy the basic constraints; and (ii) metrics can be grouped into three families, each satisfying one of three formal properties which are mutually exclusive, i.e. no metric can satisfy all three properties simultaneously.
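The claim that some smoothing is needed is easiest to see on degenerate filters. The following is a minimal sketch, not the paper's own formulation: it assumes F1 as the example measure and uses additive smoothing (adding a hypothetical `alpha = 0.5` to every contingency cell) as one possible smoothing process. The `Contingency` type and the function names are illustrative, and the paper's formal constraints are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contingency:
    """Counts for one binary filtering run (hypothetical helper type)."""
    tp: int  # relevant documents the filter accepted
    fp: int  # non-relevant documents the filter accepted
    fn: int  # relevant documents the filter rejected
    tn: int  # non-relevant documents the filter rejected (unused by F1)


def f1_unsmoothed(c: Contingency) -> Optional[float]:
    """Harmonic mean of precision and recall; None when either is 0/0."""
    accepted = c.tp + c.fp
    relevant = c.tp + c.fn
    if accepted == 0 or relevant == 0:
        return None  # precision or recall is undefined
    precision = c.tp / accepted
    recall = c.tp / relevant
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_smoothed(c: Contingency, alpha: float = 0.5) -> float:
    """F1 after adding alpha > 0 to each cell (assumed additive smoothing)."""
    tp, fp, fn = c.tp + alpha, c.fp + alpha, c.fn + alpha
    precision = tp / (tp + fp)  # denominator is always > 0
    recall = tp / (tp + fn)     # denominator is always > 0
    return 2 * precision * recall / (precision + recall)


# Two degenerate filters on a hypothetical collection of 1000 documents,
# 100 of which are relevant.
reject_all = Contingency(tp=0, fp=0, fn=100, tn=900)
accept_all = Contingency(tp=100, fp=900, fn=0, tn=0)

print(f1_unsmoothed(reject_all), f1_unsmoothed(accept_all))
print(f1_smoothed(reject_all), f1_smoothed(accept_all))
```

Without smoothing, the filter that rejects everything has no defined score at all; with smoothing, every run receives a value in [0, 1], at the cost of a free parameter (`alpha`) whose choice is itself an assumption.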
This research was partially supported by the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02) and by the Regional Government of Madrid and the European Social Fund under MA2VICMR (S2009/TIC-1542).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amigó, E., Gonzalo, J., Verdejo, F. (2011). A Comparison of Evaluation Metrics for Document Filtering. In: Forner, P., Gonzalo, J., Kekäläinen, J., Lalmas, M., de Rijke, M. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2011. Lecture Notes in Computer Science, vol 6941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23708-9_6
DOI: https://doi.org/10.1007/978-3-642-23708-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23707-2
Online ISBN: 978-3-642-23708-9
eBook Packages: Computer Science; Computer Science (R0)