A Comparison of Evaluation Metrics for Document Filtering

Amigó, Enrique; Gonzalo, Julio; Verdejo, Felisa

doi:10.1007/978-3-642-23708-9_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6941)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Although document filtering is simple to define, a wide range of evaluation measures has been proposed in the literature, all of which have been subject to criticism. We present a unified, comparative view of the strengths and weaknesses of the proposed measures based on two formal constraints (which should be satisfied by any suitable evaluation measure) and various properties (which help differentiate measures according to their behaviour). We conclude that (i) some smoothing process is necessary to satisfy the basic constraints; and (ii) metrics can be grouped into three families, each satisfying one out of three formal properties, which are mutually exclusive, i.e. no metric can satisfy all three properties simultaneously.
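As a rough illustration of why smoothing can matter, the sketch below computes an F-measure from filtering contingency counts and shows that it is undefined when a system discards every document. The function name `f_measure` and the additive `smoothing` constant are assumptions made for this example only; the abstract does not specify the constraints or the smoothing scheme analysed in the paper, so this should not be read as the authors' method.

```python
def f_measure(tp, fp, fn, smoothing=0.0):
    """F1 over the relevant class, computed from contingency counts.

    tp: relevant documents accepted, fp: non-relevant documents accepted,
    fn: relevant documents discarded.
    `smoothing` is a hypothetical additive constant applied to every count;
    it is an illustrative choice, not the scheme studied in the paper.
    """
    tp, fp, fn = tp + smoothing, fp + smoothing, fn + smoothing
    if tp + fp == 0 or tp + fn == 0:
        return None  # precision or recall is 0/0, so F is undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return None  # harmonic mean is undefined when both are zero
    return 2 * precision * recall / (precision + recall)


# A system that discards every document: nothing accepted, 5 relevant missed.
empty_unsmoothed = f_measure(0, 0, 5)               # undefined without smoothing
empty_smoothed = f_measure(0, 0, 5, smoothing=1.0)  # defined once counts are smoothed
perfect = f_measure(5, 0, 0)                        # every relevant document accepted
```

Without smoothing, the empty output cannot be scored at all, so it cannot be ranked against other systems; with the assumed additive constant it receives a low but well-defined score.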

This research was partially supported by the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02) and the Regional Government of Madrid and the European Social Fund under MA2VICMR (S2009/TIC-1542).

Author information

Authors and Affiliations

UNED NLP & IR Group, Juan del Rosal, 16, 28040, Madrid, Spain
Enrique Amigó, Julio Gonzalo & Felisa Verdejo

Editor information

Editors and Affiliations

Center for the Evaluation of Language and Communication Technologies (CELCT), Via alla Casata 56/c, 38123, Povo, Italy
Pamela Forner
National University of Distance Education, E.T.S.I. Informática de la UNED, c/Juan del Rosal 16, 28040, Madrid, Spain
Julio Gonzalo
School of Information Sciences, University of Tampere, Kanslerinrinne 1, 33014, Tampere, Finland
Jaana Kekäläinen
Yahoo! Research, Avinguda Diagonal 177, 8th Floor, 08018, Barcelona, Spain
Mounia Lalmas
Intelligent Systems Laboratory, University of Amsterdam, Science Park 107, 1098 XG, Amsterdam, The Netherlands
Maarten de Rijke

Cite this paper

Amigó, E., Gonzalo, J., Verdejo, F. (2011). A Comparison of Evaluation Metrics for Document Filtering. In: Forner, P., Gonzalo, J., Kekäläinen, J., Lalmas, M., de Rijke, M. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2011. Lecture Notes in Computer Science, vol 6941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23708-9_6

DOI: https://doi.org/10.1007/978-3-642-23708-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23707-2
Online ISBN: 978-3-642-23708-9
eBook Packages: Computer Science, Computer Science (R0)
