Automated Early Leaderboard Generation from Comparative Tables
A leaderboard is a tabular presentation of performance scores of the best competing techniques that address a specific scientific problem. Manually maintained leaderboards take time to emerge, which delays performance discovery and meaningful comparison. This can in turn delay dissemination of best practices to non-experts and practitioners. Regarding papers as proxies for techniques, we present a new system to automatically discover and maintain leaderboards in the form of partial orders between papers, based on performance reported therein. In principle, a leaderboard depends on the task, data set, other experimental settings, and the choice of performance metrics. Often there are also tradeoffs between different metrics. Thus, leaderboard discovery is not just a matter of accurately extracting performance numbers and comparing them. In fact, the levels of noise and uncertainty around performance comparisons are so large that reliable traditional extraction is infeasible. We mitigate these challenges by using relatively cleaner, structured parts of the papers, e.g., performance tables. We propose a novel performance improvement graph with papers as nodes, where edges encode noisy performance comparison information extracted from tables. Every individual performance edge is extracted from a table with citations to other papers. These extractions resemble (noisy) outcomes of ‘matches’ in an incomplete tournament. We propose several approaches to rank papers from these noisy ‘match’ outcomes. We show that our ranking scheme can reproduce various manually curated leaderboards. Using widely used lists of state-of-the-art papers in 27 areas of Computer Science, we demonstrate that our system produces reliable rankings. We also show that commercial scholarly search systems cannot be used for leaderboard discovery, because of their emphasis on citations, which favors classic papers over recent performance breakthroughs.
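To make the idea of ranking from noisy 'match' outcomes concrete, the following is a minimal sketch, not the system described above. It assumes that pairwise comparisons have already been extracted from tables as `(paper_a, paper_b, score_a, score_b)` tuples with higher scores being better. It builds an improvement graph with an edge from the weaker paper to the stronger one, then ranks papers with a PageRank-style power iteration. The function names, the tuple layout, and the use of PageRank as the ranking scheme are all illustrative assumptions.

```python
from collections import defaultdict


def build_improvement_graph(comparisons):
    """Build a weighted graph with an edge weaker -> stronger per comparison.

    `comparisons` is a hypothetical input format: an iterable of
    (paper_a, paper_b, score_a, score_b), where a higher score is better.
    """
    graph = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for paper_a, paper_b, score_a, score_b in comparisons:
        nodes.update((paper_a, paper_b))
        if score_a == score_b:
            continue  # a tie carries no improvement information
        if score_a < score_b:
            loser, winner = paper_a, paper_b
        else:
            loser, winner = paper_b, paper_a
        graph[loser][winner] += 1  # repeated wins strengthen the edge
    return nodes, graph


def pagerank(nodes, graph, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over the improvement graph."""
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in nodes}
        for p in nodes:
            out_edges = graph.get(p, {})
            total = sum(out_edges.values())
            if total == 0:
                # Papers that nobody beats spread their mass uniformly.
                for q in nodes:
                    new_rank[q] += damping * rank[p] / n
            else:
                for q, weight in out_edges.items():
                    new_rank[q] += damping * rank[p] * weight / total
        rank = new_rank
    return rank


# Toy data (invented for illustration): three papers, three table rows.
comparisons = [
    ("A", "B", 70.0, 75.0),  # B improves on A
    ("B", "C", 75.0, 80.0),  # C improves on B
    ("A", "C", 71.0, 79.0),  # C improves on A
]
nodes, graph = build_improvement_graph(comparisons)
scores = pagerank(nodes, graph)
leaderboard = sorted(scores, key=scores.get, reverse=True)
print(leaderboard)
```

Pointing edges from the weaker paper to the stronger one lets score mass flow toward recent improvements, rather than toward heavily cited classics. Real extractions would also need to handle metrics where lower is better and comparisons that contradict each other across tables, which this sketch leaves out.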
Our code and data sets will be placed in the public domain.
Partly supported by grants from IBM and Amazon.