Abstract
This paper describes how current lexical similarity and analogy gold standards are built to conform to certain ideas about what the models they evaluate are used for. Topical relevance has always been the most important target notion for information access tools and related language technologies. While this has proven a useful starting point for much of what information technology is used for, it does not always align well with other uses to which these technologies are being put, most notably use cases from digital scholarship in the humanities and social sciences. This paper argues for a more systematic formulation of requirements from the digital humanities and social sciences and a more explicit description of the assumptions underlying model design.
Notes
1. Spärck Jones argues that this should be understood in terms of occurrence statistics rather than more elusive statistical notions. However, the target notion is a relevance-oriented one.
References
Baroni, M., Bernardi, R., Do, N.Q., Shan, C.C.: Entailment above the word level in distributional semantics. In: Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics. ACL (2012)
Baroni, M., Lenci, A.: How we BLESSed distributional semantic evaluation. In: Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics. Association for Computational Linguistics (2011)
Chiarello, C., Burgess, C., Richards, L., Pollock, A.: Semantic and associative priming in the cerebral hemispheres: some words do, some words don’t... sometimes, some places. Brain Lang. 38(1), 75–104 (1990)
Da, N.Z.: The computational case against computational literary studies. Crit. Inq. 45(3), 601–639 (2019)
Da, N.Z.: The digital humanities debacle—computational methods repeatedly come up short. The Chronicle of Higher Education (2019)
Finkelstein, L., et al.: Placing search in context: the concept revisited. In: Proceedings of the International Conference on World Wide Web. ACM (2001)
Fitzpatrick, K.: The humanities, done digitally. The Chronicle of Higher Education (2011)
Hill, F., Reichart, R., Korhonen, A.: SimLex-999: evaluating semantic models with (genuine) similarity estimation. Comput. Linguist. 41(4), 665–695 (2015)
Jänicke, S., Franzini, G., Cheema, M.F., Scheuermann, G.: On close and distant reading in digital humanities: a survey and future challenges. In: Eurographics Conference on Visualization (EuroVis), vol. 2 (2015)
Katz, S.M.: Distribution of content words and phrases in text and language modelling. Nat. Lang. Eng. 2(1), 15–59 (1996)
Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (2013)
Moretti, F.: Distant Reading. Verso Books, London (2013)
O’Connor, B., Bamman, D., Smith, N.A.: Computational text analysis for social science: model assumptions and complexity. In: Second Workshop on Computational Social Science and the Wisdom of Crowds (2011)
Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)
Schwartz, H.A., Gomez, F.: Evaluating semantic metrics on tasks of concept similarity. In: Proceedings of FLAIRS (2011)
Spärck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
Underwood, T.: Dear Humanists: Fear Not the Digital Revolution. The Chronicle of Higher Education (2019)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Karlgren, J. (2019). How Lexical Gold Standards Have Effects on the Usefulness of Text Analysis Tools for Digital Scholarship. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science(), vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_14
DOI: https://doi.org/10.1007/978-3-030-28577-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer Science, Computer Science (R0)