How Lexical Gold Standards Have Effects on the Usefulness of Text Analysis Tools for Digital Scholarship

Karlgren, Jussi

doi:10.1007/978-3-030-28577-7_14

Jussi Karlgren (ORCID: orcid.org/0000-0003-4042-4919)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11696)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

This paper describes how the current lexical similarity and analogy gold standards are built to conform to certain ideas about what the models they are designed to evaluate are used for. Topical relevance has always been the most important target notion for information access tools and related language technologies, and while this has proven a useful starting point for much of what information technology is used for, it does not always align well with the other uses to which these technologies are being put, most notably use cases from digital scholarship in the humanities and social sciences. This paper argues for a more systematic formulation of requirements from the digital humanities and social sciences and a more explicit description of the assumptions underlying model design.

Notes

1. Spärck Jones argues that this should be understood in terms of occurrence statistics rather than more elusive statistical notions. However, the target notion is a relevance-oriented one.

References

Baroni, M., Bernardi, R., Do, N.Q., Shan, C.C.: Entailment above the word level in distributional semantics. In: Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics. ACL (2012)
Baroni, M., Lenci, A.: How we BLESSed distributional semantic evaluation. In: Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics. Association for Computational Linguistics (2011)
Chiarello, C., Burgess, C., Richards, L., Pollock, A.: Semantic and associative priming in the cerebral hemispheres: some words do, some words don’t... sometimes, some places. Brain Lang. 38(1), 75–104 (1990)
Da, N.Z.: The computational case against computational literary studies. Crit. Inq. 45(3), 601–639 (2019)
Da, N.Z.: The digital humanities debacle—computational methods repeatedly come up short. The Chronicle of Higher Education (2019)
Finkelstein, L., et al.: Placing search in context: the concept revisited. In: Proceedings of the International Conference on World Wide Web. ACM (2001)
Fitzpatrick, K.: The humanities, done digitally. The Chronicle of Higher Education (2011)
Hill, F., Reichart, R., Korhonen, A.: Simlex-999: evaluating semantic models with (genuine) similarity estimation. Comput. Linguist. 41, 665–695 (2016)
Jänicke, S., Franzini, G., Cheema, M.F., Scheuermann, G.: On close and distant reading in digital humanities: a survey and future challenges. In: Eurographics Conference on Visualization (EuroVis), vol. 2 (2015)
Katz, S.M.: Distribution of content words and phrases in text and language modelling. Nat. Lang. Eng. 2(1), 15–59 (1996)
Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (2013)
Moretti, F.: Distant Reading. Verso Books, London (2013)
O’Connor, B., Bamman, D., Smith, N.A.: Computational text analysis for social science: model assumptions and complexity. In: Second Workshop on Computational Social Science and the Wisdom of Crowds (2011)
Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)
Schwartz, H.A., Gomez, F.: Evaluating semantic metrics on tasks of concept similarity. In: Proceedings of FLAIRS (2011)
Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
Underwood, T.: Dear Humanists: Fear Not the Digital Revolution. The Chronicle of Higher Education (2019)

Author information

Authors and Affiliations

KTH Royal Institute of Technology and Gavagai, Stockholm, Sweden
Jussi Karlgren

Corresponding author

Correspondence to Jussi Karlgren.

Editor information

Editors and Affiliations

Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
Zurich University of Applied Sciences, Winterthur, Switzerland
Martin Braschler
University of Neuchâtel, Neuchâtel, Switzerland
Jacques Savoy
Technische Universität Wien, Vienna, Austria
Andreas Rauber
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Santiago de Compostela, Santiago de Compostela, Spain
David E. Losada
Swiss Alliance for Data-Intensive Services, Thun, Switzerland
Gundula Heinatz Bürki
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

About this paper

Cite this paper

Karlgren, J. (2019). How Lexical Gold Standards Have Effects on the Usefulness of Text Analysis Tools for Digital Scholarship. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_14

DOI: https://doi.org/10.1007/978-3-030-28577-7_14
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer Science, Computer Science (R0)