Abstract
Knowing how terms behave in written texts can help us tailor models, algorithms, and resources to improve access to digital libraries and to answer information needs in archives that span long time periods. In this paper we investigate the behavior of written English in blogs in comparison to traditional texts from the New York Times, The Times Archive, and the British National Corpus. We show that user-generated content, like spoken content, differs in its characteristics from ‘professionally’ written text and exhibits more dynamic behavior.
This work is partly funded by the European Commission under ARCOMEM (ICT 270239).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tahmasebi, N., Gossen, G., Risse, T. (2012). Which Words Do You Remember? Temporal Properties of Language Use in Digital Archives. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_4
DOI: https://doi.org/10.1007/978-3-642-33290-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33289-0
Online ISBN: 978-3-642-33290-6
eBook Packages: Computer Science (R0)