Textual Signatures: Identifying Text-Types Using Latent Semantic Analysis to Measure the Cohesion of Text Structures

McCarthy, Philip M.; Briner, Stephen W.; Rus, Vasile; McNamara, Danielle S.

doi:10.1007/978-1-84628-754-1_7

Abstract

Just as a sentence is far more than a mere concatenation of words, a text is far more than a mere concatenation of sentences. Texts contain pertinent information that co-refers across sentences and paragraphs [30]; texts contain relations between phrases, clauses, and sentences that are often causally linked [21], [51], [56]; and texts that depend on relating a series of chronological events contain temporal features that help the reader to build a coherent representation of the text [19], [55]. We refer to textual features such as these as cohesive elements, and they occur within paragraphs (locally), across paragraphs (globally), and in forms such as referential, causal, temporal, and structural [18], [22], [36]. But cohesive elements, and by consequence cohesion, do not simply feature in a text as dialogues tend to feature in narratives, or as cartoons tend to feature in newspapers. That is, cohesion is not present or absent in a binary or optional sense. Instead, cohesion in text exists on a continuum of presence, which is sometimes indicative of the text-type in question [12], [37], [41] and sometimes indicative of the audience for which the text was written [44], [47]. In this chapter, we discuss the nature and importance of cohesion; we demonstrate a computational tool that measures cohesion; and, most importantly, we demonstrate a novel approach to identifying text-types by incorporating contrasting rates of cohesion.

References

Best, R.M., Floyd, R.G., & McNamara, D.S. (2004). Understanding the fourth-grade slump: Comprehension difficulties as a function of reader aptitudes and text genre. Paper presented at the 85th Annual Meeting of the American Educational Research Association.
Biber, D. (1987). A textual comparison of British and American writing. American Speech, 62, 99–119.
Biber, D. (1988). Linguistic features: algorithms and functions in variation across speech and writing. Cambridge: Cambridge University Press.
Brill, E. (1995). Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, MA.
Britton, B. K., & Gülgöz, S. (1991). Using Kintsch's computational model to improve instructional text: Effects of inference calls on recall and cognitive structures. Journal of Educational Psychology, 83, 329–345.
Burrows, J. (1987). Word-patterns and story-shapes: The statistical analysis of narrative style. Literary and Linguistic Computing, 2, 61–70.
Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence. Menlo Park: AAAI/MIT Press.
Charniak, E. (2000) A Maximum-Entropy-Inspired Parser. Proceedings of the North-American Chapter of Association for Computational Linguistics, Seattle, WA
Charniak, E. & Johnson, M. (2005) Coarse-to-fine n-best parsing and Max-Ent discriminative reranking. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp. 173–180). Ann Arbor, MI
Collins, M. (1996) A New Statistical Parser Based on Bigram Lexical Dependencies. Proceedings of the 34th Annual Meeting of the ACL, Santa Cruz, CA
Collins, M. (1997). Three Generative, Lexicalised Models for Statistical Parsing. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, Madrid, Spain.
Crossley, S., Louwerse, M.M., McCarthy, P.M., & McNamara, D.S. (forthcoming 2007). A linguistic analysis of simplified and authentic texts. Modern Language Journal, 91, (2).
Dennis, S., Landauer, T., Kintsch, W. & Quesada, J. (2003). Introduction to Latent Semantic Analysis. Slides from the tutorial given at the 25th Annual Meeting of the Cognitive Science Society, Boston.
Duran, N., McCarthy, P.M., Graesser, A.C., & McNamara, D.S. (2006). An empirical study of temporal indices. Proceedings of the 28th Annual Conference of the Cognitive Science Society.
Foltz, P. W., Britt, M. A., & Perfetti, C. A. (1996). Reasoning from multiple texts: An automatic analysis of readers’ situation models. In G. W. Cottrell (Ed.) Proceedings of the 18th Annual Cognitive Science Conference (pp. 110–115). Lawrence Erlbaum, NJ.
Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual Coherence with Latent Semantic Analysis. Discourse Processes, 25, 285–307.
Foltz, P. W., Gilliam, S., & Kendall, S. (2000). Supporting content-based feedback in on-line writing evaluation with LSA. Interactive Learning Environments, 8, 111–127.
Gernsbacher, M.A. (1990). Language comprehension as structure building. Hillsdale, NJ: Erlbaum.
Givón, T. (1995). Coherence in the text and coherence in the mind. In M.A. Gernsbacher & T. Givón (Eds.), Coherence in spontaneous text (pp. 59–115). Amsterdam/Philadelphia: John Benjamins.
Graesser, A.C. (1993). Inference generation during text comprehension. Discourse Processes, 16, 1–2.
Graesser, A.C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101, 371–95.
Graesser, A.C., McNamara, D., Louwerse, M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193–202.
Hearst, M.A. (1994) Multi-paragraph Segmentation of Expository Text. Proceedings of the Association of Computational Linguistics, Las Cruces, NM.
Hobbs, J.R. (1985). On the coherence and structure of discourse. CSLI Technical Report, 85–37. Stanford, CA.
Hovy, E. (1990). Parsimonious and profligate approaches to the question of discourse structure relations. Proceedings of the Fifth International Workshop on Natural Language generation, East Stroudsburg, PA, Association for Computational Linguistics.
Karlgren, J., & Cutting, D. (1994). Recognizing text genres with simple metrics using discriminant analysis. Proceedings of the 15th International Conference on Computational Linguistics, Volume 2 (pp. 1071–1075). Kyoto, Japan.
Kessler, B., Nunberg, G., & Schütze, H. (1997). Automatic detection of text genre. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics (pp. 32–38). Madrid, Spain.
Kintsch, W., & Bowles, A. (2002). Metaphor comprehension: What makes a metaphor difficult to understand? Metaphor and Symbol, 17, 249–262.
Kintsch, E., Steinhart, D., Stahl, G., LSA Research Group, Matthews, C., & Lamb, R. (2000). Developing summarization skills through the use of LSA-based feedback. Interactive Learning Environments, 8, 87–109.
Kintsch, W., & van Dijk, T.A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Labov, W. (1972). The Transformation of Experience in Narrative Syntax, In W. Labov (ed.), Language in the Inner City, 1972, University of Pennsylvania Press, Philadelphia.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259–284.
Lehman, S., & Schraw, G. (2002). Effects of coherence and relevance on shallow and deep text processing. Journal of Educational Psychology, 94, 738–750.
Linderholm, T., Everson, M.G., van den Broek, P., Mischinski, M., Crittenden, A., & Samuels, J. (2000). Effects of causal text revisions on more and less skilled readers' comprehension of easy and difficult text. Cognition and Instruction, 18, 525–556.
Louwerse, M.M. (2002). Computational retrieval of themes. In M.M. Louwerse & W. van Peer (Eds.), Thematics: Interdisciplinary Studies (pp. 189–212). Amsterdam/Philadelphia: John Benjamins.
Louwerse, M. M., McCarthy, P. M., McNamara, D. S., & Graesser, A. C. (2004). Variation in language and cohesion across written and spoken registers. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 843–848). Mahwah, NJ: Erlbaum.
Loxterman, J.A., Beck, I. L., & McKeown, M.G. (1994). The effects of thinking aloud during reading on students’ comprehension of more or less coherent text. Reading Research Quarterly, 29, 353–367.
Mani, I., & Pustejovsky, J. (2004). Temporal discourse markers for narrative structures. ACL Workshop on Discourse Annotation, Barcelona, Spain. East Stroudsburg, PA: Association for Computational Linguistics.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
McCarthy, P.M., Lightman, E.J., Dufty, D.F., & McNamara, D.S. (in press). Using Coh-Metrix to assess distributions of cohesion and difficulty in high-school textbooks. Proceedings of the 28th Annual Conference of the Cognitive Science Society.
McCarthy, P.M., Lewis, G.A., Dufty, D.F., & McNamara, D.S. (2006). Analyzing Writing Styles with Coh-Metrix. 19th International FLAIRS Conference 2006.
McNamara, D.S., Kintsch, E., Songer, N.B., & Kintsch, W. (1996). Are good texts always better? Text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1–43.
McNamara, D. S. (2001). Reading both high and low coherence texts: Effects of text sequence and prior knowledge. Canadian Journal of Experimental Psychology, 55, 51–62.
Moldovan, D., Harabagiu, S., Pasca, M., Mihalcea, R., Girju, R., Goodrum, R., & Rus, V. (2000): The Structure and Performance of an Open-Domain Question Answering System, in Proceedings of ACL 2000, Hong Kong, October
Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17, 21–48.
Ozuru, Y., Dempsey, K., Sayroo, J., & McNamara, D. S. (2005). Effects of text cohesion on comprehension of biology texts. Proceedings of the 27th Annual Meeting of the Cognitive Science Society (pp. 1696–1701). Hillsdale, NJ: Erlbaum.
Propp, V. (1968). Morphology of the folk tale. Baltimore: Port City Press, pp 19–65.
Ratnaparkhi, A. (1996), A maximum entropy model for part-of-speech tagging. Proceedings of Conference on Empirical Methods in Natural Language Processing, University of Pennsylvania.
Stamatatos, E., Fakotakis, N., & Kokkinakis, G. (2001). Computer-based authorship attribution without lexical measures. Computers and the Humanities, 35, 193–214.
Trabasso, T., & van den Broek, P. (1985). Causal thinking and the representation of narrative events. Journal of Memory and Language, 24, 612–630.
Voorhees, E. M. & Tice, D.M. (2000). Building a question answering test collection. Proceedings of the Twenty-Third Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Wolfe, M. B., Schreiner, M. E., Rehder, B., Laham, D., Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). Learning from text: Matching readers and text by Latent Semantic Analysis. Discourse Processes, 25, 309–336.
Wolfe, M. B. W., & Goldman, S.R. (2003). Use of latent semantic analysis for predicting psychological phenomena: Two issues and proposed solutions. Behavior Research Methods, Instruments, & Computers, 35, 22–31.
Zwaan, R.A. (1996). Processing narrative time shifts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1196–1207.
Zwaan, R.A. & Radvansky, G.A. (1998). Situation models in language comprehension and Memory. Psychological Bulletin, 123, 162–185.

Author information

Authors and Affiliations

Department of Psychology, Institute for Intelligent Systems, University of Memphis, Memphis, TN, 38152, USA
Philip M. McCarthy, Stephen W. Briner, Vasile Rus & Danielle S. McNamara

Editor information

Editors and Affiliations

Bellevue, WA, 98008, USA
Anne Kao BA, MA, MS, PhD & Stephen R. Poteet BA, MA, CPhil

About this chapter

Cite this chapter

McCarthy, P.M., Briner, S.W., Rus, V., McNamara, D.S. (2007). Textual Signatures: Identifying Text-Types Using Latent Semantic Analysis to Measure the Cohesion of Text Structures. In: Kao, A., Poteet, S.R. (eds) Natural Language Processing and Text Mining. Springer, London. https://doi.org/10.1007/978-1-84628-754-1_7

DOI: https://doi.org/10.1007/978-1-84628-754-1_7
Publisher Name: Springer, London
Print ISBN: 978-1-84628-175-4
Online ISBN: 978-1-84628-754-1
eBook Packages: Computer Science, Computer Science (R0)
