
Textual Signatures: Identifying Text-Types Using Latent Semantic Analysis to Measure the Cohesion of Text Structures


Abstract

Just as a sentence is far more than a mere concatenation of words, a text is far more than a mere concatenation of sentences. Texts contain pertinent information that co-refers across sentences and paragraphs [30]; texts contain relations between phrases, clauses, and sentences that are often causally linked [21], [51], [56]; and texts that depend on relating a series of chronological events contain temporal features that help the reader to build a coherent representation of the text [19], [55]. We refer to textual features such as these as cohesive elements. They occur within paragraphs (locally) and across paragraphs (globally), and they take referential, causal, temporal, and structural forms, among others [18], [22], [36]. But cohesive elements, and by consequence cohesion, do not simply feature in a text as dialogues tend to feature in narratives, or as cartoons tend to feature in newspapers. That is, cohesion is not present or absent in a binary or optional sense. Instead, cohesion in text exists on a continuum of presence, which is sometimes indicative of the text-type in question [12], [37], [41] and sometimes indicative of the audience for which the text was written [44], [47]. In this chapter, we discuss the nature and importance of cohesion; we demonstrate a computational tool that measures cohesion; and, most importantly, we demonstrate a novel approach to identifying text-types by incorporating contrasting rates of cohesion.
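The abstract mentions a computational tool that uses Latent Semantic Analysis (LSA) to measure cohesion but gives no implementation details here. As an illustration of the general idea only, the sketch below follows the adjacent-unit approach of Foltz, Kintsch, and Landauer [16]: represent each sentence (for local cohesion) or each paragraph (for global cohesion) as an LSA vector and average the cosines between neighbouring units. It is not the chapter's tool or Coh-Metrix. The function names, the log weighting, the default `k = 2`, and building the SVD from the input text itself rather than from a large training corpus are all assumptions made to keep the example self-contained. It requires NumPy.

```python
import re

import numpy as np


def tokenize(text):
    """Lowercase a text unit and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lsa_vectors(units, k=2):
    """Return one k-dimensional LSA vector per text unit (sentence or paragraph).

    Assumption for illustration: the term-by-unit matrix is built from the
    units themselves. A real LSA space is trained on a large reference corpus
    and new units are folded into it.
    """
    vocab = sorted({w for u in units for w in tokenize(u)})
    row = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(units)))
    for j, unit in enumerate(units):
        for w in tokenize(unit):
            counts[row[w], j] += 1
    weighted = np.log1p(counts)  # simple log weighting (hypothetical choice)
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(k, len(s))
    # Unit j is represented by column j of diag(s_k) @ Vt_k; transpose to rows.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))


def adjacent_cohesion(units, k=2):
    """Mean LSA cosine between each pair of adjacent text units."""
    if len(units) < 2:
        raise ValueError("need at least two text units")
    vecs = lsa_vectors(units, k)
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return sum(sims) / len(sims)
```

Passing the sentences of one paragraph gives a rough local score; passing a list of whole paragraphs gives a rough global score. When `k` is at least the number of units, the truncated SVD discards nothing and the scores reduce to plain cosines of the weighted term vectors, so the dimensionality reduction only matters for longer texts with `k` well below the unit count.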

References

  1. Best, R.M., Floyd, R.G., & McNamara, D.S. (2004). Understanding the fourth-grade slump: Comprehension difficulties as a function of reader aptitudes and text genre. Paper presented at the 85th Annual Meeting of the American Educational Research Association.

  2. Biber, D. (1987). A textual comparison of British and American writing. American Speech, 62, 99–119.

  3. Biber, D. (1988). Linguistic features: Algorithms and functions in variation across speech and writing. Cambridge: Cambridge University Press.

  4. Brill, E. (1995). Unsupervised learning of disambiguation rules for part-of-speech tagging. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, MA.

  5. Britton, B.K., & Gulgoz, S. (1991). Using Kintsch's computational model to improve instructional text: Effects of inference calls on recall and cognitive structures. Journal of Educational Psychology, 83, 329–345.

  6. Burrows, J. (1987). Word-patterns and story-shapes: The statistical analysis of narrative style. Literary and Linguistic Computing, 2, 61–70.

  7. Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence. Menlo Park: AAAI/MIT Press.

  8. Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the North American Chapter of the Association for Computational Linguistics, Seattle, WA.

  9. Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp. 173–180). Ann Arbor, MI.

  10. Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the ACL, Santa Cruz, CA.

  11. Collins, M. (1997). Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, Madrid, Spain.

  12. Crossley, S., Louwerse, M.M., McCarthy, P.M., & McNamara, D.S. (forthcoming 2007). A linguistic analysis of simplified and authentic texts. Modern Language Journal, 91(2).

  13. Dennis, S., Landauer, T., Kintsch, W., & Quesada, J. (2003). Introduction to Latent Semantic Analysis. Slides from the tutorial given at the 25th Annual Meeting of the Cognitive Science Society, Boston.

  14. Duran, N., McCarthy, P.M., Graesser, A.C., & McNamara, D.S. (2006). An empirical study of temporal indices. In Proceedings of the 28th Annual Conference of the Cognitive Science Society.

  15. Foltz, P.W., Britt, M.A., & Perfetti, C.A. (1996). Reasoning from multiple texts: An automatic analysis of readers' situation models. In G.W. Cottrell (Ed.), Proceedings of the 18th Annual Cognitive Science Conference (pp. 110–115). NJ: Lawrence Erlbaum.

  16. Foltz, P.W., Kintsch, W., & Landauer, T.K. (1998). The measurement of textual coherence with Latent Semantic Analysis. Discourse Processes, 25, 285–307.

  17. Foltz, P.W., Gilliam, S., & Kendall, S. (2000). Supporting content-based feedback in on-line writing evaluation with LSA. Interactive Learning Environments, 8, 111–127.

  18. Gernsbacher, M.A. (1990). Language comprehension as structure building. Hillsdale, NJ: Erlbaum.

  19. Givón, T. (1995). Coherence in the text and coherence in the mind. In M.A. Gernsbacher & T. Givón (Eds.), Coherence in spontaneous text (pp. 59–115). Amsterdam/Philadelphia: John Benjamins.

  20. Graesser, A.C. (1993). Inference generation during text comprehension. Discourse Processes, 16, 1–2.

  21. Graesser, A.C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101, 371–395.

  22. Graesser, A.C., McNamara, D., Louwerse, M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193–202.

  23. Hearst, M.A. (1994). Multi-paragraph segmentation of expository text. In Proceedings of the Association for Computational Linguistics, Las Cruces, NM.

  24. Hobbs, J.R. (1985). On the coherence and structure of discourse. CSLI Technical Report 85–37. Stanford, CA.

  25. Hovy, E. (1990). Parsimonious and profligate approaches to the question of discourse structure relations. In Proceedings of the Fifth International Workshop on Natural Language Generation, East Stroudsburg, PA: Association for Computational Linguistics.

  26. Karlgren, J., & Cutting, D. (1994). Recognizing text genres with simple metrics using discriminant analysis. In Proceedings of the 15th International Conference on Computational Linguistics, Volume 2 (pp. 1071–1075). Kyoto, Japan.

  27. Kessler, B., Nunberg, G., & Schütze, H. (1997). Automatic detection of text genre. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics (pp. 32–38). Madrid, Spain.

  28. Kintsch, W., & Bowles, A. (2002). Metaphor comprehension: What makes a metaphor difficult to understand? Metaphor and Symbol, 17, 249–262.

  29. Kintsch, E., Steinhart, D., Stahl, G., LSA Research Group, Matthews, C., & Lamb, R. (2000). Developing summarization skills through the use of LSA-based feedback. Interactive Learning Environments, 8, 87–109.

  30. Kintsch, W., & van Dijk, T.A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.

  31. Labov, W. (1972). The transformation of experience in narrative syntax. In W. Labov (Ed.), Language in the inner city. Philadelphia: University of Pennsylvania Press.

  32. Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.

  33. Landauer, T.K., Foltz, P.W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259–284.

  34. Lehman, S., & Schraw, G. (2002). Effects of coherence and relevance on shallow and deep text processing. Journal of Educational Psychology, 94, 738–750.

  35. Linderholm, T., Everson, M.G., van den Broek, P., Mischinski, M., Crittenden, A., & Samuels, J. (2000). Effects of causal text revisions on more and less skilled readers' comprehension of easy and difficult text. Cognition and Instruction, 18, 525–556.

  36. Louwerse, M.M. (2002). Computational retrieval of themes. In M.M. Louwerse & W. van Peer (Eds.), Thematics: Interdisciplinary studies (pp. 189–212). Amsterdam/Philadelphia: John Benjamins.

  37. Louwerse, M.M., McCarthy, P.M., McNamara, D.S., & Graesser, A.C. (2004). Variation in language and cohesion across written and spoken registers. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 843–848). Mahwah, NJ: Erlbaum.

  38. Loxterman, J.A., Beck, I.L., & McKeown, M.G. (1994). The effects of thinking aloud during reading on students’ comprehension of more or less coherent text. Reading Research Quarterly, 29, 353–367.

  39. Mani, I., & Pustejovsky, J. (2004). Temporal discourse markers for narrative structures. ACL Workshop on Discourse Annotation, Barcelona, Spain. East Stroudsburg, PA: Association for Computational Linguistics.

  40. Mann, W.C., & Thompson, S.A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243–281.

  41. McCarthy, P.M., Lightman, E.J., Dufty, D.F., & McNamara, D.S. (in press). Using Coh-Metrix to assess distributions of cohesion and difficulty in high-school textbooks. In Proceedings of the 28th Annual Conference of the Cognitive Science Society.

  42. McCarthy, P.M., Lewis, G.A., Dufty, D.F., & McNamara, D.S. (2006). Analyzing writing styles with Coh-Metrix. In Proceedings of the 19th International FLAIRS Conference.

  43. McNamara, D.S., Kintsch, E., Songer, N.B., & Kintsch, W. (1996). Are good texts always better? Text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1–43.

  44. McNamara, D.S. (2001). Reading both high and low coherence texts: Effects of text sequence and prior knowledge. Canadian Journal of Experimental Psychology, 55, 51–62.

  45. Moldovan, D., Harabagiu, S., Pasca, M., Mihalcea, R., Girju, R., Goodrum, R., & Rus, V. (2000). The structure and performance of an open-domain question answering system. In Proceedings of ACL 2000, Hong Kong, October.

  46. Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17, 21–48.

  47. Ozuru, Y., Dempsey, K., Sayroo, J., & McNamara, D.S. (2005). Effects of text cohesion on comprehension of biology texts. In Proceedings of the 27th Annual Meeting of the Cognitive Science Society (pp. 1696–1701). Hillsdale, NJ: Erlbaum.

  48. Propp, V. (1968). Morphology of the folk tale. Baltimore: Port City Press, pp. 19–65.

  49. Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, University of Pennsylvania.

  50. Stamatatos, E., Fakotakis, N., & Kokkinakis, G. (2001). Computer-based authorship attribution without lexical measures. Computers and the Humanities, 35, 193–214.

  51. Trabasso, T., & van den Broek, P. (1985). Causal thinking and the representation of narrative events. Journal of Memory and Language, 24, 612–630.

  52. Voorhees, E.M., & Tice, D.M. (2000). Building a question answering test collection. In Proceedings of the Twenty-Third Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

  53. Wolfe, M.B., Schreiner, M.E., Rehder, B., Laham, D., Foltz, P.W., Kintsch, W., & Landauer, T.K. (1998). Learning from text: Matching readers and text by Latent Semantic Analysis. Discourse Processes, 25, 309–336.

  54. Wolfe, M.B.W., & Goldman, S.R. (2003). Use of latent semantic analysis for predicting psychological phenomena: Two issues and proposed solutions. Behavior Research Methods, Instruments, & Computers, 35, 22–31.

  55. Zwaan, R.A. (1996). Processing narrative time shifts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1196–1207.

  56. Zwaan, R.A., & Radvansky, G.A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123, 162–185.

Copyright information

© 2007 Springer-Verlag London Limited

About this chapter

Cite this chapter

McCarthy, P.M., Briner, S.W., Rus, V., McNamara, D.S. (2007). Textual Signatures: Identifying Text-Types Using Latent Semantic Analysis to Measure the Cohesion of Text Structures. In: Kao, A., Poteet, S.R. (eds) Natural Language Processing and Text Mining. Springer, London. https://doi.org/10.1007/978-1-84628-754-1_7

  • DOI: https://doi.org/10.1007/978-1-84628-754-1_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-175-4

  • Online ISBN: 978-1-84628-754-1

  • eBook Packages: Computer Science, Computer Science (R0)
