Alzheimer’s Disease and Speech Background

The Art and Science of Machine Intelligence

Abstract

Developing a possible diagnostic test for Alzheimer’s disease (AD) based on speech is used throughout this book to illustrate the application of the machine learning methods we describe. This application has many of the characteristics typical of such tasks: one has some idea that a set of features characterizing samples that one possesses might be able to classify the objects into two or more classes. Lacking sufficient depth of understanding of how this might be caused, one goes searching for useful patterns in the data—a fishing expedition.

This chapter provides background material on AD for the reader interested in how it is defined, what is known about its underlying pathology, how it is usually diagnosed in the clinic, and some of its known effects on speech. We then quickly summarize previous attempts at this task, older ones doing linguistic operations largely by hand, and more current attempts using computers. We then provide details of the speech samples we have collected, and how an array of speech features was extracted from them fully automatically, including speech to text with punctuation. Brief comments are included on issues related to experimental design.

Notes

  1. Zoom H1 Handy Recorder, Zoom Corp, 4-4-3 Kanda-surugadai, Chiyoda-ku, Tokyo 101-0062, Japan.

  2. We gratefully acknowledge the generosity of the Roark team in sharing their software early in our research.

  3. DementiaBank 140 plus our own 72.

  4. Demographics (age, gender, race, years of education) plus MMSE, plus approximately 230 speech features (we discarded features with little or no variation among subjects).

Abbreviations

ABeta: Amyloid beta protein fragment
ACADIE: Atlantic Canada Alzheimer’s Disease Investigation of Expectations
AD: Alzheimer’s disease
ADAS-Cog: Alzheimer’s Disease Assessment Scale-Cognitive
APOE4: Apolipoprotein allele variant e4
ASR: Automatic speech recognition
BDAE: Boston Diagnostic Aphasia Examination
BN: Bayesian network
CSF: Cerebrospinal fluid
FTD: Frontotemporal dementia
kDa: Kilodaltons
LOO: Leave-one-out cross-validation
MMSE: Mini-Mental State Exam
MRI: Magnetic resonance imaging
NL: Normal control subjects
PET: Positron emission tomography
POS: Part of speech
SUNY: State University of New York
TDP-43: Transactive response DNA binding protein 43 kDa
TTR: Type-token ratio
US: United States
WAB: Western Aphasia Battery

References

  • Adams DR, Kern DW, Wroblewski KE, McClintock MK, Dale W, Pinto JM (2018) Olfactory dysfunction predicts subsequent dementia in older U.S. adults. J Am Geriatr Soc 66:140–144

  • Allison RS (1962) The senile brain. Edward Arnold, London

  • Alzheimer’s Association (2017) Alzheimer’s disease facts and figures

  • Audacity software (n.d.) download available. http://www.audacityteam.org/

  • Björnsson CH (1968) Läsbarhet. Liber, Stockholm

  • Bologna L, Padua A, Mao M, Chen X (2013) Early detection of Alzheimer’s by analyzing speech pattern anomalies. Senior Capstone Project, Binghamton University Computer and Electrical Engineering Department

  • Brown C, Kemper S, Herman R, Covington M (2008) Automatic measurement of propositional density from part-of-speech tagging. Behav Res Methods 40:540–545

  • Brunet E (1978) Vocabulaire de Jean Giraudoux: Structure et Évolution. Slatkine, Geneva

  • Charniak E (2000) A maximum-entropy-inspired parser. In: Proceedings of the 1st North American chapter of the Association for Computational Linguistics Conference, pp 132–139

  • Coleman M, Liau TL (1975) A computer readability formula designed for machine scoring. J Appl Psychol 60:283–284

  • Dementia Bank (n.d.) [https://talkbank.org/DementiaBank/ last visited 2018-03-26]

  • Dubois B, Padovani A, Scheltens P, Rossi A, Dell’Agnello G (2016) Timely diagnosis for Alzheimer’s disease: a literature review on benefits and challenges. J Alzheimers Dis 49:617–631

  • Flesch R (1948) A new readability yardstick. J Appl Psychol 32(3):221–233

  • Folstein MF, Folstein SE, McHugh PR (1975) “Mini-mental status”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12(3):189–198

  • Forbes-McKay KE, Venneri A (2005) Detecting subtle spontaneous language decline in early Alzheimer’s disease with a picture description task. Neurol Sci 26:243–254

  • Fraser KC, Meltzer JA, Rudzicz F (2016) Linguistic features identify Alzheimer’s disease in narrative speech. J Alzheimers Dis 49(2):407–422

  • Frazier L (1985) Syntactic complexity. Cambridge University Press, Cambridge, pp 129–189

  • Fukui T, Yamazaki T, Kinno R (2011) The “head turning sign” can be a clinical marker of Alzheimer’s disease? Dement Geriatr Cogn Disord Extra 1:310–317

  • Goodglass H, Barresi B, Kaplan E (1983) The Boston diagnostic aphasia examination. Lippincott Williams and Wilkins, Philadelphia

  • Gunning R (1952) The technique of clear writing. McGraw-Hill, New York, pp 36–37

  • Herbert LE, Weuve J, Scherr PA, Evans DA (2013) Alzheimer disease in the United States (2010–2050) estimated using the 2010 census. Neurology 80:1778–1783

  • Hewlett S (2007) Emotion detection from speech. Stanford University, TechRpt. [http://cs229.stanford.edu/proj2007/ShahHewlett%20-%20Emotion%20Detection%20from%20Speech.pdf]

  • Honoré A (1979) Some simple measures of richness of vocabulary. Assoc Lit Linguist Comput Bull 7(2):172–177

  • Hux K, Wallace SE, Evans K, Snell J (2008) Performing cookie theft picture content analyses to delineate cognitive-communication impairments. J Med Speech Lang Pathol 16(2):83–99

  • Hyman BT, Phelps CH, Beach TG, Bigio EH, Cairns NJ, Carrillo MC, Dickson DW, Duyckaerts C, Frosch MP, Masliah E, Mirra SS, Nelson PT, Schneider JA, Thal DR, Thies B, Trojanowski JQ, Vinters HV, Montine TJ (2012) National Institute on Aging-Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease. Alzheimers Dement 8(1):1–13

  • Kertesz A (1982) The western aphasia battery

  • Kertesz A (2004) Language in Alzheimer’s disease. In: Cognitive neuropsychology of Alzheimer’s disease. Oxford University Press, Oxford, pp 197–210

  • Kim H (2013) The Clockme System: computer-assisted screening tool for dementia. PhD thesis, Georgia Institute of Technology

  • Kincaid JP, Fishburne RP, Rogers RL, Chissom BS (1975) Derivation of new readability formulas for Navy enlisted personnel. Research Branch Report 8-75, Chief of Naval Technical Training: Naval Air Station Memphis

  • Konig AS, Sorin A, Hoory R, Toledo-Ronen O, Derreumaux A, Manera V, Verhey F, Aalten P, Robert PH, David R (2015) Automatic speech analysis for the assessment of patients with predementia and Alzheimer’s disease. Alzheimers Dement 1:112–124

  • López-de-Ipiña K, Ecay M, Solé-Casals J, Ezeiza A, Barroso N, Martinez-Lage P, Beitia B (2015) Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: a fractal dimension approach. Comput Speech Lang 30(1):43–60

  • MacWhinney B (2000) The CHILDES Project: tools for analyzing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah

  • Marcus M, Santorini B, Marcinkiewicz M (1993) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19(2):313–330

  • McLaughlin GH (1969) SMOG grading—a new readability formula. J Reading 12(8):639–646

  • Mielke MM, Vemuri P, Rocca WA (2014) Clinical epidemiology of Alzheimer’s disease: assessing sex and gender differences. Clin Epidemiol 6:37–48

  • Mirheidari B, Blackburn D, Reuber M, Walker T, Christensen H (2016) Diagnosing people with dementia using automatic conversation analysis. In: Proceedings of interspeech, pp 1220–1224

  • Nagashima K, Wu J, Takahashi S (2011) Human characteristics of sound localization under masking for the early detection of dementia. In: Early detection and rehabilitation technologies for dementia: neuroscience and biomedical applications. Medical Information Science Reference, Hershey, pp 65–71

  • Nespoulous JL, Lecours AR, Lafond D, Lemay A, Puel M, Joanette Y, et al (1992) Protocole Montréal-Toulouse. Examen de l’aphasie. Version Béta modifiée, Éditions Ortho

  • Orimaye SO, Wong JS, Golden KJ (2014) Learning predictive linguistic features for Alzheimer’s disease and related dementias using verbal utterances. In: Workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, Baltimore

  • Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. University of Texas at Austin, Austin

  • Rentoumi V, Raoufian L, Ahmed S, de Jager CA, Garrard P (2014) Features and machine learning classification of connected speech samples from patients with autopsy proven Alzheimer’s disease with and without additional vascular pathology. J Alzheimers Dis 42:S3–S17

  • Roark B, Mitchell M, Hosom J, Hollingshead K, Kaye J (2011) Spoken language derived measures for detecting mild cognitive impairment. IEEE Trans Audio Speech Lang Process 19(7):2081–2090

  • Rockwood K, Graham J, Fay S (2002) Goal setting and attainment in Alzheimer’s disease patients treated with donepezil. J Neurol Neurosurg Psychiatry 73:500–507

  • Rosen WG, Mohs RC, Davis KL (1984) A new rating scale for Alzheimer’s disease. Am J Psychiatry 141(11):1356–1364

  • Rosenberg S, Abbeduto L (1987) Indicators of linguistic competence in the peer group conversational behavior of mildly retarded adults. Appl Psycholinguist 8:19–32

  • Sadeghian R, David Schaffer J, Zahorian SA (2017) Speech processing approach for diagnosing dementia in an early stage. In: Interspeech conference 2017, Stockholm, Sweden

  • Sakhnov K, Verteletskaya E, Simak B (2009) Dynamical energy-based speech/silence detector for speech enhancement applications. In: Proceedings of the world congress on engineering, vol 1

  • Savica R, Wennberg AM, Hagen C, Edwards K, Roberts RO, Hollman JH, Knopman DS, Boeve BF, Machulda MM, Petersen RC, Mielke MM (2017) Comparison of gait parameters for predicting cognitive decline: the Mayo Clinic study of aging. J Alzheimers Dis 55(2):559–567

  • Semenza C, Cipolotti L (1989) Neuropsicologia con carta e matita. Cleup Editrice Padova, Padova

  • Senter RJ, Smith EA (1967) Automated readability index. Wright-Patterson Air Force Base, AMRL-TR-6620, p iii

  • Sharp EM, Gatz M (2011) The relationship between education and dementia: an updated systematic review. Alzheimer Dis Assoc Disord 25(4):289–304

  • Sichel HS (1975) On a distribution law for word frequencies. J Am Stat Assoc 70:542–547

  • Simpson EH (1949) Measurement of diversity. Nature:163–168

  • Sohn J, Kim NS, Sung W (1999) A statistical model-based voice activity detection. IEEE Signal Process Lett 6(1):1–3

  • Snowdon DA, Kemper SJ, Mortimer JA, Greiner LH, Wekstein DR, Markesbery WR (1996) Linguistic ability in early life and cognitive function and Alzheimer’s disease in late life. JAMA 275(7):528–532

  • Snowdon D (2002) Aging with grace: what the nun study teaches us about leading longer, healthier, and more meaningful lives. Bantam

  • Stelzmann RA, Norman Schnitzlein H, Reed Murtagh F (1995) An English translation of Alzheimer’s 1907 paper, “Über eine eigenartige Erkrankung der Hirnrinde”. Clin Anat 8:429–431. http://info-centre.jenage.de/assets/pdfs/library/stelzmann_et_al_alzheimer_CLIN_ANAT_1995.pdf [last accessed 2018-03-19]

  • Swinburn K, Porter G, Howard D (2004) Comprehensive aphasia test. Psychology Press, Hove

  • Thomas C, Keselj V, Cercone N, Rockwood K, Asp E (2005) Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech. In: Proceedings of the IEEE international conference on mechatronics and automation, pp 1569–1574

  • Turner A, Greene E (1977) The construction and use of a propositional text base. University of Colorado Psychology Department, University of Colorado Boulder. [http://www.colorado.edu/ics/sites/default/files/attached-files/77-63.pdf last visited 2018-01-11]

  • Verghese J, Lipton RB, Hall CB, Kuslansky G, Katz MJ, Buschke H (2002) Abnormality of gait as a predictor of non-Alzheimer’s dementia. N Engl J Med 347(22):1761–1768

  • Wankerl S, Noeth E, Evert A (2017) An N-gram based approach to the automatic diagnosis of Alzheimer’s disease from spoken language. In: Interspeech conference 2017, Stockholm, Sweden

  • Wilson RS, Yu L, Lamar M, Schneider JA, Boyle PA, Bennett DA (2019) Education and cognitive reserve in old age. Neurology 92(10):e1041–e1050

  • Weiner J, Engelbart M, Schultz T (2017) Manual and automatic transcriptions in dementia detection from speech. In: Interspeech conference 2017, Stockholm, Sweden

  • Yngve V (1960) A model and an hypothesis for language structure. Proc Am Philos Soc 104:444–466

  • Yule GU (1944) The statistical study of literary vocabulary. Cambridge University Press, Cambridge

  • Zahorian SA, Hu H (2008) A spectral/temporal method for robust fundamental frequency tracking. J Acoust Soc Am 123(6):4559–4571

Appendix: Features Extracted from Each Voice Sample

Here, we provide some details of the processing of each speech sample. Figure 4.3 gives an overview of the processing chain.

Table 4.2 lists the features extracted from the speech signal, a WAV file with 16-bit samples at a 16 kHz sampling rate. We implemented three approaches to breaking the signal into periods of speaking and intervening pauses. The first detects the pitch of the speaker’s voice using YAAPT (Yet Another Algorithm for Pitch Tracking) (Zahorian and Hu 2008). This produces results as illustrated in Fig. 4.3. The second, based on work by Sakhnov et al. (2009), uses the energy in the signal and sets a threshold to separate speech from pause. The third uses the Voice Activity Detector (VAD) of Sohn et al. (1999). Within each approach, a pause detector treated any unvoiced period longer than 200 ms as a pause and required voiced segments to be longer than 50 ms. In the literature on pauses, one may find thresholds ranging from 100 ms to 1 s. Our thresholds were set after some experimentation comparing manually pause-segmented samples with the algorithmic outputs (Bologna et al. 2013).
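The 200 ms and 50 ms rules can be illustrated with a minimal sketch. It assumes that one of the three detectors has already produced a per-frame voiced/unvoiced mask; the function names, the 10 ms frame length, and the order in which the two rules are applied are our assumptions for illustration, not details taken from the original software.

```python
from typing import List, Tuple


def _merge(runs):
    """Merge adjacent runs that carry the same voiced flag."""
    merged = []
    for flag, n in runs:
        if merged and merged[-1][0] == flag:
            merged[-1][1] += n
        else:
            merged.append([flag, n])
    return merged


def segment_pauses(voiced: List[bool],
                   frame_ms: float = 10.0,       # assumed frame length
                   min_pause_ms: float = 200.0,  # threshold from the text
                   min_voiced_ms: float = 50.0   # threshold from the text
                   ) -> List[Tuple[str, float, float]]:
    """Return (label, start_ms, end_ms) segments labelled 'speech' or 'pause'."""
    # Collapse the frame mask into runs of [is_voiced, n_frames].
    runs = _merge([[bool(v), 1] for v in voiced])
    # Voiced runs that are too short are treated as unvoiced.
    for run in runs:
        if run[0] and run[1] * frame_ms < min_voiced_ms:
            run[0] = False
    runs = _merge(runs)
    # Unvoiced gaps that do not exceed the pause threshold stay part of speech.
    for run in runs:
        if not run[0] and run[1] * frame_ms <= min_pause_ms:
            run[0] = True
    runs = _merge(runs)
    # Convert frame counts to time stamps.
    segments, start = [], 0.0
    for flag, n in runs:
        end = start + n * frame_ms
        segments.append(("speech" if flag else "pause", start, end))
        start = end
    return segments


mask = [True] * 30 + [False] * 10 + [True] * 20 + [False] * 30 \
     + [True] * 3 + [False] * 5 + [True] * 10
segments = segment_pauses(mask)
```

In this toy mask the 100 ms gap is absorbed into the surrounding speech, while the 30 ms voiced blip is discarded and its neighbouring silences merge into a single 380 ms pause.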

We also sought a method for assessing emotion or affect, which may be diminished in AD. Using the data from the pitch-based pause detector, and following the approach of Hewlett (2007), we calculated the mean, median, variance, minimum, and maximum of the pitch in each sample. We hypothesized that a low pitch variance might capture a blunted affect. We also computed a speech rate measure from the pitch-based signal with the pauses removed (i.e., over a shorter utterance length). These are also listed in Table 4.2.
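As a rough illustration of these pitch and rate features, the sketch below computes the five pitch statistics over voiced frames and a speech rate over the pause-free duration. The convention that unvoiced frames carry a pitch of 0 (or None), the use of population variance, the words-per-second unit, and all names are assumptions made for this example.

```python
import statistics
from typing import Dict, Optional, Sequence


def pitch_features(f0_hz: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize pitch over voiced frames (0 or None marks unvoiced: assumed)."""
    voiced = [f for f in f0_hz if f]
    if not voiced:
        raise ValueError("no voiced frames in sample")
    return {
        "pitch_mean": statistics.fmean(voiced),
        "pitch_median": statistics.median(voiced),
        # Population variance is an assumption; sample variance is equally plausible.
        "pitch_variance": statistics.pvariance(voiced),
        "pitch_min": min(voiced),
        "pitch_max": max(voiced),
    }


def speech_rate(n_words: int, total_s: float, pause_s: float) -> float:
    """Words per second of actual speaking time (pauses removed)."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("no speaking time left after removing pauses")
    return n_words / speaking_s


features = pitch_features([0, 100.0, 110.0, 0, 120.0])
rate = speech_rate(n_words=120, total_s=60.0, pause_s=20.0)
```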

Next, we describe the syntactic complexity features, which are computed using software generously shared by Roark et al. (2011). Two methods are used: Frazier (1985) and Yngve (1960). Both begin with a parse tree as produced by the Charniak parser (Charniak 2000). A parse tree indicates the syntactic structure of a sentence or phrase. It is made up of nodes and branches, with each node representing a grammatical category present in the sentence. Branches connect parent nodes to child nodes; child nodes are embedded within the grammatical structure that the parent node indicates. In Frazier scoring, each non-terminal node (any node that is not the end of a word’s grammatical structure) is given a score of 1, except that in some cases, such as sentence nodes, a score of 1.5 is given (Roark et al. 2011). Each word’s score is obtained by summing the scores encountered while tracing upward from the word to the top of the tree (Fig. 4.6). In Yngve scoring, the right-most branch under each node of the parse tree is given a score of 0, and each consecutive branch to its left has 1 added to its score (Fig. 4.6). The score for a word is again obtained by summing the scores while moving up the tree from that word. Note that this example has no punctuation. The parser adds more nodes for punctuation, so the Frazier and Yngve scores will change; our speech-to-text software inserts punctuation.

Fig. 4.6 Example of a parse tree showing Frazier (dashed lines) and Yngve scoring (numbers on each branch)
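The Yngve rule described above can be sketched on a small hand-built tree. This is not the Roark et al. code; the nested-tuple tree format, the example sentence, and the choice of summary statistics are assumptions for illustration. Frazier scoring is omitted because its rules involve more special cases than the text spells out.

```python
def yngve_scores(tree, inherited=0):
    """Return [(word, score)] for a tree given as (label, child, ...) tuples.

    A pre-terminal is (POS_tag, word). Under each node the right-most branch
    scores 0 and each branch to its left scores one more; a word's score is
    the sum of branch scores on the path from the root down to that word.
    """
    _label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], inherited)]
    scored = []
    n = len(children)
    for j, child in enumerate(children):
        scored.extend(yngve_scores(child, inherited + (n - 1 - j)))
    return scored


# Hypothetical parse of "the boy takes a cookie" (no punctuation nodes).
parse = ("S",
         ("NP", ("DT", "the"), ("NN", "boy")),
         ("VP", ("VBZ", "takes"),
                ("NP", ("DT", "a"), ("NN", "cookie"))))

scores = yngve_scores(parse)
total = sum(s for _, s in scores)
mean = total / len(scores)
maximum = max(s for _, s in scores)
```

Per-sentence totals, means, and maxima such as these are natural candidates for features, although which summaries appear in Table 4.3 is not stated here.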

Another complexity measure, Developmental Level (D-Level), was proposed by Rosenberg and Abbeduto (1987). D-Level is a scale of speech complexity ranging from 0 to 7 points. A score of 0 is given to a simple sentence with one clause. A score of 7 is given to a “complex” sentence that contains at least two of the pre-defined grammatical structures involved in D-Level scoring. We used code graciously provided by Roark et al. (2011) and summed the D-Level scores for all sentences in the sample (Table 4.3).

Table 4.3 Thirteen features computed by syntactic complexity code from parse tree

Another set of features derives from the findings of the Nun Study and relates to idea density (Snowdon et al. 1996). Brown et al. (2008) implemented a set of rules for measuring it in the Computerized Propositional Idea Density Rater (CPIDR) program. Roark et al. (2011) developed their own extension of this rule set, called p_density. Starting with these rules, we conducted extensive experiments using the examples provided by Turner and Greene (1977), making additional changes to improve performance (idea_density). Another metric is content density, the ratio of open-class words to closed-class words. Open-class words are nouns, verbs, adjectives, adverbs, and symbols; all other parts of speech are considered closed-class words. These metrics are shown in Table 4.4, which also includes the word count used for computing the speech_rate metrics in Table 4.2.

Table 4.4 Five features computed by idea density and related code from parse tree
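Content density, as defined above, is simple enough to sketch directly. The example assumes the transcript has already been tagged with Penn Treebank part-of-speech tags and that punctuation tokens have been dropped; the tag prefixes used to identify open-class words and the function names are our assumptions.

```python
from typing import List, Tuple

# Penn Treebank tag prefixes for nouns, verbs, adjectives, and adverbs.
OPEN_CLASS_PREFIXES = ("NN", "VB", "JJ", "RB")


def is_open_class(tag: str) -> bool:
    """Nouns, verbs, adjectives, adverbs, and symbols count as open class."""
    return tag.startswith(OPEN_CLASS_PREFIXES) or tag == "SYM"


def content_density(tagged: List[Tuple[str, str]]) -> float:
    """Ratio of open-class to closed-class words in (word, tag) pairs."""
    n_open = sum(1 for _, tag in tagged if is_open_class(tag))
    n_closed = len(tagged) - n_open
    if n_closed == 0:
        raise ValueError("no closed-class words; ratio is undefined")
    return n_open / n_closed


tagged = [("the", "DT"), ("boy", "NN"), ("is", "VBZ"), ("taking", "VBG"),
          ("a", "DT"), ("cookie", "NN"), ("from", "IN"), ("the", "DT"),
          ("jar", "NN")]
density = content_density(tagged)
```

Here five words are open class and four are closed class. Note that under this rule auxiliary verbs such as "is" count as open class because they carry a verb tag.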

Various measures have been proposed to assess the richness of a vocabulary. They apply different formulae to some simple word counts: the number of unique words used in the utterance (V), the total number of words (N), the number of words used exactly i times (V(i,N)), and the frequency of the most prevalent word (Table 4.5).

Table 4.5 Nine features assessing vocabulary richness
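To make these counts concrete, the sketch below computes several widely used richness measures from them: the type-token ratio, Brunet's W, Honoré's R, Sichel's S, and Yule's K, in their commonly cited forms. We do not claim that these are exactly the nine features of Table 4.5 or that the constants match the authors' code; lower-casing of tokens and the use of the natural logarithm in Honoré's R are also assumptions.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def vocabulary_richness(words: Sequence[str]) -> Dict[str, float]:
    """Common vocabulary-richness measures from simple word counts."""
    counts = Counter(w.lower() for w in words)
    n = sum(counts.values())             # N: total number of words
    if n == 0:
        raise ValueError("empty transcript")
    v = len(counts)                      # V: number of unique words
    spectrum = Counter(counts.values())  # V(i, N): words used exactly i times
    v1, v2 = spectrum.get(1, 0), spectrum.get(2, 0)
    second_moment = sum(i * i * vi for i, vi in spectrum.items())
    return {
        "ttr": v / n,
        "brunet_w": n ** (v ** -0.165),
        # Honoré's R is undefined when every word occurs once (V1 == V).
        "honore_r": 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf,
        "sichel_s": v2 / v,
        "yule_k": 1e4 * (second_moment - n) / (n * n),
        "max_word_freq": max(counts.values()),
    }


richness = vocabulary_richness("the boy and the girl and the dog".split())
```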

The Unix operating system long ago provided a suite of software tools to help users assess various elements of writing style when preparing documents. These are listed in Table 4.6. The metrics from the Linguistic Inquiry and Word Count system (LIWC2015; Pennebaker et al. 2015) are also computed, but not listed here. The interested reader may consult the documentation for this system.

Table 4.6 Fourteen features computed by the Unix Style Code from transcripts
Table 4.7 Fraction of words in transcript by word type from the Charniak parser (Charniak 2000) and the Penn Treebank (Marcus et al. 1993): 40 features
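Several of the writing-style metrics cited in the references (Senter and Smith 1967; Coleman and Liau 1975; Björnsson 1968) are readability indices built from character, word, and sentence counts. The sketch below shows the commonly cited forms of three of them. It is only an approximation of what the Unix style tools report: the tokenization rules here are simplified assumptions, and the exact constants used by that software may differ.

```python
import re
from typing import Dict


def readability(text: str) -> Dict[str, float]:
    """Automated Readability Index, Coleman-Liau index, and Lix for a text."""
    # Naive splitting: a sentence ends at '.', '!' or '?' (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text has no sentences or no words")
    n_sent, n_words = len(sentences), len(words)
    lengths = [len(w.replace("'", "")) for w in words]
    letters = sum(lengths)
    long_words = sum(1 for length in lengths if length > 6)
    return {
        "ari": 4.71 * letters / n_words + 0.5 * n_words / n_sent - 21.43,
        "coleman_liau": (0.0588 * (100 * letters / n_words)
                         - 0.296 * (100 * n_sent / n_words) - 15.8),
        "lix": n_words / n_sent + 100 * long_words / n_words,
    }


indices = readability("The boy takes a cookie. His mother is washing dishes.")
```

Syllable-based indices (Flesch, Kincaid, Fog, SMOG) are left out because they need a syllable counter, which would be a further assumption.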

Finally, we developed a simple word counting program that counts words and their synonyms in a transcript, as listed in a control input file. We used this program to count words appropriate to a picture’s content and also words deemed to convey positive, neutral, and negative emotions. Such an approach has been taken previously by Hux et al. (2008) for the cookie theft picture. We assembled these control files by hand from words found in our transcripts, so additional effort will probably be needed to augment them as new voice samples are acquired. One may hope that, once sufficient samples are in hand, this need will diminish, eventually to zero.

For each category of word counts (picture content, positive, neutral, and negative emotion words), the program has a number of key concepts (n_keys). It counts the number of key concepts found in the transcript (n_keys_matched, with each key counted only as present or not) and reports n_keys_matched/n_keys. The keys are listed in Tables 4.8, 4.9, 4.10, 4.11, and 4.12.

Table 4.8 Key concepts for cookie theft picture
Table 4.9 Key concepts for lake picture
Table 4.10 Key concepts for positive emotion words
Table 4.11 Key concepts for neutral emotion words
Table 4.12 Key concepts for negative emotion words
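The key-concept counting described above can be sketched as follows. The key concepts and synonyms in the example are hypothetical stand-ins, not the entries of Tables 4.8 to 4.12, and the control-file format of the original program is not reproduced; a plain dictionary is assumed instead. Only single-word synonyms are handled.

```python
import re
from typing import Dict, List, Tuple


def key_concept_fraction(transcript: str,
                         key_concepts: Dict[str, List[str]]
                         ) -> Tuple[int, int, float]:
    """Return (n_keys_matched, n_keys, n_keys_matched / n_keys).

    A key counts as matched, once, if the key itself or any of its
    synonyms occurs anywhere in the transcript.
    """
    if not key_concepts:
        raise ValueError("no key concepts supplied")
    tokens = set(re.findall(r"[a-z']+", transcript.lower()))
    n_keys = len(key_concepts)
    n_keys_matched = sum(
        1 for key, synonyms in key_concepts.items()
        if tokens & ({key} | set(synonyms))
    )
    return n_keys_matched, n_keys, n_keys_matched / n_keys


# Hypothetical control entries for a picture-description task.
keys = {
    "cookie": ["cookies", "biscuit"],
    "mother": ["mom", "woman", "lady"],
    "stool": ["chair"],
    "sink": ["basin"],
}
text = ("The boy is taking cookies and the lady is drying dishes "
        "at the sink. The sink is overflowing.")
matched, n_keys, fraction = key_concept_fraction(text, keys)
```

Because each key is scored as present or absent, the repeated mention of the sink counts only once.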

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Land, W.H., Schaffer, J.D. (2020). Alzheimer’s Disease and Speech Background. In: The Art and Science of Machine Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-18496-4_4

  • DOI: https://doi.org/10.1007/978-3-030-18496-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18495-7

  • Online ISBN: 978-3-030-18496-4

  • eBook Packages: Engineering, Engineering (R0)
