Alzheimer’s Disease and Speech Background

Land, Walker H.; Schaffer, J. David

doi:10.1007/978-3-030-18496-4_4


Abstract

Developing a possible diagnostic test for Alzheimer’s disease (AD) based on speech is used throughout this book to illustrate the application of the machine learning methods we describe. This application has many of the characteristics typical of such tasks: one has some idea that a set of features characterizing samples that one possesses might be able to classify the objects into two or more classes. Lacking sufficient depth of understanding of how this might be caused, one goes searching for useful patterns in the data—a fishing expedition.

This chapter provides background material on AD for the reader interested in how it is defined, what is known about its underlying pathology, how it is usually diagnosed in the clinic, and some of its known effects on speech. We then briefly summarize previous attempts at this task: older ones that performed linguistic analyses largely by hand, and more recent ones that use computers. We then provide details of the speech samples we have collected and describe how an array of speech features was extracted from them fully automatically, including speech-to-text transcription with punctuation. Brief comments are included on issues related to experimental design.


Notes

1. Zoom H1 Handy Recorder, Zoom Corp., 4-4-3 Kanda-surugadai, Chiyoda-ku, Tokyo 101-0062, Japan.
2. We gratefully acknowledge the generosity of the Roark team in sharing their software early in our research.
3. DementiaBank 140 plus our own 72.
4. Demographics (age, gender, race, years of education) plus MMSE, plus approximately 230 speech features (we discarded features with little or no variation among subjects).

Abbreviations

ABeta: Amyloid beta protein fragment
ACADIE: Atlantic Canada Alzheimer’s Disease Investigation of Expectations
AD: Alzheimer’s disease
ADAS-Cog: Alzheimer’s Disease Assessment Scale-Cognitive
APOE4: Apolipoprotein allele variant e4
ASR: Automatic speech recognition
BDAE: Boston Diagnostic Aphasia Examination
BN: Bayesian network
CSF: Cerebrospinal fluid
FTD: Frontotemporal dementia
kDa: Kilodaltons
LOO: Leave-one-out cross-validation
MMSE: Mini Mental State Exam
MRI: Magnetic resonance imaging
NL: Normal control subjects
PET: Positron emission tomography
POS: Part of speech
SUNY: State University of New York
TDP-43: Transactive response DNA binding protein 43 kDa
TTR: Type token ratio
US: United States
WAB: Western Aphasia Battery

References

Adams DR, Kern DW, Wroblewski KE, McClintock MK, Dale W, Pinto JM (2018) Olfactory dysfunction predicts subsequent dementia in older U.S. adults. J Am Geriatr Soc 66:140–144
Allison RS (1962) The senile brain. Edward Arnold, London
Alzheimer’s Association (2017) Alzheimer’s disease facts and figures
Audacity software (n.d.) download available. http://www.audacityteam.org/
Björnsson CH (1968) Läsbarhet. Liber, Stockholm
Bologna L, Padua A, Mao M, Chen X (2013) Early detection of Alzheimer’s by analyzing speech pattern anomalies. Senior Capstone Project, Binghamton University Computer and Electrical Engineering Department
Brown C, Kemper S, Herman R, Covington M (2008) Automatic measurement of propositional density from part-of-speech tagging. Behav Res Methods 40:540–545
Brunet E (1978) Vocabulaire de Jean Giraudoux: Structure et Évolution. Slatkine, Geneva
Charniak E (2000) A maximum-entropy-inspired parser. In: Proceedings of the 1st North American chapter of the Association for Computational Linguistics Conference, pp 132–139
Coleman M, Liau TL (1975) A computer readability formula designed for machine scoring. J Appl Psychol 60:283–284
Dementia Bank (n.d.) [https://talkbank.org/DementiaBank/ last visited 2018-03-26]
Dubois B, Padovani A, Scheltens P, Rossi A, Dell’Agnello G (2016) Timely diagnosis for Alzheimer’s disease: a literature review on benefits and challenges. J Alzheimers Dis 49:617–631
Flesch R (1948) A new readability yardstick. J Appl Psychol 32(3):221–233
Folstein MF, Folstein SE, McHugh PR (1975) “Mini-mental status”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12(3):189–198
Forbes-McKay KE, Venneri A (2005) Detecting subtle spontaneous language decline in early Alzheimer’s disease with a picture description task. Neurol Sci 26:243–254
Fraser KC, Meltzer JA, Rudzicz F (2016) Linguistic features identify Alzheimer’s disease in narrative speech. J Alzheimers Dis 49(2):407–422
Frazier L (1985) Syntactic complexity. Cambridge University Press, Cambridge, pp 129–189
Fukui T, Yamazaki T, Kinno R (2011) The “head turning sign” can be a clinical marker of Alzheimer’s disease? Dement Geriatr Cogn Disord Extra 1:310–317
Goodglass H, Barresi B, Kaplan E (1983) The Boston diagnostic aphasia examination. Lippincott Williams and Wilkins, Philadelphia
Gunning R (1952) The technique of clear writing. McGraw-Hill, New York, pp 36–37
Herbert LE, Weuve J, Scherr PA, Evans DA (2013) Alzheimer disease in the United States (2010–2050) estimated using the 2010 census. Neurology 80:1778–1783
Hewlett S (2007) Emotion detection from speech. Stanford University, TechRpt. [http://cs229.stanford.edu/proj2007/ShahHewlett%20-%20Emotion%20Detection%20from%20Speech.pdf]
Honoré A (1979) Some simple measures of richness of vocabulary. Assoc Lit Linguist Comput Bull 7(2):172–177
Hux K, Wallace SE, Evans K, Snell J (2008) Performing cookie theft picture content analyses to delineate cognitive-communication impairments. J Med Speech Lang Pathol 16(2):83–99
Hyman BT, Phelps CH, Beach TG, Bigio EH, Cairns NJ, Carrillo MC, Dickson DW, Duyckaerts C, Frosch MP, Masliah E, Mirra SS, Nelson PT, Schneider JA, Thal DR, Thies B, Trojanowski JQ, Vinters HV, Montine TJ (2012) National Institute on Aging-Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease. Alzheimers Dement 8(1):1–13
Kertesz A (1982) The western aphasia battery
Kertesz A (2004) Language in Alzheimer’s disease. In: Cognitive neuropsychology of Alzheimer’s disease. Oxford University Press, Oxford, pp 197–210
Kim H (2013) The Clockme System: computer-assisted screening tool for dementia. PhD thesis, Georgia Institute of Technology
Kincaid JP, Fishburne RP, Rogers RL, Chissom BS (1975) Derivation of new readability formulas for Navy enlisted personnel. Research Branch Report 8-75, Chief of Naval Technical Training: Naval Air Station Memphis
Konig AS, Sorin A, Hoory R, Toledo-Ronen O, Derreumaux A, Manera V, Verhey F, Aalten P, Robert PH, David R (2015) Automatic speech analysis for the assessment of patients with predementia and Alzheimer’s disease. Alzheimers Dement 1:112–124
López-de-Ipiña K, Ecay M, Solé-Casals J, Ezeiza A, Barroso N, Martinez-Lage P, Beitia B (2015) Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: a fractal dimension approach. Comput Speech Lang 30(1):43–60
MacWhinney B (2000) The CHILDES Project: tools for analyzing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah
Marcus M, Santorini B, Marcinkiewicz M (1993) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19(2):313–330
McLaughlin GH (1969) SMOG grading—a new readability formula. J Reading 12(8):639–646
Mielke MM, Vemuri P, Rocca WA (2014) Clinical epidemiology of Alzheimer’s disease: assessing sex and gender differences. Clin Epidemiol 6:37–48
Mirheidari B, Blackburn D, Reuber M, Walker T, Christensen H (2016) Diagnosing people with dementia using automatic conversation analysis. In: Proceedings of interspeech, pp 1220–1224
Nagashima K, Wu J, Takahashi S (2011) Human characteristics of sound localization under masking for the early detection of dementia. In: Early detection and rehabilitation technologies for dementia: neuroscience and biomedical applications. Medical Information Science Reference, Hershey, pp 65–71
Nespoulous JL, Lecours AR, Lafond D, Lemay A, Puel M, Joanette Y, et al (1992) Protocole Montréal-Toulouse. Examen de l’aphasie. Version Béta modifiée, Éditions Ortho
Orimaye SO, Wong JS, Golden KJ (2014) Learning predictive linguistic features for Alzheimer’s disease and related dementias using verbal utterances. In: Workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, Baltimore
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. University of Texas at Austin, Austin
Rentoumi V, Raoufian L, Ahmed S, de Jager CA, Garrard P (2014) Features and machine learning classification of connected speech samples from patients with autopsy proven Alzheimer’s disease with and without additional vascular pathology. J Alzheimers Dis 42:S3–S17
Roark B, Mitchell M, Hosom J, Hollingshead K, Kaye J (2011) Spoken language derived measures for detecting mild cognitive impairment. IEEE Trans Audio Speech Lang Process 19(7):2081–2090
Rockwood K, Graham J, Fay S (2002) Goal setting and attainment in Alzheimer’s disease patients treated with donepezil. J Neurol Neurosurg Psychiatry 73:500–507
Rosen WG, Mohs RC, Davis KL (1984) A new rating scale for Alzheimer’s disease. Am J Psychiatry 141(11):1356–1364
Rosenberg S, Abbeduto L (1987) Indicators of linguistic competence in the peer group conversational behavior of mildly retarded adults. Appl Psycholinguist 8:19–32
Sadeghian R, Schaffer JD, Zahorian SA (2017) Speech processing approach for diagnosing dementia in an early stage. In: Interspeech conference 2017, Stockholm, Sweden
Sakhnov K, Verteletskaya E, Simak B (2009) Dynamical energy-based speech/silence detector for speech enhancement applications. In: Proceedings of the world congress on engineering, vol 1
Savica R, Wennberg AM, Hagen C, Edwards K, Roberts RO, Hollman JH, Knopman DS, Boeve BF, Machulda MM, Petersen RC, Mielke MM (2017) Comparison of gait parameters for predicting cognitive decline: the Mayo Clinic study of aging. J Alzheimers Dis 55(2):559–567
Semenza C, Cipolotti L (1989) Neuropsicologia con carta e matita. Cleup Editrice Padova, Padova
Senter RJ, Smith EA (1967) Automated readability index. Wright-Patterson Air Force Base, AMRL-TR-6620, p iii
Sharp EM, Gatz M (2011) The relationship between education and dementia: an updated systematic review. Alzheimer Dis Assoc Disord 25(4):289–304
Sichel HS (1975) On a distribution law for word frequencies. J Am Stat Assoc 70:542–547
Simpson EH (1949) Measurement of diversity. Nature 163:688
Sohn J, Kim NS, Sung W (1999) A statistical model-based voice activity detection. IEEE Signal Process Lett 6(1):1–3
Snowdon DA, Kemper SJ, Mortimer JA, Greiner LH, Wekstein DR, Markesbery WR (1996) Linguistic ability in early life and cognitive function and Alzheimer’s disease in late life. JAMA 275(7):528–532
Snowdon D (2002) Aging with grace: what the Nun Study teaches us about leading longer, healthier, and more meaningful lives. Bantam
Stelzmann RA, Schnitzlein HN, Murtagh FR (1995) An English translation of Alzheimer’s 1907 paper, “Über eine eigenartige Erkrankung der Hirnrinde”. Clin Anat 8:429–431. http://info-centre.jenage.de/assets/pdfs/library/stelzmann_et_al_alzheimer_CLIN_ANAT_1995.pdf [last accessed 2018-03-19]
Swinburn K, Porter G, Howard D (2004) Comprehensive aphasia test. Psychology Press, Hove
Thomas C, Keselj V, Cercone N, Rockwood K, Asp E (2005) Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech. In: Proceedings of the IEEE international conference on mechatronics and automation, pp 1569–1574
Turner A, Greene E (1977) The construction and use of a propositional text base. University of Colorado Psychology Department, University of Colorado Boulder. [http://www.colorado.edu/ics/sites/default/files/attached-files/77-63.pdf last visited 2018-01-11]
Verghese J, Lipton RB, Hall CB, Kuslansky G, Katz MJ, Buschke H (2002) Abnormality of gait as a predictor of non-Alzheimer’s dementia. N Engl J Med 347(22):1761–1768
Wankerl S, Noeth E, Evert A (2017) An N-gram based approach to the automatic diagnosis of Alzheimer’s disease from spoken language. In: Interspeech conference 2017, Stockholm, Sweden
Wilson RS, Yu L, Lamar M, Schneider JA, Boyle PA, Bennett DA (2019) Education and cognitive reserve in old age. Neurology 92(10):e1041–e1050
Weiner J, Engelbart M, Schultz T (2017) Manual and automatic transcriptions in dementia detection from speech. In: Interspeech conference 2017, Stockholm, Sweden
Yngve V (1960) A model and an hypothesis for language structure. Proc Am Philos Soc 104:444–466
Yule GU (1944) The statistical study of literary vocabulary. Cambridge University Press, Cambridge
Zahorian SA, Hu H (2008) A spectral/temporal method for robust fundamental frequency tracking. J Acoust Soc Am 123(6):4559–4571


Author information

Authors and Affiliations

Binghamton University, Bowie, MD, USA
Walker H. Land Jr.
Binghamton University, Binghamton, NY, USA
J. David Schaffer


Appendix: Features Extracted from Each Voice Sample

Here, we provide some details of the processing of each speech sample. Figure 4.3 gives an overview of the processing chain.

Table 4.2 lists the features extracted from the speech signal, a WAV file with 16-bit samples at a 16 kHz sampling rate. We implemented three approaches to breaking up the signal into periods of speaking and intervening pauses. The first detects the pitch of the speaker’s voice using YAAPT (Yet Another Algorithm for Pitch Tracking) (Zahorian and Hu 2008); it produces results as illustrated in Fig. 4.3. The second, based on work by Sakhnov et al. (2009), uses the energy in the signal and a threshold to separate speech from pause. The third uses the Voice Activity Detector (VAD) of Sohn et al. (1999). Within each approach, a pause detector treated any unvoiced period longer than 200 ms as a pause and required voiced segments to be longer than 50 ms. In the literature on pauses, one may find thresholds ranging from 100 ms to 1 s. We set our thresholds after some experimentation comparing manually pause-segmented samples with the algorithmic outputs (Bologna et al. 2013).
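The thresholding step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the upstream detector (pitch, energy, or VAD) has already produced one voiced/unvoiced flag per fixed-length frame, and the function name, the 10 ms frame length, and the choice to fold short unvoiced gaps into the surrounding speech are all assumptions made for illustration.

```python
from itertools import groupby


def segment_pauses(voiced, frame_ms=10, min_pause_ms=200, min_voiced_ms=50):
    """Turn per-frame voiced flags into a list of (label, duration_ms) runs.

    Assumptions (not from the chapter): frames have a fixed length of
    frame_ms, and unvoiced gaps that are too short to count as pauses are
    absorbed into the neighbouring speech segment.
    """
    # Collapse the frame flags into runs of identical values.
    runs = [[bool(flag), sum(1 for _ in group) * frame_ms]
            for flag, group in groupby(voiced)]

    # Voiced runs that are not longer than min_voiced_ms are treated as unvoiced.
    for run in runs:
        if run[0] and run[1] <= min_voiced_ms:
            run[0] = False

    # Merge neighbouring runs that now carry the same flag.
    merged = []
    for flag, duration in runs:
        if merged and merged[-1][0] == flag:
            merged[-1][1] += duration
        else:
            merged.append([flag, duration])

    # Unvoiced runs longer than min_pause_ms become pauses; the rest is speech.
    labelled = []
    for flag, duration in merged:
        label = "pause" if (not flag and duration > min_pause_ms) else "speech"
        if labelled and labelled[-1][0] == label:
            labelled[-1] = (label, labelled[-1][1] + duration)
        else:
            labelled.append((label, duration))
    return labelled


# Hypothetical frame flags: 300 ms voiced, 100 ms gap, 200 ms voiced,
# 300 ms silence, a 30 ms blip, 50 ms silence, 400 ms voiced.
example_frames = [1] * 30 + [0] * 10 + [1] * 20 + [0] * 30 + [1] * 3 + [0] * 5 + [1] * 40
example_segments = segment_pauses(example_frames)
```

In this example the 100 ms gap is too short to be a pause, and the 30 ms blip is too short to count as speech, so the silence around it is merged into a single pause.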

In addition, we sought a method for assessing emotion or affect, which may be diminished in AD. Using the data from the pitch-based pause detector, and following the approach of Hewlett (2007), we calculated the mean, median, variance, minimum, and maximum of the pitch in each sample. We hypothesized that a low variance in pitch might capture a blunted affect. We also captured a speech rate measure from the pitch-based signal with the pauses removed (i.e., a shorter utterance length). These are also listed in Table 4.2.
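As a rough illustration of these summary features, the sketch below computes the five pitch statistics and a pause-free speech rate. It is an assumption-laden example rather than the chapter's code: the pitch track is taken to be a list of values in Hz with 0 marking unvoiced frames, the population variance is used, and the speech rate is expressed as words per second of speaking time.

```python
import statistics


def pitch_summary(f0_hz):
    """Summary statistics of a pitch track; 0 is assumed to mark unvoiced frames."""
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("pitch track contains no voiced frames")
    return {
        "mean": statistics.fmean(voiced),
        "median": statistics.median(voiced),
        "variance": statistics.pvariance(voiced),  # population variance (assumed)
        "min": min(voiced),
        "max": max(voiced),
    }


def speech_rate(n_words, total_s, pause_s=0.0):
    """Words per second of speaking time, i.e. with pause time removed."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("no speaking time left after removing pauses")
    return n_words / speaking_s


# Hypothetical values for illustration only.
summary = pitch_summary([0, 100.0, 120.0, 0, 140.0])
rate = speech_rate(n_words=120, total_s=60.0, pause_s=20.0)
```

A sample with unusually low `variance` would be the kind of case the blunted-affect hypothesis is meant to capture.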

Next, we describe the syntactic complexity features. These are computed using software generously shared by Roark et al. (2011). Two methods are used: Frazier (1985) and Yngve (1960). Both methods begin with a parse tree as produced by the Charniak parser (Charniak 2000). Parse trees indicate the syntactic structure of a sentence or phrase. They are made up of nodes and branches, with each node representing a grammatical category that is present in the sentence. Branches connect parent nodes and child nodes; child nodes are embedded within the grammatical structure that the parent node indicates. In Frazier scoring, each non-terminal node (any node that is not the end of a word’s grammatical structure) is given a score of 1. In some cases, such as sentence nodes, a score of 1.5 is given (Roark et al. 2011). Each word’s score is obtained by summing the scores encountered while tracing upward from the word to the top of the tree (Fig. 4.6). In Yngve scoring, each right-most branch at a node of the parse tree is given a score of 0, and each successive branch to its left has 1 added to its score (Fig. 4.6). The score for a word is obtained by summing the branch scores on the path up the tree from that word. Note that this example has no punctuation. The parser adds more nodes for punctuation, so the Frazier and Yngve scores will change. Our speech-to-text software inserts punctuation.
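The Yngve rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software shared by Roark et al. (2011): the nested-tuple tree encoding, the function name, and the example sentence are assumptions, and Frazier scoring is not covered.

```python
def yngve_scores(tree, inherited=0):
    """Return [(word, score)] for the words of a parse tree, left to right.

    A tree is assumed to be a nested tuple (label, child, child, ...), and a
    word is a tuple (part_of_speech, "word"). At each node the right-most
    branch scores 0 and each branch further left scores one more; a word's
    score is the sum of the branch scores on its path to the root.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], inherited)]
    scores = []
    last = len(children) - 1
    for position, child in enumerate(children):
        branch_score = last - position
        scores.extend(yngve_scores(child, inherited + branch_score))
    return scores


# Hypothetical parse, written by hand rather than produced by a parser.
example_tree = (
    "S",
    ("NP", ("DT", "the"), ("NN", "boy")),
    ("VP", ("VBZ", "takes"), ("NP", ("DT", "a"), ("NN", "cookie"))),
)
example_scores = yngve_scores(example_tree)
mean_yngve = sum(score for _, score in example_scores) / len(example_scores)
```

Per-sentence summaries such as the mean or maximum word score can then be taken over `example_scores`; which summaries the original feature set uses is not specified here.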

Another complexity measure was proposed by Rosenberg and Abbeduto (1987) called Developmental Level. The score on Developmental Level, a scale of speech complexity, ranges from 0 to 7 points. A score of 0 is given to a simple sentence with one clause. A score of 7 is given to a “complex” sentence that contains at least two of the pre-defined grammatical structures that are involved in D-Level scoring. We used code graciously provided by Roark et al. (2011) and summed the D-Level scores for all sentences in the sample (Table 4.3).

Table 4.3 Thirteen features computed by syntactic complexity code from parse tree


Another set of concepts derives from the findings of the Nun Study and relates to idea density (Snowdon et al. 1996). Brown et al. (2008) implemented a set of rules in the Computerized Propositional Idea Density Rater (CPIDR) program. Roark et al. (2011) developed their own extension of this rule set, called p_density. Starting with these rules, we conducted extensive experiments using the examples provided by Turner and Greene (1977), making additional changes to improve performance (idea_density). Another metric is content density, the ratio of open-class words to closed-class words. Open-class words are nouns, verbs, adjectives, adverbs, and symbols. All other parts of speech are considered closed-class words. These metrics are shown in Table 4.4, together with the word count used for computing the speech_rate metrics in Table 4.2.
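Content density is simple enough to sketch directly. The example below assumes the transcript has already been tagged with Penn Treebank part-of-speech tags and that punctuation tokens have been removed; the mapping from tag prefixes to open classes and the function names are assumptions for illustration, not the authors' code.

```python
# Penn Treebank tag prefixes assumed to mark open-class words:
# nouns (NN*), verbs (VB*), adjectives (JJ*), adverbs (RB*), plus symbols (SYM).
OPEN_CLASS_PREFIXES = ("NN", "VB", "JJ", "RB")


def is_open_class(tag):
    return tag == "SYM" or tag.startswith(OPEN_CLASS_PREFIXES)


def content_density(tagged_words):
    """Ratio of open-class to closed-class words in a list of (word, tag) pairs."""
    n_open = sum(1 for _, tag in tagged_words if is_open_class(tag))
    n_closed = len(tagged_words) - n_open
    if n_closed == 0:
        raise ValueError("no closed-class words; the ratio is undefined")
    return n_open / n_closed


# Hypothetical tagged utterance.
example_tagged = [
    ("the", "DT"), ("boy", "NN"), ("takes", "VBZ"), ("a", "DT"),
    ("cookie", "NN"), ("from", "IN"), ("the", "DT"), ("jar", "NN"),
]
example_density = content_density(example_tagged)
```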

Table 4.4 Five features computed by idea density and related code from parse tree


Various measures have been proposed to assess the richness of a vocabulary. They apply different formulae to a few simple word counts: the number of unique words used in the utterance, the total number of words (N), the number of words used exactly i times (V(i,N)), and the frequency of the most prevalent word (Table 4.5).
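To make these counts concrete, the sketch below derives them from a token list and applies several commonly published richness formulae. Which nine features Table 4.5 contains is not reproduced here, so the selection is an assumption; the formulae used are the usual textbook forms of the type-token ratio, Brunet's W, Honoré's R, Sichel's S, and Yule's K, and the symbol `v` for the unique-word count is this sketch's own naming.

```python
import math
from collections import Counter


def vocabulary_richness(words):
    """Compute basic word counts and a few standard richness measures."""
    counts = Counter(w.lower() for w in words)
    n = sum(counts.values())             # N: total number of words
    v = len(counts)                      # number of unique words
    spectrum = Counter(counts.values())  # spectrum[i] = V(i, N)
    v1, v2 = spectrum.get(1, 0), spectrum.get(2, 0)
    return {
        "n": n,
        "v": v,
        "max_freq": max(counts.values()),
        "ttr": v / n,
        "brunet_w": n ** (v ** -0.165),
        # Honoré's R is undefined when every word occurs exactly once.
        "honore_r": 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf,
        "sichel_s": v2 / v,
        "yule_k": 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / (n * n),
    }


# Hypothetical utterance: 11 tokens, 7 unique words.
example_words = "the boy takes the cookie and the girl laughs and laughs".split()
richness = vocabulary_richness(example_words)
```

Short samples make several of these measures unstable, which is one reason to compare them across samples of similar length.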

Table 4.5 Nine features assessing vocabulary richness


The Unix operating system has long provided a suite of software tools that help users who are preparing documents to assess various elements of writing style. The features computed with these tools are listed in Table 4.6. The metrics from the Linguistic Inquiry and Word Count system (LIWC2015; Pennebaker et al. 2015) are also computed, but are not listed here. The interested reader may consult the documentation for that system.
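Two of the readability indices cited in the references, the Automated Readability Index (Senter and Smith 1967) and the Coleman-Liau index (Coleman and Liau 1975), depend only on character, word, and sentence counts and so are easy to sketch. The code below is a simplified illustration and not the Unix tool itself: the tokenization rules are assumptions, and letters are counted in place of all characters, so its values may differ slightly from those the tool reports.

```python
import re


def readability(text):
    """Approximate ARI and Coleman-Liau scores from simple counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    letters = sum(ch.isalpha() for word in words for ch in word)
    n_words, n_sentences = len(words), len(sentences)
    # ARI: 4.71 * characters per word + 0.5 * words per sentence - 21.43
    ari = 4.71 * letters / n_words + 0.5 * n_words / n_sentences - 21.43
    # Coleman-Liau: 0.0588 * L - 0.296 * S - 15.8, with L and S per 100 words
    coleman_liau = (0.0588 * (100 * letters / n_words)
                    - 0.296 * (100 * n_sentences / n_words) - 15.8)
    return {"words": n_words, "sentences": n_sentences,
            "ari": ari, "coleman_liau": coleman_liau}


# Hypothetical two-sentence transcript.
example_readability = readability("The boy takes a cookie. The girl laughs.")
```

Because both indices rely on sentence boundaries, they are only meaningful for transcripts that include punctuation.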

Table 4.6 Fourteen features computed by the Unix Style Code from transcripts


Table 4.7 Forty features: fraction of words in transcript by word type, from the Charniak parser (Charniak 2000) and the Penn Treebank (Marcus et al. 1993)


Finally, we developed a simple word counting program that counts words and their synonyms in a transcript, as listed in a control input file. We used this program to count words appropriate to the content of a picture and also words deemed to convey positive, neutral, and negative emotions. Such an approach has been taken previously by Hux et al. (2008) for the cookie theft picture. We assembled these control files by hand and included words found in our transcripts, so additional effort may be needed to augment them as new voice samples are acquired. One may hope that, once sufficient samples are in hand, this need will diminish and eventually disappear.

For each category of word counts (picture content and positive, neutral, and negative emotion words), the program has a number of key concepts (n_keys). It counts the number of key concepts found in the transcript (n_keys_matched, with each key counted only as present or absent) and reports n_keys_matched/n_keys. The keys are listed in Tables 4.8, 4.9, 4.10, 4.11, and 4.12.
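A minimal sketch of this presence/absence scoring follows. The control-file format is not given in the text, so it is represented here as a Python dictionary mapping each key concept to its accepted words; that structure, the function name, and the example keys are all hypothetical.

```python
import re


def key_concept_score(transcript, key_concepts):
    """Return (n_keys_matched, n_keys, n_keys_matched / n_keys).

    key_concepts maps each key concept to the words and synonyms that count
    as a match. A key is counted once, however often its words occur.
    """
    tokens = set(re.findall(r"[a-z']+", transcript.lower()))
    n_keys = len(key_concepts)
    n_keys_matched = sum(
        1 for synonyms in key_concepts.values()
        if tokens & {word.lower() for word in synonyms}
    )
    return n_keys_matched, n_keys, n_keys_matched / n_keys


# Hypothetical control entries; the real keys are those listed in the tables.
example_keys = {
    "cookie": ["cookie", "cookies", "biscuit"],
    "stool": ["stool", "chair"],
    "sink": ["sink", "basin"],
    "mother": ["mother", "mom", "woman", "lady"],
}
example_transcript = ("The boy is on a stool taking cookies, "
                      "and the woman is washing dishes.")
example_score = key_concept_score(example_transcript, example_keys)
```

Matching on whole tokens rather than substrings avoids counting "sinking" as a mention of "sink", at the cost of needing every inflected form in the control entries.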

Table 4.8 Key concepts for cookie theft picture


Table 4.9 Key concepts for lake picture


Table 4.10 Key concepts for positive emotion words


Table 4.11 Key concepts for neutral emotion words


Table 4.12 Key concepts for negative emotion words



Land, W.H., Schaffer, J.D. (2020). Alzheimer’s Disease and Speech Background. In: The Art and Science of Machine Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-18496-4_4


DOI: https://doi.org/10.1007/978-3-030-18496-4_4
Published: 26 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18495-7
Online ISBN: 978-3-030-18496-4
eBook Packages: Engineering, Engineering (R0)
