Abstract
Throughout this book, the development of a possible speech-based diagnostic test for Alzheimer’s disease (AD) illustrates the application of the machine learning methods we describe. The application has many characteristics typical of such tasks: one suspects that a set of features characterizing the available samples might classify them into two or more classes. Lacking a deep understanding of the underlying mechanism, one goes searching for useful patterns in the data: a fishing expedition.
This chapter provides background material on AD for the reader interested in how it is defined, what is known about its underlying pathology, how it is usually diagnosed in the clinic, and some of its known effects on speech. We then briefly summarize previous attempts at this task: older ones that performed linguistic analyses largely by hand, and more recent ones using computers. Next, we describe the speech samples we collected and how an array of speech features was extracted from them fully automatically, including speech-to-text with punctuation. Brief comments on experimental design are included.
Notes
1. Zoom H1 Handy Recorder, Zoom Corp, 4-4-3 Kanda-surugadai, Chiyoda-ku, Tokyo 101-0062, Japan.
2. We gratefully acknowledge the generosity of the Roark team in sharing their software early in our research.
3. DementiaBank 140 plus our own 72.
4. Demographics (age, gender, race, years of education) plus MMSE, plus approximately 230 speech features (we discarded features with little or no variation among subjects).
Abbreviations
- ABeta: Amyloid beta protein fragment
- ACADIE: Atlantic Canada Alzheimer’s Disease Investigation of Expectations
- AD: Alzheimer’s disease
- ADAS-Cog: Alzheimer’s Disease Assessment Scale-Cognitive
- APOE4: Apolipoprotein E allele variant e4
- ASR: Automatic speech recognition
- BDAE: Boston Diagnostic Aphasia Examination
- BN: Bayesian network
- CSF: Cerebrospinal fluid
- FTD: Frontotemporal dementia
- kDa: Kilodaltons
- LOO: Leave-one-out cross-validation
- MMSE: Mini-Mental State Examination
- MRI: Magnetic resonance imaging
- NL: Normal control subjects
- PET: Positron emission tomography
- POS: Part of speech
- SUNY: State University of New York
- TDP-43: Transactive response DNA binding protein 43 kDa
- TTR: Type token ratio
- US: United States
- WAB: Western Aphasia Battery
References
Adams DR, Kern DW, Wroblewski KE, McClintock MK, Dale W, Pinto JM (2018) Olfactory dysfunction predicts subsequent dementia in older U.S. adults. J Am Geriatr Soc 66:140–144
Allison RS (1962) The senile brain. Edward Arnold, London
Alzheimer’s Association (2017) Alzheimer’s disease facts and figures
Audacity software (n.d.) download available. http://www.audacityteam.org/
Björnsson CH (1968) Läsbarhet. Liber, Stockholm
Bologna L, Padua A, Mao M, Chen X (2013) Early detection of Alzheimer’s by analyzing speech pattern anomalies. Senior Capstone Project, Binghamton University Computer and Electrical Engineering Department
Brown C, Kemper S, Herman R, Covington M (2008) Automatic measurement of propositional density from part-of-speech tagging. Behav Res Methods 40:540–545
Brunet E (1978) Vocabulaire de Jean Giraudoux: Structure et Évolution. Slatkine, Geneva
Charniak E (2000) A maximum-entropy-inspired parser. In: Proceedings of the 1st North American chapter of the Association for Computational Linguistics Conference, pp 132–139
Coleman M, Liau TL (1975) A computer readability formula designed for machine scoring. J Appl Psychol 60:283–284
Dementia Bank (n.d.) [https://talkbank.org/DementiaBank/ last visited 2018-03-26]
Dubois B, Padovani A, Scheltens P, Rossi A, Dell’Agnello G (2016) Timely diagnosis for Alzheimer’s disease: a literature review on benefits and challenges. J Alzheimers Dis 49:617–631
Flesch R (1948) A new readability yardstick. J Appl Psychol 32(3):221–233
Folstein MF, Folstein SE, McHugh PR (1975) “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12(3):189–198
Forbes-McKay KE, Venneri A (2005) Detecting subtle spontaneous language decline in early Alzheimer’s disease with a picture description task. Neurol Sci 26:243–254
Fraser KC, Meltzer JA, Rudzicz F (2016) Linguistic features identify Alzheimer’s disease in narrative speech. J Alzheimers Dis 49(2):407–422
Frazier L (1985) Syntactic complexity. Cambridge University Press, Cambridge, pp 129–189
Fukui T, Yamazaki T, Kinno R (2011) The “head turning sign” can be a clinical marker of Alzheimer’s disease? Dement Geriatr Cogn Disord Extra 1:310–317
Goodglass H, Barresi B, Kaplan E (1983) The Boston diagnostic aphasia examination. Lippincott Williams and Wilkins, Philadelphia
Gunning R (1952) The technique of clear writing. McGraw-Hill, New York, pp 36–37
Herbert LE, Weuve J, Scherr PA, Evans DA (2013) Alzheimer disease in the United States (2010–2050) estimated using the 2010 census. Neurology 80:1778–1783
Hewlett S (2007) Emotion detection from speech. Stanford University, TechRpt. [http://cs229.stanford.edu/proj2007/ShahHewlett%20-%20Emotion%20Detection%20from%20Speech.pdf]
Honoré A (1979) Some simple measures of richness of vocabulary. Assoc Lit Linguist Comput Bull 7(2):172–177
Hux K, Wallace SE, Evans K, Snell J (2008) Performing cookie theft picture content analyses to delineate cognitive-communication impairments. J Med Speech Lang Pathol 16(2):83–99
Hyman BT, Phelps CH, Beach TG, Bigio EH, Cairns NJ, Carrillo MC, Dickson DW, Duyckaerts C, Frosch MP, Masliah E, Mirra SS, Nelson PT, Schneider JA, Thal DR, Thies B, Trojanowski JQ, Vinters HV, Montine TJ (2012) National Institute on Aging-Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease. Alzheimers Dement 8(1):1–13
Kertesz A (1982) The western aphasia battery
Kertesz A (2004) Language in Alzheimer’s disease. In: Cognitive neuropsychology of Alzheimer’s disease. Oxford University Press, Oxford, pp 197–210
Kim H (2013) The ClockMe System: computer-assisted screening tool for dementia. PhD thesis, Georgia Institute of Technology
Kincaid JP, Fishburne RP, Rogers RL, Chissom BS (1975) Derivation of new readability formulas for Navy enlisted personnel. Research Branch Report 8-75, Chief of Naval Technical Training: Naval Air Station Memphis
Konig AS, Sorin A, Hoory R, Toledo-Ronen O, Derreumaux A, Manera V, Verhey F, Aalten P, Robert PH, David R (2015) Automatic speech analysis for the assessment of patients with predementia and Alzheimer’s disease. Alzheimers Dement 1:112–124
López-de-Ipiña K, Ecay M, Solé-Casals J, Ezeiza A, Barroso N, Martinez-Lage P, Beitia B (2015) Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: a fractal dimension approach. Comput Speech Lang 30(1):43–60
MacWhinney B (2000) The CHILDES Project: tools for analyzing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah
Marcus M, Santorini B, Marcinkiewicz M (1993) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19(2):313–330
McLaughlin GH (1969) SMOG grading—a new readability formula. J Reading 12(8):639–646
Mielke MM, Vemuri P, Rocca WA (2014) Clinical epidemiology of Alzheimer’s disease: assessing sex and gender differences. Clin Epidemiol 6:37–48
Mirheidari B, Blackburn D, Reuber M, Walker T, Christensen H (2016) Diagnosing people with dementia using automatic conversation analysis. In: Proceedings of interspeech, pp 1220–1224
Nagashima K, Wu J, Takahashi S (2011) Human characteristics of sound localization under masking for the early detection of dementia. In: Early detection and rehabilitation technologies for dementia: neuroscience and biomedical applications. Medical Information Science Reference, Hershey, pp 65–71
Nespoulous JL, Lecours AR, Lafond D, Lemay A, Puel M, Joanette Y, et al (1992) Protocole Montréal-Toulouse. Examen de l’aphasie. Version Béta modifiée, Éditions Ortho
Orimaye SO, Wong JS, Golden KJ (2014) Learning predictive linguistic features for Alzheimer’s disease and related dementias using verbal utterances. In: Workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, Baltimore
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. University of Texas at Austin, Austin
Rentoumi V, Raoufian L, Ahmed S, de Jager CA, Garrard P (2014) Features and machine learning classification of connected speech samples from patients with autopsy proven Alzheimer’s disease with and without additional vascular pathology. J Alzheimers Dis 42:S3–S17
Roark B, Mitchell M, Hosom J, Hollingshead K, Kaye J (2011) Spoken language derived measures for detecting mild cognitive impairment. IEEE Trans Audio Speech Lang Process 19(7):2081–2090
Rockwood K, Graham J, Fay S (2002) Goal setting and attainment in Alzheimer’s disease patients treated with donepezil. J Neurol Neurosurg Psychiatry 73:500–507
Rosen WG, Mohs RC, Davis KL (1984) A new rating scale for Alzheimer’s disease. Am J Psychiatry 141(11):1356–1364
Rosenberg S, Abbeduto L (1987) Indicators of linguistic competence in the peer group conversational behavior of mildly retarded adults. Appl Psycholinguist 8:19–32
Sadeghian R, Schaffer JD, Zahorian SA (2017) Speech processing approach for diagnosing dementia in an early stage. In: Interspeech conference 2017, Stockholm, Sweden
Sakhnov K, Verteletskaya E, Simak B (2009) Dynamical energy-based speech/silence detector for speech enhancement applications. In: Proceedings of the world congress on engineering, vol 1
Savica R, Wennberg AM, Hagen C, Edwards K, Roberts RO, Hollman JH, Knopman DS, Boeve BF, Machulda MM, Petersen RC, Mielke MM (2017) Comparison of gait parameters for predicting cognitive decline: the Mayo Clinic study of aging. J Alzheimers Dis 55(2):559–567
Semenza C, Cipolotti L (1989) Neuropsicologia con carta e matita. Cleup Editrice Padova, Padova
Senter RJ, Smith EA (1967) Automated readability index. Wright-Patterson Air Force Base, AMRL-TR-6620, p iii
Sharp EM, Gatz M (2011) The relationship between education and dementia: an updated systematic review. Alzheimer Dis Assoc Disord 25(4):289–304
Sichel HS (1975) On a distribution law for word frequencies. J Am Stat Assoc 70:542–547
Simpson EH (1949) Measurement of diversity. Nature 163:688
Sohn J, Kim NS, Sung W (1999) A statistical model-based voice activity detection. IEEE Signal Process Lett 6(1):1–3
Snowdon DA, Kemper SJ, Mortimer JA, Greiner LH, Wekstein DR, Markesbery WR (1996) Linguistic ability in early life and cognitive function and Alzheimer’s disease in late life. JAMA 275(7):528–532
Snowdon D (2002) Aging with grace: what the Nun Study teaches us about leading longer, healthier, and more meaningful lives. Bantam
Stelzmann RA, Schnitzlein HN, Murtagh FR (1995) An English translation of Alzheimer’s 1907 paper, “Über eine eigenartige Erkrankung der Hirnrinde”. Clin Anat 8:429–431. http://info-centre.jenage.de/assets/pdfs/library/stelzmann_et_al_alzheimer_CLIN_ANAT_1995.pdf [last accessed 2018-03-19]
Swinburn K, Porter G, Howard D (2004) Comprehensive aphasia test. Psychology Press, Hove
Thomas C, Keselj V, Cercone N, Rockwood K, Asp E (2005) Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech. In: Proceedings of the IEEE international conference on mechatronics and automation, pp 1569–1574
Turner A, Greene E (1977) The construction and use of a propositional text base. University of Colorado Psychology Department, University of Colorado Boulder. [http://www.colorado.edu/ics/sites/default/files/attached-files/77-63.pdf last visited 2018-01-11]
Verghese J, Lipton RB, Hall CB, Kuslansky G, Katz MJ, Buschke H (2002) Abnormality of gait as a predictor of non-Alzheimer’s dementia. N Engl J Med 347(22):1761–1768
Wankerl S, Noeth E, Evert A (2017) An N-gram based approach to the automatic diagnosis of Alzheimer’s disease from spoken language. In: Interspeech conference 2017, Stockholm, Sweden
Wilson RS, Yu L, Lamar M, Schneider JA, Boyle PA, Bennett DA (2019) Education and cognitive reserve in old age. Neurology 92(10):e1041–e1050
Weiner J, Engelbart M, Schultz T (2017) Manual and automatic transcriptions in dementia detection from speech. In: Interspeech conference 2017, Stockholm, Sweden
Yngve V (1960) A model and an hypothesis for language structure. Proc Am Philos Soc 104:444–466
Yule GU (1944) The statistical study of literary vocabulary. Cambridge University Press, Cambridge
Zahorian SA, Hu H (2008) A spectral/temporal method for robust fundamental frequency tracking. J Acoust Soc Am 123(6):4559–4571
Appendix: Features Extracted from Each Voice Sample
Here, we provide some details of the processing of each speech sample. Figure 4.3 gives an overview of the processing chain.
Table 4.2 lists the features extracted from the speech signal, a WAV file with 16-bit samples at a 16 kHz sampling rate. We implemented three approaches to dividing the signal into periods of speaking and intervening pauses. The first detects the pitch of the speaker’s voice using YAAPT (Yet Another Algorithm for Pitch Tracking) (Zahorian and Hu 2008); it produces results as illustrated in Fig. 4.3. The second, based on work by Sakhnov et al. (2009), uses the energy in the signal and a threshold to separate speech from pause. The third uses the Voice Activity Detector (VAD) of Sohn et al. (1999). Within each approach, a pause detector treated any unvoiced period longer than 200 ms as a pause and required voiced segments to be longer than 50 ms. In the literature on pauses, thresholds range from 100 ms to 1 s. We set our thresholds after some experimentation comparing manually pause-segmented samples with the algorithmic outputs (Bologna et al. 2013).
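The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that some detector (pitch-, energy-, or VAD-based) has already produced one voiced/unvoiced decision per frame. The function name `segment_pauses`, the 10 ms frame length, and the order in which the two thresholds are applied are all assumptions.

```python
from itertools import groupby


def _merge(runs):
    """Merge adjacent runs that carry the same voiced/unvoiced flag."""
    merged = []
    for flag, n_frames in runs:
        if merged and merged[-1][0] == flag:
            merged[-1][1] += n_frames
        else:
            merged.append([flag, n_frames])
    return merged


def segment_pauses(voiced, frame_ms=10, min_pause_ms=200, min_voiced_ms=50):
    """Return a list of (label, start_ms, end_ms) with label 'speech' or 'pause'.

    voiced: iterable of per-frame decisions (truthy = voiced).
    frame_ms is a hypothetical frame length; the two thresholds follow the text.
    """
    # Collapse the frame-level decisions into runs of [is_voiced, n_frames].
    runs = [[flag, len(list(group))] for flag, group in groupby(voiced, key=bool)]

    # Voiced runs must be longer than min_voiced_ms, otherwise they are
    # treated as unvoiced (assumed order: this filter is applied first).
    runs = _merge([[flag and (n * frame_ms > min_voiced_ms), n] for flag, n in runs])

    # Only unvoiced runs longer than min_pause_ms count as pauses; shorter
    # gaps are absorbed into the surrounding speech.
    runs = _merge([[flag or (n * frame_ms <= min_pause_ms), n] for flag, n in runs])

    segments, start = [], 0
    for flag, n in runs:
        end = start + n * frame_ms
        segments.append(("speech" if flag else "pause", start, end))
        start = end
    return segments


if __name__ == "__main__":
    # 100 ms voiced, 300 ms gap, 30 ms blip, 50 ms gap, 200 ms voiced, 100 ms gap
    frames = [1] * 10 + [0] * 30 + [1] * 3 + [0] * 5 + [1] * 20 + [0] * 10
    for segment in segment_pauses(frames):
        print(segment)
```

In this toy input the 30 ms voiced blip is discarded, so the gaps around it merge into a single 380 ms pause, while the trailing 100 ms gap is too short to count as a pause. A short unvoiced stretch at the very start or end of a recording is also absorbed into speech by this sketch, which a production detector would probably want to handle separately.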
In addition, we sought a method for assessing emotion or affect, which may be diminished in AD. Using the data from the pitch-based pause detector, and following the approach of Hewlett (2007), we calculated the mean, median, variance, minimum, and maximum of the pitch in each sample. We hypothesized that a low pitch variance might capture a blunted affect. We also captured a speech rate measure from the pitch-based signal with the pauses removed (i.e., a shorter utterance length). These are also listed in Table 4.2.
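The pitch summary statistics are straightforward to compute once a pitch track is available. The sketch below assumes a list of per-frame fundamental-frequency values in Hz in which unvoiced frames are encoded as 0 (a common convention, but an assumption here). The names `pitch_summary` and `speech_rate`, and the choice of words per second of speaking time as the rate unit, are illustrative; the exact definitions used for Table 4.2 are not given in this appendix.

```python
import statistics


def pitch_summary(f0_hz):
    """Summarize a pitch track, skipping frames with f0 <= 0 (assumed unvoiced)."""
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean": statistics.fmean(voiced),
        "median": statistics.median(voiced),
        "variance": statistics.variance(voiced),  # sample variance (n - 1)
        "min": min(voiced),
        "max": max(voiced),
    }


def speech_rate(n_words, total_ms, pause_ms):
    """Words per second of speaking time, i.e. with pause time removed."""
    speaking_ms = total_ms - pause_ms
    if speaking_ms <= 0:
        raise ValueError("no speaking time left after removing pauses")
    return n_words / (speaking_ms / 1000.0)


if __name__ == "__main__":
    print(pitch_summary([0, 100, 110, 0, 120]))
    print(speech_rate(n_words=30, total_ms=20000, pause_ms=5000))
```

A low `variance` value relative to other speakers is the quantity the blunted-affect hypothesis above refers to.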
Next, we describe the syntactic complexity features. These are computed using software generously shared by Roark et al. (2011). Two methods are used: Frazier (1985) and Yngve (1960). Both methods begin with a parse tree as produced by the Charniak parser (Charniak 2000). Parse trees indicate the syntactic structure of a sentence or phrase. They are made up of nodes and branches, with each node representing a grammatical category that is present in the sentence. Branches connect parent nodes and child nodes; child nodes are embedded within the grammatical structure that the parent node indicates. In Frazier scoring, each non-terminal node (any node that is not the end of a word’s grammatical structure) is given a score of 1. In some cases, such as sentence nodes, a score of 1.5 is given (Roark et al. 2011). Each word’s score is obtained by summing up the scores that are covered while tracing upward from the word to the top of the tree (Fig. 4.6). In Yngve scoring, each right-most branching node of the parse tree is given a score of 0, and each consecutive left branch has a 1 added to its score (Fig. 4.6). The score for a word is obtained by summing up scores as you move up the tree from a given word. Note that this example has no punctuation. The parser adds more nodes for punctuation, so the Frazier and Yngve scores will change. Our speech-to-text software inserts punctuation.
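To make the Yngve rule concrete, here is a minimal sketch that scores each word of a small hand-built parse tree. It does not reproduce the Roark et al. software or the Charniak parser output: the tree is a hypothetical example written as nested tuples of the form `(label, child, ...)`, punctuation nodes are omitted, and the function name `yngve_scores` is an assumption. Frazier scoring is not shown.

```python
def yngve_scores(tree, inherited=0):
    """Return [(word, score)] for a tree given as nested tuples (label, child, ...).

    At every node the right-most child adds 0 and each child further to the
    left adds one more; a word's score is the sum along its path from the root.
    """
    _label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # Pre-terminal node of the form (POS, word).
        return [(children[0], inherited)]
    scores = []
    n = len(children)
    for i, child in enumerate(children):
        scores.extend(yngve_scores(child, inherited + (n - 1 - i)))
    return scores


if __name__ == "__main__":
    # Hypothetical parse of "The little girl ran" (no punctuation node).
    tree = ("S",
            ("NP", ("DT", "The"), ("JJ", "little"), ("NN", "girl")),
            ("VP", ("VBD", "ran")))
    scores = yngve_scores(tree)
    print(scores)
    values = [score for _, score in scores]
    print("max:", max(values), "mean:", sum(values) / len(values))
```

Left-branching material receives higher scores, which is why the subject noun phrase dominates in this example. The maximum and mean are shown only as plausible per-sentence summaries.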
Another complexity measure, Developmental Level (D-Level), was proposed by Rosenberg and Abbeduto (1987). The D-Level score, a scale of speech complexity, ranges from 0 to 7 points. A score of 0 is given to a simple sentence with one clause. A score of 7 is given to a “complex” sentence that contains at least two of the pre-defined grammatical structures involved in D-Level scoring. We used code graciously provided by Roark et al. (2011) and summed the D-Level scores for all sentences in the sample (Table 4.3).
Another set of concepts derives from the findings of the Nun Study and relates to idea density (Snowdon et al. 1996). Brown et al. (2008) implemented a set of rules in the Computerized Propositional Idea Density Rater (CPIDR) program. Roark et al. (2011) developed their own extension of this rule set, called p_density. Starting with these rules, we conducted extensive experiments using the examples provided by Turner and Greene (1977), making additional changes to improve performance (idea_density). Another metric is content density, the ratio of open-class words to closed-class words. Open-class words are nouns, verbs, adjectives, adverbs, and symbols. All other parts of speech are considered closed-class words. These metrics are shown in Table 4.4, which also includes the word count used for computing the speech_rate metrics in Table 4.2.
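Content density, as defined above, reduces to a count over part-of-speech tags. The sketch below assumes the transcript has already been tagged with Penn Treebank tags and that punctuation tokens have been removed. The prefix-based mapping of tags to open classes (NN*, VB*, JJ*, RB*, plus SYM) is an assumption about how the classes were operationalized, and `content_density` is a hypothetical name.

```python
# Penn Treebank tag prefixes assumed to mark open-class words:
# nouns (NN*), verbs (VB*), adjectives (JJ*), adverbs (RB*).
OPEN_CLASS_PREFIXES = ("NN", "VB", "JJ", "RB")


def is_open_class(tag):
    """True for nouns, verbs, adjectives, adverbs, and symbols (SYM)."""
    return tag.startswith(OPEN_CLASS_PREFIXES) or tag == "SYM"


def content_density(tagged_words):
    """Ratio of open-class to closed-class words in a list of (word, tag) pairs."""
    n_open = sum(1 for _, tag in tagged_words if is_open_class(tag))
    n_closed = len(tagged_words) - n_open
    if n_closed == 0:
        raise ValueError("no closed-class words; ratio is undefined")
    return n_open / n_closed


if __name__ == "__main__":
    tagged = [("the", "DT"), ("boy", "NN"), ("is", "VBZ"),
              ("taking", "VBG"), ("a", "DT"), ("cookie", "NN")]
    print(content_density(tagged))  # 4 open-class / 2 closed-class words
```

Note that under this mapping auxiliary verbs tagged VBZ count as open-class while modals (MD) and wh-adverbs (WRB) count as closed-class; a different convention would shift the ratio.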
Various measures have been proposed to assess the richness of a vocabulary. They apply different formulae to a few simple word counts: the number of unique words used in the utterance (V), the total number of words (N), the number of words used exactly i times (V(i,N)), and the frequency of the most prevalent word (Table 4.5).
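The counts these measures rely on can all be derived from a single word-frequency table. The sketch below computes N (total words), V (unique words), and V(i,N) (the number of words used exactly i times), and then applies commonly cited forms of the type token ratio, Brunet's index, Honoré's statistic, and Yule's K. These textbook formulas are assumed and may differ in detail from the definitions in Table 4.5; the function name `richness` is hypothetical.

```python
import math
from collections import Counter


def richness(words):
    """Compute basic counts and a few vocabulary-richness measures."""
    counts = Counter(w.lower() for w in words)
    n_total = sum(counts.values())        # N: total number of words
    if n_total == 0:
        raise ValueError("empty transcript")
    n_unique = len(counts)                # V: number of unique words
    spectrum = Counter(counts.values())   # spectrum[i] = V(i, N)
    hapax = spectrum[1]                   # V(1, N): words used exactly once

    result = {
        "N": n_total,
        "V": n_unique,
        "hapax": hapax,
        "max_freq": max(counts.values()),
        "TTR": n_unique / n_total,
        "brunet_W": n_total ** (n_unique ** -0.165),
        "yule_K": 1e4 * (sum(i * i * v for i, v in spectrum.items()) - n_total)
                  / (n_total ** 2),
    }
    # Honoré's R is undefined when every word occurs exactly once.
    if hapax < n_unique:
        result["honore_R"] = 100 * math.log(n_total) / (1 - hapax / n_unique)
    else:
        result["honore_R"] = float("inf")
    return result


if __name__ == "__main__":
    sample = "the boy took the cookie and the girl laughed".split()
    for name, value in richness(sample).items():
        print(name, value)
```

Several of these measures are sensitive to sample length, so comparisons are most meaningful between transcripts of similar size.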
The Unix operating system long ago provided a suite of software tools to help users who prepare documents assess various elements of writing style. These are listed in Table 4.6. The metrics from the Linguistic Inquiry and Word Count system (LIWC2015; Pennebaker et al. 2015) are also computed, but not listed here. The interested reader may consult the documentation for this system.
Finally, we developed a simple word-counting program that counts, in a transcript, the words listed in a control input file together with their synonyms. We used this program to count words appropriate to the content of a picture and also words deemed to convey positive, neutral, and negative emotions. Such an approach was taken previously by Hux et al. (2008) for the cookie theft picture. We assembled these control files by hand and included words found in our transcripts, so additional effort may be needed to augment them as new voice samples are acquired. One may hope that, once sufficient samples are in hand, this need will eventually diminish to zero.
For each category of word counts (picture content and positive, neutral, and negative emotion words), the program is given a number of key concepts (n_keys). It counts the number of key concepts found in the transcript (n_keys_matched, with each key counted only as present or absent) and reports the ratio n_keys_matched/n_keys. The keys are listed in Tables 4.8, 4.9, 4.10, 4.11, and 4.12.
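The key-concept ratio can be sketched in a few lines. This is an illustration of the counting rule only: the dictionary below is a hypothetical stand-in for a control file (the actual keys are those in the tables cited above), the tokenizer is a simple regular expression, and only single-word synonyms are handled.

```python
import re


def key_coverage(transcript, keys):
    """Return (n_keys_matched, n_keys, ratio) for one category of key concepts.

    keys maps each key concept to the set of words (the key itself plus any
    synonyms) that count as a match. Each key is counted as present or absent,
    no matter how many times its words occur.
    """
    if not keys:
        raise ValueError("control file contains no keys")
    tokens = set(re.findall(r"[a-z']+", transcript.lower()))
    n_matched = sum(
        1 for words in keys.values() if tokens & {w.lower() for w in words}
    )
    return n_matched, len(keys), n_matched / len(keys)


if __name__ == "__main__":
    # Hypothetical picture-content keys; not the authors' control file.
    picture_keys = {
        "cookie": {"cookie", "cookies", "biscuit"},
        "stool": {"stool", "chair"},
        "sink": {"sink", "basin"},
        "mother": {"mother", "mom", "woman"},
    }
    text = "The boy is on the stool reaching for cookies, and the woman is drying dishes."
    print(key_coverage(text, picture_keys))  # 3 of the 4 keys are present
```

Because each key is scored as present or absent, repeating a word does not raise the ratio, which keeps the measure from rewarding repetitive speech.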
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Land, W.H., Schaffer, J.D. (2020). Alzheimer’s Disease and Speech Background. In: The Art and Science of Machine Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-18496-4_4
DOI: https://doi.org/10.1007/978-3-030-18496-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18495-7
Online ISBN: 978-3-030-18496-4
eBook Packages: Engineering, Engineering (R0)