1 Introduction

The benefits of amplification are greatest when hearing interventions are made as early as possible. There is therefore great interest in the clinical audiology community in the development of objective techniques to measure various hearing abilities that do not require behavioural responses from the patient and are able to determine fitting parameters for hearing aids and cochlear implants. While the use of cortical responses is in development (Billings et al. 2007, 2011; Carter et al. 2010; Billings et al. 2012; Chang et al. 2012), transient brainstem responses and steady state responses (Luts et al. 2004, 2006; Alaerts et al. 2010) are already used in clinical practice to objectively assess hearing thresholds in young children. However, while these measures provide an estimate of audibility with the prescribed hearing aid gain, they do not provide any indication of the expected hearing ability of the patient.

In order to develop an objective measure of hearing ability, it is necessary to establish the links between speech perception, psychophysical measures of perceptual sensitivity to the acoustic cues that underlie effective speech perception, and the proposed objective measures. This paper describes our initial investigations in this direction.

The overall aim of this study was to examine the relationships between perceptual sensitivity to temporal fine-structure cues, brainstem encoding of complex harmonic and amplitude-modulated sounds, and the ability to understand speech in noise. Understanding these links will allow the development of an objective measure that could be used to detect changes in functional hearing before the onset of permanent threshold shifts.

2 Methods

2.1 Participants

Thirty-four participants (14 men and 20 women) aged 18–63 years took part in the experiment. All participants had normal hearing bilaterally, defined as a four-frequency average hearing threshold (500 Hz, 1 kHz, 2 kHz and 4 kHz) of less than 25 dB HL. Pure tone hearing thresholds for the 34 participants are shown in Fig. 1. Thresholds were measured using an Otometrics MADSEN Itera II audiometer with TDH-39 Stereo Headphones. The study was approved by the Royal Victorian Eye & Ear Hospital Human Research Ethics Committee. All participants gave written informed consent.

Fig. 1 Pure tone hearing thresholds for each ear

2.2 Temporal Fine Structure Sensitivity

The TFS1 test (Moore and Sek 2009, 2012) was used to measure participants’ temporal fine structure (TFS) sensitivity. The test was carried out using the TFS1 test software (http://hearing.psychol.cam.ac.uk/) and was based on the standard TFS1 test protocol. The task was performed on the participant’s better ear as determined by the audiogram.

One practice run was given prior to the test. If participants could perform the task and attain a threshold they were given three real runs. For some participants the staircase procedure saturated (reached the ‘easy’ limit). Instead of receiving a threshold, these participants received a percent-correct score obtained from 40 trials of the 2AFC task at the easiest level of the staircase. Both the threshold and the percent-correct scores were converted to a d’ sensitivity measure using the method outlined by Hopkins and Moore (2007).
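The percent-correct branch of this conversion can be illustrated with the standard signal-detection relation for a two-alternative forced-choice task, d' = √2 · z(Pc). The sketch below is a minimal illustration under that assumption; it is not necessarily the exact procedure of Hopkins and Moore (2007), it does not cover the conversion of adaptive thresholds to d', and the function name and the 1/(2N) clipping rule are hypothetical choices made for the example.

```python
from math import sqrt
from statistics import NormalDist


def dprime_2afc(proportion_correct: float, n_trials: int = 40) -> float:
    """Convert a 2AFC proportion correct to d' using d' = sqrt(2) * z(Pc)."""
    # Keep Pc away from 0 and 1 so the inverse normal CDF stays finite.
    # The 1/(2N) correction is an illustrative choice, not taken from the paper.
    floor = 1.0 / (2 * n_trials)
    pc = min(max(proportion_correct, floor), 1.0 - floor)
    return sqrt(2.0) * NormalDist().inv_cdf(pc)
```

Under this relation, chance performance (50 % correct) maps to d' = 0 and about 76 % correct maps to d' ≈ 1.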

2.3 Speech in Noise Tests (QuickSIN)

Speech-in-noise performance was assessed behaviourally using the QuickSIN test (Killion et al. 2004). Six lists of six sentences, with five key words per sentence, were presented to both ears in four-talker babble noise. Within each list the sentences were presented at decreasing signal-to-noise ratios (SNRs), from +25 dB to 0 dB in 5 dB steps (+25, +20, +15, +10, +5 and 0 dB). The average SNR loss was calculated across the six lists. This score indicates the increase in SNR that the participant requires to understand the sentences, relative to a normal-hearing listener. Any score less than 4 dB is considered normal, and a lower SNR loss score reflects better speech-in-noise performance.
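The scoring step can be sketched as follows. The formula used here (SNR loss = 25.5 minus the number of key words repeated correctly in a list) is the conventional QuickSIN scoring rule and is an assumption, since the text does not state the formula explicitly; the function names are hypothetical.

```python
def quicksin_snr_loss(words_correct_per_sentence):
    """SNR loss (dB) for one QuickSIN list of six sentences, five key words each."""
    if len(words_correct_per_sentence) != 6:
        raise ValueError("a QuickSIN list has six sentences")
    if any(not 0 <= w <= 5 for w in words_correct_per_sentence):
        raise ValueError("each sentence has between 0 and 5 key words correct")
    # Conventional QuickSIN rule (assumed): 25.5 minus total key words correct.
    return 25.5 - sum(words_correct_per_sentence)


def mean_snr_loss(lists):
    """Average SNR loss across several lists (six lists in this study)."""
    losses = [quicksin_snr_loss(scores) for scores in lists]
    return sum(losses) / len(losses)
```

For example, a list scored 5, 5, 5, 5, 4 and 2 key words correct (26 in total) gives an SNR loss of -0.5 dB under this rule.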

2.4 Electrophysiology

2.4.1 Stimuli

Envelope-following responses (EFRs) from the brainstem were elicited in response to two stimulus types: a complex harmonic tone, and a 4 kHz sinusoidal carrier tone amplitude modulated at 110 Hz. Both sounds were 100 ms in duration with 5 ms linear onset and offset ramps. The complex harmonic tone had an F0 of 180 Hz and 20 harmonics of equal amplitude and random phase. The modulated tone had a modulation depth of 50 %. Each stimulus was presented with alternating polarities. The first 20 ms of each stimulus are shown in Fig. 2.
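The two stimuli described above can be synthesised along the following lines. This is a sketch only: the actual stimuli were generated in MAX/MSP, and the sampling rate, the peak normalisation, the random seed and all function names are assumptions made for the example.

```python
import math
import random

FS = 44100      # assumed sampling rate in Hz (not stated in the text)
DUR = 0.100     # stimulus duration in seconds
RAMP = 0.005    # linear onset/offset ramp duration in seconds


def ramp_gain(i, n, n_ramp):
    """Linear onset/offset gain for sample i of an n-sample stimulus."""
    return min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)


def complex_harmonic_tone(f0=180.0, n_harmonics=20, fs=FS, seed=0):
    """Equal-amplitude harmonics of f0 with random starting phases."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_harmonics)]
    n, n_ramp = round(DUR * fs), round(RAMP * fs)
    tone = []
    for i in range(n):
        t = i / fs
        s = sum(math.sin(2.0 * math.pi * f0 * (k + 1) * t + phases[k])
                for k in range(n_harmonics))
        tone.append(ramp_gain(i, n, n_ramp) * s / n_harmonics)
    return tone


def modulated_tone(fc=4000.0, fm=110.0, depth=0.5, fs=FS):
    """Sinusoidally amplitude-modulated carrier, scaled to a peak of at most 1."""
    n, n_ramp = round(DUR * fs), round(RAMP * fs)
    tone = []
    for i in range(n):
        t = i / fs
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * fm * t)
        carrier = math.sin(2.0 * math.pi * fc * t)
        tone.append(ramp_gain(i, n, n_ramp) * envelope * carrier / (1.0 + depth))
    return tone


def invert_polarity(tone):
    """Opposite-polarity version used for alternating presentation."""
    return [-x for x in tone]
```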

Fig. 2 The first 20 ms of the complex harmonic tone (top) and modulated tone (bottom)

Stimuli were controlled via custom software in MAX/MSP (Cycling ’74) and presented through an RME Fireface 400 audio interface and Etymotic ER3-A insert earphones.

2.4.2 EEG Recordings and Pre-processing

EEG data were recorded from the scalp via Ag-AgCl electrodes, using a BioSemi ActiveTwo EEG System. Electrode offsets were kept within ±40 mV. The EEG data were collected in continuous mode at a sampling rate of 16.384 kHz. The -3 dB point of the antialiasing lowpass filter was 3276 Hz.

The EEG recordings were segmented into epochs of -50 to 150 ms, separately for each stimulus type. Artefact rejection was applied to the epochs using the pop_autorej function from the EEGLAB toolbox (Delorme and Makeig 2004) in MATLAB. EFRs were computed by averaging the responses to the positive and negative stimulus polarities (EFR = (Pos + Neg)/2). All subsequent correlational analyses were conducted on the averaged EFR waveforms.
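The polarity-averaging step, EFR = (Pos + Neg)/2, can be illustrated with a toy example. Any response component that inverts with stimulus polarity cancels in the average, whereas a component that follows the stimulus envelope is retained. The waveforms below are made-up sinusoids, not recorded data, and the function name is hypothetical.

```python
import math


def efr_from_polarities(pos, neg):
    """Average the responses to the two stimulus polarities, sample by sample."""
    if len(pos) != len(neg):
        raise ValueError("polarity averages must have the same length")
    return [(p + n) / 2.0 for p, n in zip(pos, neg)]


# Toy illustration with synthetic components (not recorded data).
fs = 16384                          # sampling rate used for the EEG recordings
t = [i / fs for i in range(1638)]   # roughly 100 ms
envelope_part = [0.2 * math.sin(2 * math.pi * 110 * ti) for ti in t]
inverting_part = [0.1 * math.sin(2 * math.pi * 1000 * ti) for ti in t]

# The envelope-following part is identical for both polarities;
# the polarity-sensitive part flips sign.
pos = [e + c for e, c in zip(envelope_part, inverting_part)]
neg = [e - c for e, c in zip(envelope_part, inverting_part)]
efr = efr_from_polarities(pos, neg)  # equals envelope_part up to rounding
```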

The signal-to-noise ratio (SNR) of the EFR responses was calculated as 20 log10(RMSpost/RMSpre), where RMSpost was the root mean square amplitude in the ‘response’ period (defined as 25–75 ms post-stimulus) and RMSpre was the root mean square amplitude in the pre-stimulus period (defined as 50 ms pre-stimulus until stimulus onset at 0 ms). Three participants were excluded from the EEG analysis because their SNRs were below 1.5 dB, owing to movement and other muscle artefacts.
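A minimal sketch of this SNR calculation is given below, assuming each averaged epoch is a list of samples spanning -50 to 150 ms at 16.384 kHz. The helper names and the rounding of window edges to the nearest sample are assumptions made for the example.

```python
import math

FS = 16384               # EEG sampling rate in Hz
EPOCH_START_MS = -50.0   # epoch runs from -50 to 150 ms


def rms(samples):
    """Root mean square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def ms_to_index(t_ms, fs=FS, epoch_start_ms=EPOCH_START_MS):
    """Sample index of time t_ms (ms relative to stimulus onset) within the epoch."""
    return round((t_ms - epoch_start_ms) * fs / 1000.0)


def efr_snr_db(epoch, fs=FS):
    """SNR in dB: 20*log10(RMS of 25-75 ms window / RMS of -50-0 ms window)."""
    pre = epoch[ms_to_index(-50.0, fs):ms_to_index(0.0, fs)]
    post = epoch[ms_to_index(25.0, fs):ms_to_index(75.0, fs)]
    return 20.0 * math.log10(rms(post) / rms(pre))
```

With this definition, a response whose RMS amplitude is ten times that of the pre-stimulus baseline has an SNR of 20 dB, and the 1.5 dB exclusion criterion corresponds to an amplitude ratio of about 1.19.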

Stimulus-to-response cross-correlations (r-values) were generated using the Brainstem Toolbox 2013 (Skoe and Kraus 2010). The maximum cross-correlation value was taken irrespective of the lag at which it occurred. Cross-correlations were performed against the Hilbert envelope of the stimulus, as recorded through the transducer and an artificial ear (GRAS Type 43-AG). All data transformations and statistical tests (correlational analyses) were conducted using MATLAB and Minitab®. Spearman’s rank correlations were used where data were skewed.
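The idea of taking the maximum stimulus-to-response correlation over lags can be sketched as below. This is not the Brainstem Toolbox implementation: it computes a Pearson correlation at each candidate lag, searches only non-negative lags up to a caller-supplied limit, and takes the stimulus envelope as an input rather than deriving it with a Hilbert transform. These choices and the function names are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # correlation is undefined for a constant segment
    return sxy / math.sqrt(sxx * syy)


def max_lagged_correlation(stim_env, response, max_lag):
    """Return (best_r, best_lag) over response delays of 0..max_lag samples."""
    best_r, best_lag = -2.0, 0
    for lag in range(max_lag + 1):
        n = min(len(stim_env), len(response) - lag)
        if n < 2:
            break
        r = pearson_r(stim_env[:n], response[lag:lag + n])
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag
```

Because the stimulus envelopes are periodic, the correlation function has several peaks of similar height, so the lag at which the maximum occurs should be interpreted with care.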

3 Results

3.1 Hearing Thresholds and Age

Hearing sensitivity generally declined slightly with age. The correlation between age and the pure-tone average hearing loss (at 500 Hz, 1 kHz, 2 kHz and 4 kHz) for the best ear was significant, r = 0.43, p = 0.001.

3.2 Correlations: QuickSIN, TFS Sensitivity, Age and Speech in Noise

Figure 3 shows the relationships between age, speech in noise perception, and TFS sensitivity. Both speech in noise scores (r = ‑0.38, p < 0.03) and TFS sensitivity (r = ‑0.57, p < 0.001) were significantly negatively correlated with age. Older participants generally had worse speech scores, and worse TFS sensitivity, with a stronger relationship in the case of TFS sensitivity. Speech in noise performance was moderately and significantly related to TFS sensitivity (r = ‑0.34, p = 0.046). Participants who had good TFS sensitivity generally had good speech in noise scores (good scores are negative).

Fig. 3 Left panel: Speech in noise performance (QuickSIN SNR loss) as a function of age. Note that the direction of the y-axis has been reversed so that better performance is up. Middle panel: Sensitivity to TFS as a function of age. Right panel: Speech in noise performance (QuickSIN) as a function of TFS sensitivity. TFS sensitivity is expressed using a d’ measure. Note that the direction of the y-axis has been reversed so that better performance is up. The grey line in all three panels indicates a least-squares linear regression

3.3 Electrophysiology

Grand average responses for the complex harmonic and modulated tones are shown in Fig. 4. The responses show clear phase-locking to periodicity in the stimuli.

Fig. 4 Grand average (across all participants) brainstem responses for the complex harmonic tone (top panel) and modulated tone (bottom panel). The Hilbert envelope of the stimulus waveform is shown in red

A measure of stimulus encoding strength was generated by calculating the cross-correlation between the stimulus envelope (as measured through the transducer and artificial ear) and the brainstem response. Figure 5 shows this process for one listener.

Fig. 5 An illustration for a single listener of the calculation of the cross-correlation values for the modulated tone (top panels) and complex harmonic tone (bottom panels). The left panels show the brainstem responses (in blue) and the stimulus waveform (red) and its Hilbert envelope (black). Note the stimulus waveform here is not the electrical signal: rather it has been recorded through the transducer and an artificial ear. The right panels show the cross-correlation values between the stimulus envelope and the brainstem response as a function of the lag between them. The maximum of the cross-correlation function was determined for further analysis

3.4 Correlations: Stimulus Encoding Accuracy with Age, TFS Sensitivity and QuickSIN

The maximum cross-correlation value obtained for each participant and stimulus type was correlated with age, TFS sensitivity and speech in noise scores. The top row of Fig. 6 shows that increasing age was associated with decreasing stimulus encoding accuracy, but only for the modulated tone stimulus (r = -0.53, p < 0.001). The middle row of Fig. 6 shows a striking relationship between increased TFS sensitivity and reduced stimulus encoding accuracy for the complex harmonic tone (r = -0.56, p = 0.004). Interestingly, there was no such relationship for the modulated tone. There were also no significant relationships between stimulus encoding accuracy and the speech in noise scores (bottom row).

Fig. 6 Stimulus-to-response cross-correlation values between the complex harmonic (left) and modulated (right) stimuli and brainstem response as a function of age (top row), TFS sensitivity (middle row) and speech in noise scores (bottom row)

3.5 Regression Analysis

In order to determine which psychophysical and/or EEG measure best predicted the speech scores, best subsets regression was performed. The QuickSIN scores were entered as the response variable. Age, four-frequency pure-tone hearing thresholds (in the better ear), TFS sensitivity, and stimulus encoding accuracy for both the complex harmonic and modulated tones were entered as predictors. The regression indicated that QuickSIN scores were best predicted by a combination of all variables except the hearing thresholds (R² = 64.4 %, F(4,26) = 6.32, p = 0.004). A follow-up standard linear regression using the four predictors identified by the best-subsets procedure echoed these results, although TFS sensitivity was not a significant factor. The main model was significant (R² = 41.8 %, F(4,26) = 4.49, p = 0.007), with age accounting for most of the variance in the model (18.1 %), followed by the stimulus encoding accuracy variables, which contributed 13.7 % and 10.0 % for the modulated and complex harmonic tones respectively. When the pure-tone average hearing thresholds were added to the model, they contributed only 2 % of the variance.
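The best-subsets idea can be sketched as an exhaustive search over predictor combinations, each fitted by ordinary least squares. The version below ranks subsets by adjusted R², which is an assumption: Minitab's procedure reports several criteria and the text does not say which one guided the choice. The function names are hypothetical, and the predictors are passed as a dictionary mapping a name to a column of values.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subsets(predictors, y):
    """Return (adjusted R^2, names) for the subset with the highest adjusted R^2."""
    n = len(y)
    names = list(predictors)
    best = None
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            r2 = r_squared([predictors[name] for name in subset], y)
            adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            if best is None or adjusted > best[0]:
                best = (adjusted, subset)
    return best
```

With five candidate predictors, as in this study, the search covers only 31 subsets, so exhaustive evaluation is inexpensive.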

4 Discussion

In summary, we found that in our group of normal-hearing adults spanning a wide age range, speech-in-noise performance was poorer in older participants and in those with lower TFS sensitivity. Higher TFS sensitivity was associated with lower stimulus encoding accuracy for the complex harmonic stimulus, while increasing age was associated with lower stimulus encoding accuracy for the modulated tone stimulus. Surprisingly, we found that better speech in noise understanding was associated with worse stimulus encoding accuracy. Despite this unexpected direction of correlation, the measures did contribute modestly to a regression model predicting the speech in noise scores. The regression analysis found that age and the combination of the two stimulus encoding accuracy measures made roughly equal contributions to the model.

Further work in this area should consider other psychophysical predictors that are known to be associated with speech understanding, such as measures of temporal modulation sensitivity, and EEG measures that more closely match the stimuli used in the psychophysics.