Introduction

Auditory verbal hallucinations (AVH), the experience of hearing a voice in the absence of corresponding acoustic stimulation, represent the most common type of hallucinations in schizophrenia, with an estimated prevalence of 70% (Johns et al., 2004). However, they also occur in approximately 10% of persons without a clinical diagnosis (Maijer, Begemann, Palmen, Leucht, & Sommer, 2018). Studying the experience of hearing voices in nonclinical individuals may provide critical insights into the neural and cognitive mechanisms underpinning AVH, with the advantage of avoiding the potential confounds associated with medication and hospitalization in psychotic patients (Badcock & Hugdahl, 2012).

Similarities in the cognitive (Brébion et al., 2016; Larøi, van der Linden, & Marczewski, 2004a) and neural (Diederen et al., 2012; Linden et al., 2010) mechanisms underpinning AVH in psychotic and nonclinical participants have been identified, supporting the notion of a psychosis continuum, i.e., that psychotic-like experiences are distributed in the general population along a continuum of severity (Baumeister, Sedgwick, Howes, & Peters, 2017; van Os, Hanssen, Bijl, & Ravelli, 2000; van Os, Linscott, Myin-Germeys, Delespaul, & Krabbendam, 2009). For instance, both psychotic and nonclinical persons reporting AVH are more likely than healthy controls to misattribute inner speech to an external source (Brébion et al., 2016; Larøi et al., 2004a). Phenomenological similarities also have been noted, namely in the perceived location (inside the head), loudness (less intense than one’s own voice), and source (external) of AVH (Daalman et al., 2011). However, compared with psychotic patients with AVH, nonclinical voice hearers tend to perceive their hallucinatory experiences as more controllable (Choong, Hunter, & Woodruff, 2007; Daalman et al., 2011; de Leede-Smith & Barkus, 2013), as well as less frequent (Daalman et al., 2011; Larøi, 2012; Larøi & van der Linden, 2005). AVH also may differ in verbal content, which tends to be more negative in psychotic patients (e.g., commands or criticisms about what the patient is doing; Larøi et al., 2012; Nayani & David, 1996) than in nonclinical voice hearers (e.g., hearing a voice calling a person’s name when no one is there; de Leede-Smith & Barkus, 2013). From a dimensional perspective of psychotic-like experiences (van Os, Hanssen, Bijl, & Vollebergh, 2001; van Os & Linscott, 2012), the experience of AVH in the general population without need for clinical care has typically been examined by testing: 1) individuals with AVH proneness, who experience brief and infrequent AVH that do not affect their daily functioning; and 2) nonclinical voice hearers, who experience frequent AVH of longer duration, often associated with other psychotic-like and mood symptoms (Johns et al., 2014). While the first subgroup is typically assessed using general hallucination-proneness measures (e.g., the Launay-Slade Hallucination Scale [LSHS]; Bradbury, Stirling, Cavill, & Parker, 2009; Morrison, Wells, & Nothard, 2002), screening interviews targeting the specific experience of hearing voices often are used to identify the second subgroup (Sommer et al., 2010).

Voices (including hallucinated voices) carry not only verbal information but also critical nonverbal information about the speaker, such as their identity and emotional state (Belin, Bestelmeyer, Latinus, & Watson, 2011). Neuroimaging evidence suggests that these three types of vocal information are processed in partially dissociated cortical regions (Belin et al., 2011; Belin, Fecteau, & Bédard, 2004). In schizophrenia patients reporting positive symptoms, behavioral and brain changes were found in the processing of speech information (Kuperberg, West, Lakshmanan, & Goff, 2008), voice identity (Alba-Ferrara, Weis, Damjanovic, Rowett, & Hausmann, 2012; Pinheiro, Rezaii, Rauber, & Niznikiewicz, 2016b), and vocal emotions (Alba-Ferrara, Fernyhough, Weis, Mitchell, & Hausmann, 2012a; Giannitelli et al., 2015; Pinheiro et al., 2013, 2014; Rossell & Boundy, 2005; Weisgerber et al., 2015). Specifically, schizophrenia patients with AVH were found to be less accurate in the recognition of negative relative to positive vocalizations (Rossell & Boundy, 2005) and showed reduced activation of the amygdala and hippocampus when listening to cries compared with laughs (Kang et al., 2009).

Vocal emotional perception in AVH along the psychosis continuum

Alterations in the processing of vocal emotions are recognized as an important feature of schizophrenia (Bozikas et al., 2006; Edwards, Pattison, Jackson, & Wales, 2001; Hooker & Park, 2002; Leitman et al., 2007; Pinheiro et al., 2013, 2014), being observed before illness onset (Addington et al., 2012; Amminger, Schäfer, Klier, Schlögelhofer, Mossaheb, Thompson, & Nelson, 2012; Amminger, Schäfer, Papageorgiou, Klier, Schlögelhofer, Mossaheb, & McGorry, 2012) and aggravated in patients with AVH (Alba-Ferrara, Fernyhough, et al., 2012a; Rossell & Boundy, 2005; Shea et al., 2007). Compared with nonhallucinating patients, psychotic patients reporting AVH were found to be less accurate at recognizing emotional prosodic cues (Alba-Ferrara, Fernyhough, et al., 2012a; Shea et al., 2007) and less accurate in decoding emotions in nonverbal vocalizations (Rossell & Boundy, 2005). Nonetheless, no significant differences were observed between patients with versus without AVH in the recognition of emotions from speech prosody (Rossell & Boundy, 2005). Semantic processing deficits (Rossell & Boundy, 2005) may have masked symptom-specific differences in vocal emotional processing. Together, these findings suggest that schizophrenia patients who are less able to recognize emotions from both speech prosody and nonverbal vocalizations are more likely to experience AVH. Deficits in vocal emotional processing could contribute to the inner speech misattribution that is thought to subserve AVH (Alba-Ferrara, Fernyhough, et al., 2012a). Note, however, that only a few studies have examined whether and how the occurrence of positive-like symptoms (i.e., hallucinations and/or delusions) in nonclinical individuals affects the processing of vocal emotions (Addington et al., 2012; Amminger et al., 2012a, 2012b; Pinheiro, Farinha-Fernandes, Roberto, & Kotz, 2019). Findings from these behavioral studies have been mixed, with some reporting altered (Addington et al., 2012; Amminger et al., 2012a, 2012b) and others preserved (Pinheiro et al., 2019) vocal emotional recognition.

Vocal emotional perception: insights from event-related potentials

Studies probing event-related potentials (ERPs) derived from the electroencephalogram (EEG) support the notion that the processing of vocal emotions involves three distinct, but interactive, stages (Paulmann & Kotz, 2008a, 2008b; Schirmer & Kotz, 2006). After the sensory processing of the voice signal (reflected in the N1), the detection of its emotional salience takes place (reflected in the P2). Higher-order processes, such as the cognitive evaluation of the emotional significance of the voice, are typically reflected in later components, such as the Late Positive Potential (LPP; Thierry & Roberts, 2007). An early differentiation between neutral and emotional vocal cues is reflected in N1 and P2 amplitude modulations (Liu et al., 2012; Paulmann, Bleichner, & Kotz, 2013; Sauter & Eimer, 2009). Decreased N1 (Liu et al., 2012) and enhanced P2 (Sauter & Eimer, 2009) amplitudes for emotional relative to neutral vocal sounds have been reported, suggesting that the emotional content of a stimulus facilitates acoustic sensory processing and salience detection, respectively. Furthermore, emotional vocal cues tend to elicit larger LPP amplitudes than neutral voices (Pell et al., 2015; Pinheiro et al., 2016a), reflecting enhanced sustained attention towards the processing of emotionally relevant information (Hajcak, MacNamara, Foti, Ferri, & Keil, 2013; Schupp et al., 2000).

Vocal emotions are expressed through the combination of different types of acoustic features, such as fundamental frequency (F0) or pitch, intensity, and duration (Schirmer & Kotz, 2006). Of note, acoustic changes may contribute to the enhanced salience of emotional voices. Using an oddball task, Schirmer, Simpson, and Escoffier (2007) demonstrated an amplitude increase of the mismatch negativity (MMN; an index of automatic deviance detection – Näätänen, Pakarinen, Rinne, & Takegata, 2004) and of the P3a (an index of attention orienting – Duncan et al., 2009) in response to vocal emotions characterized by high relative to low intensity, indicating that loud emotional voices were acoustically more salient. Moreover, irrespective of their intensity, vocal emotional sounds were more easily detected and captured more attention than nonvocal sounds, which confirms the primacy of social sounds in the auditory system (Schirmer et al., 2007).

Stimulus duration also was found to modulate ERP signatures of vocal emotional processing. For instance, Chang, Zhang, Zhang, and Sun (2018) observed decreased N1 amplitude in response to vocal emotions of short and long (vs. medium) duration, as well as increased P2 amplitude in response to vocal emotions of short (vs. medium and long) duration. Furthermore, the N1 was decreased in response to vocalizations expressing anger, sadness, and surprise relative to happiness in the short duration condition, whereas the P2 was increased in response to happy and angry relative to surprised voices, especially when their duration was shorter (Chang et al., 2018). These findings indicate that sound duration also may affect the sensory processing and the automatically perceived salience of vocal emotions, both of which may be facilitated for vocal cues of short duration (Chang et al., 2018).

Altered bottom-up and top-down processing of vocal emotions has been consistently reported in schizophrenia patients with AVH (Alba-Ferrara, Fernyhough, et al., 2012a; Leitman et al., 2005; Pinheiro et al., 2013, 2014; Rossell & Boundy, 2005; Shea et al., 2007). Consistent with the hypothesized psychosis continuum, it is likely that high hallucination proneness (HP) is associated with changes in the processing of vocal emotions at both early sensory (N1 and P2) and higher-order cognitive (LPP) stages. To the best of our knowledge, no study has examined how HP affects the time course of vocal emotional processing reflected in distinct ERP indices. Nevertheless, behavioral studies probing vocal emotional processing in individuals at genetic (Tucker, Farhall, Thomas, Groot, & Rossell, 2013) and clinical (Addington et al., 2012; Amminger et al., 2012a, 2012b) high risk of converting to psychosis and reporting positive-like symptoms (hallucinations and/or delusions) revealed alterations similar to those observed in schizophrenia patients with AVH. Specifically, Tucker et al. (2013) demonstrated that first-degree relatives of schizophrenia patients with AVH made significantly more errors in the discrimination of the intensity and duration of pure tones compared with healthy controls. Furthermore, pure-tone discrimination in these participants was associated with vocal emotional recognition accuracy: the number of errors in intensity and pitch discrimination was negatively correlated with recognition accuracy of vocal emotions (Tucker et al., 2013). Auditory processing deficits in relatives of schizophrenia patients with AVH, which were associated with reduced processing speed of vocal emotional cues, also were found to predict AVH proneness: the more prominent the auditory processing deficits, the more likely nonclinical AVH were to occur (Tucker et al., 2013). Reduced accuracy in vocal emotional recognition also was reported in nonclinical participants experiencing psychotic-like symptoms in three additional behavioral studies (Addington et al., 2012; Amminger et al., 2012a, 2012b). Using words with emotional semantic content, van’t Wout and colleagues (2004) showed that the frequency of hallucinations in nonclinical individuals with high HP was positively associated with the time needed to process neutral (target) words when preceded by positive or negative emotional (prime) words. Hence, it is crucial to clarify whether AVH proneness is related to changes in the perception of vocal emotions and whether these putative changes are similar to those observed in schizophrenia patients. Because ERPs are measured before a behavioral response is made and afford excellent temporal resolution, they are well suited to probe the sensory and cognitive processes under study.

Current Study and Hypotheses

We probed whether the processing of emotional vocal cues is altered as a function of increased HP and whether these potential changes are associated with specific acoustic cues of vocal emotions, namely intensity and duration. Manipulations of these cues may change the acoustic salience of vocal emotions (Schirmer et al., 2007). Nonverbal vocalizations were selected to avoid potential biases related to the concurrent processing of semantic information and because they represent more primitive expressions of emotions compared with speech prosody (Pell et al., 2015). ERP data were expected to provide insights into three stages of vocal emotional processing, indexed by the N1, P2, and LPP.

Consistent with continuum models of psychosis (Baumeister et al., 2017; van Os et al., 2000; van Os et al., 2009), increased HP was expected to be associated with alterations in both early (N1, P2) and late (LPP) stages of vocal emotional processing. The hypothesized association between HP and altered perception of vocal emotions was further expected to be modulated by stimulus valence and acoustic parameters that were shown to increase emotional arousal (increased stimulus intensity – Schirmer et al., 2007) and facilitate emotion decoding (increased stimulus duration – Castiajo & Pinheiro, 2019).

Specifically, we hypothesized that increased HP would be associated with larger P2 amplitudes to positive vocalizations. This hypothesis was grounded in previous evidence showing selective changes in salience detection of positive vocal cues in psychotic patients (Pinheiro et al., 2013, 2014). We also hypothesized that high HP would be associated with increased LPP amplitude to negative vocalizations, irrespective of acoustic manipulations. This hypothesis was based on previous studies with schizophrenia patients revealing that AVH are associated with enhanced sustained attention to negative voices (Alba-Ferrara, Fernyhough, et al., 2012a).

Finally, considering that altered vocal emotional processing in psychotic patients with AVH has been specifically related to changes in duration discrimination (Fisher et al., 2011; Fisher, Labelle, & Knott, 2008), we expected that increased HP would be associated with more pronounced N1 and P2 alterations (i.e., increased and decreased amplitude, respectively) when stimulus duration is manipulated.

Method

Participants

In the first stage of the study, a large sample of college students from different Portuguese universities (N = 354) was enrolled in a study designed to adapt the Launay-Slade Hallucination Scale-Revised (LSHS – Larøi & van der Linden, 2005) for the Portuguese population (Castiajo & Pinheiro, 2017). The LSHS Portuguese version includes 16 items that tap into distinct forms of hallucinations (auditory, visual, olfactory, tactile, hypnagogic, and hypnopompic). The overall score ranges from 0 to 64; higher scores indicate higher HP. This scale has been widely used to probe nonclinical hallucinatory experiences (Larøi, Marczewski, & van der Linden, 2004b; Larøi & van der Linden, 2005; Morrison et al., 2002; Morrison, Wells, & Nothard, 2000; Waters, Badcock, & Maybery, 2003). The Portuguese version of the LSHS has shown adequate psychometric properties (Castiajo & Pinheiro, 2017).
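For illustration, the scoring scheme reduces to a simple item sum. A minimal sketch in R, assuming each of the 16 items is rated on a 0-4 scale (implied by the 0-64 range of the total score); the data frame and column names are hypothetical:

```r
# `lshs` is a hypothetical data frame with columns item1..item16,
# each item rated 0-4 (implied by the reported 0-64 total range)
lshs_total <- rowSums(lshs[, paste0("item", 1:16)])
stopifnot(all(lshs_total >= 0, lshs_total <= 64))  # range sanity check
```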

In the second stage, and after being screened via phone to ensure eligibility, 45 participants who initially took part in the LSHS validation study were recruited for an ERP experiment on the basis of their total LSHS scores. Additional inclusion criteria were: 1) right-handedness (Oldfield, 1971); 2) European Portuguese as first language; 3) no hearing or vision impairment; 4) no history of neurological illness; 5) no history of drug or alcohol abuse in the past year (APA, 2000); and 6) no current use of medication for medical disorders that could affect EEG morphology. All participants were screened for psychopathological symptoms and schizotypal traits with the Brief Symptom Inventory (BSI – Derogatis & Spencer, 1982; Portuguese version – Canavarro, 1999) and the Schizotypal Personality Questionnaire (SPQ – Raine, 1991; Portuguese version – Santos, 2011), respectively.

Of the 45 eligible participants, 10 declined to participate for scheduling reasons and 2 were excluded due to excessive EEG artifacts. The final sample comprised 33 participants varying in their LSHS total scores (M = 22.15, SD = 11.70, range 4-47 points; Table 1) who met the inclusion criteria and completed the clinical and ERP assessments (mean age = 25.27, SD = 5.87 years, age range 18-42 years; mean education level = 15.21, SD = 2.39 years, education range 12-21 years; 25 females). Table 2 shows the prevalence, frequency of occurrence, perceived degree of control, and emotional content of the hallucinatory experiences for each type of hallucination measured by the LSHS.

Table 1 Frequency distribution of the LSHS total scores
Table 2 Proportion of different types of hallucinatory experiences (LSHS) according to prevalence, frequency of occurrence, perceived degree of control, and affective content

In this sample, the LSHS total score was positively correlated with the SPQ total score (r = 0.665, p < 0.001) and with the BSI positive symptom distress index (r = 0.453, p = 0.008), indicating good convergent validity with other self-report clinical measures. Participants who reported AVH were further screened with the Psychotic Symptom Rating Scale (PSYRATS – Haddock, McCarron, Tarrier, & Faragher, 1999; Portuguese version – Telles-Correia et al., 2017) to better understand the phenomenological characteristics of their experiences. Voice-hearing experiences were predominantly described as neither unpleasant nor distressing. Participants provided written informed consent and received vouchers or course credit for their participation. The experiment was approved by a local Ethics Committee (University of Minho, Braga, Portugal).
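These convergent-validity checks correspond to simple Pearson correlations. A sketch in R, with hypothetical variable and column names (not the authors' code):

```r
# `dat` is a hypothetical data frame with one row per participant
cor.test(dat$lshs_total, dat$spq_total)  # reported: r = 0.665, p < 0.001
cor.test(dat$lshs_total, dat$bsi_psdi)   # reported: r = 0.453, p = 0.008
```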

Stimuli

Thirty nonverbal vocalizations (15 from female and 15 from male speakers) expressing anger (growls; n = 10), amusement (laughter; n = 10), and neutral content (n = 10) were selected from the Montreal Affective Voices battery (MAV, Belin, Fillion-Bilodeau, & Gosselin, 2008). The selected MAV stimuli were acoustically manipulated. First, their duration was equalized to 700 milliseconds (ms) (Castiajo & Pinheiro, 2019). Then, a shorter (500 ms) version of each type of vocalization was created. To test whether the manipulated vocal samples still conveyed the intended emotions, they were judged by a sample of participants who did not take part in the EEG experiment (N = 52; mean age = 23.42, SD = 7.80 years, age range 18-49 years; 27 females – see Castiajo & Pinheiro, 2019). The overall mean recognition accuracy (proportion of correct responses) for the three types of vocalizations was 0.80 in the 500-ms condition (anger – 0.53; amusement – 0.97; neutral – 0.91), and 0.84 in the 700-ms condition (anger – 0.62; amusement – 0.98; neutral – 0.92). The MAV vocal samples were also manipulated in terms of intensity (55 vs. 75 dB). Therefore, ten exemplars of each type of vocalization in all four acoustic conditions (short-soft, short-loud, long-soft, long-loud) were used as stimuli in the current experiment (see Supplementary Material for examples). The manipulation of stimulus duration and intensity was performed with Praat software (Boersma & Weenink, 2005, www.praat.org).
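For illustration, the 20-dB intensity step has a simple amplitude interpretation: a 75 vs. 55 dB difference corresponds to a factor of 10^(20/20) = 10 in linear amplitude. The sketch below shows this attenuation step only, in R with the tuneR package; the actual manipulations (including the pitch-preserving duration changes) were performed in Praat, and the file names are hypothetical:

```r
library(tuneR)  # WAV input/output

# A 20-dB level drop equals an amplitude factor of 10^(-20/20) = 0.1
loud <- readWave("vocalization_700ms_loud.wav")  # hypothetical file name
soft <- loud
soft@left <- as.integer(round(loud@left * 10^(-20 / 20)))  # assumes a mono file
writeWave(soft, "vocalization_700ms_soft.wav")
```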

Procedure

Participants were seated comfortably at a distance of 100 cm from a computer screen in a sound- and light-attenuating chamber, with a keyboard in front of them. The experimental session included three blocks, each comprising the ten exemplars of each vocalization type in each of the four acoustic conditions, presented in random order (3 types of vocalizations × 10 exemplars × 4 acoustic conditions = 120 trials per block). Stimuli were presented binaurally through Sennheiser CX 300-II headphones. Presentation software (Version 18.3, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com) was used to control stimulus presentation and timing, as well as to record each participant’s responses.

Figure 1 illustrates the design of an experimental trial. Participants were asked to categorize each vocalization according to its emotional quality by pressing one of three keys (negative, neutral, positive). During each block, a break was given after 60 trials, and no feedback was provided. The experimental session lasted approximately 40 minutes.

Fig. 1 Schematic of an experimental trial. Note: Vocalizations were presented binaurally. Growls and laughs represented exemplars of negative and positive vocalizations, respectively. The order of the experimental blocks was counterbalanced across participants

EEG data acquisition and analysis

The EEG was recorded with a 64-channel BioSemi ActiveTwo system (http://www.biosemi.com/products.htm) at a sampling rate of 512 Hz. Reference electrodes were placed on the left and right mastoids. In addition, eye blinks and movements were monitored through two electrodes placed at the left and right temples (horizontal electrooculogram [HEOG]) and one placed below the left eye (vertical electrooculogram [VEOG]). The offset of all electrodes was kept below 30 mV.

EEG data were analyzed using Brain Vision Analyzer 2.0.4 software (www.brainproducts.com). The signal was filtered offline with a 0.1 to 30 Hz second-order Butterworth bandpass filter and then re-referenced to the average of the left and right mastoids. Individual ERP epochs were created for each vocalization type in each condition, with a −150 ms pre-stimulus baseline and a 700 ms post-stimulus window. After baseline correction from −150 to 0 ms, eye blinks were corrected using the procedure of Gratton, Coles, and Donchin (1983). Single-trial epochs containing excessive motor artifacts (±100 μV criterion) were excluded from the ERP averages. For each participant, ERP averages included at least 70% of the trials per condition. The number of discarded epochs did not differ between conditions (p > 0.05).
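Although preprocessing was carried out in Brain Vision Analyzer, the epoching, baseline-correction, and artifact-rejection steps can be sketched in base R; `eeg`, `events`, and all other names below are assumptions, not the authors' pipeline:

```r
fs   <- 512                     # sampling rate (Hz)
pre  <- round(0.150 * fs)       # samples in the 150-ms baseline
post <- round(0.700 * fs)       # samples in the 700-ms post-stimulus window

# `eeg`: channels x samples matrix (in microvolts), already filtered
# (0.1-30 Hz) and re-referenced; `events`: stimulus-onset sample indices
epoch_one <- function(eeg, onset) {
  ep <- eeg[, (onset - pre + 1):(onset + post), drop = FALSE]
  ep - rowMeans(ep[, 1:pre, drop = FALSE])  # per-channel baseline correction
}

keep <- function(ep) max(abs(ep)) <= 100    # +/-100 uV rejection criterion

epochs <- Filter(keep, lapply(events, function(on) epoch_one(eeg, on)))
erp    <- Reduce(`+`, epochs) / length(epochs)  # per-condition ERP average
```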

After a careful visual inspection of grand average waveforms, three ERP components were selected for statistical analyses: N1, P2, and LPP. The amplitude of each component was measured as the mean voltage in the following latency windows, following prior studies (Pell et al., 2015; Pinheiro et al., 2014, 2016a, 2017c): 100-200 ms (N1), 200-300 ms (P2), and 500-700 ms (LPP). Mean amplitudes for each component of interest were measured at four regions-of-interest (ROIs): ROI1 (left fronto-central): F1, F3, FC1, FC3; ROI2 (right fronto-central): F2, F4, FC2, FC4; ROI3 (left centro-parietal): C1, C3, CP1, CP3; ROI4 (right centro-parietal): C2, C4, CP2, CP4.
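A minimal sketch of this measurement step, assuming `erp` is a channels × samples average (as above) with channel labels as row names:

```r
# Latency windows (s) and ROI electrode sets, as defined in the text
windows <- list(N1 = c(0.100, 0.200), P2 = c(0.200, 0.300),
                LPP = c(0.500, 0.700))
rois <- list(ROI1 = c("F1", "F3", "FC1", "FC3"),  # left fronto-central
             ROI2 = c("F2", "F4", "FC2", "FC4"),  # right fronto-central
             ROI3 = c("C1", "C3", "CP1", "CP3"),  # left centro-parietal
             ROI4 = c("C2", "C4", "CP2", "CP4"))  # right centro-parietal

times <- seq(-0.150, by = 1 / 512, length.out = ncol(erp))  # time axis (s)

# Mean voltage over one latency window and one ROI
mean_amp <- function(erp, win, chans)
  mean(erp[chans, times >= win[1] & times <= win[2]])

n1_left_cp <- mean_amp(erp, windows$N1, rois$ROI3)  # e.g., N1 at ROI3
```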

Statistical analyses

ERP and accuracy data were separately analyzed with linear mixed-effects models using the lme4 (Bates, Maechler, Bolker, & Walker, 2014) and lmerTest (Kuznetsova, Brockhoff, & Christensen, 2017) packages in the R environment (R version 3.4.3, GUI 1.70). HP was calculated as the sum of the 16 LSHS items (i.e., the LSHS total score). Rather than splitting participants into high and low HP subgroups, a continuous variable (LSHS total score) was used in the statistical models to avoid issues associated with dichotomization, including loss of statistical power and biased estimates (Altman & Royston, 2006; Selvin, 1987). Mixed-effects models were chosen because they have benefits over traditional statistical methods (i.e., analysis of variance). Specifically, they consider both fixed and random effects (participants’ intra-individual variability) and reduce the potential occurrence of spurious effects (Jaeger, 2008). Due to their robustness and efficient estimation, mixed-effects models have been proposed as alternatives to traditional methods for statistically analyzing ERP data in repeated-measures designs (Baayen, Davidson, & Bates, 2008; Bagiella, Sloan, & Heitjan, 2000). Because the best procedure to correct for multiple testing in mixed-effects modeling remains to be determined (Joo, Hormozdiari, Han, & Eskin, 2016), and because the current study was driven by a priori hypotheses (Streiner & Norman, 2011), no adjustment for multiple comparisons was applied. Notwithstanding, we provide confidence intervals, which have been proposed as an alternative to traditional methods of correcting for multiple comparisons (Nakagawa, 2004; Thompson, 2002). Only statistically significant findings (p < 0.05) are reported.

ERP data

The hypothesis that HP would affect the processing of vocal emotions at early and later stages was tested with three mixed-effects models, one per ERP component, each including amplitude as outcome, participants as random effects, and HP (LSHS total score), intensity (loud [75 dB], soft [55 dB]), valence (neutral, positive, negative), duration (long [700 ms], short [500 ms]), and ROI (left fronto-central, right fronto-central, left centro-parietal, right centro-parietal) as fixed effects.
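One plausible lme4 specification of such a model is sketched below; the exact interaction structure of the fixed effects, and all data-frame and variable names, are assumptions rather than the authors' code:

```r
library(lme4)
library(lmerTest)  # Satterthwaite-based p values for lmer fits

# `n1_data`: long-format data frame, one row per participant x condition
# x ROI cell; `HP` is the continuous LSHS total score
m_n1 <- lmer(amplitude ~ HP * valence * intensity * duration + roi +
               (1 | participant), data = n1_data)
summary(m_n1)                   # fixed-effect estimates, SEs, t and p values
confint(m_n1, method = "Wald")  # 95% CIs, as reported with each estimate
```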

Behavioral data

To specify the effects of HP on the recognition accuracy of vocal emotions (proportion of correct responses), a linear mixed-effects model was tested. Specifically, recognition accuracy was included as outcome, participants as random effects, and HP (LSHS total score), intensity (loud [75 dB], soft [55 dB]), valence (neutral, positive, negative), and duration (long [700 ms], short [500 ms]) as fixed effects.
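The analogous accuracy model, under the same naming assumptions:

```r
# `acc_data`: one row per participant x valence x intensity x duration cell,
# with `accuracy` as the proportion of correct responses
m_acc <- lmer(accuracy ~ HP * valence * intensity * duration +
                (1 | participant), data = acc_data)
summary(m_acc)
```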

Results

ERP data

Figure 2 illustrates grand average waveforms as a function of emotion and manipulations of intensity and duration. Mean amplitudes for each emotion type and acoustic condition are presented in Figure 3. Scatterplots in Figure 4 show the mean N1, P2, and LPP amplitudes for each participant as a function of HP.

Fig. 2 Grand average waveforms contrasting neutral, negative, and positive vocalizations under intensity and duration manipulations at C3 and C4 electrodes. Topographic maps show the spatial distribution of the N1, P2, and LPP effects in the total sample (N = 33). Neu = neutral; Neg = negative; Pos = positive; LPP = Late Positive Potential

Fig. 3 Amplitude differences between neutral and emotional vocalizations in each acoustic condition based on duration and intensity manipulations. Bars represent mean amplitudes over left centro-parietal electrodes (ROI3) in the case of the N1 and LPP, and over right centro-parietal electrodes (ROI4) in the case of the P2. Error bars represent the standard error (SE) of the mean. Amp = amplitude

Fig. 4 Mean amplitude at centro-parietal electrodes as a function of hallucination proneness across emotion types (neutral, negative, positive) and acoustic manipulations (shorter/softer: 500 ms – 55 dB; shorter/louder: 500 ms – 75 dB; longer/softer: 700 ms – 55 dB; longer/louder: 700 ms – 75 dB). HP = hallucination proneness; Amp = amplitude

N1

N1 amplitude was modulated by stimulus intensity and valence, but not by stimulus duration (p > 0.05). The N1 was decreased in response to loud compared with soft vocalizations (β = 1.915, SE = 0.398, t(1551) = 4.801, p < 0.001, 95% confidence interval [CI]: [1.132, 2.697]), as well as in response to both negative (β = 0.847, SE = 0.398, t(1551) = 2.126, p = 0.033, 95% CI: [0.065, 1.630]) and positive (β = 1.403, SE = 0.398, t(1551) = 3.519, p < 0.001, 95% CI: [0.621, 2.185]) compared with neutral vocalizations. Intensity interacted with valence: loud vocalizations elicited a less negative N1 response when they had a neutral compared with positive quality (β = −2.311, SE = 0.564, t(1551) = −4.098, p < 0.001, 95% CI: [−3.418, −1.205]).

HP interacted with valence, intensity, and duration: an increase in HP was associated with a decreased N1 in response to neutral compared to positive vocalizations in the high intensity (loud) and long duration conditions (β = −0.081, SE = 0.031, t(1551) = −2.556, p = 0.011, 95% CI: [−0.143, −0.018]).

P2

P2 amplitude was modulated by valence and intensity: the P2 was increased (i.e., more positive) in response to positive compared to neutral vocalizations (β = 1.185, SE = 0.461, t(1551) = 2.569, p = 0.010, 95% CI: [0.280, 2.090]) and to loud compared to soft vocalizations (β = 2.132, SE = 0.461, t(1551) = 4.621, p < 0.001, 95% CI: [1.227, 3.037]). Valence interacted with intensity and duration: an increase in P2 amplitude was observed in response to positive compared with neutral vocalizations in the high intensity and long duration conditions (β = 2.202, SE = 0.922, t(1551) = 2.387, p = 0.017, 95% CI: [0.392, 4.012]).

HP interacted with valence, intensity, and duration: an increase in HP was associated with a less positive P2 for positive compared with neutral vocalizations when they were acoustically more salient, i.e., in the high intensity and long duration conditions (β = −0.117, SE = 0.036, t(1551) = −3.193, p = 0.001, 95% CI: [−0.189, −0.045]).

LPP

LPP amplitude was modulated by valence and intensity: the LPP was increased (i.e., more positive) in response to negative compared to neutral vocalizations (β = 2.778, SE = 0.645, t(1551) = 3.318, p < 0.001, 95% CI: [1.513, 4.044]), and in response to loud compared to soft vocalizations (β = 2.140, SE = 0.645, t(1551) = 3.318, p < 0.001, 95% CI: [0.875, 3.405]). Intensity interacted with valence and duration: the LPP was increased in response to loud negative vocalizations with a longer duration compared to neutral vocalizations (β = 3.980, SE = 1.290, t(1551) = 3.085, p = 0.002, 95% CI: [1.450, 6.511]).

HP modulated the interaction between duration and valence: higher HP was associated with an increased LPP for negative compared with neutral vocalizations when they had a longer duration (β = 0.097, SE = 0.036, t(1551) = 2.675, p = 0.008, 95% CI: [0.025, 0.168]). HP also modulated the interaction between intensity and valence: higher HP was associated with an increased LPP for loud negative compared to neutral vocalizations (β = 0.074, SE = 0.036, t(1551) = 2.033, p = 0.042, 95% CI: [0.002, 0.145]).

Behavioral data

Recognition accuracy was modulated by stimulus valence: negative vocalizations were less accurately recognized than neutral vocalizations (β = −0.087, SE = 0.033, t(363) = −2.595, p = 0.010, 95% CI: [−0.154, −0.021]; Table 3). Recognition accuracy was not affected by stimulus intensity, duration, or individual differences in HP (p > 0.05).

Table 3 Proportion of correct responses for each type of vocalization and acoustic condition

Discussion

Alterations in vocal emotional perception were found to be associated with AVH in psychotic patients (Alba-Ferrara, Fernyhough, et al., 2012a; Rossell & Boundy, 2005; Shea et al., 2007). However, whether similar changes are observed in nonclinical participants with high HP remained to be clarified. Using ERPs, the current study demonstrates that both early and later stages of vocal emotional processing are affected by HP. Furthermore, it provides preliminary evidence for a link between abnormal perception of vocal emotions and hallucination proneness, consistent with the hypothesis of a psychosis continuum (Baumeister et al., 2017; van Os et al., 2000; van Os et al., 2009).

Sensory processing of vocal emotions (N1) as a function of hallucination proneness

The auditory N1 indexes the sensory processing of the stimulus (Näätänen & Picton, 1987) and the allocation of resources to form and maintain a sensory memory trace of the eliciting stimulus (Obleser & Kotz, 2011). In good agreement with previous evidence (Liu et al., 2012), the current study revealed that positive (amusement) and negative (anger) vocalizations elicited a decreased N1 response compared with neutral vocalizations, suggesting that auditory sensory information is more easily processed when it has an emotional quality (Jessen & Kotz, 2011; Paulmann, Jessen, & Kotz, 2009). Studies using pure tones have shown that an increase in sound intensity (>70 dB) results in decreased N1 amplitude. Our results are consistent with this finding: higher sound intensity was associated with a less negative N1 amplitude, irrespective of valence. However, we found a decreased N1 response to neutral relative to positive vocalizations in the high intensity condition, which indicates that valence modulates the effects of sound intensity on the N1, plausibly via arousal and its effects on attention (Lithari et al., 2010). Listeners tend to associate an increase in stimulus intensity with an increase in emotional arousal: for example, sound intensity correlates with distance and can signal whether danger is approaching (Schirmer et al., 2007). Arousal and attention effects have been reflected in an increased N1 response (Coull, 1998). The current finding may indicate that the sensory processing of loud emotional (positive) vocalizations is facilitated relative to the sensory processing of loud neutral vocalizations, possibly due to increased automatic attention. No significant changes in N1 amplitude were observed in response to duration manipulations, revealing that the N1 was mainly modulated by intensity changes.

Consistent with our hypothesis, the N1 also was affected by HP. An increase in HP was associated with a decreased N1 amplitude in response to neutral relative to positive vocalizations, but only in the high intensity and long duration conditions, i.e., when sounds were physically more salient. In other words, more salient acoustic information was needed to normalize the N1 response to positive vocalizations as a function of increased HP. Similarly, alterations in the sensory processing of vocal emotions have been reported in psychotic patients with AVH (Fisher et al., 2008; Fisher et al., 2011).

Detection of the emotional salience of the voice (P2) as a function of hallucination proneness

In agreement with previous evidence (Liu et al., 2012; Sauter & Eimer, 2009), we observed a more positive P2 response to emotional relative to neutral vocalizations. This effect occurred specifically for positive vocalizations, supporting the social relevance of these signals (Pell et al., 2015; Pinheiro, Barros, Dias, & Kotz, 2017a; Pinheiro, Barros, Vasconcelos, Obermeier, & Kotz, 2017b). The P2 increase to positive vocalizations was further enhanced when the voice was acoustically more salient, i.e., in the high intensity and long duration conditions, which may have facilitated emotional salience detection.

We also observed that increased HP was associated with a reversed P2 effect: loud and long positive vocalizations elicited a decreased P2 amplitude relative to loud and long neutral vocalizations. This finding suggests that HP is associated with altered emotional salience detection in voices, particularly for positive vocal cues. Positive vocalizations could elicit decreased attention, which might affect how emotional salience is automatically perceived. Previous studies with psychotic patients demonstrated selective changes in salience detection from positive vocal cues (i.e., increased P2 amplitude to happy prosody – Pinheiro et al., 2013, 2014). The current findings suggest that, as in psychotic patients, the early stages of vocal emotional processing might be particularly affected during the perception of positive vocalizations in nonclinical persons with high HP.

Cognitive evaluation of the emotional significance of the voice (LPP) as a function of hallucination proneness

Consistent with previous studies (Pell et al., 2015), we found that LPP amplitude was increased for negative (anger) compared to neutral vocalizations, irrespective of stimulus intensity and duration. In healthy participants, increased sustained attention to angry voices, reflected in larger LPP amplitudes, has been shown to indicate preferential processing of potentially threatening cues (Frühholz & Grandjean, 2012). The current finding reveals that negative vocalizations were associated with enhanced sustained attention and required increased elaborative processing relative to neutral vocalizations, corroborating the adaptive function of this bias (Pell et al., 2015).

HP also affected the cognitive evaluation of the emotional significance of the voice. Typically, emotional vocalizations elicit an increased LPP amplitude compared to neutral vocal cues (Pell et al., 2015; Pinheiro et al., 2016a). In the current study, listeners with high HP benefited from cues that were acoustically more salient: the LPP was enhanced for negative relative to neutral vocalizations in the high intensity and long duration conditions. Studies with schizophrenia patients with AVH have shown enhanced sustained attention to negative vocal cues even when the acoustic properties of the voice were not manipulated (Alba-Ferrara, Fernyhough, et al., 2012a). The current study, however, reveals that a negativity bias in voice perception (reflected in an increased LPP for negative relative to neutral voices) in participants with high HP is only observed when voices are acoustically more salient. Given that HP did not modulate the recognition accuracy of negative vocalizations, which achieved high recognition accuracy in all four acoustic conditions (Table 3), this pattern of findings suggests interactive effects of HP and salience rather than effects of the lower recognizability of negative cues.

Overall, our findings indicate that alterations in vocal emotional processing in high HP may be primarily driven by altered salience of acoustic representations of emotions. The aberrant salience hypothesis of psychosis (Kapur, 2003) postulates that psychotic symptoms are associated with altered dopaminergic transmission that leads to abnormal salience assignment to stimuli in the world. The current study indicates that high hallucination proneness in the general population also may be associated with changes in how salience is detected and assigned to emotional voices. Longitudinal investigations of nonclinical persons with high HP are necessary to examine whether and how changes in vocal emotional perception may predict transition to psychosis.

Limitations

The interpretation of the current findings should take into account the relatively small sample size and the convenience sampling approach. Evidence has shown little to no bias in the estimation of fixed effects with mixed-effects modeling, even when sample sizes are small (Clarke & Wheaton, 2007; Maas & Hox, 2005). Notwithstanding, the current findings (especially those concerning interactions between factors) should be considered exploratory and should be replicated in future studies with larger samples.

Conclusions

Our findings suggest that the sensory processing (N1), salience detection (P2), and cognitive evaluation of the emotional significance of vocalizations (LPP), which take place before a behavioral response is made, are affected by high HP. However, individual differences in HP did not modulate the recognition accuracy of emotional vocalizations. Hence, electrophysiological indices may provide a more sensitive measure of the effects of high HP on the perception of vocal emotions. Importantly, the effects of high HP were modulated by the acoustic properties of emotional vocalizations, specifically intensity and duration, which were found to change the acoustic salience of sounds. Additionally, these effects were valence-specific: whereas changes in the processing of positive vocalizations emerged at early stages (N1 and P2), changes in the processing of negative vocalizations were more pronounced at later cognitive stages (LPP). Similarities in the ERP changes underlying the processing of vocal emotions in nonclinical persons with high HP and in psychotic patients (Pinheiro et al., 2013, 2014) support the hypothesis of a psychosis continuum (Baumeister et al., 2017; van Os et al., 2000; van Os et al., 2009). Changes in voice perception mechanisms may be a core feature of hallucination proneness (Pinheiro, Schwartze, & Kotz, 2018).