Introduction

The human voice is one of the main sources providing first impressions of a speaker’s identity, including biological sex. The perceived biological sex of an adult speaker from their voice is primarily defined by mean fundamental frequency (F0, perceived as voice pitch) and, to a lesser extent, by vocal tract resonances (or formants), which in men are on average 50% and 20% lower, respectively, than in women (Titze 1989; Gelfer and Mikos 2005). In addition to signaling sex, these voice patterns (e.g., relatively lower pitch and resonance in men’s voices and relatively higher pitch and resonance in women’s voices) influence listeners’ attributions of gender, that is, the “roles, behaviors, activities, and attributes that any society considers appropriate for girls and boys, and women and men” (World Health Organisation 2020). For example, listeners judge men and women with low-frequency voices as physically bigger, stronger, more masculine, and more physically and socially dominant than those with relatively high-frequency voices (for reviews: Hall et al. 2005; Pisanski and Bryant 2019). These associations can be partly explained in evolutionary terms, as voice pitch, at least in males, is inversely related to testosterone (Cartei et al. 2020b; O'Connor et al. 2011), which in turn is positively associated with a host of physiological masculine characteristics, including physical strength and body size (Bhasin et al. 1996), as well as self-reported dominance (Puts et al. 2006). At the same time, listeners have a tendency to overgeneralize the sex dimorphism that characterizes the voice of adult speakers, resulting in sex-stereotype biases in judgement patterns. For instance, the perceived association between pitch and body size may lead to misattributions of physical strength in adults (Feinberg et al. 2005; Sell et al. 2010), and of sex in babies (e.g., low-pitched cries are more likely to be attributed to boys and high-pitched cries to girls, despite the absence of sex differences in the pitch of babies: Reby et al. 2016).

Although most extant research focuses on the impact of vocal masculinity and femininity on listeners’ perceptions of speakers within intrasexual competition or mate choice contexts, a few studies have helped uncover the wider socio-economic implications of speaker attributions. Like masculine-looking men and women (Little 2014; Re and Rule 2017; Rule and Ambady 2009; Sczesny et al. 2006; Todorov et al. 2005), speakers with masculine (e.g., lower-pitched) voices are often considered to have positive personality attributes including competence and leadership abilities. For instance, when asked to select political leaders, both men and women tend to select male and female leaders with more masculine (lower-pitched) voices and rate them as more competent than their higher-pitched counterparts (Klofstad et al. 2012, 2015). In addition, Tigue et al. (2012) showed that voices from political candidates with artificially lowered pitch were associated with perceptions of ability and skill more often than were their higher-pitched versions, independent of whether the content spoken was political or neutral. Similarly, research on the impact of voice pitch within the business context found that artificially lower-pitched voices of job candidates are associated with greater competence, regardless of applicant gender or résumé information (depicting either a stereotypically masculine or a stereotypically feminine applicant: Ko et al. 2009). Moreover, a lowered voice pitch from organizational spokespersons results in greater perceptions of competence and ability to restore organizational reputation compared to a raised voice pitch, particularly in times of crisis (Claeys and Cauberghe 2014).

While this research demonstrates that sex-related voice variation is sufficient to trigger stereotyping in adult listeners, an important theoretical question concerns whether auditory-based stereotyping of adults is already present in childhood, paralleling evidence on children’s gender stereotyped judgments of adults based on body shape and facial appearance (Montepare and Zebrowitz-McArthur 1989; Pine 2001). Our study aims to bridge this gap by directly examining how voice variation in masculinity and femininity impacts children’s occupational stereotyping of adults. An investigation of this nature will provide valuable insights into the role of vocal cues in the early origins of stereotyping, paving the way for developmental investigations of stereotyping from multiple angles. Moreover, given that children’s prior expectancies of other people bias their interactions with them (Harris et al. 1992; Gurland and Grolnick 2003), voice-based judgments may also have an impact on how children would engage with adults, with practical implications for understanding and improving such interactions.

Our study focuses on occupational competence, given that perceived competence is a key dimension (alongside warmth) underlying person and group perception (for a review: Fiske et al. 2007). Although no research to date has directly examined how the voice impacts children’s competence judgments, recent evidence suggests that children may be sensitive to sex-related variation in voice frequency, and that this variation influences their assessment of speakers’ traits in gender-stereotypical ways. For instance, children are sensitive to vocal masculinity and femininity in the voices of their peers, as they match stereotypically masculine and feminine descriptors of a child character with corresponding masculinized or feminized voices (Cartei et al. 2019a). Moreover, a recent study using a voice imitation paradigm has shown that children conform to gender-stereotyped expectations by masculinizing and feminizing their voices for traditionally male and female occupations (Cartei et al. 2020a). The present work aims to extend this literature by investigating for the first time whether child listeners use variation in voice masculinity and femininity (manipulated by artificially lowering/raising voice pitch) to make gender-stereotypical predictions about the occupational competence of adult speakers.

We chose to study 8- to 10-year-olds as previous research has shown that from about age 8 children’s range of stereotypes expands, and the nature of the gender associations becomes more abstract and multi-dimensional. For instance, they are able to use gender-related variation in behavior and appearance in a stereotypical manner when making predictions of peers’ future occupational career choices (Martin et al. 1990). Specifically, we hypothesize that children will assign higher competence to lower-pitched (more masculine) voices for stereotypically male occupations. Conversely, we expect that children will assign higher competence to higher-pitched (more feminine) voices for stereotypically female occupations. Finally, voices re-synthesized to a midline pitch should receive highest ratings when paired with gender-neutral occupations.

Method

Participants

Forty-eight children (20 females, mean age = 9.46; SD = 0.47, range: 8.6–10.4) took part in the study. The total sample size was based on a previous study of voice perception in child and adult listeners (Cartei et al. 2019a) reporting significant effects of gender-role stereotype ratings based on variation in vocal masculinity and femininity in children’s voices.

Children from UK Years 4 and 5 with no history of hearing impairments were prospectively recruited via school newsletters in two village primary schools, with informed consent by the headteachers. Parents were given a written study information sheet explaining the purpose and protocol of the study (that children would be asked to guess how good a person was at their job after listening through headphones to some men and women in specific occupations as they said a series of sentences). Parents were encouraged to ask any questions by contacting the researchers and provided with an opt-out form should they not want their child participating in the study, but no objection was received. After parental consent, children were approached about the study on the day of the experiment. Researchers explained the main points of the consent/assent form verbally, adjusting the explanation to the child's age and comprehension level. Ethical approval was obtained from the University of Sussex Science and Technology Cross-Schools Research Ethics Committee (reference: ER/VC44/17).

Speaker Selection

Eight adult speakers of British English (4 women, mean age = 24; SD = 0.32, range: 21–27) were selected from a database of 26 adults (13 women) reading out loud the following three sentences: “hello, it is nice to meet you”, “thank you for your help”, “no, I do not want to go” (see "Appendix 1" for details on acoustic analysis). For each speaker, the three sentences were concatenated as a single voice stimulus with 50 ms of silence between sentences, creating 5-s “thin slices” (Ambady and Rosenthal 1992) to minimize task fatigue while eliciting listeners’ judgements (see: Hughes and Harrison 2017; Tigue et al. 2012 for examples of “thin slices” in voice research). These speakers were selected to maximize the variance in apparent vocal tract lengths (aVTL) from our original sample, which was estimated from formants 1–4 (aVTL is inversely correlated with the averaged distance between adjacent formants as well as absolute formant values: longer vocal tracts result in lower, more closely spaced formant frequencies, translating into a more resonant, or sonorous, voice; see "Appendix 1"). For men, the selected speakers had aVTLs of 15.4 cm, 16.2 cm, 16.7 cm and 17.5 cm. For women, the selected speakers had aVTLs of 14.2 cm, 14.7 cm, 15.0 cm and 15.5 cm.
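The aVTL estimation itself is detailed in Appendix 1, which is not reproduced here. As a minimal sketch, assuming the commonly used uniform-tube approximation (formant i falls at (2i − 1)/2 × ΔF, and aVTL = c / (2ΔF) with c ≈ 35,000 cm/s), the formant spacing ΔF can be fitted by least squares through the origin. The function names and the example formant values are hypothetical and are not the study's measurements.

```python
from typing import Sequence

# Assumed speed of sound in a warm, humid vocal tract (cm/s).
SPEED_OF_SOUND_CM_S = 35000.0


def formant_spacing(formants_hz: Sequence[float]) -> float:
    """Least-squares estimate of formant spacing (dF) under a uniform-tube model.

    Formant i (1-based) is modelled as F_i = dF * (2i - 1) / 2, so dF is the
    slope of a regression through the origin of F_i on (2i - 1) / 2.
    """
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def apparent_vtl_cm(formants_hz: Sequence[float]) -> float:
    """Apparent vocal tract length in cm: aVTL = c / (2 * dF)."""
    return SPEED_OF_SOUND_CM_S / (2 * formant_spacing(formants_hz))


# Hypothetical F1-F4 values (Hz) for an idealised tube, not data from the study.
example_formants = [500.0, 1500.0, 2500.0, 3500.0]
example_avtl = apparent_vtl_cm(example_formants)
```

Under this approximation, lower and more closely spaced formants give a smaller ΔF and therefore a longer aVTL, which is the inverse relation described above.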

Pitch Re-synthesis

From each original recording, we used the PSOLA algorithm in PRAAT 6.0.28 (change gender command) to create three stimuli varying in pitch without altering other aspects of the sound. In one stimulus mean F0 was altered to fit the mean F0s for the men and women in our original speaker database (mid F0), while in the other two stimuli F0 was manipulated to be, respectively, 1 standard deviation (SD) lower (lowered F0) or higher (raised F0) than the mean values for men (mid F0: 115.2 ± 12.8 Hz) and women (mid F0: 204.4 ± 29.4 Hz) in our sample, following a similar procedure to Reby et al. (2016). Thus, the resulting F0 values for each of the selected male speakers were 102.4 Hz, 115.2 Hz and 128.0 Hz, and for each of the female speakers 175.0 Hz, 204.4 Hz and 233.8 Hz. To confirm the perceived naturalness of the voice stimuli, we asked 10 listeners (5 men, 5 women) to rate the speakers’ voices from the database and the 24 resynthesized versions (3 × 8 speakers) on a 7-point scale (1 = very unnatural, 2 = unnatural, 3 = somewhat unnatural, 4 = neither, 5 = somewhat natural, 6 = natural, 7 = very natural). One-way ANOVAs were separately run on the ratings of male and female speakers, treating the ratings from 1 to 7 as continuous. The within-subjects factor was stimulus type (four levels: original, raised, lowered, and mid resynthesized variants). Listeners' average scores for the original and resynthesized stimuli were above 6 (“natural”) and there was no significant difference between unmanipulated and resynthesized voices, female: F(3, 24) = 0.663, p > 0.05, male: F(3, 24) = 0.277, p > 0.05.
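The three target values per sex follow directly from the sample means and SDs reported above. The following minimal sketch covers only that arithmetic; the function name and dictionary keys are illustrative, and the resynthesis step itself (PSOLA in PRAAT) is not reproduced.

```python
def f0_targets(mean_hz: float, sd_hz: float) -> dict:
    """Return lowered, mid and raised mean-F0 targets (mean - 1 SD, mean, mean + 1 SD)."""
    return {
        "lowered": round(mean_hz - sd_hz, 1),
        "mid": round(mean_hz, 1),
        "raised": round(mean_hz + sd_hz, 1),
    }


# Sample means and SDs (Hz) as reported in the text.
male_targets = f0_targets(115.2, 12.8)
female_targets = f0_targets(204.4, 29.4)
```

Evaluating `male_targets` and `female_targets` gives the lowered, mid and raised values listed in the paragraph above.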

Procedure

Children sat individually in a quiet room at their school with the researcher. Voice stimuli were played back one at a time from a laptop through high-quality child-safe headphones (PURO Labs BT2200). For each voice, the experimenter read out loud the speaker’s occupation, followed by a brief description of the occupation. Next, children listened to the speaker’s voice and were asked to rate how good or bad they thought that person (children were told whether it was a man or a woman) was at their job on the basis of their voice. Children marked their answer by putting a cross on a paper-based, picture-aided Likert scale (1 = very bad, 2 = bad, 3 = not bad nor good, 4 = good, 5 = very good, with corresponding smiley faces ranging from “unhappy” to “happy”; see "Appendix 2"). We selected nine occupations, three stereotypically female (babysitter, beautician, nurse), three gender-neutral (doctor, student, writer), and three stereotypically male (builder, lorry driver, mechanic). Our choice of occupations for each of the three categories was guided by the Office of National Statistics (2019) and by findings from a questionnaire with UK children aged 6–10 on perceived occupational gender ratio and competence (Cartei et al. 2020a).

Each child rated all the voice stimuli in two successive blocks, one with all 12 male voice stimuli from the 4 male speakers, and one with all 12 female voice stimuli from the 4 female speakers (8 speakers × 3 pitch conditions × 1 out of 9 occupations randomized within each child, and counter-balanced between children). Children were told the speakers’ sex for each stimulus, and the order in which the blocks were presented was alternated between participants to control for order effects. Before each block, children practiced the task twice by listening to a man’s and woman’s voice from the original database of 26 speakers, but not from the 8 selected speakers. This pre-test allowed the experimenter to make sure the child understood the task, as well as to adjust the playback volume to a comfortable level.
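The blocked design described above can be sketched as a small trial-list generator. This is an illustration under stated assumptions rather than the procedure actually used: the text does not specify how occupations were dealt to stimuli, whether stimulus order was shuffled within a block, or how block order was assigned, so the shuffled-deck allocation, the within-block shuffle, and the parity rule below are hypothetical, as are all names.

```python
import random

OCCUPATIONS = [
    "babysitter", "beautician", "nurse",    # stereotypically female
    "doctor", "student", "writer",          # gender-neutral
    "builder", "lorry driver", "mechanic",  # stereotypically male
]
F0_VARIANTS = ["lowered", "mid", "raised"]


def build_session(child_index: int, seed: int) -> list:
    """Return the ordered list of 24 trials for one child.

    Assumptions (not specified in the text): occupations are dealt from
    repeatedly shuffled decks so each is used two or three times per child,
    stimulus order is shuffled within each block, and block order alternates
    with the parity of child_index.
    """
    rng = random.Random(seed)
    deck = []
    while len(deck) < 24:
        shuffled = OCCUPATIONS[:]
        rng.shuffle(shuffled)
        deck.extend(shuffled)
    deck = deck[:24]

    block_order = ["male", "female"] if child_index % 2 == 0 else ["female", "male"]
    trials = []
    for sex in block_order:
        # 4 speakers x 3 F0 variants = 12 stimuli per block
        block = [
            {"speaker_sex": sex, "speaker": speaker, "f0": variant}
            for speaker in range(1, 5)
            for variant in F0_VARIANTS
        ]
        rng.shuffle(block)
        for trial in block:
            trial["occupation"] = deck.pop()
            trials.append(trial)
    return trials


session_even = build_session(child_index=0, seed=1)
session_odd = build_session(child_index=1, seed=1)
```

Seeding the generator per child keeps each session reproducible, which makes it possible to check afterwards how occupations were distributed across pitch variants.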

Statistical Analyses and Results

To investigate the effects of occupation type and F0 variant on children’s ratings of men and women speakers, we ran two Linear Mixed Models (LMM) separately for the male and female speakers, with occupation type (male-typed, female-typed, gender-neutral), F0 variant (lowered F0, mid F0, raised F0), listener sex and their 2-way interactions as fixed factors. Apparent Vocal Tract Length (aVTL) and occupation (nested within occupation type) were random factors. Both LMMs also included listener identity as a random factor, with a separate intercept for each listener (Table 1). Pairwise comparisons (Bonferroni corrected) were used to detect significant differences between group means for significant main and interaction effects. Standard estimates of effect sizes (Cohen’s d) are reported, with values of 0.2, 0.5, and 0.8 representing small, medium, and large effects (Cohen 1988).
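The model structure described above can be written out explicitly. The following is a sketch of one plausible specification, fitted separately for male and female speakers. The symbols, and the assumption of independent, normally distributed random intercepts, are introduced here for illustration and are not taken from the analysis output.

```latex
% y_{ijkf}: competence rating (1-5) given by listener i to speaker j
%           (indexed by aVTL) in occupation k with F0 variant f
% t(k): occupation type of occupation k;  s(i): sex of listener i
\begin{aligned}
y_{ijkf} ={}& \beta_0 + \beta^{O}_{t(k)} + \beta^{F}_{f} + \beta^{S}_{s(i)}
  + \beta^{OF}_{t(k),f} + \beta^{OS}_{t(k),s(i)} + \beta^{FS}_{f,s(i)} \\
 & + u_i + v_j + w_k + \varepsilon_{ijkf}, \\
u_i \sim{}& \mathcal{N}(0,\sigma_u^2), \quad
v_j \sim \mathcal{N}(0,\sigma_v^2), \quad
w_k \sim \mathcal{N}(0,\sigma_w^2), \quad
\varepsilon_{ijkf} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

Here the β terms are the fixed effects of occupation type, F0 variant, listener sex and their 2-way interactions, u_i is the listener intercept, v_j the speaker (aVTL) effect, and w_k the effect of occupation nested within occupation type.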

Table 1 Effect of occupation type and speaker resynthesised F0 on occupational competence ratings

Occupational Competence Ratings of Women Speakers

There was a significant main effect of occupation type on ratings of women speakers: across F0 variants, women were rated as slightly, but significantly, more competent for the gender-neutral occupations than for the female (d = 0.28, p < 0.05) or male occupations (d = 0.59, p < 0.05). Women were also rated as significantly more competent for the stereotypically female occupations than for the male occupations, d = 0.31, p = 0.025 (see Fig. 1a).

Fig. 1 Effect of occupation type on children’s mean competence ratings of (a) women and (b) men speakers

There was also a significant interaction effect between occupation type and F0 variant (Fig. 2). When paired with the stereotypically female occupations, women’s raised pitch voices received the highest competence ratings (M = 3.9, SE = 0.15), compared to the mid pitch voices (M = 3.4, SE = 0.15), d = 0.65, p < 0.05, and lowered pitch voices (M = 2.9, SE = 0.15), d = 1.1, p < 0.05. Women’s lowered pitch voices also received lower ratings than mid pitch voices, d = 0.39, p < 0.05. For the stereotypically male occupations, women’s raised pitch voices received the lowest ratings (M = 2.6, SE = 0.16) compared to the mid pitch voices (M = 3.4, SE = 0.15), d = 0.80, p < 0.05, and lowered pitch voices (M = 3.3, SE = 0.15), d = 0.61, p < 0.05. However, women’s lowered pitch voices did not receive higher ratings than mid pitch voices, p > 0.05. No significant difference in ratings was found amongst women’s F0 variants in the gender-neutral occupations, p > 0.05.

Fig. 2 Occupation type (female, neutral, male) by F0 variant (raised (yellow), mid (green), lowered (blue)) for women speakers (Color figure online)

Occupational Competence Ratings of Men Speakers

There was a significant main effect of occupation type on ratings of men speakers: pairwise comparisons revealed that, across F0 variants, men were rated less competent for the female occupations than for the gender-neutral, d = 0.48, p < 0.05, and male occupations, d = 0.61, p < 0.05. Mean ratings were highest for the male occupations compared to the gender-neutral occupations, though not significantly so, p > 0.05 (see also Fig. 1b).

There was a significant interaction of occupation type and F0 variant (Fig. 3). When paired with the stereotypically female occupations, children rated men’s lowered pitch voices as significantly less competent (M = 2.2, SE = 0.15) than mid pitch (M = 2.9, SE = 0.15), d = 0.70, p < 0.05, or raised pitch versions (M = 3.6, SE = 0.14), d = 1.3, p < 0.05, while the latter received higher competence ratings than mid pitch voices, d = 0.77, p < 0.05. For the stereotypically male occupations, children rated men’s lowered pitch voices as significantly more competent (M = 4.2, SE = 0.14) than the mid pitch (M = 3.4, SE = 0.14), d = 0.80, p < 0.05, and raised pitch (M = 3.2, SE = 0.15) versions, d = 0.97, p < 0.05. For the gender-neutral occupations, no significant differences were found amongst F0 variants, all ps > 0.05.

Fig. 3 Occupation type (female, neutral, male) by F0 variant (raised (yellow), mid (green), lowered (blue)) for male speakers (Color figure online)

Discussion

This is the first study to show that children make gender-stereotypical judgments of adult speakers on the basis of speakers’ variation in vocal masculinity and femininity, complementing prior research that focused exclusively on adults. Specifically, in line with our predictions, we found that feminized voices received the highest ratings when paired with stereotypically female occupations, and the lowest ratings when paired with stereotypically male occupations. Also consistent with our predictions, masculinized voices received the lowest ratings when paired with stereotypically female occupations, and male (but not female) masculinized voices received the highest ratings when paired with stereotypically male occupations. Overall, our results show that variation in adults’ vocal masculinity and femininity (manipulated by artificially lowering or raising mean voice pitch) affects children’s ratings of speakers’ occupational competence in gender-stereotypical ways, though ratings for stereotypically male occupations were also influenced by speakers’ sex.

In terms of the overall pattern of results, the observed ratings are largely consistent with psychoacoustic studies with adult listeners, showing that (re-synthesized and natural) male voices with lower pitch are preferentially attributed stereotypically male characteristics, such as masculinity (Pisanski et al. 2012), physical and social dominance (Hall et al. 2005; Puts et al. 2007; Vukovic et al. 2011), authority (Sorokowski et al. 2019), and leadership (Klofstad et al. 2012; Tigue et al. 2012), though perceivers associated higher pitch more strongly with high- than with low-rank behaviors in at least one study (Ko et al. 2015). On the other hand, women with higher-pitched voices are known to be preferentially attributed stereotypically female characteristics, such as femininity (Röder et al. 2013), friendliness (Tsuji 2004; Ohara 1999), and submissiveness (Borkowska and Pawlowski 2011).

Although, as expected, our results show that feminized voices from speakers of both sexes received the highest competence ratings for stereotypically female jobs, psychoacoustic studies report that adult listeners rate lower-pitched individuals as more competent than higher-pitched individuals, both from recordings with neutral content (e.g., ratings of speakers reading out loud vowels and sentences of gender-neutral content: Krahé and Papakonstantinou 2020; Oleszkiewicz et al. 2016) and from politically relevant recordings (e.g., ratings of hypothetical political candidates: Klofstad et al. 2012). However, none of these studies asked listeners to make judgments in the context of female-typed occupations, whereas our study did. Because professions that are dominated by women tend to be stereotyped as more feminine, and as requiring more “female‐like” traits (e.g., warmth: Eagly and Carli 2003; friendliness: Wharton 1999; helpfulness and cooperation: Cejka and Eagly 1999), competence in these jobs is likely to be judged on these traits, which may drive the higher competence ratings for the higher-pitched voices observed in the present study. While the present study did not directly assess whether high-pitched voices triggered these types of inferences, in partial support of this hypothesis, Oleszkiewicz and colleagues (2016) report that adult listeners make positive associations between high pitch and warmth in women’s voices (though not in men’s). Also, Halper and Stopeck (2019) report that perceptions of warmth primarily drive the relationship between job candidate gender and both likeability and job hireability for female-dominated domains such as the caregiving professions.

Both speakers’ biological characteristics and listeners’ socialization processes may contribute to the observed overall pattern of results. In males, lower voice pitch is associated with higher salivary testosterone levels in childhood and adulthood (Cartei et al. 2014, 2020b), and testosterone is a primary driver of physiological masculine features, such as increased muscle size and strength (Bhasin et al. 1996), and physical fitness (Fink et al. 2006; Manning and Taylor 2001), which are valued traits in physically demanding jobs that are male-dominated (Colker 1985). As well as negatively correlating with testosterone, higher-pitched voices in men are preferred by women seeking greater perceived parental and relationship investment (Apicella and Feinberg 2009). Moreover, higher voice pitch in women positively correlates with estrogen levels, and estrogen is positively linked to maternal behavior in numerous species, including rats, mice, sheep, and possibly non-human primates (Bridges 2015). Thus, a high voice pitch may advertise greater actual or perceived propensity for nurturing and care-taking roles, which are stereotypically seen as women’s jobs (Guy and Newman 2004). While the observed ratings may partially reflect children’s sensitivity to voice cues underlying qualities of speakers, many such attributions are nowadays irrelevant to job competence. For instance, there is considerable overlap in men’s and women’s physical strength, and many heavy manual jobs are now machine-operated, which means that many women are physically capable of doing such work (Ness 2012).

Moreover, the idea that voice pitch is a reliable cue to biosocial dimensions fails to account for the fact that children and adults typically develop unjustified stereotypic views and prejudices concerning groups (views that are thus uncorrelated with any observable traits or behaviors, e.g., Bereczkei and Mesko 2006; Bigler and Liben 2007; Zebrowitz 1996). Specifically, socialization research has shown that, consistent with the general principle of correspondence bias (Gilbert and Malone 1995), individuals tend to ascribe gender-stereotypic attributes to job holders that are in line with occupational sex ratios, even if those attributes are irrelevant to those jobs (Cejka and Eagly 1999). Given that sex-segregation is still a predominant feature of many jobs (Office of National Statistics 2019), the observed ratings could emerge from children’s observations of the vocal characteristics of the sex that is numerically dominant in the occupation (males’ voices being, on average, lower-pitched than females’), even if those correspondences are irrelevant to competence.

An additional possible explanation for children’s higher ratings of feminized voices in female-typed roles lies in children’s prior experience. From infancy, children learn to associate higher-pitched voices with relational and affective skills, which are important in many stereotypically female occupations, including the ones in the present study (Guy and Newman 2004). Indeed, raised pitch appears to communicate caregivers’ affect and intentions nonverbally, and caregivers routinely increase their pitch when speaking to children as opposed to adults (Broesch and Bryant 2015; Grieser and Kuhl 1988). For instance, when mothers speak with a heightened pitch (and expanded melodic contours) they are more able to elicit and maintain infant attention, independent of what they are saying (Papoušek et al. 1990). High pitch is also more common in caregivers’ speech when conveying emotional information to children than when speaking to adults (Kitamura and Burnham 2003).

Contrary to our hypothesis, we also found that women’s masculinized voices were not rated as more competent than the mid F0 variant for the masculine occupations. Specifically, to the extent that F0 cues physiological masculinity in women (e.g., decreased estrogen and lower fertility: Bryant and Haselton 2009; Prelevic 2013, but not testosterone: Dabbs and Mallinger 1999), more masculine female voices were expected to be rated as more competent in male jobs, but this is not what we observed. An alternative explanation for our findings is that children’s competence ratings of low-pitched women’s voices resulted from a (conscious or unconscious) compromise between perceived masculinity and overall preference for high-pitched voices in females. Previous research with adult listeners indicates that, while low-pitched voices in both men and women are perceived as more masculine (Krahé and Papakonstantinou 2020), and are preferred over high-pitched voices in male speakers, they are not preferred over high-pitched voices in female speakers (Tsantani et al. 2016). In fact, women speaking with lower-pitched voices are rated as less vocally attractive (Feinberg et al. 2008) and as having fewer favorable personality traits than higher-pitched women (e.g., Scherer 1974, 1978). Lending support to this argument, a study looking at job hiring preferences (Phelan et al. 2008) found that fictitious female job applicants with masculine traits were judged by adult raters as more competent, but lacking in social skills compared to applicants with feminine traits, while no such bias was found in male applicants.

Although variation in voice pitch within the two sexes influenced children’s ratings stereotypically, children rated men as significantly more competent than women in male jobs and less competent than women in female jobs, regardless of our pitch manipulations. These results suggest that speaker gender may be a stronger contributor to stereotyping than vocal variation in masculinity and femininity. It is also possible that this effect was heightened by our paradigm, given that children knew in advance the sex of the speaker and rated all speakers of the same sex in one block. Indeed, hiring bias research demonstrates that when occupational assessors are told the sex of hypothetical job candidates, stereotype-congruent associations (e.g., female/male applicants being considered for stereotypically female/male jobs) receive more favorable evaluations than stereotype-incongruent associations (e.g., female/male applicants being considered for stereotypically male/female jobs), even when applicants are equally qualified (Rice and Barth 2016).

In summary, our study shows that children use within-sex variation in vocal masculinity and femininity when making gender-stereotypical judgments of adults, as previously found in judgments of other children (Cartei et al. 2019a). Our findings also complement those of a recent voice imitation study, which showed that children link vocal masculinity/femininity to stereotypically male/female occupations (Cartei et al. 2020a), by showing that gender-linked variation influences beliefs about competence. Together these observations highlight the fact that the voice is an important aspect of children’s gender stereotyping and indicate that it can be easily used as a versatile, implicit measure of children’s gender stereotyping, through voice perception or production tasks.

To further trace the developmental trajectory of children’s occupational stereotyping (stereotype flexibility and stereotype knowledge), the present paradigm could be used with a wider range of occupations and ratings of relevant traits other than competence (e.g., dominance, friendliness). It could also be extended to younger children and adolescents to assess the degree to which voice stereotypes correlate with a child’s classification skills, knowledge about job requirements, and gender stereotype flexibility, all of which develop with age (Liben et al. 2002). Moreover, cross-cultural comparisons with our study should establish the extent to which our findings can be generalized to diverse cultural contexts, outside that of Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies (Henrich et al. 2010). Our paradigm could also be used in conjunction with inter-individual measures, to investigate how individual differences in children’s occupational stereotyping may emerge. For instance, differences in exposure to division of labor in the family (Serbin et al. 1993; Fulcher et al. 2008), and on television (O’Bryant et al. 1978), both affect children’s occupational stereotyping. It would be interesting to know if and how the patterns observed in the present work would be subject to this kind of environmental influence.

Finally, given that children use gender-related voice variation to make judgments about adults in occupations, an important next step would be to explore the relative contributions of these judgments to child–adult interpersonal processes. Specifically, future studies could explore whether voice masculinity and femininity affect children’s interactions with men and women in these roles, by using confederates and recording children’s behavioral responses during and after the interactions (e.g., asking children whether they felt more comfortable being treated by a nurse with a feminine rather than a masculine voice).