Sex Differences in Personality Traits
Do sex differences in personality exist? If so, how big are the effects?
Are these effects of a magnitude that they could meaningfully affect life decisions and outcomes?
Moreover, if they do exist, why do they exist?
In this entry, we will consider the empirical evidence speaking to the existence, or not, of sex differences and consider how these findings relate to the varied theoretical positions that seek to explain sex differences or a lack thereof. The focus of this entry is primarily on differences in subclinical trait measurement. However, we provide brief discussion of the evidence for sex differences in psychopathological traits. Finally, we consider some of the methodological debates in the study of sex differences in personality, particularly with respect to the appropriate estimation of the magnitude of differences.
Two key definitions are important with respect to the content of this entry. First, here we are discussing sex differences and differentiate sex from gender. By sex, we are referring to the biological designation as male or female. By gender, we take a much broader definition encapsulating the identification of an individual with a given gender that may have multiple biological, psychological, and sociological influences. This distinction is consistent with those adopted by organizations such as the World Health Organization, according to which sex refers to the biological and physiological characteristics that define men and women and gender refers to the socially constructed roles, behaviors, activities, and attributes that a given society considers appropriate for men and women.
Second, this entry concerns differences in personality traits, where traits are defined in a classical sense as relatively stable and enduring patterns of thoughts, feelings, and behaviors. Thus, we are distinguishing traits from states, where states are defined as more momentary fluctuations in given social contexts.
Theoretical Perspectives on Sex Differences
The existence and importance, or not, of sex differences in personality have been argued from a number of theoretical perspectives. Broadly speaking, these can be split into three categories: evolutionary, social role, and methodological or artifact explanations.
In evolutionary terms, human personality evolved as a set of characteristics that conferred advantage to some over others in the context of a wide variety of adaptive challenges across evolutionary time. Therefore, to the extent that these challenges differed for males and females, so may the average level of traits across these groups. Common examples discussed would include the differential traits required by males and females for the successful rearing of children. In our distant past, females may have required higher levels of nurturing and agreeable traits in order to ensure the safe upbringing of their children – characteristics less essential for males who would have invested less time in the rearing of children. Males on the other hand may have required increased levels of traits associated with success in competition for resources, such as dominance, aggression, or risk-taking.
Sociocultural explanations come in a variety of forms but broadly argue that it is the features and properties of modern social context that give rise to perceived sex differences in personality. For example, social role model perspectives (e.g., Eagly 1997) would suggest sex differences arise from the different social role expectations on males and females with respect to the core determinants of personality – namely, thoughts, feelings, and behaviors. These expectations manifest and are shared early, shaped by the differential ways those around us, society broadly speaking, reward and punish behaviors. These schedules of reinforcement and punishment have led some to label such explanations as cognitive social learning (Hyde 2014). The various opportunities or restrictions placed by the societies in which individuals live come to govern the behaviors that are expressed by the males and females within them.
Lastly, some have argued that estimates of sex differences from psychometric tools reflect a measurement artifact. Specifically, this position holds that males and females possess different stereotypes as to the appropriate characteristics of males and females. When responding to a self-report inventory, these groups respond in a socially desirable manner with these differing stereotypes in mind. As such, responses reflect socially desirable responding toward these differing sex stereotypes.
Clearly, these competing theories of sex differences can be supported by differing patterns of observed difference. For example, given a focus on evolutionary time scales, one would expect more cross-cultural consistency in sex differences from an evolutionary perspective than may be expected from a sociocultural perspective. Considering the latter, we may expect to see similar patterns across cultures that share relevant societal features and differing patterns across those that do not. Similarly, such differences in cultures may lead to differing gender stereotypes, and thus under an “artifact” explanation, we may expect similar cross-cultural patterns. However, sociocultural and artifact explanations may be differentiable by consideration of patterns across self- and other-reports, as well as from explicit and implicit measurement of traits. Specifically with respect to the latter, if targeted response bias is the primary driver of observed differences, then such differences should disappear when personality is assessed reliably by means for which the intention of measurement is not known to the respondent and which are thus less susceptible to targeted responding.
Empirical Evidence for Sex Differences
A large number of research studies and psychometric inventory manuals have published data on the differences between males and females in personality traits. These studies generally report standardized measures of mean difference, such as Cohen’s d, where d around 0.20 is considered small, around 0.50 is considered moderate, and around 0.80 is considered large. Methodological issues in this approach are discussed later in this entry.
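As a minimal sketch of how such a standardized mean difference is obtained, the following Python function computes Cohen's d from two sets of scores using a pooled standard deviation. The pooled-SD formulation is one common variant and is an assumption here, since individual studies may standardize differently; the function name and the example scores are hypothetical. The sign is taken as male minus female, so that negative values indicate higher female means, matching the convention used for the estimates reported in this entry.

```python
import math
import statistics


def cohens_d(male_scores, female_scores):
    """Standardized mean difference (male minus female) using the pooled SD."""
    n_m, n_f = len(male_scores), len(female_scores)
    var_m = statistics.variance(male_scores)    # sample variance (n - 1 denominator)
    var_f = statistics.variance(female_scores)
    # Pool the two within-group variances, weighting by degrees of freedom.
    pooled_sd = math.sqrt(((n_m - 1) * var_m + (n_f - 1) * var_f) / (n_m + n_f - 2))
    mean_diff = statistics.mean(male_scores) - statistics.mean(female_scores)
    return mean_diff / pooled_sd


# Hypothetical trait scores, for illustration only.
males = [2, 3, 4, 5, 6]
females = [3, 4, 5, 6, 7]
print(round(cohens_d(males, females), 2))
```

With real data the two groups would usually be far larger, and the estimate would be reported alongside its sampling uncertainty.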
Table 1 Hierarchical organization of the FFM, HEXACO, and 16PF5 (table body not recoverable; surviving entries: Openness to experience, Openness to change)
When considering sex differences in personality, this higher-order structure poses an interesting question. Are sex differences in facets within a given domain all in a consistent direction? This is important because if facets within a domain show sex differences that run in opposite directions, these will potentially cancel out completely or reduce the observed sex differences at the domain level. In the discussion that follows, we discuss broad domains and facets under the ordering presented in Table 1, alongside a number of non-overlapping and individual traits argued to display sex differences.
Summarizing Domain-Level Differences
In the discussion that follows, we present empirical results in terms of various metrics of effect size for the difference between males and females. Negative estimates indicate that females have higher mean scores than males, while positive estimates indicate males have higher mean scores than females.
Table 2 Selected findings on sex differences in FFM domains in adult Western samples
Studies: Costa and McCrae (1992); Marsh et al. (2010); Vianello et al. (2013, two columns); Weisberg et al. (2011); Schmitt et al. (2008)
Row labels: Single sample vs. meta-analysis; Observed vs. latent
Other surviving entries (column assignment not recoverable): k = 7–25; 40-item adjective; 50-item IPIP; −0.14, 0.49, 0.09
Table 3 Selected findings on sex differences in HEXACO domains in adult Western samples
Studies: Lee and Ashton (2016, two columns); De Vries, Ashton, and Lee (2009, two columns); Ashton and Lee (2009); Lee and Ashton (2004)
Row labels: Single sample vs. meta-analysis; Observed vs. latent
Other surviving entries (column assignment not recoverable): Single sample (online sample); Single sample (student sample); Single college sample
Overall, across studies and inventories, differences range from essentially zero to a maximum of approximately 1.2 on a d-score metric. A majority of the differences identified across studies, at the domain level, would fall in the small to moderate range (|d| < 0.40). The only differences across Tables 2 and 3 to be consistently larger than this are the HEXACO differences in Honesty-Humility and Emotionality, which we discuss further in the next sections.
Across all studies reported in Tables 2 and 3, females have been consistently shown to have higher mean scores than males for Neuroticism (FFM) and Emotionality (HEXACO). Across inventories, the magnitudes of these effects vary slightly. Large variation can be seen across personality taxonomy (FFM vs. HEXACO), where Neuroticism in the FFM shows generally small to moderate effects, while Emotionality shows generally large effects.
The consistency and magnitude in the domain-level effects may be reflective of the fact that the facets within these domains show consistent sex differences in both direction and magnitude. With respect to studies utilizing the FFM (e.g., Costa, Terracciano and McCrae 2001, Table 2; Weisberg et al. 2011), we see higher scores for females on all facets with Cohen’s d ranging from −0.09 to −0.44. Considering the HEXACO taxonomy, Lee and Ashton (2016) report higher scores for females on all facets with Cohen’s d ranging from −0.53 to −1.08. Thus across studies and taxonomies, at the domain and facet level, consistent differences in Neuroticism-related traits have been observed.
At the domain level, Extraversion as operationalized in the FFM inventories shows a small (d range −0.08 to −0.44) and consistent difference with females scoring higher than males. Within the HEXACO inventories, this difference is reversed, with males generally scoring higher than females, but the effects here are small (d < 0.10). Taken at face value, this would suggest minimal overall sex differences in Extraversion-related traits, with any variability seen dependent on specific operationalizations of Extraversion.
However, if one considers the facet-level associations, further variation is revealed. Again, using the same studies as reference points, Costa, Terracciano, and McCrae (2001, Table 2; see also Weisberg et al. 2011) report facet-level differences within Extraversion for FFM-based inventories which differentially support male versus female higher scores. For example, Costa, Terracciano, and McCrae (2001) show higher scores for males in the facets of Assertiveness (d = 0.10 to 0.27) and Excitement-Seeking (d = 0.18 to 0.38) and higher scores for females for all other facets (d range −0.04 to −0.33).
In the case of the HEXACO, studies by Lee and Ashton (2004) and Lee and Ashton (2016) report differences in the facets of Extraversion differentially showing higher scores for males and females, of small to moderate effect size (Social Self-Esteem = 0.13/−0.30; Social Boldness = 0.12/0.30; Sociability = −0.11/−0.31; Liveliness = −0.05/−0.15). Thus, for Extraversion-related traits, the domain-level patterns of low to no sex differences present only a partial picture, as at the facet level, we see larger differences with a mixed pattern of male versus female higher scores.
At the domain level, Openness and related traits show a largely consistent, but small, difference with males scoring higher than females. The exception here is the study by Marsh et al. (2010) which, using a latent variable approach, found females to score higher than males. A similar pattern of male higher score, but with smaller effects, is seen for the HEXACO inventories (largest d = 0.22). Once again, this domain-level difference masks a much greater degree of variability in the direction of difference and effect sizes at the facet level.
Costa, Terracciano, and McCrae (2001) report male higher scores for Fantasy (d = 0.16), Values (d = 0.07), and Ideas (d = 0.32), the latter being consistent across US adults, US college age individuals and adults from multiple other cultures. From the same study, females show consistently higher scores for Aesthetics (d = −0.34), Feelings (d = −0.28), and Actions (d = −0.19). Weisberg et al. (2011) report higher scores for males in the BFAS narrow scale of Intellect (d = 0.22) but higher scores for females in Openness (d = −0.27). Perhaps unsurprisingly, the domain-level Openness difference here was practically zero. Again in the HEXACO, Lee and Ashton (2004) and Lee and Ashton (2016) report varied facet differences for Openness (Aesthetic Appreciation = −0.29/−0.39; Inquisitiveness = 0.52/0.62; Creativity = −0.00/0.25; Unconventionality = 0.22/0.31).
Agreeableness is an interesting domain to consider, as it is one of the places in which the taxonomies of the FFM and HEXACO differ. In the case of the FFM, consistent sex differences are seen at the domain level, with females scoring higher than males and effect sizes ranging from small to large (see Table 2). Like Neuroticism, the facets of Agreeableness as measured by FFM inventories are consistent, favoring females, with effect sizes for those differences ranging from d of −0.03 to −0.43 (Costa, Terracciano and McCrae 2001).
The pattern for the HEXACO depends on which domain trait we consider. The namesake Agreeableness within the HEXACO inventories shows small sex differences with the male-female dominance varying across study. This is also reflected at the facet level where small differences in both directions are observed (e.g., Lee and Ashton 2016, Table 1).
However, the pattern is different if we consider Honesty-Humility. Much like Agreeableness from the FFM inventories, Honesty-Humility shows a consistent difference with females scoring higher than males, and moderate effect sizes at both the domain level (d range −0.38 to −0.59, Table 3) and across facets (Lee and Ashton 2004, d range −0.28 to −0.71; Lee and Ashton 2016, d range 0 to −0.59). The facet with the smallest sex differences is Sincerity, which is generally close to zero across samples.
Conscientiousness is another of the domains where the specific operationalization of the trait across taxonomy appears to make a difference with respect to the overall consistency of observed effects. In both FFM and HEXACO inventories, the domain-level difference shows females tending to score higher than males. In the case of the FFM, the effect sizes range from d’s of 0.05 to −0.67 (Table 2). For the HEXACO inventories, we see much the same (Table 3), with a maximum difference of −0.46.
Once again (and at the risk of sounding repetitive), the generally small to moderate difference may reflect differential patterns of effects in the facets. As reported in Costa, Terracciano, and McCrae (2001), while the FFM Conscientiousness facets of Competence and Deliberation show males scoring higher, all other facets show near-zero differences or differences favoring females. In the HEXACO, on the other hand, the facet-level effects reported in Lee and Ashton (2004) and Lee and Ashton (2016) show only one facet-level difference in favor of males, that being Prudence in the online self-report sample (d = 0.10).
The Relation Between Facet Level and Global Differences
As documented above, and perhaps unsurprisingly, when larger, consistent differences are observed at the domain level, all lower-level facet differences are (a) in the same direction, favoring either males or females, and (b) more consistently of larger magnitude. However, the fact that consistent facet differences within a domain are not ubiquitous could be viewed as problematic. Indeed, Costa, Terracciano, and McCrae (2001) stop short of estimating domain-level differences for all FFM domains due to the patterns of facet difference.
Such patterns may perhaps lead us to reflect differently on the magnitude of global differences. If a set of facets coded positively for a domain show different patterns of mean difference by sex in the simple sum scores, it should be no surprise that these differences average out to near-zero differences when we look at sum score differences by sex at the domain level. This is not to say that the domain-level effect is not an appropriate level of analysis, but simply to say that we may want to consider the various ways in which such a difference, or lack thereof, may occur.
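The averaging-out argument can be made concrete with a small sketch. It rests on simplifying assumptions that are not taken from any of the studies cited: the domain score is an unweighted sum of k facets, every facet has the same within-group standard deviation, and every pair of facets correlates at the same value r within each sex. Under those assumptions the domain-level d equals the sum of the facet-level d values divided by sqrt(k + k(k − 1)r). The facet values below are hypothetical.

```python
import math


def domain_d(facet_ds, inter_facet_r):
    """Domain-level d for an unweighted sum of equally scaled facets.

    Assumes equal within-group facet SDs and a single common inter-facet
    correlation (inter_facet_r) within each sex.
    """
    k = len(facet_ds)
    # SD of a sum of k standardized facets with common correlation r.
    sd_of_sum = math.sqrt(k + k * (k - 1) * inter_facet_r)
    return sum(facet_ds) / sd_of_sum


# Hypothetical facet-level differences (male minus female).
mixed_direction = [0.30, 0.25, -0.30, -0.25, -0.10, 0.10]
same_direction = [-0.30, -0.25, -0.30, -0.25, -0.10, -0.10]

print(round(domain_d(mixed_direction, 0.3), 2))  # facets cancel at the domain level
print(round(domain_d(same_direction, 0.3), 2))   # facets accumulate at the domain level
```

The two facet sets have identical absolute magnitudes, yet only the set with a consistent direction yields a non-trivial domain-level difference.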
Differences Within Males and Females Versus across Males and Females
As has been noted, the topic of sex differences in personality is considered by some to be controversial. Aside from any discussion of the social and political ramifications of such lines of research, one powerful argument concerns the magnitude of the differences within and between groups. Essentially, this argument runs that within males, there is a greater difference between those at the lower and higher ends of the distribution than there is between the average male and the average female. The same would be true of females. Given that the within-group differences at a population level may be larger than the between-group differences, it follows to ask whether the comparison across groups is a reasonable focus of study.
Relatedly, it is also important to contextualize the magnitude of mean differences. Standardized estimates of mean difference, whether univariate or multivariate, can be expressed as the proportion of overlap in score distributions. Under the assumptions of a normal distribution with equal standard deviations, when d denotes a small effect (0.20), approximately 58% of one distribution (say males) is above the mean of the other (say females), and the distributions have approximately 92% overlap. When d denotes a medium effect (0.50), approximately 69% of the distribution of one group is above the mean of the second, and the distributions have approximately 80% overlap. Finally, when d denotes a large effect (0.80), approximately 79% of one distribution will be above the mean of the other, and the distributions have approximately 69% overlap.
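These percentages follow directly from the standard normal distribution, and the sketch below shows one way to reproduce them. It assumes, as the text does, normally distributed scores with equal standard deviations in both groups. The function names are illustrative; the overlap is computed as the overlapping coefficient, 2Φ(−|d|/2), and the share of one group above the other group's mean as Φ(|d|).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def share_above_other_mean(d):
    """Proportion of the higher-scoring group above the other group's mean."""
    return STANDARD_NORMAL.cdf(abs(d))


def distribution_overlap(d):
    """Overlapping coefficient of two equal-variance normal distributions."""
    return 2 * STANDARD_NORMAL.cdf(-abs(d) / 2)


for d in (0.20, 0.50, 0.80):
    print(d, round(share_above_other_mean(d), 3), round(distribution_overlap(d), 3))
```

Because both quantities depend only on the absolute value of d, the same figures apply whichever sex has the higher mean.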
Given the general size of the mean differences associated with research on sex differences in personality, the score distributions of males and females are likely to be heavily overlapping.
Emergence and Change in Sex Differences Across Life Course
Extant evidence points to a relative stability in the estimates of sex differences in domains across studies, although the location of the most meaningful differences, be that at domain or facet level, remains debated. A further interesting question to ask with respect to sex differences is when do differences emerge and do they remain stable across the life course?
In a large-scale meta-analysis, Else-Quest et al. (2006) investigated sex differences in temperamental traits. Historically, temperament has been argued to be the developmental precursor to later life personality, a concept which has been developed over time and supported by a large body of empirical research (see Rothbart 2011, for review). Else-Quest et al. (2006) identified studies in children less than 7 years old and meta-analytically derived estimates of sex differences under three broad groupings of characteristics, Effortful Control, Negative Affectivity, and Surgency. In adult personality trait nomenclature, these groupings would approximately align with conscientiousness, neuroticism, and extraversion.
With respect to Effortful Control, effect sizes for ten dimensions were calculated with all eight significant differences favoring females. The largest of the effect sizes (d = −1.01, k = 6) was for Effortful Control. Inhibitory Control (d = −0.41, k = 22), Perceptual Sensitivity (d = −0.38, k = 38), Low Intensity Pleasure (d = −0.29, k = 20), and Attention (d = −0.23, k = 9) all showed small to moderate effect sizes. For Negative Affectivity, all effect sizes were small. Of the three dimensions which showed significant differences, two behavioral styles (Difficult, d = 0.13, k = 28; Intensity, d = 0.10, k = 37) favored males, and one psychobiological dimension (Fear: d = −0.12, k = 34) favored females. Finally for Surgency, small to moderate effect sizes favoring males were found for Activity (Behavioral Style, d = 0.33, k = 50; Psychobiological, d = 0.23, k = 34), High Intensity Pleasure (d = 0.30, k = 18), and Surgency (d = 0.55, k = 8).
Focusing on the later period of development between ages 12 and 17, De Bolle et al. (2015) analyzed data from 4850 adolescents from across 23 cultures who had completed the NEO-PI-3 as a measure of the FFM. Taking into account cultural differences, they found that females scored higher than males in Openness and Conscientiousness at all ages. No sex differences in Neuroticism were present at ages 12 and 13, but from age 15 onward, mean differences favoring females emerged for the facets of Anxiety, Depression, and Vulnerability. Interestingly, though not studied directly by De Bolle et al. (2015), this is also an approximate point in development where males and females begin to diverge on clinical-level anxiety and depression. Patterns for Extraversion were complex, with females scoring more highly for Warmth and Gregariousness across development and males scoring higher on Excitement-Seeking. Overall, while patterns for other traits were complex, by the age of 17, sex differences in domains were largely equivalent in size and direction to estimates from adult samples.
Analysis of age by sex interactions showed some interesting patterns. For example, there was a significant age by sex interaction for Neuroticism, indicating that from ages 12 to 17, both boys and girls decreased in Neuroticism but boys did so at a greater rate. As such, the difference in Neuroticism by age 17 was larger. The same pattern was found for two facets of Neuroticism, Anxiety and Vulnerability, suggesting these may be the driver of the domain-level effect. Two further facets, Positive Emotions (Extraversion, male decrease greater than female decrease) and Ideas (Openness, male decrease less than female decrease), showed increasing sex differences across adolescence. Conversely, for the facets of Assertiveness (Extraversion, male increase, female decrease), Aesthetics (Openness, male increase), and Achievement Striving (Conscientiousness, male increase greater than female increase), the magnitude of sex differences decreased from ages 12 to 17.
The findings for Assertiveness are especially interesting when considered in combination with the results from other samples. At earlier ages, females score more highly than males. Across adolescence, this difference erodes such that in the later teens, there is no marked sex difference. Yet by adulthood, males show consistently higher scores in this facet. De Bolle et al. note, and it is important to emphasize, that in a majority of cases, both the effects (interactions) and the overall mean differences reported were generally small.
In a massive cross-sectional study (n = 1,267,218), Soto et al. (2011) report on trait differences from ages 10 to 65, reporting different trends for males and females and thus allowing consideration of sex differences across the life course. The authors based their study on FFM domain estimates as measured by the Big Five Inventory (BFI) delivered online as part of the Gosling-Potter Internet Personality Project. The BFI is primarily a domain measure of the FFM, but facets of Self-Discipline and Order (Conscientiousness), Altruism and Compliance (Agreeableness), Anxiety and Depression (Neuroticism), Activity and Assertiveness (Extraversion), and Ideas and Aesthetics (Openness) can be scored from available items.
Based largely on theory and research evidence concerning social roles and cognitive development across adolescence into adulthood, Soto et al. (2011) hypothesized that Neuroticism and its facets would increase for females across adolescence while male trajectories would remain flat, thus increasing the sex difference, and that in later adulthood Neuroticism would decline for both males and females. Given the size of the sample, statistical significance tests are not reported; instead, trajectories are displayed graphically and the magnitudes of differences are presented in a T-score metric. Here T-score differences of 2 points are small, 5 points are medium, and 8 points are large effects.
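The T-score benchmarks line up with the d benchmarks used elsewhere in this entry because T-scores are conventionally scaled to a mean of 50 and a standard deviation of 10. The short sketch below makes the conversion explicit; it assumes that conventional scaling, and the function name is illustrative rather than taken from Soto et al. (2011).

```python
T_SCORE_SD = 10.0  # conventional T-score scaling: mean 50, SD 10


def t_points_to_d(t_score_difference):
    """Express a difference in T-score points as a standardized difference."""
    return t_score_difference / T_SCORE_SD


for points in (2, 5, 8):
    print(points, t_points_to_d(points))
```

On this reading, the 2-, 5-, and 8-point benchmarks correspond to d values of 0.20, 0.50, and 0.80.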
Results for Neuroticism were largely as predicted. At age 10, there were minimal sex differences in Neuroticism and its facets. In the mid-teens, sex differences were at their peak at approximately 5 points. From around the mid-20s, both males and females showed a decline, with females declining more steeply from the late 20s until around the age of 50. From this point on, the sex difference in Neuroticism was stable at 1–2 points.
Results for Conscientiousness showed marked decreases from ages 10 to approximately 15 (3 T-score points), before displaying a sharp increase from around ages 15 to 20. This increase was steeper for females and established an approximate 2-point female advantage which remained consistent throughout the life course. Agreeableness displayed similar, but less pronounced, trends. Females had higher scores throughout the life course, with the difference increasing from approximately 1 point in youth to approximately 2 points in adulthood.
Extraversion decreased for both males and females from ages 10 to around 15, more sharply in males (~5 points) than females (~3 points), and then remained largely stable until later life. From age 50, the approximate 2-point difference in favor of females closed such that by 65, there was no discernible sex difference. Finally, the pattern for Openness was much the same, although the adult advantage in this instance was in favor of males. The difference seems primarily driven by the Ideas facet and a marked drop in female scores between the ages of 15 and 20, creating a difference that then persists across the life course.
Collectively, these results present a complex pattern of sex differences across adolescence, with small but consistent adult differences emerging in the late teens and early 20s. These differences largely persist across later life. However, similarly to the cross-sectional estimates of mean difference, these patterns are not necessarily consistent within a given broad domain, with facets displaying differing patterns of development.
Information on the life-course development of differences in traits may be especially important for differentiating competing theories. For example, major changes in the pattern of sex differences that emerge with hormonal changes in development may point toward a biological basis for differences. However, it is necessary to also take into account the rapidly changing social contexts in emerging adulthood and adolescence that may – at the same time – serve to reinforce societal gender stereotypes. Further, changes in the magnitude of sex differences in adulthood that correspond to major life transitions (marriage, birth of children, etc.) may point to explanations based on social roles.
Sex Differences in Personality Change
The studies discussed above are, though large scale, cross-sectional, meaning that the estimates of differences at a particular age are based on a subsample distinct from those of all other age groups. Such studies are unable to speak to whether there are sex differences in within-person personality change across aging. One hypothesis concerning such change extends from the social role perspective on mean-level differences and states that, through mid-life, shifts in typical gender/sex roles lead to differential changes in trait levels. As Roberts et al. (2006) note, one may believe males would become more nurturing, for example, as family roles increase in importance against more self-focused outcomes.
Terracciano et al. (2005) analyzed data from a large sample (n = 1944) of US adults who had completed the NEO-PI-R on multiple occasions between 1989 and 2004. The authors found evidence of baseline sex differences consistent with the cross-sectional results discussed above, but no evidence that sex was predictive of change across time. Similar patterns of results were found in a meta-analysis of personality change by Roberts et al. (2006). Synthesizing results from 92 studies, they found that while mean-level change was evident across the life course and most marked in adolescence, no statistically significant relationships between sex and mean-level change were identified. Further, Ferguson (2010) meta-analyzed stability coefficients for personality change and found females displayed fractionally higher stability than males, but the difference was not statistically significant. In totality, there is little evidence from within-person longitudinal studies that personality changes differentially by sex. However, there remain comparatively few studies of this type.
Cross-Cultural Stability of Differences
The studies discussed above largely focus on Western cultures. Here we briefly consider the stability of sex differences across cultures. Schmitt et al. (2008; note direction of differences reversed for consistency with the rest of this entry) compared sex differences in FFM traits across 55 cultures using scores from the BFI. Neuroticism showed the most consistent differences, with only two cultures (Botswana and Indonesia) showing higher scores for males, with a mean d of −0.40. The most varied estimates were for Openness, where in 37 cultures a difference was found favoring men and in 18 a difference favoring women, with a mean d of 0.05.
Consistent with these findings, Bleidorn et al. (2013) investigated personality development across the life course in a sample of 884,328 individuals from 62 nations, using data from the previously discussed Gosling-Potter Internet Personality Project. Results indicated differences in Neuroticism, Extraversion, Agreeableness, and Conscientiousness in favor of women and Openness in favor of men. Additionally, they showed significant variability in the size of these differences across cultures.
How then does evidence from cross-cultural studies feed into the evaluation of the various theoretical explanations for sex differences? Schmitt et al. (2008) argue that the variability and pattern of sex differences cross-culturally suggest that social role explanations for sex differences are less plausible. If social roles were to play a key causal explanatory role in sex differences, then one may expect to see smaller differences in countries with greater equality and larger differences in countries with more inequality, less access to education for women, reduced access to certain professions, etc. In fact, the reverse pattern is true, and larger differences tend to be seen in Western industrialized nations. Instead, Schmitt et al. suggest that the pattern of differences may be more consistent with evolutionary explanations and tentatively introduce the possibility that gene-environment interactions may be at play in fostering increased personality dimorphism in resource-rich environments.
Differences in Variability
The studies presented thus far have primarily focused on sex differences in mean levels of a given trait. However, a small but growing number of studies have considered whether males and females also differ with respect to variability in traits. Across a variety of other domains, men show more variability than women, which raises the question of whether the same is true for personality traits.
The theoretical arguments for any such differences draw on a number of perspectives. Social role explanations would posit that females have more constrained social experiences in many cultures and, as such, have more restricted opportunities to express and develop varying personality profiles when compared to men. From an evolutionary perspective, it has been suggested that increased male variability results from the many possible behavioral strategies that would be equally successful with respect to selection.
Thus far, the patterns of results are themselves highly variable. In a study across four cultures, Borkenau et al. (2013a) found that, based on self-reports of the FFM traits, males and females did not differ in variability. However, greater variance in informant ratings of male versus female targets was found for all FFM domains other than Neuroticism. Borkenau et al. (2013b) aimed to replicate these findings across 51 cultures based on informant ratings of target individuals. Again, a reasonable degree of variability across traits and cultures was found; however, the most consistent pattern was that males displayed more variability than females across traits in 34 of the 51 cultures studied.
Evidence for differences in trait variability is still relatively limited. Studies to date have generally relied on measures of variability at the between-person level, examining whether these metrics differ across the sexes. However, the growing interest in experience sampling methods (assessing personality “in the moment” many times a day in an intensive data collection burst) may offer a fresh perspective on the topic. Such methods gather rich information on daily fluctuations in traits and states, providing within-person information on personality variability that should be highly informative for understanding sex differences in variability.
Summary of Empirical Findings
How then should we evaluate the current state of evidence for sex differences in personality traits? One of the most well-known perspectives on psychological sex differences is the gender similarities hypothesis (GSH) proposed by Hyde (see, e.g., Hyde 2014 for a recent review). Simply put, the GSH holds that males and females are more similar to one another on most traits than they are different and that large differences between the sexes do not exist or are rare. In general, this position appears reasonable at the domain level. The estimates presented in this entry thus far rarely reach the d of 0.80 defined above as a large effect. Indeed, most fall slightly above or below a small effect (d ≈ 0.20). However, as we have also seen, estimates of differences for broad domains that are sum score aggregates of lower-level facets may mask larger differences in these facets, because positively related facets can differ in opposite directions and so cancel out in the aggregate. Further, other methodological considerations may be influencing the current pattern of reported associations.
Methodological Issues in Sex Difference Research
Single Studies Versus Meta-Analysis
The studies discussed in this entry have been limited to either meta-analytic estimates of sex differences, estimates reported in the context of original descriptions of personality taxonomies, or large-scale single studies. By and large these studies have reported a standardized measure of difference based on some form of average or sum scores of sets of items (an observed score).
Hyde (e.g., 2014) has argued in favor of meta-analytic estimates of sex differences. Meta-analysis involves identifying sets of published and unpublished studies that provide estimates of an effect of interest. The individual effects are combined into a single estimate of the effect of interest, and an estimate of the variability in the effect across studies. The primary logic for such an approach is that the estimate derived from aggregating many studies, if the heterogeneity across studies can be understood, will always be an improvement on the estimate from any single study sample. With enough information from the original studies, meta-analysis can also make adjustments for the use of different tools to measure traits, adjust for reliability of measurement, and include other variables (age, culture, etc.) which may act as moderators of estimates of difference.
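To make the pooling step concrete, the following is a minimal Python sketch of inverse-variance weighting with a DerSimonian-Laird estimate of between-study variance. It is an illustration under stated assumptions, not the procedure used in any meta-analysis cited in this entry: the four studies are hypothetical, the function names are our own, and the sampling variance of d uses a common large-sample approximation.

```python
import math

def d_sampling_variance(d, n_male, n_female):
    """Large-sample approximation to the sampling variance of Cohen's d."""
    n_total = n_male + n_female
    return n_total / (n_male * n_female) + d ** 2 / (2 * n_total)

def pool_effects(effects, variances):
    """Pool effect sizes with inverse-variance weights.

    Returns the fixed-effect estimate, Cochran's Q, the DerSimonian-Laird
    estimate of between-study variance (tau2), and the random-effects
    estimate with its standard error.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    fixed = sum(w * d for w, d in zip(weights, effects)) / w_sum
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (d - fixed) ** 2 for w, d in zip(weights, effects))
    c = w_sum - sum(w ** 2 for w in weights) / w_sum
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    re_weights = [1.0 / (v + tau2) for v in variances]
    random_mean = sum(w * d for w, d in zip(re_weights, effects)) / sum(re_weights)
    return {"fixed": fixed, "q": q, "tau2": tau2, "random": random_mean,
            "se_random": math.sqrt(1.0 / sum(re_weights))}

# Hypothetical studies as (d, n_male, n_female); these are not real data.
studies = [(0.35, 120, 130), (0.52, 400, 380), (0.18, 90, 95), (0.41, 1500, 1600)]
effects = [d for d, _, _ in studies]
variances = [d_sampling_variance(d, nm, nf) for d, nm, nf in studies]
summary = pool_effects(effects, variances)
print({name: round(value, 3) for name, value in summary.items()})
```

The tau2 value is the estimate of variability in the effect across studies referred to above; moderator analyses and reliability corrections would require additional study-level information that this sketch does not model.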
Undoubtedly, meta-analytic estimates carry advantages over the estimation of group differences from a single study. However, the quality of the meta-analytic estimate will be constrained by the quality of the original studies. As such, any methodological weaknesses in these studies will transfer to the meta-analytic results.
In the study of group differences, one assumption above all else is critical to drawing valid inferences about the magnitude of differences, yet it often remains untested. Specifically, this assumption is that the particular tool being used to assess the trait of interest performs equivalently in males and females or, to put it another way, that the same trait is being compared in each group.
As a concrete example, suppose we are interested in whether there is a difference in Neuroticism as measured by the NEO-PI-R between males and females. We administer the scale to a large group of randomly sampled participants and create a simple or weighted sum score as per the instructions in the test manual. We then conduct a statistical test of mean difference (a t-test) and calculate the associated effect size using Cohen’s d. What we have assumed here is that the equations that relate the underlying trait of Neuroticism to the item responses in the groups are identical, such that we can use the same scoring rules across groups and have a score that measures the same thing in these groups. If this assumption does not hold, then any estimate of difference is invalid.
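The observed-score procedure just described can be sketched in a few lines of Python. This is an illustrative sketch only: the item responses are invented, the three-item scale is hypothetical (the NEO-PI-R scoring rules are not reproduced here), and the function names are our own. Note that the code applies one scoring rule to both groups, which is exactly the assumption at issue.

```python
import math
import statistics

def sum_scores(item_responses):
    """Unit-weighted sum score for each respondent (one list of items per person)."""
    return [sum(person) for person in item_responses]

def cohens_d(group_a, group_b):
    """Cohen's d (group_a minus group_b) using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical responses to a three-item scale; these are not real data.
female_items = [[4, 3, 5], [3, 3, 4], [5, 4, 4], [2, 3, 3]]
male_items = [[3, 2, 4], [2, 3, 3], [4, 3, 3], [3, 2, 2]]

# The same scoring rule is applied to both groups before comparing means.
d = cohens_d(sum_scores(female_items), sum_scores(male_items))
print(f"observed-score d (females minus males) = {d:.2f}")
```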
The procedure for testing this assumption is known as measurement invariance testing within the factor analytic setting or differential item functioning analysis within item response theory. Testing measurement invariance involves specifying a statistical model that relates each of a set of measured items (e.g., the questions in a survey tool) to a latent variable representing the personality trait they are hypothesized to measure. The same model is estimated for males and females, and, sequentially, constraints are placed on the model equating different parameters across groups. The difference in model fit between the constrained and unconstrained models provides a statistical test of whether the assumption that these parameters are equal across groups is supported. If a set of parameters is constrained to be equal and model fit declines significantly, then it is not reasonable to assume those parameters are the same in both groups.
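One common way to compare the constrained and unconstrained models is a chi-square difference (likelihood ratio) test. The sketch below assumes the models have already been fitted elsewhere and that plain maximum likelihood chi-square values are available; the fit statistics shown are hypothetical, the function names are our own, and the simple subtraction used here is not appropriate for scaled (robust) test statistics.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable.

    Uses the power series for the regularized lower incomplete gamma
    function; adequate for the moderate values typical of difference tests.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_difference(chi2_constrained, df_constrained, chi2_free, df_free):
    """Difference test for nested models: returns (delta chi2, delta df, p)."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    if delta_df <= 0:
        raise ValueError("the constrained model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)

# Hypothetical (chi-square, df) pairs for increasingly constrained models.
configural = (310.4, 180)  # same structure, all parameters free across groups
metric = (322.9, 190)      # loadings constrained equal
scalar = (371.6, 200)      # loadings and intercepts constrained equal

for label, free, constrained in [("metric vs configural", configural, metric),
                                 ("scalar vs metric", metric, scalar)]:
    delta, ddf, p = chi_square_difference(*constrained, *free)
    print(f"{label}: delta chi2 = {delta:.1f}, delta df = {ddf}, p = {p:.4f}")
```

In these invented numbers, equating loadings does not significantly worsen fit, whereas equating intercepts does, which would argue against full scalar invariance. In large samples the chi-square difference is sensitive to trivial misfit, so changes in approximate fit indices are often examined alongside it.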
Measurement invariance can be tested at different levels, increasing with respect to the number of equality constraints placed across groups. Different types of comparison across groups require different levels of invariance constraints. In the context of group difference tests, a minimum requirement is what is referred to as scalar (or partial scalar) invariance (for details on levels of invariance with an application to personality measurement, see Marsh et al. 2010). If scalar invariance holds, then it is reasonable to test for latent mean differences across groups. Note, however, that this is not the same as stating that mean differences can be tested in sum scores of test items. For this to be a valid test of mean differences, strict invariance (a more constrained model) is required to hold. This is because if the item residuals are not the same across groups, differential item reliability across groups can bias observed mean difference tests.
To the authors’ knowledge, there are no published examples of strict measurement invariance being tested prior to observed mean differences being evaluated on a sum score, nor examples of this level of invariance holding in standardization (or other large) samples being used in the rationale for testing observed score mean differences. Further, there is a limited, but growing, number of studies of sex differences in personality that have utilized measurement invariance and latent mean difference tests. Of the studies previously discussed, only the study of Marsh et al. (2010) is based on latent means. Booth and Irwing (2011) studied latent mean differences with measurement invariance in the 16PF. Others, for example, De Bolle et al. (2015), include some tests of measurement invariance prior to conducting mean difference tests, but these may not be complete. For example, in the case of De Bolle et al., these models use facet scores as indicators of models for the domains of the FFM, and so item-level invariance is not considered.
Impact of a Lack of Invariance
The question of how much impact a violation of the assumption of measurement invariance would have on a given estimate of mean difference based on observed scores is somewhat complex. It concerns the interaction of the size and direction of any true differences versus the size (both effect size and the number of items affected) and direction of any differences due to a measurement bias. Depending on which combination of these factors is at play, estimates of difference may be spurious, inflated, or attenuated. Clearly, this is a complex picture but one that requires attention from researchers if they are serious about identifying whether sex differences in personality exist and understanding the true magnitude of these differences. In studies of measurement invariance, as with all statistical analyses, one must also be aware of general issues concerning the representativeness of samples and statistical power. Invariance analyses are often complex, and power for a fixed sample size will vary depending on the level of invariance being tested.
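The three outcomes (spurious, inflated, attenuated) can be illustrated with model-implied values from a simple one-factor model. The sketch below is a toy example under strong assumptions: six hypothetical items with equal loadings and residual variances, equal group sizes, a latent variance of 1 in both groups, and bias confined to the intercept of a single item. All names and numbers are our own.

```python
import math

def sum_score_moments(intercepts, loadings, residual_vars, latent_mean, latent_var=1.0):
    """Model-implied mean and variance of a unit-weighted sum score under a
    one-factor model with uncorrelated residuals."""
    mean = sum(intercepts) + sum(loadings) * latent_mean
    variance = sum(loadings) ** 2 * latent_var + sum(residual_vars)
    return mean, variance

def observed_d(females, males):
    """Model-implied Cohen's d (females minus males), assuming equal group sizes."""
    mean_f, var_f = sum_score_moments(**females)
    mean_m, var_m = sum_score_moments(**males)
    return (mean_f - mean_m) / math.sqrt((var_f + var_m) / 2.0)

def make_group(latent_mean, first_item_shift=0.0):
    """Six hypothetical items; only the first item's intercept may be shifted."""
    intercepts = [3.0 + first_item_shift] + [3.0] * 5
    return {"intercepts": intercepts, "loadings": [0.7] * 6,
            "residual_vars": [0.51] * 6, "latent_mean": latent_mean}

males = make_group(latent_mean=0.0)
scenarios = {
    "invariant, true latent d = 0.3": make_group(0.3),
    "spurious: no true difference, one biased item": make_group(0.0, 0.5),
    "inflated: bias in the direction of the true difference": make_group(0.3, 0.5),
    "attenuated: bias against the true difference": make_group(0.3, -0.5),
}
for label, females in scenarios.items():
    print(f"{label}: observed d = {observed_d(females, males):.3f}")
```

Even in the invariant scenario, the observed-score d falls somewhat below the latent difference of 0.3, because residual variance enlarges the standard deviation of the sum score. With more biased items, or bias in loadings and residual variances as well, the distortions would combine in less predictable ways.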
Why Is Invariance Not Tested?
This is a difficult question to answer, but it is possible to speculate. First, it is a topic which has only risen to prominence outside of the methodological literature comparatively recently. Second, and relatedly, applied researchers who have not specifically studied psychometrics or more advanced latent variable methodologies may be unaware of the strictness of the assumptions related to the use of sum or observed scores. Third, measurement invariance is not a simple analysis and requires a good degree of statistical understanding and large sample sizes. Fourth, many researchers reasonably assume that such issues of measurement have been researched as part of the validation of extant tools. Two points are worth noting here: (a) measurement invariance rarely features in test manuals of extant inventories, and (b) even if invariance is established in the standardization samples for a given inventory, it does not mean it will hold in any given sample for which a mean difference test is being conducted. Fifth, these analyses require an adequately fitting measurement model to be established across groups, and personality data are known for showing poor fit in such models (see Hopwood and Donnellan 2010 for discussion). This last point has been somewhat mitigated by the recent introduction of exploratory structural equation modeling, which has been applied in the study of sex differences in personality, including tests of measurement invariance (see Marsh et al. 2010).
Univariate Versus Multivariate Measures of Effect Size
Recently, debate has begun as to whether the magnitude of sex differences in personality should be estimated based on univariate or multivariate measures of effect size. Cohen’s d is a univariate measure, and as is evident from the studies discussed in this entry, a common practice is to consider collections of univariate effects when discussing multidimensional constructs such as personality. However, some (e.g., see Del Giudice 2017) have suggested that multivariate estimates such as Mahalanobis D may be more informative. The basic logic of this approach is that while differences in individual traits may be small, an accumulation of differences across multiple traits may result in outward manifestations of personality that are quite different. Further, univariate measures by definition fail to take into account the correlational structure between the sets of measures being compared.
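To show how the correlational structure enters a multivariate effect size, the sketch below computes Mahalanobis D from a vector of standardized univariate differences d and a pooled within-group correlation matrix R, as the square root of d'R⁻¹d. It is a minimal illustration with invented two-trait values; the helper names are our own, and a small linear solver is included only to keep the example free of external libraries.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x

def mahalanobis_d(d_values, correlations):
    """Mahalanobis D from standardized differences and a pooled correlation matrix."""
    weighted = solve(correlations, d_values)  # R^{-1} d
    return math.sqrt(sum(d * w for d, w in zip(d_values, weighted)))

# Hypothetical two-trait examples; these are not real estimates.
uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]
print(round(mahalanobis_d([0.3, 0.3], uncorrelated), 3))   # independent traits
print(round(mahalanobis_d([0.3, 0.3], correlated), 3))     # same-direction differences
print(round(mahalanobis_d([0.3, -0.3], correlated), 3))    # opposite-direction differences
```

With positively correlated traits, differences in the same direction partly overlap and D shrinks relative to the uncorrelated case, whereas differences in opposite directions yield a larger D. This is the information that a list of separate univariate d values cannot convey.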
In the study of sex differences in omnibus personality inventories, Del Giudice et al. (2012) reported multivariate estimates in an analysis of the 16PF5 using the US standardization sample, also incorporating an evaluation of measurement invariant latent versus observed score differences. The results demonstrated (a) that latent mean estimates from measurement invariance models were larger than the observed score mean differences of the same data (consider also Marsh et al. (2010) in Table 2 versus the observed score studies) and (b) that, when combined using Mahalanobis D, the overall difference between males and females was large.
Hyde (2014) has argued against the use of D, stating that the use of such measures maximizes difference and is uninterpretable. While the former is a generally correct statement concerning D, the latter is less accurate (see Del Giudice 2017). Further, as Del Giudice et al. (2012) have shown, and as our discussion of facet and domain differences has emphasized, current practice may be underestimating differences. Key to this debate seems to be the question of which level of aggregation is most appropriate for estimating the magnitude of sex differences. Is it facets, domains, or perhaps “all” of personality?
Sex Differences in Psychopathology
Finally, we conclude by briefly discussing sex differences in psychopathological traits. Psychopathology is viewed by many as underpinned by or representing the extreme end of “normal” personality traits. As such, it is not surprising that well-replicated sex differences are also observed in the majority of psychopathological disorders (e.g., see Martel 2013). Internalizing disorders such as anxiety and depression tend to be more prevalent in females as are other disorders with a strong “negative emotionality” component such as eating disorders and borderline personality disorder. Externalizing and neurodevelopmental disorders such as autism, attention deficit hyperactivity disorder, conduct disorder, and schizophrenia spectrum disorders tend to be more prevalent in males. Thought of another way, females are relatively more affected by disorders with an onset in adolescence, whereas males are relatively more affected by disorders with an early onset. This seems to be true “within” diagnostic categories as well. For example, females who show conduct problems are more likely to show an adolescent onset than early onset. Sex differences in prevalence vary considerably across different diagnostic categories with, for example, no apparent sex difference in adolescent oppositional defiant disorder but an eightfold increased risk of eating disorders in females relative to males.
Within diagnostic categories, there may also be sex differences in manifestation. In autism, for example, there is some evidence that females may be better able to conceal their difficulties. Similarly, as regards conduct problems, females show a preference for relational over physical aggression, displaying levels of relational aggression on a par with males, who otherwise show higher levels of aggression than females. Some have argued that differences in prevalence and manifestation mean that the less-affected sex tends to be underdiagnosed. This is because symptoms of “stereotypically male” disorders may be harder to recognize in females and vice versa. As such, there have been some calls to create “gender-specific” diagnostic criteria for some disorders, in which the symptom lists referred to during the diagnostic process are tailored to gender.
Theoretical Explanations from Psychopathology
Various explanations for sex differences in psychopathology have been proposed, some of which refer to specific disorders and others which attempt to explain sex differences in general. These refer to both biological (e.g., genetic, epigenetic, and neurocognitive) and environmental (e.g., in utero exposures, socialization) factors and both ultimate and proximate mechanisms. To explain sex differences in neurodevelopmental disorders, for example, it has been proposed that male-female differences in prenatal testosterone exposure make males more vulnerable to the effect of early adverse environmental exposures. In addition, stronger socialization against “acting-out” behaviors in females has been proposed to contribute to sex differences in conduct problems.
A general model of sex differences in psychopathology is the multifactorial threshold model in which one sex is assumed to require a greater loading of risk factors to tip them over into psychopathology. For example, for a male and female who have identical genetic and environmental risk factors, the latter may be more protected against developing autism because of gender-specific buffering factors. This can be contrasted with the idea that one sex tends to manifest a particular psychopathology at a higher rate due to showing higher levels of risk factors. For example, it has been proposed that males show higher levels of early-onset conduct problems because they are more likely to show early predisposing neurocognitive deficits. It can also be contrasted with the idea that sex differences exist because the same underlying liability is expressed differently in males and females. For example, it has been proposed that substance use and depression are the more male versus female manifestations of the same underlying vulnerability.
Another general perspective on sex differences in psychopathology proposes that they are due to sexual selection for traits that increase reproductive success. In males, traits related to social dominance and resource acquisition are selected for, especially “approach” traits such as disinhibition and sensation-seeking. At extreme levels or when poorly calibrated to the environment, these traits may increase the risk of externalizing disorders. In females, traits related to social competence such as empathy and negative emotionality may be selected for which, when extreme or poorly calibrated, may confer risk of internalizing disorders. Although much work remains to be done in illuminating the causes of sex differences in psychopathology, as with sex differences in personality, their study is helping to reveal the underlying causes of psychopathology in general.
Conclusions: Should We Study Sex Differences in Personality?
The aim of this entry was to give a broad overview of the empirical evidence for sex differences in personality. In doing so, our focus was on the question “Do differences exist?”, giving specific attention to the methodological issues in the study of sex differences. We have briefly discussed some of the theoretical arguments for sex differences and some of the related work on differences in trait variation. We have not dealt in any depth with the highly important question set out at the beginning of this entry: “If sex differences do exist, what do they mean for life outcomes?”
We close by considering a broader question: given the information presented, should we continue to study sex differences in personality? The short answer is yes. Despite large volumes of work, there remain gaps with respect to in-depth measurement invariance analyses of mean differences at the facet level; thus, the basic question of the magnitude of differences is not entirely resolved. Further, whether one believes sex differences exist or not, whether they are large or small, biologically based or the result of societal stereotype, it is not possible to understand the implications of such differences for individuals and society without continued rigorous scientific investigation.
References
- Conn, S. R., & Rieke, M. L. (Eds.). (1994). 16PF fifth edition technical manual. Incorporated: Institute for Personality & Ability Testing.
- Costa, P. T., Jr., & McCrae, R. R. (1992). NEO personality inventory–revised (NEO-PI-R) and NEO five-factor inventory (NEO-FFI) professional manual. Odessa: Psychological Assessment Resources.
- De Bolle, M., De Fruyt, F., McCrae, R. R., Löckenhoff, C. E., Costa, P. T., Jr., Aguilar-Vafaie, M. E., … & Avdeyeva, T. V. (2015). The emergence of sex differences in personality traits in early adolescence: A cross-sectional, cross-cultural study. Journal of Personality and Social Psychology, 108, 171–185.
- De Vries, R. E., Ashton, M. C., & Lee, K. (2009). De zes belangrijkste persoonlijkheidsdimensies en de HEXACO Persoonlijkheidsvragenlijst. [The six most important personality dimensions and the HEXACO Personality Inventory.] Gedrag & Organisatie, 22, 232–274.
- Lee, K., & Ashton, M. C. (2016). Psychometric properties of the HEXACO-100. Assessment, 1073191116659134.
- Maccoby, E. E., & Jacklin, C. N. (1974). Myth, reality and shades of gray: What we know and don’t know about sex differences. Psychology Today, 8, 109–112.
- Rothbart, M. K. (2011). Becoming who we are: Temperament and personality in development. New York: Guilford Press.