1 Introduction

Gender differentials in educational achievement have long been a focus of research. This is not surprising given that education has been shown to improve many life outcomes, such as health and labour market outcomes (Card 1999; Schoeni et al. 2008). The underrepresentation of women in science, technology, engineering and mathematics (STEM) careers has led research and policy to focus on gender gaps in test scores, particularly in maths-related subjects in the early years of schooling (Fryer and Levitt 2010; Justman and Méndez 2016). While there is a rich literature on gender gaps in educational achievement, little consensus exists about how the gaps evolve in early childhood or about the factors that contribute to them. One major obstacle to documenting the evolution of the gaps is the lack of rich panel data. This study contributes to the literature by using the recent, nationally representative Longitudinal Study of Australian Children (LSAC) survey to document the evolution of gender gaps in academic achievement over the first seven grades of schooling and to examine the factors contributing to them.

This paper contributes to the international literature on the gender test score gap not only by introducing an Australian case study but also by making three further additions to the current literature. The first addition is that the panel data, which are remarkably rich relative to those used in the previous international literature (they contain five assessments of the same children over their first 7 years of schooling and an exhaustive list of home and school environment measures), enable the testing of several socialisation theories. For example, one particular advantage of the data is that students’ pre-school cognitive skillsFootnote 1 are observed, allowing investigation of how initial academic endowments contribute to the gender test score gap over the first 7 years of schooling. As another example, the data contain students’ test scores up to the seventh grade, whereas current US studies, which use the comparable US data set from the Early Childhood Longitudinal Study Kindergarten cohort, only examine the gender test score gap up to the fifth grade (Fryer Jr and Levitt 2004; Fryer and Levitt 2010; Sohn 2012; Bertrand and Pan 2013). These Australian data thus allow the evolution of the gender test score gap to be examined through higher grades than in the US studies.

The second addition is that this paper is one of the few in the literature to apply quantile regression to investigate the relative performance of male and female students along the whole distribution of test scores rather than only at the mean (Husain and Millimet 2009; Sohn 2012; Gevrek and Seiberlich 2014). Analysis based solely on means may miss important information in other parts of the distribution (Firpo et al. 2009). This is especially relevant when policy concern focuses on the tails of the test score distribution, and when evaluating and decomposing the gender test score gap at different points of the distribution is of interest (Husain and Millimet 2009; Sohn 2012; Gevrek and Seiberlich 2014). To do so, this paper applies the unconditional quantile regression developed by Firpo et al. (2009). The advantage of the unconditional quantile regression over the traditional conditional quantile regression of Koenker and Bassett (1978) is that its estimates can be interpreted as the impact of changes in explanatory variables on the dependent variable for those at a specific point in the distribution.Footnote 2 The estimates from the unconditional quantile regression can also be applied directly in an Oaxaca-Blinder (OB) decomposition to examine the factors contributing to the gender test score gap across the entire distribution. This yields the study’s third addition: it is one of the few papers (Sohn 2012; Gevrek and Seiberlich 2014) to apply a quantile decomposition method to the gender test score gap.

Using the first five waves of the LSAC survey, we find that males excel at numeracy in all grades, both at the mean and along the distribution. We also uncover heterogeneous patterns in the gender test score gap across the test score distribution, test subjects and test grades. The regression results further reveal a gender test score gap in numeracy that widens as students progress through school. The decomposition results indicate that gender disparities in pre-school cognitive skills explain a large part of the differences in academic performance.

The remainder of the paper is structured as follows. Section 2 summarises the most relevant literature while Section 3 describes the data. Section 4 presents this study’s empirical regression and decomposition models and Section 5 discusses the regression results. Section 6 reports decomposition results of factors contributing to the gender test score gap, and, finally, Section 7 concludes.

2 Literature review

International literature has consistently shown significant gender test score gaps, with male students generally outperforming female students in maths and science while female students excel at literacy subjects (Wilder and Powell 1989; Marks 2008; Bedard and Cho 2010; Fryer and Levitt 2010; Christopher et al. 2013; Falch and Naper 2013; Stoet and Geary 2013; Dickerson et al. 2015). In addition, studies have often documented that the gender gap in a particular subject only appears at certain educational levels and tends to increase as students advance their schooling (Coleman et al. 1966; Husain and Millimet 2009; Fryer and Levitt 2010).

Research attempting to explain these recognised patterns in the gender educational gap has proposed a wide range of contributing factors. For example, some studies have suggested that differences in the brain between genders may explain these patterns, as males tend to be better at analysing systems while females tend to be better at reading the emotions of other people (Kimura 2000; Baron-Cohen 2007). Furthermore, gender differences in attitudes towards competition (Gneezy et al. 2003; Niederle and Vesterlund 2010), in parental time investment in children (Baker and Milligan 2016), or in social and cultural conditioning and gender-biased environments (Guiso et al. 2008; Bedard and Cho 2010; Dickerson et al. 2015) are possible explanations for the observed gender gaps in academic achievement. A growing number of studies also highlight the role of non-cognitive skills (Jacob 2002; Duckworth and Seligman 2006; Christopher et al. 2013; Golsteyn and Schils 2014) in contributing to the gender test score gap.Footnote 3 The present paper contributes to the literature by assessing the role of pre-school cognitive skills in the gender academic achievement gap and how that role evolves as students progress through school.

Australian studies have documented gender differences in academic outcomes at all educational levels. For example, Nghiem et al. (2015) used the first four waves of the LSAC data to report that male students outperform their female counterparts in grade 3 and 5 numeracy, whereas female students outperform in grade 3 writing and in grade 5 reading and grammar. More recently, Justman and Méndez (2016) used administrative data from Victoria to show that male students score higher than female students in mathematics and lower in reading in grades 7 and 9. As another example, Marks (2008) used the OECD’s 2000 Programme for International Student Assessment (PISA) to document that 15-year-old Australian females perform better than males in reading but worse in mathematics. Using various datasets, Homel et al. (2012) reported that 18-year-old Australian females are more likely than males to complete Year 12. At the tertiary level, Booth and Kee (2011) used aggregate data to report that, since 1987, Australian females have been more likely than males to be enrolled at university. These studies typically capture the gender gap in educational achievement by including a gender dummy variable in a multivariate regression and examine only the mean gap.

3 Data and descriptive statistics

3.1 Data and sample

We use data from the first five waves of the biennial, nationally representative LSAC survey. The LSAC, initiated in 2004, contains comprehensive information about children’s test scores and the socio-economic and demographic backgrounds of the children and their parents. The LSAC sampling frame consists of all children born between March 2003 and February 2004 (the birth or “B cohort”, infants aged 0–1 year in 2004) and between March 1999 and February 2000 (the kindergarten or “K cohort”, children aged 4–5 years in 2004). In this study, we use K cohort children because measures of student test scores are more widely available for this cohort in the first five waves of the survey.

To measure the academic achievement of students, we employ results from the National Assessment Program – Literacy and Numeracy (NAPLAN) tests.Footnote 4 The NAPLAN test is required of all Australian students in grades 3, 5, 7 and 9 in the five domains of reading, writing, spelling, grammar and numeracy. The test scores range from 0 to 1000 and are comparable across students and over time (ACARA 2014). The children’s NAPLAN test results were collected via data linkage with the LSAC data (Daraganova et al. 2013). At the time of this study, the linked data for LSAC were mainly available for students in grades 3, 5 and 7, so we use the test results at these grades to measure students’ academic achievement. Following the previous Australian literature (Justman and Méndez 2016; Cobb-Clark and Moschion 2017) and for brevity, we focus on two main test subjects: reading and numeracy.Footnote 5 Since the NAPLAN test dates and the LSAC survey dates differ, test results are merged with survey data so that the survey information never postdates the test results.Footnote 6 Under this rule, NAPLAN test scores in grades 3, 5 and 7 are merged with survey data from waves 2, 3 and 4, respectively. As is generally done in the literature (Husain and Millimet 2009; Fryer and Levitt 2010; Sohn 2012; Golsteyn and Schils 2014), NAPLAN test scores are standardised (to mean 0 and standard deviation 1) by grade and domain.
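The standardisation by grade and domain can be sketched as follows. This is a minimal illustration, assuming records stored as dictionaries with hypothetical keys `grade`, `domain` and `score`, and assuming the population standard deviation within each grade-domain cell (the text does not say which variance estimator was used); the scores are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_by_group(records):
    """Add a z-score computed within each (grade, domain) cell.

    `records` is a list of dicts with the hypothetical keys 'grade',
    'domain' and 'score'. After the transformation every cell has mean 0
    and (population) standard deviation 1.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["grade"], r["domain"])].append(r["score"])

    # Mean and population standard deviation for each grade-domain cell
    moments = {}
    for key, values in cells.items():
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"no variation in cell {key}")
        moments[key] = (mean(values), sd)

    out = []
    for r in records:
        mu, sd = moments[(r["grade"], r["domain"])]
        out.append({**r, "z": (r["score"] - mu) / sd})
    return out


# Illustrative (made-up) NAPLAN-style scores
scores = [
    {"grade": 3, "domain": "numeracy", "score": 380.0},
    {"grade": 3, "domain": "numeracy", "score": 420.0},
    {"grade": 5, "domain": "numeracy", "score": 470.0},
    {"grade": 5, "domain": "numeracy", "score": 530.0},
]
for row in standardise_by_group(scores):
    print(row["grade"], row["z"])  # prints -1.0 and 1.0 within each grade
```

Because the z-scores are computed within each grade and domain, a coefficient of 0.1 on the male dummy reads as 0.1 standard deviations of that grade-domain score distribution.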

To measure the initial stocks of students’ cognitive skills, we use the Peabody Picture Vocabulary Test (PPVT) and the Who Am I (WAI) test. The PPVT is an interviewer-administered test that assesses a child’s knowledge of the meaning of spoken words and his or her receptive vocabulary for standard English (Dunn and Dunn 1997). It requires a child to point to the picture that best represents the meaning of a stimulus word spoken by the examiner. The WAI test is also administered by an interviewer and measures the general cognitive ability of pre-school age children to perform literacy and numeracy tasks, such as reading, copying and writing letters, words, shapes and numbers (Lemos and Doig 1999). PPVT and WAI scores are taken from wave 1, when the child is 4 or 5 years old (i.e. before enrolling in primary school). Like NAPLAN test scores, PPVT and WAI test scores are standardised for ease of interpretation.

3.2 Sample

As discussed in Section 3.1, this study focuses on K cohort children because test scores are more widely available for them. Furthermore, among students who took a test in any test grade, we keep the roughly 96% who completed all five test domains. The sample is further restricted to students with no missing information on a list of important explanatory variables. To keep the results comparable over time, we use specifications containing only variables that are available in all waves of the LSAC and have the least missing information (see Table 1 and Section 4 for the variables included in our baseline models). These variables are commonly used in studies of the gender test score gap among school students that employ the popular and comparable US data set from the Early Childhood Longitudinal Study Kindergarten cohort (Fryer Jr and Levitt 2004; Fryer and Levitt 2010; Sohn 2012; Bertrand and Pan 2013).Footnote 7

Table 1 Summary statistics by gender

The original sample sizes for the K cohort in waves 2, 3 and 4 are 4464, 4331 and 4169, respectively. The above restrictions result in final samples of 2471, 3225 and 2801 students in waves 2, 3 and 4, respectively. Appendix 1: Table 6 suggests that sample attrition is mainly attributable to students’ NAPLAN test scores not being linked to the LSAC data. Reasons for attrition from the original sample are discussed in Norton and Monahan (2015), and reasons for NAPLAN test scores not being linked to the LSAC data are discussed in detail in a technical report by Daraganova et al. (2013). Note that the wave 2 sample is smaller because the NAPLAN tests were first introduced in 2008, when some K cohort students may already have been in higher grades and therefore did not sit the grade 3 tests. Additionally, Appendix 1: Table 6 reveals that, conditional on having NAPLAN test scores linked to the LSAC data, sample attrition is mostly due to missing information on pre-school cognitive skills (i.e. PPVT and WAI) and household income. We dropped individuals with missing information on control variables rather than using the “dummy variable adjustment” method because deletion has been found to produce less biased estimates (Allison 2001).
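The sample sizes above translate into the following retention rates, which make the smaller grade 3 (wave 2) sample concrete. The numbers are those reported in this section; the snippet only does the arithmetic.

```python
# Original and final K-cohort sample sizes reported in the text
original = {"wave 2 (grade 3)": 4464, "wave 3 (grade 5)": 4331, "wave 4 (grade 7)": 4169}
final = {"wave 2 (grade 3)": 2471, "wave 3 (grade 5)": 3225, "wave 4 (grade 7)": 2801}

# Share of the original wave sample retained after all restrictions
retention = {wave: final[wave] / original[wave] for wave in original}
for wave, share in retention.items():
    print(f"{wave}: {share:.1%} retained")
# wave 2 (grade 3): 55.4% retained
# wave 3 (grade 5): 74.5% retained
# wave 4 (grade 7): 67.2% retained
```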

We investigate whether our sample selection criteria introduce sample selection bias. One particular concern relating to our research design is that the child’s gender may affect the probability that the child is included in the final sample. We therefore estimate a probit model in which the dependent variable equals one if the child is in our sample and zero otherwise. The explanatory variables are basic demographic characteristics, including the child’s gender. The regression results (reported in Appendix 1: Table 7) provide some evidence of statistically significant selection on observables. For instance, children in our sample are more likely to come from more advantaged households, to have non-Aboriginal or native backgrounds, to live in two-parent households or to live in owned homes. However, the pseudo-R² values are relatively small, indicating that selection on observable characteristics is quantitatively weak. More importantly, in two of the three regressions by test grade, the p values from a test of the statistical significance of the gender dummy exceed 0.05, alleviating concern that our results are driven by sample selection.
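A minimal sketch of this selection check follows. It assumes a single 0/1 regressor (the child’s gender) instead of the full set of demographic characteristics used in the paper, and uses made-up data. The probit is fitted by Fisher scoring, and the function returns the z statistic on the gender coefficient and McFadden’s pseudo-R²; all function names are hypothetical, and in practice a statistics package would be used.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _score_info(a, b, y, d):
    """Log-likelihood, score and expected information of a probit with index a + b*d."""
    ll = g0 = g1 = h00 = h01 = h11 = 0.0
    for yi, di in zip(y, d):
        xb = a + b * di
        p = min(max(norm_cdf(xb), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        phi = norm_pdf(xb)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
        lam = phi * (yi - p) / (p * (1.0 - p))  # score contribution
        w = phi * phi / (p * (1.0 - p))          # expected-information weight
        g0 += lam
        g1 += lam * di
        h00 += w
        h01 += w * di
        h11 += w * di * di
    return ll, (g0, g1), (h00, h01, h11)


def probit_selection(y, d, iters=25):
    """Probit of inclusion y (0/1) on a constant and a 0/1 regressor d.

    Returns (alpha, beta, z_beta, mcfadden_pseudo_r2).
    """
    a = b = 0.0
    for _ in range(iters):  # Fisher scoring: theta += I(theta)^-1 * score
        _, (g0, g1), (h00, h01, h11) = _score_info(a, b, y, d)
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    ll, _, (h00, h01, h11) = _score_info(a, b, y, d)
    se_b = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    # Null (constant-only) log-likelihood for McFadden's pseudo-R2
    p0 = sum(y) / len(y)
    ll0 = len(y) * (p0 * math.log(p0) + (1 - p0) * math.log(1 - p0))
    return a, b, b / se_b, 1.0 - ll / ll0


# Made-up data: y = 1 if the child is kept in the final sample, d = 1 for boys
y = [1] * 6 + [0] * 4 + [1] * 5 + [0] * 5
d = [0] * 10 + [1] * 10
alpha, beta, z, r2 = probit_selection(y, d)
# The fitted inclusion probabilities reproduce the group shares 0.6 (girls) and 0.5 (boys)
print(round(norm_cdf(alpha), 3), round(norm_cdf(alpha + beta), 3), round(z, 2), round(r2, 4))
```

A small |z| on the gender coefficient and a pseudo-R² close to zero would correspond to the reassuring pattern described in the text.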

3.3 Summary statistics by gender

Table 1 presents summary statistics by gender for the students’ background characteristics and home environment variables used in the analysis. The absence of significant gender differences in parental characteristics (such as mother’s ethnicity, education and work status, family size, income and home ownership status) suggests that the gender of children in this sample is as good as randomly assigned across families.Footnote 8 There is also no significant difference in most of our measures of parental investment in child development, such as parental time with the child, children’s access to computers or school sector. The only distinguishable gender difference is that female students were more likely to be breastfed at 3 or 6 months old.

However, we observe significant gender differences in initial cognitive and health endowments. In particular, female students have an academic advantage even before they start school: their PPVT and WAI scores, measured at ages 4 or 5, are higher than those of male students of the same age. Our finding of a female advantage in pre-school reading test scores (as represented by PPVT) is consistent with that of Fryer and Levitt (2010) for children in the USA. We additionally show that at ages 4 or 5 girls also display higher general cognitive ability (as measured by WAI) than boys.Footnote 9 In line with the Australian national birthweight pattern by gender reported in the medical literature (Dobbins et al. 2012), our data also show that female students are generally smaller than male students at birth, with females more likely to have a birth weight of 2500 g or lower. We also observe that female students in the sample are slightly older (by about 1 month) than male students. This difference is consistent with the pattern, observed in Table 1, that girls’ mothers are about 4 months older than boys’ mothers. Lastly, while male students appear to have more younger siblings than female students, they have fewer same-age siblings.

Table 1 shows that significant differences in verbal and general cognitive performance already exist between boys and girls by the time they enter primary school. As with the reasons behind the gender disparity in educational achievement discussed in Section 2, the origins of gender differences in pre-school cognitive skills remain largely unknown. Some attribute them to biological gender differences (Vandenberg 1967), while others suggest that different treatment and expectations from parents or teachers may lead to pre-school gender cognitive differences (Lewis and Freedle 1972; Block 1976; Lewis and Brooks-Gunn 1979; Lavy and Sand 2015; Baker and Milligan 2016).

To gain some insight into how pre-school cognitive skills are formed, we follow the child development literature and, in a purely descriptive way, regress each of them (i.e. PPVT and WAI) on a list of factors contributing to the child’s development (Currie 2009; Cunha et al. 2010). The list includes child characteristics (i.e. gender, age, ethnicity), early child outcomes (as measured by birth weight), early parental investment (as measured by breastfeeding the child at 3 or 6 months), concurrent parental investment (as represented by a home environment index, an out-of-home activity index and access to computers)Footnote 10 and family environment (maternal age, migration background, health, number of siblings, maternal working hours, family income and living with both parents). The results (reported in Appendix 1: Table 8, column 1) show that higher pre-school PPVT test scores are observed for girls, older children, children with normal birth weight, children of native or highly educated mothers, and children receiving more early or concurrent investment from parents. Appendix 1: Table 8 (column 2) additionally shows that the characteristics associated with higher PPVT scores also explain higher WAI scores among 4- or 5-year-old children. An exception is that children of mothers who migrated to Australia from a Non-English Speaking Background (NESB) country have higher WAI scores than children of native mothers. Overall, the results from this exercise highlight that significant differences in cognitive skills between boys and girls exist before school entry and that pre-school cognitive skills may capture intergenerational genetic transmission or accrued parental investment in child development prior to school.

4 Empirical models

4.1 Regression models

Following the prior literature, we estimate the gender test score gap by regressing the test score \( Y_i \) of student i in each test grade and subject on a gender dummy variable (\( \mathrm{Male}_i \), which takes the value 1 if the student is male and 0 if female); the sign and magnitude of the gender coefficient estimate therefore indicate the direction and magnitude of the gender test score gap. The changes in the estimated gap over the three school grades describe its evolution from grade 3 of primary school to either the final grade of primary school or the first grade of secondary school.Footnote 11 In particular, for each test subject and test grade, the raw gender test score gap is estimated using the following basic model:

$$ {Y}_i=\alpha +\beta {\mathrm{Male}}_i+{\varepsilon}_i $$
(1)

where \( \varepsilon_i \) is an idiosyncratic error term.
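Because \( \mathrm{Male}_i \) is the only regressor in Eq. (1), the OLS estimates have a simple closed form, a standard property of OLS with a single binary regressor. Writing \( \bar{Y}_m \) and \( \bar{Y}_f \) for the sample mean scores of male and female students:

$$ \hat{\alpha}=\bar{Y}_f,\qquad \hat{\beta}=\bar{Y}_m-\bar{Y}_f $$

Since scores are standardised by grade and domain, \( \hat{\beta} \) is simply the raw difference in mean scores expressed in standard deviation units.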

In addition to the raw test score gap, the gender test score gap conditional on a rich list of factors contributing to the student’s development is examined using the following equation:

$$ {Y}_i=\alpha +\beta {\mathrm{Male}}_i+{X}_i\gamma +{\varepsilon}_i $$
(2)

where \( X_i \) includes the student’s characteristics (i.e. age, ethnicity, health status), household characteristics (i.e. mother’s migration status,Footnote 12 household size, parents’ education and household income), indicators of parental investment in the student’s education (e.g. breastfeeding the child at 3 or 6 months, access to computers, and two indices of “quality time” that parents and children spend together), and indicators of neighbourhood characteristics (i.e. physical infrastructure or neighbourhood socio-economic status). Differences in the year in which students sat the NAPLAN test for the same grade are addressed by using both the students’ age in the year they sat the test and dummy variables for the test year. Differences between survey time and test time are controlled for by including dummies for the quarter of the survey in the regressions. In model 2, state dummy variables are included to control for differences in educational jurisdictions across states and territories.

The marginal gender test score gap after students enter primary school is then examined by including the student’s initial stock of academic ability, as indicated by scores on the WAI and PPVT tests (\( E_{0Ki} \)), which are administered before primary school entry, in the following “value-added” model:

$$ {Y}_i=\alpha +\beta {\mathrm{Male}}_i+{X}_i\gamma +{E}_{0 Ki}\theta +{\varepsilon}_i $$
(3)

The value-added model is our preferred specification because it is in line with the dynamic theory of skill formation (Todd and Wolpin 2007; Cunha et al. 2010). As discussed in Section 3.3, pre-school cognitive skills may measure accrued parental investment in child development prior to primary school, so use of the value-added model also helps isolate effects of such investment on the gender test score gap observed during primary and early secondary school years.Footnote 13

We first apply ordinary least squares (OLS) to estimate the mean gender test score gap using the three specifications described above. Unreported statistics from our data show that, for both males and females, the mean test score usually differs from the median, suggesting that the test score distribution is skewed and may contain extreme values. This distributional characteristic motivates examining the determinants of academic achievement not only at the mean but also along the whole distribution (Koenker and Bassett 1978; Firpo et al. 2009). We employ the unconditional quantile regression (UQR) technique to investigate the gender test score gap along the entire distribution.

This technique is chosen over the (conditional) quantile regression method proposed by Koenker and Bassett (1978) because the estimates of the latter cannot be interpreted as the marginal impact of an explanatory variable on the unconditional quantiles of the outcome of interest unless a rank-preservation condition holds (Firpo 2007; Firpo et al. 2009). In contrast, the estimates of the unconditional quantile regression introduced by Firpo et al. (2009) can. Technically, the unconditional quantile regression method regresses the estimated re-centered influence function (RIF) of the outcome on a set of explanatory variables (Firpo et al. 2009).Footnote 14 The RIF for the quantile of interest \( q_{\tau} \) is:

$$ \mathrm{RIF}\left(Y,{q}_{\tau}\right)={q}_{\tau }+\frac{\tau -D\left(Y\le {q}_{\tau}\right)}{f_Y\left({q}_{\tau}\right)}, $$
(4)

where \( f_Y(q_{\tau}) \) is the marginal density function of the outcome Y evaluated at \( q_{\tau} \), and D is an indicator function. In practice, \( \mathrm{RIF}(Y, q_{\tau}) \) is not observed, so its sample counterpart is used instead:

$$ \mathrm{RIF}\left(Y,{\widehat{q}}_{\tau}\right)={\widehat{q}}_{\tau }+\frac{\tau -D\left(Y\le {\widehat{q}}_{\tau}\right)}{{\widehat{f}}_Y\left({\widehat{q}}_{\tau}\right)}, $$
(5)

where \( {\widehat{q}}_{\tau } \) is the sample quantile and \( {\widehat{f}}_Y\left({\widehat{q}}_{\tau}\right) \) is the kernel density estimate of \( f_Y \) evaluated at \( {\widehat{q}}_{\tau } \). As mentioned above, one crucial distinguishing feature of the UQR method is that it recovers the marginal impact of the explanatory variables on the unconditional quantile of Y. Another appealing feature is that its regression results can be applied directly in an OB decomposition to examine the factors contributing to the gender test score gap across the whole distribution, without the many simulations required by the alternative decomposition methods based on conditional quantile regression.
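The RIF regression behind Eqs. (4) and (5) can be sketched in a few lines. This minimal illustration makes assumptions that the text does not state: a Gaussian kernel with a normal-reference rule-of-thumb bandwidth (1.06·σ·n^(-1/5)), the "smallest observation whose empirical CDF is at least τ" definition of the sample quantile, and the male dummy as the only regressor. In that case the OLS slope on the RIF equals the male-female difference in mean RIF; with the covariates of models 2 and 3 the RIF would instead be regressed on all of them by OLS. The RIF is computed on the pooled (male and female) distribution, so the estimate refers to the τ-th quantile of the overall score distribution. Function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def sample_quantile(y, tau):
    """Smallest observation whose empirical CDF is at least tau."""
    s = sorted(y)
    # Small tolerance so that, e.g., 0.9 * 100 is not rounded up to 91
    k = max(math.ceil(tau * len(s) - 1e-9) - 1, 0)
    return s[k]


def gaussian_kde_at(y, x):
    """Gaussian kernel density estimate of f_Y at x (rule-of-thumb bandwidth)."""
    n = len(y)
    h = 1.06 * pstdev(y) * n ** (-1 / 5)
    total = sum(math.exp(-0.5 * ((x - yi) / h) ** 2) for yi in y)
    return total / (n * h * math.sqrt(2 * math.pi))


def rif_quantile(y, tau):
    """Sample RIF of the tau-th quantile for every observation, as in Eq. (5)."""
    q = sample_quantile(y, tau)
    f = gaussian_kde_at(y, q)
    return [q + (tau - (1.0 if yi <= q else 0.0)) / f for yi in y]


def uqr_gender_gap(y, male, tau):
    """OLS of the pooled RIF on a constant and a male dummy (slope only).

    With one binary regressor, the OLS slope equals the difference in mean RIF.
    """
    rif = rif_quantile(y, tau)
    rif_m = [r for r, g in zip(rif, male) if g == 1]
    rif_f = [r for r, g in zip(rif, male) if g == 0]
    return mean(rif_m) - mean(rif_f)


# Made-up standardised scores for 100 students; odd indices are boys
scores = [3 * math.sin(i) + 0.01 * i for i in range(100)]
male = [i % 2 for i in range(100)]
for tau in (0.1, 0.5, 0.9):
    print(tau, round(uqr_gender_gap(scores, male, tau), 3))
```

A useful sanity check is that the sample mean of the RIF equals the sample quantile itself, which is what makes the regression coefficients interpretable as effects on the unconditional quantile.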

4.2 Decomposition models

The factors contributing to the male-female test score gap at the mean and at selected percentiles are examined by following the literature on gender wage gaps (Blinder 1973; Oaxaca 1973; Fortin et al. 2011) in applying an OB type of decomposition of the form:

$$ {\widehat{Y}}_m-{\widehat{Y}}_f=\underset{"\mathrm{characteristic}\ \mathrm{effect}"}{\underbrace{\left({\widehat{Z}}_m-{\widehat{Z}}_f\right){\widehat{\mu}}^{\ast }}}+\left\{\underset{"\mathrm{return}\ \mathrm{effect}"}{\underbrace{{\widehat{Z}}_m\left({\widehat{\mu}}_m-{\widehat{\mu}}^{\ast}\right)+{\widehat{Z}}_f\left({\widehat{\mu}}^{\ast }-{\widehat{\mu}}_f\right)}}\right\} $$
(6)

where \( \widehat{Y} \) is the mean test score of males (m) or females (f), \( \widehat{Z} \) is the vector of mean observed characteristics, \( {\widehat{\mu}}_m\ \left({\widehat{\mu}}_f\right) \) is the vector of estimated coefficients, including the constant, from the regression of test scores on the set of covariates for the male (female) sample, and \( {\widehat{\mu}}^{\ast } \) is the vector of estimated coefficients on the same covariates from a pooled male and female regression that also includes the gender dummy (the coefficient on the gender dummy itself does not enter Eq. (6)). The gender dummy is included when estimating the reference coefficients \( \left({\widehat{\mu}}^{\ast}\right) \) so as not to bias the coefficients on the other variables (Neumark 1988; Fortin 2008; Jann 2008).Footnote 15

In Eq. (6), the first term on the right-hand side is the component of the gender test score gap due to differences in observed characteristics, the “characteristic effect”. The second term on the right-hand side reflects differences in factors other than the observed characteristics, the “return effect”, which is sometimes interpreted as the “unexplained” component or as “discrimination”. We focus on the detailed decomposition of the characteristic effect because detailed decomposition results for the return effect are well known to be sensitive to the arbitrary scaling of continuous variables (Jones 1983; Jones and Kelley 1984). To facilitate interpretation, the variables contributing to students’ academic achievement are separated into four groups: (1) the students’ own characteristics, (2) their families’ characteristics, (3) their initial cognitive skill endowments, and (4) other factors.
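The two-fold decomposition in Eq. (6), together with the detailed characteristic effect, can be sketched as below. This is a minimal illustration on made-up data: the OLS routine, function names and variable names are hypothetical; the reference coefficients \( {\widehat{\mu}}^{\ast} \) come from a pooled regression that includes the gender dummy, whose own coefficient is then dropped. Passing RIF values instead of raw scores as `y` would decompose the gap at a given quantile in the same way. Summing the entries of `detail` within each of the four variable groups gives group-level contributions.

```python
import math
from statistics import mean


def ols(X, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[k] for row in A]


def oaxaca_blinder(Z, y, male, names):
    """Two-fold OB decomposition with a pooled reference that includes a gender dummy."""
    rows_m = [[1.0] + z for z, g in zip(Z, male) if g == 1]
    rows_f = [[1.0] + z for z, g in zip(Z, male) if g == 0]
    y_m = [v for v, g in zip(y, male) if g == 1]
    y_f = [v for v, g in zip(y, male) if g == 0]
    mu_m, mu_f = ols(rows_m, y_m), ols(rows_f, y_f)
    # Pooled regression with the gender dummy last; its coefficient is not used in Eq. (6)
    mu_star = ols([[1.0] + z + [float(g)] for z, g in zip(Z, male)], y)[:-1]
    zbar_m = [mean(col) for col in zip(*rows_m)]
    zbar_f = [mean(col) for col in zip(*rows_f)]
    characteristic = sum((a - b) * c for a, b, c in zip(zbar_m, zbar_f, mu_star))
    returns = (sum(a * (b - c) for a, b, c in zip(zbar_m, mu_m, mu_star))
               + sum(a * (c - b) for a, b, c in zip(zbar_f, mu_f, mu_star)))
    # Detailed characteristic effect: one term per covariate (the constant contributes zero)
    detail = {n: (zbar_m[j + 1] - zbar_f[j + 1]) * mu_star[j + 1] for j, n in enumerate(names)}
    return mean(y_m) - mean(y_f), characteristic, returns, detail


# Made-up data: 40 students; girls (male == 0) have higher pre-school scores
male = [i % 2 for i in range(40)]
Z = [[math.sin(i) + 0.3 * (1 - g), math.cos(2 * i)] for i, g in zip(range(40), male)]
y = [0.5 * z[0] + 0.2 * g + 0.1 * math.cos(3 * i) for i, (z, g) in enumerate(zip(Z, male))]
gap, char, ret, detail = oaxaca_blinder(Z, y, male, ["preschool", "home_index"])
print(round(gap, 3), round(char, 3), round(ret, 3), detail)
```

Because each group regression includes a constant, the characteristic and return effects add up exactly to the raw mean gap, and a female pre-school advantage shows up as a negative contribution of the pre-school score to the characteristic effect.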

5 Empirical regression results

5.1 Estimates of gender test score gap at means of test score distribution

Table 2 reports estimates of the mean gender test score gaps in reading and numeracy for the three grade levels (3rd, 5th and 7th) from the three specifications. The raw gender test score gaps at the mean (estimated from model 1; see the first row of each subject panel in Table 2) show the well-known gender gaps in maths and reading observed in the literature: male students outperform female students in maths but lag behind in reading (Husain and Millimet 2009; Fryer and Levitt 2010; Nghiem et al. 2015; Justman and Méndez 2016). Furthermore, while the gender test score gap in reading is observed in all grades, the (reverse) gender gap in numeracy is present only in grades 5 and 7. The finding that the numeracy gap favouring male students is present only at certain educational levels is also in line with previous US findings, in which a gender maths score gap appeared only from students’ first (Husain and Millimet 2009) or third grade tests (Fryer and Levitt 2010).Footnote 16 It is, however, interesting to note that while these raw figures suggest that a gender maths score gap appears only from a certain grade, it takes two to four more years for this pattern to emerge in Australia. Table 2 additionally indicates that the raw gender test score gaps in reading and numeracy increase from grade 3 to grade 5 and remain fairly stable between grades 5 and 7.

Table 2 Estimated gender score gap over the grades at mean

The gender test score gaps estimated from model 2 suggest that adjusting for a comprehensive set of characteristics of students, their families and their neighbourhoods leaves the earlier findings essentially unchanged, in terms of both magnitude and statistical significance. However, additionally including students’ WAI and PPVT scores measured at ages 4 or 5 (model 3) does change them. In particular, a reversed and statistically significant (at the 5% level) gender test score gap in favour of male students emerges in third grade reading, where male students outperform female students by about 0.07 standard deviations. Furthermore, the gender test score gap in grades 5 and 7 reading turns from statistically significant in model 2 to insignificant in model 3. In contrast, controlling for students’ prior academic endowment turns the numeracy gap in favour of male students in grade 3 from statistically insignificant to highly significant (at the 1% level) and more than doubles the magnitude of the numeracy gap in all studied grades.

In summary, the above results suggest that including pre-school cognitive skills in students’ development equations shrinks the gender gap in reading while widening the gender gap in numeracy, in terms of both statistical significance and magnitude. This finding is consistent with the previously observed pattern of girls having higher pre-school cognitive skills. These estimates also highlight the importance of controlling for students’ pre-school cognitive skills, which summarise genetic endowments and early childhood investment in the formation of human capital, when modelling student development, as shown in the literature (Todd and Wolpin 2007; Bernal 2008; Cunha et al. 2010; Lai 2010; Elder and Jepsen 2014; Fortin et al. 2015; Nghiem et al. 2015). As previous studies in this literature were unable to control for pre-school cognitive skills, because such measures were unavailable in their data sets, this is a novel empirical result.

The estimated gender test score gaps, where statistically significant, are largely in line with the international literature, in which the gender gap in a particular subject appears only at certain educational levels and tends to increase as students progress through school (Coleman et al. 1966; Husain and Millimet 2009; Fryer and Levitt 2010). Our results additionally show that this pattern of a widening gender test score gap persists even after conditioning on pre-school cognitive skills. Two observations from the full results of the test score regressions (reported in Appendix 1: Tables 9 to 11) help explain why including pre-school cognitive scores does not change this pattern. First, the impact of pre-school cognitive skills on subsequent academic achievement is relatively stable across school grades, so including these skills, which favour females, shifts the estimated male coefficient by a similar amount in every grade. Second, although including pre-school cognitive skills improves the explanatory power of the test score regressions, a substantial part of the variation in students’ academic achievement remains unexplained (the maximum R² is 0.35, as shown in Appendix 1: Tables 9 to 11).
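The mechanism behind both observations can be made explicit with the standard omitted-variable identity for OLS, offered here as an illustration rather than a result reported in the paper. Let \( \hat{\beta}^{(2)} \) and \( \hat{\beta}^{(3)} \) denote the male coefficients from models 2 and 3 estimated on the same sample, \( \hat{\theta} \) the coefficients on \( E_{0Ki} \) in model 3, and \( \hat{\delta} \) the coefficients on \( \mathrm{Male}_i \) from auxiliary OLS regressions of the pre-school scores on a constant, \( \mathrm{Male}_i \) and \( X_i \):

$$ E_{0Ki}=\pi+\delta\,{\mathrm{Male}}_i+{X}_i\Lambda+{u}_i,\qquad {\hat{\beta}}^{(2)}={\hat{\beta}}^{(3)}+\hat{\delta}\,\hat{\theta} $$

Because girls score higher before school (the elements of \( \hat{\delta} \) are negative) and pre-school skills raise later scores (the elements of \( \hat{\theta} \) are positive), \( \hat{\beta}^{(3)}=\hat{\beta}^{(2)}-\hat{\delta}\hat{\theta} \) lies above \( \hat{\beta}^{(2)} \): adding the pre-school scores moves the estimated gap towards boys, shrinking the female reading advantage and widening the male numeracy advantage. If \( \hat{\theta} \) and \( \hat{\delta} \) are similar across grades, the shift is similar in every grade, so the grade-to-grade pattern of the gap is preserved.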

5.2 Estimates of gender test score gap along the test score distribution

We next explore heterogeneity in the gender test score gaps across the distribution of student performance. Figure 1 plots the estimated gender test score gaps (the thick solid orange line) and their respective 95% confidence intervals (the thin solid orange line; see Footnote 17) along the test score distribution for reading and numeracy. While the value-added estimates are the focus of this analysis, for comparison Fig. 1 also reports the gender test score gap estimates (the thick dotted brown line) and their 95% confidence intervals (the thin dotted brown line) obtained from regression model 2, which does not include the initial endowment of cognitive skills.

Fig. 1 Gender test score gaps along the distribution by test subject and grade. Panel a: Reading. Panel b: Numeracy

Value-added estimates of the gender reading test score gaps (panel A, Fig. 1) suggest that the statistically significant male advantage in grade 3 reading observed earlier at the mean may have been driven by students in the middle (around the 50th percentile) or at the top (above the 90th percentile) of the distribution, because the estimates are statistically significant only at these percentiles. In contrast, females statistically significantly outperform males in grade 7 reading around the median of the distribution. Thus, although the mean gap in grade 7 reading is statistically indistinguishable from zero, the distributional analysis points to a statistically significant female advantage. No statistically significant gender differences in reading scores are observed at any other percentile or grade. In addition, controlling for pre-school cognitive skills reduces the gender reading test score gap favouring female students, in both magnitude and statistical significance, at nearly all percentiles.

Turning to the value-added estimates of the gender test score gap in numeracy (panel B, Fig. 1), males outperform females over virtually the whole distribution and in all grades. The gender numeracy gap is also more pronounced at the upper end of the distribution, and it widens as students advance through school. Furthermore, the steeper slope of the gender gap line at the upper end of the distribution (more visible for grades 5 and 7) suggests that the widening numeracy gap favouring male students may have been driven by top-performing students. Finally, controlling for students' pre-school cognitive skills increases the gender numeracy test score gap favouring male students, in both magnitude and statistical significance.
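To make the idea of a gap that grows towards the top of the distribution concrete, the sketch below computes raw male-female gaps at selected percentiles, with percentile-bootstrap 95% intervals, on simulated scores in which male scores are more dispersed. It is a simplification rather than the paper's estimator: with only a male dummy on the right-hand side, the quantile regression coefficient essentially reduces to the difference between the male and female sample quantiles, whereas the estimates in Fig. 1 condition on further controls. The bootstrap is an assumed choice for the intervals (the method actually used is described in Footnote 17), and all data and parameter values are hypothetical.

```python
import numpy as np


def quantile_gaps(male_scores, female_scores, taus):
    """Male minus female sample quantile at each tau."""
    return np.quantile(male_scores, taus) - np.quantile(female_scores, taus)


def bootstrap_ci(male_scores, female_scores, taus, reps=500, seed=1):
    """Percentile-bootstrap 95% intervals, resampling within each gender group."""
    rng = np.random.default_rng(seed)
    draws = np.empty((reps, len(taus)))
    for r in range(reps):
        m = rng.choice(male_scores, size=male_scores.size, replace=True)
        f = rng.choice(female_scores, size=female_scores.size, replace=True)
        draws[r] = quantile_gaps(m, f, taus)
    return np.percentile(draws, [2.5, 97.5], axis=0)


rng = np.random.default_rng(0)
taus = np.array([0.10, 0.25, 0.50, 0.75, 0.90])
# Hypothetical numeracy scores: similar centres, larger male dispersion.
female = rng.normal(0.0, 1.0, size=4000)
male = rng.normal(0.1, 1.3, size=4000)

gaps = quantile_gaps(male, female, taus)
lower, upper = bootstrap_ci(male, female, taus)
for tau, g, lo, hi in zip(taus, gaps, lower, upper):
    print(f"tau={tau:.2f}: gap {g:+.3f} [{lo:+.3f}, {hi:+.3f}]")
```

In this made-up example the gap is negative at the 10th percentile and clearly positive at the 90th, so a single mean gap would hide both the sign change and the concentration of the male advantage among top performers.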

In summary, the analysis of the gender test score gap across the distribution indicates that focusing on the mean gap could overlook important, policy-relevant heterogeneity. It also highlights the importance of controlling for pre-school cognitive skills when analysing the gender test score gap. In particular, the quantile regression results indicate that controlling for pre-school cognitive skills narrows the gender gap favouring females in reading while increasing the gender gap favouring males in numeracy, and that this pattern holds at nearly all points of the test score distribution.

6 Empirical decomposition results

We next discuss the decomposition results obtained using the methods outlined in Section 4.2. Tables 3 and 4 report the estimated total male-female test score gap, together with its contributing factors, at the mean and at selected percentiles, by grade, for reading and numeracy, respectively. Figure 2 displays the estimated total gender test score gap (with its 95% confidence interval) and the characteristic and return effects along the whole distribution for reading and numeracy (see Footnote 18). Estimates of the total gender gap (reported in the first row of Tables 3 and 4) are largely similar to those obtained from regression model 1 (reported in Table 2 and Fig. 1). Tables 3 and 4 show that the estimated total gender gaps are statistically insignificant at some points of the test score distribution for some subjects or grades (for instance, at the 90th percentile for grade 3 and grade 7 reading, at the mean and all percentiles for grade 3 numeracy, and at the 10th percentile for grade 5 and grade 7 numeracy). As it is not meaningful to decompose total gender gaps that are statistically insignificant, we focus on the decomposition results where the gaps are statistically significant.

Table 3 Contributions to the male-female test score gap at mean and selected percentiles by grade—reading
Table 4 Contributions to the male-female test score gap at mean and selected percentiles by grade—numeracy
Fig. 2 Decomposition of test score gap along the distribution by test subject and grade. Panel a: Reading. Panel b: Numeracy

Decomposition results for reading (Table 3 and Fig. 2, panel A) show that the estimates of the characteristic effect are negative and statistically significant, implying that gender differences in observable characteristics predict an advantage for female students in reading scores. In addition, the estimates of the characteristic effect have the same sign as, and a magnitude largely similar to, those of the total gap, indicating that female students' advantage in reading is largely attributable to their more favourable endowments of characteristics that promote reading scores. This holds whether the total gap is examined at the mean or along the distribution. In contrast, the return effect plays a smaller role in the total gap, since its estimates are either statistically insignificant (at almost all selected percentiles) or of the opposite sign to the total gap (over virtually the entire distribution of grade 3 reading test scores, as can be seen in the first graph in panel A of Fig. 2). Within the characteristic effect, the estimates in Table 3 indicate that gender differences in pre-school cognitive skills play the largest role, since their estimates are statistically significant and have the same sign as, and a magnitude largely similar to, the total characteristic effect. In contrast, factors other than pre-school cognitive skills contribute little to the total characteristic effect, since their estimates are usually statistically insignificant or small. The aggregate decomposition results (at the mean and along the distribution) additionally suggest that the contribution of the characteristic effect to the total gap declines as students advance to higher grades (Footnote 19). This is consistent with the declining contribution of initial cognitive skill endowments to the total characteristic effect as students progress through school (Footnote 20).
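The aggregate and detailed components discussed above can be sketched with a twofold Oaxaca-Blinder decomposition at the mean. In this hypothetical example the female coefficients serve as the reference, so the characteristic effect is (X̄_m − X̄_f)'β_f, its per-covariate terms give the detailed decomposition, and the return effect is X̄_m'(β_m − β_f); the reference coefficients used in Section 4.2 may differ. The second covariate ("other") is an invented control with no gender difference, and all parameter values are made up to resemble the reading pattern: equal returns, with girls ahead in pre-school skills.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on X (X already contains a column of ones)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def oaxaca_blinder(y_m, X_m, y_f, X_f):
    """Twofold decomposition of mean(y_m) - mean(y_f) with female coefficients as reference.

    Returns the total gap, the characteristic effect, the return effect and the
    per-covariate contributions to the characteristic effect.
    """
    b_m, b_f = ols(y_m, X_m), ols(y_f, X_f)
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    detailed = (xbar_m - xbar_f) * b_f  # the constant's term is zero by construction
    characteristic = detailed.sum()
    returns = xbar_m @ (b_m - b_f)
    total = y_m.mean() - y_f.mean()
    return total, characteristic, returns, detailed


rng = np.random.default_rng(0)
n = 3000


def simulate(is_male):
    """Hypothetical reading scores: identical returns, girls ahead in pre-school skills."""
    preschool = rng.normal(0.0 if is_male else 0.3, 1.0, size=n)
    other = rng.normal(0.0, 1.0, size=n)  # illustrative control with no gender difference
    X = np.column_stack([np.ones(n), preschool, other])
    y = X @ np.array([0.0, 0.5, 0.2]) + rng.normal(size=n)
    return y, X


y_m, X_m = simulate(True)
y_f, X_f = simulate(False)
total, char, ret, detailed = oaxaca_blinder(y_m, X_m, y_f, X_f)
print(f"total {total:+.3f} = characteristic {char:+.3f} + return {ret:+.3f}")
print(f"pre-school skills term {detailed[1]:+.3f}, other control term {detailed[2]:+.3f}")
```

Because each regression includes a constant, the two parts sum exactly to the raw mean gap, and the per-covariate terms sum to the characteristic effect. In this simulation the pre-school skills term accounts for almost all of the (negative) characteristic effect, while the return effect is close to zero.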

Table 4 and Fig. 2 (panel B) show that the characteristic effect is negative and statistically significant, indicating that gender differences in observable characteristics predict an advantage for female students in numeracy. As with the reading gap, pre-school cognitive skills account for most of the characteristic effect in the numeracy gap. In contrast, the return effect is positive and statistically significant, suggesting that male students are better able to convert educational inputs into higher numeracy test scores. Since the return effect dominates the characteristic effect, both at the mean and along the distribution, the total gender numeracy gap is positive, meaning that male students outperform female students in numeracy. However, consistent with the results from regression model 1, the estimates of the total gap are statistically significant in grades 5 and 7 only. Panel B of Fig. 2 additionally shows that, in grades 5 and 7, the characteristic effect line moves further below zero towards the upper end of the test score distribution, suggesting that female students at the higher end of the distribution possess more of the characteristics associated with higher numeracy scores. At the same time, the return effect line moves further above zero towards the upper end of the distribution, indicating that male students at the higher end of the distribution are more efficient in transforming educational inputs into higher numeracy test scores. Because the growing return effect outweighs the increasingly negative characteristic effect, these two opposing trends together produce the widening numeracy gap in favour of male students along the distribution.
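Decomposing the gap at a given percentile, rather than at the mean, needs a distributional method. The method used in Section 4.2 is not restated in this section, so the sketch below shows one common option purely as an assumption: replace each student's score with its recentred influence function (RIF) for the τ-th quantile, RIF = q_τ + (τ − 1{y ≤ q_τ}) / f(q_τ), and apply the same twofold decomposition to the RIF. The density f(q_τ) is estimated with a Gaussian kernel and Silverman's rule-of-thumb bandwidth. The simulated numeracy data (girls ahead in pre-school skills, boys' scores more dispersed) and all names are hypothetical.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on X (X already contains a column of ones)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def rif_quantile(y, tau):
    """Recentred influence function of the tau-th quantile, with a Gaussian kernel density."""
    q = np.quantile(y, tau)
    h = 1.06 * y.std() * y.size ** (-0.2)  # Silverman's rule-of-thumb bandwidth
    f_q = np.exp(-0.5 * ((y - q) / h) ** 2).sum() / (y.size * h * np.sqrt(2.0 * np.pi))
    return q + (tau - (y <= q).astype(float)) / f_q


def rif_decomposition(y_m, X_m, y_f, X_f, tau):
    """Twofold decomposition of the gap at quantile tau, female coefficients as reference."""
    r_m, r_f = rif_quantile(y_m, tau), rif_quantile(y_f, tau)
    b_m, b_f = ols(r_m, X_m), ols(r_f, X_f)
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    characteristic = (xbar_m - xbar_f) @ b_f
    returns = xbar_m @ (b_m - b_f)
    return r_m.mean() - r_f.mean(), characteristic, returns


rng = np.random.default_rng(0)
n = 4000


def simulate(is_male):
    """Hypothetical numeracy scores: girls ahead in pre-school skills, boys more dispersed."""
    preschool = rng.normal(0.0 if is_male else 0.3, 1.0, size=n)
    X = np.column_stack([np.ones(n), preschool])
    noise_sd = 1.3 if is_male else 1.0
    y = (0.1 if is_male else 0.0) + 0.5 * preschool + rng.normal(0.0, noise_sd, size=n)
    return y, X


y_m, X_m = simulate(True)
y_f, X_f = simulate(False)
results = {tau: rif_decomposition(y_m, X_m, y_f, X_f, tau) for tau in (0.1, 0.5, 0.9)}
for tau, (total, char, ret) in results.items():
    print(f"tau={tau:.1f}: total {total:+.3f} = characteristic {char:+.3f} + return {ret:+.3f}")
```

Since the mean of the RIF equals the sample quantile here, the total at each τ matches the raw quantile gap, and the split into characteristic and return parts is exact. In this simulation the characteristic effect stays negative at every τ while the return effect rises towards the top, so the total gap in favour of boys widens along the distribution.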

In sum, consistent with the regression results presented in Section 5, the decomposition analysis highlights the role of pre-school cognitive skills in explaining the gender test score gap. The decomposition results further suggest that failing to account for initial academic skills would considerably limit the ability to explain the factors contributing to the gap (Footnotes 21 and 22). However, a large part of the gender test score gap remains unexplained in this study, as has also been reported in previous international studies (Sohn 2012; Gevrek and Seiberlich 2014; Golsteyn and Schils 2014). Similarly, our finding that the return part plays an insignificant role in explaining the total gender gap in reading (grades 5 and 7) is in line with findings from previous studies of primary school students in the Netherlands (Golsteyn and Schils 2014) and the USA (Sohn 2012). Why a large part of the gap remains unexplained, and why the return part plays an insignificant role in explaining the total reading gap, remain open questions that call for more research on the factors driving gender test score gaps. The decomposition analysis also suggests that focusing only on the mean gap overlooks important, policy-relevant heterogeneity across the distribution. Notably, while the test score gap favouring females (in reading) is mostly due to differences in pre-school cognitive skills, reflecting the significant female advantage in these skills, the gap favouring males (in numeracy) is mainly due to differences in returns (the unexplained part).

Why male students are more efficient at converting educational inputs into higher test scores remains unanswered in this study, as in previous studies (Sohn 2012; Golsteyn and Schils 2014). Further research into the factors behind this greater efficiency would therefore be worthwhile.

7 Conclusions

Drawing on a recent, nationally representative panel of Australian children, this paper has examined the patterns of, and factors contributing to, the gender gap in academic achievement over the first 7 years of schooling. The regression results reveal that males excel at numeracy across all grades, both at the mean and along the distribution. While mean regression results indicate a male advantage in grade 3 reading, quantile regression results show that this gap is generally driven by students in the middle or at the top of the distribution. In addition, while mean regressions do not show noticeable gender differences in grade 7 reading, quantile regression results suggest that females do outperform males at the lower end of the test score distribution. The results also reveal a widening gender test score gap in numeracy as students advance through school, which the quantile regression results suggest may have been driven by top-performing students.

Applying an Oaxaca-Blinder (OB) decomposition method, this paper has examined how gender differences in resources, and in the returns to those resources, affect academic achievement. The main finding is that gender disparities in pre-school cognitive skills explain a considerable part of the differences in academic performance. Female students are better endowed with pre-school cognitive skills, which helps them achieve higher scores or reduce their score disadvantage relative to male students.

This paper has documented that differences in pre-school cognitive skills explain a considerable part of the gender test score gaps observed during the primary and early secondary school years. While these findings cannot be interpreted as causal, given the descriptive nature of the paper, they contribute to understanding gender test score gaps and can help inform the direction of future interventions aimed at reducing them. Many questions remain unanswered: a large part of the gender test score gap remains unexplained, and it is still unclear why the gap favouring males is largely due to differences in returns. More research on the relationship between gender and educational achievement is therefore warranted.

From a policy perspective, it is important to understand the patterns of, and factors contributing to, the gender test score gap, not only at the mean but along the whole test score distribution. One result of this study is that pre-school cognitive skills play a significant role in explaining the gender test score gap observed up to the seventh grade. This suggests that policies aimed at reducing the gender test score gap should begin before students enrol at school, an implication in line with the skill development literature, which usually finds early intervention to be more beneficial than late intervention (Heckman 2000). In addition, the heterogeneity of the gender test score gap across the distribution indicates that such policies should be targeted at particular student groups.