1 Introduction

Teaching quality is critical to fostering student learning. An effective teacher can significantly reduce the learning gap between low- and high-performing students (Kane and Staiger 2012; Araujo et al. 2016). A high-quality teacher can produce a gain in achievement equivalent to one additional school year (Rivkin et al. 2005). Good teachers are also associated with lower school dropout rates and can have long-term impacts on outcomes such as university enrollment and income (Chetty et al. 2011).

Teachers’ effectiveness cannot be predicted by observable characteristics such as the level of education, experience (at least after the first few years), age, or the type of contract (Araujo et al. 2016). Effective teaching is associated with the quality of the interactions between teachers and their students, with what happens inside the classroom and how the teacher uses the time in class. Students learn more when teachers spend more class time on academic activities, keep students engaged for longer periods of time, and minimize the time spent on classroom management activities (taking attendance, explaining the schedule for the day, distributing papers, and so forth) (Bruns and Luque 2015; Stallings and Knight 2003).

But what happens when teachers engage more or differently with some students than with others? This paper addresses whether differences between boys and girls in learning outcomes are associated with differences in the quantity and quality of teacher–student interactions. Research on teachers’ effectiveness concludes that classroom dynamics and interactions matter, but little work has been done to understand whether teachers’ interactions vary depending on the gender, race, or income level of the student and how these differences affect learning outcomes. The evidence in this paper contributes to the broader discussion of whether traditional measures of teacher–student interactions that focus, in an aggregate manner, more on quality than on quantity capture all that matters for learning.

Using videotapes, we analyze teaching practices in Chile, a country with one of the largest gender gaps in academic achievement. The results show a pattern of differentiated attention favoring boys. Our measures of gender biases (or differentiated attention) in the classroom correlate with the overall quality of teacher–student interactions measured with the Classroom Assessment Scoring System (CLASS). Teachers who perform worse on CLASS also tend to demonstrate greater attention bias by student gender.

We also estimate the correlation between attention bias in the classroom by gender and students’ scores on Chile’s national test, the Sistema de Medición de la Calidad de la Educación (SIMCE). Girls whose teachers demonstrate greater attention bias favoring boys tend to have lower math scores. The measures of gender bias for student-initiated actions (calling out when not called on) have a statistically significant positive correlation with boys’ scores in math and a statistically significant negative correlation for girls. These correlations do not indicate causation.

The paper is organized as follows. Section 2 presents an overview of the literature. Section 3 presents the conceptual model, the sample characteristics, and the setting. Section 4 describes the coding strategy and variables. Section 5 presents the findings and analyzes the correlation between teacher gender bias and student academic performance, as well as between teacher gender bias and measures of the quality of teacher–student interactions. Section 6 summarizes the paper’s conclusions and identifies policy implications.

2 Previous literature

Gender differences in educational outcomes have received increasing attention in recent years (Bos et al. 2014; Guiso et al. 2008; OECD 2014; Straus 2015). Many countries have marked gender gaps in enrollment, graduation, or learning achievements, some favoring girls, others favoring boys. On international tests, boys tend to perform better in math and science while girls tend to score higher in reading and language. On the OECD’s 2015 Programme for International Student Assessment (PISA),Footnote 1 for example, girls’ average scores were 30 points higher than boys’ in reading, and boys’ average scores were 5 points higher than girls’ in math (OECD 2015b). Girls outperformed boys in reading in all 72 countries that administered the 2015 PISA; boys outperformed girls in math in 47 countries.

Math results are especially important, because they are good predictors of future achievement; school test scores and results on university access exams determine, among other factors, career decisions and contribute to differences in employability, occupational segregation in the labor market, and differences in earnings (Bos et al. 2014; Heckman 2011; Mizala 2014; Murnane et al. 1995; Ñopo 2012; Paglin and Rufolo 1990; Lavy and Sand 2015; Terrier 2016).

Using data on Chile, Bharadwaj et al. (2015) show that learning gaps in math (favoring boys) increase with age and over the schooling cycle. They do not find any correlation with parental background or investment in the child, unobserved ability, the gender of the teacher, or other observable variables of the classroom environment. Research from the USA finds that boys are more sensitive than girls to environmental influences, such as home environment and neighborhood quality (Autor et al. 2016; Bertrand and Pan 2013; Chetty et al. 2016). Carneiro et al. (2017) find the opposite pattern in Ecuador, where daughters of mothers with university education outperform girls whose mothers have only primary education by 0.58 standard deviations; the figure for boys is 0.47 standard deviations.

As teachers are a key component of students’ learning, their behavior could also be associated with gender gaps in educational outcomes. Research based on classroom observations in the USA and Latin American countries shows that the most effective teachers spend more time using “active” instructional methods (directly engaging with students through explanation, lecture, and questions and answers) than “passive” instruction (walking around the classroom monitoring as students work individually on their assignments). Effective teachers also minimize the time spent on classroom management activities (taking attendance, explaining the schedule for the day, distributing papers, and so forth) (Stallings and Knight 2003; Bruns and Luque 2015).

Carneiro et al. (2017) examine whether differences in test scores between boys and girls in Ecuador vary with the quality of teacher–student interactions (measured by CLASS). Despite the significant gap in math test scores (in favor of boys), they find that, with their data, the quality of interactions cannot predict these gender differences in learning outcomes. As the authors note, however, their data would not reveal whether teachers in Ecuador were generally biased against girls. CLASS measures the general quality of teacher–student interaction; it does not disaggregate teachers’ attention and actions by student characteristics.

Related research finds that teachers’ gender biases in primary school affect the academic achievement gap during middle school and secondary school and enrollment in advanced level courses in math and science during secondary school (Lavy and Sand 2015; Terrier 2016). Measuring teachers’ gender biases by comparing “blind” and “nonblind” (with and without the name of the student) classroom exams in Israel, Lavy and Sand (2015) show that teachers seem to unconsciously discourage female students by underestimating their abilities while overestimating the skills of their male classmates. Using the same approach, Terrier (2016) finds evidence of gender bias against boys among middle school teachers in France.

Only a few studies have used videotapes to examine gender bias in teachers’ classroom practice (e.g., Davis 2000; Sadker and Sadker 1994). Sadker and Sadker (1994) analyze videotaped fourth, sixth, and eighth grade classes in four states and the District of Columbia in the USA. Their analysis shows the following:

  1. Teachers interact more with boys than with girls.
  2. Boys receive more praise, criticism, and remediation than girls.
  3. During a discussion, boys are eight times more likely than girls to call out (shout out answers even when not called on).
  4. Teachers are less likely to reject behavior by boys, even if it violates classroom rules.Footnote 2
  5. Girls receive more “acceptance” (a bare acknowledgment of their work, such as “uh-huh” or “okay”) than boys.
  6. Girls who receive less attention from their teachers may come to underestimate their abilities and lose motivation.

The literature on developed countries (most of it based on qualitative analysis of small samples) indicates that getting more of a teacher’s attention—whether positive (e.g., responding to or working one-on-one with the student) or negative (e.g., disciplining the student)—has consequences for students’ performance (Sadker and Sadker 1994). Most research points to a prevalence of gender bias in favor of boys across subject areas and school environments, mostly in the form of teachers giving more attention to boys than girls (AAUW 1992; LaFrance 1991; Sadker and Sadker 1994; Sadker et al. 1993).

3 Setting and methodology

3.1 Study case

Chile presents a paradox with respect to gender. It has full equality in health and survival and parity in literacy and enrollment in education, but it lags behind with respect to women’s economic participation and opportunity (WEF 2015).

Gender gaps in academic achievement are also large in Chile. First, more boys than girls are enrolled in primary school (Table 1). Among all fourth graders in Chile, there were 1.2 boys for every girl in 2011 and 2012. In the sample used for this study, the ratios were 1.4 in 2011 and 1.2 in 2012.

Table 1 Ratio of boys to girls in sample schools and all schools in Chile, in 2011 and 2012

Second, Chile’s gender gap on the 2015 PISA in math was the third-largest among member countries of the OECD, with boys outperforming girls (OECD 2015a, 2015b).

Third, among students completing tertiary education, the share of women is one of the highest among OECD countries, but female labor force participation is one of the lowest. The completion rate is higher for girls than boys in Chile at almost all education levels (Bassi et al. 2015).

3.2 Conceptual model

International evidence reveals significant gender differences in educational outcomes that increase with age and schooling cycle (Kersey et al. 2018; Banjong 2014; Robinson and Lubienski 2011). To explain those differences, we hypothesize that teachers, an important ingredient in the production function of learning, may be doing something different in their interactions with different groups of students. Based on existing evidence, we hypothesize that the amount of attention teachers devote to students is important for learning and that it is different for boys and girls.

Why would teachers behave differently with boys and girls? Teacher biases may reflect a combination of teachers’ reactions to students’ behaviors and teachers’ own expectations of students, beliefs, and stereotypes. It is well documented that boys and girls behave differently in class (Measor and Sykes 1992; Erden and Wolfgang 2004; Sadker and Sadker 1994); teachers may just be responding to these differences. However, evidence also documents teacher biases and shows that teachers are often unaware that they interact differently with different groups of students (Consuegra et al. 2016; Terrier 2016; Lavy and Sand 2015; OECD 2015a; Shumow and Schmidt 2013; Robinson et al. 2011; Van Duzer 2006; Olszewski-Kubilius and Turner 2002; Tiedemann 2002; Li 1999; Fennema et al. 1990).

Traditional measures such as CLASS scores may be overlooking important dimensions of the quality of interactions. A study from Ecuador shows that the quality of interactions between teachers and students affects differences in math scores among children in early elementary school (Carneiro et al. 2017). CLASS looks at teacher class organization, instructional support, and emotional support (see Appendix). Boys tend to be more active than girls; teachers may instinctively pay more attention to boys in order to control the classroom. Within the pool of “good teachers,” CLASS will be unable to distinguish between teachers who pay more attention to boys to control and maintain order in the classroom and teachers who try to balance their interactions with boys and girls. The fact that CLASS does not capture attention to different groups of students may explain why Carneiro et al. (2017) find no correlation between CLASS and gender differences in learning outcomes.

The literature on classroom interactions focuses on the nature, quality, and quantity of interactions between teachers and students in different domains in the classroom (Pianta et al. 2008a; Stallings and Knight 2003; Bruns et al. 2016). CLASS does not examine the content of what is being taught or the specific activities carried out. Using a very detailed protocol for each dimension, observers give scores (on a 1–7 scale) per domain (classroom organization, instructional support, emotional support) and overall (see Appendix). Each observation is based on two 15–20-min segments of the class.

A different strand of the literature examines what happens in the classroom. It generates quantitative data on teachers’ practice by coding segments of time and observing activities and participation (Bruns et al. 2016). The coding strategy and the variables generated for this study are more closely related to this line of research. One advantage of coding methods that count occurrences and time the length of interactions is that they generate quantitative data from qualitative information.

3.3 Data and methodology

The sample includes 237 tapes (almost 590 h of videotaped classes) from the classrooms of 137 academically low-performing schools in Chile (based on 2009 SIMCE scores). These schools belong to a random sample designed for an impact evaluation of a program implemented in 2011 by Chile’s Ministry of Education to improve learning outcomes in math and language of students from pre-kindergarten to fourth grade (Bassi et al. 2016). Eligible schools included public and subsidized private schools in which the average SIMCE scores in 2005–2009 in math and language were below the national average (250) and the number of students per level from pre-kindergarten to fourth grade averaged at least 20. For the year of the tapings (2012), SIMCE scores were available by school, student, and teacher.

SIMCE has been administered in all schools in Chile annually since 1988 to monitor students’ learning in fourth, eighth, and tenth grades in math, reading, and social sciences.Footnote 3 Through complementary questionnaires, SIMCE also collects information about teachers, students, and parents. The test is administered from October to early November every year. In this study, we use the 2012 fourth grade SIMCE scores for the schools in our sample.

Because the sample used for this study is not representative, the results cannot be generalized to all Chilean schools. The gender gap in the sample schools is larger than average, so it may be easier to detect teachers’ bias in these low-performing schools. The large gender gap observed in both PISA and national exams is a nationwide pattern, however.

The videos were coded using the CLASS instrument, which provides good measures of specific dimensions of teaching quality (Pianta et al. 2008a). Each of the 237 videotapes shows a single teacher instructing fourth grade students. One teacher in each of the fourth grades of the 137 schools in the sample was videotaped for four pedagogic hours. The tapes cover 69 math classes and 168 classes on other subjects, mainly language arts.Footnote 4

The program implemented in 2011 by the Ministry of Education did not include any component or activity explicitly addressing gender bias in teaching. Because the videotapes were produced for another purpose, they are well suited to studying gender bias in the classroom: there is no reason to believe that teachers altered their behavior in this respect. The taping strictly followed the protocol of the upper elementary version (fourth to sixth grade) of CLASS (Pianta et al. 2008b). Several studies link teachers’ CLASS scores with better student outcomes, in both learning and the development of socioemotional skills.Footnote 5

Most of the schools in the sample are public schools serving students of medium-low socioeconomic status (according to the SIMCE classification) (Table 2). The average 2012 SIMCE score for schools in the sample, combining subjects, was 243 (Table 3), 20 points below that year’s national average of 263. The score gap by gender is 10.0 points in language arts (favoring girls) and 4.6 points in social sciences (favoring boys), both similar to the gaps at the national level. In math, the gap is 5.8 points (favoring boys), about 3 points larger than Chile’s average gap. All the differences between boys and girls in test scores are statistically significant; the p values for the tests of differences in means round to 0.

Table 2 Descriptive statistics of schools in the sample
Table 3 Average 2012 SIMCE scores of fourth grade schools in the sample and in all fourth grades in Chile

A teacher questionnaire, including questions on educational background, experience, and tenure (number of years in the same school), complemented the videos. Most teachers in the schools studied were women (92.5%) (Table 4). Because the sample included very few male teachers, it was impossible to differentiate teachers’ behavior by teacher gender. About 90% of teachers had university degrees. Teachers were split roughly evenly across four experience brackets: less than 5 years, 5–10 years, 11–24 years, and more than 25 years.

Table 4 Descriptive statistics of teachers in the sample

4 Coding strategy and categories

This study builds on the work of Sadker and Sadker (1994), expanding their coding categories (of remediation, praise, criticism, and acceptance). Following these categories, we “quantify” teachers’ interactions (episodes) with girls and boys during these classes. In addition, for each class, we track the amount of time teachers spend with girls versus boys, count the number of interactions by gender, and analyze teachers’ responses (distinguishing positive responses from negative ones). We then correlate these data with students’ scores on the SIMCE and with teachers’ performance (as measured by CLASS).

We refine and complement the conceptual framework of Sadker and Sadker with other dimensions based on the literature on gender bias in the classroom, including time spent with girls versus boys and the level of control a teacher has over the classroom. The coding scheme used in this paper consists of seven variables. The first four are from Sadker and Sadker (1994):

  1. Praise. Praise can follow a correct answer, be unsolicited, or be general. Examples include “Good job,” “That was an excellent paper,” and “I like the way you’re thinking.”
  2. Criticism. Criticism includes negative comments or discipline that provides an explicit statement that the work done or a particular behavior is not correct. Examples include “No, you’ve missed number four” and “This is a terrible report.”
  3. Remediation. Remediation involves helping a student, encouraging him or her to correct a wrong answer, or expanding and enhancing his or her thinking. Examples include “Check your addition” and “Think about what you’ve just said and try again.”
  4. Acceptance. Acceptance is an acknowledgment of a correct answer given when a student is called on, when the student calls out, or during quiet work. Examples include “Uh-huh” and “Okay.”
  5. Call on. Call on occurs when a teacher calls a student by name and asks him or her to answer a question, speak to the class, or participate.
  6. Time spent. Time spent refers to time spent with individual students or groups of students of a single gender (divided into time segments).Footnote 6
  7. Call out. Call out occurs when a student shouts out an answer when not called on by the teacher.

This study distinguishes between classroom behaviors initiated by the teacher and those initiated by the student (Fig. 1). The coding sheets also include a final section where coders can record their perception of the teacher’s level of control in the classroom (on a 1–3 scale, from poor control to good control) and of the teacher’s level of gender bias (on a 1–3 scale, from no obvious gender bias to significant/obvious gender bias).

Fig. 1

Coding Scheme. Note: Categories are a refined and expanded version of the categories in Sadker and Sadker (1994)

Except for time spent (the time teachers spend with girls versus boys), coding consists of counting specific events that take place during a video segment and classifying them under a specific category (e.g., praise, criticism, remediation). In this study, coders counted events only when they clearly observed them in the video.
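To make the counting step concrete, the sketch below tallies coded events per category and student gender for a single video. It is illustrative only: the `Event` record, the category labels, and the `tally` helper are hypothetical names, not the coders’ actual template, and time spent is left out because it is recorded as a duration rather than a count.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for the count-based categories of the coding scheme.
TEACHER_INITIATED = {"praise", "criticism", "remediation", "acceptance", "call_on"}
STUDENT_INITIATED = {"call_out"}


@dataclass(frozen=True)
class Event:
    category: str  # one of the labels above
    gender: str    # "boy" or "girl"


def tally(events):
    """Count coded events per (category, gender) pair for one video."""
    counts = Counter()
    for ev in events:
        if ev.category not in TEACHER_INITIATED | STUDENT_INITIATED:
            raise ValueError(f"unknown category: {ev.category}")
        counts[(ev.category, ev.gender)] += 1
    return counts


# Hypothetical coded segment: only events the coder clearly observed are recorded.
video = [
    Event("call_on", "boy"), Event("call_on", "girl"), Event("praise", "boy"),
    Event("call_out", "boy"), Event("call_out", "boy"),
]
counts = tally(video)
print(counts[("call_out", "boy")], counts[("call_out", "girl")])  # 2 0
```

Missing combinations simply read as zero, so a gender that never calls out in a video contributes a count of 0 rather than a gap in the data.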

We used a formative method, starting the coding with a random pilot sample of 10% (19 videos). We took this approach not only as a test of the coding variables but also to search for other dimensions that merited coding in the full sample. Based on the results of the pilot sample, we added variables to capture teacher control of the classroom and disaggregated subjects into math and language arts/other subjects, based on the “embedded” nature of some of the language instruction and the similarity of non-math instruction.

The language arts classes were not the only classes that focused on language; grammar and other aspects of language instruction were also incorporated into subjects such as social studies. Every tape contained some language instruction, and all subjects except math used similar pedagogical techniques, which included reading out loud by the teacher, individual students, and often the whole class in unison. Math was taught in a different way, with teachers using considerable board work and never reading out loud.

Five carefully trained coders (University of Virginia college students, four women and one man) coded 237 videotapes between October 2014 and November 2015. This coder team was different from the CLASS coding team (described in Appendix). Although coders received thorough training, as well as a template to ensure that they applied the same criteria, video coding involves a subjective component that could result in measurement error. To address this potential problem, we had two coders independently code 41% of the videos. The intercoder reliability rate was 90.7%,Footnote 7 a good result compared with other studies.Footnote 8
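The source does not state which reliability statistic underlies the 90.7% figure. Assuming it is simple percent agreement between the two independent codings, the computation can be sketched as follows; the data and the `percent_agreement` helper are hypothetical, and chance-corrected statistics such as Cohen’s kappa are never higher than raw agreement.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items (in percent) on which two coders recorded the same value."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty codings of equal length")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical counts of call ons to girls in ten segments, coded independently.
a = [3, 1, 0, 2, 4, 1, 0, 2, 3, 1]
b = [3, 1, 0, 2, 3, 1, 0, 2, 3, 1]
print(percent_agreement(a, b))  # 90.0
```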

5 Results

5.1 Gender-biased variables

The videotapes were not recorded with any intent to study teachers’ gender bias in the classroom. It is thus highly unlikely that teachers modified their behavior with respect to this variable because they knew they were being videotaped.

All of the variables studied show a bias in favor of boys (Table 5). In math, differences are statistically significant in criticism, acceptance, call on, and call out. In other subjects (mainly language arts), differences for all variables except praise are statistically significant.

Table 5 Teacher behaviors in interactions with boys and girls, by subject

Although we did not document which students were actually present in the classroom on the day of the taping, we did document class enrollment by gender. Because gaps may partly reflect the fact that there were somewhat more boys than girls in the classrooms observed, we recomputed the measures controlling for the number of students of each gender.

The results show that teachers’ bias in favor of boys seems robust to adjusting for the number of boys in the classroom (Table 6). Except for praise in non-math subjects, all gaps remain positive (in favor of boys). In non-math subjects, differences in criticism and call out are still statistically significant, while those in call on, acceptance, and time spent no longer are. In math, gaps in criticism, acceptance, and call out remain statistically significant.
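The source does not spell out how the adjustment for class composition is implemented. One simple version, assumed here, converts each count into a rate per enrolled student of that gender before taking the boy–girl gap; the helper name and the numbers are hypothetical.

```python
def per_student_gap(count_boys, count_girls, n_boys, n_girls):
    """Boy-minus-girl gap in coded events per enrolled student of each gender."""
    if n_boys <= 0 or n_girls <= 0:
        raise ValueError("enrollment must be positive for both genders")
    return count_boys / n_boys - count_girls / n_girls


# Hypothetical class with 18 boys and 12 girls enrolled.
raw_gap = 24 - 12                           # 12 more call ons to boys in absolute terms
adjusted = per_student_gap(24, 12, 18, 12)  # 24/18 - 12/12, about 0.33 per student
print(raw_gap, round(adjusted, 2))          # 12 0.33
```

When boys outnumber girls, the raw gap overstates the difference in attention per student; in this example the adjusted gap shrinks but keeps its sign.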

Table 6 Teacher behaviors in interactions with boys and girls after adjusting for number of boys and girls in the classroom, by subject

This analysis tested only for differences in the mean of the measures between boys and girls. We also evaluated the equality of the entire distribution of the measures across gender by graphical inspection and by performing the Kolmogorov-Smirnov test of equality of distributions.
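The Kolmogorov-Smirnov statistic is the largest vertical distance between the two empirical distribution functions. A dependency-free sketch follows, with function names of our own choosing; the p value uses the standard asymptotic series, which is only approximate in small samples (an exact or permutation test would be preferable there). Statistical packages such as SciPy provide tested implementations.

```python
import math


def _kolmogorov_pvalue(d, n_eff):
    """Asymptotic p value for statistic d with effective sample size n_eff."""
    sq = math.sqrt(n_eff)
    lam = (sq + 0.12 + 0.11 / sq) * d  # common small-sample correction to the argument
    if lam < 0.2:
        return 1.0  # the series equals 1 to many decimals for such small values
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)


def ks_2samp(x, y):
    """Two-sample KS statistic D = max |F_x(v) - F_y(v)| and an approximate p value."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(x[i], y[j])
        # Step both empirical CDFs past every observation equal to v (handles ties).
        while i < n and x[i] == v:
            i += 1
        while j < m and y[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d, _kolmogorov_pvalue(d, n * m / (n + m))


# Hypothetical per-classroom call-out rates for boys and girls.
boys = [1.2, 0.9, 1.5, 2.0, 1.1, 1.8, 0.7, 1.6]
girls = [0.3, 0.5, 0.2, 0.8, 0.4, 0.6, 0.1, 0.9]
d, p = ks_2samp(boys, girls)
```

Unlike a test of means, the statistic reacts to any difference in shape, such as the left skew of girls’ distributions described below.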

Using the variables described above, we constructed two measures. For teacher-initiated actions (TIAs) (the four Sadker and Sadker variables, call on, and time spent), we take the first principal component of these factors by subject and plot its distribution for boys and girls. For student-initiated actions (SIAs), we use call out.

The last column in Table 5 shows the correlation of each factor in the TIA measure with the standardized first principal component. Most factors are strongly correlated with the principal component. The first principal component of teacher behavior in math classes mainly captures call on, praise, and remediation; in other subjects, it mainly captures call on, time spent, and remediation. For the two types of actions, we plot the distribution for girls and boys separately for math and language arts/other subjects. We report the p value of the test of equality of distributions across gender.
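A first principal component of standardized factors, and the correlation of each factor with it (the quantity reported in the last column of Table 5), can be sketched without external libraries by power iteration on the correlation matrix. This is a minimal illustration under stated assumptions: the data are hypothetical, the sign of the component is fixed so that loadings sum to a positive number, and a factor with zero variance would have to be dropped before standardizing.

```python
import math


def standardize(col):
    """Convert a column to z-scores (sample standard deviation)."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
    return [(v - mean) / sd for v in col]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def first_principal_component(rows, iters=500):
    """Scores and loadings of the first principal component of standardized columns.

    rows: one list per observation, one entry per coded factor."""
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, k = len(rows), len(cols)
    # Correlation matrix of the factors (z-scores are already centered and scaled).
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # principal components are defined only up to sign
        v = [-x for x in v]
    scores = [sum(cols[j][i] * v[j] for j in range(k)) for i in range(n)]
    return scores, v


# Hypothetical counts per observation: call on, praise, remediation.
rows = [[5, 2, 1], [8, 3, 2], [3, 1, 1], [10, 4, 3], [6, 2, 2], [2, 0, 0]]
scores, loadings = first_principal_component(rows)
factor_corrs = [pearson(list(col), scores) for col in zip(*rows)]
```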

Figure 2 shows the distributions of the TIA first principal component and the SIA for girls and boys in math (both measures control for the number of students in the class). Panel a presents the TIA first principal component; panel b illustrates SIA (call out). Both types of actions show statistically significant differences in the distributions for boys and girls, with the girls’ distribution skewed to the left. There are almost no cases of high attention to girls (call ons) and few or no cases of high participation (call outs) of girls.

Fig. 2

Distribution of teacher-initiated and student-initiated action measures for fourth grade girls and boys in math. a Teacher initiated (pca), b Student initiated (callout). Source: authors’ analysis based on primary data

Figure 3 repeats the exercise for other subjects (mainly language arts). It shows consistent differences between the distributions by gender. Gender differences in teachers’ negative responses to SIAs are also significant.

Fig. 3

Distribution of teacher-initiated and student-initiated action measures for fourth grade girls and boys in language/other subjects. a Teacher initiated (pca). b Student initiated (callout). Source: authors’ analysis based on primary data

Based on tests of equality of distributions of the TIA and SIA measures for boys and girls, we reject the null hypothesis that teachers interact with girls and boys in the same manner.

5.2 Correlation between gender variables and SIMCE and CLASS scores

This section investigates (a) the association between the observed measure of teachers’ gender bias and observed measures of other dimensions of teacher–student interactions and (b) how gender bias in the classroom relates to students’ performance on the SIMCE. To do so, we create indicators of gender bias in TIAs and SIAs. The TIA indicator is the difference between the first principal components of TIAs for boys and girls. The SIA indicator is the difference between boys and girls in calling out answers.
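The two indicators are classroom-level differences between boys and girls. The sketch below builds them and, as an assumption consistent with the per-standard-deviation results reported for the SIMCE regressions, standardizes each across classrooms; the function names and inputs are hypothetical.

```python
import statistics


def zscores(xs):
    """Standardize values across classrooms (mean 0, sample SD 1)."""
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


def bias_indicators(pc_boys, pc_girls, callout_boys, callout_girls):
    """Boy-minus-girl gaps per classroom: TIA from component scores, SIA from call outs."""
    tia = [b - g for b, g in zip(pc_boys, pc_girls)]
    sia = [b - g for b, g in zip(callout_boys, callout_girls)]
    return zscores(tia), zscores(sia)


# Hypothetical values for four classrooms; positive gaps favor boys.
tia_z, sia_z = bias_indicators(
    pc_boys=[0.5, 1.2, -0.3, 0.8], pc_girls=[-0.2, 0.1, -0.4, 0.0],
    callout_boys=[4, 6, 1, 3], callout_girls=[1, 1, 1, 2],
)
```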

5.2.1 Gender bias and CLASS scores

We coded the same 237 videotapes using the CLASS instrument for fourth to sixth grades (Pianta et al. 2008b). The details of the CLASS coding scheme are described in the Appendix. CLASS measures the quality of teacher–student interactions in three main domains: emotional support, classroom organization, and instructional support. Coders scored these interactions on a scale of 1–7 (Table 7). The average CLASS score for the teachers in our study was 3.95. Only one teacher scored above 5, and no teacher scored less than 2. Scores tended to be higher in class organization and lower in instructional support.

Table 7 CLASS scores of teachers in the sample

Table 8 presents the results from estimating a regression of the gender bias indicators on the CLASS score (the first principal component of CLASS dimensions) and the disaggregated CLASS dimensions. All regressions include the following covariates:

  • The ratio of boys to girls in the classroom

  • Characteristics of the school (type of administration, income decile, and experience and tenure of the school principal)

  • Characteristics of the classroom teacher (total experience, tenure in the school, and whether the teacher has or is pursuing a graduate degree).

Table 8 Correlation between CLASS (pca) and gender bias, as measured by teacher-initiated and student-initiated actions

In columns (3) and (4), we include school fixed effects to control for permanent differences in gender attitudes across schools. We weight observations by class size (the results are very similar in unweighted specifications). In the specifications without school fixed effects, the average CLASS score is negatively correlated with both gender bias measures (TIA and SIA): worse teachers show more gender bias.

The results are similar for the domains of the CLASS score:

  • Better class organization is associated with less gender bias.

  • The correlation with the emotional support score is negative for both TIA and SIA.

  • Correlations for instructional support are not statistically significant.

When school fixed effects are added, however, the association between the quality of teacher–student interactions and the TIA indicator for gender bias disappears. For the SIA indicator, classes in which the quality of teacher–student interactions is better according to CLASS show more active participation of boys relative to girls. This finding may suggest that better interaction with students empowers boys at the expense of girls in the classroom.

Overall, the results without fixed effects indicate an inverse correlation between measures of the quality of teacher–student interactions and various measures of gender bias, especially for emotional support and class organization. The relationship between the quality of interactions and gender bias in teachers’ attention is no longer significant in the fixed effects model. The results of the fixed effects model are driven by the 103 schools in the sample (47%) that have more than one fourth grade classroom, and they are more consistent with the findings of the study of Ecuador by Carneiro et al. (2017). Although the comparison across schools suggests that higher CLASS scores are associated with less gender bias, the within-school variation suggests that, after controlling for permanent school characteristics, better CLASS scores are associated with significantly higher participation rates of boys relative to girls. Whether the measures of the quality of interactions capture all relevant dimensions of students’ well-being within the classroom remains unclear.
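A school fixed effects regression weighted by class size can be estimated through the within transformation: subtract each school’s weighted mean from the outcome and the regressors, then run weighted least squares on the demeaned data. The sketch below is a minimal, dependency-free illustration with a single regressor and hypothetical data; it omits the covariates listed above and computes no standard errors, for which an econometrics package would be used in practice.

```python
from collections import defaultdict


def demean_within(values, groups, weights):
    """Within transformation: subtract the weighted group (school) mean from each value."""
    num, den = defaultdict(float), defaultdict(float)
    for v, g, w in zip(values, groups, weights):
        num[g] += w * v
        den[g] += w
    return [v - num[g] / den[g] for v, g in zip(values, groups)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fixed_effects_wls(y, regressors, schools, weights):
    """Coefficients from weighted least squares with school fixed effects."""
    y_w = demean_within(y, schools, weights)
    cols = [demean_within(col, schools, weights) for col in regressors]
    k = len(cols)
    # Weighted normal equations (X'WX) beta = X'Wy; no intercept is needed after demeaning.
    xtwx = [[sum(w * a * b for w, a, b in zip(weights, cols[i], cols[j])) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * a * t for w, a, t in zip(weights, cols[i], y_w)) for i in range(k)]
    return solve(xtwx, xtwy)


# Hypothetical data: three schools, two classrooms each, weighted by class size.
schools = ["A", "A", "B", "B", "C", "C"]
class_size = [20, 25, 30, 22, 18, 27]
x = [1.0, 3.0, 2.0, 5.0, 0.0, 4.0]
school_effect = {"A": 10.0, "B": -5.0, "C": 0.0}
y = [school_effect[s] + 2.0 * xi for s, xi in zip(schools, x)]
beta = fixed_effects_wls(y, [x], schools, class_size)  # recovers the within slope of 2
```

Only schools with more than one classroom contribute variation after demeaning, which is why the fixed effects estimates rest on the multi-classroom schools.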

5.2.2 SIMCE scores and gender bias

Paying more attention to some students could be associated with better learning outcomes. Our results cannot be interpreted as causal effects, however. Students were not randomly allocated to teachers, so the data do not allow other possible effects of unobserved variables to be isolated. For example, the best students (both girls and boys) may have more engaged parents, who get involved with the school to make sure their children get the best teachers, who may also be more capable of providing equal support to all students. The analysis below examines whether teacher gender bias is correlated with test scores.

To do so, we use the SIMCE, which fourth graders take every year. Table 9 presents the results of ordinary least squares (OLS) regressions of SIMCE scores by subject (reading, math, and science), run separately for girls and boys. As explanatory variables, we use the two measures of gender bias (TIA and SIA). In all specifications, we include other covariates (to control for student, school, and teacher characteristics) as well as the CLASS score (to control for the quality of teacher–student interactions). All regressions also control for the ratio of boys to girls in the classroom and include school fixed effects. Along with the coefficients, we report normalized coefficients that express effects as percentages of a standard deviation of the test score for the corresponding gender group.
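The normalized coefficients rescale each raw estimate into test-score standard deviations per 1 standard deviation change in the bias index. A minimal sketch of that rescaling follows. It assumes the normalization is beta times SD(bias index) divided by SD(test score of the gender group); the authors may compute it differently. For brevity the raw slope comes from a bivariate regression on hypothetical classroom data. The same rescaling would apply to a coefficient from a model with controls and school fixed effects.

```python
import statistics


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def normalized_coefficient(beta, bias_index, group_scores):
    """Effect of a 1-SD rise in the bias index, in SDs of the group's test score."""
    return beta * statistics.stdev(bias_index) / statistics.stdev(group_scores)


# Hypothetical classroom-level data for one gender group (not from the study).
tia = [0.2, 0.5, 0.9, 1.4, 1.8, 2.5]
girls_math = [268.0, 262.0, 255.0, 250.0, 247.0, 236.0]

beta = ols_slope(tia, girls_math)  # score points per unit of the bias index
effect = normalized_coefficient(beta, tia, girls_math)
print(f"raw beta = {beta:.2f}, normalized = {effect:.1%} of a test-score SD")
```

Dividing by the standard deviation of the gender group's own scores, rather than a pooled one, keeps the girls' and boys' estimates each on the scale of their own score distribution.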

Table 9 Correlation between gender bias and SIMCE scores

In the case of TIA, the results show significant negative correlations between gender bias and the test scores of girls in math and science: girls whose teachers demonstrate greater gender bias have lower test scores in these subjects. The magnitudes are also large, particularly for math. For every 1 standard deviation increase in TIA, girls’ SIMCE scores decrease by 34.6% of a standard deviation in math and 5% of a standard deviation in science, after controlling for student, school, and teacher characteristics.

SIA (call-outs) is significantly and negatively correlated with girls’ test scores in reading and math. The coefficients suggest a stronger association with reading scores than in the case of TIA. A 1 standard deviation increase in SIA is associated with a decrease of 18% of a test score standard deviation in math and of 15% of a test score standard deviation in science. Girls perform significantly better in math relative to boys in classrooms with greater gender equality in SIA.

Table 10 shows the association between gender bias and SIMCE scores by the socioeconomic status of the school. The negative correlation between gender bias and girls’ learning is stronger in low socioeconomic status schools, and the positive correlation between gender bias and boys’ learning is magnified in high socioeconomic status schools.Footnote 9

Table 10 Correlation between gender bias and SIMCE scores, by socioeconomic status of the school

In sum, the results suggest a correlation between gender bias (both from teachers and in terms of student behavior) and test scores. The magnitudes are large: a 1 standard deviation increase in gender bias is associated with a change in SIMCE scores of roughly 5–35% of a standard deviation. Our data do not permit causal claims, but the results are consistent with arguments well documented in the literature that point to the importance of teacher–student interaction for learning.
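Because the magnitudes above are expressed in standard deviations of the test score, a coefficient on that scale should not be confused with a share of variance explained. With a single regressor, the standardized slope equals the correlation r, and the share of variance explained is r². A standardized association of 0.346 therefore corresponds to about 12% of the variance, and one of 0.05 to 0.25%. The sketch below checks this identity on hypothetical data. With controls and fixed effects the mapping runs through partial correlations instead, so the figures are only indicative for the models in Tables 9 and 10.

```python
import statistics


def standardized_slope(x, y):
    """Bivariate OLS slope rescaled to SD(y) per SD(x); equals Pearson's r."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return (sxy / sxx) * statistics.stdev(x) / statistics.stdev(y)


def r_squared(x, y):
    """Share of the variance of y explained by a bivariate OLS fit on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    ssr = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - ssr / sst


# Hypothetical data, not from the study.
bias = [0.1, 0.4, 0.4, 0.9, 1.3, 1.6, 2.2]
score = [251.0, 262.0, 240.0, 249.0, 238.0, 252.0, 231.0]

b = standardized_slope(bias, score)
print(f"standardized slope {b:.3f}, squared {b * b:.3f}, R^2 {r_squared(bias, score):.3f}")

# Implied variance shares for the magnitudes reported in the text.
for coef in (0.05, 0.346):
    print(f"{coef:.3f} SD per SD -> {coef ** 2:.2%} of variance (single-regressor case)")
```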

6 Conclusions

This study finds differences in the amount and type of attention teachers devote to girls and boys in the classroom in Chile. The fact that clear patterns already appear among fourth-grade students is important, given that gender gaps in test scores tend to widen as students progress through school (Bharadwaj et al. 2015). As established in the education literature, receiving more teacher attention (positive or negative) may affect motivation, aspirations, and performance, as well as long-term outcomes, such as decisions about college, employment possibilities, and earnings.

Biases are often unconscious. They are based on myths and beliefs that are not necessarily grounded in evidence or even direct experience. For example, there is a general perception that girls talk more in class than boys. In one of their studies, Sadker and Sadker (1985) show a film of a classroom discussion and ask teachers and administrators which gender talked more. Although the quantitative data showed that boys talked three times as much, the majority of teachers claimed that girls talked more than boys.Footnote 10

The fact that attention biases by gender are particularly related to math scores raises the question of why girls do better in language. An important factor behind these results could be that the process of learning is different in math than in other subjects. Reading can be improved by continued reading (something students can do on their own), whereas mathematical thinking requires “engaging students in posing and solving problems” (Fite 2002). The fact that certain subjects are more teacher-dependent than others could explain part of these differences.

Teachers’ use of instructional time, the quality of instructional support, the use of materials (including information and communications technology), the quality of classroom management, the ability to keep students engaged, and the emotional support provided to students seem to be important factors in learning. Making sure that all students benefit from good teaching practices is as important as how frequently good practices are used.

In terms of stimulating broad participation of all students in the classroom, certain pedagogical techniques—such as calling on the whole classroom, asking for quick answers or rushing students, and accepting call outs from students even if they have not respected their turn by raising their hands—seem to be especially problematic for girls. Raising teachers’ awareness of the gender biases implicit in these practices should be combined with a review of the extent of gender bias in textbooks.

Whether they favor boys or girls, gender gaps hinder the possibility of developing all students’ full potential. Educational policy should take into account impacts on and unintended consequences for both groups. Further research on why teachers pay more attention to boys than girls would help policy makers craft specific interventions to generate school environments more conducive to learning for all students, regardless of their background.