7.1 Inequality in Teacher Quality: The Conceptual Terrain

Our study has focused on the relationship between teacher factors and mean student achievement. However, average performance can conceal massive differences among groups of students. Hypothetically, two countries can have very similar mean achievement but dramatically different distributions of achievement. This is the issue of educational equity, which has been a major focus of policymakers and researchers since at least the 1960s. Indeed, a persuasive argument can be made that educational equity is as important as mean achievement.

Concerns about equity are grounded in two issues, one practical and the other normative. First, despite the argument that there is an equity-efficiency trade-off (that overall increases in student learning come at the cost of a more uneven distribution of outcomes), recent evidence suggests that no such trade-off exists and that, in reality, greater educational equity is associated with higher average student performance (Parker et al. 2018). As a consequence, educational systems that generate more unequal outcomes may be depressing their stock of human capital by failing to tap the potential of all their students, with deleterious consequences for national prosperity.

Second, educational inequality is also intrinsically problematic. The implicit social contract in most modern societies is that unequal rewards in the marketplace (i.e., large differences in wealth and income) can only be justified on the basis of fair competition. Educational systems have traditionally been viewed as the key mechanism for establishing this condition, by giving all students a fair chance to develop their talents. If some students are systematically disadvantaged in their chance to obtain a good education, the legitimacy of the social order is called into question. This is particularly so when entire groups of children are systematically disadvantaged because of their background circumstances, such as their gender, race and ethnicity, socioeconomic status, or national origin, to name just a few examples.

In outlining these conceptual issues, we have thus far glossed over a very important distinction between inequality in educational outcomes and inequality in educational opportunities. While differences in educational outcomes may be strongly suggestive of underlying unfairness, and very high variation in student performance may signify a failure to maximize educational potential, differences in educational opportunities are more morally suspect and point to possible causes of educational inequality. It is patently unfair if some children are short-changed solely because of their ascriptive characteristics (gender, poverty, etc.), especially when those disadvantages are the product of policy. When schools are structured in such a way as to ensure that more advantaged students have access to, for example, a more rigorous curriculum, higher quality teachers, or better facilities, then the educational system, and the people who manage and support it, are culpable for inequality. However, because policies are malleable, inequalities in opportunity that are produced by policy are malleable too. Policies can be changed.

Although most studies of educational inequality have focused on specific countries, international and comparative studies are extremely valuable. Specific cultural and institutional contexts may influence the kinds of inequalities that manifest in particular countries, and so require careful examination on their own terms. However, some inequalities are extremely common across educational systems, and these shared patterns can provide important lessons about what causes inequality and how to reduce it.

Arguably the most universal educational inequality is that associated with socioeconomic status (SES). Although other types of inequality are certainly important, in virtually every educational system students whose parents have lower incomes and less formal education perform worse on almost any educational metric. Whether using PISA or TIMSS data, international large-scale assessments indicate that low-SES students register lower mean scores than their more affluent peers (Chudgar and Luschei 2009; Montt 2011; Schmidt et al. 2015). The precise nature of this relationship remains in dispute. While there is considerable evidence that low-SES children typically have fewer opportunities and resources in their homes and communities, the role of in-school factors remains unclear and may vary greatly across educational systems. For example, research based on the United States indicates that high-poverty students usually have lower-quality teachers, whether quality is measured by experience, educational background, or more sophisticated value-added modeling (Goldhaber et al. 2015). Results from the OECD’s Teaching and Learning International Survey (TALIS) similarly show lower levels of teacher professionalism in economically disadvantaged schools in multiple countries (OECD 2016). But a group of studies (Akiba et al. 2007; Burroughs and Chudgar 2017; Chudgar and Luschei 2009) has found that, by some metrics, there are countries where more economically disadvantaged students have access to higher quality teachers. There are other in-school factors where the inequalities are starker and more consistent, however. Comparative analyses by Chmielewski (2014) and Schmidt et al. (2015) using PISA data, and by Schmidt et al. (2001) using TIMSS data, indicate persistent inequalities in opportunity to learn rigorous mathematics content.

7.2 A Comparative Analysis of Inequality in Teacher Effectiveness

In this chapter, the basic approach is similar to that used in Chap. 5, except that, instead of treating mean student performance as the dependent variable, we focus on educational inequality. Whereas Chap. 5 suggested a fairly weak and inconsistent relationship between teacher quality measures and student outcomes, here we explore whether teachers’ characteristics and behavior, as measured by TIMSS items, are related to educational inequality, and consequently whether changes in teacher quality can help promote greater educational equity. As in Chap. 5, we aimed to identify common patterns across time and space, with an emphasis on consistent relationships; but, as discussed in Chap. 5, this approach has a number of methodological and substantive limitations, so the results should be treated as preliminary.

We examined two measures of inequality: variation in student performance and differences between high- and low-SES classrooms. In our first set of analyses, following Montt (2011) and Mullis et al. (2016), we measured overall inequality as the standard deviation of student outcomes. This measure captures overall differences in student outcomes without focusing on subgroup differences: more compressed distributions of TIMSS mathematics performance are taken to indicate lower levels of inequality in outcomes. As with the analyses of average outcomes, we focused on the 2003–2015 cycles of TIMSS, since many of the variables of interest were absent from the 1995 and 1999 iterations.
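The overall inequality measure can be sketched as a weighted standard deviation of student scores within one country and cycle. The sketch below assumes, as is common practice with TIMSS, that each student carries several plausible values and a sampling weight, and that the statistic is computed separately for each plausible value and then averaged. The field names `pv` and `weight` are hypothetical, and the sketch ignores the jackknife variance estimation that a full TIMSS analysis would use.

```python
import math

def weighted_sd(values, weights):
    """Weighted population standard deviation of one score variable."""
    total_w = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total_w
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_w
    return math.sqrt(var)

def country_sd(students, n_pv=5):
    """Average the weighted SD over plausible values for one country-cycle.

    students: list of dicts with hypothetical keys 'pv' (list of n_pv
    plausible values) and 'weight' (student sampling weight).
    """
    weights = [s["weight"] for s in students]
    sds = [weighted_sd([s["pv"][k] for s in students], weights) for k in range(n_pv)]
    return sum(sds) / n_pv
```

Averaging the statistic over plausible values, rather than averaging the plausible values first, avoids understating the spread of scores.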

7.2.1 Inequality as Within-Country Variation I: Descriptives

The first step is to examine mean differences in within-country standard deviations, ignoring classroom-level effects. Country-level analysis was conducted for each country participating in TIMSS between 2003 and 2015 at both grade four and grade eight (Tables 7.1 and 7.2). At grade four, within-country score variation across all cycles ranged from a high of 114 points (Yemen, 2003) to a low of 53 points (the Netherlands, 2011). At grade eight, the standard deviation across all cycles considered ranged from a high of 113 points (Saudi Arabia, 2015) to a low of 58 points (Australia, 2011). At both grades, within-country variation in student mathematics scores tended to be greater in the Middle Eastern and Arabic-speaking countries.

Table 7.1 Standard deviations in student performance in TIMSS mathematics by education system at grade four
Table 7.2 Standard deviations in student performance in TIMSS mathematics by education system at grade eight

Delving deeper into the data, we examined the subset of countries that participated in TIMSS between 2007 and 2015: 22 countries participated in all cycles over this period at grade four, and 25 countries at grade eight. At grade four, the average within-country variation in mathematics scores changed very little overall: it was 79.4 points in 2007 and 80.0 in both 2011 and 2015. There was a fair degree of movement for particular countries, however. An equal number of educational systems (11 each) saw declines and increases in the size of their standard deviations. The largest increases in inequality were in Iran (17 points) and the United States (six points), while the largest declines were in Japan (seven points) and the Slovak Republic (five points).

Patterns differed for grade eight. Most notably, the average size of within-country performance variation fluctuated more: the mean standard deviation across the 25 countries was 85 points in 2007 and 2015, and 80 points in 2011. Further, the magnitude of country-level changes was far greater than at grade four. The average increase for countries that saw an increase in inequality was 13 points (compared with only four points at grade four). Similarly, the average decline among countries with shrinking standard deviations was 10 points at grade eight, compared with only three points at grade four. On balance, more countries had shrinking standard deviations (15 systems) than growing ones (10 systems). Notably, the United States saw larger within-country variation in mathematics outcomes in 2015 than in 2007 at both grade levels.
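The balanced-panel comparison described in the two preceding paragraphs can be sketched as follows. The sketch assumes that country-level standard deviations have already been computed and that a country's change is measured between the first and last cycle of the panel (here 2007 and 2015); the function name and input layout are illustrative, and the values in the test are invented rather than TIMSS results.

```python
def summarize_panel(sd_by_country, cycles):
    """Summarize changes in within-country SDs for countries present in every cycle.

    sd_by_country: {country: {cycle_year: standard_deviation}}
    cycles: ordered list of cycle years, e.g. [2007, 2011, 2015]
    """
    # Keep only the balanced panel: countries observed in all requested cycles.
    panel = {c: sds for c, sds in sd_by_country.items() if all(y in sds for y in cycles)}
    if not panel:
        raise ValueError("no country participated in every requested cycle")
    # Average SD across the panel for each cycle.
    cycle_means = {y: sum(sds[y] for sds in panel.values()) / len(panel) for y in cycles}
    # Change from the first to the last cycle for each country.
    first, last = cycles[0], cycles[-1]
    changes = {c: sds[last] - sds[first] for c, sds in panel.items()}
    ups = [d for d in changes.values() if d > 0]
    downs = [-d for d in changes.values() if d < 0]  # declines stored as magnitudes
    return {
        "cycle_means": cycle_means,
        "changes": changes,
        "n_increase": len(ups),
        "n_decline": len(downs),
        "mean_increase": sum(ups) / len(ups) if ups else 0.0,
        "mean_decline": sum(downs) / len(downs) if downs else 0.0,
    }
```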

Examination of within-country trends reveals few clear patterns. Among systems that participated in at least three of the last four cycles of TIMSS, the data indicate no consistent trends at grade four. At grade eight, standard deviations rose steadily between 2003 and 2015 in two systems (Armenia, by five points in total, and Palestine, by seven points), and fell steadily in four systems: New Zealand (14 points), Oman (34 points), Syria (19 points), and Tunisia (nine points).

7.2.2 Inequality as Within-Country Variation II: The Influence of Teacher Factors on Student Variation

We further examined whether teacher factors and student controls might account for the apparently random variation in overall within-country inequality in mathematics scores. Replicating the fixed-effects analysis employed in Chap. 5, we constructed a model with two student-level controls (books in the home, and speaking the language of the test at home) and six teacher-level predictors (alignment, time spent on teaching mathematics, teacher education, self-efficacy, experience, and teacher gender). The purpose of the model was to explore whether within-country changes over time in teacher human capital might account for changes in score variation. Of particular interest was whether greater alignment with national standards and more time spent on mathematics might be associated with lower standard deviations in mathematics outcomes. Although teacher characteristics such as experience and education are conventionally treated as measures of teacher quality, content coverage and time spent on mathematics could also be viewed as metrics of high-quality instructional practice (although, of course, time and content are influenced by school policies).
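As a rough illustration of the country fixed-effects logic, the sketch below applies the within transformation: it subtracts each country's mean from the outcome and the predictors, so that only changes within a country across cycles identify the coefficients, and then solves ordinary least squares on the demeaned data. This is a simplified assumption, not the Chap. 5 specification: it omits sampling weights, cycle effects, and standard errors, and the column names are hypothetical.

```python
from collections import defaultdict

def demean_by_group(rows, group_key, cols):
    """Within transformation: subtract each group's mean from every listed column."""
    sums = defaultdict(lambda: [0.0] * len(cols))
    counts = defaultdict(int)
    for r in rows:
        g = r[group_key]
        counts[g] += 1
        for j, c in enumerate(cols):
            sums[g][j] += r[c]
    return [[r[c] - sums[r[group_key]][j] / counts[r[group_key]] for j, c in enumerate(cols)]
            for r in rows]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: a predictor does not vary within countries")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def fixed_effects(rows, outcome, predictors, group_key="country"):
    """Country fixed-effects slopes via OLS on country-demeaned data (no SEs, no weights)."""
    data = demean_by_group(rows, group_key, [outcome] + predictors)
    y = [d[0] for d in data]
    X = [d[1:] for d in data]
    p = len(predictors)
    # Normal equations (X'X) beta = X'y; no intercept is needed after demeaning.
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return dict(zip(predictors, solve(xtx, xty)))
```

Each row here would be one country-cycle, with the outcome being that cycle's standard deviation and the predictors being country-cycle averages of the teacher and student variables.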

This analysis yielded fairly weak results (Tables 7.3 and 7.4). At grade four, none of the predictor variables was statistically significant, and, contrary to expectations, the associations of time on mathematics and of alignment with standard deviations were positive rather than negative; in other words, both were associated with greater inequality. The predictors also failed to reach the 0.05 level of statistical significance at grade eight, although changes in self-efficacy were significant at the looser 0.10 cutoff. However, self-reported preparation to teach mathematics topics had a weak and non-significant association with greater inequality. Unlike at grade four, at grade eight curricular alignment and time spent on teaching mathematics were associated with smaller standard deviations, although with very weak t-values.

Table 7.3 Country-level fixed effect estimates of the relationship of teacher quality to standard deviations in student performance, grade four
Table 7.4 Country-level fixed effect estimates of the relationship of teacher quality to standard deviations in student performance, grade eight

7.2.3 Inequality as Differences Between High- and Low-SES Classrooms

Instead of employing country-level standard deviations as a measure of educational inequality, one alternative is to consider classroom effects. We calculated the variation in mathematics outcomes among students who shared the same mathematics teacher (in other words, within-classroom inequality), and then ran a series of single-level within-country linear regressions using the standard set of predictors. Our main hypothesis was that teachers who spent more time on mathematics would be associated with smaller differences among the students in their class, especially at grade four.
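The within-classroom measure can be sketched in two steps: compute the standard deviation of scores within each classroom, then regress those classroom standard deviations on a teacher variable within a single country. For brevity, the sketch uses one predictor (weekly minutes of mathematics, a hypothetical field) instead of the full set of predictors, and it ignores sampling weights and plausible values. A negative slope would correspond to the hypothesis that more time on mathematics goes with smaller within-class differences.

```python
import statistics
from collections import defaultdict

def classroom_sds(students):
    """Population SD of scores within each classroom.

    students: iterable of (class_id, score) pairs. Classrooms with fewer than
    two students are dropped because their spread is not meaningful.
    """
    by_class = defaultdict(list)
    for class_id, score in students:
        by_class[class_id].append(score)
    return {cid: statistics.pstdev(s) for cid, s in by_class.items() if len(s) > 1}

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def time_slope(students, minutes_by_class):
    """Regress classroom SD on teacher-reported minutes of mathematics (one country)."""
    sds = classroom_sds(students)
    ids = sorted(cid for cid in sds if cid in minutes_by_class)
    return ols_slope([minutes_by_class[c] for c in ids], [sds[c] for c in ids])
```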

These regressions also produced only very weak results (Tables 7.5, 7.6, 7.7, 7.8, 7.9, 7.10, 7.11 and 7.12). There were few statistically significant associations, and none of these relationships was consistent across time, which raises serious doubts about the stability of these associations, even within countries. Further, where associations were statistically significant, their direction was inconsistent, which suggests that there is no general cross-national association between teacher quality and within-classroom inequality. Time spent on mathematics was statistically significant in only one system at grade four (Hungary in 2011), but (surprisingly) in eight systems at grade eight. In most cases where p < 0.05, the relationship between time and within-classroom variation in performance was in the expected direction; in other words, more time spent on teaching mathematics was associated with lower inequality. The only positive and statistically significant relationship was for Moldova in 2003. The strongest result was for Japan, where more time spent on teaching mathematics was significantly associated with lower standard deviations in student outcomes in both 2007 and 2011.

Table 7.5 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade four, TIMSS 2003
Table 7.6 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade four, TIMSS 2007
Table 7.7 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade four, TIMSS 2011
Table 7.8 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade four, TIMSS 2015
Table 7.9 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade eight, TIMSS 2003
Table 7.10 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade eight, TIMSS 2007
Table 7.11 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade eight, TIMSS 2011
Table 7.12 Within-country regression estimates of the relationship of teacher quality to standard deviations in classroom outcomes, grade eight, TIMSS 2015

Our second method of analyzing educational inequalities also relies on classroom-level characteristics, but instead of pooling all classrooms, we distinguished high- from low-SES classes. The key variable used to define socioeconomic status was the common proxy of the number of books in the home. Our approach to identifying a classroom as high- or low-SES builds on that of Schmidt et al. (2015) and Burroughs and Chudgar (2017), who both used interquartile differences. We first calculated the mean number of books in the home for each classroom, and then identified all classrooms below the 25th percentile (low-SES) and above the 75th percentile (high-SES). Finally, we averaged the classroom characteristics within each group for each key variable. Welch’s t-test was used to determine whether these differences are statistically significant since, unlike Student’s t-test, it does not assume equal variances and is therefore less sensitive to differences in sample size and variance between groups (Derrick et al. 2016). However, it must be emphasized that our analysis may be vulnerable to Type I (“false positive”) error, since standard errors were calculated using the adjusted weight model (as discussed in Chap. 5) rather than jackknife standard errors. The jackknifing procedure was developed for use with the entire sample of schools, not a subsample as employed here. Another limitation is that often only relatively few classrooms are compared against one another. The results should therefore be treated with caution.
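The classification and testing steps can be sketched as below: classrooms are split at the quartiles of their mean books-in-home index, and a group difference is tested with Welch's t statistic and the Welch-Satterthwaite degrees of freedom. This is a minimal, unweighted sketch; it uses the default "exclusive" quantile method of Python's `statistics.quantiles` as an assumed cutoff rule, omits the adjusted weights described above, and returns no p-value (SciPy's `scipy.stats.ttest_ind` with `equal_var=False` computes the same statistic together with a p-value). The classroom labels are hypothetical.

```python
import math
import statistics

def quartile_groups(class_books):
    """Split classrooms into low-SES (below the 25th percentile) and high-SES
    (above the 75th percentile) groups by mean books-in-home index.

    class_books: {classroom_id: mean books-in-home value for that classroom}
    """
    q1, _, q3 = statistics.quantiles(class_books.values(), n=4)
    low = [c for c, b in class_books.items() if b < q1]
    high = [c for c, b in class_books.items() if b > q3]
    return low, high

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In use, a teacher characteristic (for example, years of experience) would be collected for the `high` and `low` classroom lists and passed to `welch_t`.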

As might be expected, our analysis showed large and statistically significant differences in mean student performance between high- and low-SES classrooms in nearly every instance. At grade four, the gap was statistically significant in all but five cases, and was significant and negative (richer classrooms posting lower mathematics scores) in only two cases (Armenia in 2007 and Saudi Arabia in 2015). At grade eight, high-SES classrooms had a statistically significant advantage in all but four cases.

At grade four, as with the other analyses, there was only a modest number of instances where teacher quality differences between high- and low-SES classrooms were statistically significant (Table 7.13). The strongest results at grade four were for teachers’ self-reported preparedness to teach mathematics, with statistically significant positive gaps (i.e., an advantage for wealthier classrooms) in 15 instances and statistically significant negative gaps in four. This inequality could be due in part to differences in teacher placement, but could also reflect bias in the instrument if teachers in advantaged schools had higher levels of professional satisfaction. However, there are some general (if non-significant) patterns. Pooling across cycles, there were 122 cases (system-by-cycle combinations) in which high-SES classrooms had more experienced teachers. Similarly, teachers in high-SES classes reported higher self-efficacy in 125 cases. The other variables showed much more variability in the relationship between classroom SES and measures of teacher effectiveness. In approximately half of the TIMSS countries, teachers in low-SES classrooms reported stronger alignment with the curriculum and stronger educational preparation to teach mathematics, and spent more time teaching mathematics, than teachers in high-SES classrooms.

Table 7.13 Number of education systems with statistically significant differences (positive or negative) in teacher quality metrics by classroom socioeconomic status, 2003–2015

At grade eight, statistically significant differences in teacher quality were more common, but also occurred in both directions. High-SES classrooms had significantly more experienced teachers in 37 cases, but significantly less experienced teachers in four cases. Similar results were found for teacher education (25 significantly positive versus eight significantly negative cases), alignment (21 versus seven), and self-efficacy (39 versus three). The results were more balanced for time spent on teaching mathematics (25 significantly positive versus 28 significantly negative cases). The results were quite similar when all differences, not just statistically significant ones, were considered.
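The counts reported in this paragraph treat each system-cycle combination as one case. A minimal sketch of that tallying, assuming the per-case differences (high-SES minus low-SES classroom mean) and p-values have already been computed, for instance with a Welch test, might look like this; the record layout and metric names are hypothetical.

```python
from collections import Counter, defaultdict

def tally_by_metric(results, alpha=0.05):
    """Count signed differences per teacher-quality metric, in the style of Table 7.13.

    results: iterable of (system, cycle, metric, diff, p_value), where diff is the
    high-SES minus low-SES classroom mean for that metric. Each system-cycle pair
    counts as one case.
    """
    counts = defaultdict(Counter)
    for _system, _cycle, metric, diff, p in results:
        # All differences, regardless of significance.
        if diff > 0:
            counts[metric]["positive"] += 1
        elif diff < 0:
            counts[metric]["negative"] += 1
        # Statistically significant differences, split by direction.
        if p < alpha and diff != 0:
            counts[metric]["sig_positive" if diff > 0 else "sig_negative"] += 1
    return counts
```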

Although these results point to modest advantages for high-SES classrooms at grade eight and more equity at grade four, it should be remembered that the results were often quite inconsistent across years. In only a handful of cases was there a statistically significant difference for the same country across multiple years. For example, high-SES classrooms had higher mean teacher experience in four cycles of TIMSS for Iran and three for Syria. For time spent on mathematics, more affluent classrooms in Chinese Taipei spent more time on grade eight mathematics, with statistically significant differences in three cycles of TIMSS, while there were significant negative differences (that is, low-SES classrooms had the advantage) in four cycles in Singapore and three cycles in the United States. Teachers in high-SES classrooms also had consistently higher self-efficacy in Jordan in three cycles of TIMSS.
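Checking whether a significant difference recurs across cycles, as in the examples above, can be sketched as a count of distinct cycles in which a system shows a significant gap in the same direction. The sketch assumes per-case differences and p-values are already available; the record layout, the three-cycle threshold as a default, and the system labels in the test are illustrative.

```python
from collections import defaultdict

def consistent_systems(results, metric, min_cycles=3, alpha=0.05):
    """Systems with significant same-direction differences on one metric in at
    least `min_cycles` distinct TIMSS cycles.

    results: iterable of (system, cycle, metric, diff, p_value), where diff is the
    high-SES minus low-SES classroom mean. Returns {(system, direction): n_cycles},
    with direction +1 for a high-SES advantage and -1 for a low-SES advantage.
    """
    cycles = defaultdict(set)  # (system, direction) -> cycles with a significant gap
    for system, cycle, m, diff, p in results:
        if m == metric and p < alpha and diff != 0:
            cycles[(system, 1 if diff > 0 else -1)].add(cycle)
    return {key: len(cs) for key, cs in cycles.items() if len(cs) >= min_cycles}
```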

7.3 Discussion

An equity analysis of TIMSS data provides strong evidence of broad, substantial, and enduring inequality in student outcomes. Cross-national analysis of within-country standard deviations demonstrates considerable variation in student performance, and students in high-SES classrooms generally outperform students in lower-SES classrooms. However, there is considerably less support for the hypotheses that teacher quality differs substantially between types of classrooms, or that educational inequalities stem from such differences. Our analyses also raise important questions about whether teacher characteristics have similar effects on students across different cultural contexts. The variation in the size, strength, and direction of indicators between study cycles also raises genuine concerns about overreliance on a single year of TIMSS data when making inferences about the effect of teachers on students.

Having said that, our analysis of equity does highlight one important conclusion: policymakers and researchers should be careful about applying lessons drawn from one educational system to another. Many studies have found that low-SES students in the United States have less experienced or less educated teachers, but this is simply not the case in every national context (although it is in many). In some educational systems at some grades, students in lower-SES classrooms may have teachers who are more experienced and better prepared to teach. But other lessons do have more general applicability. For years now, a growing body of literature in the United States has suggested that easily observable teacher characteristics are a poor indicator of quality instruction, absent more robust statistical models and controls. The TIMSS data suggest that this lesson is broadly applicable to many countries. Equity remains an issue of vital concern, but exclusive reliance on policies such as improving teacher alignment or increasing time spent on teaching mathematics may be unlikely to reduce these inequalities and improve student outcomes.