
3.1 TIMSS and TIMSS Advanced Data

TIMSS and TIMSS Advanced assessments have been measuring trends in international mathematics and science achievement since 1995, based on nationally representative samples of students in each participating country at grade four, grade eight, and the final year of secondary school (for students taking advanced coursework in physics and mathematics). TIMSS has been administered every four years for six assessment cycles (Footnote 1), namely in 1995, 1999, 2003, 2007, 2011, and 2015, while TIMSS Advanced has been administered at three points in time (1995, 2008, and 2015). Following the release of the international reports from each assessment, the IEA releases international databases for secondary analyses. In addition, after each assessment, a portion of the assessment items (and scoring guides) is released, while at least half are retained as secure items for future assessment cycles. In both assessments, items may be released after one, two, or three assessment cycles.

This report used assessment items and student performance data from the TIMSS and TIMSS Advanced assessments conducted across all assessment cycles from 1995 to 2015. The set of countries administering TIMSS and TIMSS Advanced varies for each assessment cycle. We report on five countries that participated in the TIMSS Advanced 2015 assessment and in all, or most, of the TIMSS grade eight and grade four mathematics and science assessments since 1995: Italy, Norway, the Russian Federation, Slovenia, and the United States (see Tables 3.1, 3.2, and 3.3). These countries were selected from the nine countries participating in TIMSS Advanced 2015 (Table 3.1) to maximize the data available to answer the research questions (see Sect. 1.4). All selected countries participated at all three grade levels in the 2015 assessments and were missing data for no more than one assessment cycle at any grade level. The five selected countries thus permit the greatest number of comparisons across countries, grade levels, and assessment cycles.

Table 3.1 Participation of countries in TIMSS Advanced assessments, by cycle
Table 3.2 Participation of countries in TIMSS grade eight assessments, by cycle
Table 3.3 Participation of countries in TIMSS grade four assessments, by cycle

TIMSS assesses mathematics and science achievement at two grade levels and so has two target populations: all students enrolled in grade four and all students enrolled in grade eight (or the equivalent grades in each country). The TIMSS Advanced physics and mathematics populations are defined as students in their final year of secondary school who are currently taking (or who had previously taken) the TIMSS Advanced-eligible courses in physics or advanced mathematics (Footnote 2) (Martin et al. 2014). (More information about the TIMSS and TIMSS Advanced populations is provided in the Appendix.)

The TIMSS Advanced population is a select group, covering one-quarter or less of final-year students in most countries in 2015. The coverage index (the percentage of the corresponding age cohort covered by the TIMSS Advanced physics and advanced mathematics student populations) was lower in physics than in advanced mathematics in all five countries included in the study (Tables 3.4 and 3.5). For physics, the coverage index in 2015 ranged from about 5% in the Russian Federation and the United States to 18% in Italy (Table 3.4). The physics coverage index differed somewhat across assessment years, with increases between 1995 or 2008 and 2015 in Italy, the Russian Federation, and the United States, and decreases in Norway and Slovenia. In particular, the percentage of students studying physics at an advanced level in Italy increased from 4% in 2008 to 18% in 2015, indicating a progressively more inclusive sample of students. In contrast, the percentage in Slovenia decreased from 39% in 1995 to 7–8% in 2008 and 2015, reflecting a more restricted sample of students. For advanced mathematics, the coverage index in 2015 ranged from 10–11% in the Russian Federation and the United States to 34% in Slovenia (Table 3.5; see the Appendix for additional data considerations between 1995 and 2015).

Table 3.4 Coverage index and percentages of female and male students in TIMSS Advanced physics samples, by country and year
Table 3.5 Coverage index and percentages of female and male students in TIMSS Advanced 2015 mathematics samples, by country

In addition to the overall coverage index, the percentages of female and male students in the TIMSS Advanced populations (final-year students taking advanced coursework in physics or mathematics) varied across countries and may differ from the percentages in the full population of students in their final year of secondary school. Boys were more likely than girls to undertake advanced physics coursework in all five countries (Table 3.4); only about 30% of advanced physics students in Norway and Slovenia, and about 40% in Italy, the Russian Federation, and the United States, were female. The percentage of females in physics did not change substantially across assessment years in any country. In contrast, the percentage of female students taking advanced mathematics (Table 3.5) was lower than that of males in Italy and Norway (about 40% female), higher than that of males in Slovenia (about 60% female), and about equal to that of males in the Russian Federation and the United States.

All results in this report are based on item-level statistics available in the TIMSS and TIMSS Advanced international databases from each assessment cycle, including the weighted percent correct for each country and the percentage of students in each item response category (see Sect. 3.2.3). Item-level statistics were computed for each country, as well as on average across the five countries included in the study (overall and by gender). Example items used in this report include “restricted-use” items (Footnote 3) from the TIMSS 2015 assessments, as well as released items from prior assessment years. Although all example items shown are released or restricted-use items, appropriate non-released (secure) items from TIMSS 2015 were also included in the analyses of patterns in misconceptions, but are not shown in the report.

3.2 Methodology

Our methodology consisted of three major components: (1) assessment framework review and content mapping to identify the set of items measuring the selected topics in our study (gravity and linear equations); (2) evaluation of diagnostic item-level performance data to identify the specific performance objectives measured by these items and to provide evidence of specific types of misconceptions, errors, and misunderstandings; and (3) analyses of the percentage of students demonstrating these misconceptions, errors, and misunderstandings to report patterns across countries by grade level, gender, and assessment year.

3.2.1 Assessment Framework Review and Content Mapping

To determine how mathematics and science concepts progress from the lower grades in TIMSS to TIMSS Advanced, topics covered in the 2015 TIMSS Advanced assessment frameworks were mapped to related topics at grades four and eight in the TIMSS 2015 frameworks. In the TIMSS and TIMSS Advanced frameworks, the greatest degree of content overlap across grades four, eight, and twelve is in the physics topic area of mechanics (forces and motion) and the mathematics content area of algebra, resulting in adequate numbers of assessment items across grades to report on patterns of misconceptions. Within these topics, a set of framework objectives was identified at each grade level and then used to select the items for the study.

As described in Chap. 1, this study focuses on two specific topics: gravity in physics and linear equations in algebra. We determined the set of TIMSS 2015 and TIMSS Advanced 2015 framework objectives that measured these topics (or precursor topics) across grade levels for gravity (Table 1.1) and linear equations (Table 1.2). Since the TIMSS and TIMSS Advanced frameworks have been revised over the past 20 years, the content mapping also linked the TIMSS framework objectives from 1995, 1999, 2003, 2007, and 2011, and the TIMSS Advanced framework objectives from 1995 and 2008, to the corresponding TIMSS 2015 framework objectives.

3.2.2 Evaluation of Item-Level Performance Data

Once the specific TIMSS and TIMSS Advanced framework objectives related to gravity and linear equations were identified, sets of items for each topic (16 for physics and 28 for mathematics) from the grade four, grade eight, and TIMSS Advanced assessments were assembled and reviewed. First, the TIMSS Advanced 2015 items were evaluated to determine the performance objectives measured by each item and the specific types of misconceptions, errors, and misunderstandings demonstrated by students across the five TIMSS Advanced countries chosen for the study (Italy, Norway, the Russian Federation, Slovenia, and the United States; Footnote 4). Then, TIMSS items from across the assessment cycles at grades four and eight that measured related or precursor concepts were evaluated for evidence of specific misconceptions, errors, and misunderstandings at the lower grade levels (Footnote 5).

Evidence of misconceptions, errors, and misunderstandings was determined by examining patterns in the item-level performance data. For multiple-choice (MC) items, this involved distractor analysis, or examining the incorrect options to determine common errors and misconceptions that may be demonstrated by students who choose those options. For constructed-response (CR) items (where students provide a written response), response patterns were determined based on the nature of student responses as defined in the scoring guides that accompany the items. In TIMSS and TIMSS Advanced, scoring guides provide item-specific criteria to differentiate between correct, partial, and incorrect student responses and use two-digit diagnostic codes to track specific misconceptions or errors (i.e., to differentiate between different types of partial and incorrect responses). This initial item evaluation used item statistics (i.e., the weighted percentage distributions of students in each country choosing each MC response option or each CR item response category in the scoring guide) obtained from the international data almanacs available on the TIMSS & PIRLS International Study Center website (https://timssandpirls.bc.edu/).
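As a concrete illustration of how such weighted percentage distributions can be computed from a student-level data file, the minimal Python sketch below tallies sampling weights by response category. This is only an illustrative sketch, not the procedure used to produce the official almanacs; the item variable name, the response codes, and the weight column (here assumed to be called TOTWGT) are assumptions made for demonstration.

```python
import pandas as pd

def weighted_response_distribution(students, item, weight="TOTWGT"):
    """Weighted percentage of students in each response category of one item.

    Assumes a student-level DataFrame with one column per item holding the
    response code (an MC option or a CR diagnostic code) and a sampling
    weight column; rows with a missing response are ignored here.
    """
    valid = students.dropna(subset=[item])
    totals = valid.groupby(item)[weight].sum()
    return 100 * totals / totals.sum()

# Hypothetical data: six students answering one multiple-choice item.
students = pd.DataFrame({
    "item_1": ["A", "B", "B", "D", "C", "B"],   # made-up response codes
    "TOTWGT": [1.2, 0.8, 1.0, 1.1, 0.9, 1.0],   # made-up sampling weights
})
print(weighted_response_distribution(students, "item_1"))
```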

Further content analysis of the set of items covering the topics of gravity and linear equations at each grade level identified a set of performance objectives (four in physics and nine in mathematics) that were measured by these items across the grade levels. These performance objectives are based on the set of TIMSS and TIMSS Advanced items selected for the study and are more specific than the broader TIMSS and TIMSS Advanced framework objectives outlined in Chap. 1. Some performance objectives were assessed at only one grade level, while others were measured by items at two grade levels (i.e., TIMSS Advanced/grade eight or grade eight/grade four) or at all three grade levels (for physics only). For the items measuring each performance objective, we identified the misconceptions, errors, or misunderstandings that may be demonstrated by different types of incorrect student responses. There were from one to six items measuring each type of misconception, error, and misunderstanding. (See Sect. 1.2 for detailed definitions of the terms, and Chap. 4 for an overview of performance objectives, misconceptions, errors, and misunderstandings, and the set of items used in the study.)

3.2.3 Reporting Patterns in Percent Correct and Percent with Misconceptions, Errors, and Misunderstandings by Grade, Country, Gender, and Assessment Year

All of the analyses used to report on the percent correct and percentage of students with misconceptions, errors, and misunderstandings were conducted using the IEA’s International Database (IDB) Analyzer (Version 4.0) Percentages function (IEA 2018). The IDB Analyzer uses a jackknife repeated replication (JRR) procedure to compute estimates and standard errors for a variety of statistics, such as average scores and percent correct (see Appendix for further technical notes). We do not provide standard errors in the tables and figures in this book (supplementary materials providing standard errors for all estimates are available for download at www.iea.nl/publications/RfEVol9).
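The following minimal Python sketch illustrates the JRR variance idea (it is not the IDB Analyzer's implementation): the statistic is recomputed with each set of replicate weights, and the squared deviations from the full-sample estimate are summed. The construction of real TIMSS replicate weights, and any scaling factors, follow the TIMSS technical documentation; here the replicate weights are simulated with made-up data purely for demonstration.

```python
import numpy as np

def jrr_estimate_and_se(statistic, values, full_weights, replicate_weights):
    """JRR sketch: estimate with full weights, variance from replicate weights.

    statistic(values, weights) returns the quantity of interest (e.g. a
    weighted percent correct); replicate_weights has one column per replicate.
    """
    estimate = statistic(values, full_weights)
    replicates = np.array([statistic(values, replicate_weights[:, r])
                           for r in range(replicate_weights.shape[1])])
    return estimate, np.sqrt(np.sum((replicates - estimate) ** 2))

def percent_correct(scores, weights):
    """Weighted percentage of students answering a one-point item correctly."""
    return 100 * np.average(scores, weights=weights)

# Made-up data: 200 students, 75 replicates (not a real jackknife scheme).
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=200)                 # 0 = incorrect, 1 = correct
weights = rng.uniform(0.5, 1.5, size=200)
rep_weights = weights[:, None] * rng.uniform(0.9, 1.1, size=(200, 75))
print(jrr_estimate_and_se(percent_correct, scores, weights, rep_weights))
```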

Four types of analyses were used to produce the item-level statistics shown in the report.

Percent Correct

This is the percentage of students receiving credit on each item. For MC and short CR items (each worth one score point), this reflects the percentage of students who provided a correct answer. For extended CR items (worth two score points), this reflects the weighted percentage of available score points earned: the percentage of students receiving full credit (two points) plus half the percentage receiving partial credit (one point). For example, on an item where 10% of students received full credit and 10% received partial credit, the weighted percent correct is 15%: the percentage of students receiving full credit (10%) plus half the percentage receiving partial credit (5%). Percent correct was computed for all items in each country (overall and by gender). When reporting percent correct on the set of items in physics and mathematics, data from the most recent assessment were used for each item.
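Expressed as code, this rule amounts to the weighted share of available score points earned. The sketch below is illustrative only; the score codes and weights are made up.

```python
import numpy as np

def weighted_percent_correct(points, weights, max_points):
    """Weighted percent correct as the share of available score points earned.

    For a one-point item this is simply the weighted percentage answering
    correctly; for a two-point extended CR item it equals the percentage with
    full credit plus half the percentage with partial credit.
    """
    return 100 * np.average(points / max_points, weights=weights)

# Made-up two-point item, 10 students: codes 2 (full), 1 (partial), 0 (incorrect).
points = np.array([2, 0, 1, 0, 0, 0, 0, 0, 0, 0])
weights = np.ones(len(points))
print(weighted_percent_correct(points, weights, max_points=2))  # 10% full + 10% partial -> 15.0
```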

Percentage of Students with Misconceptions, Errors, and Misunderstandings

Two different types of item-level analyses were used to determine these percentages:

  (1) Specific types of misconceptions and misunderstandings reflected items where one or more response options (in the case of MC items) or one or more scoring guide categories (in the case of CR items) were identified as tracking a particular type of misconception or misunderstanding. The percentage of students with the specific misconception (in physics) or misunderstanding (in mathematics) was calculated as the sum of the percentages of students in the relevant options or score categories. Specific misconceptions and misunderstandings apply to 11 items in physics (10 MC and one CR) and three items in mathematics (all MC). For two of the MC items in physics, one response option measured one type of misconception and other options measured a second type; two separate analyses were conducted to obtain the percentages for both types of misconceptions.

  (2) General types of misunderstandings reflected items where no specific misconceptions, errors, or misunderstandings were tracked; all that could be determined was whether or not a student was able to demonstrate the understanding or ability required for the performance objective measured by the item. For these items, the percentage of students with a more general type of misunderstanding reflected all students who did not answer the item correctly. This included students who attempted the item but provided an incorrect response (including invalid responses or off-task comments), as well as students who did not answer the item (omitted responses) (Footnote 6). General types of misunderstandings apply to six items in physics (one MC and five CR) and 26 items in mathematics (12 MC and 14 CR). The majority of these items were constructed response, and many required students to explain their answer or show their work. In the TIMSS scoring guides, the general incorrect code 79 covers any type of incorrect response, including “crossed out, erased, stray marks, illegible, or off task” responses. When including blanks (omitted responses), we assumed that students who reached the item but did not respond did not have the understanding necessary to answer the question (i.e., similar in nature to responses that contain stray marks or off-task comments). This is consistent with the assumption underlying TIMSS scale scores, where omitted responses are treated as incorrect in scaling. The alternative, removing the blanks (omitted responses) from the sample, would underestimate the percentage of students who did not demonstrate conceptual understanding. (A computational sketch of both types of percentage follows this list.)
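The two types of calculation can be sketched as follows (illustrative Python only; the item variable, response codes, and weight column are hypothetical, and students who did not reach the item are assumed to have been excluded beforehand). The percentage with a specific misconception sums the weighted percentages over the response options or diagnostic codes flagged for that misconception; the percentage with a general misunderstanding is the weighted percentage not answering correctly, with omitted (blank) responses counted as incorrect.

```python
import pandas as pd

def percent_with_specific_misconception(students, item, flagged_codes, weight="TOTWGT"):
    """Sum of weighted percentages across the options/diagnostic codes
    identified as tracking one particular misconception or misunderstanding."""
    total = students[weight].sum()
    flagged = students.loc[students[item].isin(flagged_codes), weight].sum()
    return 100 * flagged / total

def percent_with_general_misunderstanding(students, item, correct_codes, weight="TOTWGT"):
    """Weighted percentage of students not answering correctly, counting
    omitted (blank) responses as incorrect."""
    total = students[weight].sum()
    correct = students.loc[students[item].isin(correct_codes), weight].sum()
    return 100 * (total - correct) / total

# Hypothetical data: None marks an omitted (blank) response.
students = pd.DataFrame({
    "item_1": ["A", "B", "B", None, "C", "A", "D", "B"],
    "TOTWGT": [1.0, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0],
})
print(percent_with_specific_misconception(students, "item_1", flagged_codes=["B"]))
print(percent_with_general_misunderstanding(students, "item_1", correct_codes=["A"]))
```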

The codes used for specific or general types of misconceptions, errors, and misunderstandings, and the corresponding value labels in the TIMSS data files, are provided for all physics and mathematics items in the Appendix (Tables A.1 and A.2) (Footnote 7). The percentage of students with misconceptions, errors, and misunderstandings was computed for all items in each country (overall and by gender). For trend items administered in multiple assessments, the percentage of students was reported for each assessment year.

Average Percent Correct and Average Percent with Misconceptions, Errors, and Misunderstandings

These averages reflect the percent correct (or percent with misconceptions, errors, or misunderstandings) in each country averaged across the countries that have data for the item. For most items, this reflects the average across all five countries. However, for some assessment years data were not available for all countries, and the averages were based on only three or four countries.
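A trivial illustration of this averaging (the percentages below are made up, and NaN marks a country without data for the item in that year):

```python
import numpy as np

# Made-up per-country percentages for one item; the cross-country value is the
# simple average over the countries that have data, as described above.
percent_correct = {"ITA": 42.1, "NOR": 38.5, "RUS": np.nan, "SVN": 45.0, "USA": 40.2}
available = [v for v in percent_correct.values() if not np.isnan(v)]
print(sum(available) / len(available))   # average across the four countries with data
```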

3.2.4 Statistical Comparisons

Differences in the percent correct and the percentage of students with misconceptions, errors, and misunderstandings were computed (1) between each country and the average across the five countries, (2) between female and male students within each country, and (3) across assessment years for the trend items. The appropriate t-tests were used for all comparisons involving these item-level statistics, and indicators of statistical significance are provided in all data tables and figures that present comparisons. A difference was considered statistically significant when the probability (p) associated with the t-test was less than 0.05 (i.e., a difference at least as large as the one observed would be expected to occur by chance less than 5% of the time if there were no true difference). We used the following t-tests for each type of comparison.

  (1) For comparisons between the percentages in each country and the average across the five countries, there is overlap between the samples (i.e., each country is part of the average). In such cases, a part-whole t-test was used to account for this overlap:

    $$ t = \frac{est_{j} - est_{i}}{\sqrt{se_{i}^{2} + \left(1 - 2p\right) se_{j}^{2}}} $$

    where est_i is the estimated average percentage across the five countries; est_j is the estimated percentage for one country; se_i and se_j are the corresponding standard errors; and p is the proportion of the five countries represented by each country (0.2).

  (2) For within-country gender differences, there are two types of t-tests that can be used depending on the student samples: independent (when independent random samples of female and male students are drawn from the population) and non-independent (when this is not the case).

    For independent random samples, the independent t-test is appropriate:

    $$ t = \frac{est_{female} - est_{male}}{\sqrt{se_{female}^{2} + se_{male}^{2}}} $$

    where est_female and est_male are the estimates for the percentage of females and males, respectively, and se_female and se_male are the corresponding standard errors of these percentages. The independent t-test can be calculated using the output from the IDB Analyzer, where the JRR procedure is used to determine the separate percentages and standard errors for females and males.

    However, in the TIMSS and TIMSS Advanced assessments, the samples of female and male students are not independent, since they are in the same schools and classrooms selected to take the assessments. Therefore, the correct t-test for non-independent samples requires the standard error of the difference between the percentage of females and the percentage of males:

    $$ t = \frac{est_{female} - est_{male}}{se\left(est_{female} - est_{male}\right)} $$

    The standard error of the difference, se(est_female − est_male), takes into account the covariance (cov) between females and males for dependent samples:

    $$ se_{\left(est_{female} - est_{male}\right)} = \sqrt{se^{2}_{\left(est_{female}\right)} + se^{2}_{\left(est_{male}\right)} - 2\,cov\left(est_{female}, est_{male}\right)} $$

    To obtain the appropriate standard errors, the JRR procedure must be conducted on the female–male percentage difference. The version of the IDB Analyzer that we used (Version 4.0) employs the JRR procedure to obtain standard errors for the percentage of females and the percentage of males. It does not, however, allow for jackknifing the gender differences for these item-level statistics (percent correct or percent with misconceptions, errors, and misunderstandings). Therefore, the independent standard errors (and computed t-tests) obtained using the IDB Analyzer are approximations that do not take into account the covariance between females and males. These approximations are acceptably accurate if the covariances are small in comparison to the standard errors of the percentage of females and the percentage of males. For the item-level statistics, this is expected to be the case due to the design of TIMSS, where only a small number of students take each item; generally, about four students in each school or class take each item (Martin et al. 2014).

    To determine the magnitude of the covariances for gender differences, we conducted analyses for selected items using the Gap function of the EdSurvey R Package (Footnote 8) (NCES [National Center for Education Statistics] 2018). The Gap function applies the JRR technique to the difference between the percentages of females and males, and its output includes the standard error of the difference and the covariance. In the tested cases, we found that the covariance for the item-level statistics was very small. We then ran analyses on the same items using the IDB Analyzer and compared the standard errors and t-tests obtained for gender differences using the two methods. The standard errors using the correct non-independent method (EdSurvey R Package) and those using the independent method (IDB Analyzer) were approximately the same (agreeing to the nearest 0.0001%), and the significance of reported differences was not affected (Footnote 9). Thus, for convenience, we used the output from the IDB Analyzer for the gender differences and applied the approximate independent t-tests for all items. Additional information on both software packages, as well as example outputs, is provided in the Appendix.

  (3) The differences between years for trend items are based on independent samples. Thus, the standard independent t-test was used:

    $$ t = \frac{est_{year1} - est_{year2}}{\sqrt{se_{year1}^{2} + se_{year2}^{2}}} $$

    where est_year1 and est_year2 are the estimates for the percentage of students in the two assessment years being compared, and se_year1 and se_year2 are the corresponding standard errors. (An illustrative computation of all three t-tests is sketched below.)
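For reference, the three t-tests can be written as short functions. The sketch below is illustrative Python (the numbers in the example calls are made up); values of |t| above roughly 1.96 correspond to p < 0.05 for these comparisons.

```python
import numpy as np

def part_whole_t(est_country, se_country, est_avg, se_avg, p=0.2):
    """Part-whole t-test for one country versus the five-country average;
    p is the proportion of the average contributed by the country (0.2)."""
    return (est_country - est_avg) / np.sqrt(se_avg**2 + (1 - 2 * p) * se_country**2)

def independent_t(est_a, se_a, est_b, se_b):
    """Independent-samples t-test (used for trend comparisons and as the
    gender-difference approximation discussed above)."""
    return (est_a - est_b) / np.sqrt(se_a**2 + se_b**2)

def dependent_t(est_a, se_a, est_b, se_b, cov_ab):
    """Non-independent t-test using the covariance between the two estimates."""
    return (est_a - est_b) / np.sqrt(se_a**2 + se_b**2 - 2 * cov_ab)

# Made-up example: a country 5 percentage points above the five-country average.
print(part_whole_t(est_country=45.0, se_country=1.8, est_avg=40.0, se_avg=0.9))
# With a negligible covariance, the dependent and independent t agree closely.
print(independent_t(30.0, 1.5, 27.0, 1.4))
print(dependent_t(30.0, 1.5, 27.0, 1.4, cov_ab=0.001))
```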

3.3 Addressing the Research Questions

As described in Sect. 3.2.2, we reviewed the set of TIMSS and TIMSS Advanced items, administered in each assessment year from 1995 to 2015, that measured student understanding of the key concepts (gravity in physics and linear equations in mathematics). From this review, we established the performance objectives assessed by the items across grade levels and the types of student misconceptions, errors, and misunderstandings demonstrated on these items. This enabled us to report student performance on the TIMSS and TIMSS Advanced items related to gravity and linear equations across countries by grade level, gender, and assessment year to answer the three research questions.

3.3.1 Research Question 1

  • What are common types of student misconceptions, errors, and misunderstandings in grade four, grade eight, and the final year of secondary school (TIMSS Advanced students), and how do they compare across countries?

To answer the first research question, we examined items from TIMSS and TIMSS Advanced administered at each grade level that demonstrated specific types of student misconceptions, misunderstandings, and errors. We determined the percentage of students giving each response type by country and on average across the five countries included in the study. Response patterns provided evidence of the nature and extent of students’ misconceptions, misunderstandings, and errors. We presented released example items at each grade level to illustrate the different types of misconceptions, errors, and misunderstandings. Each example item exhibit showed the item; the scoring guide (for CR items) or correct answer (for MC items); and other item information, including the TIMSS item ID (Footnote 10), year(s) administered, and the performance objective assessed by the item (Footnote 11).

3.3.2 Research Question 2

  • How do student misconceptions, errors, and misunderstandings differ by gender?

To answer the second research question, we determined the percentage of male students and percentage of female students demonstrating each type of misconception, misunderstanding, and error. We prepared tree graphs showing the gender differences across countries at each grade level.

3.3.3 Research Question 3

  • How persistent are patterns in misconceptions, errors, and misunderstandings over time?

To answer the third research question, we plotted figures for each country showing the percentage of students demonstrating the specific types of misconceptions, misunderstandings, and errors over multiple assessment years based on the set of trend items.