2.1 Introduction

When measuring student achievement, traditional methods of analysis often focus on what students know (i.e., the correct answers). For example, large-scale assessments such as IEA’s TIMSS use unidimensional models such as item response theory (IRT) to measure individual students’ latent abilities, skills, and knowledge. Recent research using multidimensional models has begun to consider both correct and incorrect response patterns when measuring and reporting on specific skills and abilities as well as misconceptions. Prior research has highlighted the importance of identifying and understanding student misconceptions to improve learning in both physics and mathematics.

We divide the literature review into three sections. The first section reviews the variety of diagnostic models that have been used to explore student attributes and misconceptions, misunderstandings, and errors in mathematics and science. The second and third sections explore prior research into student misconceptions, misunderstandings, and errors in physics related to gravitational force, and in mathematics related to linear equations, respectively. Both sections also look at gender differences in the prevalence of misconceptions.

2.2 Diagnostic Models Overview

Traditional psychometric models used for test analysis, such as IRT models, often focus on measuring a single latent continuum representing overall ability (Bradshaw and Templin 2014). Although these models are considered an important means of assessing student knowledge, their focus on measuring one underlying student ability is limiting. De la Torre and Minchen (2014) noted that the unidimensional nature of these methods made them less effective as diagnostic models. The need for models that would provide diagnostic information spurred the development of a new class of test models known as cognitive diagnostic models (CDMs).

A CDM classifies students into latent classes according to the combination of attributes they have or have not mastered, and then describes their abilities in terms of those specific skills or attributes (de la Torre and Minchen 2014; Henson et al. 2009). One example of a CDM is the diagnostic classification model (DCM), which uses distractor-driven tests (designed to measure both “desirable and problematic aspects of student reasoning”) or multiple-choice tests that measure multidimensional attributes (Shear and Roussos 2017). In addition to the DCM, there are many other types of CDMs, such as the rule space model (Tatsuoka 1983), the deterministic input, noisy “and” gate (DINA) model (Junker and Sijtsma 2001), the noisy input, deterministic “and” gate (NIDA) model (Maris 1999), and the reparametrized unified model (RUM) (Roussos et al. 2007). These models vary in their complexity, in the parameters they assign to each item, and in their assumptions about how random noise enters the test-taking process (Huebner and Wang 2011). Because CDMs are varied and multidimensional, they are better suited to educational diagnosis. Indeed, a recent study by Yamaguchi and Okada (2018) using TIMSS 2007 mathematics data found that CDMs fit the data better than IRT models.
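To illustrate what an “and” gate means in one of these CDMs, the sketch below implements the DINA item response function as it is commonly stated (Junker and Sijtsma 2001): a student’s ideal response to item j is 1 only if the student has mastered every attribute the item requires (per the item’s Q-matrix row), and the observed probability of a correct answer is then 1 minus a slip parameter for masters and a guessing parameter for everyone else. The function names, the three-attribute Q-matrix row, and the slip and guess values are hypothetical choices for illustration; this is not the analytic approach used in this study, which relies on item-level response distributions.

```python
from typing import Sequence


def eta(alpha: Sequence[int], q_row: Sequence[int]) -> int:
    """DINA ideal response: 1 if the student masters every attribute the item requires.

    alpha -- binary attribute-mastery profile of one student
    q_row -- binary Q-matrix row for one item (1 = attribute required)
    """
    if len(alpha) != len(q_row):
        raise ValueError("alpha and q_row must have the same length")
    # With binary entries, prod_k alpha_k ** q_k equals 1 exactly when
    # alpha_k == 1 for every attribute k that the item requires (q_k == 1).
    return int(all(a == 1 for a, q in zip(alpha, q_row) if q == 1))


def dina_prob_correct(
    alpha: Sequence[int], q_row: Sequence[int], slip: float, guess: float
) -> float:
    """P(correct | alpha) = (1 - slip) ** eta * guess ** (1 - eta)."""
    if not (0.0 <= slip <= 1.0 and 0.0 <= guess <= 1.0):
        raise ValueError("slip and guess must be probabilities")
    return 1.0 - slip if eta(alpha, q_row) else guess


# Hypothetical item requiring attributes 1 and 2 (but not attribute 3).
q_row = [1, 1, 0]
for alpha in ([1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0]):
    print(alpha, dina_prob_correct(alpha, q_row, slip=0.1, guess=0.2))
```

With these illustrative parameters, the first two profiles (which master both required attributes) have a 0.9 probability of answering correctly, while the last two (each missing at least one required attribute) fall back to the 0.2 guessing probability, regardless of how many other attributes they have mastered. That all-or-nothing behavior is what distinguishes the conjunctive DINA model from compensatory alternatives.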

A relatively new approach, the scaling individuals and classifying misconceptions (SICM) model, investigated by Bradshaw and Templin (2014), combines the IRT model and the DCM to provide a statistical tool to measure misconceptions. The SICM model uses data on wrong answers by modeling categorical latent variables that represent “misconceptions” instead of skills. To categorize misconceptions, the authors cited inventories such as the force concept inventory (Hestenes et al. 1992), an assessment of the Newtonian concept of force.

For large-scale assessments such as TIMSS, applying these diagnostic models can be difficult, since the TIMSS assessments were not designed as cognitive diagnostic assessments that measure specific components of skills and abilities, nor were they designed using a CDM with pre-defined attributes (de la Torre and Minchen 2014; Leighton and Gierl 2007). However, some studies have shown that applying these approaches to TIMSS data can provide valuable information about test takers. Dogan and Tatsuoka (2008) used the rule space model to evaluate the performance of Turkish students on the TIMSS 1999 grade eight mathematics assessment (also known as the Third International Mathematics and Science Study-Repeat, or TIMSS-R), determining that Turkish students demonstrated weaknesses in skills such as applying rules in algebra and quantitative reading. Another study (Choi et al. 2015) also used a CDM approach to compare performance on the TIMSS mathematics assessment between grade eight samples from the United States and Korea. While these studies showed that CDMs can offer valuable information on student concept mastery in TIMSS, they also acknowledged limitations in applying these models to this assessment.

In general, CDMs and SICM models rely on best-fitting models to estimate student-level proficiency and misconceptions, and these models would be most efficient when used with computer-adaptive tests (CATs), so that “all test takers can be measured with the same degree of precision” (Hsu et al. 2013). The TIMSS assessments are neither designed for student-level reporting nor computer-adaptive, so they are not well suited to CDMs and SICM models. Under the TIMSS assessment design, each student is administered only a portion of the items; thus, the claims that can be made about individual students’ proficiency on specific skills and concepts are limited (see Footnote 1).

In contrast to research using the types of diagnostic models described above, our study used a different diagnostic approach based on item-level performance data (i.e., frequency distributions across response categories) for individual assessment items to explore the nature and extent of students’ misconceptions, errors, and misunderstandings as demonstrated by their incorrect responses. Other studies conducted by countries participating in TIMSS have taken a similar approach, describing student understanding and misconceptions based on responses to individual TIMSS and TIMSS Advanced mathematics and science items at different grade levels (Angell 2004; Juan et al. 2017; Mosimege et al. 2017; Prinsloo et al. 2017; Provasnik et al. 2019; Saputro et al. 2018; Văcăreţu, n.d.; Yung 2006). For example, Angell (2004) analyzed student performance on TIMSS Advanced 1995 physics items in Norway; a series of diagnostic reports published in South Africa used item-level data from TIMSS 2015 to describe the performance of South African students in mathematics at grade five (Juan et al. 2017) and grade nine (Mosimege et al. 2017), and in science at grade nine (Prinsloo et al. 2017); and Saputro et al. (2018) used performance on algebra items from TIMSS 2011 to understand the types of errors made by students in Indonesia. All of these reports presented released items from TIMSS and TIMSS Advanced and described common types of incorrect answers given by students, finding that misconceptions were often context-dependent and could be missed in broader analyses.
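As a rough illustration of what an item-level frequency distribution across response categories looks like, the sketch below computes the percentage of students choosing each option of a single multiple-choice item, overall and separately by group (for example, by gender). The responses, group labels, answer key, and function names are all hypothetical. For simplicity the counts are unweighted by default; actual TIMSS estimates are computed with sampling weights and replication-based standard errors, which the optional `weights` argument only gestures at and which this sketch does not implement.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def response_distribution(
    responses: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Percentage of (optionally weighted) students in each response category."""
    if weights is None:
        weights = [1.0] * len(responses)
    if len(weights) != len(responses):
        raise ValueError("responses and weights must have the same length")
    totals: Dict[str, float] = defaultdict(float)
    for category, w in zip(responses, weights):
        totals[category] += w
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total weight must be positive")
    return {cat: 100.0 * t / grand_total for cat, t in sorted(totals.items())}


def distribution_by_group(
    responses: Sequence[str], groups: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Unweighted response distribution computed separately for each group."""
    if len(groups) != len(responses):
        raise ValueError("responses and groups must have the same length")
    buckets: Dict[str, List[str]] = defaultdict(list)
    for response, group in zip(responses, groups):
        buckets[group].append(response)
    return {g: response_distribution(rs) for g, rs in sorted(buckets.items())}


# Hypothetical responses to one multiple-choice item whose key is "C".
responses = ["A", "C", "C", "B", "C", "D", "C", "A"]
groups = ["F", "M", "F", "M", "F", "M", "M", "F"]
print(response_distribution(responses))
print(distribution_by_group(responses, groups))
```

The design choice that matters diagnostically is keeping every response category rather than collapsing to correct versus incorrect: in this toy data, half of the students choose the key, and distractor "A" attracts a quarter of them overall, concentrated in one group. A pattern like that is what prompts a closer look at the specific misconception that distractor was written to capture.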

Our study goes beyond individual assessment items by focusing on sets of items that measure specific concepts of interest in physics and mathematics across grade levels (in this case, gravity and linear equations). Student performance on these items is used to report on patterns in misconceptions across countries, grades, and assessment cycles, and by gender. Given the TIMSS assessment design, this focus on item-level data has unique value for making country-level inferences and for better understanding how student misconceptions have changed over time in different cultural contexts.

2.3 Misconceptions in Physics

Physics misconceptions (including those related to gravity) held by students of varying ages have been studied extensively. Previous research has included investigations of primary, secondary, and university students (Darling 2012; Demirci 2005; Hestenes et al. 1992; Pablico 2010; Piburn et al. 1988; Stein et al. 2008), as well as pre-service teachers (Gönen 2008). The literature on misconceptions related to gravitational force demonstrates that alternative conceptions of physical observations and processes, based on intuition or preconceived notions, are common and pervasive.

When analyzing misconceptions in physics, many researchers have focused on “common sense beliefs,” a “system of beliefs and intuitions about physical phenomena derived from extensive personal experience” that students may develop before they even enter the classroom (Halloun and Hestenes 1985a, b). Many of these beliefs are misconceptions that are inconsistent with the scientific explanations provided during formal instruction; moreover, they are difficult to overcome and, if not addressed early on, can inhibit students from understanding and applying more advanced physics concepts. Numerous studies have sought to explain these misunderstandings, and several diagnostic tests have been developed to measure them; the most widely used is the force concept inventory, which uses multiple-choice items to track student misconceptions relating to “common sense beliefs” (Hestenes et al. 1992). Research has shown that many physics misconceptions are best overcome by focused instruction that explicitly targets them (Eryilmaz 2002; Hestenes et al. 1992; Thornton et al. 2009).

Misconceptions based on common-sense beliefs tend to be incompatible with many physics concepts, such as Newton’s laws. For example, several studies have documented that students believe that there is always a force in the direction of motion and that this belief sometimes prevails even after college instruction (Clement 1982; Hestenes et al. 1992; Thornton and Sokoloff 1998). Another well-documented misconception is that it is not possible to have acceleration without velocity (Kim and Pak 2002; Reif and Allen 1992). These misconceptions can often stem from students’ inability to distinguish between velocity, acceleration, and force (Reif and Allen 1992; Trowbridge and McDermott 1980). In particular, many students struggle with gravitational force. The concept appears to be poorly learned at the secondary level, with related misconceptions continuing in higher levels of education (Bar et al. 2016; Kavanaugh and Sneider 2007).

In addition, many students’ conceptions of gravity are closely related to their conceptions of a spherical Earth (Gönen 2008; Nussbaum 1979; Sneider and Pulos 1983). In interviews with children in grades 6 and 10 about which of the objects presented to them were acted on by gravity, Palmer (2001) found that fewer than 30% of students at each grade level correctly answered that all of the objects were acted on by gravity. Palmer also noted that some students believed that buried objects (beneath the surface of Earth) were not subject to gravity.

Many of these misconceptions have been shown to persist through conventional physics instruction and can prevent students from learning new concepts. One study on misconceptions about force and gravity investigated high school students’ conceptions about the direction of motion and of the force on a ball that is thrown upward and then falls back down (Pablico 2010). The majority of students in the study (grades 9–12) demonstrated the misconception that the net force on the ball was always in the direction of motion throughout the ball’s path, not understanding that it is the constant downward force due to gravity that causes the observed changes in motion. Many students thought that the force was directed upward during the ball’s upward motion and that the force was zero at the top of its flight (when the ball stops momentarily and changes direction). Although students identified the force as downward when the ball was traveling down, most could not correctly justify this answer; many believed that the force must be directed downward because the ball is moving downward.
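To make the physics of the ball problem concrete, the worked equations that follow use the standard textbook idealization (air resistance neglected, upward taken as positive); the symbols $m$ (mass of the ball), $g$ (gravitational acceleration), and $v_0$ (launch speed) are introduced here for illustration and do not come from the cited study. They show that the net force is constant and downward for the entire flight, including the instant at the top when the velocity is zero.

```latex
\begin{aligned}
F_{\text{net}} &= -mg \quad \text{(at every point of the flight)} \\
a &= \frac{F_{\text{net}}}{m} = -g \\
v(t) &= v_0 - g\,t \\
v(t_{\text{top}}) &= 0 \;\Rightarrow\; t_{\text{top}} = \frac{v_0}{g},
  \quad \text{yet } F_{\text{net}}(t_{\text{top}}) = -mg \neq 0
\end{aligned}
```

Zero velocity at the top therefore does not imply zero force, and upward velocity on the way up does not imply an upward force. Each of the misconceptions Pablico reported amounts to equating the direction of the force with the direction of the velocity.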

Other research has documented gender gaps in students’ understanding of physics. For example, females tend to begin physics courses with lower levels of conceptual understanding, and conventional instructional approaches have not been effective in narrowing this gender gap (Cavallo et al. 2004; Docktor and Heller 2008; Hake 2002; Hazari et al. 2007; Kost et al. 2009).

2.4 Misunderstandings in Mathematics

In mathematics, algebra is often considered a gatekeeper to higher education and related career paths (Kilpatrick and Izsák 2008). Although algebraic understanding is considered crucial for student success in more advanced mathematics courses, many scholars have documented that students struggle with algebraic concepts, especially those relating to linear equations.

Solving linear equations requires a balance of conceptual knowledge and procedural skills. Conceptual knowledge involves understanding principles and relationships, while procedural skills involve the ability to carry out a sequence of operations effectively (Gilmore et al. 2017). Unlike simpler arithmetic problems, solving linear equations involves more than memorizing and applying a formula; it also requires understanding the relationship between the quantities represented. Conceptually, students need a deep understanding of independent and dependent variables to explain what the slope or intercept means in a given situation (Kalchman and Koedinger 2005). Yet many students tend to rely on procedural knowledge despite lacking a conceptual understanding of the equation (Caglayan and Olive 2010).
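As a brief illustration of what it means to interpret slope and intercept in context, consider a hypothetical phone plan (invented for this example, not drawn from the cited studies) that charges a fixed fee of 10 dollars per month plus 0.05 dollars per minute, so the monthly cost $C$ depends on the minutes used, $t$:

```latex
\begin{aligned}
C(t) &= 10 + 0.05\,t \\
m &= \frac{C(200) - C(100)}{200 - 100} = \frac{20 - 15}{100}
   = 0.05 \ \text{dollars per minute} \\
C(0) &= 10 \ \text{dollars (the intercept: the fixed fee when no minutes are used)}
\end{aligned}
```

Procedurally, a student can read off $0.05$ and $10$ from the equation; conceptually, the student must recognize the slope as the rate at which the dependent variable (cost) changes per unit of the independent variable (minutes), and the intercept as the value of the dependent variable when the independent variable is zero.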

Stump (2001) argued that although high school pre-calculus students had been exposed to formal instruction, their conceptual understanding of “slope” was not well developed. When testing a group of high school students, Stump found that many understood slope in functional situations but were unable to recognize it as a measure of rate of change or as a measure of steepness. Other researchers found that the students they interviewed, while developing an understanding of slope, were unable to recognize the difference between additive and multiplicative relationships (Simon and Blume 1994) or to understand ratio as a measure of slope (Swafford and Langrall 2000). This inability to develop conceptual knowledge of the relationship between variables has contributed to many misunderstandings related to slope and linear equations.

Lack of conceptual knowledge about the relationship between variables in linear equations also affects students’ ability to understand and translate the symbolic nature of linear equations. Official standards, such as those of the National Council of Teachers of Mathematics (NCTM), recommend that students be able to “represent and analyze relationships using tables, verbal rules, equations, and graphs” (NCTM 1989). Yet many students find it very difficult to represent equations graphically. Research suggests that this is because students tend to lack a strong understanding of the relationship between algebraic equations and their graphical representations (Knuth 2000).
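Moving between representations rests on one idea: a relationship is linear exactly when the ratio of change in the dependent variable to change in the independent variable is the same between every pair of points. The sketch below translates a table of integer $(x, y)$ values into the equation $y = mx + b$ by computing that ratio from two rows and checking it against every other row. The function name and sample tables are hypothetical, and exact `Fraction` arithmetic is used so that fractional slopes are not distorted by floating-point rounding.

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def table_to_linear_equation(
    points: List[Tuple[int, int]],
) -> Optional[Tuple[Fraction, Fraction]]:
    """Return (m, b) such that y = m*x + b fits every row, or None if not linear."""
    if len(points) < 2:
        raise ValueError("need at least two rows")
    rows = sorted(points)
    (x0, y0), (x1, y1) = rows[0], rows[1]
    if x0 == x1:
        raise ValueError("the first two x values must be distinct")
    # Slope as a ratio: change in y divided by change in x.
    m = Fraction(y1 - y0, x1 - x0)
    # Intercept: the value of y when x = 0.
    b = y0 - m * x0
    # A linear relationship must reproduce every row of the table.
    for x, y in rows:
        if m * x + b != y:
            return None
    return m, b


result = table_to_linear_equation([(0, 3), (1, 5), (2, 7), (3, 9)])
if result is not None:
    m, b = result
    print(f"y = {m}x + {b}")  # prints "y = 2x + 3"
```

A table such as $(0, 1), (1, 2), (2, 4)$ returns `None`: the change from one row to the next is not constant, so no single slope describes it. This is the distinction between a constant additive pattern in $y$ and a constant ratio that the slope literature above describes students struggling with.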

Even when a graphical approach would make success more likely, researchers have found that students were reluctant to use graphs (Knuth 2000; Tsamir and Almog 2001; Dyke and White 2004). For example, Knuth (2000) found that even when working on problems designed to encourage graphical reasoning, students relied heavily on other solution methods and did not use graphical ones. In another study, Huntley et al. (2007) conducted clinical interviews with third-year high school mathematics students and found that many needed to be prompted to use graphical solutions even when graphing was the most efficient way to solve the equation.

This difficulty with modeling algebraic relationships graphically also makes it harder for students to translate real-life word problems into the appropriate algebraic equations (Adu et al. 2015; Bishop et al. 2008). Without focused and deliberate instruction, students are likely to struggle to overcome these algebraic misunderstandings as they progress to higher levels of mathematics. Echoing the gender differences noted in physics, some research in this area has found that males make fewer mistakes than females, and make different types of mistakes, when solving problems involving multi-step linear equations in algebra (Powell 2013).

This report contributes to research on students’ misconceptions and misunderstandings in physics and mathematics by studying specific types of related misconceptions, errors, and misunderstandings about gravity and linear equations across grade levels, and by reporting patterns in these across countries and by gender. The results reinforce the importance of identifying and understanding students’ misconceptions, errors, and misunderstandings to determine what changes may be needed in curricula through secondary school to improve student learning and to ensure students’ readiness for post-secondary education and/or future careers.