1 Background

At the end of primary school, many special education (SE) students lag considerably behind their peers in regular education in subtraction with numbers up to 100 (Kraemer, Van der Schoot, & Van Rijn, 2009). Some have suggested that low-performing students would benefit from being taught one prescribed way of solving calculations, because this would support them and give them confidence in carrying out subtraction problems. The U.S. National Mathematics Advisory Panel (2008), for example, makes this suggestion. This opinion is also expressed in the Netherlands.

However, the idea of teaching only one method goes against the goal of developing numeracy in students. This goal implies that students should be able to choose a suitable method when solving number problems (Treffers, 1989; Van den Heuvel-Panhuizen, 2001; Warry, Galbraith, Carss, Grice, & Endean, 1992). Moreover, being numerate is also seen as a target for mathematically weaker students (e.g., Kilpatrick, Swafford, & Findell, 2001; NCTM, 2000; Verschaffel, Torbeyns, De Smedt, Luwel, & Van Dooren, 2007).

A further objection to the one-method approach is that if students had to restrict themselves to only one way of solving problems, many problems would require an unnecessarily long solution path (see, e.g., Torbeyns, De Smedt, Stassens, Ghesquière, & Verschaffel, 2009). For example, to solve 62 − 58 by taking away, a student first takes away 50, resulting in 12; then takes away 2, making 10; and finally takes away 6, for an answer of 4. This method requires a large number of taking-away steps. It is therefore more error-prone than an approach in which students focus on the difference between the numbers and add on from the subtrahend until they reach the minuend. For 62 − 58, this means starting with 58 and adding 2, resulting in 60, and then adding 2 more to reach 62, giving the difference of 4 as the answer. In this latter approach, students change the operation presented in the problem: subtraction is transformed into addition, which can bring computational advantages (see, e.g., Torbeyns, De Smedt et al., 2009).
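The two solution paths for 62 − 58 described above can be traced step by step. The Python sketch below is illustrative only. The function names `take_away_steps` and `add_on_steps` are hypothetical. The rules they encode are our own simplified assumption of how students string the numbers, not a model from the cited studies. For taking away, the rule is tens first, then the ones via the ten. For adding on, the rule is to jump to the next ten, then add the tens, then the ones.

```python
def take_away_steps(minuend, subtrahend):
    """Direct subtraction by stringing (hypothetical rule): keep the minuend
    whole and take away the subtrahend in parts, tens first, then the ones,
    going via the ten when the ten has to be crossed."""
    steps = []
    current = minuend
    tens, ones = divmod(subtrahend, 10)
    if tens:
        current -= tens * 10
        steps.append(tens * 10)
    to_ten = current % 10  # distance down to the ten just below
    if 0 < to_ten < ones:
        # Crossing the ten: first go down to the ten, then take away the rest.
        current -= to_ten
        steps.append(to_ten)
        ones -= to_ten
    if ones:
        current -= ones
        steps.append(ones)
    return current, steps


def add_on_steps(minuend, subtrahend):
    """Indirect addition by stringing (hypothetical rule): add on from the
    subtrahend until the minuend is reached; the jumps sum to the difference."""
    steps = []
    current = subtrahend
    next_ten = -(-current // 10) * 10  # smallest multiple of ten >= current
    if current < next_ten <= minuend:
        steps.append(next_ten - current)
        current = next_ten
    tens = (minuend - current) // 10 * 10
    if tens:
        steps.append(tens)
        current += tens
    if current < minuend:
        steps.append(minuend - current)
    return sum(steps), steps


print(take_away_steps(62, 58))  # (4, [50, 2, 6])
print(add_on_steps(62, 58))     # (4, [2, 2])
```

For 62 − 58, the sketch yields three taking-away steps against two adding-on steps. For the 82 − 29 example in Section 1.4.1, `add_on_steps` yields the jumps 1, 50, and 2.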

Another difficulty with using prescribed methods is that they can lead to a “didactical ballast” for the students (Van den Heuvel-Panhuizen, 1986). Following a prescribed solution method can be a source of error for students because this method is not grounded in their own thinking, i.e., the ownership is completely on the side of the teacher or textbook author.

In sum, there are many disadvantages to teaching one fixed method for solving calculations. Moreover, studies (e.g., Torbeyns, De Smedt et al., 2009) have shown that flexibly adapting the solution method can make problems easier for students. Therefore, one might decide to teach even students who are weak in mathematics to use solution methods flexibly. However, the critical point for making this decision is whether these students are able to operate in such a flexible way. Some studies (see, e.g., Milo, 2003; Timmermans, 2005) have indicated that SE students with learning difficulties in mathematics have trouble choosing a solution method flexibly. In the study reported in this paper, we further investigate whether SE students are able to adapt their solution methods to the nature of the problems presented to them. The focus of the study is on subtraction up to 100. We assess the students’ solution methods and performances using an information and communication technology (ICT)-based test.

1.1 Strategies and procedures for solving addition and subtraction problems up to 100

Generally, three different types of strategies can be distinguished for solving addition and subtraction problems with numbers up to 100: splitting, stringing, and varying (Van den Heuvel-Panhuizen, 2001). Researchers do not always use the same wording; other expressions can be found, for example, in Klein, Beishuizen, & Treffers (1998) and Torbeyns, De Smedt et al. (2009). Nevertheless, there is broad agreement about the general meaning of these strategies.

- In a splitting strategy, the subtraction problem is solved by decimally splitting both the minuend and the subtrahend and processing the tens and the ones separately (e.g., 54 − 31 is calculated as 50 − 30 and 4 − 1, with 23 as the final answer).
- In a stringing strategy, the starting number, which could be either the minuend or the subtrahend, is kept whole, and the second number is added or subtracted in parts (e.g., 63 − 47 is calculated as 63 − 40 = 23, then 23 − 3 = 20, and, finally, 20 − 4 = 16).
- Applying a varying strategy requires flexible processing of numbers based on known number relationships and properties of operations.

In one varying approach, a number that is not the intended subtrahend but is easier to handle is taken away from the minuend. Afterwards, the student compensates for this “wrong number” (e.g., 77 − 29 is calculated as 77 − 30 = 47, and then 47 + 1 = 48). In another, the subtraction is changed into an easier problem by keeping the difference between the minuend and the subtrahend the same (e.g., 77 − 29 is calculated as 78 − 30 = 48).
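The splitting strategy and the two varying examples can be made concrete with small functions. The Python sketch below is a hypothetical illustration; the function names are our own. Compensating to the nearest ten is only one simple choice among the many number relationships a student might use.

```python
def splitting(minuend, subtrahend):
    """Splitting: decompose both numbers into tens and ones and process
    them separately (e.g., 54 - 31 as 50 - 30 and 4 - 1)."""
    m_tens, m_ones = divmod(minuend, 10)
    s_tens, s_ones = divmod(subtrahend, 10)
    tens_part = (m_tens - s_tens) * 10
    ones_part = m_ones - s_ones  # negative when the ten has to be crossed
    return tens_part, ones_part, tens_part + ones_part


def compensation(minuend, subtrahend):
    """Varying by compensation: take away the nearest ten instead of the
    subtrahend, then correct for the 'wrong number' (77 - 29 via 77 - 30 + 1)."""
    nearest_ten = (subtrahend + 5) // 10 * 10
    intermediate = minuend - nearest_ten
    correction = nearest_ten - subtrahend
    return intermediate, correction, intermediate + correction


def same_difference(minuend, subtrahend):
    """Varying by keeping the difference constant: shift both numbers up so
    that the subtrahend becomes a ten (77 - 29 becomes 78 - 30)."""
    shift = (-subtrahend) % 10
    new_minuend, new_subtrahend = minuend + shift, subtrahend + shift
    return new_minuend, new_subtrahend, new_minuend - new_subtrahend


print(splitting(54, 31))        # (20, 3, 23)
print(compensation(77, 29))     # (47, 1, 48)
print(same_difference(77, 29))  # (78, 30, 48)
```

For 62 − 58, `splitting` produces a negative ones part (2 − 8 = −6). Handling this part correctly (10 − 6 = 4) is where students tend to reverse the ones digits, as discussed in Section 1.2.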

Torbeyns, De Smedt et al. (2009) describe subtraction in a different way. They distinguish three procedures:

(1) direct subtraction (DS), which means taking away the subtrahend from the minuend;
(2) indirect addition (IA), which means adding on from the subtrahend until the minuend is reached; and
(3) indirect subtraction (IS), which means subtracting from the minuend until the subtrahend is reached.
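IS is perhaps the least familiar of the three procedures. The Python sketch below traces it under a simple stringing rule of our own: first go down to the ten below the minuend, then go down by tens, then take away the remaining ones. The function name is hypothetical. The sum of the amounts taken away is the difference. This makes IS, like IA, a way of bridging the gap between the two numbers rather than removing the subtrahend.

```python
def indirect_subtraction_steps(minuend, subtrahend):
    """Indirect subtraction (IS), hypothetical stringing rule: subtract from
    the minuend until the subtrahend is reached; the total amount taken away
    is the difference."""
    steps = []
    current = minuend
    to_ten = current % 10
    # Go down to the ten just below the minuend if that does not overshoot.
    if to_ten and current - to_ten >= subtrahend:
        steps.append(to_ten)
        current -= to_ten
    tens = (current - subtrahend) // 10 * 10
    if tens:
        steps.append(tens)
        current -= tens
    if current > subtrahend:
        steps.append(current - subtrahend)
    return sum(steps), steps


print(indirect_subtraction_steps(62, 58))  # (4, [2, 2])
print(indirect_subtraction_steps(63, 47))  # (16, [3, 10, 3])
```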

According to Torbeyns, De Smedt et al. (2009), splitting, stringing, and varying belong to the class of DS procedures, whereas IA is considered a separate class of procedures that do not fit the three strategies. However, we see this differently. Splitting, stringing, and varying can be considered strategies that all describe how the numbers involved are dealt with. In splitting, both numbers are decimally decomposed into tens and ones; in stringing, one number is kept whole; and in varying, one or both numbers are changed to obtain an easier problem. In contrast, DS, IA, and IS can be called procedures, which describe calculations from the perspective of how the operation is carried out. In fact, the strategies and the procedures complement each other. Together, they offer a complete framework for describing how students solve additions and subtractions up to 100.

Table 1 illustrates the strategies and procedures with prototypical examples of subtraction problems whose numbers are likely to elicit particular strategies and procedures. The framework reflects how strategies and procedures are related. A DS procedure often goes together with splitting or stringing. For IA and IS, stringing is the most obvious strategy, although splitting can be applied as well. Finally, when a varying strategy is applied, multiple operations are required.

Table 1 Relation between procedures and strategies illustrated with problems

1.2 Solving subtraction problems with crossing the ten

Various studies (e.g., Beishuizen, Van Putten, & Van Mulken, 1997; Fiori & Zuccheri, 2005; Kraemer et al., 2009) have shown that SE students experience many difficulties in solving subtraction problems up to 100 that require crossing the ten. In such problems, the ones digit of the subtrahend is larger than that of the minuend (e.g., 62 − 58). These problems can be solved in different ways, which clearly differ in success rate. A highly error-prone approach is to combine a DS procedure with a splitting strategy. SE students frequently apply this combination, which often leads to the mistake of reversing the ones digits (Kraemer et al., 2009). In the case of 62 − 58, this means subtracting 2 from 8 instead of 8 from 2.
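The reversal error can be expressed as a variant of splitting in which the smaller ones digit is always taken from the larger one. The hypothetical Python sketch below models this error. It is an idealization, not a model of any specific student. It also confirms that the error changes the answer exactly in the problems that cross the ten.

```python
def splitting_with_reversal(minuend, subtrahend):
    """Split both numbers decimally but always take the smaller ones digit
    from the larger one, reproducing the reversal error."""
    m_tens, m_ones = divmod(minuend, 10)
    s_tens, s_ones = divmod(subtrahend, 10)
    return (m_tens - s_tens) * 10 + abs(m_ones - s_ones)


def crosses_ten(minuend, subtrahend):
    """A subtraction crosses the ten when the subtrahend's ones digit is larger."""
    return subtrahend % 10 > minuend % 10


print(62 - 58, splitting_with_reversal(62, 58))  # 4 16
```

For 62 − 58, the error yields (60 − 50) + (8 − 2) = 16 instead of 4. For problems that do not cross the ten, such as 54 − 31, it gives the correct answer. This may explain why the error persists.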

IA can be a good alternative to DS in problems that have a small difference between the subtrahend and minuend and that require crossing the ten. Several researchers (Beishuizen et al., 1997; Torbeyns, De Smedt, Ghesquière, & Verschaffel, 2009a; Van den Heuvel-Panhuizen, 2001) have shown the advantage of such a procedure. In small-difference subtractions, the distance to be bridged is small, so students can determine the difference relatively quickly and easily. DS, by contrast, would take considerably more steps, more difficult steps, or both (Torbeyns et al., 2009a). Students need to understand that addition and subtraction are inversely related in order to apply this change in direction. Only then can they solve a subtraction problem flexibly, i.e., by addition instead of subtraction. This is an important understanding in the development of students’ arithmetical competence, which can help them solve difficult subtraction problems.

1.3 Solving subtraction problems by indirect addition

The debate described above concerns whether SE students should be taught one fixed method for solving number problems. Connected to it, there is also controversy about whether SE students are able to solve subtraction problems by applying IA. For example, a few recent intervention studies concluded that even students in regular primary education have difficulties using IA to solve subtraction problems (Torbeyns et al., 2009a; Torbeyns, De Smedt, Ghesquière, & Verschaffel, 2009b). Torbeyns et al. (2009a) found that students have great difficulty picking up the IA procedure, even when they have been taught IA. This was particularly true for low-achieving students. Results are even more disappointing when students have not been taught to use IA. Several studies (Blöte, Van der Burg, & Klein, 2001; Klein et al., 1998; Torbeyns et al., 2009a, b) suggested that in such situations, students hardly apply this procedure.

However, other intervention studies challenge these findings. They support the claim that, already in the first grades of primary mathematics education, students with a wide range of mathematical abilities can learn to solve subtraction problems flexibly by applying IA (Blöte et al., 2001; Fuson & Willis, 1988; Klein et al., 1998; Menne, 2001). For example, the study by Klein et al. (1998) revealed that the early introduction of flexible strategies or procedures, such as IA, helped improve students’ scores.

1.4 Conditions influencing students’ procedure use

To get a thorough understanding of whether SE students can apply IA, we need to know which conditions influence students’ procedure use. First of all, student characteristics, such as their general mathematical ability, age, and grade level (see Torbeyns et al., 2009b), are found to be of influence. Furthermore, teaching characteristics, for example, whether or not students have been taught a particular procedure, turned out to play a role, although not all researchers found this (see Section 1.3). A third source of influence is problem characteristics, including (a) the numbers involved and (b) the problem format (context problems or bare number problems).

1.4.1 Influence of numbers involved

Several studies (e.g., Blöte et al., 2001; Fuson & Willis, 1988; Klein et al., 1998; Menne, 2001; Torbeyns, De Smedt et al., 2009; Torbeyns, Ghesquière, & Verschaffel, 2009) have indicated that one type of subtraction problem may evoke the use of IA. These problems require crossing the ten and have a small difference between the minuend and subtrahend (e.g., 62 − 58). However, IA could also be efficient for some large-difference problems (Torbeyns et al., 2009a). These problems require crossing the ten, and their minuend and subtrahend each lie close to a ten. For example, a problem like 82 − 29 may be easily solved by IA (i.e., 29 + 1 = 30, 30 + 50 = 80, and 80 + 2 = 82, so 1 + 50 + 2 = 53). Finally, research suggested that small-difference problems that do not require crossing the ten (e.g., 47 − 43) may also evoke the use of IA (Gravemeijer et al., 1993).

1.4.2 Influence of problem format

In subtraction, two didactical phenomenological interpretations can be distinguished: (1) subtraction as taking away and (2) as determining the difference. In the first interpretation, the matching operation is that of taking away the subtrahend from the minuend. However, in the second interpretation, the difference is determined by bridging the gap, which can be done in two ways: by adding on from the subtrahend until the minuend is reached and by decreasing the minuend until the subtrahend is reached. Both interpretations of subtraction need to be addressed if we want students to learn subtraction in a more complete way (Freudenthal, 1983; Müller & Wittmann, 1984; Van den Heuvel-Panhuizen & Treffers, 2009). To contribute to this broad understanding of subtraction, students should be given more than just bare number problems. Several studies (Klein et al., 1998; Torbeyns et al., 2009b; Blöte et al., 2001; Van den Heuvel-Panhuizen, 1996) revealed that bare number problems hardly evoke the use of IA, which can be explained by the presence of the minus sign that emphasizes the “taking-away” action (Van den Heuvel-Panhuizen, 1996). Context problems, on the contrary, lack this operation symbol and therefore open up both interpretations of subtraction (Van den Heuvel-Panhuizen, 2005). Moreover, the action described in the context of a problem may prompt the use of a particular procedure (Van den Heuvel-Panhuizen, 1996, 2005).

1.5 Research questions and hypotheses

Previous studies on weak-performing students’ use of the indirect addition procedure in regular primary education have led to contradictory results, which may be due to not taking all relevant conditions into account in one study: the role of the problem format (context or bare number problems), the numbers involved, and the occurrence of prior instruction in IA. The present study is set up to include all these conditions. Moreover, the study addresses students in SE. The study has two foci: students’ spontaneous use of IA, i.e., applying IA without being asked to use this procedure, and students’ success rate. We formulated the following research questions:

  1. Can SE students make spontaneous use of IA for solving subtraction problems up to 100, and which conditions influence the use of IA?

  2. Does the use of IA help SE students solve subtraction problems up to 100 successfully, and under which conditions does IA use lead to successful problem solving?

Concerning the first research question, our general expectation is that SE students can make use of IA (Hypothesis 1a) and that they are more likely to apply IA:

  • In small-difference subtraction problems with crossing the ten than in large-difference problems with or without crossing the ten (influence of numbers involved, Hypothesis 1b)

  • In context problems that reflect adding on than in context problems that reflect taking away or in bare number problems (influence of problem format, Hypothesis 1c)

  • When having received instruction in IA than when not having received this instruction (influence of prior instruction, Hypothesis 1d)

With respect to the second research question, our general expectation is that applying IA results in a higher success rate than not applying IA (Hypothesis 2a) and that this is particularly true:

  • When applying an IA procedure in combination with a stringing strategy rather than when applying a DS procedure together with a splitting strategy (Hypothesis 2b)

  • In small-difference subtraction problems with crossing the ten rather than in large-difference problems with or without crossing the ten (influence of numbers involved, Hypothesis 2c)

  • When having received instruction in IA rather than when not having received this instruction (influence of prior instruction, Hypothesis 2d)

2 Method

2.1 Participants

In the Netherlands, subtraction up to 100 is mainly taught in the second grade of primary school. In total, 56 students from 14 second-grade classes in three Dutch SE schools participated in the study. In the Netherlands, about 3% of children of primary school age attend SE schools. This percentage includes only students who have learning difficulties; students with physical disabilities are not included. The participating students (39 boys, 17 girls) were 8–12 years old, with a mean age of 10 years and 6 months (SD = 10.4 months). In regular education, 8- to 9-year-olds are in grade 3 and 11- to 12-year-olds are in grade 6. This means that the students in our study were 1 to 4 years behind in mathematics compared with their peers in regular primary school.

The students’ mathematical ability level was established with the Cito Monitoring Test for Mathematics End Grade 2 (Janssen, Scheltens, & Kraemer, 2005). The standardization of this test was based on a representative sample of Dutch second-grade students in regular primary education, whose average ability score was 56.4 (SD = 14.6). The ability scores of the students in our sample ranged from 32 to 56, with an average of 47.8 (SD = 6.8). This is considerably lower (d = −0.59) than the average of students in regular primary school.
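The reported effect size can be reproduced from the summary statistics above by dividing the difference in means by the SD of the norm group. The text does not state which SD was used as the standardizer, so this choice is our assumption in the Python sketch below.

```python
def standardized_difference(sample_mean, norm_mean, norm_sd):
    """Cohen's d, assuming the norm group's SD is used as the standardizer."""
    return (sample_mean - norm_mean) / norm_sd


d = standardized_difference(47.8, 56.4, 14.6)
print(round(d, 2))  # -0.59
```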

2.2 Materials

2.2.1 ICT-based test on subtraction problems

An ICT-based test was developed in which item characteristics were varied systematically over 15 items (see Table 2). These characteristics include number characteristics and format characteristics.

Table 2 Different types of subtraction items and number of items in ICT-based test on subtraction

The number characteristics are the following:

- the size of the difference between the minuend and subtrahend (small: less than 7; large: more than 11);
- whether the ten has to be crossed (e.g., 61 − 59); and
- whether or not the minuend and the subtrahend are close to a ten (less than 3 away).

The format characteristics refer to whether the items are presented as a bare number problem (BN) or as a context problem. A context problem can describe a taking-away situation (ConTA) or an adding-on situation (ConAO). Figure 1 shows an example of a ConAO item.
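The number characteristics can be coded mechanically. The Python sketch below is a hypothetical coder, and several of its rules are our own readings of the text:

- "Close to a ten (<3)" means a distance of less than 3 to the nearest multiple of ten.
- Crossing the ten means that the subtrahend's ones digit exceeds the minuend's.
- The features map to categories A, B, and C as the examples in the text suggest (A: 47 − 43; B: 61 − 59; C: large difference with both numbers close to a ten).

Which features separate categories D and E is not stated here, so the sketch keeps them together.

```python
from dataclasses import dataclass


@dataclass
class NumberFeatures:
    difference_size: str  # "small" (< 7), "large" (> 11), or "neither"
    crosses_ten: bool     # ones digit of the subtrahend larger than that of the minuend
    close_to_ten: bool    # minuend and subtrahend both less than 3 away from a ten


def distance_to_ten(n):
    """Distance from n to the nearest multiple of ten."""
    return min(n % 10, 10 - n % 10)


def number_features(minuend, subtrahend):
    diff = minuend - subtrahend
    if diff < 7:
        size = "small"
    elif diff > 11:
        size = "large"
    else:
        size = "neither"
    return NumberFeatures(
        difference_size=size,
        crosses_ten=subtrahend % 10 > minuend % 10,
        close_to_ten=distance_to_ten(minuend) < 3 and distance_to_ten(subtrahend) < 3,
    )


def category(minuend, subtrahend):
    """Hypothetical mapping: A = small, no crossing; B = small, crossing;
    C = large, both close to a ten; 'D/E' = large, not close to a ten."""
    f = number_features(minuend, subtrahend)
    if f.difference_size == "small":
        return "B" if f.crosses_ten else "A"
    if f.difference_size == "large":
        return "C" if f.close_to_ten else "D/E"
    return None


print(number_features(61, 59))  # NumberFeatures(difference_size='small', crosses_ten=True, close_to_ten=True)
print(category(47, 43), category(61, 59), category(51, 49))  # A B B
```

For the album item in Fig. 1 (51 − 49), the sketch reports a small difference, crossing the ten, and both numbers close to a ten. It therefore assigns the item to category B.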

Fig. 1 Album item; the accompanying read-aloud instruction is: “The album has space for 51 cards. 49 are already included. How many more cards can be added?”

In line with the aim of our study, all number and format characteristics were uniformly distributed over the test item positions (see Table 2). As a result, the influence of item characteristics is not confounded with item position. However, because of the small sample of students, we decided to present a fixed set of items to all students so as not to lose statistical power.

The 15 items were displayed one per screen. The students could click to continue to the next item. The accompanying text was read out by the computer. By clicking on the ear button, the student could hear the spoken text again.

After a short introduction, the students worked individually on a touch-screen notebook. Students were told that they were free to choose any solution method. After filling in an answer, they reported verbally how they found this answer. The students’ on-screen work was recorded by Camtasia Studio software. All students and their parents gave their permission for collecting these records.

2.2.2 Psychometric properties of the test

We investigated the psychometric properties of the 15 items with respect to IA use and the success rate of the answers. The score for IA use was derived from a dichotomous coding (IA used or IA not used) of the students’ responses to the 15 items of the ICT-based test on subtraction.

The reliability of the scale for IA use was rather low (α = 0.45). An exploratory factor analysis based on tetrachoric correlations showed that the first factor explained 30.8% of the total variance. The ratio of the first to the second eigenvalue equaled 1.52, meaning that the first factor substantially dominated the second factor, as the scree plot confirmed.

The reliability of the scale for success rate was moderate (α = 0.69). An exploratory factor analysis indicated a dominant first factor, which accounted for 37.8% of the total variance. The ratio of the first to the second eigenvalue equaled 2.79, which underlines the strength of unidimensionality. The scree plot revealed a second, but weak, factor.
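For reference, the reliability coefficients reported above can be computed from a students × items matrix of 0/1 scores. The Python sketch below implements the standard formula for Cronbach's alpha: α = k/(k − 1) · (1 − Σ item variances / variance of total scores). It assumes complete data. The study had 72 missing cases, and this section does not describe how they were handled. The toy matrix is invented for illustration and is not study data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students x items matrix of 0/1 scores
    (complete data assumed)."""
    k = len(scores[0])
    item_variances = [pvariance(column) for column in zip(*scores)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented toy data: 4 students x 3 items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
print(round(cronbach_alpha(toy_scores), 2))  # 0.75
```

Population and sample variances give the same alpha, because the n/(n − 1) correction cancels in the ratio.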

To get a better understanding of the structure of the test, we carried out an oblique rotation of the factor loading matrix and assigned each item to one factor. For the IA use scale, this procedure revealed no clearly interpretable factor solution. For the success rate scale, the two factors were as follows:

- One factor was mainly characterized by items of categories B and C (items with the minuend and the subtrahend close to a ten).
- The other factor was mainly characterized by items of categories A, D, and E (items without the minuend and the subtrahend close to a ten).

2.2.3 Online teacher questionnaire

To collect data about the students’ prior instruction on subtraction problems, we asked their mathematics teachers which procedures they had taught their students for solving these problems. An online questionnaire was developed for collecting these data. The link for the questionnaire was sent by email to the 14 teachers of the students. The teachers received the questionnaire shortly after their students were administered the ICT-based test. All 14 teachers filled in and submitted the questionnaire.

Apart from a few general questions about the teachers’ background, the questionnaire contained a specific question on the topic of “subtraction up to 100” to collect data on the procedures (DS or IA) they had taught their students for solving subtractions up to 100.

2.3 Analysis

The students’ responses were classified on the basis of the screen videos, which captured the students’ answers to the test items and their verbal reports. The answers were coded as correct or incorrect. The verbal reports were used to classify the students’ strategies and procedures. The strategies were coded as splitting, stringing, or varying, and the procedures as DS, IA, IS, or MO (multiple operations; see Table 1). Three additional codes were used:

- KF (known fact): the student knew the answer to a problem by heart.
- NR (no response): the student did not come up with an answer.
- Ad (addition): the student erroneously added the two numbers in a problem.

Two raters coded the responses independently. There were only a few cases of disagreement (<5%). After these cases were discussed, full agreement was reached.

The information about prior instruction in IA was derived from the online teacher questionnaire. In addition, we analyzed the mathematics textbook series used by the teachers. We counted how often tasks in grades 1 and 2 address the inverse relation between addition and subtraction in solving calculations up to 100.

Item responses of students (procedures, strategies, and success rate) were collected at case level, and the cases (students × items) are on the one hand nested within students (who are in turn nested within teachers) and on the other hand nested within items. This structure enabled the use of cross-classified multilevel models with predictors at case, item, student, and teacher levels. We estimated the models in WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000). Because of the dichotomous nature of the dependent variables, we made use of multilevel logistic regression models.
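The Python sketch below illustrates the cross-classified layout. Every case is a student × item combination, and students are nested within teachers. The log-odds of a response is the sum of an intercept and the item, student, and teacher random effects. This is not the WinBUGS model used in the study. It only generates data under a random-intercept logistic model with independent normal random effects, which is our assumption about the model form. The allocation of 4 students per teacher is hypothetical, because the actual class sizes differed. The parameter values are borrowed from model 0 in Table 4 purely for illustration.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_cases(teacher_of_student, n_items, intercept,
                   sd_item, sd_student, sd_teacher, seed=0):
    """Simulate dichotomous responses for every student x item case.

    Cases are cross-classified by student and item; students are nested
    within teachers. Linear predictor on the logit scale:
    intercept + item effect + student effect + teacher effect.
    """
    rng = random.Random(seed)
    n_students = len(teacher_of_student)
    n_teachers = max(teacher_of_student) + 1
    item_eff = [rng.gauss(0.0, sd_item) for _ in range(n_items)]
    student_eff = [rng.gauss(0.0, sd_student) for _ in range(n_students)]
    teacher_eff = [rng.gauss(0.0, sd_teacher) for _ in range(n_teachers)]
    cases = []
    for s in range(n_students):
        t = teacher_of_student[s]
        for i in range(n_items):
            eta = intercept + item_eff[i] + student_eff[s] + teacher_eff[t]
            y = 1 if rng.random() < inv_logit(eta) else 0
            cases.append({"student": s, "teacher": t, "item": i, "y": y})
    return cases


# Hypothetical layout: 14 teachers with 4 students each (56 students), 15 items.
teacher_of_student = [t for t in range(14) for _ in range(4)]
cases = simulate_cases(teacher_of_student, 15, -1.47, 2.83, 0.93, 0.41)
print(len(cases))  # 840
```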

In addition to the cross-classified multilevel analyses, we also used logistic regression models that included no student, teacher, or item effects. These models were estimated with generalized estimating equations (GEE). In this approach, the standard errors of the regression coefficients are adjusted for the dependencies resulting from the cross-classified data structure (Halekoh, Højsgaard, & Yan, 2006).

3 Results

3.1 Frequencies of procedures and strategies

The data analysis was based on all cases in which the students gave an answer to a particular item. Of the 840 possible cases (56 students doing all 15 items each), 72 cases were missing. This resulted in 768 cases to be analyzed. Table 3 gives an overview of the applied procedures in combination with the applied strategies at the level of the cases (the student’s responses). DS was used in 63% of the total cases and went together almost equally often with a stringing or a splitting strategy. IA was used in 34% of the total cases of answered items and was often applied in combination with a stringing strategy.

Table 3 Cross-tabulation of frequencies of procedures and strategies for 768 cases

3.2 SE students’ spontaneous IA use

The number of items (out of 15) that a student solved with IA ranged from 0 to 8 (M = 4.6, SD = 1.9).

3.2.1 Different conditions and IA use

Numbers involved

Figure 2 shows that IA was most frequently applied in small-difference problems without and with crossing the ten (A and B, respectively). DS appeared to be the most popular procedure in large-difference problems (D and E), even in large-difference problems that have the minuend and subtrahend both close to a ten (C). The more frequent use of IA in A and B than in C, D, and E appeared to be significant in a GEE logistic regression (b = 1.24, SE = 0.16, p < 0.05).

Fig. 2 Percentage of procedure use related to number characteristics of the items

Problem format

Figure 3 shows that IA mainly appeared in the items with an adding-on context (ConAO) and that DS was most often used in items with a taking-away context (ConTA). Moreover, when solving BN problems, the students preferred DS. A GEE logistic regression showed a significant difference in IA use between context and bare number problems (b = 2.38, SE = 0.25, p < 0.05).

Fig. 3 Percentage of procedure use related to the problem format

Prior instruction

The teachers’ responses to the online questionnaire revealed that two different textbook series were used in the 14 classes. Each of these textbook series contains some missing-addend problems, but neither explicitly addresses the inverse relation between addition and subtraction.

Teachers could have paid attention to IA without it being addressed in the textbook series. We therefore also asked them which procedures they taught their students for solving subtraction problems. All teachers reported that they taught DS. Only three teachers responded that they taught both DS and IA, so the 16 students of these three teachers were taught both procedures. These 16 students applied IA in 29% of their 209 cases (16 students × 15 items, minus 31 missing cases). The other 40 students, who were not taught IA, applied this procedure in 36% of their 559 cases (40 students × 15 items, minus 41 missing cases).
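The case counts above follow from simple bookkeeping. The short Python check below recomputes them. It also recomputes the IA percentages of 29% and 36%, using the IA case counts for the two groups reported in Section 3.3.1 (61 and 199).

```python
def answered_cases(n_students, n_items, n_missing):
    """Number of analyzable student x item cases."""
    return n_students * n_items - n_missing


taught = answered_cases(16, 15, 31)      # students taught DS and IA
not_taught = answered_cases(40, 15, 41)  # students taught DS only
total = answered_cases(56, 15, 72)       # all students

# IA case counts for the two groups as reported in Section 3.3.1.
share_taught = 61 / taught
share_not_taught = 199 / not_taught

print(taught, not_taught, total)                                 # 209 559 768
print(round(100 * share_taught), round(100 * share_not_taught))  # 29 36
```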

3.2.2 Multilevel analysis with IA use as the dependent variable

To examine the influence of the different conditions on IA use, we carried out a multilevel analysis with two cross-classified multilevel models: an empty model 0 and a model 1 with predictors (see Table 4).

Table 4 Multilevel logistic regression model with IA use as the dependent variable

In model 0, only random effects of items, students, and teachers are specified. The intercept represents the average use of IA transformed onto the logit scale of the multilevel logistic regression model. The intercept (b = −1.47, SE = 0.82) is smaller than zero, which implies that IA is applied in less than half of the cases.
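The intercept can be translated back to a probability with the inverse logit. The Python sketch below does this for a hypothetical "typical" case whose item, student, and teacher random effects are all zero. This conditional probability (about 0.19) need not equal the overall proportion of IA cases (34% in Section 3.1). The reason is that averaging the inverse logit over large random effects pulls the proportion toward one half.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Model 0 intercept on the logit scale; random effects assumed to be zero.
p_typical = inv_logit(-1.47)
print(round(p_typical, 3))  # 0.187
```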

The large SD of the random item effect (SD = 2.83) compared with the student effect (SD = 0.93) indicates that IA use is mainly an item characteristic. This means that the application of IA is elicited by the nature of an item rather than by the specific preference of a student. Thus, students seemed to apply IA in a flexible, item-specific way.

The SD of the teacher component (SD = 0.41) is also small compared with the SD at the item level. Nevertheless, the variation between teachers is not negligible. Differences in their instruction might have consequences for students’ IA use.

In model 1, the numbers involved and the problem format are included as predictors at the item level. All categories of these variables except a reference category are dummy coded (1 = item possesses the property, 0 = item does not possess this property). The regression coefficients of categories A, C, D, and E of the predictor numbers involved represent their contrast with the reference category B. In Table 4, these coefficients are negative for categories A, C, D, and E. This indicates that IA is used less frequently for items in these categories than for items in category B. However, the difference is significant only for categories D (b = −3.35, SE = 1.26, p < 0.05) and E (b = −2.98, SE = 1.17, p < 0.05). It is not significant for categories A (b = −0.45, SE = 1.06, p > 0.05) and C (b = −1.73, SE = 1.11, p > 0.05). The coefficient of A is close to zero, which indicates that IA is used about equally often in items like 47 − 43 (category A) and 61 − 59 (category B). For C, the regression coefficient suggests that students applied IA less frequently in category C than in B, but more often than in categories D and E.

For the problem format, the regression coefficients of ConAO and ConTA represent their contrast with BN problems. IA was applied significantly more often for ConAO items (context problems that reflect adding on) than for BN problems (b = 4.74, SE = 0.93, p < 0.05). No such significant difference was found between ConTA items (context problems that reflect taking away) and BN problems (b = 1.42, SE = 0.99, p > 0.05). We also investigated whether IA use differs between the context problem types ConAO and ConTA. For this purpose, we defined a new variable as the difference between the regression coefficients of the two context problem types. Based on the WinBUGS output, we computed the distribution of this new variable. It revealed that IA use occurred significantly more often for ConAO than for ConTA items (b = 3.33, SE = 0.86, p < 0.05).

When examining whether there is a difference in IA use between context problems (ConAO and ConTA) and BN problems, we found that IA was significantly more used in context problems (b = 3.08, SE = 0.86, p < 0.05).
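Contrasts like the two reported above can be obtained from posterior draws by computing the contrast within each draw. This automatically accounts for the correlation between the two coefficients. The source only says that the distribution was computed from the WinBUGS output, so the draw-by-draw approach in the Python sketch below is our assumption. The reported context-versus-BN estimate of 3.08 equals the mean of 4.74 and 1.42. This is consistent with, though not proof of, defining that contrast as the average of the two context coefficients, and the sketch adopts that reading as a further assumption. The draws are invented placeholders.

```python
import random
from statistics import mean, stdev


def summarize(values):
    """Posterior mean, SD, and a simple 95% percentile interval."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    return mean(ordered), stdev(ordered), (lower, upper)


def posterior_contrasts(b_con_ao, b_con_ta):
    """Contrasts computed draw by draw from paired MCMC samples of the two
    context coefficients (both expressed relative to BN problems)."""
    ao_vs_ta = [ao - ta for ao, ta in zip(b_con_ao, b_con_ta)]
    context_vs_bn = [(ao + ta) / 2 for ao, ta in zip(b_con_ao, b_con_ta)]
    return summarize(ao_vs_ta), summarize(context_vs_bn)


# Placeholder draws for illustration only: independent normal samples centered
# on the reported estimates. Real WinBUGS draws are correlated, so these will
# not reproduce the reported SEs.
rng = random.Random(1)
draws_ao = [rng.gauss(4.74, 0.93) for _ in range(4000)]
draws_ta = [rng.gauss(1.42, 0.99) for _ in range(4000)]
diff_summary, context_summary = posterior_contrasts(draws_ao, draws_ta)
print(round(diff_summary[0], 2), round(context_summary[0], 2))
```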

The SD of the item effect in model 1 (SD = 1.19) was substantially smaller than the corresponding SD in model 0 (SD = 2.83). This means that a large amount of item variance in IA use is explained by the item predictors numbers involved and the problem format. The explained variance at the item level (R²Item = 0.83) corresponds to the reduction of variance from model 0 to model 1.
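The explained variance at the item level is the proportional reduction in the item variance, R² = 1 − SD₁²/SD₀². The small Python check below uses the rounded SDs reported above and gives about 0.82. The reported 0.83 presumably reflects unrounded estimates.

```python
def explained_variance(sd_model0, sd_model1):
    """Proportional reduction in random-effect variance from model 0 to model 1."""
    return 1 - (sd_model1 ** 2) / (sd_model0 ** 2)


r2_item = explained_variance(2.83, 1.19)
print(round(r2_item, 2))  # 0.82
```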

At the student level, neither gender (b = 0.44, SE = 0.41, p > 0.05) nor the Cito ability score (b = −0.01, SE = 0.03, p > 0.05) turned out to be a significant predictor of IA use. At the teacher level, the variable IA taught was not significant (b = −0.44, SE = 0.55, p > 0.05), despite the variation between teachers. At both the teacher and the student level, the SD increases slightly from model 0 to model 1.

As shown in Table 3, not using IA almost always implies using DS. Therefore, approximate regression coefficients for DS use can be obtained by multiplying the regression coefficients b in Table 4 by −1. The SDs at all three levels would hardly change.
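The sign flip follows from a property of the logit. If P(DS) = 1 − P(IA), then logit(P(DS)) = −logit(P(IA)), so every fixed effect changes sign. Negating the random effects leaves their SDs unchanged. The short Python check below illustrates this identity. It holds only approximately here, because a few cases were coded as neither IA nor DS (see Table 3).

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


# If every case is either IA or DS, P(DS) = 1 - P(IA), and the logit of the
# complement is the negated logit, so all fixed-effect coefficients flip sign.
for p_ia in (0.10, 0.34, 0.68):
    print(p_ia, round(logit(p_ia), 3), round(logit(1.0 - p_ia), 3))
```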

3.3 SE students’ success rate in IA

Of the 15 subtraction problems, the students solved between 1 and 14 items correctly (M = 7.7, SD = 3.5). In 68% of the 260 cases in which IA was applied and in 51% of the 480 cases in which DS was applied, the students’ answers were correct. The higher success rate when using IA appeared to be significant in a GEE logistic regression model (b = 0.82, SE = 0.17, p < 0.05).

3.3.1 Different conditions and success rate in IA

Number involved

Figure 4 shows that for items in category B, students' success rate is 87% when using IA and 39% when using DS. This positive difference of 48 percentage points between IA and DS contrasts with the negative differences of 11 and 4 percentage points found in categories D and E, respectively. The variation in this difference across the categories of numbers involved was significant in a GEE logistic regression model (b = 2.64, SE = 0.56, p < 0.05).

Fig. 4 Percentage of correct answers related to number characteristics of the items

Prior instruction

The students who had received IA instruction correctly solved 77% of the 61 cases in which they used IA, whereas the students who had not received IA instruction correctly solved 67% of the 199 cases in which they applied IA. This difference was not significant in a GEE logistic regression (b = 0.55, SE = 0.34, p > 0.05).

3.3.2 Multilevel analysis with success rate as the dependent variable

To examine the influence of the conditions on success rate, we carried out a multilevel analysis with two cross-classified multilevel models, model 0 and model 1 (see Table 5).

Table 5 Multilevel logistic regression model with success rate as the dependent variable

In model 0, the SD of the random student effects (SD = 1.20) is larger than the SD of the random item effects (SD = 1.12), which indicates that correctly solving an item is more student-related than item-related. In addition, the SD of the random teacher effect (SD = 0.31) is quite small compared with the SD at the item level.
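
One common way to put these SDs on a shared scale, not used in the text itself, is the latent-variable variance partition for logistic models, which adds a fixed level-1 variance of π²/3 (the variance of the standard logistic distribution). The sketch below applies it to the model 0 SDs; treating the three cross-classified random effects as simply additive is an assumption of this illustration, and the function name is hypothetical.

```python
import math

def variance_partition(sds: dict[str, float]) -> dict[str, float]:
    """Share of total latent variance per random effect in a logistic model.

    The level-1 (residual) variance is fixed at pi**2 / 3, the variance
    of the standard logistic distribution.
    """
    variances = {name: sd ** 2 for name, sd in sds.items()}
    variances["residual"] = math.pi ** 2 / 3
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}

# Model 0 SDs for success rate, as reported in the text.
shares = variance_partition({"student": 1.20, "item": 1.12, "teacher": 0.31})
for name, share in shares.items():
    print(f"{name:>8}: {share:.2f}")
```

On this scale the student and item effects each account for roughly a fifth to a quarter of the latent variance, while the teacher effect accounts for only a few percent, which matches the comparison of SDs above.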

In model 1, predictors at the case, item, student, and teacher levels are included. At the case level, strategy use and procedure use are included to investigate their influence on the success rate. Because our focus is on the IA procedure, which was mostly combined with a stringing strategy, we used IA use and stringing use as dummy variables. Although IA use was descriptively associated with a higher success rate, it did not significantly predict success rate in this model (b = −0.40, SE = 0.52, p > 0.05). The use of the stringing strategy increased the success rate significantly (b = 0.72, SE = 0.28, p < 0.05). The strongest predictor of a correct answer, however, was the combination of IA and stringing (b = 1.17, SE = 0.55, p < 0.05). This finding held even after controlling for all the other predictors at the item, student, and teacher levels.
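
To read the three case-level coefficients together, the sketch below adds them up for each combination of procedure and strategy. It assumes, as the wording suggests but the text does not state explicitly, that b = 1.17 is an interaction term added on top of the main effects of IA (−0.40) and stringing (0.72), with DS without stringing as the reference; all other predictors are held at their reference values. The constant and function names are hypothetical.

```python
import math

# Coefficients from the text (log-odds scale). Treating B_IA_X_STRINGING as
# an interaction term is an assumption of this sketch.
B_IA = -0.40
B_STRINGING = 0.72
B_IA_X_STRINGING = 1.17

def log_odds_shift(ia: bool, stringing: bool) -> float:
    """Change in log-odds of a correct answer relative to DS without stringing."""
    shift = 0.0
    if ia:
        shift += B_IA
    if stringing:
        shift += B_STRINGING
    if ia and stringing:
        shift += B_IA_X_STRINGING
    return shift

for ia in (False, True):
    for stringing in (False, True):
        shift = log_odds_shift(ia, stringing)
        label = f"{'IA' if ia else 'DS'} + {'stringing' if stringing else 'no stringing'}"
        print(f"{label:<22} log-odds shift {shift:+.2f}, odds ratio {math.exp(shift):.2f}")
```

Under this reading, IA combined with stringing raises the log-odds of a correct answer by 1.49 (an odds ratio of about 4.4) relative to DS without stringing, whereas IA on its own shows no advantage.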

At the item level, the predictors numbers involved and problem format are included. Items in the numbers involved categories C (b = −1.39, SE = 0.52, p < 0.05) and E (b = −1.30, SE = 0.55, p < 0.05) were significantly more difficult than items in category B. Concerning the problem format, neither type of context problem (ConAO or ConTA) differed significantly from the BN problems (b = 0.36, SE = 0.43, p > 0.05 and b = 0.05, SE = 0.45, p > 0.05, respectively).

At the student level, it appeared that students’ success rate is positively related to the Cito ability score for mathematics (b = 0.12, SE = 0.03, p < 0.05); however, gender is not (b = −0.20, SE = 0.39, p > 0.05). Finally, at the teacher level, we found that IA taught is not a significant predictor of success rate (b = −0.09, SE = 0.44, p > 0.05).

Using the item-level SDs in model 0 and model 1, we found that the item difficulties are largely explained by the item predictors (R²Item = 0.78). The explained variance at the student level (R²Student = 0.20) is smaller, which suggests that student characteristics other than the two included in model 1 account for much of the variance at the student level. The explained variance at the teacher level (R²Teacher = 0.30) is also smaller than at the item level.

To investigate whether the success rate with IA use differed across the categories of numbers involved, we specified an additional multilevel regression model including the predictors IA use, the categories of numbers involved (as in model 1), and the interactions of IA use with each of these categories. As in model 1, category B served as the reference category. All interactions of IA use with the numbers involved categories had significant negative regression coefficients. This means that IA use was more successful in small-difference problems with crossing the ten (category B) than in any other category of numbers involved (A vs. B: b = −1.58, SE = 0.67, p < 0.05; C vs. B: b = −3.07, SE = 0.67, p < 0.05; D vs. B: b = −3.14, SE = 0.71, p < 0.05; E vs. B: b = −2.59, SE = 0.74, p < 0.05).

4 Conclusions and discussion

Our study showed that SE students can indeed make use of IA when solving subtraction problems (Hypothesis 1a). The item characteristics turned out to be the main prompt for using IA: students applied IA flexibly, in an item-specific way. With respect to the numbers involved, we found that students mainly used IA in small-difference problems with crossing the ten (Hypothesis 1b). With regard to the problem format, our study revealed that students most frequently applied IA in context problems that reflect adding on (Hypothesis 1c). However, contrary to Hypothesis 1d, students who had received instruction in IA did not apply it more often.

Our study showed that the SE students were quite successful in solving subtraction problems when using IA (Hypothesis 2a), but the two types of analysis we applied did not give consistent results. In the GEE regression, IA use significantly influenced success rate, whereas in the multilevel regression (in which, unlike in the GEE approach, the student's general ability to solve the subtraction problems in the test is included as a random effect), IA use was not a significant predictor of success rate. Because students were free to choose their solution method, the use of IA might be related to their general ability to solve test items correctly. This would explain why the GEE approach and the multilevel approach lead to different results (see also Molenberghs & Verbeke, 2004).
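
The argument that IA use may be tied to general ability can be illustrated with a small simulation. It is entirely hypothetical and uses no study data: each simulated student has an ability, abler students choose IA more often, and the probability of a correct answer depends on ability only, so IA has no effect of its own. Pooling all cases, roughly as a marginal (GEE-type) analysis does, still shows IA users succeeding more often, whereas comparing IA and DS within the same students, a rough stand-in for conditioning on a student random effect, removes the gap. The parameter values and function names are arbitrary choices for this sketch.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def simulate(n_students=2000, n_items=15, seed=7):
    """Hypothetical data: ability drives both IA choice and correctness."""
    rng = random.Random(seed)
    cases = []  # (student, used_ia, correct)
    for student in range(n_students):
        ability = rng.gauss(0.0, 1.0)
        p_ia = sigmoid(1.5 * ability)       # abler students pick IA more often
        p_correct = sigmoid(1.5 * ability)  # IA itself adds nothing
        for _ in range(n_items):
            used_ia = rng.random() < p_ia
            correct = rng.random() < p_correct
            cases.append((student, used_ia, correct))
    return cases

def pooled_difference(cases):
    """Success rate with IA minus success rate with DS, ignoring students."""
    ia = [c for _, u, c in cases if u]
    ds = [c for _, u, c in cases if not u]
    return sum(ia) / len(ia) - sum(ds) / len(ds)

def within_student_difference(cases):
    """Mean per-student IA minus DS success rate, for students who used both."""
    by_student = {}
    for student, used_ia, correct in cases:
        by_student.setdefault(student, {True: [], False: []})[used_ia].append(correct)
    diffs = [
        sum(r[True]) / len(r[True]) - sum(r[False]) / len(r[False])
        for r in by_student.values()
        if r[True] and r[False]
    ]
    return sum(diffs) / len(diffs)

cases = simulate()
print(f"pooled IA - DS:         {pooled_difference(cases):+.3f}")
print(f"within-student IA - DS: {within_student_difference(cases):+.3f}")
```

The pooled comparison shows a clear advantage for IA even though, by construction, IA does not help any individual student; the within-student comparison stays close to zero.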

Furthermore, solving the test items by applying IA together with stringing appeared to be more successful than applying DS together with splitting (Hypothesis 2b). This finding emphasizes the importance of examining procedures (IA use or DS use) as well as strategies (splitting, stringing, and varying) when investigating students’ ability to solve number problems.

Regarding the numbers involved, we found that in small-difference problems with crossing the ten, students were more successful when applying IA (Hypothesis 2c). As with the frequency of IA use, prior instruction in IA did not affect the success rate when applying IA (Hypothesis 2d).

In sum, our study has revealed that SE students (1) are able to use IA spontaneously, (2) are rather flexible in applying IA to solve subtraction problems, and (3) are quite successful when solving subtraction problems by IA. These outcomes contrast with some research findings described in Section 1.3 that suggested that weak students have difficulties in applying IA to solve subtraction problems. Our findings make it clear that sensitive assessment tools are needed to reveal students' abilities. In our case, test items designed with a particular format and number characteristics enabled us to make SE students' ability to use IA visible. Furthermore, our findings have consequences not only for assessment but also for teaching mathematics to SE students, and in particular for teaching them subtraction. Restricting this teaching to the straightforward taking-away procedure underestimates SE students' mathematical abilities and does not offer the best environment for them to develop a deep understanding of different subtraction approaches.

Although the present study largely confirmed our hypotheses about SE students' use of IA to solve subtraction problems up to 100, our results should be interpreted with caution. First, our study was limited in the number of students and schools. Second, we did not carry out a detailed inventory of the students' prior instruction in IA; we only asked whether the students had been taught a particular procedure, not how it was taught. This lack of information on the quality of the instruction might explain why prior instruction had no influence on the students' success rate. Finally, the test we used for this study has some shortcomings. Not only did it contain a small number of items, but the items were also offered to every student in the same order, so our results may have been affected by item order effects. Although we tried to minimize order effects by distributing the item characteristics uniformly over the test item positions, we cannot rule out that the particular sequence of item characteristics influenced the outcomes of our study. Nevertheless, our findings showed that SE students were able to use IA, and providing evidence for this was the main goal of the study.