Students’ achievement-related beliefs play an important role in school by directing behavior and effort in learning situations (e.g., Atkinson 1964; Bandura 1997; Eccles et al. 1983; Wigfield et al. 2006). Students who believe that they can and will do well at school are more likely to perform better and to engage in an adaptive manner in academic tasks than students who have a negative self-perception and expect to fail (Pintrich and Schunk 2002). Several concepts pertaining to achievement-related beliefs have been introduced, such as perceived competence (Harter 1982), academic self-concept (Shavelson et al. 1976), and self-concept of ability (e.g., Nurmi and Aunola 2005), the last of which will be used in the present study.

Despite certain differences in emphasis, these concepts all refer to students’ own understanding of their abilities in academic situations (for a review, see Bong and Skaalvik 2003). Furthermore, they all emphasize that the understanding of the ability is based not only on an individual’s own performance but also on social comparison with the reference group (for empirical evidence, see, e.g., Chiu et al. 2017). The big-fish-little-pond effect (BFLP effect) model (Marsh et al. 2007) argues that students compare their own academic ability with the academic abilities of their classmates and use this social comparison impression as a basis for forming their self-concept of ability. Previously, this model has primarily been tested among adolescents and in selective school systems (Marsh et al. 2007, 2018); it has rarely been tested among younger students (see Guo et al. 2018), although primary school age might be the most critical period for the formation of self-concept of ability. In the present study, we examine whether the effects indicated by the BFLP effect model can already be found in the primary grades and in a school system that does not select and track students according to different abilities, that is, in the Finnish educational context.

The big-fish-little-pond effect model

Individual academic performance has been shown to provide the basis for the development of students’ self-concept of abilities in primary school (Aunola et al. 2002; Chapman and Tunmer 1997). During the first school years, students acquire experiences of diverse academic tasks and related feedback on their success. Increasing cognitive maturation allows the students to make more accurate evaluations of this feedback and to compare their abilities and success to those of their classmates. During this time, the associations between students’ academic performance and self-concept of ability become stronger (Denissen et al. 2007; Eccles et al. 1998).

The skill-development model argues that academic performance is important in the formation of self-concept among younger students (Calsyn and Kenny 1997). In support of this developmental view, Aunola et al. (2002) showed that good reading skills in grade 1 (7-year-olds) make a positive contribution to later self-concept of ability, while poor reading skills constitute a risk factor for the development of negative self-concept of ability. Chapman and Tunmer (1997) demonstrated that children’s pre-reading performance at age 5 was an important predictor of their reading-related self-concept at age 7. Similarly, in a longitudinal study assessing both students’ self-concept of ability and performance in reading and mathematics during grades 1, 2, 4, and 7, Viljaranta et al. (2014) showed that performance during previous grades predicted students’ subsequent self-concept of ability. However, gender may moderate this relation between performance and self-concept. It has been shown, for instance, that, while boys and girls perform equally well in mathematics and in literacy, gender differences in self-concept exist from quite a young age, with boys holding more positive self-concept for mathematics and girls holding more positive self-concept for literacy (Denissen et al. 2007; Eccles et al. 1993; Wouters et al. 2013).

In addition to children’s own academic performance, the performance level of their classmates may also have an effect on students’ self-concept of ability. It has been suggested that equally skilled students have lower self-concept of ability when they are placed in a classroom where the average performance level is high than when they attend a low-performing classroom (Marsh 1987; Marsh et al. 2018). Thus, the same absolute level of performance can lead to different self-concept depending on the contexts in which students evaluate themselves. This phenomenon—known as the big-fish-little-pond effect (Marsh et al. 2007, 2008)—has been shown to be highly specific to the academic component of self-concept. For example, Marsh (1987) demonstrated that there was a large negative classroom effect on academic self-concept but little or no negative classroom effect on general self-concept or self-esteem.

In statistical terms, BFLP effect has been operationalized in terms of a concurrent path model where both students’ own performance and the performance level of the classroom predict individual students’ self-concept of ability (see Fig. 1). The effect of students’ own performance on self-concept is assumed to be positive (e.g., the better the skills of a student in mathematics, the higher the student’s self-concept in mathematics), whereas the effect of the performance level of the class is assumed to be negative (the better the skills of the class in mathematics, the lower the individual student’s self-concept in mathematics—after accounting for the student’s own skills in mathematics).
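As a compact restatement of the path model described above, the regression for student i in classroom j can be sketched as follows. The symbols are illustrative assumptions rather than notation from the original sources: SC denotes self-concept of ability, PERF the student’s own performance, and the barred term the average performance of classroom j.

```latex
\mathrm{SC}_{ij} = \beta_0 + \beta_1\,\mathrm{PERF}_{ij} + \beta_2\,\overline{\mathrm{PERF}}_{j} + \varepsilon_{ij},
\qquad \beta_1 > 0, \quad \beta_2 < 0
```

In this notation, the BFLP effect corresponds to a negative β2 once the student’s own performance (β1) is accounted for.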

Fig. 1 The big-fish-little-pond effect (BFLP effect) model (adapted from Marsh et al. 2007)

The mechanism underlying the BFLP effect has been explained from a number of different perspectives (Marsh and O’Mara 2010), which all emphasize that comparison processes are inevitable in a classroom (Marsh et al. 2007, 2018). The negative effect has been shown to be similar regardless of students’ own performance level (Marsh and Hau 2003). In samples involving adolescents and young adults in selective school systems, evidence has indicated cross-cultural generalizability of the BFLP effect model (Marsh and Hau 2003) and also support for the model in a variety of academic domains (Parker et al. 2013). Altogether, these studies attest that evidence for the BFLP effect is highly robust among adolescents (Marsh et al. 2018).

Nonetheless, the previous research on BFLP effect has at least three limitations. First, the BFLP effect model has mostly been applied in the analysis of the effects of classes or schools on adolescents’ or young adults’ academic self-concept (e.g., Jonkmann et al. 2012; Marsh and O’Mara 2010; Parker et al. 2013; Trautwein et al. 2006). There is surprisingly scant research testing the BFLP effect among younger primary school students, even though examination of the BFLP effect is particularly interesting with respect to these years when students’ self-perception of their academic performance is being formed (see Guo et al. 2018; Marsh et al. 2015; Pinxten et al. 2015; Thijs et al. 2010; Trautwein et al. 2008; Wouters et al. 2013). Trautwein et al. (2008) examined the BFLP effect among 10-year-old children, but they focused on physical activity rather than academic skills. Thijs et al. (2010) and Wouters et al. (2013) studied the BFLP effect among Dutch and Belgian primary school students, finding that students in the last grade of primary schools (Mage = 11–12 years) used their classmates as referents in assessing their own academic performance. In a Belgian study by Pinxten et al. (2015), strong negative correlations were documented between class-average performance in math and literacy and academic self-concept in these subjects already in grade 4. Marsh et al. (2015) investigated evidence for negative classroom effect on students’ self-concept in mathematics by using data from fourth and eighth grade students in three Middle Eastern Islamic countries (Iran, Kuwait, and Tunisia), four Asian countries (Hong Kong, Japan, Taiwan, and Singapore), and six Western countries (Australia, UK, Italy, Norway, Scotland, and the USA). They documented a positive effect of students’ performance and a negative effect of class average performance on self-concept in mathematics in all countries in both age cohorts. 
When focusing on the results from Western countries, the variation in the negative BFLP effect was notably wide among fourth graders: it ranged from − 0.134 in Norway to − 0.482 in Italy. A study by Guo et al. (2018), which also focused on the BFLP effect among fourth graders, indicated that the BFLP effect might be weaker in reading than in math.

While there is currently some evidence that supports the assumption of negative BFLP effect already in the primary school context, more research is needed to establish whether these findings could be replicated in other educational systems. Furthermore, previous findings by Marsh et al. (2015) suggest replications of the BFLP effect would be especially relevant in Nordic countries where the effect might be different if compared to other Western countries. Thus, in the present study, data from students in grade 3 (age 9), grade 4 (age 10), and grade 6 (age 12) were used to examine whether the assumptions of the BFLP effect model already hold during the primary school years in Finland. Because the literature indicates slightly different effects for literacy and math (Guo et al. 2018), both subject domains were included in the study.

Second, prior studies examining BFLP effects on students’ self-concept have typically been conducted in academically selective school systems (e.g., Jonkmann et al. 2012; Marsh et al. 2007; Marsh and O’Mara 2010; Parker et al. 2013; Schurtz et al. 2014; Trautwein et al. 2006), where students are tracked according to their performance level. These studies have shown a particularly strong negative effect of classroom performance among all students attending academically selective schools or classes (Marsh and Hau 2003). The BFLP effect model has seldom been studied in nonselective educational settings, and more specifically, only a few researchers (Guo et al. 2018; Marsh et al. 2015; Pinxten et al. 2015) have studied it in nonselective primary school systems such as the one in which the present study was conducted.

Finally, although it has been claimed that the BFLP effect is robust, universal, and scarcely affected by any moderators (cf. Marsh et al. 2008), gender differences in the BFLP effect have only seldom been systematically examined (Marsh et al. 2007; Plieninger and Dickhäuser 2015; Preckel et al. 2008; Thijs et al. 2010), and the findings for gender have been contradictory. For example, Marsh et al. (2007) concluded that the difference between girls and boys in the BFLP effect for math self-concept is marginal. In contrast, results by Plieninger and Dickhäuser (2015) indicated that gender moderates the BFLP effect, as the negative classroom effect in science was substantially larger for girls than for boys. Further, Preckel et al. (2008) found that the negative classroom effect was particularly large among girls attending a special class for the gifted. However, given that only a few researchers have empirically investigated the generalizability of the BFLP effect across gender, and of those only Thijs et al. (2010) have focused on primary school, further research is needed on the topic.

Finnish school system

In Finland, compulsory education beginning at age 7 is an integrated 9-year structure intended for the entire age group. Finnish society strongly emphasizes the importance of equal educational opportunities for everyone, and, as a result, most Finnish children attend the public school system (only 1.4% attend private schools), and compulsory education is provided for all students completely free of charge (including free learning materials, books, and lunch). Furthermore, primary schools do not select or test students as part of admission (except for some schools with a special emphasis); instead, students come from the school’s catchment area (i.e., from the neighborhood) and are assigned a place by the municipal educational authorities. An overwhelming majority of students are enrolled in their nearest school. Principals form the classes more or less randomly rather than assigning students to classes based on, for instance, ability. Parents can, however, request that their child be placed in the same class as his or her best friend, but this is not guaranteed. Students often continue with the same classmates for 6 years until the end of the primary grades.

During the primary school phase, grades 1–6, students typically are taught all or most of the subjects by their class teacher. Class teachers in Finland are required to have a university Master of Education degree. Teachers are responsible for autonomously applying the curriculum guidelines and assessing student progress on the basis of the objectives of the national core curriculum, and no national tests or evaluations are administered in primary school. All students have the right to general and special support, individual guidance, and tailored instruction at school if needed (National Core Curriculum 2014).

Present study

The aim of this study is to test whether the assumptions of the BFLP effect model hold for primary school students in a nonselective educational system. First, we set out to examine to what extent students’ performance in literacy and in mathematics, assessed at grades 3, 4, and 6, is related to their self-concept in these subjects. In line with previous studies, we expected (Hypothesis 1) that academic performance in a particular subject would predict the level of self-concept in that subject (Aunola et al. 2002; Chapman and Tunmer 1997; Viljaranta et al. 2014). Second, we examine to what extent the average performance level in the classroom is related to students’ self-concept in literacy and in mathematics at grades 3, 4, and 6. The findings of the previous studies have demonstrated a classroom effect during primary grades in certain subjects (Guo et al. 2018; Marsh et al. 2015; Pinxten et al. 2015; Thijs et al. 2010; Trautwein et al. 2008). Thus, we hypothesized (Hypothesis 2) that classroom average performance level would be negatively related to students’ self-concept in literacy and in mathematics after controlling for individual student performance (Marsh et al. 2008), but we hypothesized that the effect might be stronger for math than for literacy (see Guo et al. 2018). Third, because previous research has found contradictory findings concerning the differences between girls and boys with respect to negative classroom effects (e.g., Marsh et al. 2008; Plieninger and Dickhäuser 2015; Preckel et al. 2008), we examined gender differences in these associations. However, no exact hypotheses for gender differences were set.

Method

Participants and procedure

This study is part of an extensive longitudinal First Steps study (Lerkkanen et al. 2006-2016) comprising approximately 2000 students who in the first phase were followed up from the beginning of their kindergarten year (year 2006) to the end of the sixth grade (year 2012), with simultaneous data gathering from their parents and teachers. The sample was drawn from four municipalities in different parts of Finland. In three of these municipalities, the whole age cohort participated, while in the fourth municipality, the participating students comprised approximately half of the age cohort. Parents gave their written consent for their children’s participation in the study.

The participants whose data were used in the present analyses consisted of an intensively followed subsample of 504 students (237 girls and 267 boys) from 133 grade 3 classrooms (the average age was 8 years and 3 months) attending general education (those attending special education classes were excluded from the present data). The selection of the subsample was performed in a stratified fashion so that there was at least one child from each classroom. This subsample consisted of children identified at the end of kindergarten as at risk for reading difficulties because of a lower level of pre-reading skills (n = 214), as well as randomly selected control children from the same classrooms (n = 290). Because of the high number of at-risk children, we compared the subsample (n = 504) to the total sample (N = 2000) of the follow-up in terms of children’s math and reading skills. The results of the independent samples t-test indicated that the present subsample did not differ from the total sample in terms of academic skills. Of these 504 grade 3 students, data were available for 482 students (225 girls and 257 boys from 130 classrooms) in grade 4 and for 365 students (162 girls and 203 boys from 119 classrooms) in grade 6.

Students were asked to provide ratings in individual test situations regarding their self-concept of ability in literacy and mathematics in the spring of grades 3, 4, and 6. In addition, they were tested at the same time points on their literacy and mathematics skills in group test situations with their classmates. Calculation of the classroom average performance scores for the BFLP effect models (i.e., aggregated performance) was based on information regarding all students from each classroom who participated in the group tests (N = 1889 at grade 3; N = 1852 at grade 4; and N = 1769 at grade 6).

In terms of family background, 7% of the children’s parents did not have education beyond comprehensive school (compulsory education up to the completion of grade 9), 31% had completed upper-secondary education (senior high school or vocational school, grades 10–12), 36% had a bachelor’s degree or a vocational college degree (3-year education at a college or university), and 26% had a master’s degree or higher (i.e., licentiate or doctoral degree). The sample was fairly representative of the Finnish population (Statistics Finland 2007).

Measures

Self-concept in literacy and mathematics

Students’ self-concept in literacy (reading and spelling) and in mathematics was measured using the Self-Concept of Ability Scale (Nicholls 1978; for validity, see Aunola et al. 2002). In the test, students were presented with 10 circles arranged in a column from the top to the bottom of the page. The students were told that the chain of circles represents a group of students, with the topmost circle (value 1) representing the student who is best at a certain subject, the circle at the bottom (value 10) representing the student with the poorest skills, and the others representing students with skills in between. The participants responded by pointing to one of the circles. For the analyses, the scale was reversed (i.e., value 10 standing for high self-concept of ability and value 1 standing for low self-concept of ability). In grade 3, the question was similar for literacy and for mathematics: “Can you show me how good you are at reading and spelling/at mathematics? Which one is you?”. In grades 4 and 6, the variable measuring students’ self-concept in literacy was constructed based on six questions. Three of the questions measured self-concept of ability in reading (general reading-related self-concept, reading fluency, and reading comprehension): “Can you show me how good you are at reading/reading fast/understanding what you read? Which one is you?”. Three of the questions measured self-concept of ability in spelling (general writing-related self-concept, spelling, and productive writing): “Can you show me how good you are at writing/spelling/productive writing? Which one is you?”. Cronbach alpha reliability was 0.81 at grade 4 and 0.89 at grade 6. 
The variable measuring grade 4 and grade 6 students’ self-concept in mathematics was constructed using three questions measuring self-concept of ability in mathematics (general math skills, automatization of addition and subtraction skills, and multiplication): “Can you show me how good you are at math/multiplication/mental arithmetic? Which one is you?”. Cronbach alpha reliability was 0.84 in grade 4 and 0.88 in grade 6.
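The two scoring steps described above, reversing the 1–10 circle scale and estimating internal consistency, can be illustrated with a minimal sketch. The function names and the example ratings are hypothetical, and the sketch uses the standard textbook formula for Cronbach’s alpha; the source does not describe how its software computed the coefficient.

```python
from statistics import variance
from typing import Sequence

SCALE_MIN, SCALE_MAX = 1, 10  # circle values on the response sheet


def reverse_score(raw: int) -> int:
    """Reverse a 1-10 rating so that 10 stands for high self-concept."""
    if not SCALE_MIN <= raw <= SCALE_MAX:
        raise ValueError("rating must lie between 1 and 10")
    return SCALE_MIN + SCALE_MAX - raw


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a students-by-items matrix (rows[s][k])."""
    k = len(rows[0])
    # Variance of each item across students.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of each student's summed score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical raw ratings of three students on three items.
raw = [[9, 8, 9], [5, 6, 5], [2, 3, 1]]
reversed_ratings = [[reverse_score(r) for r in row] for row in raw]
alpha = cronbach_alpha(reversed_ratings)
```

Reversing every item leaves alpha unchanged, so the coefficient can be computed either before or after the reversal.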

Literacy performance

A variable measuring students’ literacy performance was constructed based on students’ scores in group tests of reading fluency, reading comprehension, and pseudoword spelling.

Reading fluency was assessed using a group-administered subtest of the nationally standardized reading test battery (Lindeman 1998). In this speed test, a maximum of 80 items could be attempted in a 2-min time limit. For each item, there was a picture with four words next to it, and students were asked to read silently the four (phonologically similar) words and to then draw a line connecting the picture to the word semantically matching it. The measure was scored by calculating the number of correct answers (maximum score possible = 80). Because of the nature of this speed test, the score reflects both the student’s fluency in reading the stimulus words and his or her accuracy in making the correct choice from among the alternatives. In the highly transparent orthography of the Finnish language, differences between students have typically been identified using this type of reading test (Holopainen et al. 2001).

Reading comprehension was assessed using a group-administered subtest of the nationally normed reading test battery (Lindeman 1998). The students were asked to silently read a fictional story and then answer 12 multiple-choice questions. The students received one point for each correct answer (maximum score possible = 12). The students completed the task at their own pace, but the maximum time allotted was 45 min.

Pseudoword spelling was assessed using a group-administered task with 8 items consisting of a one-syllable pseudoword (1 item: vuil), two-syllable pseudowords (2 items: saihdi; raalsku), and three- and four-syllable pseudowords (5 items: hiuruutti; seivolssi; paunitteri; ruustivaimu; nuppasengit) (Torppa et al. 2010). Each item was presented orally twice. The score was the number of correctly spelled items (maximum score = 8). The composite score for literacy performance in grades 3, 4, and 6 was formed by calculating a mean based on the standardized test scores of reading fluency, reading comprehension, and pseudoword spelling.
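The composite described above (a mean of standardized test scores) can be sketched as follows. The scores are invented for illustration, and the sketch assumes standardization with the sample standard deviation within one grade; the source does not specify these details.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*tests):
    """Mean of the standardized scores across tests, per student.

    Each argument is a list of raw scores, one entry per student,
    with students in the same order in every list.
    """
    standardized = [z_scores(t) for t in tests]
    return [mean(per_student) for per_student in zip(*standardized)]


# Hypothetical raw scores for three students.
fluency = [30, 40, 50]        # out of 80
comprehension = [6, 8, 10]    # out of 12
spelling = [4, 5, 6]          # out of 8
literacy = composite(fluency, comprehension, spelling)
```

Standardizing first puts the three tests on a common scale, so that the 80-item fluency test does not dominate the 8-item spelling task in the mean.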

Mathematics performance

A variable measuring students’ mathematics performance was constructed in grades 3 and 4 based on students’ scores in fluency in arithmetic skills and arithmetic reasoning. The two assessments were weighted equally in computing the mean of mathematics performance. In grade 6, students’ performance in mathematics was based on their scores in fluency in arithmetic skills. Thus, the focus of the assessments employed was on mastery of basic arithmetic skills.

Fluency in arithmetic skills was assessed using the group-administered Basic Arithmetic Test (Aunola and Räsänen 2007; see also Räsänen et al. 2009). In this speed test, a maximum of 28 items, consisting of 14 items for addition (e.g., 2 + 1 = ? and 3 + 4 + 6 = ?) and 14 items for subtraction (e.g., 4 − 1 = ? and 20 − 2 − 4 = ?), is attempted in a 3-min time limit (in grade 6, four of these items included also division or multiplication). Task difficulty increases gradually across the test. The test indexes a combination of speed and accuracy (see Zhang et al. 2014). The final score is the total number of correct answers (maximum score possible = 28).

Arithmetic reasoning was assessed using a NMART test (Koponen and Räsänen 2003; see also Langdon and Warrington 1997) with 30 items in which students are asked to continue a series of three numbers by adding a fourth number that would complete the series. For example, an item would involve a series of three numbers (e.g., 3, 5, 7), and the student needs to circle the number out of the four additional numbers given as alternatives that best fits as the fourth number in the series. After four practice items, the students completed the test at their own pace, but the maximum time allotted was 10 min for 30 tasks. It was assumed that on average students would complete the tasks approximately in 5 min, and therefore, the test was not a speed test. One point was given for each correct answer. The final score was the total number of correct answers (maximum score possible = 30).

The composite score for mathematics performance in grades 3 and 4 was formed by calculating a mean based on the standardized test scores of arithmetic fluency and arithmetic reasoning. In grade 6, fluency in arithmetic skills was used as the measure of mathematics performance.

Classroom average performance

Students’ average performance in literacy (reading fluency, reading comprehension, and pseudoword spelling) was aggregated across the students in a classroom, i.e., students’ mean score of performance level was calculated for each classroom. Students’ performance in mathematics (arithmetic and arithmetic reasoning) was also aggregated across students in a classroom by calculating mean score of the performance level in mathematics in each classroom.
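The aggregation step can be sketched as a per-classroom mean that is then attached to each student as a predictor. The classroom identifiers and scores below are hypothetical, and the sketch assumes that every tested student, including the target student, contributes to the classroom mean; the source does not state this explicitly.

```python
from collections import defaultdict
from statistics import mean


def classroom_means(records):
    """Map each classroom id to the mean performance of its students.

    records: iterable of (classroom_id, performance_score) pairs.
    """
    by_class = defaultdict(list)
    for classroom_id, score in records:
        by_class[classroom_id].append(score)
    return {cid: mean(scores) for cid, scores in by_class.items()}


# Hypothetical standardized performance scores.
records = [("A", 1.0), ("A", 3.0), ("B", -1.0)]
means = classroom_means(records)

# Attach the classroom average to every student record.
with_context = [(cid, score, means[cid]) for cid, score in records]
```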

Analytical strategy

The analyses were performed using the Mplus statistical package (version 7.01; Muthén and Muthén 1998–2012). The standard missing at random (MAR) approach was applied (Muthén and Muthén 1998–2012). This missing-data method uses all the data that is available in order to estimate the model without imputing data. The parameters of the models were estimated using the full information maximum likelihood (FIML) estimation with non-normality robust standard errors (MLR estimator). As the data were hierarchical in nature (i.e., students were nested in classrooms), we calculated intraclass correlations (ICCs) and design effects. The ICCs for self-concept of ability in literacy were 0.02, 0.01, and 0.01, and p values = 0.78, 0.81, and 0.81, in grades 3, 4, and 6, respectively. The ICCs for self-concept of ability in math were 0.03, 0.01, and 0.01, and p values = 0.87, 0.89, and 0.88, in grades 3, 4, and 6, respectively. The design effects were for self-concept of ability in literacy, 1.06, 1.03, and 1.02, and for self-concept of ability in math, 1.08, 1.03, and 1.02, respectively. Hox and Maas (2002) suggested that analyzing multilevel data as single-level data can yield acceptable (not overly biased) parameter estimates and inferential tests, if the design effects are smaller than 2.0. Therefore, we used the Type = COMPLEX approach (Muthén and Muthén 1998–2012). This COMPLEX approach estimates the model at the level of the whole sample, but corrects for distortions in standard errors in estimation caused by the clustering of observations (i.e., clustering effect of students nested in classrooms). Such methodology has been used previously by Marsh and O’Mara (2010) when investigating the BFLP effect. Differences between girls and boys in the BFLP effect model were tested using a multigroup approach, and differences between models were tested with the Satorra–Bentler scaled χ2 test for difference. 
The goodness-of-fit of the estimated models was evaluated using the following four indicators: χ2 test, comparative fit index (CFI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR).
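The source reports design effects without giving the formula. A minimal sketch, assuming the commonly used approximation 1 + (average cluster size − 1) × ICC and taking the average cluster size as the number of sampled students divided by the number of classrooms (both assumptions), closely reproduces the grade 3 values:

```python
def design_effect(icc: float, avg_cluster_size: float) -> float:
    """Approximate design effect for clustered data."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


# Grade 3: 504 sampled students in 133 classrooms (figures from the text).
avg_size = 504 / 133
deff_literacy = design_effect(0.02, avg_size)  # rounds to 1.06
deff_math = design_effect(0.03, avg_size)      # rounds to 1.08
```

With fewer than four sampled students per classroom and ICCs near zero, the design effect stays far below the 2.0 threshold cited above.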

Results

Descriptive analyses

Descriptive statistics, correlations between the observed variables, and mean differences between girls and boys in performance and self-concept in literacy and mathematics are presented in Tables 1, 2, and 3. Students’ performance in literacy and their self-concept in literacy, as well as performance in math and self-concept in math, had a statistically significant positive correlation in grades 3, 4, and 6. Independent samples t tests showed that literacy performance and self-concept in literacy were both significantly higher for girls than for boys across all grades. In mathematics, boys performed significantly better than girls in grades 3 and 4, but in grade 6, girls and boys did not differ in mathematics performance. However, self-concept in mathematics was significantly higher for boys than for girls across all grades.

Table 1 Correlations, means (M), standard deviations (SD), and mean differences between girls and boys (t) at grade 3
Table 2 Correlations, means (M), standard deviations (SD), and mean differences between girls and boys (t) at grade 4
Table 3 Correlations, means (M), standard deviations (SD), and mean differences between girls and boys (t) at grade 6

Models for the big-fish-little-pond effect

The path models testing the assumptions of the BFLP effect model examined the associations between students’ individual academic performance and self-concept of ability, as well as the associations between classroom average performance and self-concept of ability. In this model, evidence of a negative classroom effect would be indicative of the BFLP effect. Models were calculated separately for literacy and mathematics within each grade level. Differences between girls and boys in the BFLP effect models were tested using a multigroup approach.

Path models in grade 3

The results of grade 3 models are presented in Fig. 2. For literacy, the gender-constrained model (i.e., a similar model for girls and boys) suggested that there is a difference between girls and boys [χ2(3) = 8.10, p = 0.04; CFI = 0.97, RMSEA = 0.08, SRMR = 0.08]. The Satorra–Bentler scaled χ2 test for difference [∆χ2(∆1) = 9.89, p = 0.002] indicated that individual performance was more strongly related to self-concept in literacy among girls than boys. The analyses did not show differences between genders concerning the negative effect of classroom performance in literacy, and the effect of average classroom performance was nonsignificant for both girls and boys.
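For readers unfamiliar with the scaled difference test used here, the following sketch shows the usual hand computation from two nested models estimated with a robust estimator. It is an illustration under stated assumptions, not the authors’ procedure: the function names are hypothetical, the inputs (scaled χ2 values, scaling correction factors, degrees of freedom) would come from the software output, and the p value helper is valid only for 1 degree of freedom.

```python
import math


def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference.

    t0, c0, d0: scaled chi-square, scaling correction factor, and degrees
                of freedom of the nested (more constrained) model.
    t1, c1, d1: the same quantities for the less constrained model.
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling correction for the difference
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


def chi2_p_value_df1(x):
    """Upper-tail p value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# With correction factors of 1 the test reduces to the ordinary difference.
stat, df = sb_scaled_difference(12.0, 1.0, 3, 2.0, 1.0, 2)
```

Applied to the reported statistic of 9.89 with 1 degree of freedom, `chi2_p_value_df1` returns a value that rounds to 0.002, in line with the text.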

Fig. 2 Effect of individual performance and average classroom performance on students’ self-concept in literacy (a) and in mathematics (b) in grade 3: standardized solution. Note. ***p < 0.001. The first value is for girls and the second is for boys (significantly different values are bolded)

For mathematics, the gender-constrained model (i.e., similar model for girls and boys) provided an excellent fit [χ2(3) = 2.22, p > 0.05; CFI = 1.00, RMSEA = 0.00, SRMR = 0.03] (Fig. 2), and further analyses showed that girls and boys did not differ statistically in any path. Individual performance in mathematics was significantly and positively related to both girls’ and boys’ self-concept in mathematics. Although the χ2 test for difference [∆χ2(∆1) = 1.69, p > 0.05] indicated that the negative classroom effect in mathematics was similar among boys and girls, the path coefficients indicated a significant negative classroom effect for boys but a nonsignificant effect for girls.

Taken together, the grade 3 models indicated that high-level individual performance was positively related to students’ self-concept in literacy and in mathematics, but in literacy, the relation was stronger among girls than boys. Average classroom performance level was negatively related only to boys’ self-concept in mathematics.

Path models in grade 4

Next, we tested the effects of individual performance and classroom average performance for literacy and mathematics in grade 4 (Fig. 3). The gender-constrained model provided an excellent fit for the literacy model: [χ2(3) = 3.70, p > 0.05; CFI = 1.00, RMSEA = 0.03, SRMR = 0.06], and further analyses indicated that girls and boys did not differ statistically in any path. A significant positive association was found between students’ individual performance in literacy and their self-concept in literacy for both girls and boys. Classroom average performance was not significantly associated with either girls’ or boys’ self-concept in literacy.

Fig. 3

Effect of individual performance and average classroom performance on students’ self-concept in literacy (a) and in mathematics (b) in grade 4: standardized solution. Note. *p < 0.05; ***p < 0.001. The first value is for girls and the second is for boys (significantly different values are bolded)

For mathematics, the gender-constrained model provided a poor fit [χ2(3) = 10.31, p = 0.02; CFI = 0.97, RMSEA = 0.10, SRMR = 0.11], indicating that certain differences between girls and boys were present. The Satorra–Bentler scaled χ2 test for difference [∆χ2(∆1) = 9.98, p = 0.002] showed that the path from individual performance to self-concept in mathematics was stronger among girls than boys. Although the χ2 test for difference [∆χ2(∆1) = 1.96, p > 0.05] failed to show a difference in the classroom effect between girls and boys, the path coefficients indicated a significant negative classroom effect for boys but not for girls.
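The p-values attached to the single-degree-of-freedom difference tests above can be checked with a few lines of Python. This is only a sketch: it assumes that the reported ∆χ2 values are already the (scaled) difference statistics and that they are referred to a χ2 distribution with 1 degree of freedom, for which the upper-tail probability equals erfc(√(x/2)). The function name `chi2_diff_p_df1` is hypothetical and is not part of the original analysis.

```python
import math


def chi2_diff_p_df1(delta_chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)), so only the standard library is needed.
    """
    if delta_chi2 < 0:
        raise ValueError("A chi-square statistic cannot be negative.")
    return math.erfc(math.sqrt(delta_chi2 / 2.0))


# Difference statistics reported for the grade 4 mathematics model.
p_individual = chi2_diff_p_df1(9.98)  # path: individual performance -> self-concept
p_classroom = chi2_diff_p_df1(1.96)   # path: classroom average -> self-concept

print(f"individual path: p = {p_individual:.3f}")
print(f"classroom path:  p = {p_classroom:.3f}")
```

Under these assumptions, the first value rounds to 0.002 and the second lies well above 0.05, which agrees with the values reported for the two tests.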

Taken together, the grade 4 models indicated that individual performance was positively related to students’ self-concept in literacy and in mathematics. Furthermore, in mathematics, this association was stronger for girls than for boys. However, as in grade 3, the assumed negative BFLP effect was found only for boys and only in mathematics.

Path models in grade 6

Finally, we tested the effects of individual performance and classroom average performance in grade 6 (Fig. 4). The gender-constrained model provided an excellent fit for the literacy model: [χ2(3) = 4.80, p > .05; CFI = 0.98, RMSEA = 0.06, SRMR = 0.04]. The results did not indicate differences between girls and boys in the relation between individual performance in literacy and self-concept in literacy: Individual performance in literacy was significantly and positively related to both girls’ and boys’ self-concept in literacy. The findings concerning average classroom performance indicated a negative classroom effect [∆χ2(∆1) = 3.98, p = .045] for boys but not for girls in literacy.

Fig. 4

Effect of individual performance and average classroom performance on students’ self-concept in literacy (a) and in mathematics (b) in grade 6: standardized solution. Note. *p < 0.05; **p < 0.01; ***p < 0.001. The first value is for girls and the second is for boys (significantly different values are bolded)

For mathematics, the gender-constrained model provided an excellent fit [χ2(3) = 0.48, p > .05; CFI = 1.00, RMSEA = 0.00, SRMR = 0.03]. Furthermore, the χ2-test for difference indicated no differences between genders. The results showed a strong positive relation between individual performance and self-concept in mathematics for girls as well as for boys. The negative classroom effect was significant for both girls and boys.

In sum, the findings for Grade 6 indicated that the positive relation between individual performance and students’ self-concept in literacy and mathematics was strong and similar among girls and boys. A negative classroom effect was present in literacy only for boys, but in mathematics this effect was present regardless of gender.

Discussion

According to the BFLP effect model, a student’s self-concept of ability is dependent on the performance level of the group, as well as the student’s own individual performance level (Marsh et al. 2007). In the present study, using a Finnish sample, we examined whether the assumptions of the BFLP effect model would already be visible among primary school students (Grades 3, 4, and 6) in a nonselective school system. Furthermore, we examined whether there are gender differences in the BFLP effect. The results showed that students’ individual performance in both literacy and mathematics was positively associated with their self-concepts in literacy and mathematics across all grades. A small negative classroom effect proposed by the BFLP effect model emerged in mathematics. This was indicated by a lower self-concept of ability in classrooms with a high average performance level among boys in all grades, whereas among girls this was found only in Grade 6. In literacy, the data did not show evidence for the negative classroom effect proposed by the BFLP effect model in Grades 3 or 4, and in Grade 6 it was found only among boys. The finding of relatively small negative classroom effects in comparison to some prior studies may suggest that classrooms free of selection and tracking protect against the social comparisons that typically lower students’ self-concept of ability.
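Schematically, and setting aside the gender grouping and any covariances estimated in the path models, the two predictions of the BFLP effect model can be written as a single regression. The symbols are introduced here only for illustration and are not taken from the original analysis: SC_ij is the self-concept of student i in classroom j, P_ij is that student's own performance, and P̄_j is the average performance of classroom j.

```latex
\[
SC_{ij} = \beta_0 + \beta_1 P_{ij} + \beta_2 \bar{P}_{j} + \varepsilon_{ij},
\qquad \beta_1 > 0, \quad \beta_2 < 0
\]
```

In this notation, the positive association with individual performance corresponds to β1 > 0, and the negative classroom effect corresponds to β2 < 0.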

The results of the present study showed, in line with previous literature, that students’ own performance level was positively associated with their self-concept of ability at each grade level. This was true for both literacy and mathematics and for both boys and girls: the better the individuals’ own performance level, the higher the level of their self-concept of ability in that particular subject. The third graders’ performance in literacy and mathematics was already highly related to their self-ratings of their standing in comparison to their peers in the respective subjects. This result is in line with previous studies showing a positive association between performance and academic self-concept (Aunola et al. 2002; Chapman and Tunmer 1997; Viljaranta et al. 2014). As suggested by motivational models, feedback from teachers and peers, along with skill level and accomplishments, is likely to affect students’ self-concept of ability (for a review, see Muenks et al. 2018). Teachers’ feedback in the classroom, in particular, is seen to contribute to students' evolving understanding of their abilities in academic situations (Burnett 2003; Furtak et al. 2016; Pesu et al. 2016).

The results provided some evidence for the assumed negative classroom effect on self-concept of ability in mathematics, especially for boys, after controlling for individual student performance, whereas evidence for a negative classroom effect in literacy was very scant and emerged only in Grade 6 and only for boys. In mathematics, the average performance level in the classroom was negatively associated with boys’ self-concept of ability in mathematics at each grade level, although the effect was small in comparison to recent results from the Belgian primary school system (see Pinxten et al. 2015). For girls, the evidence failed to show a negative classroom effect in mathematics in Grades 3 and 4, but a small effect was found in Grade 6. In literacy, the analyses failed to show a negative classroom effect for self-concept in Grades 3 or 4, and in Grade 6 the effect was found only for boys. The findings thus provide some support for the BFLP effect model, which proposes that social comparison processes affect students’ academic self-concept already in primary school classrooms (see also Guo et al. 2018; Marsh et al. 2015; Pinxten et al. 2015; Thijs et al. 2010). However, given the small effect size, the support was weak.

It seems that in the Finnish nonselective school system, where children’s performance level in the classroom can be quite heterogeneous due to the lack of selection or tracking, the negative classroom effect on students’ self-concept of ability is small. This suggestion is in line with results by Chiu et al. (2017) showing that high heterogeneity in primary school classrooms (in terms of family background and past achievement) benefits students’ reading achievement. The findings also corroborate the results by Marsh et al. (2007, 2018) showing that in Norway (where the educational system is very similar to that of Finland), the negative BFLP effect among fourth graders was also small in comparison to several other Western countries. However, it is important to note that the BFLP effect has rarely been examined among young primary school students. It is possible that the scant evidence for the BFLP effect in the present study is also partially due to participants’ age. Nicholls (1978) argued already in the late 1970s that children between 9 and 12 years of age can differentiate ability and effort as causes of outcomes but do not always apply this understanding. This statement has been criticized recently, for example by Cimpian (2017; for a review, see Muenks et al. 2018), who showed that children are able to understand abilities at a much earlier age. Thus, future research should continue to examine the developmental differentiation of self-concepts of ability and the role of the context in the BFLP effect in primary school.

The results in mathematics differed from those in literacy, which is in accordance with recent findings by Guo et al. (2018). The difference between the subjects may be associated with the different nature of these subjects, the beliefs about the capabilities required to succeed in them, and the visibility of differences in skill levels. For a negative BFLP effect to occur, students need to have accurate information about the accomplishments of their classmates. Students’ exposure to comparable information may be more likely in mathematics than in literacy. Mathematics is a hierarchically structured content and skill area, where new knowledge is cumulatively based on previously established routines and understanding. In mathematics, classmates may more easily notice when a student falls behind the rest of the class. In literacy, by contrast, comparing skills after the basic reading acquisition stage is not as straightforward as in mathematics, especially in the Finnish context, where children typically reach fluent and accurate reading skills by the end of second grade (Lyytinen et al. 2006). Another possibility is, as Guo et al. (2018) suggest, that the BFLP effect is stronger when students’ opportunities to evaluate their competence are restricted to the school environment. Literacy skills may, more often than math skills, be used as a resource and manifested in daily life outside of school; thus, students may construct their self-concept in literacy from a broader context, drawing on experiences gathered in both formal and informal settings, than their self-concept in math.

The findings revealed interesting differences between girls and boys in self-concept of ability and, particularly, in relation to classroom effects. The analyses showed that girls performed better in literacy at all grade levels and that their literacy self-concept was higher than that of boys. Moreover, a negative BFLP effect did not emerge in literacy among girls. By contrast, boys performed better than girls in mathematics in Grades 3 and 4, and their mathematics self-concept was higher than that of girls in all grades. However, among boys, a small negative classroom effect was present in mathematics already in Grade 3 and continued to be evident in Grades 4 and 6, whereas among girls it emerged only in Grade 6. These findings extend our understanding of the BFLP effect by showing, first, that average classroom performance has a different effect on girls’ and boys’ self-concept of ability (see also Plieninger and Dickhäuser 2015; Preckel et al. 2008) and, second, that the effect is dependent on the school subject. The results imply that boys start to compare their performance with their classmates earlier than girls and that the classroom context affects boys’ mathematics self-concept negatively already at an early phase in their school career.

The results highlight the relevant role of the classroom environment and the potential influences of teacher-student and peer relations and of practices of evaluation and feedback on students' self-concept of ability. Comparison with peers and apprehension about one’s own standing in a group are greater in educational settings which emphasize being better than others and where evaluation standards are comparative (Anderman and Midgley 1997). It is, therefore, essential that teachers’ beliefs and ensuing practices do not produce or reinforce competition and comparison among students but rather create a classroom atmosphere where students can experience success and satisfaction regardless of their ability level and feel a sense of personal accomplishment and growing mastery (see Bong and Skaalvik 2003). In addition to maintaining a collaborative classroom climate, negative classroom effects on academic self-concept may be lessened by well-developed teacher awareness of students’ individual needs and by competence in adapting tasks and activities to accommodate these needs in order to match the students’ zone of proximal development (Pressley et al. 1996). Optimal differentiation and individuation is a challenging instructional task requiring both sensitivity to individual students’ academic, motivational, and emotional needs and, at the same time, attention to the whole group of students and classroom dynamics. This kind of responsive orchestration at its best requires small enough class sizes. Learning methods which encourage students to build autonomy and agency and provide feedback concerning their individual improvement seem to be one way, particularly in mathematics, to reduce social comparison processes (see also Marsh and Craven 2002).

Limitations

This study has at least five limitations that should be taken into account. First, a clear limitation of the study was that, in contrast to Grades 4 and 6, in which the self-concept scales of literacy and math consisted of several items, in Grade 3 the self-concept of ability scales consisted of one item, which is not psychometrically optimal. However, similar measures have also been used in previous studies focusing on early grades (e.g., Lazarides et al. 2018; Viljaranta et al. 2014). Furthermore, the relatively high correlations found both for self-concept in literacy (across-age correlations between .66 and .94) and for self-concept in math (across-age correlations between .44 and .68) suggest that the measures succeeded in tapping self-concept comparably at the respective ages. With respect to Grade 3, the findings need to be replicated, preferably by using a measure based on several items. Second, the math tests employed in the present study focused on mastery of basic arithmetic skills. Thus, the findings have to be replicated by using other math measures, such as those that focus on problem solving. Third, similar studies should also be conducted in countries with different education systems (e.g., in countries where tracking is more clearly present in schools and classrooms) to confirm whether the limited evidence for the BFLP effect that we found is due to the nonselective school system and not due to participants’ young age. Fourth, although we followed the same students from Grade 3 to Grade 6, it is important to emphasize that the results of the present study are based on correlational data collected at a single point in time. As such, we cannot suggest causal interpretations as a result of these models. Finally, not all differences between girls and boys were statistically significant according to the χ2 test for difference, although the paths were typically significant for only boys or only girls. Therefore, studies with other samples are needed before generalizing the results concerning gender differences.

Our study extended previous research by examining the BFLP effect already at an early stage of the school career. However, to fully understand the BFLP effect, it would be important in future studies to also examine, with different designs, how students’ self-concept develops in the classroom context. For example, it would be important to examine in more detail whether students in one class compare themselves with all classmates when forming their self-concept or whether they compare themselves mainly with a smaller reference group, such as friends or classmates of the same gender.

Conclusions

The present study extends previous findings on the big-fish-little-pond effect by showing that in a nonselective school system, average classroom performance has only a relatively small negative association with boys’ self-concept in mathematics already at an early phase of their school career. In literacy and among girls, the negative effect of average classroom performance did not emerge until the end of primary school.