16.1 Introduction

Dutch primary and secondary schools use a variety of tests, each with its own function. In this chapter, we will focus on the most important mathematics tests, describe the primary function of each of these tests, and explain how the tests are used for accountability. In the concluding discussion section, we will identify the difficulties associated with testing mathematics in the Netherlands.

Testing in the Netherlands focusses on monitoring whether or not educational objectives have been achieved and whether the students have mastered content-specific standards. These standards play an important role in allowing Dutch schools to operate relatively autonomously and design their own programs. The standards for different points in a student’s educational career are formulated by SLO (2008), the Netherlands Institute for Curriculum Development, under supervision of the Ministry of Education. Schools need to use these standards as guidelines for setting up the content of their educational programme. So, what schools have to teach is determined, but schools can choose how they work towards the objectives. A sample of mathematics objectives to be achieved at the end of primary education is shown in Fig. 16.1.

Fig. 16.1 Sample of objectives of primary mathematics

In addition to the relatively broad educational objectives that have been in place for quite some time, more detailed content standards have recently been introduced for basic competencies. As the minimum proficiency level of basic skills in the Dutch language and mathematics in secondary education, and particularly in teacher training programs, was considered too low, the Dutch government introduced content standards: the so-called ‘Referentieniveaus’ (Reference standards) for Dutch language and arithmetic (Expertgroep Doorlopende Leerlijnen voor Taal en Rekenen/wiskunde, 2007). These standards are described for the main transition points in the Dutch educational system: end of primary education, end of secondary education, and end of vocational education. For each transition point, a foundation level (1F, 2F, and 3F) and an ambition level (1S, 2S, and 3S) are specified. All students in a particular school type or track should be able to master the foundation level, while a substantial percentage of students should also be able to master the more challenging ambition level.

Finally, there is a set of standards in the guiding material and test specifications for the construction of national tests and examinations. In so-called syllabi, more detailed descriptions are given of the objectives. These syllabi are specified by the College voor Toetsen en Examens (CvTE) for all subjects in secondary education and for mathematics and the Dutch language in primary education. They contain examples of potential examination problems to indicate both the difficulty level and the content of the national examinations and tests. Part of the mathematics syllabus for the pre-university level of secondary education in the domain of algebra is shown in Fig. 16.2.

Fig. 16.2 Part of the mathematics syllabus for the pre-university level of secondary education for the domain of algebra

The objectives, syllabi, and content standards together form the base for testing mathematics in the Netherlands. With this framework in mind, we will now describe the different tests used in primary and secondary education.

16.2 Testing Mathematics in the Netherlands

16.2.1 Dutch Education System

Figure 16.3 shows the main elements of the Dutch education system. Primary education includes eight years, starting with two kindergarten years. Children can go to school at the age of four. From the age of five, school is mandatory.

Fig. 16.3 The Dutch school system

Students finish primary education around age twelve and enter secondary education. Secondary education is tracked into three school types:

  • VMBO: Pre-vocational secondary education, duration 4 years, subdivided into different levels

  • HAVO: General secondary education, duration 5 years

  • VWO: Pre-university secondary education, duration 6 years.

Hereafter, students can go to different levels of further education:

  • MBO: Intermediate vocational education, duration 1–4 years, subdivided into different levels

  • HBO: Higher professional education (also called ‘universities of applied sciences’)

  • University.

At the end of each school level, students have to reach particular achievement standards for mathematics/arithmetic (Fig. 16.3).

16.2.2 Primary Education

The main objective in primary education, meant for students aged 4–12, is that students (1) gain, gradually and in meaningful contexts, familiarity with numbers, measures, shapes, structures, and their appropriate relationships and calculations; (2) learn to use the language of mathematics; and (3) are able to deal with various sources of content, including daily life, other courses, and pure mathematics (OCW, 2015).

At the end of primary education, teachers advise students on their secondary education track. To confirm this advice, schools are obliged to administer a test in Dutch language and mathematics. Schools can choose between a number of different tests. When the teacher recommends a lower educational track than that indicated by the test, the teacher’s advice can be reconsidered. The end of primary school test also measures whether students have mastered the foundational (1F) or ambition (1S) level standards for mathematics (and Dutch language).

In addition to the test’s primary function of indicating a secondary education track or verifying the teacher’s recommendation, the aggregated test results for all students can also be used to diagnose areas of improvement for the school (Béguin & Ehren, 2010). For example, they can be used to determine which subjects require more attention and to determine whether measures for improvement have been effective. Most schools use the End Primary School Test, developed by Cito, the Netherlands national institute for educational measurement. The CvTE is mandated by the government of the Netherlands to ensure the quality and proper administration of these national tests and examinations.

To monitor the development of primary students in a more formative way, a large number of schools use a monitoring and evaluation system. One commonly used system is LOVS, developed by Cito. This system contains tests for different subject domains and sub-domains (e.g., Dutch vocabulary and spelling, and mathematics) for Grades 1–6, with assessments twice a year. There is also a system for pre-schoolers (4- and 5-year-old children) for Dutch language and mathematics. The monitoring system for primary school mathematics is a mixture of mostly open-ended items covering different domains. Each assessment results in an ability score.

Because all mathematics tests in the monitoring system are correlated to each other, teachers can compare test results to those of a previously administered test to monitor student growth. The tests are standardised across the country, enabling teachers to compare individual or class test results and growth with the national average. In addition to indicating a student’s overall mathematics ability, the tests also provide information for further analysis. For example, the teacher can analyse whether a student scores very poorly or very high in specific areas. If the result for the sub-domain Numbers and Operations is relatively low and the result for the sub-domain Measurement is high, this could indicate that numbers and operations require additional attention.

The Cito Entrance Test for Grades 4–5, with an assessment once a year, is an alternative to the student monitoring system. This test uses a multiple-choice format. It provides a complete overview of the student’s skills in mathematics as well as in different sub-domains of Dutch language. In Grade 5, the Cito Entrance Test also provides information to indicate the appropriate secondary education track. All the aforementioned tests are also suitable for students with special educational needs.

The Cito LOVS does not assess mathematical fluency (quickly and correctly solving problems). Therefore, schools use several other tests to monitor this aspect of mathematics.

Along with the national standardised tests from Cito and other test providers, schools use other tests for mathematics such as the tests included in textbooks, various exercises, and (digital) test systems.

Appendix A shows some examples of the type of items which are incorporated in the Cito End Primary School Test and the Cito LOVS tests.

16.2.3 Secondary Education

Mathematics is taught in different ways in the different secondary education tracks. In the first few years of VMBO, the lower tracks of secondary education, the focus is on acquiring insight and skills in the sub-domains of numbers and operations, shapes and figures, quantities and measures, patterns, relations, and functions. Because of the vocational focus of this secondary education track, it is important to provide contexts in which mathematics can be applied: contexts related to everyday life, other subjects, further education, the workplace, and mathematics itself. In the later years of VMBO, mathematics is only a compulsory subject in the technical sectors; for other students it is an optional subject.

In the higher secondary education tracks, covering HAVO and VWO, of which the highest grades are subdivided into the profiles Nature & Technology, Nature & Health, Economy & Society, and Culture & Society, mathematics is a compulsory subject. There are different mathematics courses targeted at different profiles:

  • Mathematics A, targeted at the Society profiles but also permissible for students in the Nature & Health profile; the focus is more on using mathematical methods and on applications of mathematics.

  • Mathematics B, targeted at the Nature profiles and compulsory for students in the Nature & Technology profile; the focus is more on the abstract nature of mathematics.

  • Mathematics C, exclusively for students in pre-university education in the Culture & Society profile; the course has some overlap with Mathematics A.

  • Mathematics D, a supplementary mathematics course in the specialised or optional component of their profile, for students already taking Mathematics B. Schools are not required to offer a Mathematics D course.

Secondary education ends with a final examination in each subject. For most subjects, the final examination comprises a school examination and a national examination; some subjects, such as physical education, only have a school examination. The school examination is prepared by the individual school and is administered in the final school year or years. Tests can be written, oral, and practical. The national final examination is the same for all schools of a certain type and takes place at the same time in all schools. The student’s final mark in a subject is the average of the marks in the school and national examinations. Appendices C, D, and E show examples of examination items for the various mathematics courses. These items illustrate the significant differences among the mathematics courses.
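The averaging rule for the final mark described above can be illustrated with a minimal sketch. The function name, the check against a 1–10 mark scale, and the absence of any rounding are assumptions made for illustration only; the official rounding and weighting details are not specified in this chapter.

```python
def final_mark(school_mark: float, national_mark: float) -> float:
    """Average the school and national examination marks for one subject.

    Assumption: marks lie on a 1-10 scale and the average is unweighted
    and unrounded; official rounding rules are not modelled here.
    """
    for mark in (school_mark, national_mark):
        if not 1.0 <= mark <= 10.0:
            raise ValueError(f"mark out of range: {mark}")
    return (school_mark + national_mark) / 2


# Hypothetical student: 7.0 on the school examination, 6.0 nationally.
print(final_mark(7.0, 6.0))  # 6.5
```

For subjects with only a school examination, no averaging would apply, so this sketch covers only subjects that have both components.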

The national final examinations in the Netherlands are developed by Cito under the supervision of the CvTE. In the lower secondary educational track (VMBO), there are three national final mathematics examinations, differing in level. These examinations exist in both a paper-based and a computer-based version. In the higher secondary educational tracks (HAVO and VWO), there are, as mentioned before, national final mathematics examinations for Mathematics A, B, and C for each school level. Mathematics D has only a school examination. All the examinations are exclusively paper-based.

For secondary education, there are also monitoring and evaluation systems available for mathematics. An example of such a system is the Cito Monitoring System Secondary Education. This system contains four tests which can be administered over the first three years of secondary education. Students can be evaluated on a vertically equated scale (Béguin & Ehren, 2010). Schools have to monitor student progress in a standardised way, but can choose (or develop) their own system of tests. In addition to these standardised tests, secondary schools, like primary schools, use other tests as well, such as those prepared by the teacher.

16.3 Function of Tests

The previous section of this chapter describes different types of mathematics tests. In this section, we will describe the different functions of tests, followed by an outline of functions for the most commonly used tests. Tests can have four different functions: to evaluate and adjust instruction, to evaluate proficiency and make decisions about students, to evaluate proficiency and make decisions about classes and schools, and to evaluate proficiency and make decisions about the quality of the educational system.

16.3.1 Tests to Evaluate and Adjust Instruction

Tests, especially formative tests, ensure that instruction can be adjusted to the students. Tests are designed to provide information not only about the general level of the students but also about student development. Ideally, teachers can use test results to diagnose the specific help or instruction that students need. Examples of tests for evaluating and adjusting instruction are textbook tests and student monitoring systems. The goal of a textbook test is to assess whether students have mastered specific content. When a student answers (almost) all questions correctly, the teacher knows he or she can go on in the textbook. The goal of monitoring systems is to indicate students’ current ability levels and growth. These systems contain questions at different levels and in all categories. Teachers can use them to identify specific students who need more instruction or practice and which sub-domains need more attention. In primary education, the student monitoring systems do not aim to classify students. In practice, here the tests are used to identify students who need extra attention or extra challenges. Both in primary and secondary school the tests of the student monitoring systems are also used to choose a secondary education track.

16.3.2 Tests to Evaluate Proficiency and Make Decisions About Students

Tests can also be used to evaluate students’ proficiency and make decisions about students. Naturally, these two functions are related. In order to make decisions about a student, the teacher has to figure out whether the student meets the requirements for his or her grade level. This indicates a direction for the student’s future education.

There are four types of tests for evaluating proficiency and making decisions about students, specifically:

  • Tests for selecting students. An example is the examination a student has to pass in order to be admitted to further education, such as succeeding in the national examination for HAVO or VWO, with special requirements regarding the subjects that have been chosen, as a condition for acceptance to higher education.

  • Tests for classifying students. Examples are the end of primary school tests. The results of the tests indicate what type of secondary education is best suited for a student.

  • Tests for placement. An example is placement in special education. The results of the student monitoring systems are one indicator used to place a student in special education. For special education placements, these results must show that a student’s growth is below the growth one might expect for a student at a particular age.

  • Tests for certification. The best-known certification test in the Netherlands is the national examination at the end of secondary education.

16.3.3 Tests to Evaluate Proficiency and Make Decisions About Classes and Schools

Tests to evaluate proficiency of students can also be used to evaluate classes and schools. Class growth is central in making decisions about classes. When making these decisions several questions come up. What is the relationship between an increase in ability of a class and the past scores of this class? How is the increase in ability of a class compared to the national increase? But it is also possible to compare the current improvement with previous increases in ability within one school population. How does the improvement of this year’s Grade 2 class compare to that of last year’s Grade 2? The Cito LOVS incorporates these analyses. Appendix B illustrates and explains a trend analysis at school level.
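The comparison between class growth and national growth raised in the questions above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the example ability scores are hypothetical, and the actual analyses in the Cito LOVS are not described in enough detail here to reproduce them.

```python
def growth_comparison(class_prev: int, class_now: int,
                      national_prev: int, national_now: int) -> dict:
    """Compare a class's growth in mean ability score with national growth.

    Assumption: all four values are mean ability scores on the same scale,
    so that differences between them are meaningful.
    """
    class_growth = class_now - class_prev
    national_growth = national_now - national_prev
    return {
        "class_growth": class_growth,
        "national_growth": national_growth,
        # Positive: the class grew more than the national average.
        "difference": class_growth - national_growth,
    }


# Hypothetical scores: the class grows from 48 to 61, the nation from 50 to 60.
print(growth_comparison(48, 61, 50, 60))
# {'class_growth': 13, 'national_growth': 10, 'difference': 3}
```

The same function could be applied to two cohorts within one school (for example, this year's and last year's Grade 2) by passing the earlier cohort's scores in place of the national ones.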

16.3.4 Tests to Evaluate Proficiency and Make Decisions About the Quality of Education

Schools, school organisations, and also the education inspectorate can evaluate the quality of education. National and international assessments are used to evaluate the quality of education.

An example of a national assessment carried out by Cito is PPON. This assessment is used to evaluate primary school education in detail every five years. Information from this study is used by content experts and decision makers (Béguin & Ehren, 2010). The last PPON for mathematics, carried out in 2011, evaluated 22 different mathematical sub-domains (Scheltens, Vermeulen, & Van Weerden, 2013). In 2014, the responsibility for PPON-like national assessment shifted from the Ministry of Education to the Inspectorate of Education. This change of responsibility will lead to some differences in approach, but the necessity of a national assessment is beyond dispute.

Examples of international assessments are PISA and TIMSS. PISA (the Programme for International Student Assessment) takes place every three years and compares the knowledge and abilities of 15-year-olds in reading, mathematics, and science (Kordes, Bolsinova, Limpens, & Stolwijk, 2013).

The Netherlands also participates in TIMSS (Trends in International Mathematics and Science Study). TIMSS takes place every four years in Grades 4 and 8 and assesses mathematics and science skills. As in PISA, Dutch students score on average significantly higher than the international average (Meelissen et al., 2012). Table 16.1 summarises the different functions of the most commonly used mathematics tests.

Table 16.1 Outline of functions of the most commonly used mathematics tests

16.4 Use of Tests for Accountability

In the Netherlands, test scores are important for educational accountability. In addition to test evaluations, schools are evaluated by school inspectors who visit the schools. As the Inspectorate of Education is required by law to assess the educational quality that schools offer (including whether the school offers a safe learning environment to students), tests and annual reports are assumed to measure the quality of the school’s educational process. The inspectorate uses test scores to identify low-quality schools. Schools that have declining test scores or low test scores over a period of three years are considered to be failing or at risk of failing (Béguin & Ehren, 2010).
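The flagging rule described above (declining or low test scores over a period of three years) can be sketched as a simple check. The exact criteria applied by the inspectorate are not given in this chapter, so the norm value, the definition of "declining" as a strict year-on-year decrease, and the function name are all assumptions for illustration.

```python
from typing import List


def at_risk(yearly_scores: List[float], norm: float) -> bool:
    """Flag a school whose last three yearly scores are all low or declining.

    Assumptions: 'low' means below a hypothetical norm in each of the three
    years; 'declining' means each year is strictly lower than the one before.
    """
    if len(yearly_scores) < 3:
        raise ValueError("at least three years of scores are required")
    first, second, third = yearly_scores[-3:]
    low = all(score < norm for score in (first, second, third))
    declining = first > second > third
    return low or declining


# Hypothetical school averages against a hypothetical norm of 60.
print(at_risk([70, 66, 61], norm=60))  # True: scores decline every year
print(at_risk([55, 58, 57], norm=60))  # True: below the norm in all years
print(at_risk([62, 61, 64], norm=60))  # False
```

A rule of this kind only signals schools for closer attention; as the text notes, inspectors also visit schools, so test scores are not the only source of evidence.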

Based on the summative or formative function of the tests, it can be assumed that they are valid for measuring the proficiency of an individual student. However, this is not necessarily the case for the aggregated results that are used to indicate educational quality at the school level. Two aspects are important. First, aggregated results as an indicator can misrepresent educational quality if parts of the curriculum are not represented in the tests at hand. For example, the student monitoring tests for primary education do not contain relatively open problems in which the student is asked to combine different (mathematical) skills to reach a solution. Nevertheless, a relatively low score in the student monitoring test can still be validly interpreted as a potential lack of quality. Second, one can argue that a test that is a valid measurement of individual students must have different characteristics and content than a test that measures schools (Béguin & Ehren, 2010).

16.4.1 Primary Education

Until recently, the Inspectorate of Education used interim results on the student monitoring system and an end of primary school test as indicators to evaluate the proficiency of primary schools. A new framework for accountability has been available since 2016. This framework focusses on how schools use their test results. The inspectorate no longer sets standards for the interim results of the student monitoring system, but standards are still used for the end of primary school tests (OCW, 2016).

16.4.2 Secondary Education

Since 2016, the Inspectorate of Education has used indicators to judge the quality of a secondary school. First, the inspectorate compares the level of third-year secondary students (Grade 9) to the secondary school track advice that was given at the end of primary school. Next, the inspectorate looks at the percentage of students that pass the first year of secondary school without delay and the percentage of students that pass the last part of secondary school without delay. Finally, the results of the national examination are taken into account. These indicators are compared to a standard established by the inspectorate. The combination of the values achieved on these indicators forms a score for the school as a whole. Each of the components contributes to this score and overall it is a balanced system (OCW, 2015). The basic idea of this system of judgement is that schools might do better at one component but worse at another and that this compensates. So if, for example, a school challenges students to achieve a higher level of education than advised, the average scores of these students on the national final examinations can potentially be lower than the scores of students who follow the advised level of secondary school. This will affect the school’s indicator for results on the national examinations. Also, it is possible that these students might even need an extra year to finish their secondary education.

16.5 Discussion

16.5.1 Content-Related Issues

16.5.1.1 Testing with or Without Context

Dutch mathematical education has a strong tradition of Realistic Mathematics Education (RME). Mathematics has to be learned in meaningful situations. In the last ten years, a group of experts in mathematics education has advocated for more attention to learning algorithms, teaching fixed procedures for every operation, and teaching mathematics in less meaningful situations. This group has written their own primary education textbooks. As schools have autonomy, they are free to use an RME-based textbook or a mechanistic algorithm-based textbook (or something between the two). This also has consequences for the tests. Today’s assessments contain context problems as well as bare number problems. Nevertheless, schools may vary in the attention they pay to bare number problems and context problems. Therefore, it is possible that there are differences in the extent to which the assessments measure what is actually taught in the school.

Another point about RME is that, in problems that relate to real situations, students are faced with more complex situations in which different mathematical competences have to be combined. In tests, however, different competences are tested in isolation. This is partly because tests have to determine whether there are any gaps in mathematical skills. In order to determine this, it is necessary that each question focusses on one particular competence. This is because in more complex computational problems, the outcome is less clear and analysis is more difficult for teachers, making the results less reliable.

16.5.1.2 Should Mathematics be a Compulsory Subject?

In the Dutch educational system, in the lower grades of secondary education all students at each level must do mathematics, but this does not continue through the end of secondary education. In the pre-university secondary school track (VWO), all students are required to do mathematics. For the other levels, mathematics is not obligatory. So, the system requires that pre-university students know about mathematical relations and be able to do some mathematical thinking at a certain level, but for the majority of secondary students, mathematics is an elective. One could ask oneself what this means for society as a whole: will this lead to a social gap (or an increase in an existing gap) between university-educated citizens and others?

16.5.2 Use of Test Scores

Almost all tests, whether monitoring tests, diagnostic tests, or examinations, provide information about student progress towards content standards. In all these cases, mathematical ability is expressed as a value, for example, an ability score. To ascertain whether a student has attained a content standard, these standards are connected to an ability score. This is a convenient and effective way to assess whether a student has attained a particular standard. A disadvantage of this procedure is that mathematical ability is squeezed into one value. If a student scores strongly in one domain, this may compensate for a weakness in another domain. Therefore, students may pass a certain content standard according to the test without mastering the specific goals of all the reference standards because they exceed the standards in some other domains. A passing score, therefore, should always be considered in the light of a domain analysis. If a student scores equally (well) in all domains, it may be concluded with reasonable certainty that he or she has mastered the skills described in the reference standards. If the student scores relatively poorly in one or several domains, it is advisable to review the points from the ambition level reference standards in order to establish whether there are gaps to work on with the student.
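The compensation effect described above, in which a single overall value hides a weak domain, can be made concrete with a small sketch. It is deliberately simplified: real ability scores are not computed as an unweighted mean of domain scores, and the domain names, scores, and cut-off below are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def domain_analysis(domain_scores: Dict[str, float], cutoff: float) -> dict:
    """Report the overall result together with the domains below the cut-off.

    Assumption: the overall score is the unweighted mean of the domain
    scores and the same cut-off applies overall and per domain.
    """
    overall = sum(domain_scores.values()) / len(domain_scores)
    weak: List[str] = sorted(
        name for name, score in domain_scores.items() if score < cutoff
    )
    return {
        "overall": overall,
        "passes_overall": overall >= cutoff,
        "weak_domains": weak,
    }


# Hypothetical student: strong in three domains, weak in measurement.
scores = {"numbers": 80, "ratios": 75, "measurement": 45, "relations": 70}
print(domain_analysis(scores, cutoff=60))
# {'overall': 67.5, 'passes_overall': True, 'weak_domains': ['measurement']}
```

In this example the student passes on the overall value even though one domain falls clearly below the cut-off, which is exactly the situation in which a domain analysis adds information to a passing score.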

16.5.3 Use of Tests

16.5.3.1 Autonomy Versus Control

Schools in the Netherlands have the freedom to organise their own teaching programme. As a consequence, they have to account for their choices, for example to the school inspectors. This accountability policy places pressure on schools; they are busy fulfilling all necessary requirements. As a result, autonomy is not what schools experience. By focussing on controlling what schools do, and therefore on collecting test data, there is the risk that the tests partly prescribe the content of the teaching programme. Schools feel that they are judged by the results of the tests, and so they will try to achieve the highest scores. For some schools, this means that the tests determine what they emphasise in their teaching. In these cases, the school does not autonomously decide what they offer their students, but, to put it bluntly, the teaching programme is dictated by the tests.

16.5.3.2 Resistance Against Testing

As mentioned above, schools experience a lot of pressure from testing. Since primary education assessment occurs twice a year for about six subjects, it takes two weeks a year to administer these tests. In addition to that, the use of the student monitoring system is often seen as ‘testing for the school inspectors or the school board’ rather than a monitoring system for students. It is very counterproductive to use these tests for accountability.

Another type of resistance is against tests for pre-schoolers. As most Dutch children enter primary school at the age of 4 (attending school is required from age 5 on), there is a monitoring system for these young students, too. For mathematics, these tests measure some elementary knowledge of numbers, such as the ability to count, adding small collections of objects, and knowledge of mathematics-related terms such as long(er), short(er), first, and last. Like all monitoring tests, these tests are also ability tests. As a result, the test contains questions that the average student can do well, but it also includes questions designed for lower- and higher-than-average-level students. Since the test includes questions for the more proficient students, some of these assignments demand more than strictly required for the average-level goals. Especially when dealing with young children, this calls for a lot of discussions with teachers. On the one hand, they want to give their students an experience of success, thus, they want their students to pass as many questions as possible. On the other hand, teachers often talk about the importance of play for children aged 4 or 5. Some teachers find that, in order to cover all the topics that are in the tests, they cannot let the students play as much as they think is necessary for their students’ development. The message that students are also allowed to make mistakes in the monitoring tests is a difficult one and has been insufficiently communicated to schools.

16.5.3.3 Teaching-to-the-Test

The main goal of the (monitoring) tests is to monitor the development of students in order to adjust instruction to their potential and needs. The use of test results to assess the quality of schools is of minor importance. In actual practice, however, it seems that the main purpose of the monitoring tests is for external parties to assess school quality. The result of this is that schools, against all advice, adapt their teaching to the tests and have their students practise for the test. The consequence of this is that the expectations of the Inspectorate of Education rise further, since the average assessment test scores increase. The fact that the average assessment score increases does not mean that mathematics proficiency has automatically increased. In this case, the higher assessment scores are the result of more frequent passes of certain parts of the test, not an indicator that teaching methods in mathematics have improved overall. To fairly determine student ability, it is necessary to update the tests frequently. However, this is (very) expensive. In the future, adaptive testing, in which questions in each test are different for different students, may be one solution. Teaching aimed at specific test problems would then be less feasible for schools. Furthermore, schools should be encouraged to keep in mind the real purpose of the tests.

Teaching-to-the-test is a phenomenon that occurs not just with monitoring tests, but also, for example, with end of school tests and examinations. In these cases, however, it is less ‘helpful’ for schools because these types of test are updated annually. Therefore, teaching to specific assignments is not possible, and, in fact, it is never advisable.

16.5.3.4 Misuse of Tests

An end of primary school test is administered to ascertain whether a student has successfully completed the curriculum in order to decide whether he or she is ready for a certain type of secondary education. Monitoring tests serve a different purpose. They are, as mentioned, primarily meant to steer teaching efficiently towards students’ abilities. However, as ministerial policy on secondary school advice changes, there is a danger that monitoring tests could be given a different, more serious function than their original purpose. In 2015, both the time set for administering the end of primary school test and the aims of this test changed. Before 2015, it was meant to be an objective test, indicating a direction for a student along with teacher advice. The test was administered in February, before students had to choose a secondary school. A large number of secondary schools required a minimum score on the end of primary school test for admission. This made this test a very important one for students and their parents. To avoid misuse of the test, the government decided to move the time for this test to after students have registered for secondary school. Now, teacher advice is the primary factor in secondary school choice, and students can change their choice only when they get a higher than expected score on the end of primary school test. The result of this is that some secondary schools now require minimum scores on the students’ monitoring test progress results—again, a misuse, in this case of the monitoring tests instead of the end of school tests. Schools should use the monitoring tests only as a means of diagnosis and not as a selection tool. A positive development is that the results of the monitoring tests are no longer part of the inspectorate’s evaluation framework. As a result, the emphasis in schools moves to the primary goal: namely, identifying students’ capabilities and challenges.

16.5.3.5 One Test, Different Functions

On a related point, attention should be given to the use of a test for more than one purpose. Different types of tests each have their own goal and contribute to the quality of Dutch mathematics instruction in their own way. One is aimed at informing teachers and schools about student ability, while another test provides information about the school as a whole, and again other tests aim to determine the national level. As can be seen in Table 16.1, many tests are used for more than one purpose. In order to facilitate accurate assessments, each test should have its own goal(s) and, moreover, that goal (or those goals) should be clear for all parties involved. Only in this way can tests be used for the intended purpose, that is, as a means of improving education. Ultimately, all tests serve this purpose. Whether a test is meant to tune teaching to the needs of students or to determine the quality of teaching, all tests should finally contribute to the best possible education to prepare students for their future as much as possible.