Relationships Between the Way Students Are Assessed in Science Classrooms and Science Achievement Across Canada


Abstract

Canadian students experience many different assessments throughout their schooling (O’Connor 2011). Using a variety of assessment types, item formats, and science-based performance tasks in the classroom offers many benefits for measuring the many dimensions of science education. Although such variety is beneficial, it is unclear exactly which types, formats, and tasks are used in Canadian science classrooms. Additionally, since assessments are often administered to help improve student learning, this study identified assessments that may improve student learning, as measured by achievement scores on a standardized test. Secondary analyses were performed on students’ and teachers’ responses to the questionnaire items administered in the Pan-Canadian Assessment Program. The hierarchical linear modeling analyses indicated that both students and teachers identified teacher-developed classroom tests or quizzes as the most common type of assessment used. Although this ranking was similar across the country, statistically significant differences among the provinces in the assessments used in science classrooms were also identified. The investigation of which assessment best predicted student achievement indicated that minds-on science performance-based tasks significantly explained 4.21% of the variance in student scores. However, mixed results were observed between student and teacher responses for tasks that required students to choose their own investigations and design their own experiments or investigations. Additionally, students whose teachers indicated that they conducted more demonstrations of an experiment or investigation obtained lower scores.
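To make the modeling approach concrete, the sketch below shows how a two-level random-intercept model (students nested in schools) relating assessment-frequency predictors to an achievement score could be fit in Python with statsmodels. This is a minimal illustration in the spirit of the analyses described above, not the authors' code: the simulated data, variable names, and effect sizes are assumptions chosen for demonstration only.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in for PCAP-style records; columns and effects are hypothetical.
    rng = np.random.default_rng(42)
    n_schools, n_students = 50, 30
    school = np.repeat(np.arange(n_schools), n_students)
    minds_on = rng.integers(1, 5, size=school.size)      # 4-point frequency scale
    teacher_demo = rng.integers(1, 5, size=school.size)  # 4-point frequency scale
    score = (500
             + 5.0 * minds_on                            # illustrative positive association
             - 3.0 * teacher_demo                        # illustrative negative association
             + rng.normal(0, 15, n_schools)[school]      # school random intercepts
             + rng.normal(0, 60, school.size))           # student-level residual
    df = pd.DataFrame({"score": score, "school": school,
                       "minds_on": minds_on, "teacher_demo": teacher_demo})

    # Two-level model: random intercept for school, fixed effects for the predictors.
    model = smf.mixedlm("score ~ minds_on + teacher_demo", data=df, groups=df["school"])
    result = model.fit()
    print(result.summary())

The proportion of variance explained by a predictor (such as the 4.21% reported above) would be obtained by comparing the variance components of nested models; the three-level structure used in the study adds provinces as a further grouping level.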


Notes

  1. The discrepancy between the MANOVA results, which indicated differences in assessment practice among the provinces, and the three-level HLM intraclass correlation, which indicated minimal provincial differences, may have been due to the large sample size, which can make even small effects statistically significant (Tabachnick and Fidell 2013). This interpretation is further supported by the low partial eta-squared values from each of the MANOVAs.
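For readers unfamiliar with the statistic, the following sketch shows how intraclass correlations are computed from the variance components of a three-level random-intercept model (students within schools within provinces). The variance values are hypothetical, chosen only to illustrate how a small province-level share of variance would be read.

    # Hypothetical variance components from a three-level random-intercept model.
    sigma2_student = 4500.0   # within-school (residual) variance -- illustrative value
    tau_school = 900.0        # between-school variance -- illustrative value
    tau_province = 60.0       # between-province variance -- illustrative value

    total = sigma2_student + tau_school + tau_province

    # Intraclass correlation: share of total score variance at each grouping level.
    icc_school = tau_school / total
    icc_province = tau_province / total

    print(f"School-level ICC:   {icc_school:.3f}")    # 0.165
    print(f"Province-level ICC: {icc_province:.3f}")  # 0.011 -- a small value like this
                                                      # corresponds to minimal provincial differences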

References

  • Abrahams, I., & Millar, R. (2008). Does practical work really work? A study of the effectiveness of practical work as a teaching and learning method in school science. International Journal of Science Education, 30(14), 1945–1969. https://doi.org/10.1080/09500690701749305.


  • American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME]. (2014). Standards for educational and psychological testing. Washington, DC: Author.

  • Barab, S. A., Gresalfi, M. S., & Ingram-Goble, A. (2010). Transformational play: using games to position person, content, and context. Educational Researcher, 39(7), 525–536. Retrieved from http://ase.tufts.edu/DevTech/courses/readings/Barab_Transformational_Play_2010.pdf.

  • Bennett, R. E., & Gitomer, D. H. (2009). Transforming K-12 assessment: integrating accountability testing, formative assessment and professional support. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–61). New York, NY: Springer.


  • Bennett, R. E., Persky, H., Weiss, A. R., & Jenkins, F. (2007). Problem solving in technology-rich environments: a report from the NAEP technology-based assessment project (NCES 2007–466). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved from the Institute of Education Sciences website: http://nces.ed.gov/nationsreportcard/pubs/studies/2007466.asp.

  • Black, P., & Wiliam, D. (1998). Inside the black box: raising standards through classroom assessment. London: School of Education, King’s College.


  • Chalmers, A. F. (1999). What is this thing called science? Indianapolis, IN: Hackett Publishing Company.


  • Chu, M.-W. (2017, March). Using computer simulated science laboratories: a test of pre-laboratory activities with the learning error and formative feedback model. Unpublished doctoral dissertation, University of Alberta, Edmonton.

  • Council of Ministers of Education, Canada [CMEC] (2013a). Pan-Canadian assessment program PCAP—2013 student questionnaire. Council of Ministers of Education, Canada. Toronto: Author. Retrieved from https://www.cmec.ca/docs/pcap/pcap2013/Student%20Questionnaire.pdf.

  • Council of Ministers of Education, Canada [CMEC] (2013b). Pan-Canadian assessment program PCAP—2013 teacher questionnaire. Council of Ministers of Education, Canada. Toronto: Author. Retrieved from https://www.cmec.ca/docs/pcap/pcap2013/Teacher%20Questionnaire.pdf.

  • Council of Ministers of Education, Canada [CMEC] (2014). Pan-Canadian assessment program 2013: report on the pan-Canadian assessment of science, reading, and mathematics. Council of Ministers of Education, Canada. Toronto: Author. Retrieved from http://cmec.ca/Publications/Lists/Publications/Attachments/337/PCAP-2013-Public-Report-EN.pdf.

  • Duncan, C. R., & Noonan, B. (2007). Factors affecting teachers’ grading and assessment practices. Alberta Journal of Educational Research, 53(1), 1–21. Retrieved from http://ajer.journalhosting.ucalgary.ca/index.php/ajer/article/view/602/585.

  • Frontline (2014). The testing industry’s big four: profiles of the four companies that dominate the business of making and scoring standardized achievement tests. Retrieved from http://www.pbs.org/wgbh/pages/frontline/shows/schools/testing/companies.html.

  • Fung, K., & Chu, M.-W. (2015). Fairness of standardized assessments: discrepancy between provincial and territorial results. Journal of Contemporary Issues in Education, 10(1), 2–24. https://doi.org/10.20355/C5KG6P.

  • Gobert, J., Sao Pedro, M., Raziuddin, J., & Baker, R. (2013). From log files to assessment metrics for science inquiry using educational data mining. Journal of the Learning Sciences, 22(4), 521–563. Retrieved from http://slinq.org/projectfiles/pubs/GobertEtAlJLS2013.pdf.

  • Hodson, D. (1996). Laboratory work as scientific method: three decades of confusion and distortion. Journal of Curriculum Studies, 28(2), 115–135. https://doi.org/10.1080/0022027980280201.

  • Hodson, D. (2003). Time for action: science education for an alternative future. International Journal of Science Education, 25(6), 645–670. https://doi.org/10.1080/09500690305021.


  • Hofstein, A., & Lunetta, V. N. (2003). The laboratory in science education: foundations for the twenty-first century. Science Education, 88(1), 28–54. https://doi.org/10.1002/sce.10106.


  • Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger.

  • Klinger, D. A., & Saab, H. (2012). Educational leadership in the context of low-stakes accountability: the Canadian perspective. In L. Volante (Ed.), School leadership in the context of standard-based reform: International perspective (pp. 69–94). New York, NY: Springer Science + Business Media.


  • Klinger, D., DeLuca, C., & Miller, T. (2008). The evolving culture of large-scale assessments in Canadian education. Canadian Journal of Educational Administration and Policy, 76, 1–34.


  • Krathwohl, D. R. (2002). A revision of Bloom’s taxonomy: an overview. Theory Into Practice, 41(4), 212–218. Retrieved from https://www.depauw.edu/files/resources/krathwohl.pdf.

  • Leighton, J. P., Chu, M.-W., & Seitz, P. (2013). Cognitive diagnostic assessment and the learning errors and formative feedback (LEAFF) model. In R. Lissitz (Ed.), Informing the practice of teaching using formative and interim assessment: A systems approach (pp. 183–207). Charlotte, NC: Information Age Publishing.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.


  • Ma, J., & Nickerson, J. V. (2006). Hands-on, simulated, and remote laboratories: a comparative literature review. ACM Computing Surveys, 38(3), 7. https://doi.org/10.1145/1132960.1132961.


  • McMillan, J. H. (2001). Fundamental assessment principles for teachers and school administrators. Practical Assessment, Research & Evaluation, 7(8). Retrieved from http://pareonline.net/getvn.asp?v=7&n=8.

  • National Research Council. (2006). America’s lab report: investigations in high school science. Committee on High School Science Laboratories: Role and Vision. S. R. Singer, M. L. Hilton, & H. A. Schweingruber (Eds.). Board on Science Education, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press. Retrieved from http://www.nap.edu/catalog/11311.html.

  • National Research Council. (2014). Developing assessments for the next generation science standards. In J. W. Pellegrino, M. R. Wilson, J. A. Koenig, & A. S. Beatty (Eds.), Division of behavioral and social sciences and education, Committee on Developing Assessments of Science Proficiency in K-12. Board on Testing and Assessment and Board on Science Education. Washington, DC: The National Academies Press Retrieved from http://www.nap.edu/catalog.php?record_id=18409.


  • Next Generation Science Standards Lead States. (2013). Next generation science standards: for states, by states. Washington, DC: The National Academies Press Retrieved from http://www.nap.edu/catalog.php?record_id=18290.


  • O’Connor, K. (2011). 15 fixes for broken grades (Canadian edition). Toronto, ON: Pearson Canada.

  • Organization for Economic Co-operation and Development [OECD] (2017). PISA 2015 assessment and analytical framework: science, reading, mathematic, financial literacy and collaborative problem solving, OECD Publishing, Paris. Retrieved from https://doi.org/10.1787/9789264281820-en.

  • PhET. (2017). PhET interactive simulations: research. Retrieved from https://phet.colorado.edu/en/research.

  • Popham, W. J. (2011). Classroom assessment: what teachers need to know (6th ed.). Boston: Pearson.


  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage.

  • Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14. Retrieved from http://nepc.colorado.edu/files/TheRoleofAssessmentinaLearningCulture.pdf.

  • Shute, V. J., & Ventura, M. (2013). Measuring and supporting learning in games: stealth assessment. Cambridge, MA: Massachusetts Institute of Technology Press Retrieved from http://myweb.fsu.edu/vshute/pdf/white.pdf.


  • Shute, V., Leighton, J. P., Jang, E. E., & Chu, M.-W. (2016). Advances in the science of assessment. Educational Assessment, 21(1), 34–59. https://doi.org/10.1080/10627197.2015.1127752.


  • Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: an introduction to basic and advanced multilevel modeling (2nd ed.). London: Sage Publishers.

  • Supovitz, J. (2009). Can high-stakes testing leverage educational improvement? Prospects from the last decade of testing and accountability reform. Journal of Educational Change, 10(1), 211–227. https://doi.org/10.1007/s10833-009-9105-2.


  • Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics. Boston: Pearson/Allyn & Bacon.


  • Volante, L., & Jaafar, S. B. (2008). Educational assessment in Canada. Assessment in Education: Principles, Policy, & Practice, 15(2), 201–210. https://doi.org/10.1080/09695940802164226.

  • Wainer, H. (1990). Introduction and history. In H. Wainer, N. J. Dorans, R. Flaugher, B. F. Green, R. J. Mislevy, L. Steinberg, & D. Thissen (Eds.), Computerized adaptive testing: a primer (pp. 1–21). Hillsdale, NJ: Erlbaum.


  • Zenisky, A. L., & Sireci, S. G. (2002). Technological innovations in large-scale assessment. Applied Measurement in Education, 15(4), 337–362. https://doi.org/10.1207/S15324818AME1504_02.



Acknowledgements

Preparation of this paper was supported by the Council of Ministers of Education, Canada (CMEC). CMEC encourages researchers to express their professional judgment freely. This paper, therefore, does not necessarily represent the positions or policies of CMEC, and no official endorsement should be inferred.

Author information

Corresponding author

Correspondence to Man-Wai Chu.

Appendices

Appendix A

PCAP-2013 Student and Teacher Survey Questionnaire Assessment Items

Student Questionnaire (CMEC, 2013a)

Assessment Types

  1. How often do you do the following in your science class? (Four-point Likert scale: 1=never, 2=rarely, 3=sometimes, and 4=often)

     a) Write tests or quizzes.

Science Performance-Based Tasks

  2. How often do you do the following in your science class? (Four-point Likert scale: 1=never, 2=rarely, 3=sometimes, and 4=often)

     a) Watch the teacher do experiments as demonstrations.
     b) Do experiments following the instructions of the teacher or textbook.
     c) Choose your own investigations.
     d) Design an investigation to test your own ideas.
     e) Explain your ideas or solutions to other students.
     f) Spend time doing science activities or investigations.

Teacher Questionnaire (CMEC, 2013b)

Assessment Types

  3. In the science class selected for PCAP-2013, how often are students assessed in the following ways? (Four-point Likert scale: 1=never, 2=rarely, 3=sometimes, and 4=often)

     a) Common school-wide tests or assessments
     b) Teacher-developed classroom tests
     c) Student portfolios and/or journals
     d) Individual student assignments/projects
     e) Group assignments/projects
     f) Homework
     g) Performance assessment (e.g., design a research project, an investigation or a machine)

Item Formats

  4. In your teacher-developed science tests/examinations, how often do you use the following kinds of items or questions? (Four-point Likert scale: 1=never, 2=rarely, 3=sometimes, and 4=often)

     a) Selected-response items (e.g., true/false, multiple choice)
     b) Short-response items (e.g., one or two words, facts, short sentences)
     c) Extended-response items requiring an explanation or justification
     d) Performance assessment (e.g., design a research project, an investigation or a machine)

Science Performance-Based Tasks

  5. To what extent do you ask the students to do the following during science instruction in the science class selected for PCAP-2013? (Four-point Likert scale: 1=not at all, 2=a little, 3=more than a little, and 4=a lot)

     a) Observe natural phenomena and describe what they see
     b) Watch you demonstrate an experiment or investigation
     c) Formulate their own questions for investigations
     d) Design ways to seek answers to their own questions
     e) Design or plan experiments or investigations
     f) Conduct experiments or investigations


Cite this article

Chu, M.-W., & Fung, K. Relationships Between the Way Students Are Assessed in Science Classrooms and Science Achievement Across Canada. Research in Science Education, 50, 791–812 (2020). https://doi.org/10.1007/s11165-018-9711-1
