Abstract
In multilevel research on classroom instruction, individual student ratings are often aggregated to the class level in order to obtain a representative indicator of the classroom construct under study. Whether students within a class provide ratings consistent enough to justify aggregation, however, has not been the object of much research. Drawing on data from N = 9524 students from 391 classes who participated in the national extension to the PISA 2006 study in Germany, the interrater reliability and interrater agreement of student ratings of science instruction were examined. Results showed that students within a class tended to accurately and reliably rate various aspects of their science lessons. However, agreement among ratings was influenced by class size, learning time, school track, and science performance. In multiple regression analyses, science performance turned out to be of particular importance in accounting for differences in the homogeneity of ratings. The findings suggest that agreement among students’ perceptions of instruction should be a central consideration for researchers using aggregated measures to examine classroom teaching.
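The abstract refers to two standard indices for judging whether aggregation is justified: within-group interrater agreement, rWG(J) (James, Demaree & Wolf, 1984), and interrater reliability, ICC(1), from a one-way ANOVA. A minimal Python sketch of these textbook formulas is given below; the data values and function names are invented for illustration and are not taken from the PISA 2006 study.

```python
def rwg_j(ratings, n_options):
    """Within-group agreement rWG(J) for one class.

    ratings: one inner list per student, each with J item scores.
    n_options: number of response options A; the null (uniform
    response) error variance is (A**2 - 1) / 12.
    """
    n_raters = len(ratings)
    n_items = len(ratings[0])
    sigma2_eu = (n_options ** 2 - 1) / 12.0
    item_vars = []
    for j in range(n_items):
        xs = [r[j] for r in ratings]
        m = sum(xs) / n_raters
        item_vars.append(sum((x - m) ** 2 for x in xs) / (n_raters - 1))
    mean_s2 = sum(item_vars) / n_items          # mean observed item variance
    ratio = mean_s2 / sigma2_eu                  # observed / null variance
    agree = n_items * (1.0 - ratio)
    return agree / (agree + ratio)


def icc1(groups):
    """ICC(1) for equal-sized groups via one-way ANOVA."""
    k = len(groups[0])                           # raters per group
    g = len(groups)                              # number of groups
    grand = sum(sum(grp) for grp in groups) / (g * k)
    means = [sum(grp) / k for grp in groups]
    msb = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((x - m) ** 2
              for grp, m in zip(groups, means)
              for x in grp) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Perfect within-class agreement on a 4-point scale -> rWG(J) = 1.0
print(rwg_j([[3, 4], [3, 4], [3, 4]], n_options=4))
# Classes that differ while agreeing internally -> ICC(1) = 1.0
print(icc1([[1, 1, 1], [5, 5, 5]]))
```

In practice, negative rWG values (observed variance exceeding the uniform null) are usually truncated to zero before averaging across classes; the sketch omits that step for brevity.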
© 2009 VS Verlag für Sozialwissenschaften | GWV Fachverlage GmbH, Wiesbaden
Wittwer, J. (2009). What Influences the Agreement Among Student Ratings of Science Instruction?. In: Prenzel, M., Baumert, J. (eds) Vertiefende Analysen zu PISA 2006. VS Verlag für Sozialwissenschaften. https://doi.org/10.1007/978-3-531-91815-0_11
Print ISBN: 978-3-531-15929-4
Online ISBN: 978-3-531-91815-0