Effects of Rating Scale Verbalization on Measurement Quality

Menold, Natalja

doi:10.1007/978-3-658-24517-7_4

Part of the book series: Schriftenreihe der ASI - Arbeitsgemeinschaft Sozialwissenschaftlicher Institute (SASI)

Abstract

Rating scales are an essential component of questionnaires. The degree of verbalization and the use of numeric labels are central features that determine the measurement properties of a rating scale. This chapter presents several experimental studies on the effects of rating scale verbalization on measurement quality and on respondents' cognitive processes. Across studies that differed in content, mode of data collection (online vs. paper-and-pencil), and sample (students vs. heterogeneous adult samples), measurement quality was consistently high for verbal seven-category rating scales. With numeric labels, by contrast, measurement quality was impaired, which could be explained by increased cognitive effort. Data from rating scales with different degrees of verbalization were not measurement-equivalent. The recommendation that follows is to use verbal seven-category rating scales and to avoid numeric labels. Differences between rating scales limit the comparability of data, which should be taken into account in comparative analyses.

Author information

Authors and Affiliations

Mannheim, Germany
Natalja Menold

Corresponding author

Correspondence to Natalja Menold.

Editor information

Editors and Affiliations

GESIS – Leibniz-Institut für Sozialwissenschaften, Mannheim, Germany
Natalja Menold
FAU Erlangen-Nürnberg, Nürnberg, Germany
Tobias Wolbring

About this chapter

Cite this chapter

Menold, N. (2019). Effekte der Verbalisierung von Ratingskalen auf die Messqualität. In: Menold, N., Wolbring, T. (eds) Qualitätssicherung sozialwissenschaftlicher Erhebungsinstrumente. Schriftenreihe der ASI - Arbeitsgemeinschaft Sozialwissenschaftlicher Institute. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-24517-7_4

DOI: https://doi.org/10.1007/978-3-658-24517-7_4
Published: 30 December 2018
Publisher Name: Springer VS, Wiesbaden
Print ISBN: 978-3-658-24516-0
Online ISBN: 978-3-658-24517-7
eBook Packages: Social Science and Law (German Language)
