Toward More Sensible Achievement Measurement: A Retrospective

McArthur, David L.

doi:10.1007/978-94-009-3257-9_2

David L. McArthur PhD

Part of the book series: Evaluation in Education and Human Services (EEHS, volume 16)

Abstract

As a graduate student some twenty years ago, I carried Gulliksen’s Theory of Mental Tests (1950) around with me like a bible; I nearly had it committed to memory before my comprehensive examinations. Yet I always had an uneasy feeling that despite its elegance, there was something off about classical test theory as a theory of measurement. When I stumbled occasionally upon alternatives like Guttman scaling and Loevinger’s homogeneity analysis, something clicked; I resonated to the underlying implications of these approaches to measurement. But they were rather obscure procedures, and, although others claimed otherwise, I could never produce very scalable sets of items. So I carried on business as usual — even to this very day, I can’t resist the urge to calculate KR20 whenever I am presented with a body of test data.

References

Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Bentler, P.M. (1971). Monotonicity analysis: An alternative to linear factor and test analysis. In D.R. Green, M.P. Ford & G.B. Flamer (Eds.), Measurement and Piaget. New York: McGraw Hill.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296–322.
Cornfield, J., & Tukey, J.W. (1956). Average values of mean squares in factorials. Annals of Mathematical Statistics, 27, 907–949.
Cox, D.R. (1954). The design of an experiment in which certain treatment arrangements are inadmissible. Biometrika, 40, 287–295.
Cronbach, L.J. (1947). Test “reliability”: its meaning and determination. Psychometrika, 12, 1–16.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. (1963). Theory of generalizability: A liberation of reliability theory. British Journal of Statistical Psychology, 16, 137–163.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: John Wiley & Sons.
Festinger, L. (1947). The treatment of qualitative data by “scale analysis.” Psychological Bulletin, 44, 149–161.
Ghiselli, E.E. (1964). Theory of psychological measurement. New York: McGraw Hill.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519–521.
Glaser, R. (1981). The future of testing: A research agenda for cognitive psychology and psychometrics. American Psychologist, 36, 923–936.
Gulliksen, H. (1945). The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 10, 79–91.
Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley & Sons.
Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139–150.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Hambleton, R.K. & Cook, L.L. (1977). Latent trait models and their use in the analysis of educational test data. Journal of Educational Measurement, 14, 75–96.
Harnisch, D.L., & Linn, R.L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133–146.
Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50, 371–374.
Horst, P. (1966). Psychological measurement and prediction. Belmont, CA: Wadsworth.
Hoyt, C. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153–160.
Kelley, T.L. (1924). Statistical methods. New York: Macmillan.
Kuder, G.F., & Richardson, M.W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151–160.
Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61(4), Whole No. 285.
Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–529.
Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493–504.
Loevinger, J. (1965). Person and population as psychometric concepts. Psychological Review, 72, 143–155.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley.
Lumsden, J. (1961). The construction of unidimensional tests. Psychological Bulletin, 58, 122–131.
Lumsden, J. (1976). Test theory. In M.R. Rosenzweig & L.W. Porter (Eds.), Annual Review of Psychology (Volume 27). Palo Alto, CA: Annual Reviews, Inc.
Magnusson, D. (1967). Test theory. Reading, Mass.: Addison-Wesley.
Maxwell, A.E. (1959). A statistical approach to scalogram analysis. Educational and Psychological Measurement, 19, 337–349.
Menzel, H. (1953). A new coefficient for scalogram analysis. Public Opinion Quarterly, 17, 268–280.
Miller, M.D. (in press). Measuring between-group differences in instruction. Journal of Educational Measurement.
Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1–8.
Popham, W.J., & Husek, T.R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6, 1–9.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press (reprinted 1980).
Rulon, P.J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9, 99–103.
Sagi, P.C. (1959). A statistical test for the significance of a coefficient of reproducibility. Psychometrika, 24, 19–27.
Sato, T. (1980). The S-P chart and the caution index. NEC (Nippon Electric Company) Educational Information Bulletin. Japan: Computer and Communication Systems Research Laboratories.
Schuessler, K.F. (1961). A note on statistical significance of scalogram. Sociometry, 24, 312–318.
Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology, 3, 271–295.
Stevens, S.S. (1951). Mathematics, measurement, and psychophysics. In S.S. Stevens (Ed.) Handbook of experimental psychology. New York: Wiley.
Tatsuoka, M.M. (1978). Recent psychometric developments in Japan: Engineers grapple with educational measurement problems. Paper presented at the Office of Naval Research Contractor’s Meeting on Individualized Measurement, Columbia, Missouri.
TenHouten, W.D. (1969). Scale gradient analysis: A statistical method for constructing and evaluating Guttman scales. Sociometry, 32, 80–98.
Torgerson, W.S. (1958). Theory and methods of scaling. New York: John Wiley and Sons.
Traub, R.E., & Wolfe, R.G. (1981). Latent trait theories and the assessment of educational achievement. In D.C. Berliner (Ed.), Review of Research in Education (Volume 9). American Educational Research Association.
Tryon, R.C. (1957). Reliability and behavior domain validity: Reformulation and historical critique. Psychological Bulletin, 54, 229–249.
Walker, D.A. (1931). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 20, 73–86.
Walker, D.A. (1936). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 26, 301–308.
Walker, D.A. (1940). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 30, 248–260.
Wright, B.D. (1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 Invitational Conference on Testing Problems. Princeton, NJ: Educational Testing Service.
Wright, B.D., & Stone, M.H. (1979). Best test design. Chicago: Mesa Press.
Yule, G.U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 75, 579–642.
Yule, G.U. (1922). An introduction to the theory of statistics. London: Charles Griffin and Co.

Author information

Authors and Affiliations

Center for Student Testing, Evaluation and Standards, Graduate School of Education, University of California Los Angeles, Los Angeles, CA, 90024, USA
David L. McArthur PhD

Editor information

Editors and Affiliations

Center for Student Testing, Evaluation and Standards, Graduate School of Education, University of California Los Angeles, Los Angeles, CA, 90024, USA
David L. McArthur PhD

About this chapter

Cite this chapter

McArthur, D.L. (1987). Toward More Sensible Achievement Measurement: A Retrospective. In: McArthur, D.L. (eds) Alternative Approaches to the Assessment of Achievement. Evaluation in Education and Human Services, vol 16. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-3257-9_2

DOI: https://doi.org/10.1007/978-94-009-3257-9_2
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-7961-7
Online ISBN: 978-94-009-3257-9
eBook Packages: Springer Book Archive