Toward More Sensible Achievement Measurement: A Retrospective

  • Chapter
Alternative Approaches to the Assessment of Achievement

Part of the book series: Evaluation in Education and Human Services (EEHS, volume 16)

Abstract

As a graduate student some twenty years ago, I carried Gulliksen’s Theory of Mental Tests (1950) around with me like a bible; I nearly had it committed to memory before my comprehensive examinations. Yet I always had an uneasy feeling that despite its elegance, there was something off about classical test theory as a theory of measurement. When I stumbled occasionally upon alternatives like Guttman scaling and Loevinger’s homogeneity analysis, something clicked; I resonated to the underlying implications of these approaches to measurement. But they were rather obscure procedures, and, although others claimed otherwise, I could never produce very scalable sets of items. So I carried on business as usual — even to this very day, I can’t resist the urge to calculate KR20 whenever I am presented with a body of test data.

References

  • Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

  • Bentler, P.M. (1971). Monotonicity analysis: An alternative to linear factor and test analysis. In D.R. Green, M.P. Ford & G.B. Flamer (Eds.), Measurement and Piaget. New York: McGraw Hill.

  • Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296–322.

  • Cornfield, J., & Tukey, J.W. (1956). Average values of mean squares in factorials. Annals of Mathematical Statistics, 27, 907–949.

  • Cox, D.R. (1954). The design of an experiment in which certain treatment arrangements are inadmissible. Biometrika, 40, 287–295.

  • Cronbach, L.J. (1947). Test “reliability”: its meaning and determination. Psychometrika, 12, 1–16.

  • Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

  • Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. (1963). Theory of generalizability: A liberation of reliability theory. British Journal of Statistical Psychology, 16, 137–163.

  • Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: John Wiley & Sons.

  • Festinger, L. (1947). The treatment of qualitative data by “scale analysis.” Psychological Bulletin, 44, 149–161.

  • Ghiselli, E.E. (1964). Theory of psychological measurement. New York: McGraw Hill.

  • Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519–521.

  • Glaser, R. (1981). The future of testing: A research agenda for cognitive psychology and psychometrics. American Psychologist, 36, 923–936.

  • Gulliksen, H. (1945). The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 10, 79–91.

  • Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley & Sons.

  • Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139–150.

  • Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.

  • Hambleton, R.K. & Cook, L.L. (1977). Latent trait models and their use in the analysis of educational test data. Journal of Educational Measurement, 14, 75–96.

  • Harnisch, D.L., & Linn, R.L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133–146.

  • Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50, 371–374.

  • Horst, P. (1966). Psychological measurement and prediction. Belmont, CA: Wadsworth.

  • Hoyt, C. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153–160.

  • Kelley, T.L. (1924). Statistical methods. New York: Macmillan.

  • Kuder, G.F., & Richardson, M.W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151–160.

  • Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61(4), Whole No. 285.

  • Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–529.

  • Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493–504.

  • Loevinger, J. (1965). Person and population as psychometric concepts. Psychological Review, 72, 143–155.

  • Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, N.J.: Lawrence Erlbaum Associates.

  • Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley.

  • Lumsden, J. (1961). The construction of unidimensional tests. Psychological Bulletin, 58, 122–131.

  • Lumsden, J. (1976). Test theory. In M.R. Rosenzweig & L.W. Porter (Eds.), Annual Review of Psychology (Volume 27). Palo Alto, CA: Annual Reviews, Inc.

  • Magnusson, D. (1967). Test theory. Reading, Mass.: Addison-Wesley.

  • Maxwell, A.E. (1959). A statistical approach to scalogram analysis. Educational and Psychological Measurement, 19, 337–349.

  • Menzel, H. (1953). A new coefficient for scalogram analysis. Public Opinion Quarterly, 17, 268–280.

  • Miller, M.D. (in press). Measuring between-group differences in instruction. Journal of Educational Measurement.

  • Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1–8.

  • Popham, W.J., & Husek, T.R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6, 1–9.

  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press (reprinted 1980).

  • Rulon, P.J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9, 99–103.

  • Sagi, P.C. (1959). A statistical test for the significance of a coefficient of reproducibility. Psychometrika, 24, 19–27.

  • Sato, T. (1980). The S-P chart and the caution index. NEC (Nippon Electric Company) Educational Information Bulletin. Japan: Computer and Communication Systems Research Laboratories.

  • Schuessler, K.F. (1961). A note on statistical significance of scalogram. Sociometry, 24, 312–318.

  • Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology, 3, 271–295.

  • Stevens, S.S. (1951). Mathematics, measurement, and psychophysics. In S.S. Stevens (Ed.) Handbook of experimental psychology. New York: Wiley.

  • Tatsuoka, M.M. (1978). Recent psychometric developments in Japan: Engineers grapple with educational measurement problems. Paper presented at the Office of Naval Research Contractor’s Meeting on Individualized Measurement, Columbia, Missouri.

  • TenHouten, W.D. (1969). Scale gradient analysis: A statistical method for constructing and evaluating Guttman scales. Sociometry, 32, 80–98.

  • Torgerson, W.S. (1958). Theory and methods of scaling. New York: John Wiley and Sons.

  • Traub, R.E., & Wolfe, R.G. (1981). Latent trait theories and the assessment of educational achievement. In D.C. Berliner (Ed.), Review of Research in Education (Volume 9). American Educational Research Association.

  • Tryon, R.C. (1957). Reliability and behavior domain validity: Reformulation and historical critique. Psychological Bulletin, 54, 229–249.

  • Walker, D.A. (1931). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 20, 73–86.

  • Walker, D.A. (1936). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 26, 301–308.

  • Walker, D.A. (1940). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 30, 248–260.

  • Wright, B.D. (1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 Invitation Conference on Testing Problems. Princeton, NJ: Educational Testing Service.

  • Wright, B.D., & Stone, M.H. (1979). Best test design. Chicago: Mesa Press.

  • Yule, G.U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 75, 579–642.

  • Yule, G.U. (1922). An introduction to the theory of statistics. London: Charles Griffin and Co.

Copyright information

© 1987 Kluwer Academic Publishers

About this chapter

Cite this chapter

McArthur, D.L. (1987). Toward More Sensible Achievement Measurement: A Retrospective. In: McArthur, D.L. (eds) Alternative Approaches to the Assessment of Achievement. Evaluation in Education and Human Services, vol 16. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-3257-9_2

  • DOI: https://doi.org/10.1007/978-94-009-3257-9_2

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-7961-7

  • Online ISBN: 978-94-009-3257-9

  • eBook Packages: Springer Book Archive
