Abstract
As a graduate student some twenty years ago, I carried Gulliksen’s Theory of Mental Tests (1950) around with me like a bible; I nearly had it committed to memory before my comprehensive examinations. Yet I always had an uneasy feeling that despite its elegance, there was something off about classical test theory as a theory of measurement. When I stumbled occasionally upon alternatives like Guttman scaling and Loevinger’s homogeneity analysis, something clicked; I resonated to the underlying implications of these approaches to measurement. But they were rather obscure procedures, and, although others claimed otherwise, I could never produce very scalable sets of items. So I carried on business as usual — even to this very day, I can’t resist the urge to calculate KR20 whenever I am presented with a body of test data.
References
Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Bentler, P.M. (1971). Monotonicity analysis: An alternative to linear factor and test analysis. In D.R. Green, M.P. Ford & G.B. Flamer (Eds.), Measurement and Piaget. New York: McGraw Hill.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296–322.
Cornfield, J., & Tukey, J.W. (1956). Average values of mean squares in factorials. Annals of Mathematical Statistics, 27, 907–949.
Cox, D.R. (1954). The design of an experiment in which certain treatment arrangements are inadmissible. Biometrika, 40, 287–295.
Cronbach, L.J. (1947). Test “reliability”: its meaning and determination. Psychometrika, 12, 1–16.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. (1963). Theory of generalizability: A liberation of reliability theory. British Journal of Statistical Psychology, 16, 137–163.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: John Wiley & Sons.
Festinger, L. (1947). The treatment of qualitative data by “scale analysis.” Psychological Bulletin, 44, 149–161.
Ghiselli, E.E. (1964). Theory of psychological measurement. New York: McGraw Hill.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519–521.
Glaser, R. (1981). The future of testing: A research agenda for cognitive psychology and psychometrics. American Psychologist, 36, 923–936.
Gulliksen, H. (1945). The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 10, 79–91.
Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley & Sons.
Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139–150.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Hambleton, R.K. & Cook, L.L. (1977). Latent trait models and their use in the analysis of educational test data. Journal of Educational Measurement, 14, 75–96.
Harnisch, D.L., & Linn, R.L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133–146.
Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50, 371–374.
Horst, P. (1966). Psychological measurement and prediction. Belmont, CA: Wadsworth.
Hoyt, C. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153–160.
Kelley, T.L. (1924). Statistical methods. New York: Macmillan.
Kuder, G.F., & Richardson, M.W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151–160.
Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61(4), Whole No. 285.
Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–529.
Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493–504.
Loevinger, J. (1965). Person and population as psychometric concepts. Psychological Review, 72, 143–155.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley.
Lumsden, J. (1961). The construction of unidimensional tests. Psychological Bulletin, 58, 122–131.
Lumsden, J. (1976). Test theory. In M.R. Rosenzweig & L.W. Porter (Eds.), Annual Review of Psychology (Volume 27). Palo Alto, CA: Annual Reviews, Inc.
Magnusson, D. (1967). Test theory. Reading, Mass.: Addison-Wesley.
Maxwell, A.E. (1959). A statistical approach to scalogram analysis. Educational and Psychological Measurement, 19, 337–349.
Menzel, H. (1953). A new coefficient for scalogram analysis. Public Opinion Quarterly, 17, 268–280.
Miller, M.D. (in press). Measuring between-group differences in instruction. Journal of Educational Measurement.
Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1–8.
Popham, W.J., & Husek, T.R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6, 1–9.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press (reprinted 1980).
Rulon, P.J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9, 99–103.
Sagi, P.C. (1959). A statistical test for the significance of a coefficient of reproducibility. Psychometrika, 24, 19–27.
Sato, T. (1980). The S-P chart and the caution index. NEC (Nippon Electric Company) Educational Information Bulletin. Japan: Computer and Communication Systems Research Laboratories.
Schuessler, K.F. (1961). A note on statistical significance of scalogram. Sociometry, 24, 312–318.
Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology, 3, 271–295.
Stevens, S.S. (1951). Mathematics, measurement, and psychophysics. In S.S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley.
Tatsuoka, M.M. (1978). Recent psychometric developments in Japan: Engineers grapple with educational measurement problems. Paper presented at the Office of Naval Research Contractor’s Meeting on Individualized Measurement, Columbus, Missouri.
TenHouten, W.D. (1969). Scale gradient analysis: A statistical method for constructing and evaluating Guttman scales. Sociometry, 32, 80–98.
Torgerson, W.S. (1958). Theory and methods of scaling. New York: John Wiley & Sons.
Traub, R.E., & Wolfe, R.G. (1981). Latent trait theories and the assessment of educational achievement. In D.C. Berliner (Ed.), Review of Research in Education (Volume 9). American Educational Research Association.
Tryon, R.C. (1957). Reliability and behavior domain validity: Reformulation and historical critique. Psychological Bulletin, 54, 229–249.
Walker, D.A. (1931). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 20, 73–86.
Walker, D.A. (1936). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 26, 301–308.
Walker, D.A. (1940). Answer-pattern and score-scatter in tests and examinations. British Journal of Psychology, 30, 248–260.
Wright, B.D. (1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 Invitation Conference on Testing Problems. Princeton, NJ: Educational Testing Service.
Wright, B.D., & Stone, M.H. (1979). Best test design. Chicago: Mesa Press.
Yule, G.U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 75, 579–642.
Yule, G.U. (1922). An introduction to the theory of statistics. London: Charles Griffin and Co.
Copyright information
© 1987 Kluwer Academic Publishers
About this chapter
Cite this chapter
McArthur, D.L. (1987). Toward More Sensible Achievement Measurement: A Retrospective. In: McArthur, D.L. (eds) Alternative Approaches to the Assessment of Achievement. Evaluation in Education and Human Services, vol 16. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-3257-9_2
DOI: https://doi.org/10.1007/978-94-009-3257-9_2
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-7961-7
Online ISBN: 978-94-009-3257-9
eBook Packages: Springer Book Archive