Part of the book series: Evaluation in Education and Human Services (EEHS, volume 16)

Abstract

Educational assessment in the Western world has a long but very irregular history. Two distinct threads are woven together: the first is the variety of settings in which testing itself came to have practical use, while the second is the incorporation of increasingly rigorous methods for making sense of the results of that testing. This chapter sets out some of the key developments in each of these two areas, from their origins until the dawn of contemporary psychometrics. For extended periods, even the simplest improvements in either testing or statistics had to fight long and hard against tradition and inertia. It took many generations for the two threads to finally merge into a full-fledged science of educational measurement.

© 1987 Kluwer Academic Publishers

About this chapter

McArthur, D.L. (1987). Educational Assessment: A Brief History. In: McArthur, D.L. (eds) Alternative Approaches to the Assessment of Achievement. Evaluation in Education and Human Services, vol 16. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-3257-9_1

  • DOI: https://doi.org/10.1007/978-94-009-3257-9_1

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-7961-7

  • Online ISBN: 978-94-009-3257-9

  • eBook Packages: Springer Book Archive
