Educational Assessment: A Brief History

McArthur, David L.

doi:10.1007/978-94-009-3257-9_1

David L. McArthur PhD

Part of the book series: Evaluation in Education and Human Services (EEHS, volume 16)

Abstract

Educational assessment in the Western world has a long but very irregular history. Two distinct threads are woven together: the first is the variety of settings in which testing itself came to have practical use, while the second is the incorporation of increasingly rigorous methods for making sense of the results of that testing. This chapter sets out some of the key developments in each of these two areas, from their origins until the dawn of contemporary psychometrics. For extended periods, even the simplest improvements in either testing or statistics had to fight long and hard against tradition and inertia. It took many generations for the two threads finally to merge into a full-fledged science of educational measurement.

Author information

Authors and Affiliations

Center for Student Testing, Evaluation and Standards, Graduate School of Education, University of California Los Angeles, Los Angeles, CA, 90024, USA
David L. McArthur PhD

Editor information

Editors and Affiliations

Center for Student Testing, Evaluation and Standards, Graduate School of Education, University of California Los Angeles, Los Angeles, CA, 90024, USA
David L. McArthur PhD

About this chapter

Cite this chapter

McArthur, D.L. (1987). Educational Assessment: A Brief History. In: McArthur, D.L. (eds) Alternative Approaches to the Assessment of Achievement. Evaluation in Education and Human Services, vol 16. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-3257-9_1

DOI: https://doi.org/10.1007/978-94-009-3257-9_1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-7961-7
Online ISBN: 978-94-009-3257-9
eBook Packages: Springer Book Archive
