Measurement issues in designing and implementing longitudinal evaluation studies

Educational Assessment, Evaluation and Accountability

Abstract

Evaluation is required for almost all educational activities or programs, particularly those funded by federal and state governments. The requirement is understandable: we need to know whether what has been done actually serves its intended purposes. Yet bad data can hardly support justified conclusions, and many evaluation studies are undermined by the fact that evaluation is treated as a post-hoc test of educational activities. Under these conditions it is difficult to achieve the rigor and relevance an evaluation study needs; even partial answers to the original questions require good design and analysis. The goal of this paper is to provide researchers and practitioners with insights that may serve as a useful guide for designing and implementing longitudinal evaluations in ways that improve the likelihood of obtaining high-quality data for future evaluation studies.

Notes

  1. The definition of program effectiveness is not clearly described in most of the literature; it is sometimes operationalized as program satisfaction. Here, program effectiveness is defined in terms of satisfaction.

  2. TerraNova is a test battery used to identify a student’s current level of performance, measure the effectiveness of instruction, provide an accountability mechanism, and track and report student progress. It provides norm-referenced, criterion-referenced, and performance-level information on six subscales: Reading, Vocabulary, Reading Comprehension, Language, Language Mechanics, and Language Comprehension. Using national norms, the TerraNova reports national percentiles, scale scores, normal curve equivalents, and stanine scores.
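To make the relationships among these norm-referenced scores concrete, the sketch below converts a national percentile rank into its normal curve equivalent (NCE) and stanine. It relies only on the standard definitions of those scales (NCEs have mean 50 and SD 21.06; stanines have mean 5 and SD 2, truncated to 1–9), not on TerraNova’s proprietary norming tables; the function name and example ranks are illustrative only.

```python
# Illustrative only: standard NCE and stanine definitions,
# not TerraNova's proprietary norming tables.
from statistics import NormalDist

def percentile_to_norm_scores(percentile: float) -> dict:
    """Convert a national percentile rank (0 < p < 100) to its
    normal curve equivalent (NCE) and stanine."""
    z = NormalDist().inv_cdf(percentile / 100)   # normal deviate of the rank
    nce = 50 + 21.06 * z                         # NCE: mean 50, SD 21.06
    stanine = min(9, max(1, round(2 * z + 5)))   # stanine: mean 5, SD 2, clipped to 1-9
    return {"percentile": percentile, "NCE": round(nce, 1), "stanine": stanine}

for p in (10, 25, 50, 75, 90):
    print(percentile_to_norm_scores(p))
```

All three scores carry the same rank information under national norms; they differ only in the scale on which that rank is expressed.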

  3. A high test score does not necessarily indicate a high level of reading ability. It is only an approximation.
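One standard way to state this approximation is classical test theory’s decomposition of an observed score into a true score and error:

$$X = T + E, \qquad \operatorname{Var}(X) = \sigma_T^2 + \sigma_E^2, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$$

so a single observed score $X$ tracks the underlying ability $T$ only as closely as the test’s reliability $\rho_{XX'}$ allows.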

  4. We are not suggesting that two assessments per grade per year is ideal timing, but two measurements per grade level is the minimum.
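The arithmetic behind this floor: with a single wave no change is estimable at all, and with exactly two waves the only estimable individual growth quantity is the linear rate

$$\hat{\beta}_i = \frac{X_{i2} - X_{i1}}{t_2 - t_1},$$

which cannot be separated from measurement error or checked for nonlinearity; at least three waves per student are needed for that.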

  5. The statistical analysis used was growth mixture modeling based on a multidimensional scaling model.
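For readers unfamiliar with the approach, the sketch below is a deliberately simplified two-stage stand-in for growth mixture modeling, not the multidimensional scaling formulation used in the study: it summarizes each student’s trajectory by an ordinary least squares intercept and slope, then searches for latent trajectory classes with a Gaussian mixture. The simulated scores and class structure are hypothetical.

```python
# A simplified two-stage stand-in for growth mixture modeling (hypothetical
# data and class structure; not the multidimensional scaling model of Note 5).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
waves = np.arange(4.0)  # four measurement occasions, assumed for illustration

# Simulate two latent classes: steady growers and near-flat performers.
n = 100
growers = 500 + 15 * waves + rng.normal(0, 10, (n, waves.size))
flat    = 520 +  2 * waves + rng.normal(0, 10, (n, waves.size))
scores = np.vstack([growers, flat])

# Stage 1: summarize each student's trajectory by OLS slope and intercept.
growth = np.array([np.polyfit(waves, y, deg=1) for y in scores])  # [slope, intercept]

# Stage 2: fit a two-component Gaussian mixture to the growth summaries
# and inspect the recovered trajectory classes.
gmm = GaussianMixture(n_components=2, random_state=0).fit(growth)
labels = gmm.predict(growth)
for k in range(2):
    mask = labels == k
    print(f"class {k}: n={mask.sum()}, mean slope={growth[mask, 0].mean():.1f} points/wave")
```

A true growth mixture model estimates the trajectories and the class memberships jointly rather than in two stages, so this sketch understates the uncertainty in both.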


Author information

Correspondence to Cody S. Ding.

Cite this article

Ding, C.S. Measurement issues in designing and implementing longitudinal evaluation studies. Educ Asse Eval Acc 21, 155–171 (2009). https://doi.org/10.1007/s11092-008-9067-6
