Best Practices in Test Construction for Developmental-Behavioral Measures: Quality Standards for Reviewers and Researchers

Follow-Up for NICU Graduates

Abstract

Developmental-behavioral measurement is fundamental to clinical determinations about children’s and families’ needs. Such measurement consists of three main types (listed in order of complexity): (1) screening, (2) mid-level assessment, and (3) diagnostic tests. All three types rest on the same psychometric precepts, although screening tests, despite their inherent brevity, must meet an additional requirement: proof of accuracy. In this chapter, we focus on standards in test construction, with additional emphasis on the psychometry of screening tests, because screens are deployed more frequently and serve as a fundamental decision point for whether more complex measures are needed. For example, screening test results help identify whether children require further vision, hearing, or lead screening; referral for further evaluation by special education services; and/or referral to developmental-behavioral pediatricians or other subspecialists. The powerful role of screening tests in decisions that profoundly affect families’ lives means that screens must be especially well constructed. This review of methods in psychometry highlights how to research and review developmental-behavioral measures, including screening tests. Crucial to test selection is an understanding of principles and policy in standardization, reliability, validity (including accuracy computations in the case of screening tests), and utility, that is, practical considerations in measurement.

Corresponding author

Correspondence to Frances Page Glascoe.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Glascoe, F.P., Cairney, J. (2018). Best Practices in Test Construction for Developmental-Behavioral Measures: Quality Standards for Reviewers and Researchers. In: Needelman, H., Jackson, B. (eds) Follow-Up for NICU Graduates. Springer, Cham. https://doi.org/10.1007/978-3-319-73275-6_15
