Scoring Validity in Action

  • Cyril J. Weir
Part of the Research and Practice in Applied Linguistics book series (RPAL)


In Chapter 6 we looked at elements of context validity that need to be considered at the test design stage and made a number of points in relation to test development which could potentially impact on the reliability of our tests. Hughes (2003: Chapter 5) examines a number of these specifically in relation to reliability. He provides a set of guidelines for making the test task itself more likely to produce reliable scores:
  • take enough samples of behaviour;

  • do not allow candidates too much freedom of choice;

  • write unambiguous items;

  • provide clear and explicit instructions;

  • ensure that tests are well laid out and perfectly legible;

  • make candidates familiar with format and testing techniques;

  • provide uniform and non-distracting conditions of administration;

  • use items that permit scoring which is as objective as possible;

  • make comparisons between candidates as direct as possible.

And in relation to the scoring of test performance itself:
  • provide a detailed scoring key;

  • train scorers;

  • agree acceptable responses and appropriate scores at the outset of scoring;

  • exclude items which do not discriminate well between weaker and stronger students.
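The last guideline, on item discrimination, can be quantified. One common classical approach (not drawn from Hughes's chapter itself; the function and data below are illustrative inventions) is the upper/lower-groups discrimination index: the proportion of high scorers answering an item correctly minus the proportion of low scorers doing so. Items with an index near zero, or negative, are candidates for exclusion.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Classical upper/lower-groups discrimination index.

    item_scores:  1/0 correctness on one item, per candidate
    total_scores: each candidate's total test score
    fraction:     share of candidates in each extreme group
                  (0.27 is a conventional choice)
    """
    # Rank candidates by total score, then take the bottom and top groups.
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    n = max(1, int(len(ranked) * fraction))
    lower, upper = ranked[:n], ranked[-n:]
    # Proportion correct in each group; the difference is the index.
    p_upper = sum(item_scores[i] for i in upper) / n
    p_lower = sum(item_scores[i] for i in lower) / n
    return p_upper - p_lower

# Invented example: an item answered correctly only by the five
# strongest of ten candidates discriminates perfectly (index = 1.0).
item = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(discrimination_index(item, totals))  # → 1.0
```

An item everyone answers correctly (or everyone misses) yields an index of 0 and tells us nothing about relative ability, which is why such items are typically removed before computing reliability estimates.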





Further reading

  1. Brown (1991) provides an accessible introduction to statistics for testers.
  2. Bryman and Cramer (2001) provide a good introduction to SPSS.
  3. Council of Europe (2001) provides a number of scales that might be useful as a basis for customizing to your own needs.
  4. Crocker and Algina (1986) provide a comprehensive explanation of the statistics and concepts discussed in the chapter.
  5. Hughes (2003: Chapter 5) provides a refreshingly accessible overview of all aspects of reliability, including worked examples of using IRT in Appendix 1.
  6. Weigle (2002) provides numerous examples of scales for writing and Luoma (2004) does the same for speaking.

Copyright information

© Cyril J. Weir 2005

Authors and Affiliations

  • Cyril J. Weir
  1. Centre for Research in Testing, Evaluation and Curriculum (CRTEC), Roehampton University, UK
