
Construct, Framework and Test Development—From IRT Perspectives

Educational Measurement for Applied Researchers

Abstract

In Chap. 1, the terms “latent trait” and “construct” are used to refer to the psycho-social attributes that we are interested in measuring.



Further Reading

  • Hogan TP, Murphy G (2007) Recommendations for preparing and scoring constructed response items: what the experts say. Appl Meas Educ 20(4):427–441

  • Mellenbergh GJ (2011) A conceptual introduction to psychometrics: development, analysis, and application of psychological and educational tests. Eleven International Publishing, The Hague, Netherlands

  • Netemeyer RG, Bearden WO, Sharma S (2003) Scaling procedures: issues and applications. Sage, Thousand Oaks, CA

  • Schmeiser CB, Welch CJ (2006) Test development. In: Brennan R (ed) Educational measurement, 4th edn. Praeger Publishers, Westport, CT, pp 307–354

  • Wu ML (2010) Measurement, sampling and equating errors in large-scale assessments. Educ Meas: Issues Pract 29(4):15–27


Author information


Correspondence to Margaret Wu.

Appendices

Discussion Points

  1. In many cases, the clients of a project provide a pre-defined framework containing specific test blueprints, such as the one shown in Fig. 2.4.

     Fig. 2.4 Example client specifications for a test

     These frameworks and test blueprints are usually developed with no explicit consideration of the latent trait model. So when we assess items from the perspective of item response models, we often face a dilemma: should we reject an item that does not fit the latent trait model, even though it belongs to a part of the blueprint specified by the clients? How do we reconcile the ideals of measurement with client demands?

  2. To what extent do we make our test “unidimensional”? Consider a spelling test. Spelling words generally have different discriminating power, as shown in the following examples.

     Can we select only spelling words that have the same discriminating power to ensure “unidimensionality”, and still call the result a spelling test? If we include a random sample of spelling words with varying discriminating power, what are the consequences in terms of the departure from the ideals of measurement? (Discriminating power is made concrete in the sketch following these discussion points.)

  3. Can we assume that the developmental stages from year 1 to year 12 form one unidimensional scale? If not, how do we carry out equating across the year levels? (A simple common-item linking approach is sketched below.)
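
The “discriminating power” in point 2 can be made concrete with the two-parameter logistic (2PL) model, in which each item i carries its own discrimination parameter a_i (standard IRT background, not notation specific to this chapter):

$$
P(X_i = 1 \mid \theta) = \frac{1}{1 + \exp\bigl(-a_i(\theta - b_i)\bigr)}
$$

where b_i is the item difficulty. A spelling word with a large a_i separates students sharply around its difficulty, while a word with a small a_i does so only weakly; restricting the test to words with near-equal a_i brings the data closer to the Rasch ideal, whereas a random mix of a_i values departs from it.

For point 3, one common device when a single scale is assumed is common-item linking: calibrate each year level separately, then shift one scale by the mean difference in the difficulty estimates of items administered at both levels. A minimal sketch with hypothetical anchor-item difficulties (mean-shift linking only; operational equating involves many more checks):

```python
# Mean-shift linking sketch with hypothetical anchor-item difficulties.
# b_y1 and b_y2 hold difficulty estimates of the SAME anchor items,
# calibrated separately in Year 1 and Year 2 samples (made-up values).
b_y1 = [-0.8, -0.2, 0.5, 1.1]   # anchor difficulties on the Year 1 scale
b_y2 = [-1.6, -1.0, -0.3, 0.4]  # the same items on the Year 2 scale

# The linking constant is the mean difference between the two calibrations.
shift = sum(b1 - b2 for b1, b2 in zip(b_y1, b_y2)) / len(b_y1)

def to_year1_scale(theta_y2):
    """Place a Year 2 ability estimate onto the Year 1 scale."""
    return theta_y2 + shift

print(f"linking constant: {shift:.2f}")  # about 0.78
print(f"Year 2 theta 0.0 on the Year 1 scale: {to_year1_scale(0.0):.2f}")
```

If unidimensionality fails across a span as wide as years 1 to 12, a chain of such links can distort comparisons between distant year levels, which is precisely the tension this discussion point raises.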

Exercises

In the SACMEQ project, some variables were combined to form a composite variable. For example, the following seven variables were combined to derive a composite score:

  24. How often does a person other than your teacher make sure that you have done your homework?
      (Please tick only one box.)
      PHMWKDON
      (1) I do not get any homework
      (2) Never
      (3) Sometimes
      (4) Most of the time

  25. How often does a person other than your teacher usually help you with your homework?
      (Please tick only one box.)
      PHMWKHLP
      (1) I do not get any homework
      (2) Never
      (3) Sometimes
      (4) Most of the time

  26. How often does a person other than your teacher ask you to read to him/her?
      (Please tick only one box.)
      PREAD
      (1) Never
      (2) Sometimes
      (3) Most of the time

  27. How often does a person other than your teacher ask you to do mathematical calculations?
      (Please tick only one box.)
      PCALC
      (1) Never
      (2) Sometimes
      (3) Most of the time

  28. How often does a person other than your teacher ask you questions about what you have been reading?
      (Please tick only one box.)
      PQUESTR
      (1) Never
      (2) Sometimes
      (3) Most of the time

  29. How often does a person other than your teacher ask you questions about what you have been doing in Mathematics?
      (Please tick only one box.)
      PQUESTM
      (1) Never
      (2) Sometimes
      (3) Most of the time

  30. How often does a person other than your teacher look at the work that you have completed at school?
      (Please tick only one box.)
      PLOOKWK
      (1) Never
      (2) Sometimes
      (3) Most of the time

The composite score, ZPHINT, is an aggregate of the above seven variables.
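
For concreteness, a composite of this kind is often formed by summing the coded responses across the indicator variables. A minimal sketch, assuming a simple unweighted sum of the response codes printed above; the actual SACMEQ derivation of ZPHINT (any recoding, weighting, or standardisation steps) may differ:

```python
# Hypothetical composite-scoring sketch: sum the response codes of the
# seven "person other than your teacher" items. An illustrative
# simplification, not the official SACMEQ algorithm for ZPHINT.
ITEMS = ["PHMWKDON", "PHMWKHLP", "PREAD", "PCALC",
         "PQUESTR", "PQUESTM", "PLOOKWK"]

def composite_score(responses: dict) -> int:
    """Sum the coded responses (1-4 or 1-3) across the seven items."""
    return sum(responses[item] for item in ITEMS)

# One student's made-up responses:
student = {"PHMWKDON": 3, "PHMWKHLP": 2, "PREAD": 2, "PCALC": 1,
           "PQUESTR": 3, "PQUESTM": 2, "PLOOKWK": 3}
print(composite_score(student))  # 16
```

Note that the first two items have four categories (including “I do not get any homework”) while the remaining five have three, which already raises the kind of recoding and comparability questions that Q2 below asks about.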

Q1. In the context of IRT, the value of ZPHINT can be regarded as reflecting the level of a construct, where the seven individual variables are manifest variables. In a few lines, describe what this construct may be.

Q2. For the score of the composite variable to be meaningful and interpretable in the context of IRT, what are the underlying assumptions regarding the seven indicator variables?

Q3. In evaluating the quality of test items, which one of the following is the most undesirable outcome for an item?

(a) The item is difficult and fewer than 25% of the students obtained the correct answer

(b) One distractor attracted only 5% of the responses; that is, one distractor is not “working” well

(c) The percentage correct for high-ability students is about the same as the percentage correct for low-ability students

(d) Many students skipped this question because they did not know the answer
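
Each of these outcomes can be checked with simple classical item statistics: the facility (proportion correct), the distractor response proportions, and the discrimination between high- and low-scoring groups. A minimal sketch with made-up data (the variable names and the median-split grouping are illustrative choices, not a prescribed procedure):

```python
# Classical item analysis sketch with made-up data: ten students' scores
# on one item (1 = correct) and their total scores on the rest of the test.
item  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
total = [28, 12, 25, 30, 9, 15, 27, 11, 22, 14]

n = len(item)
facility = sum(item) / n  # proportion correct on this item

# Split students into low and high groups by total score (bottom/top half).
order = sorted(range(n), key=lambda i: total[i])
low, high = order[: n // 2], order[n // 2:]
p_low = sum(item[i] for i in low) / len(low)
p_high = sum(item[i] for i in high) / len(high)
discrimination = p_high - p_low  # near zero (or negative) flags a poor item

print(f"facility={facility:.2f}, p_high={p_high:.2f}, "
      f"p_low={p_low:.2f}, D={discrimination:.2f}")
```

The high-low difference D is a rough classical counterpart of the IRT discrimination parameter.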

Q4. In determining the maximum score of an item (e.g., an item is worth two or four marks), which of the following is the most important consideration?

(a) The number of steps needed to reach the final answer

(b) The difficulty of the question: the more difficult, the higher the maximum score should be

(c) The range of possible responses: if there are more distinct responses, there should be more score points

(d) The extent to which a question can separate good and poor students

Q5. Answer TRUE or FALSE to the following statement:

For an item where the maximum score is more than 1 (e.g., an item with a maximum score of 3), the scores (0, 1, 2, 3) should reflect increasing difficulty of the expected responses. That is, the assignment of scores to responses should reflect increasing ability: a student receiving a higher score on this item is expected to have a higher ability than a student receiving a lower score.

TRUE/FALSE
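
The ordering requirement in Q5 is exactly what polytomous IRT models formalise. In the partial credit model (Masters 1982), for instance, the probability of a score of k on an item with step parameters δ_1, …, δ_m is (standard IRT background, not this chapter's own derivation):

$$
P(X = k \mid \theta) = \frac{\exp \sum_{j=1}^{k} (\theta - \delta_j)}
{\sum_{h=0}^{m} \exp \sum_{j=1}^{h} (\theta - \delta_j)},
\qquad \text{with } \sum_{j=1}^{0} (\theta - \delta_j) \equiv 0.
$$

Under this model the odds of scoring k rather than k − 1 increase with θ, so higher scores are, by construction, expected of higher-ability students; a scoring scheme that violates this ordering will show up as misfit when the model is applied.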

Q6. Think about Questions 4 and 5, then write a short summary of the considerations in assigning partial credit scores within an item and across items.


Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wu, M., Tam, H.P., Jen, T.H. (2016). Construct, Framework and Test Development—From IRT Perspectives. In: Educational Measurement for Applied Researchers. Springer, Singapore. https://doi.org/10.1007/978-981-10-3302-5_2

