Wiliam’s model of construct validity (1996)

9 January 2007

Wiliam (1996) offers a model that starts from Messick’s four-facet model (1) of validity (subsequently (1996) enhanced to six facets) and applies it to the National Curriculum. Wiliam’s analysis has much to offer when looking at assessment at 16. He takes Messick’s distinction between the evidential and the consequential in assessment and adds Moss’s (1992) interpretative basis to the former. Assessment validity needs to be looked at through the evidence, the interpretation and the impact (consequence). For each of these two bases – evidential/interpretive and consequential – Wiliam then builds on Messick’s other dimension of within- and beyond-domain.

[Diagram: the four zones of validity – Wiliam (1994)]

Wiliam then examines each of the four zones in turn.

In regard to within-domain inferences, Wiliam explains the work of Popham and others in trying to establish valid tests that test all, and only, the domain that is intended to be tested. The concluding criticism of the validity of NC tests may well apply to any traditional external examination – they are unrepresentative of the domain because of their length compared with the length/volume of learning.

For beyond-domain inferences Wiliam cites the predictive nature of the use of test results: high performance in X predicts high performance in Y. He cites Guilford in saying that it doesn’t matter how this correlation is arrived at, merely that it is reliable. The test might not be valid, though, as it may not be in the same domain. For ICT at 16 there may be aspects of the achievement that are given far greater importance than perhaps they should be. A learner gets Key Skills level 2 in ICT (2), therefore s/he is functionally literate in ICT – it doesn’t matter how the level 2 was achieved.

Within-domain impact is of particular importance to the design of ICT assessments, I believe. Hence the move towards onscreen testing – it’s ICT, so the technology must be used to assess the capability. In Wiliam’s words, it “must look right” (p. 132).

Finally, Wiliam considers beyond-domain impact or consequence. In looking at National Curriculum testing, Wiliam argues, some of the validity is driven (or driven away) by beyond-domain impacts such as league tables – these are much higher stakes for schools than for learners, and so the validity of the assessment is corrupted.

(1) Messick, “Validity,” 20; Lorrie A. Shepard, “Evaluating Test Validity,” in Review of Educational Research, ed. Linda Darling-Hammond (Washington, DC: AERA, 1993), 405–50; cited in Orton (1994).

(2) The functional/key skill component of ICT learning is referred to as IT.


10/01/07 Post on Embretson (1983)

11/01/07 Post on Moss (1992) 

The Futurelab model and ICT at 16

9 January 2007

So, taking the model from the Futurelab literature review, how might the dimensions of construct validity manifest themselves in assessment of ICT at 16, the domain of my study?

Content validity: are items fully representative of the topic being measured?

Here might sit a study of what is included in assessments and an analysis of that against the stated assessment objectives, the content of specifications and, coming back to my specific focus, the topic (ICT learning) as constructed by the learners. What do 16-year-olds identify as ICT?

Convergent validity: given the domain definition, are constructs which should be related to each other actually observed to be related to each other?

Here, I think, there is something about the relationship between the elements above. Is there convergence between the assessment objectives, between learners’ constructs, and between the two sets? I think there is more to explore here but haven’t quite got my head around it yet…

Discriminant validity: given the domain definition, are constructs which should not be related to each other actually observed to be unrelated?

This is trickier. Why would there be “constructs which should not be related to each other”? Is this to do with identifying things that are mutually exclusive? Are formal and informal learning ever like this?
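As a toy illustration (all scores below are invented for the sketch, not data from the study), convergent and discriminant validity can be pictured as patterns of correlation: measures of supposedly related constructs should correlate highly, while measures of supposedly unrelated constructs should not.

```python
# Toy sketch of convergent vs discriminant validity via Pearson
# correlations. The task names and scores are hypothetical.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Two measures intended to tap the same construct (ICT capability)...
spreadsheet_task = [55, 62, 47, 70, 58, 65]
database_task = [52, 66, 44, 72, 60, 61]

# ...and one measure of a construct assumed to be unrelated to it.
handwriting_speed = [20, 22, 19, 21, 23, 18]

convergent = pearson(spreadsheet_task, database_task)
discriminant = pearson(spreadsheet_task, handwriting_speed)

print(f"convergent r = {convergent:.2f}")    # should be high
print(f"discriminant r = {discriminant:.2f}")  # should be near zero
```

The point of the sketch is only the shape of the evidence: convergent validity looks for a high coefficient, discriminant validity for a low one, given the domain definition.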

Concurrent validity: does the test correlate highly with other tests which supposedly measure the same things?

This too is tricky, but there is something here for me about the relationship between teacher assessment and test results, I think.