This was extremely informative and useful, challenging my notions of assessment. As the basis for his theoretical standpoint Andrew used these texts
- Brennan, R (2004), Educational Measurement (4th edition). Westport, CT: Greenwood
- Downing, S (2006), 'Twelve Steps for Effective Test Development', in Downing, S and Haladyna, T (2006), Handbook of Test Development. NY: Routledge
- Gronlund, N (2005), Assessment of Student Achievement (8th edition). NY: Allyn and Bacon [NB 9th edition (2008) now available, by Gronlund and Waugh]
The main premise, after Gronlund, is that there is no such thing as a valid test/assessment per se. The validity is driven by the purposes of the test. Thus a test that may well be valid in one context may not be in another. The validity, he argued, is driven by the uses to which the assessment is put. In this respect, he gave an analogy with money. Money only has value when it is put to some use. The notes themselves are fairly worthless (except in the esoteric world of the numismatist). Assessments, analogously, have no validity until they are put to use.
Thus a test of English for entrance to a UK university (IELTS) is valid if the UK university system validates it. Here, then, is the concept of consequential validity. It is also only valid if it fits the context of those taking it. Here is the concept of face validity – the assessment must be 'appealing' to those taking it.
Despite these different facets of validity (and others were covered – predictive validity, concurrent validity, construct validity, content validity), Gronlund argues that validity is a unitary concept. This echoes Cronbach and Messick, as discussed earlier. One way of looking at this, I suppose, is that there is no validity without all of these facets.
Gronlund also argues that validity cannot itself be determined – it can only be inferred. In particular, inferred from statements that are made about, and uses that are made of, the assessment.
The full list of characteristics cited from Gronlund is that validity
- is inferred from available evidence and not measured itself
- depends on many different types of evidence
- is expressed by degree (high, moderate, low)
- is specific to a particular use
- refers to the inferences drawn, not the instrument
- is a unitary concept
- is concerned with the consequences of using an assessment
Some issues arising for me here are that the purposes of ICT assessment at 16 are sometimes, perhaps, far from clear. Is it to certify someone's capability in ICT so that they may do a particular type of job, or have a level of skills for employment generally, or have an underpinning for further study, or have general life skills, or something else, or all of these? Is 'success' in assessment of ICT at 16 a necessary prerequisite for A level study? For entrance to college? For employment?
In particular, I think the issue that hit me hardest was this: is there face validity – do the students perceive it as a valid assessment (whatever 'it' is)?
One final point: reliability was considered to be an aspect of validity (scoring validity in the ESOL framework of CA).