Sir Mike Tomlinson lecture

22 October 2008

Sir Mike Tomlinson, chair of the working group on 14-19 reform that led to the 2003 Tomlinson Report, came to NTU today to give the RSA Shipley Lecture. This year the lecture was also a memorial to former NTU and RSA stalwart Anne Bloomfield. The subject, dear to her heart and to Sir Mike’s, was “Vocational education should be a rounded education”.

With the backdrop of the history of attempted introductions of vocational education (the 1944 Butler Education Act with its tripartite system, TVEI, GNVQs, Curriculum 2000 and Diplomas), Tomlinson argued for the move away from debates about ‘parity of esteem’ towards a view of the ‘credibility and value’ of qualifications. Echoes here of the value and validity arguments of Monday’s seminar at Cambridge.

It was also notable that the lecture addressed how ‘true’ vocational education must have

  • relevance to 16-year olds (face validity),
  • a knowledge base that is used in, and applied to, occupational areas – however broadly drawn (validity determined by use, not by the test itself)
  • a theoretical content balanced with sector-based skills (content validity)

Again this echoes with Monday. Another thread running through was the role of assessment (systems) in undermining the vocational educational initiatives – TVEI assessment becoming ‘traditional’, GNVQ assessment being changed to equate to GCSE/A level, key skills being decoupled from GNVQs, Curriculum 2000’s insistence on equal numbers of units per qualification with a convergence of assessment types.

Also mentioned, although not in the same sense of ‘undermining’, were the persistence of the BTEC model and the way that NVQs were never envisaged as anything other than accreditation mechanisms for on-the-job training.

The BTEC model of occupational credibility and general education was the model that was paramount in the description of vocational education with the caveat ‘what is general education’?

Throughout I was wondering where ICT fits into all this. Never mentioned as a ‘subject’ nor even as a ‘skill’ it was conspicuous by its absence. It is, of course, present in the specialised diplomas and as a functional skill although the former may be bedevilled by the wide diversity of the sector it is serving, I fear.

Tomlinson was upbeat about the Diplomas but focused especially on the need to get a true progression from level 1 through to 3. The custom of level 1 being what you get if you fail level 2 (GCSE grades D-G rather than A*-C) must not be repeated, he urged. He also argued for level 2 threshold systems, so that learners who do not reach that threshold at the magical (and, I would say, arbitrary) age of 16 could do so by subsequent accumulation of credit points – rather than by ‘repeating GCSEs’, a model that doesn’t serve them well.

Another hour of useful insights.

Cambridge Assessment seminar

21 October 2008

I attended a seminar, on the subject of validity, one of a series of events run by Cambridge Assessment (CA). It was led by Andrew Watts from CA.

This was extremely informative and useful, challenging my notions of assessment. As the basis for his theoretical standpoint Andrew used these texts:

  • Brennan, R (2004), Educational Measurement (4th edition). Westport, CT: Greenwood
  • Downing, S (2006) Twelve Steps for Effective Test Development in Downing, S and Haladyna, T (2006) Handbook of Test Development. NY: Routledge
  • Gronlund, N (2005), Assessment of Student Achievement (8th edition). NY: Allyn and Bacon [NB 9th edition (2008) now available by Gronlund and Waugh]

He also referred to articles published in CA’s Research Matters and used some of the IELTS materials as exemplars.

The main premise, after Gronlund, is that there is no such thing as a valid test/assessment per se. The validity is driven by the purposes of the test. Thus a test that may well be valid in one context may not be in another. The validity, he argued, is driven by the uses to which the assessment is put. In this respect, he gave an analogy with money. Money only has value when it is put to some use. The notes themselves are fairly worthless (except in the esoteric world of the numismatist). Assessments, analogously, have no validity until they are put to use.

Thus a test of English for entrance to a UK university (IELTS) is valid if the UK university system validates it. Here then is the concept of consequential validity. It is also only valid if it fits the context of those taking it. Here is the concept of face validity – the assessment must be ‘appealing’ to those taking it.

Despite these different facets of validity (and others were covered – predictive validity, concurrent validity, construct validity, content validity), Gronlund argues that validity is a unitary concept. This echoes Cronbach and Messick as discussed earlier. There is no validity without all of these facets I suppose would be one way of looking at this.

Gronlund also argues that validity cannot itself be determined – it can only be inferred. In particular, inferred from statements that are made about, and uses that are made of, the assessment.

The full list of characteristics cited from Gronlund is that validity

  • is inferred from available evidence and not measured itself
  • depends on many different types of evidence
  • is expressed by degree (high, moderate, low)
  • is specific to a particular use
  • refers to the inferences drawn, not the instrument
  • is a unitary concept
  • is concerned with the consequences of using an assessment

Some issues arising for me here are that the purposes of ICT assessment at 16 are sometimes, perhaps, far from clear. Is it to certificate someone’s capability in ICT so that they may do a particular type of job, or have a level of skills for employment generally, or have an underpinning for further study, or have general life skills, or something else, or all of these? Is ‘success’ in assessment of ICT at 16 a necessary prerequisite for A level study? For entrance to college? For employment?

In particular I think the issue that hit me hardest was – is there face validity: do the students perceive it as a valid assessment (whatever ‘it’ is).

One final point – reliability was considered to be an aspect of validity (scoring validity in the ESOL framework of CA).

KS3 SATS scrapped in England

16 October 2008

This somewhat unexpected announcement was made this week. Tests for 14 year olds in maths, English and science have been scrapped. Given that many schools start their GCSE/level 2 courses at 13 now, especially in ICT, this might change radically the ways in which the middle years of secondary are organised. It may also affect students’ perceptions of assessment as they will not have had those high stakes external tests at 14.

History of assessment

14 October 2008

Blindingly obvious I guess, but nevertheless not a field I have mined much. “History of assessment” needs to be a significant contextual filter for my research.

I am attending some of the seminars of the Cambridge Assessment Network as and when I can (Kathryn Ecclestone on Ofqual, Harry Torrance on Policy and Practice). There are others that I have been unable to attend but would like to have done. Fortunately useful overviews such as this one from Helen Patrick are often put online.

Becta report into Web 2.0

6 October 2008

Becta have published a research report (Crook & Harrison, 2008) on the use of Web 2.0 by learners and teachers, at home, in school and elsewhere. This statement in the summary caught my eye:

Pupils feel a sense of ownership and engagement when they publish their work online and this can encourage attention to detail and an overall improved quality of work. Some teachers reported using publication of work to encourage peer assessment.

Where is the use of these tools in external assessment? Come to that, where is the use of peer assessment in external assessment?

Also noticeable is the emergence of yet another Rogers’ adoption curve – with the early adopters being the young ones etc… is this true? Do teachers really not use Web 2.0 tools? How does that square with the quote above? It is borne out in the research of Solheim (2007), which cites OfCOM’s 2006 statistic of 40% of ‘adults’ having used social networking sites*, compared to 70% of 16-24 year olds, and Comscore’s 2007 finding that over half (1.3 million) of Facebook’s new users in the previous year were 25 or older.

* OfCOM’s 2008 report states that ‘only’ 21% have ‘set up a profile’ on such sites.

Comscore (2007), Facebook sees flood of new traffic [online; accessed 06/10/08]

Crook, C and Harrison, C (2008) Web 2.0 Technologies for Learning at Key Stages 3 and 4: Summary Report, Becta [online; accessed 06/10/08]

Ofcom (2006), The communications market [online; accessed 06/10/08]

Ofcom (2008), The communications market report [online; accessed 06/10/08]

Solheim, H (2007) Digital Natives versus Higher Education: who is ready for whom, MSc dissertation, University of Southampton.

Taxonomy of difficulties in the assessment of ICT

2 October 2008

This paper, from the Assessment in Education Unit at Leeds, is again bang on the line of what I am doing, albeit that they are looking at the KS3 onscreen ICT tests (the AEU were commissioned as part of the evaluation of that project). Nevertheless there are some very pertinent analyses of the difficulties students, and the system, encounter in the assessment of ICT. For example:

… sources of difficulty that relate to the subject being assessed. The assessment of ICT brings particular risks. As McFarlane (2001) points out, assessment of ICT capability need not in itself be computer-based, but as in this case it was, the sources of difficulty in our set that are associated with this aspect all relate also to on-screen assessment, e.g.

Pupils know enough to succeed in the tasks without using ICT for all the steps.

The demands in the interaction between tasks and software on short-term memory and organisational skills are greater than the level of ICT capability that is being assessed.

Activity theory and ICT

2 October 2008

Now that the new academic year is under way, and I am in a new job (again), I hope to be able to crack on with this project. If not… well, I need to decide one way or the other.

Anyway, I am in the middle of three days’ study leave and am busily writing up what I have so far. In doing so I have also come across some useful things which I will include here.

Firstly, a paper from the School of Education conference at Leeds University by Aisha Walker. This makes some interesting links between activity theory and attainment in ICT and provides a model that may be useful when I come to look at the data collection and analysis.

The title “What does it mean to be good at ICT?” really caught my eye. That’s what it’s all about, isn’t it?

A three axis model

1 May 2008

Taking the ideas from the previous post and putting them into a diagram, I get this:

Some assessment uses ICT (or technology) – this is e-assessment (x axis).

Some assessment is designed to assess ICT capability (y axis).

Elliott’s Assessment 2.0 seems to be using ICT, not as e-assessment, but as a medium for allowing judgement to be made about the ICT capability (z axis).

Now, of course, analysing any one particular assessment methodology, one could locate it in this three-dimensional space. For example:

A traditional written paper would be on the y-axis. The NAA online assessment activities designed for KS3 would be in the space between all three axes (with perhaps lower y- and z-values than x-value). Coursework would have an x-value of 0 but would have some components of y and z. Online assessments such as the driving test would be on the x-axis.
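These placements can be sketched in code. This is purely illustrative – the class, the names and the 0-1 coordinate values are my own guesses at positions in the model, not measurements of anything:

```python
from dataclasses import dataclass

@dataclass
class AssessmentProfile:
    """Position of an assessment in the three-axis space:
    x - extent to which it is delivered through ICT (e-assessment),
    y - extent to which it assesses ICT capability,
    z - extent to which ICT is the medium for evidencing that capability
        (the 'Assessment 2.0' sense).
    All values on an illustrative 0-1 scale."""
    name: str
    x: float
    y: float
    z: float

# Illustrative placements following the examples in the text
profiles = [
    AssessmentProfile("Traditional written paper", x=0.0, y=1.0, z=0.0),
    AssessmentProfile("NAA KS3 online activities", x=0.8, y=0.5, z=0.4),
    AssessmentProfile("Coursework", x=0.0, y=0.6, z=0.5),
    AssessmentProfile("Online driving theory test", x=1.0, y=0.0, z=0.0),
]

for p in profiles:
    print(f"{p.name}: ({p.x}, {p.y}, {p.z})")
```

Once an assessment is located this way, comparing methodologies becomes a matter of comparing points in the space.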

My questions here are “Where is the highest validity?” and “Where is the highest reliability?”. How does one use Elliott’s Assessment 2.0 to determine success in a certificated qualification?

Elliott (2008) Assessment 2.0

1 May 2008

Much as I dislike the nomenclature (Assessment 2.0), I found this paper by Bobby Elliott (and thanks to my colleague Bruce Nightingale and the ALT newsletter for bringing the name to my attention) illuminating on many levels. Firstly, here was someone making the links between the ways in which technology is reportedly used by young people and the ways it could be used for assessment. Secondly, the author works for a government agency – the Scottish Qualifications Authority (SQA). Is this evidence that policymakers’ thinking is changing to embrace the vicarious ways in which evidence of learning can be presented by technological opportunity?

My thoughts return, though, to the Macfarlane distinction between assessment of technology (eg the ICT curriculum) and assessment through technology (ie the methodology). This paper by Elliott seems to be moving a little away from the latter and perhaps towards the former. But perhaps, also, it is defining a third axis – assessment of technological capability through evidence presented through that technology. Maybe it is asking the question ‘What should we be assessing?’ (ie the curriculum) rather than ‘How should we assess it?’ (the methodology). But more than that, it is saying: can we assess the ‘what’ through the ‘how’?

The impressive list of tools that may be used for presenting evidence (and assessment) in Elliott’s paper also underlines my scepticism of a one-size-fits-all technological solution to assessment. And when I look down that list I am reminded of surveys presented by Terry Freedman (at a TDA conference in Nov 07) and others that show that young people’s use of tools is very diverse and very thinly spread. It is also very transient – MySpace here today, gone tomorrow.

The very tool that Elliott uses to present his iPaper may well be a case in point. What if an awarding body decided Scribd was the thing to use? How long before it becomes the sliced bread superseded by the next best thing? How do we build in agility for assessment so that it does not become an exercise in rewarding the fashionable (as opposed to the current system, which rewards the old-fashionable)?

PS Yes it’s been a long time… Higher Education management – ie my temporary role for 07/08 – and PhDs don’t easily mix… but I know that is also my excuse… and I’m sticking to it…

General Teaching Council report calls for no school tests for under-16s

11 June 2007

Call to ban all school tests for under-16s | UK News | The Observer (10 June 2007)

“All national exams should be abolished for children under 16 because the stress caused by over-testing is poisoning attitudes towards education, according to an influential teaching body.

In a remarkable attack on the government’s policy of rolling national testing of children from the age of seven, the General Teaching Council is calling for a ‘fundamental and urgent review of the testing regime’. In a report it says exams are failing to improve standards, leaving pupils demotivated and stressed and encouraging bored teenagers to drop out of school.”

Demotivation, stress and, crucially for my work, poisoned attitudes. Will year 11s’ thoughts on the validity of assessment at 16 be coloured by their experience of testing (and other assessment) pre-16? Fairly inevitable, I should think…

QCA annual report on ICT for 2005/06

25 March 2007

QCA have published their annual report on ICT (and other subjects) as part of their “monitoring the curriculum” exercise. The outputs from this report will (or at least should) influence their review of the secondary curriculum. The report formed part of the basis of this BBC article.

Some key points in my reading of the report:

The aims of the national curriculum:

  • There is a clear recognition (by schools) of the potential of ICT to help develop pupils’ enjoyment of, and commitment to, learning.
  • Almost half of year 8 said that they enjoyed ICT compared with only 14 per cent who said that they disliked it.
  • More than a quarter of teachers disagree that the PoS ‘helps give pupils the opportunity to be creative, innovative and enterprising’, supporting previous findings that a significant amount of ICT learning and teaching continues to focus on elementary application of basic skills.
  • QCA believes that there are enormous possibilities in ICT for creativity, enquiry and innovation and the secondary curriculum review has enabled us to bring this to the forefront in the ICT programme of study (PoS). However, there may be additional barriers to using ICT in this way in schools and this needs further investigation.


  • There is work still to be done to assist teachers with assessing ICT. Schools say they need teacher assessment guidelines/materials for assessing pupils’ progress, and continued professional development.
  • QCA has recommended that the on-screen key stage 3 test should be rolled out on a non-statutory basis, but will work to develop the test as a formative assessment tool to support teaching and learning.
  • In the survey of year 8 pupils, nearly a third of pupils felt that the level of ICT work they were being given was too easy.

Questions particular to ICT

  • There remains a lack of consistency and coherence in the ICT qualifications currently on offer, which is unhelpful for users and employers.
  • Although uptake of ICT qualifications continues to rise, enquiries to QCA indicate that for some schools the choice of qualification is made on the basis of points for league tables rather than on the appropriateness of the qualification to the learner.
  • QCA has commissioned an in-depth research probe into the qualifications offered in schools and the progression routes offered post-16. For example, if, at A level, more than 45 per cent of pupils are deciding not to continue with ICT at A2 because of their poor results at AS, it would be useful to find out the prior qualifications of those pupils who decide not to continue or whether there are other factors involved.

Some things to add to the pile of reading…

7 March 2007

Dereks Blog: Taking formal education beyond exams
An article from Malaysia about a change of approach to assessment

DfES: research into ICT
A whole pile of research into ICT in schools and homes

EU: Impact of ICT
Advertised on the newsletter, an impact study in 17 EU nations

QCA review of the National Curriculum

6 February 2007

The Qualifications and Curriculum Authority (QCA) have published the draft of the new National Curriculum (NC) for secondary schools in England. This contains slimmed-down programmes of study for all subjects (although ICT was always fairly slim). It also contains a section on the way in which ICT as a functional skill across the curriculum might be specified – and maps this to the subject ICT. This seems an odd mapping in many ways, as both sides of the relationship could be interchanged.

Also of note is the section on assessment strategies. These include some clear guidance on not just relying on the test at the end. I wonder how this will actually pan out at Key Stage 4 (14-16 year olds). Will schools stop entering all and sundry for a full-blown level 2 qualification in ICT? Will the functional skills and some assessment of ICT use against KS4 criteria be enough? Or is this too obscure to be bothered with? What will the reporting requirements be for ICT at KS4? It is one of few subjects left in the NC at this level.

There is another aspect of assessment that caught my eye on first read (and how much harder a website is to read than a set of printed documents!?!). This was to do with taking a range of evidence:

It does mean that more of what learners ordinarily do and know in the classroom is taken into account when teachers come to make a periodic assessment of learners’ progress at the end of term or half-year. For example, all teachers are continually making small-scale judgements about learners’ progress, achievements or the support they require when, over a number of lessons, they are reading or writing a lengthy text, planning and revising a design brief, or researching a historical figure in books or online. Such knowledge tends to be overlooked when only the final outcome, artefact or test is assessed, but it can make a vital contribution to periodic assessment.

What about what learners “ordinarily do” outside the classroom?

BBC: Testing times for school assessment

24 January 2007

The BBC’s Education correspondent Mike Baker gives a very readable account of the changes ahead in the assessment system in his report of 6 January 2007 – Testing times for school assessment.

His main thrust is that changes to the system are coming in. Some of these are reflected in subsequent events that I have written about like the revamping of league tables and possible scrapping of the online ICT test… although the latter of these presumably would have helped personalisation if it was an on-demand test.

The changes, concludes Baker, are due to the growing clamour for that most voguish of educational shibboleths – personalisation.

In the article, he reflects on the Gilbert report from the HM Chief Inspector into personalisation and on how the recommendations of the report might necessarily lead to a greater role for teacher assessment. He ties this in with an IPPR study into the tensions between the dimensions of validity and accountability of assessment. Again teacher assessment is recommended by the authors as a way of enhancing both dimensions. Finally he cites Dylan Wiliam’s research into the ‘shockingly’ (Baker’s word) inaccurate methods of formal assessment.

A very useful summary.

Miles Berry also summarises the Gilbert Report in his blog, again very useful.

Non-formal learning: literature review

14 January 2007

Original post 18/12/06

At the heart of what I am currently interested in is the definition space of learning as represented by the “parameter of formality”. Namely, classifying learning as formal, informal or non-formal, or some combination of all three. I blogged on this recently.

Colley, Hodkinson and Malcolm (2002) produced a review of this landscape – very helpful! What is missing though is the extension to school-age students per se – this is where my work should sit.

Update (14/01/07): I have now found this 2003 (?) Futurelab literature review of informal learning with technology outside school.

BBC 2 Wales Christmas idents by six-year olds

14 January 2007

Link from

“So how would a six year old make an ident anyway?

2W ident

A group of six-year olds spent two days creating and filming ‘stop motion’ animations in clay to be broadcast by BBC Wales 2W (the Welsh equivalent of BBC2).

The final results were broadcast and are available on the South Wales Argus website.

The key questions for me are:

  • What did the children learn? According to themselves? According to their teachers? According to the researchers from
  • How would their work have been assessed?

These are six-year olds. What might sixteen-year olds do if given the chance? Does our assessment and qualifications system allow that chance?

Revamped school league tables published

13 January 2007

The DfES have published the 2005/06 school league tables.

Making its debut is the new measure of ‘threshold performance’ – out with 5 A*-C GCSEs or equivalent, in with a level 2 threshold that must include English and mathematics. The threshold is reached when a student has sufficient level 2 passes. Five GCSEs still suffice, and a GCSE counts as level 2 when graded A*-C. In this respect the only change is the mandatory need to pass English and mathematics. It had been feared that making these two subjects a necessary factor in counting student ‘success’ in a school’s figures would mean a drastic reduction in average levels of success. The percentage of students reaching threshold fell to 46% – down from 59% under the old measure of 5 A*-C GCSEs. Whilst a significant drop, it may not be as bad as some feared.

For my own part, and for ICT, I was interested to see how the change had affected the pioneer of GNVQ IT for all (or at least the majority). Thomas Telford School was lauded a few years ago because of its dramatic increase in the 5 A*-C percentage. I had always suspected that this was because of the GNVQ counting for four of them. It is interesting to note therefore that the school registered 95% on the new measure where English and mathematics have become compulsory. So maybe 80% (or 4-GCSE-equivalent) qualifications didn’t distort the tables as much as seemed likely. This despite the headlines and horror stories, as in this from the BBC:

A few canny pioneers realised that there was a vocational qualification, the GNVQ, which was worth four GCSEs at grade C or above. Add one more GCSE, in any subject, and, hey presto, you meet the target. To be fair, many of the original followers of the GNVQ route felt it offered the best educational option for their pupils. Others, though, realised it was an effective way of avoiding the consequences of falling below government targets. Eventually the government realised that large numbers of schools were achieving the threshold without their pupils achieving GCSE passes in maths and English.

Hence this year’s new requirement that the five GCSE passes must include maths and English. The effects on some schools have been dramatic. One school went from a score of 82% passing the equivalent of five A*-Cs to just 16% when maths and English were included. Many other schools, which had been climbing up the tables in recent years, found themselves slithering back down again. Presumably they will now find new ways of targeting performance in maths and English, no doubt at the cost of something else.

The more significant change, though, may be the use of the phrase ‘level 2 threshold’. GCSEs are no longer explicitly mentioned in the language of the threshold (even though the DfES’s own link still says GCSE tables). A GCSE at A*-C is now just a 20% contribution to threshold. Many other qualifications can also contribute, as they could before. But now DIDA, for example, is not described as equivalent to 4 GCSEs; a level 2 DIDA pass is described instead as 80% of threshold.
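The arithmetic of the threshold can be sketched as follows. The contribution figures are those quoted in this post (a GCSE at A*-C as 20% of threshold, a level 2 DIDA pass or a full GNVQ as 80%); the function and names are my own simplification, not the official DfES calculation:

```python
# Illustrative sketch of the 'level 2 threshold' arithmetic.
# Each qualification contributes a fraction of the threshold; figures
# are those quoted in the post, and the structure is hypothetical.
CONTRIBUTIONS = {
    "GCSE A*-C": 0.2,
    "DIDA level 2 pass": 0.8,
    "GNVQ (full)": 0.8,  # 'worth four GCSEs'
}

def meets_threshold(results, has_english, has_maths):
    """A student meets the level 2 threshold when their qualifications
    sum to at least 100% and the passes include English and maths."""
    total = sum(CONTRIBUTIONS[q] for q in results)
    return total >= 1.0 and has_english and has_maths

# Five GCSEs at A*-C including English and maths: threshold met
print(meets_threshold(["GCSE A*-C"] * 5, True, True))  # True
# DIDA plus one other GCSE, but no English/maths passes: not met
print(meets_threshold(["DIDA level 2 pass", "GCSE A*-C"], False, False))  # False
```

The second example is exactly the distortion the BBC quote describes: plenty of threshold ‘volume’, but failing the new English-and-maths condition.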

The list of accredited qualifications, their level, and contribution to threshold is maintained by QCA at the Openquals website (soon to be renamed NDAQ: National Database of Accredited Qualifications). For level 2 ICT/IT, Openquals lists 71 qualifications across the awarding bodies. The market is wide open. The gold standard at 16 has changed.

Also this year saw a change in the value added measure to include factors other than just improvement in performance compared to level 3. Now the profile of the school’s students is taken into account and a figure based on a notional national baseline of 1000 is reported in the Contextual Value Added measure (CVA). Schools such as Greenwood Dale in Nottingham are lauded, quite rightly, for reasons of value-added and not just level 2 performance. This contextual value added doesn’t tell the whole story though, and Leicestershire’s relatively poor performance might have something to do with the structure in the county. The 14-19 colleges certainly seem poor relations to Nottinghamshire’s comprehensives in respect of CVA even though, in my opinion, 14-19 is a much more coherent age range. Maybe it is not measuring like with like when there is a change at 14? A classic case of measuring something other than what is purported to be measured?

Moss (1992): validity and assessment of performance

11 January 2007

Pamela Moss’s 1992 paper “Shifting conceptions of validity in educational measurement: implications for performance assessment” (1) is cited by Wiliam in his modification of Messick’s four-facet model. It would seem from the figure I extracted from Wiliam’s paper that he is suggesting that Moss is providing an extra dimension to the evidential paradigm. That was what I saw on first reading. On turning to Moss’s paper and re-reading Wiliam I am now not so sure of where he is placing Moss vis a vis Messick.

Moss’s paper is an overview of the landscape of construct validity from the inception by Cronbach and Meehl in 1955 (2) to its publication in 1992. In doing so she looks at evidential and interpretive aspects of the models of Cronbach (1980) (3) and Messick. The latter is not seen as being purely evidential as Wiliam’s paper might suggest.

The thrust of Moss is that a review was needed of the “Standards” of what might be called the Establishment of (American) assessment and measurement (AERA, APA, NCME). This review, she argues, was needed because of the emergence of performance assessment as a science (and as a commonly used tool) to complement test/item-based assessment. She compares this to the contemporaneous diminution of the dominance of positivism.

In performance assessment there is a strong interpretive base. The learner will interpret the task and manifest skills, knowledge and understanding through their performance. The assessor will re-interpret this performance to provide evidence to which rules of validity must be applied.

(1) Moss, P (1992) Shifting Conceptions of Validity in Educational Measurement: Implications for Performance Assessment in Review of Educational Research, Vol. 62, No. 3. (Autumn, 1992), pp. 229-258.

(2) Cronbach, L.J. and Meehl, P.E. (1955) Construct validity in psychological tests in Psychological Bulletin, 52, 281-302; also available online

(3) Cronbach, L.J. (1980). Validity on parole: How can we go straight? in New directions for Testing and Measurement, 5, 99-108.

John Naughton: Welcome to IT class, children; log on and be bored stiff

10 January 2007

Writing in The Observer, John Naughton caricatures another vignette….

There’s a surreal quality to it, conjuring up images of kids trudging into ICT classes and being taught how to use a mouse and click on hyperlinks; receiving instructions in the creation of documents using Microsoft Word and of spreadsheets using Excel; being taught how to create a toy database using Access and a cod PowerPoint presentation; and generally being bored out of their minds.

Then the kids go home and log on to Bebo or MySpace to update their profiles, run half a dozen simultaneous instant messaging conversations, use Skype to make free phone calls, rip music from CDs they’ve borrowed from friends, twiddle their thumbs to send incomprehensible text messages, view silly videos on YouTube and use BitTorrent to download episodes of Lost. When you ask them what they did at school, they grimace and say: ‘We made a PowerPoint presentation, dad. Yuck!’

Wiliam’s model of construct validity (1996)

9 January 2007

Wiliam (1996) offers a model that starts from Messick’s four-facet model (1) of validity (subsequently (1996) enhanced to six facets) and applies it to the National Curriculum. Wiliam’s analysis has much to offer when looking at assessment at 16. He takes Messick’s distinction of the evidential and consequential in assessment and adds Moss’s (1992) interpretive basis to the former. Assessment validity needs to be looked at through the evidence, the interpretation and the impact (consequence). For each of these two bases – evidential/interpretive and consequential – Wiliam then builds on Messick’s other dimension of within- and beyond-domain.

Wiliam (1994)
Wiliam then examines each of the four zones in turn.

In regard of within-domain inferences, Wiliam explains the work of Popham and others in trying to establish valid tests that test all, and only, the domain that is intended to be tested. The concluding criticism of the validity of NC tests may well apply to any external traditional examination – they are unrepresentative of the domain because of their length compared to the length/volume of learning.

For beyond-domain inferences, Wiliam cites the predictive nature of the use of test results. High performance in X predicts high performance in Y. He cites Guilford in saying that it doesn’t matter how this correlation is arrived at, merely that it is reliable. The test might not be valid, though, as it may not be in the same domain. For ICT at 16 there may be aspects of the achievement that are given far greater importance than perhaps they should be. A learner gets Key Skills level 2 in ICT (2), therefore s/he is functionally literate in ICT. It doesn’t matter how the level 2 was achieved.

Within-domain impact is of particular importance to the design of ICT assessments, I believe. Hence the move towards onscreen testing – it’s ICT, so the technology must be used to assess the capability. In Wiliam’s words, it “must look right” (p132).

Finally, Wiliam considers beyond-domain impact or consequence. In looking at National Curriculum testing, Wiliam argues, some of the validity is driven (or driven away) by beyond-domain impacts such as league tables – these are much higher stakes for schools than learners and so the validity of the assessment is corrupted.

(1) Messick, “Validity,” 20; Lorrie A. Shepard, “Evaluating Test Validity,” in Review of Educational Research, ed. Linda Darling-Hammond (Washington, DC: AERA, 1993), 405-50. cited in Orton (1994)

(2) The functional/key skill component of ICT learning is referred to as IT


10/01/07 Post on Embretson (1983)

11/01/07 Post on Moss (1992) 

The Futurelab model and ICT at 16

9 January 2007

So, taking the model from the Futurelab literature review, how might the dimensions of construct validity manifest themselves in assessment of ICT at 16 – the domain of my study?

Content validity: are items fully representative of the topic being measured?

Here might be included a study of what is included in assessments and an analysis of those against the stated assessment objectives, the content of specifications and, coming back to my specific focus, the topic (ICT learning) as constructed by the learners. What do 16-year-olds identify as ICT?

Convergent validity: given the domain definition, are constructs which should be related to each other actually observed to be related to each other?

Here there is something about the relationship between the elements above, I think. Is there convergence between the assessment objectives, between learners’ constructs, and between the two sets? I think there is more to explore here but haven’t quite got my head around it yet…

Discriminant validity: given the domain definition, are constructs which should not be related to each other actually observed to be unrelated?

This is trickier. Why would there be “constructs which should not be related to each other”? Is this to do with identifying things that are mutually exclusive? Are formal and informal learning ever like this?

Concurrent validity: does the test correlate highly with other tests which supposedly measure the same things?

This too is tricky, but there is something here for me about the relationship between teacher assessment and test results, I think.
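To make these dimensions concrete for myself, a minimal numeric sketch – all scores are invented for illustration, not real data. Concurrent validity shows up as a high correlation between two measures of the same construct (external test vs teacher assessment); discriminant validity as a near-zero correlation with a construct that should be unrelated.

```python
# Illustrative only: invented scores for six learners, probing two of
# the Futurelab validity dimensions with simple Pearson correlations.


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


test_score     = [55, 62, 70, 48, 80, 66]  # external ICT test (invented)
teacher_assess = [52, 65, 68, 50, 78, 70]  # teacher assessment, same domain
unrelated      = [60, 60, 40, 45, 55, 40]  # a construct that should be unrelated

# Concurrent validity: two measures of the same construct should agree
# (correlation close to 1 here).
print(round(pearson_r(test_score, teacher_assess), 2))

# Discriminant validity: a supposedly unrelated construct should not
# (correlation near zero here).
print(round(pearson_r(test_score, unrelated), 2))
```

Of course a single correlation is only the crudest proxy for these dimensions, but it shows the shape of the evidence each one asks for.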

Futurelab lit review on e-assessment (2004)

7 January 2007

Futurelab, a UK technology and learning ‘thinktank’, commissioned this 2004 literature review (1) on e-assessment. In compiling the report, the authors (Jim Ridgway and Sean McCusker, School of Education, University of Durham and Daniel Pead, School of Education, University of Nottingham) have, not surprisingly, covered a lot of ground to do with assessment per se and not just its technologically-enabled version.

In talking about the use of e-portfolios, the report concludes that “Reliable teacher assessment is enabled. There is likely to be extensive use of teacher assessment of those aspects of performance best judged by humans (including extended pieces of work assembled into portfolios)” (ibid, page 2). For me, hidden in this is the validity argument: it comes through the reliability of teacher assessment and through extended pieces of work. Both of these should help validity, I believe.

Section 1 of the report then talks of the nature of assessment – formative and summative. Throughout this the authors continually refer back to the purpose and validity of assessment. Also, the learner is placed at the centre of the described processes. Of particular note for me is the mendacity quotient, whereby summative assessment often encourages students to actively hide what they don’t know.

Section 2 discusses how and where assessment should be driven, with the focus also on technology as the report is on e-assessment. There are some more generally-applicable points covered here though. “Metcalfe’s Law” of increasing value through networks is used to underpin the need to tie assessment into rapidly increasing technologically-enabled social networks. More simply, perhaps, the use of peer networks for assessment might also be part of this… My work aims to look at the validity of external assessment by using self and peer viewpoints as comparators. In addition to social changes, the report identifies other drivers on assessment change as globalisation, mass education, defending democracy and government-led policies. Here there is a disappointing (for me) relative sparsity of focus on the needs of the learner, although demands of lifelong learning are brought out in the section summary (ibid, page 9).

Section 3 discusses the developments in e-assessment. Or so its section heading states. Actually there is much in here about assessment in general and the need to make it relevant to learner needs and valid. “… some [developments] reflect a desire to improve the technical quality of assessment (such as increased scoring reliability), and to make the assessment process more convenient and more useful to users (by the introduction of on-demand testing, and fast reporting of results, for example).” (ibid, p15).

Under the heading “Opportunities and challenges for e-assessment”, section 4 is a rich vein of resources and opinion on the use of assessment to assess deeper level skills, understandings etc. While the summary of the section appears very parsimonious about what has been written, sub-section 4.1 is full of how assessment should be enabling learning.

Finally, the appendix on page 24 is a good overview of, to use its title, “The fundamentals of assessment”.

(1) Ridgway J, McCusker S and Pead D (2004) Literature Review of E-assessment: A Report for Futurelab [online PDF] available at

Authentic learning and assessment

3 January 2007

The term authentic learning has apparently been around for some time, although I had not come across it before today. This paper cites Archbald and Newman (1998) as the first to apply authentic to learning and assessment, although I’m not sure that this is truly the first use of the term.

It does seem to encapsulate what I am trying to investigate though as this list of points from Maureen O’Rourke (2001) suggests:

Use of ICT to provide students with greater opportunities for communication, collaboration, thinking and creativity also provides us with challenges in terms of authentic assessment. The Australian National Schools Network has recently launched a national Authentic Learning and Digital Portfolios project. Beginning with a focus on the whole person, school communities are clarifying what young people should know, understand and be able to do at particular stages of their education.

[..] The project aims to bring learning and assessment together with:

• students having significant control in the construction of their portfolios
• the portfolio structure providing opportunities for feedback, questioning and reflection
• assessment moving to a more central part of the learning process, conducted with students rather than on them
• rich, authentic tasks providing evidence of learning in multiple domains

Guess I need to follow up the Authentic Learning and Digital Portfolios project, although my experience of DIDA suggests that simply using e-portfolios is no guarantee of authentic assessment or learning.

Some texts on the subject: Tombari and Borich, Guba & Lincoln

“So what’s your PhD about?”

30 December 2006

One of my supervisors, Karen, made the very helpful suggestion that I need to be able to sum up my PhD research to someone I might meet in a ‘lay’ setting – down the pub or on the bus if you like. This situation arose over Christmas during family gatherings. “So what’s your PhD about, Pete?”, I was asked. This led to some conversation about ICT and assessment and qualifications. Some from the point of view of professionals, some from parents (of school students), some from ‘lay’ observers of education.

It led on to a discussion at about 3 am as stuff poured from my head. I grabbed a piece of paper and jotted it down. A sort of concept map if you like.

What should students learn? What should be on the curriculum? What is 'should' anyway?

What engages students, learners? Can it be generalised?

What about validity of assessment per se? If teachers teach to tests and learners learn to pass them, does it invalidate the assessment? Is the learning of skills and facts held in too high regard (learning those facts and skills needed to pass)?

Are the previous questions concerns for all subjects not just ICT? What is the specialness about ICT?

How does ICT relate to other subjects? Is the D and T needs/solution/evaluation design cycle a key feature of all learning? Is problem solving at the heart of sustained learning?

What is the relationship of 'assessment objectives' to assessments? Do awarding bodies really reflect the former in the latter?

Whither creativity?

Just a bunch of ideas from the middle of the night but worth noting down maybe…

Validity and validation

20 December 2006

Is an assessment valid? What does this mean? Gipps and Murphy (1994) discuss this semantic issue. They relate validity with bias (or lack of it). If a test, or assessment, is valid it is free from bias (although the opposite does not necessarily follow). They cite Cronbach and Messick’s notions of a unitary model of validity based on the construct. How is the assessment constructed? Does it measure what it intends to measure, and is it free from bias? This construct validity is regarded as one of the three dimensions – the others being content and criterion. They argue that no content or criteria can ever be free from bias, and hence these are less dominant aspects when looking for validity.

On the other hand validity, or validation at least, has a very different meaning. It is used to mean the process of recognising (as valid) that which has been learnt in non-formal settings. See, for example, the ECOTEC project. In higher education this might be equated to the process of Accreditation of Prior Experiential Learning (APEL). In APEL, non-certificated learning is validated against assessment criteria that have been designed to assess formal learning. A judgement (assessment) is made to see if the learning claimed as APEL does equate to that which might be learnt formally. It is used to exempt learners from parts of programmes.

Overarching these two processes, ensuring validity and validation of non-formal learning, is a more thorny concept: that of peer or community validation of skills, knowledge and understanding. Someone regarded as an expert in a field by his or her peers has probably had a more valid assessment of their capability than someone who simply has the piece of paper – even if that piece of paper has been awarded through scrutiny and validation of some APEL portfolio.

Formal, informal and non-formal learning

18 December 2006

I have been musing around the nature of formality in learning and reading some interesting angles, eg from Stephen Downes and Graham Attwell. Their arguments, respectively that informal learning is not formless and that informal and formal learning are equally valid, make good sense to me. In the context of my study, what students learn at school about ICT and what they learn outside of ICT lessons both contribute to their learning. Whether the formal can ever keep up with the informal, though, is a matter of conjecture. Formal means tested, assessed according to some external criteria (at least it tends to include those things, if not mean them exactly). These take ages to develop and standardise. They cannot keep up. Where is the GCSE that looks at use of wikis?

Anyway… I would like to add that there is a third way. Non-formal: that which is learnt in school but not in formal lessons. And then there is the argument… is that different to the informal learning out of school? I believe it is. In ICT I would categorise it as the learning to use ICT to support learning (eg in learning English one might use ICT) but which is not assessed. It is incidental to the formal learning. It is not part of the learning objectives framework for the lessons, subjects, students. This comes from reading Michael Eraut from way back in 1994. But it is still relevant today. This is shown by the OECD’s website on the terms. Which is non-formal and which is informal is an interesting semantic sideline. It is interesting to note the OECD placing certification (ie assessment) at the heart of this divide.

Update (21/12) – literature review… and again on 14/01/07… see this blog post.