This BBC report asks whether there is “too much technology in the classroom”. A fairly light piece, it does make reference to the interface between students’ use of technology outside of school and inside it.
Smith C, Dakers J, Dow W, Head G, Sutherland M, Irwin R (2005) A systematic review of what pupils, aged 11–16, believe impacts on their motivation to learn in the classroom. In: Research Evidence in Education Library. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London.
This EPPI review, cited by Gilbert, focuses on the motivation of 11-16 year olds. Its main findings identify six themes that are key to motivation. Each theme may have some relevance here. Italics represent direct quotes from the summary of the review.
- The role of self: how are the learner’s own constructs represented in their view of learning? How does the role of the ‘group’ affect this?
- Utility: Students are more motivated by activities they perceive to be useful or relevant.
- Pedagogical issues: Pupils prefer activities that are fun, collaborative, informal and active.
- Influence of peers: Linked to role of self
- Learning. Pupils believe that effort is important and can make a difference; they are influenced by the expectations of teachers and the wider community.
- Curriculum. A curriculum can isolate pupils from their peers and from the subject matter. Some pupils believe it is restricted in what it recognises as achievement; assessment influences how pupils see themselves as learners and social beings. The way that the curriculum is mediated can send messages that it is not accessible at all.
In this last point, the role of assessment is raised. So what does the review have to say about assessment in general?
The way that assessment of the curriculum is constructed and practised in school appears to influence how pupils see themselves as learners and social beings. (Summary, page 4)
… assessment [has a role] in nurturing or negatively influencing motivation (page 6 and page 63)
…the recent systematic review of the impact of summative assessment and tests on student’s motivation for learning acknowledges that ‘motivation is a complex concept’ that ‘embraces… self efficacy, self regulation, interest, locus of control, self esteem, goal orientation and learning disposition’ (Harlen and Deakin Crick, 2002:1) (page 8 of the EPPI review)
Students’ motivation is influenced by their ‘affective assessment’ (Rychlak, 1988) of events, premises and actions which are perceived as meaningful to their existence. (page 35, and linked to ‘logical learning theory’ (uncited))
Student satisfaction with their ‘academic performance tended to be influenced both by grouping, curricular and assessment practices and by its relationship to perceived vocational opportunities’ (Hufton et al., 2002:282). (page 45)
…learning situations that were authentic – in other words, appeared real and relevant to the pupils – could positively influence pupil motivation… ‘Sharing the assessment process with students is another way to capture students’ motivation…When students and teachers analyse pieces of writing together in an exchange of views, students can retain a sense of individual authority as authors and teachers convey standards of writing in an authentic context’ (Potter et al. 2001:53) (page 47 of EPPI)
Harlen W, Deakin Crick R (2002) A systematic review of the impact of summative assessment and tests on students’ motivation for learning. Version 1.1. In: Research Evidence in Education Library. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London.
Hufton NR, Elliott JG, Illushin L (2002) Educational motivation and engagement: qualitative accounts from three countries. British Educational Research Journal 28: 265–289.
Potter EF, McCormick CB, Busching BA (2001) Academic and life goals: insights from adolescent writers. High School Journal 85: 45–55.
The Gilbert report on Education 2020 contains a wealth of findings (or sentiments anyway) that have relevance to my research.
Personalisation, it begins, means assessment-centred, learner-centred and knowledge-centred… “Close attention is paid to learners’ knowledge, skills, understanding and attitudes. Learning is connected to what they already know (including from outside the classroom).”… “Sufficient time is always given for learners’ reflection.” (page 8, citing Bransford et al., 2000) – this ties in well with the meta-learning findings of Demos (2007).
“…schools therefore need increasingly to respond to: [..] far greater access to, and reliance on, technology as a means of conducting daily interactions and transactions” (page 10, with references in Annex B). “The pace of technological change will continue to increase exponentially. Increases in ‘bandwidth’ will lead to a rise in internet-based services, particularly access to video and television. Costs associated with hardware, software and data storage will decrease further. This is likely to result in near-universal access to personal, multi-functional devices, smarter software integrated with global standards and increasing amounts of information being available to search online (with faster search engines). Using ICT will be natural for most pupils and for an increasing majority of teachers.” (page 11)
“strengthening the relationship between learning and teaching through: … dialogue between teachers and pupils, encouraging pupils to explore their ideas through talk, to ask and answer questions, to listen to their teachers and peers, to build on the ideas of others and to reflect on what they have learnt” (page 15)
“Pupils are more likely to be engaged with the curriculum they are offered if they believe it is relevant and if they are given opportunities to take ownership of their learning. Learning, clearly, is not confined to the time they spend in school” (page 22, citing EPPI, 2005)
Figure 4 – Ways in which technology might contribute to personalising learning (page 29)
The recommendations on page 30 stop some way short of recognising the relationship between technology inside and outside of formal classroom use, however. There is a nod towards it in this extract: “We recommend that…all local authorities should develop plans for engaging all schools in their area on how personalising learning could and should influence the way they approach capital projects… Alongside the design of school buildings, schools will need to consider: – what kind of ICT investment and infrastructure will support desired new ways of working – how the school site and environment beyond the buildings can promote learning and pupils’ engagement… government should set standards for software, tools and services commonly used by schools to facilitate exchange and collaboration within and between schools software packages from home.”
Bransford J.D., Brown A.L. and Cocking R. (eds) (2000) How people learn: brain, mind, experience and school. Washington DC: National Academy Press.
EPPI-Centre Review: A systematic review of what pupils, aged 11-16, believe impacts on their motivation to learn in the classroom, 2005. Available at: http://eppi.ioe.ac.uk/cms/Default.aspx?tabid=304
Wiliam’s paper, referenced by Mike Baker in his BBC summary, is not actually about the validity of National Curriculum (or any other) formal tests per se. It is about the inherent issues of validity and reliability in testing. The reduction in reliability comes from the inability of students to perform in exactly the same way in tests: if they were to take the same test several times, they would expect to get different scores, argues Wiliam. This seems intuitively sensible, if impossible to prove, as you can never take a test again without it either being a different test or without your having learnt from the first attempt. The position is a theoretical one. Wiliam uses a simple statistical model to come up with the figures that are used in the BBC report. It is not that a test is 32% inaccurate, but that 32% is the proportion of misclassifications that might be expected given the nature of testing and quantitative scoring. The statistics used by Baker are, themselves, theoretical, and should not be used as ‘headline figures’.
Wiliam then goes on to look at the reliability of grades. He points out that we might intuitively know that it would be unreliable to say a student who scores 75% must be ‘better’ than one who scores 74%. But if the results are reported as grades, we are more likely to confer reliability on the statement ‘the student achieving the higher level is better’.
On validity Wiliam says little in this paper, but he does point out the tension between validity and reliability: sometimes making a test more reliable makes it less valid. He cites the example of the divergent thinker who comes up with an alternative good answer that is not on the mark scheme and who therefore receives no credit. This is a standard response by examining teams, designed to eliminate differences between markers. While contingencies are always in place to consider exceptional answers, if they are not spotted until the end of the marking period then they cannot be accommodated. If several thousand scripts/tests have already been marked, they cannot be gone back over because one examiner feels that one alternative answer discovered late on should be rewarded. You either reward all those who came up with it or none. Usually it is none, for pragmatic reasons, not for reasons of validity.
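Wiliam’s own argument is analytic, but the intuition behind the misclassification figure can be shown with a crude simulation. This is my own sketch, not Wiliam’s model: the reliability value, the cut-scores and the number of levels are all made-up assumptions for illustration.

```python
import random

random.seed(1)

RELIABILITY = 0.85                   # assumed test reliability (illustrative)
N_PUPILS = 100_000
BOUNDARIES = [-1.5, -0.5, 0.5, 1.5]  # made-up cut-scores dividing five levels

def level(score):
    """Return the level (0-4) that a standardised score falls into."""
    return sum(score > b for b in BOUNDARIES)

# True attainment is standardised N(0, 1); the observed test score adds
# measurement error whose variance follows from the reliability r:
# var(error) = var(true) * (1 - r) / r
error_sd = ((1 - RELIABILITY) / RELIABILITY) ** 0.5

misclassified = 0
for _ in range(N_PUPILS):
    true = random.gauss(0, 1)
    observed = true + random.gauss(0, error_sd)
    if level(observed) != level(true):
        misclassified += 1

print(f"misclassified: {misclassified / N_PUPILS:.0%}")
```

Under these assumptions a substantial minority of pupils land in the wrong level even though the test is ‘reliable’ by conventional standards – which is the sense in which Wiliam’s 32% should be read, not as the test being ‘32% inaccurate’.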
Wiliam, D. (2000). Reliability, validity, and all that jazz. Education 3-13, 29(3), 9-13. Available online at http://www.aaia.org.uk/pdf/2001DYLANPAPER3.PDF
Wiliam, D. (1992). Some technical issues in assessment: a user’s guide. British Journal for Curriculum and Assessment, 2(3), 11-20.
Wiliam, D. (1996). National curriculum assessments and programmes of study: validity and impact. British Educational Research Journal, 22(1), 129-141.
QCA’s website of accredited qualifications, Openquals, is now known as the National Database of Accredited Qualifications (NDAQ). It carries the logos of three of the UK’s qualifications authorities – QCA (England), CEA (Northern Ireland) and ACAC (Wales/Cymru). The SQA in Scotland is notable by its absence.
NDAQ is easier to ‘pronounce’, harder to find on Google and easier on the eye – slightly. The myriad options available at school level in ICT * are still bewildering. Maybe they will help with ‘personalisation’, but will they help to more validly represent learners’ abilities, achievements and capabilities?
* NDAQ has ICT, Openquals had IT… the nomenclature confusion continues…
The BBC’s Education correspondent Mike Baker gives a very readable account of the changes ahead in the assessment system in his report of 6 January 2007 – Testing times for school assessment.
His main thrust is that changes to the system are coming in. Some of these are reflected in subsequent events that I have written about, like the revamping of league tables and the possible scrapping of the online ICT test… although the latter would presumably have helped personalisation if it was an on-demand test.
The changes, concludes Baker, are due to the growing clamour for that most voguish of educational shibboleths – personalisation.
In the article, he reflects on the Gilbert report from the HM Chief Inspector into personalisation and on how the recommendations of the report might necessarily lead to a greater role for teacher assessment. He ties this in with an IPPR study into the tensions between the dimensions of validity and accountability of assessment. Again teacher assessment is recommended by the authors as a way of enhancing both dimensions. Finally he cites Dylan Wiliam’s research into the ‘shockingly’ (Baker’s word) inaccurate methods of formal assessment.
A very useful summary.
Miles Berry also summarises the Gilbert Report in his blog, again very useful.
One of the features of WordPress (and many other blogs) is the reporting of search terms that have been used, which then result in the blog being found.
Yesterday, the search terms reported included assessment for learning and inclusion.
This got me thinking that I hadn’t really made any use of the simple taxonomy of assessment. Assessment for learning is formative, as it informs further learning (Black and Wiliam, 1998). My focus is really on summative assessment.
A study of pupil perceptions of assessment for learning (years 7 to 10) was carried out by Cowie (2005). I guess part of my research will be looking at pupil (or student) perceptions of summative assessment. It will be interesting to compare results with those found by Cowie.
The Cowie paper was cited on the DfES Standards Site, in a section called the Research Informed Practice Site. I’m not sure about the initials this provides, but the site may well be a useful one both for this research and for my teaching. I hadn’t come across it before. It is useful, not just for its own sake, but because it provides digests of articles…
Black, P and Wiliam, D (1998) Assessment and Classroom Learning in Assessment in Education, Vol. 5, No. 1, pp 7-74
Cowie, B (2005) Pupil commentary on assessment for learning in The Curriculum Journal, Vol. 16, No. 2, June 2005, pp. 137 – 151
DfES (2007), TRIPS – the Research Informed Practice Site, London: DfES [online] available at http://www.standards.dfes.gov.uk/research/ accessed 19/01/07
The Demos report identifies groupings of young people. Those who regularly use instant messaging, text messaging and online spaces to interact with peers are classified as ‘everyday communicators’. Those who adopt new technologies and are comfortable with a wide range of technologies are known as ‘digital pioneers’. I have no problem with the latter, although I wonder whether the first is a causal or effectual label. Are they ‘everyday communicators’ because the technology enables them to communicate every day? Or are they using the technology for communication because communicating is what they do – with or without technology?
In other words what is the driver – their need for communication or their ‘digital native’ ability to use technology?
Taking the digital pioneers, the report then identifies four characteristics of their informal learning: self-motivation, ownership, purpose and peer-to-peer communication. The last is common to the other group, who are not identified as pioneers.
Taken out of the context of the report, these are fairly unremarkable. We learn best when we are self-motivated, take ownership and have a purpose. Maybe the difference here is in the ownership: digital pioneers take ownership of the technology, perhaps. They go beyond everyday use, exploiting new techniques and resources. These are the ones who are comfortable trying out new technological tools to develop their learning – often manifested through creative products such as multimedia. That, at least, would make a far more distinctive definition for me.
And how does this relate to assessment? Is there something in the notion of self-motivation and ownership that distinguishes the higher levels? In another guise this week I have been looking at final year undergraduate and first year postgraduate assessment criteria. Higher levels of achievement are marked by ‘autonomy’. What is this if it is not creativity borne out of self-motivation and ownership?
Returning to the Demos report I see a weak if discernible thread running through it – the relationship between creativity and technology. Or, put another way, the ways in which technology supports or enhances creative skills.
The report cites earlier Demos research in which young people were asked to rank ‘life skills’ in order of importance. Creativity comes in as ‘only the eighth most important’ (page 27). In considering the life skills, a dichotomy is established between traditional skills for the knowledge economy and the newer skills developed through growing up with technology. These are equated, in some ways, to creativity – or at least to the skills needed for the creative industries (p 24). Further, when parents were surveyed, 47% of men and 40% of women believed that their children’s use of technology helped develop creativity.
What are these skills? One set in the report (p 23) looks at those related to the role of guildmaster in the game World of Warcraft. Here it lists those to do with group development, apprenticeship, group strategies and dispute management. It goes on to argue that these ‘soft skills’ cannot be pigeonholed into one (or more) subjects and that there is a false split between knowledge and skills.
On the other hand, these creative skills can be harnessed in both formal and informal contexts, so long as the school does not block off, or deny, the technology that is an everyday part of learners’ lives. In doing so, I believe, there would be greater potential for detaching the acquisition of skills and knowledge through using technology from the assessment of them.
But what is creativity? The NACCCE report has a definition: ‘imaginative activity fashioned so as to produce outcomes that are both original and of value’ (1999:29). For me there is an interesting exploration here – is what is learnt in informal contexts more ‘imaginative’ and ‘original’ than what is learnt in formal contexts, simply by its very nature? Or might that be the perception of learners and teachers?
Another angle on this comes in the Becta report by Twining et al (2006). Here the constraining nature of an assessment-led curriculum is seen as a barrier to creativity:
Assessment and curriculum are closely connected, and while there is little in the way of empirical research that indicates a clear link between the introduction of the National Curriculum and National Strategies and a reduction in risk taking in schools, there is substantial support for this view within the education community (Hacker and Rowe 1997; Harlen 2005; Harlen and Crick 2002; Black and Wiliam 1998). This is accompanied by advocacy of the need to adjust the curriculum and assessment to place greater emphasis on creativity and higher-level skills. The ‘thinning down’ of the National Curriculum in 2000 (DfEE 2000) and the introduction of the new Primary Strategy (DfES 2003), which place emphasis on creativity, suggest that a shift is occurring at least at the ‘lower’ end of the education system.
Twining P et al (2006) Educational change and ICT: an exploration of Priorities 2 and 3 of the DfES e-strategy in schools and colleges: the current landscape and implementation issues. Coventry: Becta
NACCCE (1999) All Our Futures: Creativity, Culture and Education. London: DfEE/DCMS
My original aims were
1. To critically analyse the ways in which students aged 16 construct their learning of ICT capability in formal and informal contexts.
2. To explore the relationship between formal and informal learning within the field of ICT.
3. To explore the methodologies of assessment of ICT capability at 16 and how this affects student perceptions of their capability.
4. To develop a theoretical base to evaluate the construct validity of assessment of ICT at 16.
In looking especially at numbers 2 and 4 a concept map (or at least a list) appears to be emerging. In addition to the concepts contained in these aims – formal and informal learning, validity of assessment, methodologies of assessment, personal constructs of learning – two others are emerging. One is about young people’s appropriation of technology for learning, the other is the policy agenda.
The former is the subject of the reports and books I seem to be drawn to. Maybe it is this topic that will allow me a way into the theory of aim 1 – personal constructs. I have yet to touch on this, but much of the literature on young people’s use of technology seems to be based on this, if implicitly.
So in looking at the Demos report, there is much about how and what young people have learnt. The assumption seems to be that they are controlling the learning, choosing what to learn. Maybe it is also that they are constructing what they have learnt. Certainly, if there is to be reverse-ICT then this construct of learning would need to be articulated or manifested in some way. It would be made explicit through the act of learners teaching adults. This is outside the scope of my research here. On the other hand, the making of the learning explicit through examination of learners’ perceptions and constructs of their learning is at the heart of aim 1.
I had started to be concerned about the neglect of this aim and the associated theory. The reflection in this post is reassuring me somewhat – and is an example of not knowing what I thought until I wrote it.
The second emerging new concept (or issue) – the policy agenda – must not be forgotten, but is probably best considered as part of aim 3. I guess the next step is to start to build a concept map of ideas and authors to help ‘design’ the literature review section/s of my thesis.
The Demos report I posted about includes a section on the systemic changes needed to recognise the learning of students in the application of ICT. In this extract they conclude that school needs to provide meta-learning opportunities for reflection. To this might be added the peer review that came out as a strong feature of the eVIVA project I worked on at Ultralab. Peer-to-peer networking is one of the phenomena recognised in the report as providing a different learning landscape for today’s students compared to adults (see page 48, for example).
The model suggested by Demos seems to be – informal learning, formal meta-learning. With, presumably, the latter validating and (to use the language of assessment) standardising the former.
But it is not enough to simply listen to children and orient lessons around their out-of-school practices. Schools need to do more than this in order to recognise the value of, as well as build on, the new kinds of learning that are taking place. They need to create spaces for students to reflect on their learning and articulate their thoughts about it, which will enable them to transfer their skills. This is about:
recognising the new kinds of learning they are undertaking outside school and accepting that some of those emerging skills, knowledge and understanding need to be developed further in an educational environment. (61)
There has been significant research into how this can take place. (62) Meta-cognition is at the heart of it: the capacity to monitor, evaluate, control and change how one thinks and learns. In less formal terms this means reflecting on one’s learning and intentionally applying the results of one’s reflection to further learning. In this context it means reflecting on the kinds of skills young people are developing outside the formal environment. The rise of online, multiplayer gaming and web 2.0 has created a generation that feels comfortable with collaborating on a continuous, casual basis. From contributing to a Wikipedia entry, devoting hours to World of Warcraft or building a website dedicated to expressing their political frustrations there are a multitude of skills that are currently failing to transfer across to…
Young people often struggle to explain why they like technology or to articulate what they are learning – this reflection could happen within formal education. (63)
From Demos (2007) Their Space, pp56-57
References (with original Demos numbering)
61 See J Marsh et al, Digital Beginnings: Young children’s use of popular culture, media and new technologies (Sheffield: University of Sheffield, 2005), see http://www.esmeefairbairn.org.uk/docs/DigitalBeginningsReport.pdf (accessed 15 Jan 2007).
62 See About Learning: Report of the Learning Working Group (London: Demos, 2005) for a comprehensive summary and analysis of the research.
63 Interview for this (Demos) project with Valerie Thompson, e-Learning Foundation, 15 Jun 2006.
Original post 18/12/06
At the heart of what I am currently interested in is the definition space of learning as represented by the “parameter of formality” – namely, classifying learning as formal, informal or non-formal, or some combination of all three. I blogged on this recently.
Colley, Hodkinson and Malcolm (2002) produced a review of this landscape – very helpful! What is missing though is the extension to school-age students per se – this is where my work should sit.
Update (14/01/07): I have now found this 2003 (?) Futurelab literature review of informal learning with technology outside school.
Link from digitalcreativity.org
“So how would a six year old make an ident anyway?
A group of six-year olds spent two days creating and filming ‘stop motion’ animations in clay to be broadcast by BBC Wales 2W (the Welsh equivalent of BBC2).
The final results were broadcast and are available on the South Wales Argus website.
The key questions for me are:
- What did the children learn? According to themselves? According to their teachers? According to the researchers from digitalcreativity.org?
- How would their work have been assessed?
These are six-year olds. What might sixteen-year olds do if given the chance? Does our assessment and qualifications system allow that chance?
Summary from Demos website:
Their Space: Education for a digital generation draws on qualitative research with children and polling of parents to counter the myths obscuring the true value of digital media.
Approaching technology from the perspective of children, it tells positive stories about how they use online space to build relationships and create original content. It argues that the skills children are developing through these activities, such as creativity, communication and collaboration, are those that will enable them to succeed in a globally networked, knowledge-driven economy.
Update: post: Making the most of informal learning, 15 Jan 2007
The DfES have published the 2005/06 school league tables.
Making its debut is the new measure of ‘threshold performance’ – out with 5 A*-C GCSEs or equivalent, in with a level 2 threshold that must include English and mathematics. The threshold is reached when a student has sufficient level 2 passes. This sufficiency is still achieved by five GCSEs, which are at level 2 when graded A*-C. In this respect the only change is the mandatory need to pass English and mathematics. It had been feared that making these two subjects a necessary factor in counting student ‘success’ in a school’s figures would mean a drastic reduction in average levels of success. The percentage of students reaching the threshold fell to 46% – down from 59% under the old measure of 5 A*-C GCSEs. While a significant drop, it may not be as bad as some feared.
For my own part, and for ICT, I was interested to see how the change had affected the pioneer of GNVQ IT for all (or at least the majority). Thomas Telford School was lauded a few years ago for its dramatic increase in the 5 A*-C percentage. I had always suspected that this was because the GNVQ counted for four of them. It is interesting to note, therefore, that the school registered 95% on the new measure, where English and mathematics have become compulsory. So maybe 80% (or 4-GCSE-equivalent) qualifications didn’t distort the tables as much as seemed likely. This despite the headlines and horror stories, as in this from the BBC:
A few canny pioneers realised that there was a vocational qualification, the GNVQ, which was worth four GCSEs at grade C or above. Add one more GCSE, in any subject, and, hey presto, you meet the target. To be fair, many of the original followers of the GNVQ route felt it offered the best educational option for their pupils. Others, though, realised it was an effective way of avoiding the consequences of falling below government targets. Eventually the government realised that large numbers of schools were achieving the threshold without their pupils achieving GCSE passes in maths and English.
Hence this year’s new requirement that the five GCSE passes must include maths and English. The effects on some schools have been dramatic. One school went from a score of 82% passing the equivalent of five A*-Cs to just 16% when maths and English were included. Many other schools, which had been climbing up the tables in recent years, found themselves slithering back down again. Presumably they will now find new ways of targeting performance in maths and English, no doubt at the cost of something else.
The more significant change, though, may be the use of the phrase ‘level 2 threshold’. GCSEs are no longer explicitly mentioned in the language of the threshold (even though the DfES’s own link still says GCSE tables). A GCSE at A*-C is now just a 20% contribution to the threshold. Many other qualifications can also contribute, as they could before. But now DIDA, for example, is not described as equivalent to 4 GCSEs; instead a level 2 DIDA pass is described as 80% of the threshold.
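The arithmetic of the new language can be sketched in a few lines. This is a hypothetical illustration of my own: the 20% and 80% contributions are those quoted above, but the qualification labels and function are made up.

```python
# Contribution of each level 2 pass towards the threshold, in percent,
# using the figures quoted above (a GCSE at A*-C = 20%, a level 2 DIDA
# pass = 80%). The labels here are illustrative only.
CONTRIBUTION = {"GCSE A*-C": 20, "DIDA level 2": 80}

def threshold_share(passes):
    """Total percentage of the level 2 threshold a list of passes represents."""
    return sum(CONTRIBUTION[p] for p in passes)

# Five A*-C GCSEs, or a level 2 DIDA plus one further GCSE,
# both amount to 100% of the threshold:
print(threshold_share(["GCSE A*-C"] * 5))              # 100
print(threshold_share(["DIDA level 2", "GCSE A*-C"]))  # 100
```

(From 2005/06 the passes must, of course, also include English and mathematics GCSEs for the pupil to count in a school’s figures.)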
The list of accredited qualifications, their level, and contribution to threshold is maintained by QCA at the Openquals website (soon to be renamed NDAQ: National Database of Accredited Qualifications). For level 2 ICT/IT, Openquals lists 71 qualifications across the awarding bodies. The market is wide open. The gold standard at 16 has changed.
Also this year saw a change in the value added measure, to include factors other than just improvement in performance compared to level 3. Now the profile of the school’s students is taken into account, and a figure based on a notional national baseline of 1000 is reported in the Contextual Value Added (CVA) measure. Schools such as Greenwood Dale in Nottingham are lauded, quite rightly, for reasons of value added and not just level 2 performance. This contextual value added doesn’t tell the whole story though, and Leicestershire’s relatively poor performance might have something to do with the structure of schooling in the county. The 14-19 colleges certainly seem poor relations to Nottinghamshire’s comprehensives in respect of CVA, even though, in my opinion, 14-19 is a much more coherent age range. Maybe it is not measuring like with like when there is a change of school at 14? A classic case of measuring something other than what is purported to be measured?
Pamela Moss’s 1992 paper “Shifting conceptions of validity in educational measurement: implications for performance assessment” (1) is cited by Wiliam in his modification of Messick’s four-facet model. It would seem from the figure I extracted from Wiliam’s paper that he is suggesting that Moss is providing an extra dimension to the evidential paradigm. That was what I saw on first reading. On turning to Moss’s paper and re-reading Wiliam I am now not so sure of where he is placing Moss vis a vis Messick.
Moss’s paper is an overview of the landscape of construct validity from the inception by Cronbach and Meehl in 1955 (2) to its publication in 1992. In doing so she looks at evidential and interpretive aspects of the models of Cronbach (1980) (3) and Messick. The latter is not seen as being purely evidential as Wiliam’s paper might suggest.
The thrust of Moss is that a review was needed of the “Standards” of what might be called the Establishment of (American) assessment and measurement (AERA, APA, NCME). This review, she argues, was needed because of the emergence of performance assessment as a science (and as a commonly used tool) to complement test/item-based assessment. She compares this to the contemporaneous diminution of the dominance of positivism.
In performance assessment there is a strong interpretive base. The learner will interpret the task and manifest skills, knowledge and understanding through their performance. The assessor will re-interpret this performance to provide evidence to which rules of validity must be applied.
(1) Moss, P (1992) Shifting Conceptions of Validity in Educational Measurement: Implications for Performance Assessment in Review of Educational Research, Vol. 62, No. 3. (Autumn, 1992), pp. 229-258.
(2) Cronbach, L.J. and Meehl, P.E. (1955) Construct validity in psychological tests in Psychological Bulletin, 52, 281-302 also available online at http://psychclassics.yorku.ca/Cronbach/construct.htm
(3) Cronbach, L.J. (1980). Validity on parole: How can we go straight? in New directions for Testing and Measurement, 5, 99-108.
Writing in The Observer, John Naughton caricatures another vignette….
There’s a surreal quality to it, conjuring up images of kids trudging into ICT classes and being taught how to use a mouse and click on hyperlinks; receiving instructions in the creation of documents using Microsoft Word and of spreadsheets using Excel; being taught how to create a toy database using Access and a cod PowerPoint presentation; and generally being bored out of their minds.
Then the kids go home and log on to Bebo or MySpace to update their profiles, run half a dozen simultaneous instant messaging conversations, use Skype to make free phone calls, rip music from CDs they’ve borrowed from friends, twiddle their thumbs to send incomprehensible text messages, view silly videos on YouTube and use BitTorrent to download episodes of Lost. When you ask them what they did at school, they grimace and say: ‘We made a PowerPoint presentation, dad. Yuck!’
The “top left” quadrant of Wiliam’s enhancement of Messick’s four-facet model of validity deals with within-domain evidential/interpretive validity: how is the assessment designed so as to provide constructs that evidence what is to be assessed within the domain? He cites Embretson (1983) (1) as providing part of the conceptual model for this quadrant.
Embretson’s model distinguishes between construct representation and nomothetic span. In the former, assessment is designed so that it is situated in tasks that represent what is to be assessed. In the latter, it is designed to correlate with other tasks deemed valid.
Mislevy et al (2002) (2) discuss the model in the context of the “psychometric principles” of validity, reliability and comparability. They relate the task model to three other models – the student’s learning, the assessment (or measurement) of this learning and the scoring. Their argument appears to be that the construct representation resonates more with the psychometric principles than does nomothetic span, but that both may be needed.
In the context of my research it would seem that I am doing some sort of comparison between the two parts of Embretson’s dichotomy. Construct representation – using what the students have learnt by way of ICT capability to provide an assessment. Nomothetic span – using some assessment that correlates to this as measured by other assessments.
Is use of the former inherently more engaging than the latter? Does it fit with students’ own constructs of what they have learnt?
(1) Embretson, S. E. (1983), Construct validity: Construct representation versus nomothetic span in Psychological Bulletin, 93, 179-197.
(2) Mislevy R, Wilson M, Ercikan K, Chudowsky N (2002), Psychometric Principles in Student Assessment CSE Technical Report 583, Centre for Studies in Evaluation, LA also available online at http://www.cse.ucla.edu/reports/TR583.pdf
Wiliam (1996) offers a model that starts from Messick’s four-facet model (1) of validity (subsequently enhanced to six facets) and applies it to the National Curriculum. Wiliam’s analysis has much to offer when looking at assessment at 16. He takes Messick’s distinction of the evidential and consequential in assessment and adds Moss’s (1992) interpretative basis to the former. Assessment validity needs to be looked at through the evidence, the interpretation and the impact (consequence). For each of these two bases – evidential/interpretive and consequential – Wiliam then builds on Messick’s other dimension of within- and beyond-domain.
Wiliam then examines each of the four zones in turn.
In regard of within-domain inferences Wiliam explains the work of Popham and others in trying to establish valid tests that test all, and only, the domain that is intended to be tested. The concluding criticism of the validity of NC tests may well apply to any external traditional examination: they are unrepresentative of the domain because of their length compared to the length/volume of learning.
For beyond-domain inferences Wiliam cites the predictive nature of the use of test results. High performance in X predicts high performance in Y. He cites Guilford in saying that it doesn’t matter how this correlation is arrived at, merely that it is reliable. The test might not be valid though, as it may not be in the same domain. For ICT at 16 there may be aspects of the achievement that are given far greater importance than perhaps they should be. A learner gets Key Skills level 2 in ICT (2), therefore s/he is functionally literate in ICT. It doesn’t matter how the level 2 was achieved.
Within-domain impact is of particular importance to the design of ICT assessments, I believe. Hence the move towards onscreen testing – it’s ICT so the technology must be used to assess the capability. In Wiliam’s words, it “must look right” (p132).
Finally, Wiliam considers beyond-domain impact or consequence. In looking at National Curriculum testing, Wiliam argues, some of the validity is driven (or driven away) by beyond-domain impacts such as league tables – these are much higher stakes for schools than learners and so the validity of the assessment is corrupted.
(1) Messick, “Validity,” 20; Lorrie A. Shepard, “Evaluating Test Validity,” in Review of Educational Research, ed. Linda Darling-Hammond (Washington, DC: AERA, 1993), 405-50. cited in Orton (1994)
(2) The functional/key skill component of ICT learning is referred to as IT
10/01/07 Post on Embretson (1983)
11/01/07 Post on Moss (1992)
So, taking the model from the Futurelab literature review, how might the dimensions of construct validity manifest themselves in assessment of ICT at 16 – the domain of my study?
Content validity: are items fully representative of the topic being measured?
Here might be included a study of what is included in assessments and an analysis of those against the stated assessment objectives, the content of specifications and, coming back to my specific focus, the topic (ICT learning) as constructed by the learners. What do 16-year olds identify as ICT?
Convergent validity: given the domain definition, are constructs which should be related to each other actually observed to be related to each other?
Here there is something about the relationship between the elements above, I think. Is there convergence between the assessment objectives, between learners’ constructs, and between the two sets? I think there is more to explore here but haven’t quite got my head around it yet…
Discriminant validity: given the domain definition, are constructs which should not be related to each other actually observed to be unrelated?
This is more tricky. Why would there be “constructs which should not be related to each other”? Is this to do with identifying things that are mutually exclusive? Is formal and informal learning ever like this?
Concurrent validity: does the test correlate highly with other tests which supposedly measure the same things?
This too is tricky, but there is something here for me about the relationship between teacher assessment and test results, I think.
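One way I find it helps to think about convergent, discriminant and concurrent validity is in terms of plain correlations between measures. The sketch below uses entirely invented scores (the names and numbers are hypothetical, chosen only to illustrate the pattern): two measures of the same construct should correlate highly, while a measure of an unrelated construct should not.

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data for five learners: two measures of ICT capability
# (teacher assessment and test result) and one unrelated measure.
teacher_assessment = [52, 64, 71, 58, 80]
test_result        = [55, 61, 74, 60, 78]
unrelated_measure  = [6, 9, 7, 8, 6]

# Concurrent/convergent validity: same construct, high correlation expected.
print(round(pearson(teacher_assessment, test_result), 2))      # ≈ 0.97
# Discriminant validity: unrelated construct, correlation near zero expected.
print(round(pearson(teacher_assessment, unrelated_measure), 2))
```

This is only a caricature of the statistics, of course, but it makes the dichotomy in the questions above visible: convergent evidence is the first number being high, discriminant evidence is the second being low.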
As written about earlier, Gipps and Murphy (1994) discuss validity and bias (or lack of it). They cite Cronbach and Messick’s notions of a unitary model of validity based on the construct. This construct validity is regarded, say Gipps and Murphy, as one of the three dimensions – the others being content and criterion. They argue that no content or criteria can ever be free from bias, and hence these are less dominant aspects when looking for validity.
On the other hand the Futurelab report I wrote about yesterday takes this overarching construct validity as being composed of four aspects.
- content validity: are items fully representative of the topic being measured?
- convergent validity: given the domain definition, are constructs which should be related to each other actually observed to be related to each other?
- discriminant validity: given the domain definition, are constructs which should not be related to each other actually observed to be unrelated?
- concurrent validity: does the test correlate highly with other tests which supposedly measure the same things?
(ibid, appendix A, page 24)
The definition in the report is that construct validity = the extent to which a test measures what it claims to measure. This is not at odds with Gipps and Murphy.
David M. Williamson, Irvin R. Katz, and Irwin Kirsch (2005) [online PDF] available at http://www7.nationalacademies.org/bose/
This paper, originally presented to the 2005 AERA conference, contains a wealth of argument about the validity of ICT assessment. Here the topic is presented as ICT literacy. This word ‘literacy’ is laden with other connotations for me – to do with natural use of (in this case) ICT. If someone is ICT literate, that means so much more than saying they are ICT competent. One is about understandings and internalisations, the other about surface skills, I believe. The paper’s context is HE, specifically an assessment that measures an HE student’s abilities to use technology to research, organize and communicate information. What it says, though, goes much beyond this context and speaks to my interest in assessment at 16.
The authors start from the findings of a 2001 panel looking at ICT assessment. The panel identified a number of key issues of concern to policy makers and practitioners in the education community:
- ICT is changing the very nature and relevance of knowledge and information.
- ICT literacy, in its highest form, has the potential to change the way we live, learn and work.
- ICT literacy cannot be defined primarily as the mastery of technical skills.
- There is a lack of information about the current levels of ICT literacy both within and among countries.
In the amplification of the second bullet point they state “The transformative nature of information and communication technologies might similarly influence and change not only the kinds of activities we perform at school, at home and in our communities but also how we engage in those activities.” (ibid, p5)
They then go on to distinguish between issues of access and of proficiency – stating that research into the Digital Divide is insufficient in addressing issues of measuring ICT literacy. Providing the access is not enough – many schools found this with the introduction of Regional Broadband Consortia (or maybe the RBCs found this – I suspect schools knew already!).
The paper then discusses evidence-centred design of assessments. Again, the context for this paper is different to mine, as they are trying to design an Internet-delivered test, an approach which may be running into difficulty in English schools. Nevertheless they provide a concise overview of this field, drawing on validity theory (Messick, 1989), psychometrics (Mislevy, 1994), philosophy (Toulmin, 1958) and jurisprudence (Wigmore, 1937). The process of assessment design they identify consists of four key questions:
- Purpose: Who is being measured and why are we measuring them? What types of decisions will we be making about people on the basis of this assessment?
- Proficiencies: What proficiencies of people do we want to measure to make appropriate claims from the assessment?
- Evidence: How will we recognize and interpret observable evidence of these proficiencies so that we can make these claims?
- Tasks: Given limitations on test design, how can we design situations that will elicit the observable evidence needed?
These issues again seem central to my thinking at this stage.
Later they break down ICT literacy into seven key evidences – things that are to be measured (or assessed).
- Define: The ability to use ICT tools to identify and appropriately represent information need.
- Access: The ability to collect and/or retrieve information in digital environments.
- Manage: The ability to apply an existing organizational or classification scheme for digital information.
- Integrate: The ability to interpret and represent digital information.
- Evaluate: The ability to determine the degree to which digital information satisfies the needs of the task in ICT environments.
- Create: The ability to generate information by adapting, applying, designing, or inventing information in ICT environments.
- Communicate: The ability to communicate information properly in context in ICT environments.
This model seems rather too close to a skills taxonomy for my liking, but it may be useful as one model among many for trying to look at how learners construct their knowledge.
Futurelab, a UK technology and learning ‘thinktank’, commissioned this 2004 literature review (1) on e-assessment. In compiling the report, the authors (Jim Ridgway and Sean McCusker, School of Education, University of Durham and Daniel Pead, School of Education, University of Nottingham) have, not surprisingly, covered a lot of ground to do with assessment per se and not just its technologically-enabled version.
In talking about the use of e-portfolios, the report concludes that “Reliable teacher assessment is enabled. There is likely to be extensive use of teacher assessment of those aspects of performance best judged by humans (including extended pieces of work assembled into portfolios)” (ibid, page 2)… for me, hidden in this is the validity argument. It comes through reliability of teacher assessment, and extended pieces of work. Both of these should help validity I believe.
Section 1 of the report then talks of the nature of assessment – formative and summative. Throughout this the authors continually refer back to the purpose and validity of assessment. Also, the learner is placed at the centre of the described processes. Of particular note for me is the mendacity quotient, whereby summative assessment often encourages students to actively hide what they don’t know.
Section 2 discusses how and where assessment should be driven, with the focus also on technology as the report is on e-assessment. There are some more generally-applicable points covered here though. “Metcalfe’s Law” of increasing value through networks is used to underpin the need to tie assessment into rapidly increasing technologically-enabled social networks. More simply, perhaps, the use of peer networks for assessment might also be part of this… My work aims to look at the validity of external assessment by using self and peer viewpoints as comparators. In addition to social changes, the report identifies other drivers on assessment change as globalisation, mass education, defending democracy and government-led policies. Here there is a disappointing (for me) relative sparsity of focus on the needs of the learner, although demands of lifelong learning are brought out in the section summary (ibid, page 9).
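The “Metcalfe’s Law” point is essentially quantitative: the number of possible pairwise connections grows roughly with the square of the number of participants, which is what makes peer networks so attractive as an assessment resource. A minimal sketch of that growth:

```python
def potential_connections(n):
    """Number of distinct pairwise connections among n participants --
    the n*(n-1)/2 growth behind "Metcalfe's Law" (value ~ n squared)."""
    return n * (n - 1) // 2

# Doubling the network far more than doubles the possible peer relationships.
for n in (2, 10, 100):
    print(n, potential_connections(n))  # 2->1, 10->45, 100->4950
```

Read as a point about peer assessment rather than network economics: each new learner added to a peer-assessment network adds not one relationship but a connection to every existing member.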
Section 3 discusses the developments in e-assessment. Or so its section heading states. Actually there is much in here about assessment in general and the need to make it relevant to learner needs and valid. “… some [developments] reflect a desire to improve the technical quality of assessment (such as increased scoring reliability), and to make the assessment process more convenient and more useful to users (by the introduction of on-demand testing, and fast reporting of results, for example).” (ibid, p15).
Under the heading “Opportunities and challenges for e-assessment”, section 4 is a rich vein of resources and opinion on the use of assessment to assess deeper level skills, understandings etc. While the summary of the section appears very parsimonious about what has been written, sub-section 4.1 is full of how assessment should be enabling learning.
Finally, the appendix on page 24 is a good overview of, to use its title, “The fundamentals of assessment”.
(1) Ridgway J, McCusker S and Pead D (2004) Literature Review of E-assessment: A Report for Futurelab [online PDF] available at http://www.futurelab.org.uk/download/pdfs/research/lit_reviews/
One chapter of my thesis will almost certainly deal with the policy on assessment of ICT in England… (it would make a boring Mastermind subject though). Today the BBC reports yet another possible change. Although it concerns the possible scrapping of the proposed introduction of compulsory on-screen testing at KS3, some of the sentiments expressed might be equally applicable to other assessment regimes. One ICT subject leader, Roger Distill, is quoted by the BBC:
“One does not need to be a computer geek to realise that the technologies in the real world will have moved on amazingly in that time, while education, as usual, gets left behind as we continue to train our students in the old and limited techniques required to succeed in the test.”
The question here is what is “the real world” of the 14- or 16-year old?
I have just returned from a very useful conference-cum-seminar-cum- network meeting on inclusion and special educational needs. This was held at my own Nottingham Trent University (NTU). The themes were research and inter-agency working in the areas of special educational needs and inclusion. Hopefully the embryonic network that came together to plan the day will be a basis for future sharing and collaboration.
There was a useful (for me) distinction made by paediatrician Linda Marden between inter-agency (agencies working together) and multi-agency (many agencies not necessarily working together). The background for this, and my reason for being there, is of course the all-pervasive Every Child Matters agenda where multi-agency is the preferred term… but I think Linda makes a very valid and pertinent point.
It was good to hear of research in this field, and to hear the phrase co-researcher again, applied to a research group including the young people at The Shepherd School Nottingham who have been round the world helping to disseminate findings of research projects. These projects include those of the Virtual Reality Applications Research Team at the University of Nottingham and the Interactive Systems Research Group at NTU.
The work of these projects in engaging young people in learning with and through technology reminded me of the excellent work done by my former colleagues at the now-vastly-downsized Ultralab. The lab was established and led by Stephen Heppell and Richard Millwood; some of its legacy in digital creativity is being carried on by Matt Eaves, Hal Maclean and others under the Cleveratom banner.
Continuing the ex-Ultralab theme and maybe more directly aligned to the theme of inclusion (at least if one defines inclusion as trying to include those who are excluded (!)) is the ongoing work of NotSchool and of Jonathan Furness at the Stepping Stones School for children with hemiplegia.
What’s this to do with my PhD? Well for me it resonates well with the themes of authentic learning and ICT. If ICT becomes so enmeshed in the learning, as all of these projects demonstrate, how is it possible to assess it?
The term authentic learning has been around some time, apparently, although I had not come across it before today. This paper cites Archbald and Newmann (1988) as the first to apply authentic to learning and assessment, although I’m not sure that this is truly the first use of the term.
It does seem to encapsulate what I am trying to investigate though as this list of points from Maureen O’Rourke (2001) suggests:
Use of ICT to provide students with greater opportunities for communication, collaboration, thinking and creativity also provides us with challenges in terms of authentic assessment. The Australian National Schools Network has recently launched a national Authentic Learning and Digital Portfolios project. Beginning with a focus on the whole person, school communities are clarifying what young people should know, understand and be able to do at particular stages of their education.
[..] The project aims to bring learning and assessment together with:
• students having significant control in the construction of their portfolios
• the portfolio structure providing opportunities for feedback, questioning and reflection
• assessment moving to a more central part of the learning process, conducted with students rather than on them
• rich, authentic tasks providing evidence of learning in multiple domains
Guess I need to follow up the Authentic Learning and Digital Portfolios project, although my experience of DIDA suggests that simply using e-portfolios is no guarantee of authentic assessment or learning.
I have it ingrained in my psyche that one of the key things about doctoral work is the need to prove that one is inquiring into a ‘gap in the knowledge’. This has always been problematic for me. What is knowledge? How do I prove that the gap exists? Simply because I don’t know of something doesn’t mean it doesn’t exist (three negatives there…). I might think there is a gap, only to be blissfully unaware that someone else has filled it (or, worse, is filling it as I speak/procrastinate).
Notwithstanding this, the gap in the knowledge that I identify is located somewhere in the aridity that is the apparent dearth of writing on the assessment of ICT. Put assessment and ICT into Google and you get 1.42m results. Most of these appear to be about using ICT in the assessment process. Assessment with ICT.
McCormick (2004), writing for the ERNIST project and elsewhere, cites Macfarlane (2001) and Thelwall (2000) in defining a taxonomy for the relationships between ICT and assessment. While his first category is ‘Assessing ICT skills and understanding’, it would seem that this is ignored in the rest of his paper. There is, instead, focus on use of ICT for assessment and affordances provided by use of ICT in other subjects for the assessment of those subjects. Indeed, Thelwall’s work is solely on computer-aided assessment.
Similarly the EPPI studies on ICT and assessment deal with how it is used in assessment or how it can help assess creative and thinking skills in different ways to other media.
So is there a gap in knowledge? Like Popper, I cannot prove that there is but if there is it is somewhere in all of this mist. How do you know when you’ve found a gap anyway? What does the edge of a gap look like?
Last month the NTU research seminar programme (it’s not called this but I forget its title) held a session that was led by Anthony Haynes of P&H. The objective was to look at the ways in which ‘academics’ get published. Two themes emerged that were, in some ways, both parallel and tangential to this.
- Does writing precede or follow publishing?
- Does writing precede or follow thinking?
To this end the notion of regular writing was discussed. The oft described (and observed) image of the researcher with daybook, recording thoughts, observations, references. Writing little and often. This was where the decision to keep this blog came from. Writing a little each day, building up patterns of thought.
How do I know what I am thinking until I see what I am writing?
The concept of writing up is one that is often cited as filling PhD candidates with dread. The concept of the blank page doing the same for authors. But if you take the starting point that you are writing for a purpose (a thesis, or a publication) and if you take the viewpoint that thinking and writing are indivisible then maybe these dreaded inertias may be avoided. I don’t know, but it seems to be a reasonable premise at this stage…