PISA 2012 Creative Problem Solving: International Comparison of High Achievers’ Performance

This post compares the performance of high achievers from selected jurisdictions on the PISA 2012 creative problem solving test.

It draws principally on the material in the OECD Report ‘PISA 2012 Results: Creative Problem Solving’ published on 1 April 2014.

Pisa ps cover Capture

The sample of jurisdictions includes England, other English-speaking countries (Australia, Canada, Ireland and the USA) and those that typically top the PISA rankings (Finland, Hong Kong, South Korea, Shanghai, Singapore and Taiwan).

With the exception of New Zealand, which did not take part in the problem solving assessment, this is deliberately identical to the sample I selected for a parallel post reviewing comparable results in the PISA 2012 assessments of reading, mathematics and science: ‘PISA 2012: International Comparisons of High Achievers’ Performance’ (December 2013).

These eleven jurisdictions account for nine of the top twelve performers ranked by mean overall performance in the problem solving assessment. (The USA and Ireland lie outside the top twelve, while Japan, Macao and Estonia are the three jurisdictions that are in the top twelve but outside my sample.)

The post is divided into seven sections:

  • Background to the problem solving assessment: How PISA defines problem solving competence; how it defines performance at each of the six levels of proficiency; how it defines high achievement; the nature of the assessment and who undertook it.
  • Average performance, the performance of high achievers and the performance of low achievers (proficiency level 1) on the problem solving assessment. This comparison includes my own sample and all the other jurisdictions that score above the OECD average on the first of these measures.
  • Gender and socio-economic differences amongst high achievers on the problem solving assessment in my sample of eleven jurisdictions.
  • The relative strengths and weaknesses of jurisdictions in this sample on different aspects of the problem solving assessment. (This treatment is generic rather than specific to high achievers.)
  • What proportion of high achievers on the problem-solving assessment in my sample of jurisdictions are also high achievers in reading, maths and science respectively.
  • What proportion of students in my sample of jurisdictions achieves highly in one or more of the four PISA 2012 assessments – and against the ‘all-rounder’ measure, which is based on high achievement in all of reading, maths and science (but not problem solving).
  • Implications for education policy makers seeking to improve problem solving performance in each of the sample jurisdictions.

Background to the Problem Solving Assessment


Definition of problem solving

PISA’s definition of problem-solving competence is:

‘…an individual’s capacity to engage in cognitive processing to understand and resolve problem situations where a method of solution is not immediately obvious. It includes the willingness to engage with such situations in order to achieve one’s potential as a constructive and reflective citizen.’

The commentary on this definition points out that:

  • Problem solving requires identification of the problem(s) to be solved, planning and applying a solution, and monitoring and evaluating progress.
  • A problem is ‘a situation in which the goal cannot be achieved by merely applying learned procedures’, so the problems encountered must be non-routine for 15 year-olds, although ‘knowledge of general strategies’ may be useful in solving them.
  • Motivational and affective factors are also in play.

The Report is rather coy about the role of creativity in problem solving, and hence the justification for the inclusion of this term in its title.

Perhaps the nearest it gets to an exposition is when commenting on the implications of its findings:

‘In some countries and economies, such as Finland, Shanghai-China and Sweden, students master the skills needed to solve static, analytical problems similar to those that textbooks and exam sheets typically contain as well or better than 15-year-olds, on average, across OECD countries. But the same 15-year-olds are less successful when not all information that is needed to solve the problem is disclosed, and the information provided must be completed by interacting with the problem situation. A specific difficulty with items that require students to be open to novelty, tolerate doubt and uncertainty, and dare to use intuitions (“hunches and feelings”) to initiate a solution suggests that opportunities to develop and exercise these traits, which are related to curiosity, perseverance and creativity, need to be prioritised.’


Assessment framework

PISA’s framework for assessing problem solving competence is set out in the following diagram.

 

PISA problem solving framework Capture

 

In solving a particular problem it may not be necessary to apply all these steps, or to apply them in this order.

Proficiency levels

The proficiency scale was designed to have a mean score across OECD countries of 500. The six levels of proficiency applied in the assessment each have their own profile.

The lowest level of proficiency, level 1, is described thus:

‘At Level 1, students can explore a problem scenario only in a limited way, but tend to do so only when they have encountered very similar situations before. Based on their observations of familiar scenarios, these students are able only to partially describe the behaviour of a simple, everyday device. In general, students at Level 1 can solve straightforward problems provided there is a simple condition to be satisfied and there are only one or two steps to be performed to reach the goal. Level 1 students tend not to be able to plan ahead or set sub-goals.’

This level equates to a range of scores from 358 to 423. Across the OECD sample, 91.8% of participants are able to perform tasks at this level.

By comparison, level 5 proficiency is described in this manner:

‘At Level 5, students can systematically explore a complex problem scenario to gain an understanding of how relevant information is structured. When faced with unfamiliar, moderately complex devices, such as vending machines or home appliances, they respond quickly to feedback in order to control the device. In order to reach a solution, Level 5 problem solvers think ahead to find the best strategy that addresses all the given constraints. They can immediately adjust their plans or backtrack when they detect unexpected difficulties or when they make mistakes that take them off course.’

The associated range of scores is from 618 to 683 and 11.4% of all OECD students achieve at this level.

Finally, level 6 proficiency is described in this way:

‘At Level 6, students can develop complete, coherent mental models of diverse problem scenarios, enabling them to solve complex problems efficiently. They can explore a scenario in a highly strategic manner to understand all information pertaining to the problem. The information may be presented in different formats, requiring interpretation and integration of related parts. When confronted with very complex devices, such as home appliances that work in an unusual or unexpected manner, they quickly learn how to control the devices to achieve a goal in an optimal way. Level 6 problem solvers can set up general hypotheses about a system and thoroughly test them. They can follow a premise through to a logical conclusion or recognise when there is not enough information available to reach one. In order to reach a solution, these highly proficient problem solvers can create complex, flexible, multi-step plans that they continually monitor during execution. Where necessary, they modify their strategies, taking all constraints into account, both explicit and implicit.’

The range of level 6 scores is from 683 points upwards and 2.5% of all OECD participants score at this level.
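
To make the banding concrete, here is a minimal sketch (in Python) that maps a problem-solving score onto the proficiency bands quoted above. Only the level 1, level 5 and level 6 cut-scores appear in this post, so the intermediate levels are grouped together; the function name and labels are illustrative rather than PISA's own.

```python
def proficiency_band(score: float) -> str:
    """Classify a PISA 2012 problem-solving score using the cut-scores
    quoted above: level 1 runs from 358 to 423, level 5 from 618 to 683,
    and level 6 from 683 upwards. The boundaries for levels 2-4 are not
    quoted in this post, so those levels are grouped together here."""
    if score < 358:
        return "below level 1"
    if score < 423:
        return "level 1"
    if score < 618:
        return "levels 2-4 (boundaries not quoted here)"
    if score < 683:
        return "level 5"
    return "level 6"

print(proficiency_band(500))  # levels 2-4
print(proficiency_band(640))  # level 5
print(proficiency_band(700))  # level 6
```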

PISA defines high achieving students as those securing proficiency level 5 or higher, so proficiency levels 5 and 6 together. The bulk of the analysis it supplies relates to this cohort, while relatively little attention is paid to the more exclusive group achieving proficiency level 6, even though almost 10% of students in Singapore reach this standard in problem solving.


The sample

Sixty-five jurisdictions took part in PISA 2012, including all 34 OECD countries and 31 partners. But only 44 jurisdictions took part in the problem solving assessment, including 28 OECD countries and 16 partners. As noted above, that included all my original sample of twelve jurisdictions, with the exception of New Zealand.

I could find no stated reason why New Zealand chose not to take part. Press reports initially suggested that England would do likewise, but it was subsequently reported that this decision had been reversed.

The assessment was computer-based and comprised 16 units divided into 42 items. The units were organised into four clusters, each designed to take 20 minutes to complete. Participants completed one or two clusters, depending on whether they were also undertaking computer-based assessments of reading and maths.

In each jurisdiction a random sample of those who took part in the paper-based maths assessment was selected to undertake the problem solving assessment. About 85,000 students took part in all. The unweighted sample sizes in my selected jurisdictions are set out in Table 1 below, together with the total population of 15 year-olds in each jurisdiction.

 

Table 1: Sample sizes undertaking PISA 2012 problem solving assessment in selected jurisdictions

Country Unweighted Sample Total 15 year-olds
Australia 5,612 291,976
Canada 4,601 417,873
Finland 3,531 62,523
Hong Kong 1,325 84,200
Ireland 1,190 59,296
Shanghai 1,203 108,056
Singapore 1,394 53,637
South Korea 1,336 687,104
Taiwan 1,484 328,356
UK (England) 1,458 738,066
USA 1,273 3,985,714

Those taking the assessment were aged between 15 years and three months and 16 years and two months at the time of the assessment. All were enrolled at school and had completed at least six years of formal schooling.

Average performance compared with the performance of high and low achievers

The overall table of mean scores on the problem solving assessment is shown below.

PISA problem solving raw scores Capture


There are some familiar names at the top of the table, especially Singapore and South Korea, the two countries that comfortably lead the rankings. Japan is some ten points behind in third place but it in turn has a lead of twelve points over a cluster of four other Asian competitors: Macao, Hong Kong, Shanghai and Taiwan.

A slightly different picture emerges if we compare average performance with the proportion of learners who achieve the bottom proficiency level and the top two proficiency levels. Table 2 below compares these groups.

This table includes all the jurisdictions that exceeded the OECD average score. I have marked out in bold the countries in my sample of eleven, which includes Ireland, the only one of them that did not exceed the OECD average.

Table 2: PISA Problem Solving 2012: Comparing Average Performance with Performance at Key Proficiency Levels

 

Jurisdiction Mean score Level 1 (%) Level 5 (%) Level 6 (%) Levels 5+6 (%)
Singapore 562 6.0 19.7 9.6 29.3
South Korea 561 4.8 20.0 7.6 27.6
Japan 552 5.3 16.9 5.3 22.2
Macao 540 6.0 13.8 2.8 16.6
Hong Kong 540 7.1 14.2 5.1 19.3
Shanghai 536 7.5 14.1 4.1 18.2
Taiwan 534 8.2 14.6 3.8 18.4
Canada 526 9.6 12.4 5.1 17.5
Australia 523 10.5 12.3 4.4 16.7
Finland 523 9.9 11.4 3.6 15.0
England (UK) 517 10.8 10.9 3.3 14.2
Estonia 515 11.1 9.5 2.2 11.7
France 511 9.8 9.9 2.1 12.0
Netherlands 511 11.2 10.9 2.7 13.6
Italy 510 11.2 8.9 1.8 10.7
Czech Republic 509 11.9 9.5 2.4 11.9
Germany 509 11.8 10.1 2.7 12.8
USA 508 12.5 8.9 2.7 11.6
Belgium 508 11.6 11.4 3.0 14.4
Austria 506 11.9 9.0 2.0 11.0
Norway 503 13.2 9.7 3.4 13.1
Ireland 498 13.3 7.3 2.1 9.4
OECD Ave. 500 13.2 8.9 2.5 11.4


The jurisdictions at the top of the table also have a familiar profile, with a small ‘tail’ of low performance combined with high levels of performance at the top end.

Nine of the top ten have fewer than 10% of learners at proficiency level 1, though only South Korea pushes below 5%.

Five of the top ten have 5% or more of their learners at proficiency level 6, but only Singapore and South Korea have a higher percentage at level 6 than level 1 (with Japan managing the same percentage at both levels).

The top three performers – Singapore, South Korea and Japan – are the only three jurisdictions that have over 20% of their learners at proficiency levels 5 and 6 together.

South Korea slightly outscores Singapore at level 5 (20.0% against 19.7%). Japan is in third place, followed by Taiwan, Hong Kong and Shanghai.

But at level 6, Singapore has a clear lead, followed by South Korea and Japan, with Hong Kong and Canada tied in joint fourth place.

England’s overall place in the table is relatively consistent on each of these measures, but the gaps between England and the top performers vary considerably.

The best have fewer than half England’s proportion of learners at proficiency level 1, almost twice as many learners at proficiency level 5 and more than twice as many at proficiency levels 5 and 6 together. But at proficiency level 6 they have almost three times as many learners as England.

Chart 1 below compares performance on these four measures across my sample of eleven jurisdictions.

All but Ireland are comfortably below the OECD average for the percentage of learners at proficiency level 1. The USA and Ireland are atypical in having a bigger tail (proficiency level 1) than their cadres of high achievers (levels 5 and 6 together).

At level 5 all but Ireland and the USA are above the OECD average (the USA matches it exactly), but the USA leapfrogs the OECD average at level 6.

There is a fairly strong correlation between the proportions of learners achieving the highest proficiency thresholds and average performance in each jurisdiction. However, Canada stands out by having an atypically high proportion of students at level 6.


Chart 1: PISA 2012 Problem-solving: Comparing performance at specified proficiency levels

Problem solving chart 1


PISA’s Report discusses the variation in problem-solving performance within different jurisdictions. However, it does so without reference to the proficiency levels, so we do not know to what extent these findings apply equally to high achievers.

Amongst those above the OECD average, those with least variation are Macao, Japan, Estonia, Shanghai, Taiwan, Korea, Hong Kong, USA, Finland, Ireland, Austria, Singapore and the Czech Republic respectively.

Perhaps surprisingly, the degree of variation in Finland is identical to that in the USA and Ireland, while Estonia has less variation than many of the Asian jurisdictions. Singapore, while top of the performance table, is only just above the OECD average in terms of variation.

The countries below the OECD average on this measure – listed in order of increasing variation – include England, Australia and Canada, though all three are relatively close to the OECD average. So these three countries and Singapore are all relatively close together.

Gender and socio-economic differences amongst high achievers


Gender differences

On average across OECD jurisdictions, boys score seven points higher than girls on the problem solving assessment. There is also more variation amongst boys than girls.

Across the OECD participants, 3.1% of boys achieved proficiency level 6 but only 1.8% of girls did so. This imbalance was repeated at proficiency level 5, achieved by 10% of boys and 7.7% of girls.

The table and chart below show the variations within my sample of eleven countries. The performance of boys exceeds that of girls in all cases, except in Finland at proficiency level 5, and in that instance the gap in favour of girls is relatively small (0.4 percentage points).


Table 3: PISA Problem-solving: Gender variation at top proficiency levels

Jurisdiction Level 5 (%) Level 6 (%) Levels 5+6 (%)
  Boys Girls Diff Boys Girls Diff Boys Girls Diff
Singapore 20.4 19.0 +1.4 12.0 7.1 +4.9 32.4 26.1 +6.3
South Korea 21.5 18.3 +3.2 9.4 5.5 +3.9 30.9 23.8 +7.1
Hong Kong 15.7 12.4 +3.3 6.1 3.9 +2.2 21.8 16.3 +5.5
Shanghai 17.0 11.4 +5.6 5.7 2.6 +3.1 22.7 14.0 +8.7
Taiwan 17.3 12.0 +5.3 5.0 2.5 +2.5 22.3 14.5 +7.8
Canada 13.1 11.8 +1.3 5.9 4.3 +1.6 19.0 16.1 +2.9
Australia 12.6 12.0 +0.6 5.1 3.7 +1.4 17.7 15.7 +2.0
Finland 11.2 11.6 -0.4 4.1 3.0 +1.1 15.3 14.6 +0.7
England (UK) 12.1 9.9 +2.2 3.6 3.0 +0.6 15.7 12.9 +2.8
USA 9.8 7.9 +1.9 3.2 2.3 +0.9 13.0 10.2 +2.8
Ireland 8.0 6.6 +1.4 3.0 1.1 +1.9 11.0 7.7 +3.3
OECD Average 10.0 7.7 +2.3 3.1 1.8 +1.3 13.1 9.5 +3.6

There is no consistent pattern in whether boys are more heavily over-represented at proficiency level 5 than proficiency level 6, or vice versa.

There is a bigger difference at level 6 than at level 5 in Singapore, South Korea, Canada, Australia, Finland and Ireland, but the reverse is true in the five remaining jurisdictions.

At level 5, boys are in the greatest ascendancy in Shanghai and Taiwan while, at level 6, this is true of Singapore and South Korea.

When proficiency levels 5 and 6 are combined, all five of the Asian tigers show a difference in favour of males of 5.5% or higher, significantly in advance of the six ‘Western’ countries in the sample and significantly ahead of the OECD average.

Amongst the six ‘Western’ representatives, boys have the biggest advantage at proficiency level 5 in England, while at level 6 boys in Ireland have the biggest advantage.

Within this group of jurisdictions, the gap between boys and girls at level 6 is comfortably the smallest in England. But, in terms of the gap at proficiency levels 5 and 6 together, Finland is ahead with the smallest difference.


Chart 2: PISA Problem-solving: Gender variation at top proficiency levels

Problem solving chart 2

The Report includes a generic analysis of gender differences in performance for boys and girls with similar levels of performance in reading, maths and science.

It concludes that girls perform above their expected level in both England and Australia (though the difference is statistically significant only in the latter).

The Report comments:

‘It is not clear whether one should expect there to be a gender gap in problem solving. On the one hand, the questions posed in the PISA problem-solving assessment were not grounded in content knowledge, so boys’ or girls’ advantage in having mastered a particular subject area should not have influenced results. On the other hand… performance in problem solving is more closely related to performance in mathematics than to performance in reading. One could therefore expect the gender difference in performance to be closer to that observed in mathematics – a modest advantage for boys, in most countries – than to that observed in reading – a large advantage for girls.’


Socio-economic differences

The Report considers variations in performance against PISA’s Index of Economic, Social and Cultural status (IESC), finding them weaker overall than for reading, maths and science.

It calculates that the overall percentage variation in performance attributable to these factors is about 10.6% (compared with 14.9% in maths, 14.0% in science and 13.2% in reading).

Amongst the eleven jurisdictions in my sample, the weakest correlations were found in Canada (4%), followed by Hong Kong (4.9%), South Korea (5.4%), Finland (6.5%), England (7.8%), Australia (8.5%), Taiwan (9.4%), the USA (10.1%) and Ireland (10.2%) in that order. All those jurisdictions had correlations below the OECD average.

Perhaps surprisingly, there were above average correlations in Shanghai (14.1%) and, to a lesser extent (and less surprisingly) in Singapore (11.1%).
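
For readers who want to see where a figure like ‘10.6% of the variation’ comes from: it is essentially the R² from a regression of students’ problem-solving scores on the socio-economic index. The sketch below shows that calculation on invented student-level data; the numbers and variable names are hypothetical, and a faithful replication would also need PISA’s plausible values and sampling weights, which are ignored here.

```python
import numpy as np

# Invented student-level data: one problem-solving score and one
# socio-economic index value per student.
scores = np.array([512.0, 478.0, 603.0, 455.0, 540.0, 491.0])
escs   = np.array([ 0.31, -0.42,  1.05, -0.88,  0.10, -0.15])

# Fit a simple least-squares line: score = intercept + slope * index.
slope, intercept = np.polyfit(escs, scores, 1)
predicted = intercept + slope * escs

# R-squared: the share of score variance accounted for by the index.
ss_res = np.sum((scores - predicted) ** 2)
ss_tot = np.sum((scores - scores.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Variance explained: {100 * r_squared:.1f}%")
```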

The report suggests that students with parents working in semi-skilled and elementary occupations tend to perform above their expected level in problem-solving in Taiwan, England, Canada, the USA, Finland and Australia (in that order – with Australia closest to the OECD average).

The jurisdictions where these students tend to underperform their expected level are – in order of severity – Ireland, Shanghai, Singapore, Hong Kong and South Korea.

A parallel presentation accompanying the Report provides some additional data about the performance in different countries of what the OECD calls ‘resilient’ students – those in the bottom quartile of the IESC but in the top quartile by performance, after accounting for socio-economic status.

It supplies the graph below, which shows all the Asian countries in my sample clustered at the top, but also with significant gaps between them. Canada is the highest-performing of the remainder in my sample, followed by Finland, Australia, England and the USA respectively. Ireland is some way below the OECD average.


PISA problem resolving resilience Capture


Unfortunately, I can find no analysis of how performance varies according to socio-economic variables at each proficiency level. It would be useful to see which jurisdictions have the smallest ‘excellence gaps’ at levels 5 and 6 respectively.


How different jurisdictions perform on different aspects of problem-solving

The Report’s analysis of comparative strengths and weaknesses in different elements of problem-solving does not take account of variations at different proficiency levels.

It explains that aspects of the assessment were found easier by students in different jurisdictions, employing a four-part distinction between:

‘Exploring and understanding. The objective is to build mental representations of each of the pieces of information presented in the problem. This involves:

  • exploring the problem situation: observing it, interacting with it, searching for information and finding limitations or obstacles; and
  • understanding given information and, in interactive problems, information discovered while interacting with the problem situation; and demonstrating understanding of relevant concepts.

Representing and formulating. The objective is to build a coherent mental representation of the problem situation (i.e. a situation model or a problem model). To do this, relevant information must be selected, mentally organised and integrated with relevant prior knowledge. This may involve:

  • representing the problem by constructing tabular, graphic, symbolic or verbal representations, and shifting between representational formats; and
  • formulating hypotheses by identifying the relevant factors in the problem and their inter-relationships; and organising and critically evaluating information.

Planning and executing. The objective is to use one’s knowledge about the problem situation to devise a plan and execute it. Tasks where “planning and executing” is the main cognitive demand do not require any substantial prior understanding or representation of the problem situation, either because the situation is straightforward or because these aspects were previously solved. “Planning and executing” includes:

  • planning, which consists of goal setting, including clarifying the overall goal, and setting subgoals, where necessary; and devising a plan or strategy to reach the goal state, including the steps to be undertaken; and
  • executing, which consists of carrying out a plan.

Monitoring and reflecting. The objective is to regulate the distinct processes involved in problem solving, and to critically evaluate the solution, the information provided with the problem, or the strategy adopted. This includes:

  • monitoring progress towards the goal at each stage, including checking intermediate and final results, detecting unexpected events, and taking remedial action when required; and
  • reflecting on solutions from different perspectives, critically evaluating assumptions and alternative solutions, identifying the need for additional information or clarification and communicating progress in a suitable manner.’

Amongst my sample of eleven jurisdictions:

  • ‘Exploring and understanding’ items were found easier by students in Singapore, Hong Kong, South Korea, Australia, Taiwan and Finland. 
  • ‘Representing and formulating’ items were found easier in Taiwan, Shanghai, South Korea, Singapore, Hong Kong, Canada and Australia. 
  • ‘Planning and executing’ items were found easier in Finland only. 
  • ‘Monitoring and reflecting’ items were found easier in Ireland, Singapore, the USA and England.

The Report concludes:

‘This analysis shows that, in general, what differentiates high-performing systems, and particularly East Asian education systems, such as those in Hong Kong-China, Japan, Korea [South Korea], Macao-China, Shanghai-China, Singapore and Chinese Taipei [Taiwan], from lower-performing ones, is their students’ high level of proficiency on “exploring and understanding” and “representing and formulating” tasks.’

It also distinguishes those jurisdictions that perform best on interactive problems, requiring students to discover some of the information required to solve the problem, rather than being presented with all the necessary information. This seems to be the nearest equivalent to a measure of creativity in problem solving.

Comparative strengths and weaknesses in respect of interactive tasks are captured in the following diagram.


PISA problem solving strengths in different countries


One can see that several of my sample – Ireland, the USA, Canada, Australia, South Korea and Singapore – are placed in the top right-hand quarter of the diagram, indicating stronger than expected performance on both interactive and knowledge acquisition tasks.

England is stronger than expected on the former but not on the latter.

Jurisdictions that are weaker than expected on interactive tasks only include Hong Kong, Taiwan and Shanghai, while Finland is weaker than expected on both.

We have no information about whether these distinctions were maintained at different proficiency levels.


Comparing jurisdictions’ performance at higher proficiency levels

Table 4 and Charts 3 and 4 below show variations in the performance of countries in my sample across the four different assessments at level 6, the highest proficiency level.

The charts in particular emphasise how far ahead the Asian Tigers are in maths at this level, compared with the cross-jurisdictional variation in the other three assessments.

In all five of these jurisdictions, level 6 performance in maths also vastly exceeds level 6 performance in the other three assessments. The proportion of students achieving level 6 proficiency in problem solving lags far behind, even though there is a fairly strong correlation between these two assessments (see below).

In contrast, all the ‘Western’ jurisdictions in the sample – with the sole exception of Ireland – achieve a higher percentage at proficiency level 6 in problem solving than they do in maths, although the difference is always less than a full percentage point. (Even in Ireland the difference is only 0.1 of a percentage point in favour of maths.)

Shanghai is the only jurisdiction in the sample which has more students achieving proficiency level 6 in science than in problem solving. It also has the narrowest gap between level 6 performance in problem solving and in reading.

Meanwhile, England, the USA, Finland and Australia all have broadly similar profiles across the four assessments, with the largest percentage of level 6 performers in problem solving, followed by maths, science and reading respectively.

The proximity of the lines marking level 6 performance in reading and science is also particularly evident in the second chart below.


Table 4: Percentage achieving proficiency Level 6 in each domain

  PS L6  Ma L6 Sci L6 Re L6
Singapore 9.6 19.0 5.8 5.0
South Korea 7.6 12.1 1.1 1.6
Hong Kong 5.1 12.3 1.8 1.9
Shanghai 4.1 30.8 4.2 3.8
Taiwan 3.8 18.0 0.6 1.4
Canada 5.1 4.3 1.8 2.1
Australia 4.4 4.3 2.6 1.9
Finland 3.6 3.5 3.2 2.2
England (UK) 3.3 3.1 1.9 1.3
USA 2.7 2.2 1.1 1.0
Ireland 2.1 2.2 1.5 1.3
OECD Average 2.5 3.3 1.2 1.1

 Charts 3 and 4: Percentage achieving proficiency level 6 in each domain

Problem solving chart 3

Problem solving chart 4

The pattern is materially different at proficiency level 5 and above, as the table and chart below illustrate. These also include the proportion of all-rounders, who achieved proficiency level 5 or above in each of maths, science and reading (problem solving is not included in this measure).

The lead enjoyed by the ‘Asian Tigers’ in maths is somewhat less pronounced. The gap between performance within these jurisdictions on the different assessments also tends to be less marked, although maths accounts for comfortably the largest proportion of level 5+ performance in all five cases.

Conversely, level 5+ performance on the different assessments is typically much closer in the ‘Western’ countries. Problem solving leads the way in Australia, Canada, England and the USA, but in Finland science is in the ascendant and reading is strongest in Ireland.

Some jurisdictions have a far ‘spikier’ profile than others. Ireland is closest to achieving equilibrium across all four assessments. Australia and England share very similar profiles, though Australia outscores England in each assessment.

The second chart in particular shows how Shanghai’s ‘spike’ applies in all the other three assessments but not in problem solving.

Table 5: Percentage achieving Proficiency level 5 and above in each domain

  PS L5+  Ma L5+ Sci L5+ Re L5+ Ma + Sci + Re L5+
Singapore 29.3 40.0 22.7 21.2 16.4
South Korea 27.6 30.9 11.7 14.2 8.1
Hong Kong 19.3 33.4 16.7 16.8 10.9
Shanghai 18.2 55.4 27.2 25.1 19.6
Taiwan 18.4 37.2 8.4 11.8 6.1
Canada 17.5 16.4 11.3 12.9 6.5
Australia 16.7 14.8 13.5 11.7 7.6
Finland 15.0 15.2 17.1 13.5 7.4
England (UK) 14.2 12.4 11.7 9.1 5.7* all UK
USA 11.6 9.0 7.4 7.9 4.7
Ireland 9.4 10.7 10.8 11.4 5.7
OECD Average 11.4 12.6 8.4 8.4 4.4


Charts 5 and 6: Percentage Achieving Proficiency Level 5 and above in each domain

Problem solving chart 5

Problem solving chart 6

How high-achieving problem solvers perform in other assessments


Correlations between performance in different assessments

The Report provides an analysis of the proportion of students achieving proficiency levels 5 and 6 on problem solving who also achieved that outcome on one of the other three assessments: reading, maths and science.

It argues that problem solving is a distinct and separate domain. However:

‘On average, about 68% of the problem-solving score reflects skills that are also measured in one of the three regular assessment domains. The remaining 32% reflects skills that are uniquely captured by the assessment of problem solving. Of the 68% of variation that problem-solving performance shares with other domains, the overwhelming part is shared with all three regular assessment domains (62% of the total variation); about 5% is uniquely shared between problem solving and mathematics only; and about 1% of the variation in problem solving performance hinges on skills that are specifically measured in the assessments of reading or science.’

It discusses the correlation between these different assessments:

‘A key distinction between the PISA 2012 assessment of problem solving and the regular assessments of mathematics, reading and science is that the problem-solving assessment does not measure domain-specific knowledge; rather, it focuses as much as possible on the cognitive processes fundamental to problem solving. However, these processes can also be used and taught in the other subjects assessed. For this reason, problem-solving tasks are also included among the test units for mathematics, reading and science, where their solution requires expert knowledge specific to these domains, in addition to general problem-solving skills.

It is therefore expected that student performance in problem solving is positively correlated with student performance in mathematics, reading and science. This correlation hinges mostly on generic skills, and should thus be about the same magnitude as between any two regular assessment subjects.’

These overall correlations are set out in the table below, which shows that maths has a higher correlation with problem solving than either science or reading, but that this correlation is lower than those between the three subject-related assessments.

The correlation between maths and science (0.90) is comfortably the strongest (despite the relationship between reading and science at the top end of the distribution noted above).

PISA problem solving correlations capture
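
The correlations reported above are, in essence, Pearson correlations between students’ scores in each pair of domains. A minimal sketch of the calculation is shown below; the data frame and its values are invented for illustration, and a proper replication would again need to handle PISA’s plausible values and student weights.

```python
import pandas as pd

# Invented per-student scores in the four domains.
df = pd.DataFrame({
    "problem_solving": [512, 478, 603, 455, 540, 491],
    "maths":           [520, 470, 615, 448, 555, 500],
    "reading":         [498, 465, 580, 460, 530, 505],
    "science":         [505, 472, 598, 452, 545, 498],
})

# Pairwise Pearson correlations between the domains.
print(df.corr(method="pearson").round(2))
```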

Correlations are broadly similar across jurisdictions, but the Report notes that the association is comparatively weak in some of these, including Hong Kong. Students here are more likely to perform poorly on problem solving and well on other assessments, or vice versa.

There is also broad consistency at different performance levels, but the Report identifies those jurisdictions where students with the same level of performance exceed expectations in relation to problem-solving performance. These include South Korea, the USA, England, Australia, Singapore and – to a lesser extent – Canada.

Those with lower than expected performance include Shanghai, Ireland, Hong Kong, Taiwan and Finland.

The Report notes:

‘In Shanghai-China, 86% of students perform below the expected level in problem solving, given their performance in mathematics, reading and science. Students in these countries/economies struggle to use all the skills that they demonstrate in the other domains when asked to perform problem-solving tasks.’

However, there is variation according to students’ maths proficiency:

  • Jurisdictions whose high scores on problem solving are mainly attributable to strong performers in maths include Australia, England and the USA. 
  • Jurisdictions whose high scores on problem solving are more attributable to weaker performers in maths include Ireland. 
  • Jurisdictions whose lower scores in problem solving are more attributable to weakness among strong performers in maths include South Korea.
  • Jurisdictions whose lower scores in problem solving are more attributable to weakness among weak performers in maths include Hong Kong and Taiwan. 
  • Jurisdictions whose weakness in problem solving is fairly consistent regardless of performance in maths include Shanghai and Singapore.

The Report adds:

‘In Italy, Japan and Korea, the good performance in problem solving is, to a large extent, due to the fact that lower performing students score beyond expectations in the problem-solving assessment….This may indicate that some of these students perform below their potential in mathematics; it may also indicate, more positively, that students at the bottom of the class who struggle with some subjects in school are remarkably resilient when it comes to confronting real-life challenges in non-curricular contexts…

In contrast, in Australia, England (United Kingdom) and the United States, the best students in mathematics also have excellent problem-solving skills. These countries’ good performance in problem solving is mainly due to strong performers in mathematics. This may suggest that in these countries, high performers in mathematics have access to – and take advantage of – the kinds of learning opportunities that are also useful for improving their problem-solving skills.’

What proportion of high performers in problem solving are also high performers in one of the other assessments?

The percentages of high achieving students (proficiency level 5 and above) in my sample of eleven jurisdictions who perform equally highly in each of the three domain-specific assessments are shown in Table 6 and Chart 7 below.

These show that Shanghai leads the way in each case, with 98.0% of all students who achieve proficiency level 5+ in problem solving also achieving the same outcome in maths. For science and reading the comparable figures are 75.1% and 71.7% respectively.

Taiwan is the nearest competitor in respect of problem solving plus maths, Finland in the case of problem solving plus science and Ireland in the case of problem solving plus reading.

South Korea, Taiwan and Canada are atypical of the rest in recording a higher proportion of problem solving plus reading at this level than problem solving plus science.

Singapore, Shanghai and Ireland are the only three jurisdictions that score above 50% on all three of these combinations. However, the only jurisdictions that exceed the OECD averages in all three cases are Singapore, Hong Kong, Shanghai and Finland.

Table 6: PISA problem-solving: Percentage of students achieving proficiency level 5+ in domain-specific assessments

  PS + Ma PS + Sci PS + Re
Singapore 84.1 57.0 50.2
South Korea 73.5 34.1 40.3
Hong Kong 79.8 49.4 48.9
Shanghai 98.0 75.1 71.7
Taiwan 93.0 35.3 43.7
Canada 57.7 43.9 44.5
Australia 61.3 54.9 47.1
Finland 66.1 65.4 49.5
England (UK) 59.0 52.8 41.7
USA 54.6 46.9 45.1
Ireland 59.0 57.2 52.0
OECD Average 63.5 45.7 41.0

Chart 7: PISA Problem-solving: Percentage of students achieving proficiency level 5+ in domain-specific assessments

Problem solving chart 7.

What proportion of students achieve highly in one or more assessments?

Table 7 and Chart 8 below show how many students in each of my sample achieved proficiency level 5 or higher in problem solving only, in problem solving and one or more other assessments, in one or more assessments but not problem solving, and in at least one assessment (i.e. the total of the three preceding columns).

I have also repeated in the final column the percentage achieving this proficiency level in each of maths, science and reading. (PISA has not released information about the proportion of students who achieved this feat across all four assessments.)
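
As a minimal sketch of how these combination columns can be derived, the snippet below starts from invented per-student flags recording whether each student reached level 5+ in each domain and computes the five percentages reported in Table 7; the flags and resulting figures are purely illustrative.

```python
import pandas as pd

# Invented flags: True where a student reaches proficiency level 5+ in a domain.
df = pd.DataFrame({
    "ps":  [True,  True,  False, False, True,  False],
    "ma":  [True,  False, True,  False, False, False],
    "sci": [False, False, True,  False, True,  False],
    "re":  [False, False, True,  False, False, False],
})

other = df[["ma", "sci", "re"]].any(axis=1)  # level 5+ in any core domain

measures = {
    "PS only":             df["ps"] & ~other,
    "PS + 1 or more":      df["ps"] & other,
    "1+ but not PS":       ~df["ps"] & other,
    "L5+ in at least one": df["ps"] | other,
    "Ma + Sci + Re":       df[["ma", "sci", "re"]].all(axis=1),
}

for name, flag in measures.items():
    print(f"{name}: {100 * flag.mean():.1f}%")
```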

These reveal that the percentages of students who achieve proficiency level 5+ only in problem solving are very small, ranging from 0.3% in Shanghai to 6.7% in South Korea.

Conversely, the percentages of students achieving proficiency level 5+ in any one of the other assessments but not in problem solving are typically significantly higher, ranging from 4.5% in the USA to 38.1% in Shanghai.

There is quite a bit of variation in terms of whether jurisdictions score more highly on ‘problem solving and at least one other’ (second column) or ‘at least one other excluding problem solving’ (third column).

More importantly, the fourth column shows that the jurisdiction with the most students achieving proficiency level 5 or higher in at least one assessment is clearly Shanghai, followed by Singapore, Hong Kong, South Korea and Taiwan in that order.

The proportion of students achieving this outcome in Shanghai is close to three times the OECD average, more than twice the rate achieved in any of the ‘Western’ countries and some three and a half times the rate achieved in the USA.

The same is true of the proportion of students achieving this level in the three domain-specific assessments.

On this measure, South Korea and Taiwan fall significantly behind their Asian competitors, and the latter is overtaken by Australia, Finland and Canada.


Table 7: Percentage achieving proficiency level 5+ in different combinations of PISA assessments

  PS only % PS + 1 or more % 1+ but not PS % L5+ in at least one % L5+ in Ma + Sci + Re %
Singapore 4.3 25.0 16.5 45.8 16.4
South Korea 6.7 20.9 11.3 38.9 8.1
Hong Kong 3.4 15.9 20.5 39.8 10.9
Shanghai 0.3 17.9 38.1 56.3 19.6
Taiwan 1.2 17.1 20.4 38.7 6.1
Canada 5.5 12.0 9.9 27.4 6.5
Australia 4.7 12.0 7.7 24.4 7.6
Finland 3.0 12.0 11.9 26.9 7.4
England (UK) 4.4 9.8 6.8 21.0 5.7* all UK
USA 4.1 7.5 4.5 16.1 4.7
Ireland 2.6 6.8 10.1 19.5 5.7
OECD Average 3.1 8.2 8.5 19.8 4.4

Chart 8: Percentage achieving proficiency level 5+ in different combinations of PISA assessments

Problem solving chart 8

The Report comments:

‘The proportion of students who reach the highest levels of proficiency in at least one domain (problem solving, mathematics, reading or science) can be considered a measure of the breadth of a country’s/economy’s pool of top performers. By this measure, the largest pool of top performers is found in Shanghai-China, where more than half of all students (56%) perform at the highest levels in at least one domain, followed by Singapore (46%), Hong Kong-China (40%), Korea and Chinese Taipei (39%)…Only one OECD country, Korea, is found among the five countries/economies with the largest proportion of top performers. On average across OECD countries, 20% of students are top performers in at least one assessment domain.

The proportion of students performing at the top in problem solving and in either mathematics, reading or science, too can be considered a measure of the depth of this pool. These are top performers who combine the mastery of a specific domain of knowledge with the ability to apply their unique skills flexibly, in a variety of contexts. By this measure, the deepest pools of top performers can be found in Singapore (25% of students), Korea (21%), Shanghai-China (18%) and Chinese Taipei (17%). On average across OECD countries, only 8% of students are top performers in both a core subject and in problem solving.’

There is no explanation of why proficiency level 5 should be equated by PISA with the breadth of a jurisdiction’s ‘pool of top performers’. The distinction between proficiency levels 5 and 6 in this respect requires further discussion.

In addition to updated ‘all-rounder’ data showing what proportion of students achieved this outcome across all four assessments, it would be really interesting to see the proportion of students achieving at proficiency level 6 across different combinations of these four assessments – and to see what proportion of students achieving that outcome in different jurisdictions are direct beneficiaries of targeted support, such as a gifted education programme.

In the light of this analysis, what are jurisdictions’ priorities for improving problem solving performance?

Leaving aside strengths and weaknesses in different elements of problem solving discussed above, this analysis suggests that the eleven jurisdictions in my sample should address the following priorities:

Singapore has a clear lead at proficiency level 6, but falls behind South Korea at level 5 (though Singapore re-establishes its ascendancy when levels 5 and 6 are considered together). It also has more level 1 performers than South Korea. It should perhaps focus on reducing the size of this tail and pushing through more of its mid-range performers to level 5. There is a pronounced imbalance in favour of boys at level 6, so enabling more girls to achieve the highest level of performance is a clear priority. There may also be a case for prioritising the children of semi-skilled workers.

South Korea needs to focus on getting a larger proportion of its level 5 performers to level 6. This effort should be focused disproportionately on girls, who are significantly under-represented at both levels 5 and 6. South Korea has a very small tail to worry about – and may even be getting close to minimising this. It needs to concentrate on improving the problem solving skills of its stronger performers in maths.

Hong Kong has a slightly bigger tail than Singapore’s but is significantly behind at both proficiency levels 5 and 6. In the case of level 6 it is equalled by Canada. Hong Kong needs to focus simultaneously on reducing the tail and lifting performance across the top end, where girls and weaker performers in maths are a clear priority.

Shanghai has a similar profile to Hong Kong’s in all respects, though with somewhat fewer level 6 performers. It also needs to focus effort simultaneously at the top and the bottom of the distribution. Amongst this sample, Shanghai has the worst under-representation of girls at level 5 and levels 5 and 6 together, so addressing that imbalance is an obvious priority. It also demonstrated the largest variation in performance against PISA’s IESC index, which suggests that it should target young people from disadvantaged backgrounds, as well as the children of semi-skilled workers.

Taiwan is rather similar to Hong Kong and Shanghai, but its tail is slightly bigger and its level 6 cadre slightly smaller, while it does somewhat better at level 5. It may need to focus more at the very bottom, but also at the very top. Taiwan also has a problem with high-performing girls, second only to Shanghai as far as level 5 and levels 5 and 6 together are concerned. However, like Shanghai, it does comparatively better than the other ‘Asian Tigers’ in terms of girls at level 6. It also needs to consider the problem solving performance of its weaker performers in maths.

Canada is the closest western competitor to the ‘Asian Tigers’ in terms of the proportions of students at levels 1 and 5 – and it already outscores Shanghai and Taiwan at level 6. It needs to continue cutting down the tail without compromising achievement at the top end. Canada also has small but significant gender imbalances in favour of boys at the top end.

Australia by comparison is significantly worse than Canada at level 1, broadly comparable at level 5 and somewhat worse at level 6. It too needs to improve scores at the very bottom and the very top. Australia’s gender imbalance is more pronounced at level 6 than level 5.

Finland has the same mean score as Australia but a smaller tail (though not quite as small as Canada’s). It needs to improve across the piece but might benefit from concentrating rather more heavily at the top end. Finland has a slight gender imbalance in favour of girls at level 5, but boys are more in the ascendancy at level 6 than in either England or the USA. As in Australia, this latter point needs addressing.

England has a profile similar to Australia’s, but less effective at all three selected proficiency levels. It is further behind at the top than at the bottom of the distribution, but needs to work hard at both ends to catch up the strongest western performers and maintain its advantage over the USA and Ireland. Gender imbalances are small but nonetheless significant.

USA has a comparatively long tail of low achievement at proficiency level 1 and, with the exception of Ireland, the fewest high achievers. This profile is very close to the OECD average. As in England, the relatively small size of gender imbalances in favour of boys does not mean that these can be ignored.

Ireland has the longest tail of low achievement and the smallest proportion of students at proficiency levels 5, 6 and 5 and 6 combined. It needs to raise achievement at both ends of the distribution. Ireland has a larger preponderance of boys at level 6 than its Western competitors and this needs addressing. The limited socio-economic evidence suggests that Ireland should also be targeting the offspring of parents with semi-skilled and elementary occupations.

So there is further scope for improvement in all eleven jurisdictions. Meanwhile the OECD could usefully provide a more in-depth analysis of high achievers on its assessments that features:

  • Proficiency level 6 performance across the board.
  • Socio-economic disparities in performance at proficiency levels 5 and 6.
  • ‘All-rounder’ achievement at these levels across all four assessments and
  • Correlations between success at these levels and specific educational provision for high achievers including gifted education programmes.


GP

April 2014

Unpacking the Primary Assessment and Accountability Reforms

This post examines the Government response to consultation on primary assessment and accountability.

It sets out exactly what is planned, what further steps will be necessary to make these plans viable and the implementation timetable.

It is part of a sequence of posts I have devoted to this topic, most recently:

Earlier posts in the series include The Removal of National Curriculum Levels and the Implications for Able Pupils’ Progression (June 2012) and Whither National Curriculum Assessment Without Levels? (February 2013).

The consultation response contrives to be both minimal and dense. It is necessary to unpick each element carefully, to consider its implications for the package as a whole and to reflect on how that package fits in the context of wider education reform.

I have organised the post so that it considers sequentially:

  • The case for change, including the aims and core principles, to establish the policy frame for the planned reforms.
  • The impact on the assessment experience of children aged 2-11 and how that is likely to change.
  • The introduction of baseline assessment in Year R.
  • The future shape of end of KS1 and end of KS2 assessment respectively.
  • How the new assessment outcomes will be derived, reported and published.
  • The impact on floor standards.

Towards the end of the post I have also provided a composite ‘to do’ list containing all the declared further steps necessary to make the plan viable, with a suggested deadline for each.

And the post concludes with an overall judgement on the plans, in the form of a summary of key issues and unanswered questions arising from the earlier commentary. Impatient readers may wish to jump straight to that section.

I am indebted to Warwick Mansell for his previous post on this topic. I shall try hard not to parrot the important points he has already made, though there is inevitably some overlap.

Readers should also look to Michael Tidd for more information about the shape and content of the new tests.

What has been published?

The original consultation document ‘Primary assessment and accountability under the new national curriculum’ was published on 17 July 2013 with a deadline for response of 17 October 2013. At that stage the Government’s response was due ‘in autumn 2013’.

The response was finally published on 27 March, some four months later than planned and only five months prior to the introduction of the revised national curriculum which these arrangements are designed to support.

It is likely that the Government will have decided that 31 March was the latest feasible date to issue the response, so they were right up against the wire.

It was accompanied by:

  • A press release which focused on the full range of assessment reforms – for primary, secondary and post-16.

Shortly before the response was published, the reply to a Parliamentary question asked on 17 March explained that test frameworks were expected to be included within it:

‘Guidance on the nature of the revised key stage 1 and key stage 2 tests, including mathematics, will be published by the Standards and Testing Agency in the form of test framework documents. The frameworks are due to be released as part of the Government’s response to the primary assessment and accountability consultation. In addition, some example test questions will be made available to schools this summer and a full sample test will be made available in the summer of 2015.’ (Col 383W)


In the event, these documents – seven in all – did not appear until 31 March and there was no reference to any of the three commitments above in what appeared on 27 March.

Finally, the Standards and Testing Agency published on 3 April a guidance page on national curriculum tests from 2016. At present it contains very little information but further material will be added as and when it is published.

Partly because the initial consultation document was extremely ‘drafty’, the reaction of many key external respondents to the consultation was largely negative. One imagines that much of the period since 17 October has been devoted to finding the common ground.

Policy makers will have had to do most of their work after the consultation document was issued because they were not ready beforehand.

But the length of the delay in issuing the response would suggest that they also encountered significant dissent amongst internal stakeholders – and that the eventual outcome is likely to be a compromise of sorts between these competing interests.

Such compromises tend to have observable weaknesses and/or put off problematic issues for another day.

A brief summary of consultation responses is included within the Government’s response. I will refer to this at relevant points during the discussion below.


The Case for Change


Aims

The consultation response begins – as did the original consultation document – with a section setting out the case for reform.

It provides a framework of aims and principles intended to underpin the changes that are being set in place.

The aims are:

  • The most important outcome of primary education is to ‘give as many pupils as possible the knowledge and skills to flourish in the later phases of education’. This is a broader restatement of the ‘secondary ready’ concept adopted in the original consultation document.
  • The primary national curriculum and accountability reforms ‘set high expectations so that all children can reach their potential and are well prepared for secondary school’. Here the ‘secondary ready’ hurdle is more baldly stated. The parallel notion is that all children should do as well as they can – and that they may well achieve different levels of performance. (‘Reach their potential’ is disliked by some because it is considered to imply a fixed ceiling for each child and fixed mindset thinking.)
  • To raise current threshold expectations. These are set too low, since too few learners (47%) with KS2 level 4C in both English and maths go on to achieve five or more GCSE grades A*-C including English and maths, while 72% of those with KS2 level 4B do so. So the new KS2 bar will be set at this higher level, but with the expectation that 85% of learners per school will jump it, 13% more than the current national figure. Meanwhile the KS4 outcome will also change, to achievement across eight GCSEs rather than five, quite probably at a more demanding level than the present C grade. In the true sense, this is a moving target.
  • ‘No child should be allowed to fall behind’. This is a reference to the notion of ‘mastery’ in its crudest sense, though the model proposed will not deliver this outcome. We have noted already a reference to ‘as many children as possible’ and the school-level target – initially at least – will be set at 85%. In reality, a significant minority of learners will progress more slowly and will fall short of the threshold at the end of KS2.
  • The new system ‘will set a higher bar’ but ‘almost all pupils should leave primary school well-placed to succeed in the next phase of their education’. Another nuanced version of ‘secondary ready’ is introduced. This marks a recognition that some learners will not jump over the higher bar. In the light of subsequent references to 85%, ‘almost all’ is rather over-optimistic.
  • ‘We also want to celebrate the progress that pupils make in schools with more challenging intakes’. Getting ‘nearly all pupils to meet this standard…’ (the standard of secondary readiness?) ‘…is very demanding, at least in the short term’. There will therefore be recognition of progress ‘from a low starting point’ – even though these learners have, by definition, been allowed to fall behind and will continue to do so.

So there is something of a muddle here, no doubt engendered by a spirit of compromise.

The black and white distinction of ‘secondary-readiness’ has been replaced by various verbal approximations, but the bottom line is that there will be a defined threshold denoting preparedness that is pitched higher than the current threshold.

And the proportion likely to fall short is downplayed – there is apparent unwillingness at this stage to acknowledge the norm that up to 15% of learners in each school will undershoot the threshold – substantially more in schools with ‘challenging intakes’.

What this boils down to is a desire that all will achieve the new higher hurdle – and that all will be encouraged to exceed it if they can – tempered by recognition that this is presently impossible. No child should be allowed to fall behind but many inevitably will do so.

It might have been better to express these aims in the form of future aspirations – and our collective efforts to bridge the gap between present reality and those ambitious aspirations.

Principles

The section concludes with a new set of principles governing pedagogy, assessment and accountability:

  • ‘Ongoing, teacher-led assessment is a crucial part of effective teaching;
  • Schools should have the freedom to decide how to teach their curriculum and how to track the progress that pupils make;
  • Both summative teacher assessment and external testing are important;
  • Accountability is key to a successful school system, and therefore must be fair and transparent;
  • Measures of both progress and attainment are important for understanding school performance; and
  • A broad range of information should be published to help parents and the wider public know how well schools are performing.’

These are generic ‘motherhood and apple pie’ statements and so largely uncontroversial. I might have added a seventh – that schools’ in-house assessment and reporting systems must complement summative assessment and testing, including by predicting for parents the anticipated outcomes of the latter.

Interestingly, there is no repetition of the defence of the decision to remove national curriculum levels. Instead, the response concentrates on the support available to schools.

It mentions discussion with an ‘expert group on assessment’ about ‘how to support schools to make best use of the new assessment freedoms’. We are not told the membership of this group (which, as far as I know, has not been made public) or the nature of its remit.

There is also a link to information about the Assessment Innovation Fund, which will provide up to ten grants of up to £10,000 that schools and organisations can use to develop packages sharing their innovative practice with others.

 

Children’s experience of assessment up to the end of KS2

The response mentions the full range of national assessments that will impact on children between the ages of two and 11:

  • The statutory progress check at two years of age.
  • A new baseline assessment undertaken within a few weeks of the start of Year R, introduced from September 2015.
  • An Early Years Foundation Stage Profile undertaken in the final term of the year in which children reach the age of five. A revised profile was introduced from September 2012. It is currently compulsory but will be optional from September 2016. The original consultation document said that the profile would no longer be moderated and data would no longer be collected. Neither of those commitments is repeated here.
  • The Phonics Screening Check, normally undertaken in Year 1. The possibility of making these assessments non-statutory for all-through primary schools, suggested in the consultation document, has not been pursued: 53% of respondents opposed this idea, whereas 32% supported it.
  • End of KS1 assessment and
  • End of KS2 assessment.

So a total of six assessments are in place between the ages of two and 11. At least four – and possibly five – will be undertaken between ages two and seven.

It is likely that early years professionals will baulk at this amount of assessment, no matter how sensitively it is designed. But the cost and inefficiency of the model are also open to criticism.

The Reception Baseline

Approach

The original consultation document asked whether:

  • KS1 assessment should be retained as a baseline – 45% supported this and 41% were opposed.
  • A baseline check should be introduced at the start of Reception – 51% supported this and 34% were opposed.
  • Such a baseline check should be optional – 68% agreed and 19% disagreed.
  • Schools should be allowed to choose from a range of commercially available materials for this baseline check – 73% said no and only 15% said yes.

So, whereas views were mixed on where the baseline should be set, there were substantial majorities in favour of any Year R baseline check being optional and following a single, standard national format.

The response argues that Year R is the most sensible point at which to position the baseline since that is:

‘…the earliest point that nearly all children are in school’.

What happens in respect of children who are not in school at this point is not discussed.

There is no explanation of why the Government has disregarded the clear majority of respondents by choosing to permit a range of assessment approaches, so one can only assume that the decision is ideologically motivated.

The response says ‘most’ of these assessments are likely to be administered by teaching staff, leaving open the possibility that some will be administered externally.

Design

Such assessments will need to be:

‘…strong predictors of key stage 1 and key stage 2 attainment, whilst reflecting the age and abilities of children in Reception’.

Presumably this means predictors of attainment in each of the three core subjects – English, maths and science – rather than any broader notion of attainment. The challenge inherent in securing a reasonable predictor of attainment across these domains seven years further on in a child’s development should not be under-estimated.

The response points out that such assessment tools are already available for use in Year R; some are used widely and some schools have long experience of using them. But there is no information about how many of these are already deemed to meet the description above.

In any case, new criteria need to be devised which all such assessments must meet. Some degree of modification will be necessary for all existing products and new products will be launched to compete in the market.

There is an opportunity to use this process to ratchet up the Year R Baseline beyond current expectations, so matching the corresponding process at the end of KS2. The consultation response says nothing about whether this is on the cards.

Interestingly, in his subsequent ‘Unsure start’ speech about early years inspection, HMCI refers to:

‘…the government’s announcement last week that they will be introducing a readiness-for-school test at age four. This is an ideal opportunity to improve accountability. But I think it should go further.

I hope that the published outcomes of these tests will be detailed enough to show parents how their own child has performed. I fear that an overall school grade will fail to illuminate the progress of poor children. I ask government to think again about this issue.’

The terminology – ‘readiness for school’ – is markedly blunter than the references to a reception baseline in the consultation response. There is nothing in the response about the outcomes of these tests being published, nor anything about ‘an overall school grade’.

Does this suggest that decisions have already been made that were not communicated in the consultation response?

.

Timeline, options, questions

Several pieces of further work are required in short order to inform schools and providers about what will be required – and to enable both to prepare for introduction of the assessments from September 2015. All these should feature in the ‘to do’ list below.

One might reasonably have hoped – especially given the long delay – that some attempt would be made to publish suggested draft criteria for the baseline alongside the consultation response. The fact that even preliminary research into existing practice has not been undertaken is a cause for concern.

Although the baseline will be introduced from September 2015, there is a one-year interim measure which can only apply to all-through primary schools:

  • They can opt out of the Year R baseline measure entirely, relying instead on KS1 outcomes as their baseline; or
  • They can use an approved Year R baseline assessment and have this cohort’s progress measured at the end of KS2 (which will be in 2022) by either the Year R or the KS1 baseline, whichever demonstrates the most progress.

In the period up to and including 2021, progress will continue to be measured from the end of KS1. So learners who complete KS2 in 2021 for example will be assessed on progress since their KS1 tests in 2017.

Junior and middle schools will also continue to use a KS1 baseline.

Arrangements for infant and first schools are still to be determined, another rather worrying omission at this stage in proceedings.

It is also clear that all-through primary schools (and infant/first schools?) will continue to be able to opt out from the Year R baseline from September 2016 onwards, since the response says:

‘Schools that choose not to use an approved baseline assessment from 2016 will be judged on an attainment floor standard alone’.

Hence the Year R baseline check is entirely optional and a majority of schools could choose not to undertake it.

However, they would need to be confident of meeting the demanding 85% attainment threshold in the floor standard.

They might be wise to postpone that decision until the pitch of the progress expectation is determined. For neither the Year R baseline nor the amount of progress that learners are expected to make from their starting point in Year R is yet defined.

This latter point applies at the average school level (for the purposes of the floor standard) and in respect of the individual learner. For example, if a four year-old is particularly precocious in, say, maths, what scaled scores must they register seven years later to be judged to have made sufficient progress?

There are several associated questions that follow on from this.

Will it be in schools’ interests to acknowledge that they have precocious four year-olds at all? Will the Year R baseline reinforce the tendency to use Reception to bring all children to the same starting point in readiness for Year 1, regardless of their precocity?

Will the moderation arrangements be hard-edged enough to stop all-through primary schools gaming the system by artificially depressing their baseline outcomes?

Who will undertake this moderation and how much will it cost? Will not the decision to permit schools to choose from a range of measures unnecessarily complicate the moderation process and add to the expense?

The consultation response neither poses these questions nor supplies answers.

The future shape of end KS1 and end KS2 assessment

.

What assessment will take place?

At KS1 learners will be assessed in:

  • Reading – test plus teacher assessment
  • Writing – test (of grammar, punctuation and spelling) plus teacher assessment
  • Speaking and listening – teacher assessment
  • Maths – test plus teacher assessment
  • Science – teacher assessment

The new test of grammar, punctuation and spelling did not feature in the original consultation and has presumably been introduced to strengthen the marker of progress to which four year-olds should aspire at age seven.

The draft test specifications for the KS1 tests in reading, GPS and maths outline the requirements placed on the test developers, so it is straightforward to compare the specifications for reading and maths with the current tests.

The GPS test will include a 20 minute written grammar and punctuation task; a 20 minute test comprising short grammar, punctuation and vocabulary questions; and a 15 minute spelling task.

There is a passing reference to further work on KS1 moderation which is included in the ‘to do’ list below.

At KS2 learners will be assessed in:

  • Reading – test plus teacher assessment
  • Writing – test (of grammar, punctuation and spelling) plus teacher assessment
  • Maths – test plus teacher assessment
  • Science – teacher assessment plus a science sampling test.

Once again, the draft test specifications – reading, GPS, maths and science sampling – describe the shape of each test and the content they are expected to assess.

I will leave it to experts to comment on the content of the tests.

 .

Academies and free schools

It is important to note that the framing of this content – by means of detailed ‘performance descriptors’ – means that the freedom academies and free schools enjoy in departing from the national curriculum will be largely illusory.

I raised this issue back in February 2013:

  • ‘We know that there will be a new grading system in the core subjects at the end of KS2. If this were to be based on the ATs as drafted, it could only reflect whether or not learners can demonstrate that they know, can apply and understand ‘the matters, skills and processes specified’ in the PoS as a whole. Since there is no provision for ATs that reflect sub-elements of the PoS – such as reading, writing, spelling – grades will have to be awarded on the basis of separate syllabuses for end of KS2 tests associated with these sub-elements.
  • This grading system must anyway be applied universally if it is to inform the publication of performance tables. Since some schools are exempt from National Curriculum requirements, it follows that grading cannot be derived directly from the ATs and/or the PoS, but must be independent of them. So this once more points to end of KS2 tests based on entirely separate syllabuses which nevertheless reflect the relevant part of the draft PoS. The KS2 arrangements are therefore very similar to those planned at KS4.’

I have more to say about the ‘performance descriptors’ below.

 .

Single tests for all learners

A critical point I want to emphasise at this juncture – not mentioned at all in the consultation document or the response – is the test development challenge inherent in producing single papers suitable for all learners, regardless of their attainment.

We know from the response that the P-scales will be retained for those who are unable to access the end of key stage tests. (Incidentally, the content of the P-scales will remain unchanged so they will not be aligned with the revised national curriculum, as suggested in the consultation document.)

There will also be provision for pupils who are working ‘above the P-scales but below the level of the test’.

Now the P-scales are for learners working below level 1 (in old currency). This is the first indication I have seen that the tests may not cater for the full range from Level 1-equivalent to Level 6-equivalent and above. But no further information is provided.

It may be that this is a reference to learners who are working towards level 1 (in old currency) but do not have SEN.

The 2014 KS2 ARA booklet notes:

‘Children working towards level 1 of the national curriculum who do not have a special educational need should be reported to STA as ‘W’ (Working below the level). This includes children who are working towards level 1 solely because they have English as an additional language. Schools should use the code ‘NOTSEN’ to explain why a child working towards level 1 does not have P scales reported. ‘NOTSEN’ replaces the code ‘EAL’ that was used in previous years.’

The consultation document said:

‘We do not propose to develop an equivalent to the current level 6 tests, which are used to challenge the highest-attaining pupils. Key stage 2 national curriculum tests will include challenging material (at least of the standard of the current level 6 test) which all pupils will have the opportunity to answer, without the need for a separate test’.

The draft test specifications make it clear that the tests should:

‘provide a suitable challenge for all children and give every child the opportunity to achieve as high a standard…as possible.’

Moreover:

‘In order to improve general accessibility for all children, where possible, questions will be placed in order of difficulty.’

The development of single tests covering this span of attainment – from level 1 to above level 6 – tests in which the questions are posed in order of difficulty and even the highest attainers must answer every question – seems to me a very tall order, especially in maths.

More than that, I urgently need persuading that this is not a waste of high attainers’ time and poor assessment practice.

 .

How assessment outcomes will be derived, reported and published

Deriving assessment outcomes

One of the reasons cited for replacing national curriculum levels was the complexity of the system and the difficulty parents experienced in understanding it.

The Ministerial response to the original report from the National Curriculum Expert Panel said:

‘As you rightly identified, the current system is confusing for parents and restrictive for teachers. I agree with your recommendation that there should be a direct relationship between what children are taught and what is assessed. We will therefore describe subject content in a way which makes clear both what should be taught and what pupils should know and be able to do as a result.’

The consultation document glossed the same point thus:

‘Schools will be able to focus their teaching, assessment and reporting not on a set of opaque level descriptions, but on the essential knowledge that all pupils should learn.’

However, the consultation response introduces for the first time the concept of a ‘performance descriptor’.

This term is defined in the glossaries at the end of each draft test specification:

‘Description of the typical characteristics of children working at a particular standard. For these tests, the performance descriptor will characterise the minimum performance required to be working at the appropriate standard for the end of the key stage.’

Essentially this is a collective term for something very similar to old-style level descriptions.

Except that, in the case of the tests, they are all describing the same level of performance.

They have been rendered necessary by the odd decision to provide only a single generic attainment target for each programme of study. But, as noted back in February 2013, the test developers need a more sophisticated framework on which to base their assessments.

According to the draft test specifications they will also be used

‘By a panel of teachers to set the standards on the new tests following their first administration in May 2016’.

When it comes to teacher assessment, the consultation response says:

‘New performance descriptors will be introduced to inform the statutory teacher assessments at the end of key stage one [and]…key stage two.’

But there are two models in play simultaneously.

In four cases – science at KS1 and reading, maths and science at KS2 – there will be ‘a single performance descriptor of the new expected standard’, in the same way as there are in the test specifications.

But in five cases – reading, writing, speaking and listening, and maths at KS1; and writing at KS2:

‘teachers will assess pupils as meeting one of several performance descriptors’.

These are old-style level descriptors by another name. They perform exactly the same function.

The response says that the KS1 teacher assessment performance descriptors will be drafted by an expert group for introduction in autumn 2014. It does not mention whether KS2 teacher assessment performance descriptors will be devised in the same way and to the same timetable.

 .

Reporting assessment outcomes to parents

When it comes to reporting to parents, there will be three different arrangements in play at both KS1 and KS2:

  • Test results will be reported by means of scaled scores (of which more in a moment).
  • One set of teacher assessments will be reported by selecting from a set of differentiated performance descriptors.
  • A second set of teacher assessments will be reported according to whether learners have achieved a single threshold performance descriptor.

This is already significantly more complex than the previous system, which applied the same framework of national curriculum levels across the piece.

It seems that KS1 test outcomes will be reported as straightforward scaled scores (though this is only mentioned on page 8 of the main text of the response and not in Annex B, which compares the new arrangements with those currently in place).

But, in the case of KS2:

‘Parents will be provided with their child’s score alongside the average for their school, the local area and nationally. In the light of the consultation responses, we will not give parents a decile ranking for their child due to concerns about whether decile rankings are meaningful and their reliability at individual pupil level.’

The consultation document proposed a tripartite reporting system comprising:

  • A scaled score for each KS2 test, derived from raw test marks and built around a ‘secondary readiness standard’. This standard would be set at a scaled score of 100, which would remain unchanged. It was suggested for illustrative purposes that a scale based on the current national curriculum tests might run from 80 to 130.
  • An average scaled score in each test for other pupils nationally with the same prior attainment at the baseline. Comparison of a learner’s scaled score with the average scaled score would show whether they had made more or less progress than the national average.
  • A national ranking in each test – expressed in terms of deciles – showing how a learner’s scaled score compared with the range of performance nationally.

The latter has been dispensed with, given that 35% of consultation respondents disagreed with it, but there were clearly technical reservations too.

In its place, the ‘value added’ progress measure has been expanded so that there is a comparison with other pupils in the learner’s own school and the ‘local area’ (which presumably means local authority). This beefs up the progression element in reporting at the expense of information about the attainment level achieved.

So at the end of KS2 parents will receive scaled scores and three average scaled scores for each of reading, writing and maths – twelve scores in all – plus four performance descriptors, of which three will be singleton threshold descriptors (reading, maths and science) and one will be selected from a differentiated series (writing). That makes sixteen assessment outcomes altogether, provided in four different formats.
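To make that arithmetic concrete, here is a minimal sketch in Python of the sixteen outcomes a parent might receive under the arrangements described above. All figures and descriptor labels are invented purely for illustration; this depicts the reporting structure only, not any published format.

    # Hypothetical illustration of the sixteen end-of-KS2 outcomes described above.
    # All figures and descriptor labels are invented for the purpose of the example.

    scaled_score_subjects = ["reading", "writing", "maths"]

    scores = {}
    for subject in scaled_score_subjects:
        scores[subject] = {
            "pupil scaled score": 104,   # the child's own result
            "school average": 101,       # average scaled score for the school
            "local area average": 100,   # average for the local area
            "national average": 103,     # national average
        }                                # 3 subjects x 4 figures = 12 scores

    threshold_descriptors = {            # single threshold descriptor: met or not met
        "reading": "expected standard met",
        "maths": "expected standard met",
        "science": "expected standard met",
    }

    writing_descriptor = "working at the expected standard"  # one of a differentiated series

    total = sum(len(v) for v in scores.values()) + len(threshold_descriptors) + 1
    print(total)  # 16 assessment outcomes, in four different formats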

The consultation response tells us nothing more about the range of the scale that will be used to provide scaled scores. We do not even know if it will be the same for each test.

The draft test specifications say that:

‘The exact scale for the scaled scores will be determined following further analysis of trialling data. This will include a full review of the reporting of confidence intervals for scaled scores.’

But they also contain this worrying statement:

‘The provision of a scaled score will aid in the interpretation of children’s performance over time as the scaled score which represents the expected standard will be the same year on year. However, at the extremes of the scaled score distribution, as is standard practice, the scores will be truncated such that above and below a certain point, all children will be awarded the same scaled score in order to minimise the effect for children at the ends of the distribution where the test is not measuring optimally.’

This appears to suggest that scaled scores will not accurately describe performance at the extremes of the distribution, because the tests will not measure such performance accurately. This might be describing a statistical truism, but it again raises the question of whether the highest attainers are being short-changed by the selected approach.
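Purely to illustrate what truncation means in practice, here is a minimal sketch assuming the 80 to 130 scale that the original consultation document floated for illustrative purposes; the actual scale and cut-off points have not been published.

    # A sketch of score truncation at the extremes of the distribution.
    # The 80-130 range and the cut-off behaviour are assumptions for illustration only.

    SCALE_MIN, SCALE_MAX = 80, 130

    def reported_scaled_score(notional_score: float) -> int:
        """Clamp a notional scaled score to the assumed reporting range."""
        return int(max(SCALE_MIN, min(SCALE_MAX, notional_score)))

    # Two quite different performances at the top end collapse to the same reported figure:
    print(reported_scaled_score(133))  # 130
    print(reported_scaled_score(142))  # 130

Whatever the technical justification, the effect is that differences in performance beyond the cut-offs simply disappear from the reported scores.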

.

Publication of assessment outcomes

The response introduces the idea that ‘a suite of indicators’ will be published on each school’s own website in a standard format. These are:

  • The average progress made by pupils in reading, writing and maths. (This is presumably relevant to both KS1 and KS2 and to both tests and teacher assessment.)
  • The percentage of pupils reaching the expected standard in reading, writing and mathematics at the end of key stage 2. (This is presumably relevant to both tests and teacher assessment.)
  • The average score of pupils in their end of key stage 2 assessments. (The final word suggests teacher assessment as well as tests, even though there will not be a score from the former.)
  • The percentage of pupils who achieve a high score in all areas at the end of key stage 2. (Does ‘all areas’ imply something more than statutory tests and teacher assessments? Does it mean treating each area separately, or providing details only of those who have achieved high scores across all areas?)

The latter is the only reference to high attainers in the entire response. It does not give any indication of what will count as a high score for these purposes. Will it be designed to catch the top third of attainers or something more demanding, perhaps equivalent to the top decile?

A decision has been taken not to report the outcomes of assessment against the P-scales because the need to contextualise such information is perceived to be relatively greater.

And, as noted above, HMCI let slip the fact that the outcomes of reception baselines would also be published, but apparently in the form of a single overall grade.

We are not told when these requirements will be introduced, but presumably they must be in place to report the outcomes of assessments undertaken in spring 2016.

Additionally:

‘So that parents can make comparisons between schools, we would like to show each school’s position in the country on these measures and present these results in a manner that is clear for all audiences to understand. We will discuss how best to do so with stakeholders, to ensure that the presentation of the data is clear, fair and statistically robust.’

This suggests inclusion in the 2016 School Performance Tables, but this is not stated explicitly.

Indeed, apart from references to the publication of progress measures in the 2022 Performance Tables, there is no explicit coverage of their contribution in the response, nor any reference to the planned supporting data portal, or how data will be distributed between the Tables and the portal.

The original consultation document gave several commitments on the future content of performance tables. They included:

  • How many of a school’s pupils are amongst the highest attaining nationally, by showing the percentage of pupils achieving a high scaled score in each subject.
  • Measures to show the attainment and progress of learners attracting the Pupil Premium.
  • Comparison of each school’s performance with that of schools with similar intakes.

None are mentioned here, nor are any of the suggestions advanced by respondents taken up.

Floor standards

Changes are proposed to the floor standards with effect from September 2016.

This section of the response begins by committing to:

‘…a new floor standard that holds schools to account both on the progress that they make and on how well their pupils achieve.’

But the plans set out subsequently do not meet this description.

The progress element of the current floor standard relates to any of reading, writing or mathematics but, under the new floor standard, it will relate to all three of these together.

An all-through primary school must demonstrate that:

‘…pupils make sufficient progress at key stage 2 from their starting point…’

As we have noted above, all-through primaries can opt to use the KS1 baseline or the Year R baseline in 2015. Moreover, from 2016 they can choose not to use the Year R baseline and be assessed solely on the attainment measure in the floor standards (see below).

Junior and middle schools obviously apply the KS1 baseline, while arrangements for infant and first schools have yet to be finalised.

What constitutes ‘sufficient progress’ is not defined. Annex C of the response says:

‘For 2016 we will set the precise extent of progress required once key stage 2 tests have been sat for the first time.’

Presumably this will be progress from KS1 to KS2, since progress from the Year R baseline will not be introduced until 2023.

The attainment element of the new floor standards is for schools to have 85% or more of pupils meeting the new, higher threshold standard at the end of KS2 in all of reading, writing and maths. The text says explicitly that this threshold is ‘similar to a level 4b under the current system’.

Annex C clarifies that this will be judged by the achievement of a scaled score of 100 or more in each of the reading and maths tests, plus teacher assessment that learners have reached the expected standard in writing (so the GPS test does not count in the same way, simply informing the teacher assessment).
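Expressed as a rule, the attainment element might look something like the following rough sketch. The pupil records and field names are invented, and the treatment of writing teacher assessment as a simple met/not-met flag is my assumption based on the description above.

    # Rough sketch of the attainment element of the new floor standard as described:
    # at least 85% of the cohort must reach the expected standard in all of
    # reading and maths (scaled score of 100+) and writing (teacher assessment).
    # Field names and example data are invented for illustration.

    def meets_attainment_floor(pupils):
        """Return True if at least 85% of pupils meet all three requirements."""
        if not pupils:
            return False
        meeting = sum(
            1 for p in pupils
            if p["reading_scaled"] >= 100
            and p["maths_scaled"] >= 100
            and p["writing_ta_expected"]
        )
        return meeting / len(pupils) >= 0.85

    # Example: 52 of 60 pupils (86.7%) meet all three requirements, so the school passes.
    cohort = (
        [{"reading_scaled": 105, "maths_scaled": 102, "writing_ta_expected": True}] * 52
        + [{"reading_scaled": 97, "maths_scaled": 99, "writing_ta_expected": False}] * 8
    )
    print(meets_attainment_floor(cohort))  # True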

As noted above, this is a far bigger ask than the current requirement that 65% of learners meet the expected (and lower 4c) standard. The summary at the beginning of the response refers to it as ‘a challenging aspiration’:

‘Over time we expect more and more schools to achieve this standard.’

The statement in the first paragraph of this section of the response led us to believe that these two requirements – for progress and attainment respectively – would be combined, so that schools would be held to account for both (unless, presumably, they exercised their right to opt out of the Year R baseline assessment).

But this is not the case. Schools need only achieve one or the other.

It follows that schools with a very high performing intake may exceed the floor standards on the basis of all-round high attainment alone, regardless of the progress made by their learners.

The reason for this provision is unclear, though one suspects that schools with an extremely high attaining intake, whether at Reception or Year 3, will be harder pressed to achieve sufficient progress, presumably because some ceiling effects come into play at the end of KS2.

This in turn might suggest that the planned tests do not have sufficient headroom for the highest attainers, even though they are supposed to provide similar challenge to level 6 and potentially extend beyond it.

Meanwhile, schools with less than stellar attainment results will be obliged to follow the progress route to jump the floor standard. This too will be demanding because all three domains will be in play.

Some internal modelling will have been undertaken to judge how many schools would be likely to fall short of the floor standards under these arrangements, and it would be very useful to know those estimates, however unreliable they prove to be.

In their absence, one suspects that the majority of schools will be below the floor standards, at least initially. That of course materially changes the nature and purpose of the standards.

To Do List

The response and the draft specifications together contain a long list of work to be carried out over the next two years or so. I have included below my best guess as to the latest possible date for each decision to be completed and communicated:

  • Decide how progress will be measured for infants and first schools between the Year R baseline and the end of KS1 (April 2014)
  • Make available to schools a ‘small number’ of sample test questions for each key stage and subject (Summer 2014)
  • Work with experts to establish the criteria for the Year R baseline (September 2014)
  • KS1 [and KS2?] teacher assessment performance descriptors to be drafted by an expert group (September 2014)
  • Complete and report outcomes of a study with schools that already use Year R baseline assessments (December 2014)
  • Decide how Year R baseline assessments will be moderated (December 2014)
  • Publish a list of assessments that meet the Year R baseline criteria (March 2015)
  • Decide how Year R baseline results will be communicated to parents and to Ofsted (March 2015)
  • Make available to schools a full set of sample materials including tests and mark schemes for all KS1 and KS2 tests (September 2015)
  • Complete work with Ofsted and teachers to improve KS1 moderation (September 2015)
  • Provide further information to enable teachers to assess pupils at the end of KS1 and KS2 who are ‘working above the P-scales but below the level of the test’ (September 2015)
  • Decide whether to move to external moderation of P-scale teacher assessment (September 2015)
  • Agree with stakeholders how to compare schools’ performance on a suite of assessment outcomes published in a standard format (September 2015)
  • Publish all final test frameworks (Autumn 2015)
  • Introduce new requirements for schools to publish a suite of assessment outcomes in a standard format (Spring 2016)
  • Panels of teachers use performance descriptors to set the standards on the new tests following their first administration in May 2016 (Summer 2016)
  • Define what counts as sufficient progress from the Year R baseline to end KS1 and end KS2 respectively (Summer 2016)

Conclusion

Overall the response is rather more cogent and coherent than the original consultation document, though there are several inconsistencies and many sins of omission.

Drawing together the key issues emerging from the commentary above, I would highlight twelve key points:

  • The declared aims express the policy direction clumsily and without conviction. The ultimate aspirations are universal ‘secondary readiness’ (though expressed in broader terms), ‘no child left behind’ and ‘every child fulfilling their potential’ but there is no real effort to reconcile these potentially conflicting notions into a consensual vision of what primary education is for. Moreover, an inconvenient truth lurks behind these statements. By raising expectations so significantly – 4b equivalent rather than 4c; 85% over the attainment threshold rather than 65%; ‘sufficient progress’ rather than median progress and across three domains rather than one – there will be much more failure in the short to medium term. More learners will fall behind and fall short of the thresholds; many more schools are likely to undershoot the floor standards. It may also prove harder for some learners to demonstrate their potential. It might have been better to acknowledge this reality and to frame the vision in terms of creating the conditions necessary for subsequent progress towards the ultimate aspirations.
  • Younger children are increasingly caught in the crossbeam from the twin searchlights of assessment and accountability. HMCI’s subsequent intervention has raised the stakes still further. This creates obvious tensions in the sector which can be traced back to disagreements over the respective purposes of early years and primary provision and how they relate to each other. (HMCI’s notion of ‘school readiness’ is no doubt as narrow to early years practitioners as ‘secondary readiness’ is to primary educators.) But this is not just a theoretical point. Additional demands for focused inspection, moderation and publication of outcomes all carry a significant price tag. It must be open to question whether the sheer weight of assessment activity is optimal and delivers value for money. Should a radical future Government – probably with a cost-cutting remit – have rationalisation in mind?
  • Giving schools the freedom to choose from a range of Year R baseline assessment tools also seems inherently inefficient and flies in the face of the clear majority of consultation responses. We are told nothing of the perceived quality of existing services, none of which can – by definition – satisfy these new expectations without significant adjustment. It will not be straightforward to construct a universal and child-friendly instrument that is a sufficiently strong predictor of Level 4b-equivalent performance in KS2 reading, writing and maths assessments undertaken seven years later. Moreover, there will be a strong temptation for the Government to pitch the baseline higher than current expectations, so matching the realignment at the other end of the process. Making the Reception baseline assessment optional – albeit with strings attached – seems rather half-hearted, almost an insurance against failure. Effective (and expensive) moderation may protect against widespread gaming, but the risk remains that Reception teachers will be even more predisposed to prioritise universal school readiness over stretching their more precocious four year-olds.
  • The task of designing an effective test for all levels of prior attainment at the end of key stage 2 is equally fraught with difficulty. The P-scales will be retained (in their existing format, unaligned with the revised national curriculum) for learners with special needs working below the equivalent of what is currently level 1. There will also be undefined provision ‘for those working above the level of the P-scales but below the level of the test’, even though the draft test development frameworks say:

‘All eligible children who are registered at maintained schools, special schools, or academies (including free schools) in England and are at the end of key stage 2 will be required to take the…test, unless they have taken it in the past.’

And this applies to all learners other than those in the exempted categories set out in the ARA booklets. The draft specifications add that test questions will be placed in order of difficulty. I have grave difficulty in understanding how such assessments can be optimal for high attainers and fear that this is bad assessment practice.

  • On top of this there is the worrying statement in the test development frameworks that scaled scores will be ‘truncated’ at the extremes of the distribution. This does not fill one with confidence that the highest and lowest attainers will have their test performance properly recognised and reported.
  • The necessary invention of ‘performance descriptors’ removes any lingering illusion that academies and free schools have significant freedom to depart from the national curriculum, at least as far as the core subjects are concerned. It is hard to understand why these descriptors could not have been published alongside the programmes of study within the national curriculum.
  • The ‘performance descriptors’ in the draft test specifications carry all sorts of health warnings that they are inappropriate for teacher assessment because they cover only material that can be assessed in a written test. But there will be significant overlap between the test and teacher assessment versions, particularly in those that describe threshold performance at the equivalent of level 4b. For we know now that there will also be hierarchies of performance descriptors – aka level descriptors – for KS1 teacher assessment in reading, writing, speaking and listening and maths, as well as for KS2 teacher assessment in writing. Levels were so problematic that it has been necessary to reinvent them!
  • What with scaled scores, average scaled scores, threshold performance descriptors and ‘levelled’ performance descriptors, schools face an uphill battle in convincing parents that the reporting of test outcomes under this system will be simpler and more understandable. At the end of KS2 they will receive 16 different assessments in four different formats. (Remember that parents will also need to cope with schools’ approaches to internal assessment, which may or may not align with these arrangements.)
  • We are told about new requirements to be placed on schools to publish assessment outcomes, but the description is infuriatingly vague. We do not know whether certain requirements apply to both KS1 and 2, and/or to both tests and teacher assessment. The reference to ‘the percentage of pupils who achieve a high score in all areas at the end of key stage 2’ is additionally vague because it is unclear whether it applies to performance in each assessment, or across all assessments combined. Nor is the pitch of the high score explained. This is the only reference to high attainers in the entire response and it raises more questions than it answers.
  • We also have negligible information about what will appear in the school performance tables and what will be relegated to the accompanying data portal. We know there is an intention to compare schools’ performance on the measures they are required to publish and that is all. Much of the further detail in the original consultation document may or may not have fallen by the wayside.
  • The new floor standards have all the characteristics of a last-minute compromise hastily stitched together. The consultation document was explicit that floor standards would:

‘…focus on threshold attainment measures and value-added progress measures’

It anticipated that the progress measure would require average scaled scores of between 98.5 and 99.0, adding:

‘Our modelling suggests that a progress measure set at this level, combined with the 85% threshold attainment measure, would result in a similar number of schools falling below the floor as at present.’

But the analysis of responses fails to report at all on the question ‘Do you have any comments about these proposals for the Department’s floor standards?’ It does include the response to a subsequent question about including an average point score attainment measure in the floor standards (39% of respondents were in favour, 31% against), but the main text does not discuss this option at all. It begins by stating that both an attainment and a progress dimension are in play, but then describes a system in which schools can choose one or the other. There is no attempt to quantify ‘sufficient progress’ and no revised modelling of the impact of standards set at this level. We are left with the suspicion that a very significant proportion of schools will not exceed the floor. There is also a potential perverse incentive for schools with very high attaining intakes not to bother about progress at all.

  • Finally, the ‘to do’ list is substantial. Several of those with the tightest deadlines ought really to have been completed ahead of the consultation response, especially given the significant delay. There is nothing about the interaction between this work programme and that proposed by NAHT’s Commission on Assessment. Much of this work would need to take place on the other side of a General Election, while the lead time for assessing KS2 progress against a Year R baseline is a full nine years. This makes the project as a whole particularly vulnerable to the whims of future governments.

I’m struggling to find the right description for the overall package. I don’t think it’s quite substantial or messy enough to count as a dog’s breakfast. But, like a poorly airbrushed portrait, it flatters to deceive. Seen from a distance it appears convincing but, on closer inspection, there are too many wrinkles that have not been properly smoothed out.

GP

April 2014