Watchdog fears public bodies will avoid statistical models following exam grading backlash

Public bodies may be “less willing” to use statistical models to support decisions in future after the system to award exam grades last year failed to “command public confidence”, the statistics watchdog has warned. A review by the Office for Statistics Regulation, the regulatory arm of the UK Statistics Authority, found the grading system prompted […]

Billy Camden

Regulators and boards ‘worked with integrity’

In the wake of last year’s fiasco, ministers were widely reported to have attempted to lay the blame with exams regulator Ofqual, and chief regulator Sally Collier resigned shortly afterwards.

The prime minister Boris Johnson also sought to blame the “mutant algorithm” for the problems when he addressed students in the summer.

But the OSR found that teams in regulators and exam boards in all four UK nations “worked with integrity to try to develop the best method in the time available to them”.

“In each country there were aspects of the model development that were done well, and aspects where a different choice may have led to a different outcome.

“However, none of the models were able to command public confidence and there was widespread public dissatisfaction of how the grades had been calculated and the impact on students’ lives.”

The OSR’s main conclusion is that achieving public confidence in statistical models is “not just about the technical design of the model – taking the right decisions and actions with regards transparency, communication and understanding public acceptability throughout the end to end process is just as important”.

It also concluded that guidance and support for public bodies developing models “should be improved”.

Government has a “central role to play in ensuring that models developed by public bodies command public confidence”, the OSR said.

This “includes directing the development of guidance and support, ensuring that the rights of individuals are fully recognised and that accountabilities are clear”.

‘Limitations’ not fully communicated

The OSR said regulators and exam boards faced “numerous challenges” in developing the system last year, which meant it was “always going to be difficult for a statistical algorithm to command public confidence”.

However, the “limitations of statistical models, and uncertainty in the results of them, were not fully communicated”.

“More public discussion of these limitations and the mechanisms being used to overcome them, such as the appeals process, may have helped to support public confidence in the results.”

And while regulators communicated information about the models to those affected and published technical documentation on results day, full details of the methodology to be used “were not published in advance”.

“This was due [to] a variety of reasons, including short timescales for model development, a desire not to cause anxiety amongst students and concerns of the impact on the centre assessed grades had the information been released sooner.

“The need to communicate about the model, whilst also developing it, inevitably made transparency difficult.”

‘Limited professional statistical consensus’

Although regulators drew on expertise in the qualifications and education sector, there was “limited professional statistical consensus on the proposed method”.

The methods were “not exposed to the widest possible audience of analytical and subject matter experts, though we acknowledge that time constraints were a limiting factor in this case”.

There was also “limited public discussion ahead of the release of results about the likely historical patterns in the underlying data and how they might impact on the results from the model”.

Regulators carried out equality impact analyses, which were “based on the premise that attainment gaps should not widen, and their analyses showed that gaps did not in fact widen”.

Despite this analytical assurance, there was a “perception when results were released that students in lower socio-economic groups were disadvantaged by the way grades were awarded”.

“In our view, this perception was a key cause of the public dissatisfaction.”

‘Key lessons’ for government

The OSR said there were “key lessons to be learned for government and public bodies looking to develop statistical models to support decisions”.

It said that for public-sector statistical models to command confidence, the bodies developing them need guidance and support “to be available, accessible and coherent”.

“Our review has found that there is a fast-emerging community that can provide support and guidance in statistical models, algorithms, AI and machine learning.

“However, it is not always clear what is relevant and where public bodies can turn for support – the landscape is confusing, particularly for those new to model development and implementation.”

Ofqual said it welcomed the OSR’s work to “build public confidence in statistical approaches”, and said the report “recognises the challenging task Ofqual – and our counterparts in Wales, Northern Ireland and Scotland – faced in awarding grades in the absence of exams last summer”.

“We have learned lessons from last summer. We continue to work with other government departments to make data available for wider scrutiny and we recently set out, jointly with the DfE, our approach to awarding grades in 2021, after our largest-ever public consultation.”

The DfE was approached for comment.