Artificial intelligence (AI) tools can pass almost all types of level 3 assessments, a new study has found.
The Open University found that AI performed “particularly highly” at this advanced level across a range of subjects, although its performance was lower at levels 4 and above.
It also found that while markers’ ability to detect generative AI answers increased after training, this was undermined by an increased number of false positives.
Jonquil Lowe, senior lecturer in economics and personal finance, said that rather than focusing on detection, colleges and universities should use AI to design more “robust questions” that focus on the “added value” that humans bring.
She added: “This shifts us away from merely testing knowledge, towards what is often called ‘authentic assessment’ that requires explicit application of what has been learned in order to derive specific conclusions and solutions.”
The study confirms fears raised in a recent FE Week investigation that students can ‘cheat’ their way through almost any non-exam assignment by using generative AI large language models such as ChatGPT.
It also addresses concerns that AI detection tools are unreliable, giving rise to false accusations and a breakdown of trust between educators and students.
The study, funded by awarding body and education charity NCFE, analysed generative AI’s performance by asking a group of 43 markers to grade almost 1,000 scripts and to flag those they suspected were AI-generated.
A review of the results found that the most robust assessment types were audience-tailored, observation by learner and reflection on work practice.
While the study found that AI’s performance did not vary by subject, AI-generated answers in certain disciplines, such as law, were easier to detect.
False positives emerged because “hallmarks” of AI-generated scripts, such as superficial answers or a failure to focus on the question, are also common in weaker students’ work.
The study recommends that institutions designing assessments should focus on question design, marking guidance and student skills interventions rather than on detecting AI misuse.
When students are identified as using AI in their assessments, institutions should focus on helping them develop their study skills.
Training for staff on dealing with generative AI in assessments should also be ongoing.
Gray Mytton, assessment innovation manager at NCFE, said: “This report highlights the challenges in detecting genAI misuse in assessments, showing that training markers to spot AI-generated content can lead to an increase in the rate of false positives.
“To address this, educators could help students develop study skills, including genAI use where appropriate, while awarding bodies can focus on creating more authentic assessments, which will also benefit learners as they enter the workforce.”
Read the full report here.