Critical Evaluation of Code: Principles of Usability, Efficiency, and Accuracy (CPD Case Study)

Friday 10 October 2025

Dr Dan Lucas (School of Maths and Statistics)
[email protected]

What motivated you to use AI in this module, and what goals or challenges were you aiming to address?
AI has revolutionised the way many people write computer code: a simple query to an LLM will provide quite good Python code for a range of mathematical problems. However, AI is known to hallucinate and give wildly incorrect responses when the prompt lies a long way from the training data. For this reason, it is important for students in mathematical computing to be aware of these limitations and to be able to recognise when an AI tool has provided a poor response. This requires a strong foundation in the topic and an awareness of how generative AI works in practice.

How did you design or adapt the assessment, and how did you prepare students for using AI appropriately?
In this module we have introduced a lecture on generative AI, covering some of the basics of large language models but focussing on their strengths and weaknesses as tools in this area. We demonstrate, by example, that AI does well with “textbook” problems but struggles with more sophisticated problems or where the prompt is vague. The students then have an assessed tutorial in which they are asked to critique various pieces of code, some AI generated and some human generated, through the lens of “best practice” and accuracy.
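As a purely illustrative sketch, and not taken from the module's tutorial materials, a snippet offered for critique might look like the one below. The function names and the choice of problem are assumptions made for this example. The first function resembles a "textbook" forward difference but uses a step so small that the result is dominated by rounding error; the second is one possible improvement.

```python
import math


def derivative_naive(f, x, h=1e-16):
    """Forward difference with a step so small that, near x = 1,
    x + h rounds back to x in double precision."""
    return (f(x + h) - f(x)) / h


def derivative_central(f, x, h=1e-5):
    """Central difference with a moderate step, trading off
    truncation error against rounding error."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Compare both estimates of d/dx sin(x) at x = 1 with the exact value cos(1).
exact = math.cos(1.0)
naive_error = abs(derivative_naive(math.sin, 1.0) - exact)
central_error = abs(derivative_central(math.sin, 1.0) - exact)

print(f"naive error:   {naive_error:.3e}")
print(f"central error: {central_error:.3e}")
```

A critique along the lines described above might note that the naive version runs without complaint yet returns zero at this point, so the issue is one of accuracy rather than syntax.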

Additionally, to reduce students’ ability to perform well in this module while relying too heavily on AI, we have introduced more free-text response questions into assessments. These test the students’ understanding and so “down-weight” the mark given to the production of the code. In project work we have introduced open-ended questions, which are excellent at assessing the higher-level “synthesis” type of learning, as students are prompted to move beyond the scope of the structured questions and provide their own investigation into the topic. Some suggestions are provided, but it is up to the individual students to decide on and design their study. Again, marks are provided in the grading criteria for communicating their understanding effectively.

What challenges did you encounter, and how did you address them?
Our main challenge has been the pace at which LLMs have advanced year-on-year. New examples need to be generated each time the course runs, and finding interesting and relevant cases of LLMs failing with introductory mathematical computing problems is getting quite difficult. We have to be creative with our prompts and the examples that we use.

There is quite a significant workload in both setting and marking assessments which are robust to AI misuse while also giving students the opportunity to critique AI output. We do employ some auto-grading tools to reduce the burden of marking the code, but these tools cannot be used on the open-ended questions.
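To illustrate the general idea of auto-grading code, here is a minimal sketch that compares a submitted function against a reference on a set of test inputs. It is an assumption-laden example and not the tooling used on the module: `grade_submission`, `reference_sqrt`, `student_sqrt` and the tolerances are all hypothetical names and values chosen for illustration.

```python
import math


def grade_submission(student_func, reference_func, test_inputs, rel_tol=1e-8):
    """Return the fraction of test inputs on which the student's output
    agrees with the reference to within the given tolerance."""
    passed = 0
    for x in test_inputs:
        try:
            ok = math.isclose(
                student_func(x), reference_func(x),
                rel_tol=rel_tol, abs_tol=1e-12,
            )
        except Exception:
            ok = False  # a crash counts as a failed test
        passed += ok
    return passed / len(test_inputs)


def reference_sqrt(x):
    """Reference answer (hypothetical task: compute a square root)."""
    return math.sqrt(x)


def student_sqrt(x, iterations=20):
    """Hypothetical student answer using Newton's method."""
    guess = x if x > 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


score = grade_submission(student_sqrt, reference_sqrt, [0.25, 1.0, 2.0, 100.0])
print(f"fraction of tests passed: {score}")
```

A check of this kind only measures whether the output is numerically correct, which is consistent with the point above that open-ended, free-text work still has to be marked by hand.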

What benefits did you see for students and for your own teaching practice?
We have observed that students have become more cautious and critical when using AI in their studies. We hope that, where students do find value in these tools, they use them in a more informed way and are better equipped to critique the output.
We believe that our approach to reducing the effectiveness of AI has created more authentic assessments, which differentiate between students more effectively than previous versions, which were more formulaic.

How did you evaluate the usefulness of this assessment to ensure that it reflected the desirable learning outcomes?

Compared to other tutorials on the module, the one critiquing AI typically has a mark distribution which is more consistent with other modules. In other words, this tutorial seems to stretch students better than some of the more “fundamental” tutorials the students see early in the module. This suggests to us that the questions demand about the right level of understanding from the students.

What would you do differently next time, and what advice would you give to colleagues?

In the future we may consider a more direct approach to generative AI in the longer-form projects, for instance allowing its use but requiring a declaration of what was used and a reflective submission on how the students thought that it helped them.

Our advice to colleagues would be that, in some circumstances, it is not necessary to abandon coursework in the age of AI. While a carefully designed project does require more work in setting and marking, there are significant benefits.

Do you have example materials (e.g., task brief, rubric, guidance documents) you’d be willing to share?
Yes – see here – Tutorial Example

Are there any references or resources that informed your approach?
The TALMO workshops have been a helpful resource. DL attended and presented at the workshop “Rethinking Assessment in the Mathematical Sciences in Times of Generative AI” hosted by the Mathematics Department in Glasgow, which was also valuable in shaping our thinking about assessments.
