AI and data security – let’s worry about the right things 

Instead of fixating on training data risks, we should focus on broader security concerns

7 Mar 2025, 5:50

Do AI chatbots learn from everything we enter? That’s probably the most common question I’m asked when it comes to AI and data security.  

I can see the logic in asking this. Because AI chatbots talk to us in a human-like way, it’s easy to assume they also learn in a human-like way. There’s also a tendency to assume that, just like a human, they not only learn from a conversation but are probably awful at keeping secrets, and might well share what they have learnt with many other users.

Why does this matter? Because I think this assumption is driving how we approach AI and security, with people often focusing too much on the very limited risks around AI models and training data, and not enough on other, more pressing AI security issues, such as the overall security of the tools they are using.

At one end of the spectrum, this leads users to think any system is safe as long as it says it isn’t using interactions for training; at the other end, it fuels the idea that developing in-house solutions to avoid data being used for training is the most effective route to a secure AI system.

AI and training data 

To understand this, it helps to think a little more about the data involved. The data used to train large language models is typically scraped from the internet, licensed from big media companies, or drawn from books obtained by potentially unethical means. As an aside, that means any data we make available on the internet, intentionally or otherwise, could be used to train AI models – it’s not just about our interactions with AI tools.

Language models are trained infrequently, as each training run can cost millions of pounds. The models aren’t updated as we use them.

And because these are language models, the companies behind them are looking for high-quality text, so the things we tend to type into AI tools – random chats, fragments of text and the like – generally aren’t useful as training data.

Where companies do ask, in their terms and conditions, for permission to use our data, it is primarily to understand how we use the tool and to help them improve its overall performance.

But even if our text did find its way into the training set, the chance of it being output by the model is close to zero. That’s because the training set is vast – typically around 10 trillion words. To put this into context, if you started reading it today, it would take over 120,000 years to read it all! (And yes, I should confess, ChatGPT helped me work that out.)
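For the curious, here is a rough back-of-the-envelope version of that sum in a few lines of Python. The reading speed of 150 words per minute, read non-stop around the clock, is my assumption rather than a figure from the article:

```python
# A rough check of the "120,000 years" claim.
# Assumptions (mine, not the author's): a 10-trillion-word training set
# and a steady reading speed of 150 words per minute, non-stop.

TRAINING_SET_WORDS = 10 * 10**12  # ~10 trillion words
WORDS_PER_MINUTE = 150            # assumed average reading speed

minutes_needed = TRAINING_SET_WORDS / WORDS_PER_MINUTE
years_needed = minutes_needed / (60 * 24 * 365.25)

print(f"About {years_needed:,.0f} years of non-stop reading")
# Prints: About 126,752 years of non-stop reading
```

A faster reader would bring the number down, but on any plausible assumptions it stays comfortably in the six figures, which is the point of the comparison.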

Language models aren’t knowledge models, so they can’t look things up from the training set. Instead, each extra bit of data has a tiny, tiny impact on the overall output, which, in the end, is just a text prediction. It’s partly why we don’t see lots of personal data revealed by tools such as ChatGPT and Gemini. 

Data security 

So, does that mean we don’t need to worry about using AI tools at all from a security perspective? Absolutely not!

Legally, we must prevent our staff’s and students’ private data from being used for purposes other than those that are necessary and contractually agreed. But we also need to understand what the actual risks are – and they are much more about general data security than about training.

The biggest risks are actually around data being poorly secured or shared with third parties. Only a few weeks ago, security researchers found a basic security flaw that meant anyone could access the database of Chinese chatbot DeepSeek and view users’ chat histories.

It’s tempting, perhaps, to think the best way to get secure AI is to create or host your own solutions. But securing software is really hard! And don’t forget, our word processors and spreadsheets handle our most sensitive data, yet we don’t build or host in-house versions of them. We manage the risks through contracts, user training, policies and technical controls.

So, what’s the best approach to using AI safely and securely? It’s simple. Don’t think of AI systems as gossipy friends eager to spill your secrets. Instead, treat them like any other IT system, where security and contracts matter most.

Contracts are key, and if you take away only one thing from reading this, it should be to use AI systems only where a robust agreement is in place that ensures responsible and secure data handling.
