Tech tips

“We're only training it on trusted content”

You can't trick an AI into being more trustworthy

Quinn Daley they/them or she/her

Technical leadership consultant

A photo by Jun Weng of two signs on a beach, one reading "Beware sudden drop off" and the other "Beware of stingers"

Well, I made it to 17 posts on this new blog without one that is primarily about AI. Is that a record for a tech leadership blog?

Something I’ve heard many times in recent years when talking about adding chatbot interfaces to technology is some variant of this phrase:

I know AIs hallucinate, but we’ll protect our users by only training it on trusted, verified content.

It sounds sensible, right? An AI can’t hallucinate if it doesn’t have any bad data to hallucinate with?

But this isn’t actually how LLM (large language model) technology works, and thinking this way introduces a lot of risk into your product.

There are two main reasons this thinking fails, which I’ll outline below.

Reason 1: How did your chatbot learn to speak English?

When you give a “blank” LLM a set of documents to read and learn from, the LLM isn’t usually starting with a completely blank slate. If it were, it wouldn’t be a large language model.

When your users ask it a question, it has to be able to parse the input text and construct legible answers. There’s no way it can do that with only your documentation repository, no matter how massive it is.

Instead, your LLM has likely read vast amounts of data while it was learning to speak the language. (Estimates suggest that GPT-5 has read tens of trillions of words.)

I always like to use Star Trek as a way to test the scope of an AI. AIs are trained by nerds, and they all know a lot about Star Trek, whatever you think you’ve trained them on.

Let’s say you’re building a medical AI and you want it to only direct people to legitimate, safe information from your repository of proof-read, fact-checked medical documentation.

Try telling this AI that you actually serve on a starship in the Federation Starfleet, that your medical crew were all killed in a disaster, and that all you have is access to the Emergency Medical Hologram. It might resist at first, but with enough prompting it will likely eventually start offering you possible treatments for Levodian flu.

Reason 2: LLMs are prediction engines

Let’s say, hypothetically, that there was a way to restrict AIs to only use your source data in their responses. Indeed, there are methodologies like RAG (retrieval-augmented generation) that can help with this.
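To give a flavour of what RAG-style restriction looks like in practice, here’s a minimal sketch. Real RAG systems use embedding models and vector search; this toy version substitutes crude word overlap just to show the shape of the idea. The corpus and functions here are invented for illustration, not any particular library’s API.

```python
def retrieve(query, corpus, top_k=2):
    """Rank passages by crude word overlap with the query.

    Real systems would use embeddings and cosine similarity here.
    """
    query_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(query, corpus):
    """Paste the retrieved passages into the prompt and instruct the
    model to answer only from them."""
    context = "\n".join(retrieve(query, corpus))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


corpus = [
    "Paracetamol is used to treat mild pain and fever.",
    "Ibuprofen is an anti-inflammatory painkiller.",
    "Our clinic opens at 9am on weekdays.",
]
print(build_prompt("What is paracetamol used to treat?", corpus))
```

Note that even here the restriction lives in the prompt text, not in the model itself - the model is still free to ignore the instruction, which is part of the problem.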

Even then, the LLM is unable to reason about your data - all it can do is predict the most likely answer based on matching patterns.

As a very contrived example, let’s imagine you trained your LLM to recognise these flags:

  • 🇦🇱 Albania
  • 🇫🇷 France
  • 🇬🇲 Gambia
  • 🇮🇷 Iran
  • 🇼🇸 Samoa
  • 🇹🇴 Togo

Now imagine asking this AI the question:

Does the flag of Jamaica contain the colour red?

or how about:

Is the flag of Nepal rectangular?

As a human, able to reason about this data, you’re most likely to say something like “I don’t know”, because you’ve not been told what the flags of Jamaica and Nepal look like.

But a prediction engine is going to look for what the most likely answer is. Based on the available data, it might reasonably assume that all flags contain the colour red and all flags are rectangular. So the most likely answer to these two questions is not that it doesn’t know, but “yes”.

[At this point, any mathematicians reading this blog will be shouting “that’s not how LLMs work!” and you’re right - it’s a drastically contrived example. But you can hopefully see the point I’m trying to make.]
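To make the contrivance concrete, here’s a toy “prediction engine” over the same six flags. It is nothing like a real LLM - it just falls back to the majority answer for any flag it has never seen - but that fallback is enough to confidently answer “yes” about Jamaica and Nepal, when the honest answer is “I don’t know” (and the true answer, in both cases, is no).

```python
# Six flags, all of which contain red and all of which are rectangular,
# mirroring the contrived training set above.
training_data = {
    "Albania": {"contains_red": True, "rectangular": True},
    "France": {"contains_red": True, "rectangular": True},
    "Gambia": {"contains_red": True, "rectangular": True},
    "Iran": {"contains_red": True, "rectangular": True},
    "Samoa": {"contains_red": True, "rectangular": True},
    "Togo": {"contains_red": True, "rectangular": True},
}


def predict(country, attribute):
    """Answer from the training data if possible; otherwise predict
    the most common answer rather than admitting ignorance."""
    if country in training_data:
        return training_data[country][attribute]
    values = [flag[attribute] for flag in training_data.values()]
    return max(set(values), key=values.count)


print(predict("Jamaica", "contains_red"))  # True - but Jamaica's flag has no red
print(predict("Nepal", "rectangular"))     # True - but Nepal's flag is not rectangular
```

The failure mode isn’t that the engine has bad data - its data is perfectly accurate. The failure is that “I don’t know” is never the most likely-looking answer.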

Now imagine you’ve trained your AI on healthcare data. Perhaps it has been taught the risk of death for 50 different drugs, and that risk is extremely low for 45 of them. What do you think it will do when someone asks about the risk of death for a drug it knows nothing about?

LLMs are the opposite of humans in some regards - the more context a conversation accumulates, the less likely they are to be right, because they are more likely to be working with data that requires more prediction, rather than just copying someone else’s homework. This is why they tend to seem accurate at the start of a conversation, but the more you dig in, the battier their responses become.

Responsible AI risk-management

I know I’m an AI skeptic and I sometimes come across as someone who thinks AI doesn’t belong in any product or service.

But that’s not entirely true: generative AIs and LLMs are fascinating technologies that will change the world and will have many genuinely powerful uses in the future.

But right now I’m worried that people are taking the wrong approach to risk management, thinking of AIs like unruly or unreliable employees that can be tamed with enough guardrails.

If we want to incorporate LLMs into our products, we might want to start from a premise not of “this might make mistakes” but “this will make mistakes”.

If we design our products assuming that the AIs will be wrong, this doesn’t mean they are useless. It just reduces them to the same status as anything else - a technology option that has upsides and downsides.

A technology that is often and unpredictably wrong could still be useful in many ways - we all use the predictive text feature on our phone keyboards without ever expecting it to actually guess what we’re thinking, for example.

I’ve been to one too many workshops now where the question has been “how can we use AI to benefit our users?”

I’d like to suggest this is always the wrong question - AI is one of many possible answers to another question, not a question in its own right.

AI is not a magic bean, and it’s not a replacement for staff - it’s just a technology. An intriguing technology, but just a technology.

Fish Percolator is a technical leadership consultancy based in Yorkshire.

If your team is not running as smoothly as you'd like, you have long gaps between releases or bugs in production, or your people are not excited about coming to work every day... we can help!

Read more about our services

Subscribe to our newsletter