
Why AI detectors get worse over time, and what to do about it

TruthHound Team · April 2026 · 5 min read · Industry

If you bought an AI-image-detection tool in 2023, it was probably accurate at launch and is probably useless now. This is not a defect in the specific tool you bought. It is a structural property of detection systems trained against generative models. Detection has a half-life, and the half-life is getting shorter.

This post explains why, and what we do about it at TruthHound. It's the most honest thing we can write about our own product.

The problem in one sentence

Detection models learn to spot the artefacts of current generative models. Generative models are updated specifically to remove those artefacts. The detection model's accuracy on the new generation drops, often sharply. The cycle takes weeks, not years.

A short history

In late 2022, GPT-detection tools were briefly very accurate. By mid-2023, they were near-random on text written with GPT-4. The image equivalent: late-2023 detection tools were good at Stable Diffusion 1.5 outputs and bad at SDXL outputs by early 2024. By late 2024, the same tools were unreliable on Midjourney v6. Each generation of generator pushed the detector below the threshold of usefulness within months.

Why this happens

Generative model training increasingly includes adversarial passes against published detection methods. If a detector flags a specific frequency-domain pattern, the next generation of the generator is trained to suppress that pattern. The cat-and-mouse is not metaphorical; it is literally the optimisation loop.
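To make that concrete, here is a minimal sketch of the kind of frequency-domain statistic early image detectors keyed on. It is illustrative only, not our production code: the function name, the band split at a quarter of the image size, and the idea of using a single scalar feature are all simplifying assumptions.

```python
import numpy as np

def high_freq_energy_ratio(gray_image: np.ndarray) -> float:
    """Fraction of spectral energy in the outer (high-frequency) band.

    Illustrative only: real detectors learn richer features, and the
    band boundary at a quarter of the image size is an arbitrary
    choice here, not a known threshold.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray_image))) ** 2
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - cy, xx - cx)
    outer = spectrum[radius > min(h, w) / 4].sum()
    return float(outer / spectrum.sum())
```

A generator explicitly trained to push this statistic back toward the natural-image distribution defeats the detector that relies on it. That, in one scalar, is the optimisation loop described above.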

The other factor is data drift. A detector trained on 2024 outputs of a model has, baked into its weights, the assumption that the model behaves a certain way. When the model is retrained — even on the same architecture — the detector's assumptions go stale.

What honest detection looks like

There are three things any detection product should do, and most don't:

Publish an accuracy curve over time, not a single number. A "98% accurate" claim with no date is meaningless. Accuracy on what generation, on what dataset, evaluated when? The honest version is a graph that shows how the model has performed against each new generator release.

Retrain frequently and document it. A detection model that hasn't been retrained in six months is, almost certainly, much less accurate than its launch claims. The retraining cadence matters more than the headline accuracy number.

Return confidence scores, not yes/no verdicts. Any detector that returns a binary "AI" or "not AI" is hiding uncertainty that the user needs to see. A confidence score lets the user combine the model's signal with their own judgement, which is the only sustainable way detection works.

What we do at TruthHound

We retrain our image-authenticity model every two weeks against the latest releases of the major generators (Midjourney, Stable Diffusion, DALL-E, Flux, and the long tail of open-source variants). We also include adversarial samples — images run through compression pipelines designed to defeat detection — because that's what scammers actually use.
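For a sense of what "run through compression pipelines" means, here is a rough sketch of that style of augmentation using Pillow. The quality levels are an illustrative assumption, not our exact pipeline:

```python
from io import BytesIO
from PIL import Image

def recompress(img: Image.Image, qualities=(90, 70, 50, 30)) -> list[Image.Image]:
    """Re-encode an image at several JPEG quality levels.

    Re-compression smears the subtle statistics detectors key on,
    which is why laundered samples belong in the training set. The
    quality levels here are illustrative, not our exact pipeline.
    """
    variants = []
    for q in qualities:
        buf = BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=q)
        buf.seek(0)
        variants.append(Image.open(buf).copy())
    return variants
```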

Our voice-cloning detector is on a slower cadence (every six weeks) because the major voice-synthesis platforms release less frequently. We document each retraining in our internal log; the per-generator accuracy numbers are available on request to enterprise customers.

We return confidence scores on a 0-100 scale, not verdicts. We tell users which specific signals triggered. We tell them when the confidence is low and they should not rely on it. We do this not because it's a marketing differentiator but because the alternative, a polished yes/no UI that hides the uncertainty, would mean lying to users about a problem we can't actually solve cleanly.
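In practice, a verdict-free result looks something like the sketch below. The field names are hypothetical, not our literal API schema; the point is what the result carries instead of a bare yes/no:

```python
# Hypothetical response shape -- field names are illustrative,
# not TruthHound's literal API schema.
result = {
    "confidence": 72,                 # 0-100 scale, never a binary verdict
    "signals": [                      # which specific checks fired
        {"name": "frequency_artifact", "weight": 0.4},
        {"name": "sensor_noise_mismatch", "weight": 0.3},
    ],
    "model_version": "2026-04-07",    # when the model was last retrained
    "low_confidence": False,          # set when the score should not be relied on
}
```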

Why this matters for the broader ecosystem

A lot of trust-and-safety infrastructure is being built right now on the assumption that AI detection is a solved problem. It is not. It will not be. The specific implication is that any system architecture which relies on a "this is AI / this is not" classifier as a hard gate is going to fail, because the classifier will degrade silently between the day it was deployed and the day someone notices.

The architectures that actually work treat detection as one signal among several — combined with provenance metadata (C2PA), behavioural patterns, cross-image consistency, and human review for high-stakes decisions. Detection is necessary. It is not, and will never be, sufficient.
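A minimal sketch of that architecture, assuming illustrative weights and thresholds (they are not a spec, just a shape):

```python
def review_decision(detector_conf: float, has_c2pa: bool,
                    behaviour_risk: float, high_stakes: bool) -> str:
    """Treat the detector as one signal among several.

    detector_conf is 0-100; behaviour_risk is 0-1. The weights and
    thresholds below are illustrative assumptions, not a spec.
    """
    score = 0.5 * (detector_conf / 100) + 0.3 * behaviour_risk
    if has_c2pa:
        score -= 0.3  # verified provenance strongly lowers suspicion
    # Never hard-gate on the classifier alone: route ambiguous or
    # high-stakes cases to a human instead of auto-deciding.
    if high_stakes or 0.35 < score < 0.65:
        return "human_review"
    return "flag" if score >= 0.65 else "pass"
```

The design choice that matters is the middle branch: the classifier's output alone never closes a high-stakes decision, because its accuracy is silently decaying underneath you.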

What this means for you

If you use any AI-detection tool — ours included — assume its accuracy is decaying and verify the publication date and last-retrain date before you trust the result. If the tool doesn't publish those dates, don't use it.
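That check can be mechanical. A trivial sketch, assuming the tool exposes its last-retrain date (the 90-day tolerance is an illustrative assumption, not a standard):

```python
from datetime import date, timedelta

def is_stale(last_retrain: date, max_age_days: int = 90) -> bool:
    """Flag a detector whose last retrain exceeds your tolerance.

    90 days is illustrative; pick a window that matches how fast
    the generators you care about are shipping.
    """
    return date.today() - last_retrain > timedelta(days=max_age_days)

if is_stale(date(2026, 1, 15)):
    print("Treat this detector's output as unreliable.")
```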

For our part: we will keep retraining, keep publishing, and keep returning confidence scores instead of verdicts. It is the only honest version of this product. If a competitor tells you they've solved AI detection, ask them when they last retrained. The answer will tell you everything you need to know.
