Customer Obsessed Engineering

Model collapse might break the internet

What happens when AI models cannibalize their own data? You get "model collapse."

Zac Beckman
Feb 20, 2025

Not actually a peacock (left), despite what your AI search tells you.

Have you run an internet search lately? Chatted with a customer service ‘bot? Asked a search engine to find you a picture of something? You’re using an AI, whether you know it or not, which raises the question: can you trust the answer?

There’s a really interesting ShortWave podcast about what happens when AI cannibalizes its own data.1

Most AI you’ll run into today falls into the large language model (LLM) category. When you ask Claude or ChatGPT a question, you’re using an LLM. You’re also using an LLM when you ask, for instance, “what does a baby peacock look like?”

How do you know if the answer you get is correct? Or, on the other hand, if it’s been totally made up by the AI?

That’s a problem AI engineers have been struggling with — and the problem could become even more challenging to solve.

AI models are trained on huge data sets. Think about it this way: a brand new LLM needs data — so, where’s the biggest source of …
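As a rough sketch of the feedback loop at the heart of model collapse: imagine each "generation" of a model is trained only on the previous generation's output. Generative models tend to under-represent rare cases, so a little of the original diversity is lost on every cycle. The toy simulation below is purely illustrative (a single Gaussian stands in for "the internet," and trimming the tails stands in for the model's blind spots), but it makes the collapse easy to see:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "the internet": diverse, human-generated data.
data = rng.normal(loc=0.0, scale=1.0, size=5_000)

for generation in range(8):
    print(f"generation {generation}: std = {data.std():.3f}")

    # Crude proxy for a generative model's blind spots: rare cases
    # (the tails of the distribution) get under-represented, so keep
    # only the central 90% of what the model "learned".
    lo, hi = np.quantile(data, [0.05, 0.95])
    kept = data[(data >= lo) & (data <= hi)]

    # The next generation is trained purely on the previous model's
    # own output -- the "cannibalizing its own data" loop.
    data = rng.normal(loc=kept.mean(), scale=kept.std(), size=5_000)
```

Run it and the standard deviation shrinks generation after generation: the unusual answers disappear first, and before long everything the model "knows" starts to look the same.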
