Model collapse, according to Wikipedia, refers to “the gradual degradation in the output of a generative artificial intelligence model trained on synthetic data, meaning the outputs of another model (including prior versions of itself)”.
It is an interesting and realistic concept. A generative model makes it possible to produce a near-infinite amount of output from a limited sample of original data. At the moment, we are at the outset of Large Language Models (LLMs) churning out new content based on human-created data. But as more and more people turn to these models to generate content, a growing portion of the content out there will be AI-generated… and it will all trace back to that original content.
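To make that feedback loop concrete, here is a minimal toy sketch of my own (not from the article, and deliberately simplified): fit a plain Gaussian to some “real” data, then repeatedly refit it to samples drawn from the previous generation's fit. Because each generation learns only from synthetic data, sampling noise compounds and rare values get lost, so the fitted spread tends to drift and shrink over the generations.

```python
import random
import statistics

# Toy illustration of recursive training on synthetic data.
# Generation 0 is fit on "real" data; every later generation is fit
# only on samples drawn from the previous generation's model.

random.seed(0)
SAMPLES_PER_GEN = 25  # small samples make the compounding noise visible sooner

# "Real" data: draws from a standard normal distribution.
real_data = [random.gauss(0, 1) for _ in range(SAMPLES_PER_GEN)]
mu, sigma = statistics.mean(real_data), statistics.stdev(real_data)
print(f"gen   0: mu={mu:+.3f}, sigma={sigma:.3f}")

for gen in range(1, 101):
    # Train the next model exclusively on the previous model's output.
    synthetic = [random.gauss(mu, sigma) for _ in range(SAMPLES_PER_GEN)]
    mu, sigma = statistics.mean(synthetic), statistics.stdev(synthetic)
    if gen % 20 == 0:
        print(f"gen {gen:3d}: mu={mu:+.3f}, sigma={sigma:.3f}")
```

Run it a few times with different seeds: the estimated spread typically narrows and the tails of the original distribution disappear. Model collapse is essentially this dynamic at a vastly larger scale, with LLMs trained on web text that is increasingly their own output.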
AI has no way to measure the truth or the value of content; it instead depends on the human prompting it, or a reader who cares enough, to correct it when necessary. That correction becomes unlikely as content generation keeps increasing exponentially and the low barrier to entry brings in people hoping to get some monetary or other benefit from it.
It will certainly be interesting to see how this all works out over the next few decades. The outcome I assign a high probability to is a clear segregation between tasks that work well with LLMs and tasks that need different AI models. LLMs are fine for things like basic communication, summarization and areas where the information is for personal consumption. Generating new information needs models that can follow a process of experimentation, or observation through senses, which is different from the current basic pattern matching. My belief is that, since human consciousness is largely shaped by the five senses, we would need to better integrate all of these inputs into models so they can operate and evolve in a sandbox representative of the real world, or of whatever world we seek to understand.
My thoughts are based on the following article, which describes attitudes in the research community on whether model collapse is inevitable.