AI Companies Seeking AI-Produced Data for Recursive Training

AI companies such as Microsoft, OpenAI, and Cohere seem to be doing everything in their power to find synthetic data to train their AI products. Given the limited availability of “organic” human-generated data on the World Wide Web, these companies aim to use AI-generated (synthetic) data in a kind of infinite loop, in which models are trained on data that was itself produced by generative AI.

“It would be great if we could get all the data we need from the web,” Aidan Gomez, chief executive of $2 billion LLM startup Cohere, told the Financial Times. “In practice, the web is so noisy and messy that it can’t really represent the data we need. The web doesn’t do everything we need.”

