Would Large Language Models Be Better If They Weren’t So Large?

When it comes to artificial intelligence chatbots, bigger is usually better.

Large language models, the systems behind chatbots such as ChatGPT and Bard, generate original conversational text, and they get better as they are fed more data. Every day, bloggers take to the internet to explain how the latest advances — apps that summarize articles, AI-generated podcasts, fine-tuned models that can answer any question about professional basketball — are going to “change everything.”

But building bigger, more capable AI requires processing power that most companies lack, and there is growing concern that a small group, including Google, Meta, OpenAI, and Microsoft, will come to exert near-total control over the technology.

And the larger the language model, the harder it is to understand. These models are often described as “black boxes,” even by their designers, and key figures in the field have expressed unease that the goals of AI may ultimately not align with our own. If bigger is better, it is also more opaque and more exclusive.

In January, a group of young academics working in natural language processing, the branch of AI focused on linguistic understanding, issued a challenge meant to upend this paradigm. The group called on teams to create functional language models using data sets less than one ten-thousandth the size of those used by state-of-the-art large language models. A successful mini-model would be nearly as capable as the high-end models while being far smaller, more accessible, and more compatible with humans. The project is called the BabyLM Challenge.

“We’re challenging people to think small and build efficient systems that far more people can use,” said Aaron Mueller, a computer scientist at Johns Hopkins University and an organizer of BabyLM.

Alex Warstadt, a computer scientist at ETH Zurich and another of the project’s organizers, said the challenge “puts questions about human language learning, rather than ‘How big can we make our models?,’ at the center of the conversation.”

A large language model is a neural network designed to predict the next word in a given sentence or phrase. It is trained for this task on a corpus of words collected from transcripts, websites, novels, and newspapers. The model makes guesses based on example phrases and adjusts itself according to how close it gets to the right answer.

By repeating this process over and over, the model builds a map of how words relate to one another. In general, the more words a model is trained on, the better it becomes: every phrase gives the model context, and more context translates into a more detailed impression of what each word means. OpenAI’s GPT-3, released in 2020, was trained on 200 billion words; DeepMind’s Chinchilla, released in 2022, was trained on a trillion.
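To make that training loop concrete, here is a minimal sketch in Python using PyTorch. It is an illustration of next-word prediction on an invented twelve-word corpus, not the code behind any model mentioned in this article; every name in it is hypothetical, and real systems use vastly larger networks and vastly more data.

```python
# A toy sketch of next-word prediction, the training task described above.
import torch
import torch.nn as nn

# Hypothetical tiny corpus; real models train on billions of words.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}

# Training pairs: each word is used to predict the word that follows it.
inputs = torch.tensor([word_to_id[w] for w in corpus[:-1]])
targets = torch.tensor([word_to_id[w] for w in corpus[1:]])

# A tiny model: embed the current word, then score every vocabulary
# entry as a candidate next word.
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(inputs)           # guess the next word at each position
    loss = loss_fn(logits, targets)  # measure distance from the right answer
    optimizer.zero_grad()
    loss.backward()                  # adjust weights according to the error
    optimizer.step()

# After training, the model favors words it has actually seen follow "the".
scores = model(torch.tensor([word_to_id["the"]]))
print(vocab[scores.argmax().item()])
```

Even this toy follows the pattern the researchers describe: guess the next word, measure how far the guess was from the right answer, and adjust. Scaled up to billions of parameters and trillions of words, the same loop underlies models like GPT-3 and Chinchilla.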

For Ethan Wilcox, a linguist at ETH Zurich, the fact that something nonhuman can generate language presents an exciting opportunity: Could AI language models be used to study how humans learn language?

Nativism, for example, an influential theory tracing back to Noam Chomsky’s early work, claims that humans learn language quickly and efficiently because they have an innate understanding of how language works. But language models learn language quickly, too, and they seem to have no innate understanding of how language works. So perhaps nativism does not hold up.

The catch is that language models learn very differently from humans. Humans have bodies, social lives, and rich sensations. We can smell mulch, feel the vanes of feathers, bump into doors, and taste peppermints. Early on, we are exposed to simple spoken words and syntax that are often not represented in writing. So, Dr. Wilcox concluded, a computer that generates language after being trained on hundreds of millions of written words can tell us only so much about our own linguistic process.

But if a language model were exposed only to the words that a young human encounters, it might interact with language in ways that could address certain questions we have about our own abilities.

So Dr. Wilcox, Mr. Mueller, and Dr. Warstadt, along with six colleagues, devised the BabyLM Challenge to nudge language models slightly closer to human understanding. In January, they put out a call for teams to train a language model on the same number of words a 13-year-old human encounters (roughly 100 million). Candidate models would be tested on how well they generate and pick up the nuances of language, and a winner would be declared.

Eva Portelance, a linguist at McGill University, leaped at the challenge the day it was announced. Her research straddles the often blurry line between computer science and linguistics. The first forays into AI, in the 1950s, were driven by the desire to model human cognitive capacities in computers; the basic unit of information processing in AI is the “neuron,” and language models of the 1980s and early ’90s were directly inspired by the human brain.

But as processors grew more powerful and companies began working toward marketable products, computer scientists found that it was often easier to train language models on enormous amounts of data than to force them into psychologically informed structures. As a result, Dr. Portelance said, “they give us humanlike text, but there’s no connection between us and how they function.”

For scientists interested in understanding how the human mind works, these large models offer limited insight. And training them requires enormous processing power that few researchers can access. “Only a few industry labs with huge resources can afford to train models with billions of parameters on trillions of words,” Dr. Wilcox said.

“Or even to load them,” Mr. Mueller added. “This has made research in the field feel a little less democratic lately.”

Dr. Portelance said the BabyLM Challenge could be seen as a step away from the arms race for bigger language models and a step toward more accessible, more intuitive AI.

The potential of such research programs has not been lost on the big industry labs. Sam Altman, the chief executive of OpenAI, recently said that increasing the size of language models would not yield the kind of improvements seen over the past few years. And companies such as Google and Meta have been investing in research on more efficient language models informed by human cognitive structures. After all, a model that can generate language when trained on less data could presumably be scaled up, too.

But whatever benefits a successful BabyLM might deliver, for those taking on the challenge the goal is more academic and abstract. Even the prize subverts the practical. “Just pride,” Dr. Wilcox said.
