
When A.I. Chatbots Hallucinate

When did the New York Times first report on “artificial intelligence”?

According to ChatGPT, it was July 10, 1956, in an article titled "Machines Will Learn, Solve Problems, Scientists Can Predict," about an important conference at Dartmouth College.

The 1956 conference was real. The article was not. ChatGPT simply made it up. ChatGPT does not just get things wrong at times; it can fabricate information outright: names and dates, medical explanations, the plots of books, internet addresses, even historical events that never happened.

When ChatGPT was recently asked how James Joyce and Vladimir Lenin first met (there is no evidence that they ever met), it replied with a detailed, fabricated account.

Fabrications like these are common. As the tech industry races to develop new A.I. systems, understanding why chatbots make things up, and how to stop them, has become one of the most pressing problems facing researchers.

Chatbots like ChatGPT are used by hundreds of millions of people for an ever-wider range of tasks, including email services, online tutoring and search engines. And they could change the way people interact with information. But there is no way to guarantee that these systems produce accurate information.

The technology, called generative A.I., relies on complex algorithms that analyze how humans put words together on the internet. It does not decide what is true and what is not. That uncertainty has raised concerns about the reliability of this new breed of artificial intelligence and called into question how useful it can be until the problem is solved or contained.

The tech industry often refers to these inaccuracies as "hallucinations." But to some researchers, "hallucination" is too much of a euphemism. Even researchers inside technology companies worry that people will rely too heavily on these systems, turning to them for medical and legal advice and other information they use to make everyday decisions.

Subbarao Kambhampati, a professor and artificial intelligence researcher at Arizona State University, shares that concern.

ChatGPT was not the only chatbot to invent an answer about the first mention of artificial intelligence in The Times. Google's Bard and Microsoft's Bing chatbot both repeatedly provided inaccurate answers to the same question. Though false, the answers seemed plausible because they blurred and conflated real people, events and ideas.

Microsoft's Bing attributed its findings to a realistic-looking web address on The Times's website.

According to The Times's archives, all the chatbots were wrong. They cited articles that did not exist. And while coverage of early research on thinking machines dates to the 1930s, it was not until 1963 that The Times published its first article containing the phrase "artificial intelligence."

Google spokesperson Jennifer Rodstrom said: “These are top priorities for us as we continue to fine-tune Bard.”

Like Google, Microsoft and OpenAI say they are working to reduce hallucinations.

The new A.I. systems are "built to be compelling, not truthful," an internal Microsoft document says. "This means that even though the output looks very realistic, it may contain statements that are not true."

Chatbots are powered by a technology called a large language model, or LLM, which learns its skills by analyzing vast amounts of digital text collected from the internet.

By pinpointing patterns in that data, an LLM learns to do one thing in particular: guess the next word in a sequence of words. It acts like a more powerful version of an autocomplete tool. Given the sequence "The New York Times is a ____," it might guess "newspaper."
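To make that step concrete, here is a minimal sketch of next-word prediction in code. It uses the open-source Hugging Face transformers library and the small, publicly available GPT-2 model purely as a stand-in; the models behind commercial chatbots are vastly larger, but the basic move, scoring candidate next words, is the same in spirit.

```python
# A toy demonstration of next-word prediction, the skill an LLM learns.
# GPT-2 is used here only as a small, public stand-in for much larger models.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The New York Times is a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits   # a score for every word in the vocabulary

next_word_scores = logits[0, -1]      # scores for the word that comes next
top = torch.topk(next_word_scores, k=5)

for token_id, score in zip(top.indices, top.values):
    print(repr(tokenizer.decode([int(token_id)])), float(score))
```

The model simply ranks plausible continuations; nothing in this step checks whether the completed sentence is true.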

Because the internet is full of false information, the technology learns to repeat those falsehoods. And sometimes chatbots make things up entirely: they generate new text by combining billions of patterns in unexpected ways, which means that even if they learned only from accurate text, they may still produce something that is not.

Because these systems learn from more data than humans could ever analyze, even A.I. experts cannot fully explain why they generate a particular sequence of text at a particular moment. And if you ask the same question twice, they may generate different text.

This complicates the task of fact-checking and improving results.

Bard, for example, gave different responses to the same question in separate chats.

Companies such as OpenAI, Google, and Microsoft have developed methods to improve accuracy. For example, OpenAI seeks to improve its technology based on feedback from human testers.

As people test ChatGPT, they rate the chatbot's responses, separating answers that are useful and truthful from those that are not. Then, using a technique called reinforcement learning, the system spends weeks analyzing those ratings to better understand what is fact and what is fiction.
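OpenAI has not published the details of its production pipeline, so the following is only a rough, hypothetical sketch of the core idea: human raters prefer one answer over another, and a small "reward model" is trained to score answers the way the raters did. That score is what reinforcement learning later uses to steer the chatbot.

```python
# A highly simplified sketch of learning from human ratings. Real systems work
# on full chatbot responses and are vastly larger; the random feature vectors
# here are hypothetical stand-ins for embeddings of those responses.
import torch
import torch.nn as nn

# Each pair: features of the answer raters preferred vs. the one they rejected.
preferred = torch.randn(64, 16)
rejected = torch.randn(64, 16)

reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(200):
    r_good = reward_model(preferred)
    r_bad = reward_model(rejected)
    # Pairwise loss: push the preferred answer's score above the rejected one's.
    loss = -torch.nn.functional.logsigmoid(r_good - r_bad).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model then supplies the signal that reinforcement learning
# uses to nudge the chatbot toward answers raters judged truthful and helpful.
```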

A newer version of ChatGPT, called ChatGPT Plus and available for a $20 monthly subscription, consistently avoided answering the question about the first mention of artificial intelligence in The Times. This could be the result of reinforcement learning or of other changes that OpenAI has applied to the system.

Microsoft built its Bing chatbot on top of OpenAI's underlying technology, called GPT-4, and has layered on other ways to improve accuracy. The company uses GPT-4 to compare the chatbot's responses with the underlying data and rate how the model is performing. In other words, Microsoft uses A.I. to make its A.I. better.
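Microsoft has not described the mechanics of that grading step publicly, so the snippet below is only an illustration of the general idea of using one model to check another, written against the public OpenAI Python library; the prompt wording, the SUPPORTED/UNSUPPORTED labels and the grade_answer function are assumptions made for this sketch.

```python
# A minimal sketch of "A.I. grading A.I.": ask a model whether an answer is
# supported by a given source text. The setup here is illustrative, not a
# description of Microsoft's internal system.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY in the environment

def grade_answer(source_text: str, answer: str) -> str:
    """Ask a model whether an answer is supported by the source text."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a fact-checker. Reply SUPPORTED or UNSUPPORTED."},
            {"role": "user",
             "content": f"Source:\n{source_text}\n\nAnswer to check:\n{answer}"},
        ],
    )
    return response.choices[0].message.content

# Example: grade a chatbot claim against an archive snippet.
verdict = grade_answer(
    "The Times first used the phrase 'artificial intelligence' in 1963.",
    "The Times first reported on artificial intelligence on July 10, 1956.",
)
print(verdict)
```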

The company also works to improve the chatbot's responses with help from its traditional internet search engine. When a user types a query into the Bing chatbot, Microsoft runs an internet search on the same subject and then folds the results into the query before sending it to the bot. Editing the query in this way can push the system to produce better results, said Sarah Bird, a leader of Microsoft's responsible A.I. efforts.
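In outline, that search-then-ask step can look something like the sketch below. The web_search helper is a hypothetical stand-in for a real search-engine call, and the prompt wording is an assumption; Bing's actual pipeline is proprietary and far more elaborate.

```python
# A rough sketch of folding search results into a chatbot query.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY in the environment

def web_search(query: str) -> list[str]:
    """Hypothetical stand-in for a call to a real search-engine API."""
    # In a real pipeline this would return text snippets from live results.
    return [f"(placeholder search snippet for: {query})"]

def answer_with_search(question: str) -> str:
    snippets = web_search(question)
    context = "\n\n".join(snippets)
    prompt = (
        "Use only the search results below to answer the question. "
        "If the results do not contain the answer, say so.\n\n"
        f"Search results:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The point of the extra step is that the bot answers from retrieved text rather than from its memorized patterns alone, which narrows the room for fabrication.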

Google uses similar methods to try to improve the accuracy of its Bard chatbot. It uses human feedback to hone the system's behavior, and it "grounds" the system using information from the company's search engine, said Eli Collins, a vice president of research at Google.

Ms. Bird said Microsoft does not check the accuracy of the bot's responses in real time, though it is researching how to do so. It checks the accuracy of a small portion of results after the fact and then uses that analysis to improve the system.

However, greater accuracy may also have a downside, according to a recent research paper from OpenAI: as chatbots become more reliable, users may come to trust them too much.

"Counterintuitively, hallucinations can become more dangerous as models become more truthful, as users build trust when models provide truthful information in familiar domains," the paper said.

Steve Lohr and Nico Grant contributed reporting. Jack Begg and Susan C. Beachy contributed research.
