The patient, a 39-year-old woman, came to the emergency department at Beth Israel Deaconess Medical Center in Boston. Her left knee had been hurting for several days. The day before, she had run a fever of 102 degrees. The fever was gone now, but she still had chills. And her knee was red and swollen.
What was the diagnosis?
On a recent steamy Friday, Dr. Megan Landon, a medical resident, presented this real case to a room full of medical students and residents. They had gathered to learn a skill that is notoriously difficult to teach: how to think like a doctor.
“Doctors are terrible at teaching other doctors how we think,” said Dr. Adam Rodman, an internist, medical historian and an organizer of the event at Beth Israel Deaconess.
But this time, they could call on an expert for help in reaching a diagnosis: GPT-4, the latest version of the chatbot released by OpenAI.
Artificial intelligence is transforming many aspects of medical practice, and some medical professionals are already using these tools to help with diagnosis. Physicians at Beth Israel Deaconess, a teaching hospital affiliated with Harvard Medical School, decided to explore how chatbots could be used, and misused, in training future doctors.
Instructors like Dr. Rodman hope that medical students can turn to GPT-4 and other chatbots for something similar to what doctors call a curbside consult: pulling a colleague aside and asking for an opinion about a difficult case. The idea is to use a chatbot in the same way that doctors turn to one another for suggestions and insights.
For more than a century, doctors have been portrayed as detectives who gather clues and use them to find the culprit. But experienced doctors actually use a different method, pattern recognition, to figure out what is wrong. In medicine, it is called an illness script: the signs, symptoms and test results that doctors assemble into a coherent story based on similar cases they know about or have seen themselves.
If the illness script does not help, Dr. Rodman said, doctors turn to other strategies, such as assigning probabilities to the various diagnoses that might fit.
For more than half a century, researchers have tried to design computer programs to make medical diagnoses, but nothing has really worked.
Doctors say GPT-4 is different. “It will create something that is remarkably similar to an illness script,” Dr. Rodman said. In that way, he added, “it is fundamentally different from a search engine.”
Dr. Rodman and other doctors at Beth Israel Deaconess have asked GPT-4 for possible diagnoses in difficult cases. In a study published last month in the medical journal JAMA, they found that it did better than most doctors on weekly diagnostic challenges published in The New England Journal of Medicine.
But they learned that there are tricks and pitfalls to using the program.
Dr. Christopher Smith, director of the internal medicine residency program at the medical center, said medical students and residents are “definitely using it.” But “whether they’re learning anything is an open question,” he added.
The concern is that doctors may come to rely on AI for diagnoses in the same way they rely on smartphone calculators for basic math. That, Dr. Smith said, is dangerous.
Learning, he said, involves trying to figure things out. Part of learning is the struggle. If students outsource learning to GPT, that struggle is gone.
At the meeting, students and residents broke into groups and tried to figure out what was wrong with the patient with the swollen knee. They then turned to GPT-4.
The groups tried different approaches.
One used GPT-4 to do an internet search, much the way one would use Google. The chatbot spat out a list of possible diagnoses, including trauma. But when a group member asked it to explain its reasoning, the bot was disappointing, justifying its choice by stating, “Trauma is a common cause of knee injury.”
Another group came up with possible hypotheses and asked GPT-4 to check them. The chatbot’s list lined up with the group’s: infections, including Lyme disease; arthritis, including gout, a type of arthritis that involves crystals in the joints; and trauma.
GPT-4 added rheumatoid arthritis to the top of its list, though it was not high on the group’s list. Instructors later told the group that gout was improbable for this patient because she was young and female. And rheumatoid arthritis could probably be ruled out because only one of her joints was inflamed, and for only a couple of days.
As a curbside consult, GPT-4 seemed to pass the test, or at least to agree with the students and residents. But in this exercise, it offered no insights and no illness script.
One reason might be that the students and residents used the bot more like a search engine than a curbside consult.
To use the bot correctly, the instructors said, they would need to start by telling GPT-4 something like, “You are a doctor seeing a 39-year-old woman with knee pain.” Then they would need to list her symptoms before asking for a diagnosis and following up with questions about the bot’s reasoning, the way they would with a medical colleague.
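The prompting approach the instructors describe can be sketched in code: establish a clinical role first, then supply the findings, then ask for a diagnosis and its reasoning. This is an illustrative sketch, not anything from the article; the function and field names are hypothetical, and the message format shown is the role/content list used by chat-style APIs such as OpenAI's.

```python
def build_curbside_prompt(age, sex, complaint, findings):
    """Assemble a role-play prompt as a list of chat messages.

    Hypothetical helper: sets the model's role first, then lists the
    patient's findings, then asks for a diagnosis with reasoning --
    mirroring how a doctor would frame a curbside consult.
    """
    system_msg = f"You are a doctor seeing a {age}-year-old {sex} with {complaint}."
    user_msg = (
        "Findings so far: " + "; ".join(findings)
        + " What is your diagnosis, and what is your reasoning?"
    )
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]

# The case from this article, phrased as a consult:
messages = build_curbside_prompt(
    39,
    "woman",
    "knee pain",
    [
        "left knee painful for several days",
        "fever of 102 the day before, now resolved, but chills persist",
        "knee red and swollen",
    ],
)
# These messages would then be sent to the model, e.g. with the OpenAI
# Python client: client.chat.completions.create(model="gpt-4", messages=messages)
```

The point of the structure, per the instructors, is the follow-up: asking the bot to defend its reasoning is what distinguishes a consult from a search query.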
That, the instructors said, is the way to exploit the power of GPT-4. But it is also crucial to recognize that chatbots can make mistakes and “hallucinate,” providing answers with no basis in fact. Using them requires knowing when the bot is wrong.
“There’s nothing wrong with using these tools,” said Dr. Byron Crowe, an internist at the hospital. “You just have to use it the right way.”
He offered the group an analogy.
“Pilots use GPS,” Dr. Crowe said. But airlines “set very high standards when it comes to reliability,” he added. In healthcare, the use of chatbots is “very attractive,” but the same high standards should apply, he said.
“It’s a great thought partner, but it doesn’t replace deep mental expertise,” he said.
As the session ended, the instructors revealed the true cause of the patient’s swollen knee.
It turned out to be a possibility that every group had considered, and that GPT-4 had proposed.
She had Lyme disease.
Olivia Allison contributed reporting.