ChatGPT fools scientists roughly 1/3 of the time


Ever since OpenAI released its chatbot ChatGPT in November, it’s been making waves and attracting a huge number of users. (It must be a huge number. I haven’t been able to log in for a week because it’s always over capacity.) Some researchers have been putting the large language model through its paces to see how effective, or potentially dangerous, it is. In one experiment, researchers asked a group of scientists to evaluate a set of research paper abstracts. Some were written by scientists and postgraduate research students, while others were generated by ChatGPT after it was prompted to produce abstracts on a variety of scientific subjects. The researchers aren’t claiming that the chatbot is better than, or even as good as, a top human scientist, but the journal Nature reports that the scientists were unable to distinguish ChatGPT’s work from that of humans roughly one-third of the time.


An artificial-intelligence (AI) chatbot can write such convincing fake research-paper abstracts that scientists are often unable to spot them, according to a preprint posted on the bioRxiv server in late December. Researchers are divided over the implications for science.

“I am very worried,” says Sandra Wachter, who studies technology and regulation at the University of Oxford, UK, and was not involved in the research. “If we’re now in a situation where the experts are not able to determine what’s true or not, we lose the middleman that we desperately need to guide us through complicated topics,” she adds.

The chatbot, ChatGPT, creates realistic and intelligent-sounding text in response to user prompts. It is a ‘large language model’, a system based on neural networks that learn to perform a task by digesting huge amounts of existing human-generated text.
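To make that a little more concrete, here’s a minimal sketch of the idea in Python, using GPT-2, a small, publicly available predecessor model, as a stand-in. ChatGPT itself is far larger and its internals aren’t public, so treat this as an illustration of the technique, not a recreation of the real thing:

from transformers import pipeline  # Hugging Face library

# Load a small text-generation model. GPT-2 is an older, much smaller
# cousin of the model behind ChatGPT, used here purely for illustration.
generator = pipeline("text-generation", model="gpt2")

prompt = "Background: We evaluated the efficacy of a new treatment for"
result = generator(prompt, max_new_tokens=50, do_sample=True)

# The model extends the prompt with statistically likely text, one token
# at a time. That's the whole trick: prediction, not understanding.
print(result[0]["generated_text"])

However humble the output of a small model like this, the underlying mechanism is the same one that produced the abstracts in the study.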

The abstracts were all based on medical research reports taken from a selection of top journals, including the New England Journal of Medicine. The scientists participating in the test didn’t just read the abstracts and offer an opinion on the quality of the work. They also ran the abstracts through a plagiarism detector and an AI-output detector. (Not that I was aware we had such things.)
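For the curious, one common approach to AI-output detection is to measure how “predictable” a passage looks to a language model, since machine-generated text tends to be more statistically regular than human prose. The study doesn’t spell out how its detector works, so the following is only a rough sketch of that general technique, again with GPT-2 as the scoring model and an arbitrary, uncalibrated threshold:

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # How "surprised" GPT-2 is by the text; lower = more predictable.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

THRESHOLD = 30.0  # illustrative cutoff only; a real detector is calibrated

abstract = "We conducted a randomized trial of drug X in 200 patients ..."
score = perplexity(abstract)
verdict = "possibly AI-generated" if score < THRESHOLD else "likely human"
print(f"perplexity = {score:.1f} -> {verdict}")

Real detectors are more sophisticated than this, but the fact that they are, at bottom, one model guessing about another model’s output helps explain why they can be fooled.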

So the main point here is that ChatGPT isn’t just good enough to fool human experts. It also regularly (though not consistently) fooled the software designed to identify AI-generated text and work plagiarized from human authors. That is concerning for a couple of reasons. We’re already seeing some schools block access to ChatGPT on their networks and devices over concerns about student cheating. Universities rely on detection software to catch cheaters, but it looks like ChatGPT is quickly approaching the point where those tools will be useless.


ChatGPT does still make mistakes at times, to be sure. A friend of mine recently asked it to summarize the career of a relatively famous person. It produced four paragraphs of perfectly valid, applicable information composed in quite a professional style. It then added a fifth paragraph containing a blatantly, provably false claim. My friend, being something of an expert in that field, picked up on it immediately, but many people might not have.

To be clear, ChatGPT wasn’t trying to “sneak one past” my friend. It doesn’t “know” that it’s putting out false or inaccurate information. It’s simply predicting plausible patterns of words based on the massive body of text it was trained on. Somewhere in that training data was an article containing some incorrect information, and the model wove that into the resulting output. It’s garbage in, garbage out, as with all things in computing.

These types of errors should concern publishers as well. Some online publishers are already putting out brief news summaries generated by AI in this fashion, with only light checking by a human editor. This approach lets them produce more content (and advertising revenue) faster without having to pay a human writer. Yes, humans make errors sometimes, and that’s why we have editors. But ChatGPT gets things wrong as well, and if its output isn’t checked as closely as a human writer’s, on the assumption that it won’t make mistakes, the quality of what gets published will eventually suffer.


Nobody seems to know what to do about this, however. It appears to be too late to put this particular genie back in the bottle. The chatbots have arrived, for better or worse, and they will likely only become more ubiquitous over time.
