A new study showed people real restaurant reviews and ones produced by A.I. They couldn’t tell the difference.
The White Clam Pizza at Frank Pepe Pizzeria Napoletana in New Haven, Conn., is a revelation. The crust, kissed by the intense heat of the coal-fired oven, achieves a perfect balance of crispness and chew. Topped with freshly shucked clams, garlic, oregano and a dusting of grated cheese, it is a testament to the magic that simple, high-quality ingredients can conjure.
Sound like me? It’s not. The entire paragraph, except the pizzeria’s name and the city, was generated by GPT-4 in response to a simple prompt asking for a restaurant critique in the style of Pete Wells.
I have a few quibbles. I would never pronounce any food a revelation, or describe heat as a kiss. I don’t believe in magic, and rarely call anything perfect without using “nearly” or some other hedge. But these lazy descriptors are so common in food writing that I imagine many readers barely notice them. I’m unusually attuned to them because whenever I commit a cliché in my copy, I get boxed on the ears by my editor.
He wouldn’t be fooled by the counterfeit Pete. Neither would I. But as much as it pains me to admit, I’d guess that many people would say it’s a four-star fake.
The person responsible for Phony Me is Balazs Kovacs, a professor of organizational behavior at Yale School of Management. In a recent study, he fed a large batch of Yelp reviews to GPT-4, the technology behind ChatGPT, and asked it to imitate them. His test subjects — people — could not tell the difference between genuine reviews and those churned out by artificial intelligence. In fact, they were more likely to think the A.I. reviews were real. (The phenomenon of computer-generated fakes that are more convincing than the real thing is so well known that there’s a name for it: A.I. hyperrealism.)
Dr. Kovacs’s study belongs to a growing body of research suggesting that the latest versions of generative A.I. can pass the Turing test, a scientifically fuzzy but culturally resonant standard. When a computer can dupe us into believing that language it spits out was written by a human, we say it has passed the Turing test.