Reddit sued Anthropic on Wednesday, accusing the artificial intelligence start-up of unlawfully using the data of Reddit’s more than 100 million daily users to train its A.I. systems.
The lawsuit, which was filed in the Superior Court of California in San Francisco, claimed that Anthropic had gained or tried to gain access to Reddit data more than 100,000 times, in breach of the online platform’s content policies. Anthropic also declined to enter into a licensing agreement for the data, the lawsuit said, and unjustly enriched itself at Reddit’s expense.
“We will not tolerate profit-seeking entities like Anthropic commercially exploiting Reddit content for billions of dollars without any return for redditors or respect for their privacy,” Ben Lee, Reddit’s chief legal officer, said in a statement. “A.I. companies should not be allowed to scrape information and content from people without clear limitations on how they can use that data.”
A spokeswoman for Anthropic did not immediately provide a comment.
The lawsuit was the latest clash over the use of digital data by A.I. companies amid a heated race to develop the technology. For years, A.I. companies have gobbled up as much data across the internet as possible to fine-tune their systems, which rely on information to improve the responses they generate. But those data sources are quickly drying up, as companies lock down more of their data to keep it from being used without permission.
Reddit, which is 20 years old and went public last year, is among a generation of social media companies with a wealth of user-generated conversational data. The site is a kind of social message board, where people can hold discussions on any number of topics, from dogs to television shows to cryptocurrency.
In recent years, Reddit executives began realizing just how valuable the company’s data was to the rest of the industry. Steve Huffman, Reddit’s chief executive, began talking to companies like Google and OpenAI about potential licensing deals. Reddit eventually reached deals with both companies, giving them access, for a fee, to Reddit’s public conversation data to train their A.I. systems.