Reddit Sues Perplexity Over Alleged Data Scraping

Reddit sues Perplexity for allegedly scraping user content to power AI models, highlighting tensions over data ownership and privacy.

Reddit Sues Perplexity and Data Scrapers Over Alleged Industrial-Scale Data Theft

Reddit has launched a high-profile lawsuit against the AI answer engine company Perplexity and several data scraping service providers, accusing them of illegally harvesting vast quantities of Reddit user content to power AI models without permission. The legal action, filed in Manhattan federal court in October 2025, highlights the escalating conflict between AI companies hungry for data and websites seeking to protect their intellectual property and user privacy.

Background: The Conflict Between AI Companies and Content Platforms

Reddit, one of the world's largest social media platforms and known for its active community discussions, alleges that Perplexity and the other defendants engaged in "industrial-scale" scraping of its user posts and comments. The lawsuit claims these companies circumvented Reddit’s digital defenses, including its robots.txt file (a standard text file that tells automated crawlers which parts of a site they may access). It also claims they obtained Reddit content through indirect channels, chiefly by mining Google search results in which Reddit posts are indexed.
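For readers unfamiliar with robots.txt, the check a well-behaved crawler performs can be sketched in a few lines using Python's standard-library `urllib.robotparser` module. This is a minimal illustration only. The rules, the `example.com` URL, and the crawler name `ExampleBot` are hypothetical and do not reflect Reddit's actual file or any defendant's software. The sketch shows that the crawler runs the check on itself, so robots.txt states a site's wishes rather than technically enforcing them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; this is NOT Reddit's real file.
# It allows Google's crawler everywhere and asks all other crawlers to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(user_agent, url)


# A compliant crawler calls a check like this before each request.
# Nothing in robots.txt physically stops a crawler that skips the check.
print(is_allowed("Googlebot", "https://www.example.com/r/example/"))
print(is_allowed("ExampleBot", "https://www.example.com/r/example/"))
```

Because the check happens only when a crawler requests pages from the site itself, collecting a site's content from a third party's search results, as Reddit alleges, would never consult that site's robots.txt at all.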

Reddit’s complaint likens the accused scraping firms to “would-be bank robbers” who, unable to break into Reddit’s “vault” (the site’s direct database), instead target the “armored truck”: Google’s search engine results, which carry Reddit content. Reddit asserts that this circumvention violates its terms of service and its users’ rights, because it has not authorized these companies to use its data at this scale or in this manner.

Key Allegations and Evidence

The lawsuit states that Perplexity received a cease-and-desist letter from Reddit in May 2024 demanding that it stop scraping unless the two companies reached a licensing agreement, but escalated its data extraction instead. As evidence, Reddit says it created a test post that only Google’s search engine could crawl and that scrapers accessing Reddit directly could not reach. Within hours, Perplexity reportedly produced content from that post, which Reddit argues shows that Perplexity was obtaining Reddit content from Google’s search results rather than from Reddit itself.

Reddit claims Perplexity’s AI tools used scraped Reddit comments to generate answers to user queries, and that Perplexity’s citations of Reddit content rose roughly fortyfold after the cease-and-desist letter. The suit accuses Perplexity of deploying “increasingly devious schemes” to get around Reddit’s security measures and policies, including relying on the third-party scraping services SerpApi, Oxylabs, and AWMProxy, which are also named as defendants.

Perplexity’s Response

Perplexity has denied wrongdoing. Jesse Dwyer, the company’s head of communications, told The Verge that Perplexity does not train its AI models on Reddit content and that it respects Reddit’s robots.txt directives. The company says it provides factual answers with accurate AI and supports open, fair access to public knowledge. Perplexity said it had not yet received the formal legal filing when it made its public statements and vowed to "fight vigorously" against what it views as threats to openness and the public interest.

In a post on Reddit, Perplexity added that it does not train models on Reddit posts but instead summarizes and cites content when generating answers, a distinction it says is central to its approach and its compliance.

Broader Industry Context

This lawsuit is part of a growing wave of legal challenges from content platforms against AI companies accused of scraping large amounts of user-generated data without consent. Major AI firms such as Google and OpenAI reportedly have licensing agreements with Reddit, while smaller or newer players such as Perplexity have not reached comparable deals, which has intensified tensions.

Reddit’s aggressive legal stance highlights the broader clash over data ownership, user privacy, and the ethics of using publicly accessible content to train AI systems. The case could set important precedents for how AI companies source training data and respect content platforms’ digital rights in the future.

Implications and Outlook

  • For AI companies: The lawsuit signals increased scrutiny and legal risk associated with scraping user-generated content without explicit permission. AI firms may need to negotiate licensing deals or find alternative training data sources.

  • For content platforms: Reddit’s actions demonstrate a willingness to aggressively enforce data protections, potentially inspiring other platforms to pursue similar legal actions to safeguard their data and user privacy.

  • For users: The case raises questions about how user-generated content is used by AI and whether users have control or compensation rights when their posts fuel commercial AI products.

The legal battle between Reddit and Perplexity will be closely watched as it unfolds, potentially shaping the future of AI training practices and digital content rights.


This case underscores the complex dynamics at the intersection of AI innovation and digital content ownership, with significant consequences for industry standards and user rights moving forward.

Tags

Reddit, Perplexity, data scraping, AI models, legal action, user privacy, intellectual property

Published on October 24, 2025 at 09:01 PM UTC
