Influential Actors

Sam Altman

Altman's statement indicates that training on The New York Times' data is not a priority for OpenAI, and that their method is robust to the removal of any one source. While this may partly be a negotiating tactic, it seems plausible that NYT's material indeed constitutes only a small fraction of the high-quality data available to OpenAI.
Observation:
Altman stated that OpenAI is "open to training on The New York Times, but it’s not our priority", adding that "we actually don’t need to train on their data."
Source: https://www.cnbc.com/2024/01/18/openai-ceo-on-nyt-lawsuit-ai-models-dont-need-publishers-data-.html
The New York Times

The New York Times sued OpenAI for copyright infringement while the pair were negotiating a deal where NYT content would appear in ChatGPT. As of early 2023, several media companies have successfully sought or are currently negotiating licensing deals to provide training data to OpenAI, indicating that this was an option for NYT as well. The copyright lawsuit may be a tactic to strengthen NYT's position during licensing negotiations, but, given OpenAI's likely resilience to the loss of any one data source, NYT may not expect to be able to dramatically increase the value of any deal this way.
Observation:
In January 2024, OpenAI were reported to be in licensing negotiations with several prominent media companies
Source: https://www.bloomberg.com/news/articles/2024-01-04/openai-in-talks-with-dozens-of-publishers-to-license-content?embedded-checkout=true
Observation:
In December 202, The New York Times sued OpenAI for copyright infringement
Source: https://www.theverge.com/2023/12/27/24016212/new-york-times-openai-microsoft-lawsuit-copyright-infringement
Observation:
The New York Times sued OpenAI for copyright infringement while the pair were negotiating a deal where NYT content would appear in ChatGPT
Source: https://www.cnbc.com/2024/01/18/openai-ceo-on-nyt-lawsuit-ai-models-dont-need-publishers-data-.html

Key Findings

Copyright Infringement Claims


The New York Times has filed a lawsuit against OpenAI and Microsoft, asserting that OpenAI's AI models, particularly ChatGPT, were trained on and can reproduce content from The New York Times, including material behind the Times' paywall. The Times claims that OpenAI's products were constructed by extensively copying its content, which has led to the unauthorized access and use of copyrighted material. The Times' legal team has presented substantial evidence to support these allegations. OpenAI has not publicly disclosed the specific sources of its training data, but the lawsuit contends that it includes The New York Times' copyrighted works. The lawsuit also alleges that the AI tools developed by OpenAI and Microsoft, using The New York Times' content, have the effect of diverting traffic from the Times' website.
Source Facts
  • The New York Times claims that OpenAI's product, ChatGPT, was used to bypass paywalls and access copyrighted content from The New York Times.arstechnica.com | Mar 12, 2024
  • The New York Times alleges that OpenAI's products were built by copying The New York Times's content on an unprecedented scale.arstechnica.com | Mar 12, 2024
  • OpenAI has not publicly disclosed the makeup of the datasets used to train its AI models, which The New York Times claims includes copyrighted content from The New York Times.arstechnica.com | Mar 12, 2024
  • The New York Times is suing OpenAI and Microsoft for alleged copyright infringement, claiming that OpenAI used its content to create artificial intelligence tools that divert traffic from the Times website.www.wsj.com | Feb 28, 2024
  • The New York Times accused OpenAI and its partner Microsoft of infringing on its copyrights by using millions of its articles to train A.I. technologies.www.nytimes.com | Feb 27, 2024
  • The complaint alleges that OpenAI's GPT models copied and ingested millions of New York Times works for training purposes, and that the models can generate content that is normally protected by The New York Times' paywall.www.law.com | Feb 7, 2024
  • The New York Times' legal team has provided detailed and extensive evidence that AI models were trained using articles from the newspaper.huyong.blog.caixin.com | Jan 25, 2024
  • Fair Use Defense


    Microsoft and OpenAI are invoking fair use as a defense in the lawsuit, arguing that the use of New York Times articles to train AI models is transformative and does not supplant the market for the original articles, similar to legal precedents set by technologies like videocassette recorders. They contend that large language models, such as OpenAI's ChatGPT, have substantial lawful use, which has historically been a basis for dismissing copyright claims. Additionally, they claim that ChatGPT is not a substitute for a New York Times subscription. Microsoft's initial motion to dismiss does not address direct and vicarious infringement claims, leaving those for later litigation with a fair-use defense.
    Source Facts
  • OpenAI's defense to the allegations of copyright infringement is that the use of articles in the training of models can be seen as transformative and should be allowed under fair use.arxiv.org | Mar 22, 2024
  • Microsoft's legal argument includes the assertion that large language models, like OpenAI's, are capable of substantial lawful use, which historically has been a basis for dismissing similar copyright infringement claims.www.theverge.com | Mar 5, 2024
  • Microsoft's motion to dismiss the lawsuit does not include the direct and vicarious infringement claims, which Microsoft plans to fight later on in litigation with a fair-use defense.arstechnica.com | Mar 5, 2024
  • Microsoft's motion argues that large language models (L.L.M.s) such as those used by OpenAI's ChatGPT do not supplant the market for news articles and are comparable to technologies like videocassette recorders, which were found to be allowed under copyright law.www.nytimes.com | Mar 4, 2024
  • OpenAI's motion argues that its online chatbot, ChatGPT, is not a substitute for a New York Times subscription.www.nytimes.com | Feb 27, 2024
  • Market Impact and Substitutability


    Microsoft contends that OpenAI's L.L.M.s, like ChatGPT, do not replace the market for news articles and compares them to technologies historically allowed under copyright law. OpenAI argues that ChatGPT is not a substitute for a New York Times subscription. The New York Times claims that OpenAI's use of its content could devalue its offerings by providing access to the same content without subscription. These arguments are central to the lawsuit's resolution on whether OpenAI can continue using New York Times articles for AI model training.
    Source Facts
  • Microsoft's motion argues that large language models (L.L.M.s) such as those used by OpenAI's ChatGPT do not supplant the market for news articles and are comparable to technologies like videocassette recorders, which were found to be allowed under copyright law.www.nytimes.com | Mar 4, 2024
  • OpenAI's motion argues that its online chatbot, ChatGPT, is not a substitute for a New York Times subscription.www.nytimes.com | Feb 27, 2024
  • The New York Times argues that the use of its content by OpenAI and Microsoft could have a significant negative impact on the value of its content, as readers could access the same content through OpenAI without paying for a subscription to The New York Times.huyong.blog.caixin.com | Jan 25, 2024
  • Allegations of Hacking and Misuse


    OpenAI has filed a motion to dismiss The New York Times' lawsuit, contending that the Times manipulated ChatGPT into reproducing copyrighted material. OpenAI also accused the Times of potentially hacking ChatGPT to achieve this, which the Times has denied. The lawsuit alleges that OpenAI's AI tools, including ChatGPT, have been used to infringe on the Times' copyrights and divert web traffic. The resolution of these claims will be critical in determining whether OpenAI can continue to use New York Times articles for AI model training.
    Source Facts
  • The New York Times denied an OpenAI claim that the newspaper improperly used OpenAI products to create 'highly anomalous results' as part of its lawsuit against the AI startup, as reported by SeekingAlpha on March 13, 2024.sustainabletechpartner.com | Mar 13, 2024
  • OpenAI claimed in February 2024 that The New York Times must have used somebody to 'hack' ChatGPT to make it reproduce NYT content and denied that ChatGPT could be used to dodge the NYT paywall.www.theregister.com | Mar 13, 2024
  • OpenAI has filed its own motion to dismiss the lawsuit brought by the New York Times, claiming that the Times 'tricked' ChatGPT into directly reproducing copyrighted material from the publication.www.theverge.com | Mar 5, 2024
  • The New York Times is suing OpenAI and Microsoft for alleged copyright infringement, claiming that OpenAI used its content to create artificial intelligence tools that divert traffic from the Times website.www.wsj.com | Feb 28, 2024
  • Negotiations and Licensing

    Source Facts
  • The New York Times and OpenAI had engaged in negotiations about licensing the data from The New York Times articles for training, but did not come to an agreement.arxiv.org | Mar 22, 2024
  • Prior to the lawsuit, The New York Times engaged in negotiations with OpenAI for several months to reach a paid licensing agreement but failed to do so.huyong.blog.caixin.com | Jan 25, 2024
  • Historical background


    The New York Times and other media entities have sued OpenAI for copyright infringement, seeking damages and the removal of copyrighted content from AI training sets, invoking the DMCA. OpenAI has moved to dismiss parts of the NYT lawsuit. Historical legal precedents show that fair use determinations are complex and inconsistent, and handling of evidence can significantly impact case outcomes.
    Source Facts
  • The New York Times filed a lawsuit in December against OpenAI and Microsoft on copyright infringement grounds.www.nytimes.com | Feb 28, 2024
  • OpenAI filed a motion in court to dismiss key elements of The New York Times's lawsuit.www.nytimes.com | Feb 28, 2024
  • Raw Story, AlterNet, and The Intercept sued OpenAI for copyright infringement in a New York federal court.www.nytimes.com | Feb 28, 2024
  • The lawsuits seek damages of at least $2,500 per violation and require OpenAI to remove all copyrighted articles from its data training sets.www.nytimes.com | Feb 28, 2024
  • The Digital Millennium Copyright Act was cited in the lawsuits against OpenAI for copyright infringement.www.nytimes.com | Feb 28, 2024
  • Big media companies, like The New York Times and Getty Images, have filed lawsuits against AI companies for copyright infringement related to the use of copyrighted content for training AI models.www.theverge.com | Feb 15, 2024
  • The legal system's fair use doctrine allows for certain kinds of copies to be considered legal under the Copyright Act, but the determination is based on a four-factor test and is not consistent across different courts.www.theverge.com | Feb 15, 2024
  • In the case Rossbach v. Montefiore Med. Ctr., 2021 WL 3421569 (S.D.N.Y. Aug. 5, 2021), the defense's handling of suspected fabricated evidence led to severe sanctions against the plaintiff and her counsel, including a dismissal with prejudice.www.law.com | Oct 4, 2021
  • The Supreme Court of the United States ruled in McDonough v. Smith that the statute of limitations for a 42 U.S.C. 1983 fabricated evidence claim begins when the criminal proceedings against the plaintiff terminate in their favor, such as with an acquittal.supreme.justia.com | Jun 20, 2019
  • Weighting reasoning: Forecast only accounts for cases that were settled in court
    Weighting reasoning: This reference class is robust in that it captures most if not all lawsuits brought against tech companies for AI data usage
    Weighting reasoning: Captures settlements and dismissed lawsuits in addition to cases resolved in court
    Weighting reasoning: Forecast only accounts for cases that were settled in court
    Weighting reasoning: The media reference class is broad with few instances referring to news or journalism organizations
    Historical forecast: N/A
    FUTURESEARCH Forecast

    65% probability
    2 to 1 odds
    I began with a historical forecast of 42% and then revised it to 65%, for three reasons. First, OpenAI's defense emphasizes transformative use, drawing parallels to historical precedents that could strengthen their case. Second, if the New York Times' hacking allegations are unproven, it may weaken their position. Third, there are several independent ways the lawsuit could resolve in OpenAI's favor: the New York Times could drop the case, a licensing agreement could be reached, a court settlement could be made, or OpenAI could win in court.