Ep. #71 - Checking in on Fair Use in the Summer of 2025


Paul, Weiss Waking Up With AI


In this week’s episode of “Paul, Weiss Waking Up With AI,” Katherine Forrest and Anna Gressel unpack the recent Bartz v. Anthropic copyright decision, examining how courts are evaluating fair use in AI training and what this case means for the future of copyright disputes in the AI space.


Episode Speakers

Katherine B. Forrest

Partner

New York

Tel: +1-212-373-3195

kforrest@paulweiss.com

Anna R. Gressel

Partner

New York

Tel: +1-212-373-3388

agressel@paulweiss.com

Episode Transcript

Katherine Forrest: Hello everyone, and welcome to another episode of “Paul, Weiss Waking Up With AI.” I'm Katherine Forrest.

Anna Gressel: And I'm Anna Gressel. And Katherine, I feel like we're on a roll with these summer podcasts, and we're in the same time zone and actually recording in the morning with our coffees, which is super fun. One of us is not in the afternoon or evening somewhere.

Katherine Forrest: It's actually a true waking up with AI. So often it's an afternoon with AI called “Waking Up With AI,” but today it's a real waking up with AI.

Anna Gressel: Totally. And I sometimes tell people—I don't know if you do this, Katherine—that “Waking Up With AI” is supposed to be the length of the cup of coffee that you have. And so I like kind of drink mine as we record. I don't know about you.

Katherine Forrest: Well, I'm afraid that I would sort of “glug, glug, glug,” and it would become an unpleasant sound for the audience. But I do sip from time to time, trying to do it delicately.

Anna Gressel: Well, so let's dive in today because I feel like, despite it being the summer and kind of nice and relaxed, we haven't really taken a vacation from the copyright issues in AI. And if anything, it's really been kind of a hard-hitting summer. There have been two really important decisions lately. And for folks who aren't following closely in the space, there have been the Meta Kadrey summary judgment decision and the Bartz v. Anthropic decision, and both deal with the question of whether using copyrighted works in connection with training AI models is a fair use and therefore protected under copyright law.

Katherine Forrest: Yeah, and we're going to start and actually spend our time today entirely on the Anthropic decision that came out first. And it's really interesting, but it also does emphasize how fact-specific each of these cases are, or each of these cases is. Let me get my grammar right. You see, now I need a swig of coffee. But the Bartz v. Anthropic decision was one of the first judicial decisions, it is one of the first judicial decisions, dealing with whether training LLMs can be considered fair use. You remember, the Ross case was not a generative AI case at all. So this is a training LLM case. And it's a really important moment to sort of stop and look at what is actually catching the attention of judges and decision makers in these areas.

Anna Gressel: And so let's set the stage for the decision. Fairly soon after the release of ChatGPT and other LLMs in late 2022 and 2023, copyright holders and others have been filing lawsuits against AI developers, alleging that their works have been copied without authorization in connection with the training and outputs of LLMs. And the developers have almost universally asserted fair use as a defense.

Katherine Forrest: Right, and what's interesting about the Anthropic case—and we'll talk more about this—is it's not an output case. It's not a case that some of the others are, where there's an allegation in the complaint that the output is actually infringing. So let's do a brief refresher on fair use. First of all, it's important just to sort of note, because it can come up this way, it can come up in cases that fair use is a defense to an established infringement. So you actually have to have some sort of unauthorized copying of one of the rights of a copyright holder, and then fair use becomes a defense to that infringement.

Anna Gressel: Yeah, and fair use is very fact-dependent. I mean, we've seen a lot of different cases with a lot of different fact patterns, but it is actually often decided at the summary judgment stage because it's a mixed question of fact and law.

Katherine Forrest: But not always.

Anna Gressel: Not always, that is true. And the Anthropic case that we'll talk about today is an example of a partial entry of summary judgment, but questions of fact have been reserved by the judge for trial. And so there's going to be at least a partial trial on some of the open factual questions.

Katherine Forrest: Okay, so let's go through a couple of these factors—or all of these factors—and just talk about what they are, and then we'll sort of weave them into this Anthropic case. But the first—what? You got something there?

Anna Gressel: Wait. You mean the fair use factors, right?

Katherine Forrest: What did I say?

Anna Gressel: You just said “the factors.” Do you want to say what they are?

Katherine Forrest: The factors, okay, the fair use. Keep all of this in. To Robert, our producer, keep all of our banter in. This is an important part of our little thing here, so people can see sort of the unscripted nature of it. But yes, the fair use factors. We want to go through these factors so you can understand them, you the audience. The first is what is referred to in the statute as the purpose and the character of the use. And this is where people often think about things that are done in schools and educational uses and parody not being a problem because the purpose and character of the use is of a particular kind, and it's really the commercial or non-commercial nature. That's what's in the statute. But it also now has another element added in, which is the transformative nature of the use, and that was put in with the Supreme Court case called Campbell v. Acuff-Rose. The transformative nature of the use is sort of an added-in element to this first factor.

Anna Gressel: Katherine, one thing that's kind of implicit in there (and, you know, we have a lot of non-lawyer listeners, so I want to pause on this for a moment) is that fair use as a defense is a statutory defense, or it's a defense that is given its life through Section 107 of the Copyright Act. And it has these four factors that we're going to talk through. But something that Katherine mentioned, I think, is really, really important, which is you actually have language in the statute that gives some shape to what those factors are, but there's so much in terms of the case law and the history of how those factors have been interpreted. So when she's talking about transformativeness, what we're really talking about is a judicial doctrine that has expanded on this question of what the purpose and the character of the use is, and that's super interesting. So we follow a lot of the case law. So again, I'm digressing because I know we have a lot of non-legal listeners who are super interested in these copyright questions. But, you know, one of the things that we're going to talk about today is that the transformativeness inquiry is a really important part of that Anthropic decision and really kind of tethers that decision. And then there's a lot that flows from that. So Katherine, should we pick up on factor two now?

Katherine Forrest: Yeah, let's go into that. So the second factor is the nature of the copyrighted work. And that's really whether or not the work is more factual or more creative, because the Copyright Act doesn't actually allow you to protect everything. In other words, you can't protect just an idea. What you protect under the Copyright Act is the expression of the idea. So if you've just got a fact in the world and the fact is just the fact, that's not going to be protected. But the expression, for instance, in a non-fictional work—a biography of so-and-so that's factually correct—that could be protected. They call it the idea-expression dichotomy, and so you've got the expression being protectable and the ideas not. So where a particular work falls, in terms of how much of it is expressive versus how much of it is just an idea or factual in nature, that can actually mean that you either get strong—what they call strong copyright protection—or weaker copyright protection.

Anna Gressel: Yeah, we're not going to talk about this today, but this also relates to a totally separate concept in copyright law, which is whether the work is entitled to be protected by copyright at all. And sometimes there are challenges in copyright infringement cases to even the copyrightability of the underlying works. They might be too factual, for example, or they might have elements that aren't copyrightable, like style. And so that doesn't tend to be as much of an issue in these cases where there's this alleged copying of massive quantities of data from the internet. But when you have cases about specific works, or certain kinds of work, that can be a really important piece of it. Anyhow, we will move on to factor three. I mean, we could talk about these factors all day. But the third factor is the amount and substantiality of the portion of the work used by the infringer. That is, how much is copied.

Katherine Forrest: Right, and so we can pass by that one pretty quickly. It just is what it is. And the fourth factor is another one, like the first factor, which people talk about all the time in these AI copyright cases, and it's a very important factor in the Anthropic case. It's the effect of the use on the potential market for the work. So in other words, people might describe it generically as whether or not you're going to be impacting the market for the work in some way. So it's different than quantifying damages; that's a whole separate inquiry. This is whether or not there's sort of a substitution, if you will, of the use for the work itself or for the market for that work.

Anna Gressel: And none of these factors themselves are dispositive, and I think that's an important point and one that we often see judges coming back to as they decide these decisions. Rather, they're weighed against each other, although factors one and four are typically viewed as the most important factors in the inquiry.

Katherine Forrest: I think that your dog should bark in every single episode, particularly at important moments. No, no, no, no, no, he's emphasizing the first and fourth factors being very, very important. That's okay.

Anna Gressel: For some reason, Katherine, I thought it was your dog and I was like, your dog sounds a lot like mine.

Katherine Forrest: No, my dog is—I've got a big dog. He does guttural “woof woofs” like he's going to rip your throat out, even though he's a golden retriever.

Anna Gressel: Why don’t you give me one second to call her over—she’s not going to stop. Hold on.

Katherine Forrest: Okay, well, you just go off and do your thing while I just sort of continue to riff here. But the Anthropic case is pending in the Northern District of California before Judge Alsup. It was brought by a group of authors, and among the claims was that Anthropic's Claude model had directly infringed their works, and there were other claims as well. From the outset, Judge Alsup refers to the fact that while the case began with one model, there have been successive versions of the Claude model released. He said it was important to focus on the fact that Anthropic (and this is now Judge Alsup's concept of what's important to focus on) was not just copying content to train its models, but was “assembling these copies into a central library of its own.” So that's really an important piece of this entire case, and it's not a piece of the other cases, or at least a number of the other cases: Anthropic was both training an LLM and keeping a permanent repository of copies to be used, as the facts had come out at least for purposes of summary judgment, as a general resource even after works had been used for training. And maybe there were some works in that general resource that were never going to be used for training. That collection of works or books, in digitized versions, had been taken from various places. When I say taken, I'm not saying taken without pay. Some copies had been obtained through payment, and separate other copies had been obtained from the internet without payment or permission. He uses the word a lot: pirated.

Anna Gressel: Now, pirated is a pretty loaded term because it has a lot of history in copyright law.

Katherine Forrest: It sure does. You really get a sense of where Judge Alsup is going from early on in the opinion when he also states that Anthropic preferred to “steal” certain copies because of the logistics of acquisition. And he was basing that on an exhibit that was attached to some of the summary judgment papers.

Anna Gressel: And, you know, I think it's really interesting. The bottom line for Judge Alsup is really that he treats the purchased text differently from the non-purchased text. He focuses analytically on the use to which the works were put and separates out those uses. And that is, in many ways, kind of a natural extension for our copyright folks on the line of the Warhol case. That's a recent Supreme Court case focusing very heavily on specific and distinct uses. And what Judge Alsup says in interpreting that is that the training of the LLMs is one such use, but a different use was creating a permanent library of the works that could be used for multiple purposes and kind of lived on in perpetuity within the organization. And so let's focus in our discussion, as he did, first on the use of the works for training the LLMs.

Katherine Forrest: Right. So for the first factor, which is the purpose and character of the use, and here we’re talking about the use for training, Judge Alsup walks through the process of how the works were chosen and how they’re put into different mixes for different types of training. But he accepts a fact that is really contested in a lot of the other AI copyright cases. He accepts that the works are essentially memorized, as he says. And he says that they're compressed and that the mapping of the works in the model was so complete that it was akin to memorization. Now, that is not an uncontested fact in a lot of cases. In fact, a lot of folks would tell you that there's research that says that memorization is essentially something that's been eliminated. But in any event, let's put that to the side. In this case, memorization was not a contested fact. And he says that the outputs of the plaintiffs' works, that is, the regurgitation of works, were not an issue in the case. So while there was memorization, we're not talking about a problem in the case being flagged as regurgitation of works at the output end, word for word.

Anna Gressel: Yeah, and I mean, I do think it's worth pausing on the fact that memorization is a contested fact in many other cases. And there are cases that do have output-related claims. So they're actually alleging that the outputs of the model at issue in that case, in whatever case, infringed on specific works owned by the copyright owners. It's also just one digression, Katherine—I think it's worth noting that this memorization question is super contested in the privacy space as well, because there is this question of whether personal information itself becomes encoded in a model. And so there are a lot of different privacy regulators grappling with that same issue in a different way than we're seeing in the copyright cases.

Katherine Forrest: Right, that's why it's so important to remember that every single case stands on its own with its own set of strategic choices and the factual records that get developed—all kinds of things. But all right, so here he deals with this set of really factual issues and then he gets to the first factor, and then he pretty much deals with it pretty easily.

Anna Gressel: Yeah, and he does say that the fact that there is memorization is not a problem for training because authors can't rightly exclude anyone from using their works for training or learning, and that at least people do not need to pay for a book each time they may refer to what they've learned from it. So he's using kind of learning as an analogy for the training process, like the human process of learning.

Katherine Forrest: Right, and then he says that when you use copyrighted works to train, that this is a “quintessentially transformative” use. That's his phrase, “quintessentially transformative.” So it's a pretty strong set of words. He doesn't seem to struggle with that at all. And he says that Anthropic's LLMs were not designed for racing ahead to supplant—who has a landline anymore, right? I have a landline because I live in Maine, but nobody else has a landline.

Anna Gressel: No one else can see the surprise on your face right now, to be called on your landline.

Katherine Forrest: Right. I didn't even know it rang. Who knew it even rang? But in any event, the judge does find that the LLMs, in terms of training, were not racing ahead to supplant plaintiffs' works, but they were being used to create something different.

Anna Gressel: Yeah, and then he turns to this other use of the works by Anthropic. And this is the one that he has the most problem with. This is the one we mentioned before: the use of downloaded copies or copies of physical books to build a centralized repository or library. And the judge in that order says, and I'm quoting here, “Anthropic seems to believe that because some of the works it copied were used in training LLMs, Anthropic was entitled to take for free all of the works in the world and keep them without further accounting.” And he seems to have a real problem with this.

Katherine Forrest: Right, and it leads to a discussion of the distinction between the purchase and the non-purchase of the works. For the purchased works, where a physical copy of a book has its binding taken off and is digitized, he really does focus on the fact that that work was paid for. But for the non-paid-for and non-permissioned works, he says that didn't occur. And while he ultimately finds that the change in format, from print acquired through purchase to digital, is a transformative use, he finds the fact that there are non-purchased works to be a problem, as we're going to see.

Anna Gressel: Yeah, and just pausing on the purchased works. One of the facts that he really focuses on there is actually that the purchased work is then destroyed. And so you're not really adding to the number of purchased works in the universe; you're just changing their format from print to digital, and then you're kind of like tearing up the print version of the book. That seems to be quite persuasive to him. And then when he looks at the non-paid copies, of which there were apparently seven million, he says there was an intermediate infringement when the copy was taken from where it resided on the internet. He calls this “irredeemably infringing.” That's also a quote from him, which is interesting wording because it kind of predetermines the fair use question in a way. And he says that he need not decide the case on the basis of the works being used only for training because they have been placed in the permanent library. So he basically puts aside the issue of whether those works were used for training, and instead he focuses again on this permanent library use and the fact that not every book was paid for that was used within the library.

Katherine Forrest: Yeah, and then he goes through a long discussion of intermediate copying, which we're not going to go into now, but actually is an interesting sort of copyright doctrine that's been coming into play in these AI copyright cases.

Anna Gressel: Yeah, I think it's probably one of the most important pieces of this, but also very implicit in how you might read it. And so, you know, Katherine, I think you and I could probably have a whole discussion of intermediate copying, kind of what's said and not said about it in the order. But in terms of some of the other facts, the nature of the work—the court found that the works at issue, both fiction and nonfiction, were highly expressive and at the core of copyright protection. Remember, that's one of the factors we discussed before. And the court says that this factor favors the plaintiffs, but the court gave it relatively little weight, noting that the transformative nature of the use was far more significant in the overall analysis. And that's often quite typical of how the fair use factors are weighed.

Katherine Forrest: Right, and the court then observed that the second factor's main function is to help assess the other factors, especially the amount and substantiality of the portion used and the relationship between the original and the secondary uses.

Anna Gressel: And on the third factor, the amount and substantiality of the portion used, the court acknowledged that Anthropic copied the entire works, which typically does weigh against fair use. However, the judge found that copying the whole work was reasonably necessary for the transformative purpose of training an LLM. And the opinion noted, and I'm quoting again here, “the volume of text required to train an LLM is monumental because using so many works was reasonably necessary; using any one work for actually training LLMs was about as reasonable as the next.”

Katherine Forrest: Right, and then the court explained that the partial copying was not sufficient for this technology, and that the copying was not for the purpose of substituting for the original work in the market, but to enable the LLM again to learn, as we've talked about before.

Anna Gressel: Yeah, and interestingly, the judge also addressed the argument that Anthropic could have used other books or no books at all, stating that “reasonably necessary” does not mean “strictly necessary.” And the court was satisfied that the use of the plaintiffs' works was justified by the technical requirements of LLM training.

Katherine Forrest: All right, so let's go on to the fourth factor, because that's really where a lot of the rubber meets the road, since for the first factor, with transformative, he gets over that pretty quickly. But that's the effect of the use in the market. And here, in this case, it was the most nuanced. And the court found that the use of the works for training did not displace the demand for the original works. The judge analogized this again to training schoolchildren to write well. And so he's back to that learning concept and suggesting that the risk of market harm was speculative in this context.

Anna Gressel: Yeah, and this is really important. So the court explicitly rejects the idea that authors are entitled to a market for licensing their works for AI training, stating that such a market for that use is not one that the Copyright Act entitles the authors to exploit.

Katherine Forrest: Right, and so let's put a pin in this for another time, because if people recall the Ross case, which was not a generative AI case, it did have a line in there that found that there might be a market for licensing training AI tools. I've forgotten the exact language, but it's in tension a little bit with what's going on here.

Anna Gressel: Right, and here that is not a recognized market. The judge emphasized that the act is designed to promote the progress of science and the arts—that's a kind of constitutional protection that the Copyright Act gives life to—and not to protect authors from competition or to guarantee them new licensing streams for every conceivable use.

Katherine Forrest: Right, and this is where then the outputs come back into the analysis, because the court did note that if the case were about AI outputs, such as generating new works that directly compete with the originals, his particular analysis could be different. Now when he says that, he leaves open the possibility that outputs, which if they were substantially similar to the original work, could raise new questions relating to this fourth factor.

Anna Gressel: Yep, and the portion of the opinion where this is said is what we call “dicta.” It's unnecessary to the outcome of the case, and so it's given less weight by other courts and shouldn't really have precedential value. It wasn't about the facts in front of that court or the core of its decision.

Katherine Forrest: Right, so output, the output language is dicta. The court then goes back to the creation of that central library, and that's where the non-paid-for versions really get stuck, because he finds that you can't just not pay for copyrighted work and then copy and keep them in a general purpose library. And that, in many ways, is really the heart of this case.

Anna Gressel: Yeah, and the court there held that the storage of these not-paid-for books was not a fair use. And again, remember, this is a use that was for the central repository and not for further downstream use like AI training.

Katherine Forrest: Right, and he follows this almost immediately with a statement that's on [page] 28 of his printed-out original version, that: “the copies used to train specific LLMs did not and will not replace demand for copies of authors' works, but the library is different.” And so, as we've said now, the opinion ultimately comes down to this library concept.

Anna Gressel: So the court states that the copies used to build that central library, the copies that weren't paid for (so, you know, the downloaded, unpaid-for copies), did displace the demand for the authors' books, copy for copy. The court is basically reasoning that Anthropic could have gone out and bought the hard copies, and could have continued to do that, so downloading those and not paying for them actually meant that there was revenue diverted away from those authors that could have gone to them. So that is, in the court's view, a harm under factor four of the various factors.

Katherine Forrest: right, and then the upshot, I think, of all of this is that you have to read the Anthropic decision. I think you need to sort your way through it and be very careful about the language, but also be very careful about how factually particularized this case is to the record that was built here and the particular strategic choices that were made here. There are a lot of these AI copyright cases still out there and yet to be decided. And there are going to be a lot of differing and very interesting, I think, strategic choices that will be made and fact scenarios built through discovery. So we'll have to see how it all comes out. But we've run over, Anna, by golly. I think this is more than a cup of coffee. This is a cup of coffee and a half.

Anna Gressel: Definitely, but probably well worth the time for us to spend with you folks today taking you through this decision. And with that, I think we will sign off, Katherine, and leave our next copyright discussion for another day. I'm Anna Gressel.

Katherine Forrest: I'm Katherine Forrest.

Anna Gressel: Thanks for joining us, and if you want us to cover a specific topic next time, drop us a line. And until then, like and subscribe to the podcast.
