• TehPers@beehaw.org
    English
    arrow-up
    18
    arrow-down
    2
    ·
    1 day ago

    The books were purchased and destroyed to digitize them. There is nothing wrong with digitizing a work. The books were destroyed because duplicating a work without permission is illegal, but destroying the original means that there is only one copy in the end still.

    The LLM training is the problem. This is not.

    • blindsight@beehaw.org
      arrow-up
      1
      ·
      edit-2
      1 hour ago

      Hit the nail on the head.

      Millions and millions of print books are destroyed all the time, and very rarely is anything of value lost. Libraries, thrift stores, and used book stores get inundated with thousands of donated books, most of which nobody wants. Unless you, personally, are going to take on sorting, transporting, and storing dozens of duplicate copies of books in poor condition, and have some purpose for them (presumably?), then get off your high horse about the destruction of bulk-purchased used books.

      Individual copies of mass-published books are not precious. Only rare books are important for preservation. And, even then, digital copies are much more practical for long-term storage than physical books. Anna’s Archive’s preservation project as a shadow library is only possible because data storage is very cheap, infinitely replicable, and practically free to transport.

    • Vodulas [they/them]@beehaw.org
      arrow-up
      5
      ·
      13 hours ago

      > The books were destroyed because duplicating a work without permission is illegal

      It is not illegal if you don’t distribute, which the judge ruled meant this was fair use. They destroyed the books as part of the digitizing project because it is likely faster and cheaper than non-destructive methods.

      > but destroying the original means that there is only one copy in the end still.

      That is not how this works at all. As long as you aren’t distributing, you are well within your rights to make copies of a book you purchase.

      • TehPers@beehaw.org
        English
        arrow-up
        2
        ·
        edit-2
        10 hours ago

        Quoting the analysis in the ruling:

        > Authors also complain that the print-to-digital format change was itself an infringement not abridged as a fair use (Opp. 15, 25).

        In other words, part of what is being ruled is whether digitizing the books was fair use. Reinforcing that:

        > Recall that Anthropic purchased millions of print books for its central library… [further down past stuff about pirated copies] Anthropic purchased millions of print copies to “build a research library” (Opp. Exh. 22 at 145, 148). It destroyed each print copy while replacing it with a digital copy for use in its library (not for sharing nor sale outside the company). As to these copies, Authors do not complain that Anthropic failed to pay to acquire a library copy. Authors only complain that Anthropic changed each copy’s format from print to digital (see Opp. 15, 25 & n.15).

        Bold text is me. Italics are the ruling.

        Further down:

        > Was scanning the print copies to create digital replacements transformative? [skipping each party’s arguments]
        >
        > Here, for reasons narrower than Anthropic offers, the mere format change was fair use.

        The judge ruled that the digitization is fair use.

        Notably, the fair use question matters because of what the works are being used for. They are being used in a commercial setting to make money, not in a private setting. Additionally, because the works were inputs into the LLM, the digitization question is tied to the judge’s decision on whether using them to train the LLM is fair use.

        Naturally the pirated works are another story, but this article is about the destruction of the physical copies, which only happened for works they purchased. Pirating for LLMs is unacceptable, but that isn’t the question here.

        The ruling does go on to indicate that Anthropic might have been able to get away with not destroying the originals, but destroying them made the format change “more clearly transformative”. Questions around fair use are largely up to the judge’s opinion on four factors (purpose of use, nature of the work, amount of work used, and effect of use on the market).

        > The print original was destroyed. One replaced the other. And, there is no evidence that the new, digital copy was shown, shared, or sold outside the company. [The question about LLM use is earlier in the ruling] This use was even more clearly transformative than those in Texaco, Google, and Sony Betamax (where the number of copies went up by at least one), and, of course, more transformative than those uses rejected in Napster (where the number went up by “millions” of copies shared for free with others).
        >
        > … Anthropic already had purchased permanent library copies (print ones). It did not create new copies to share or sell outside.

        TL;DR: Destroying the original had an effect on the judge’s decision and increased the transformativeness of digitizing the books. They might have been fine without doing it, but the judge admitted that it was relevant to the question of fair use.

        • Vodulas [they/them]@beehaw.org
          arrow-up
          1
          ·
          5 hours ago

          That is true, and they may have been doing it to cover their asses, but I would bet they chose the destructive method because it was faster or cheaper (or both). We will probably never know the minutiae of that decision, though.