One man's look at copyright law

This article by Dan Polansky looks at copyright law, especially the United States law.

Protected works

As per Wikisource: United States Code/Title 17/Chapter 1/Sections 102 and 103: "Works of authorship include the following categories:

(1) literary works;

(2) musical works, including any accompanying words;

(3) dramatic works, including any accompanying music;

(4) pantomimes and choreographic works;

(5) pictorial, graphic, and sculptural works;

(6) motion pictures and other audiovisual works;

(7) sound recordings; and

(8) architectural works."

We pay attention specifically to computer software in the following.

Computer software

The U.S. copyright law codification itself does not list computer programs among protected work categories. However, they are mentioned in "House Report No. 94-1476 (Extract)" in Wikisource: United States Code/Title 17/Chapter 1/Sections 102 and 103:

'The term "literary works" does not connote any criterion of literary merit or qualitative value: it includes catalogs, directories, and similar factual, reference, or instructional works and compilations of data. It also includes computer data bases, and computer programs to the extent that they incorporate authorship in the programmer's expression of original ideas, as distinguished from the ideas themselves.'

As per Wikisource: United States Code/Title 17/Chapter 1/Section 101:

"‘‘Literary works’’ are works, other than audiovisual works, expressed in words, numbers, or other verbal or numerical symbols or indicia, regardless of the nature of the material objects, such as books, periodicals, manuscripts, phonorecords, film, tapes, disks, or cards, in which they are embodied."

Arguably, this appears to be a terminological stretch since computer programs are not literary works by a naive terminology. Also technical writing does not quite match "literary", unless it means "by means of letters". But then, what if it is by means of Chinese characters, which are not letters? Be that as it may, if "literary" means "by means of letters", a computer program in 7-bit ASCII is a literary work. Since the term "literary work" is defined as quoted above, this paragraph has no material impact.

Is the binary executable of a computer program a literary work, and if so, by what standard? Sure enough, we can "disassemble" (translate) the binary into assembly language, which uses mnemonics, and then we come closer to something like works "expressed in words, numbers, or other verbal or numerical symbols or indicia". In any case, using the naive terminology of "literary work", stating that a binary executable is a literary work is a stretch.

As an aside, the above language of works "expressed in [...] numbers" opens the door to all digital objects being "literary works", since e.g. a PNG raster image is a work expressed in numbers. That was probably not intended. Indeed, even a 7-bit ASCII file containing English text would be a work expressed in numbers (via its digital storage), whereas an intuitive understanding would be that it is a work expressed in words.
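The point about digital storage can be made concrete with a minimal Python sketch. The sample sentence is an arbitrary assumption chosen for illustration; the sketch only shows that the same short English text can be viewed either as words or as the numbers under which it is stored in 7-bit ASCII.

```python
# The sample text is an arbitrary illustration, not taken from any source.
text = "Copyright protects expression."

# Encode as ASCII: each character becomes one number below 128 (7 bits).
codes = list(text.encode("ascii"))

# Decoding the numbers yields the very same words again.
restored = bytes(codes).decode("ascii")

print(codes[:9])   # numeric view of the first word, "Copyright"
print(restored)    # the text as words
```

Nothing in the numeric view is lost or added relative to the words, which is why the "expressed in numbers" wording, read literally, seems to cover any digitally stored text.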

Notably, a computer game is a package of different kinds of elements: the computer code embodying algorithms, screen/room layouts, graphical elements, background music, sound, etc. Thus, a computer game is something like a rich compound, and its different parts come under different work categories listed by the copyright law.

Further reading:

Software copyright, wikipedia.org
Wikidata: literary work
Digital Law Online: History, digital-law-online.info

Fixing in tangible form

As per Wikisource: United States Code/Title 17/Chapter 1/Sections 102 and 103:

"Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device."

When someone gives a lecture (by means of speech, no slides) and a student makes lecture notes, is it the student that is the copyright holder and not the teacher? One might think so: the teacher did not fix his lecture in any tangible medium of expression. Indeed, lecture notes taken by students are sometimes published online. On this reading, sound waves (changes in air pressure) are not a tangible medium. However, whether this is a standard/accepted legal interpretation needs to be clarified.

If someone makes a sound recording of an originally improvised musical performance by someone else, is it the recorder who holds the copyright?

One may ponder whether computer files are really a tangible medium, given one cannot touch the files, unlike a sheet of paper, a book, a painting, a photograph or a physical photographic film. For the purpose of copyright law, almost certainly, but it is not clear how or whether this is codified.

A corollary seems to be that when a journalist interviews someone by means of speech (not e.g. email), it is the journalist that is the copyright holder of the whole interview, and the interviewee has no copyright to what he or she said. This should be better sourced since it may be a non-standard analysis.

Further reading:

Copyrightability - Copyright Basics, guides.lib.umich.edu
Wikibooks: US Copyright Law#Fixation
Are students' notes that were made during a university course a breach of copyright?, quora.com
Class notes -- who owns the copyright: student or teacher?, law.stackexchange.com

Originality

As per Wikisource: United States Code/Title 17/Chapter 1/Sections 102 and 103:

"Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device."

What does original mean? For one thing, it means "not copied". But does it mean something else?

Let us produce a list of pseudo-random numbers seeded from the clock (say, using the Mersenne Twister used by Python). Let this list be found only in our publication, nowhere else. Is it an original work protected by copyright? It is a work of an algorithm, not a human. Moreover, a statement of the form "X is the Y-th number generated by the Mersenne Twister generator seeded from seed S" is a statement of fact. Can the algorithm be claimed to be an author and to engage in authorship? But then, under physicalism, human brains are something like embodiments of huge complexes of algorithms, and then, counter-intuitive as it may seem, human authorship is also a result of an algorithm, just that no one knows what that algorithm is exactly. This requires clarification.
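The thought experiment can be sketched in a few lines of Python, whose standard random module is built on the Mersenne Twister. The function name pseudo_random_list, the list length and the value range are assumptions made for illustration only.

```python
import random
import time

def pseudo_random_list(seed, count=5):
    """Return `count` pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # random.Random is a Mersenne Twister generator
    return [rng.randrange(1_000_000) for _ in range(count)]

# Seed taken from the clock, as in the thought experiment.
clock_seed = time.time_ns()
published = pseudo_random_list(clock_seed)

# Anyone who knows the seed can regenerate the identical list.
regenerated = pseudo_random_list(clock_seed)
print(published == regenerated)
```

The sketch makes the "statement of fact" reading tangible: once the seed is fixed, the list is fully determined by the algorithm, and the only contingent input is the moment at which the clock was read.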

Let us produce a list of random numbers by a method that can claim to produce genuinely random numbers rather than pseudo-random ones. Then, the result is not produced by a deterministic algorithm. Is the result protected by copyright?

Further reading:

Copyrightability - Copyright Basics, guides.lib.umich.edu
Threshold of originality, wikipedia.org
Commons: Commons:Threshold of originality

Idea-expression dichotomy

As per Wikisource: United States Code/Title 17/Chapter 1/Sections 102 and 103:

"(b) In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work."

Moreover, "[...] the fundamental axiom of copyright law that no one may copyright facts or ideas" as per Wikisource: Feist Publications v. Rural Telephone Service.

Further reading:

Idea–expression distinction, wikipedia.org
The Myth of the Idea/Expression Dichotomy in Copyright Law by Richard H. Jones, 1990, digitalcommons.pace.edu

Feist v. Rural

In Feist v. Rural (1991), the U.S. Supreme Court ruled that a telephone directory is a mere listing of facts with no original selection or arrangement and that it is therefore not copyright protected.

Court actions and courts involved:

The District Court granted summary judgment to Rural, agreeing with Rural that telephone directories are protected by copyright.
The Court of Appeals affirmed.
The Supreme Court reversed the judgment of the Court of Appeals.

Merger doctrine

From Murray 2006: "The merger doctrine in copyright states that if an idea and the expression of the idea are so tied together that the idea and its expression are one - there is only one conceivable way or a drastically limited number of ways to express and embody the idea in a work - then the expression of the idea is uncopyrightable because ideas may not be copyrighted."^[1] A similar idea is expressed in Clayton 2005^[2].

According to Wikipedia, "United States courts are divided on whether merger prevents copyrightability in the first place, or should instead be considered when determining if the defendant copied protected expression." If the latter option prevailed, one would think this: someone accidentally arriving at the same phrasing would be fine, whereas someone stating "I copied the expression from source so-and-so but applied the merger doctrine" would be in violation of the copyright law.

References:

↑ Copyright, Originality, and the End of the Scenes a Faire and Merger Doctrines for Visual Works by Michael D. Murray, 2006
↑ The Merger Doctrine by Lewis R. Clayton, 2005, paulweiss.com

Further reading:

Idea–expression distinction#Merger doctrine, wikipedia.org

Sweat of the brow

The "sweat of the brow" doctrine is the idea that effort alone (of collecting information) is worth protection regardless of originality. In the U.S., the doctrine was rejected in Feist v. Rural in 1991.

Further reading:

Sweat of the brow, wikipedia.org
Copyright - Fact Compilations - Sweat of the Brow Doctrine Is Inapplicable and White Pages Are Not Sufficiently Original to Warrant Copyright Protection - Feist Publications v. Rural Tel. Serv. Co., 111 S. Ct. 1282 (1991). by Linda A. Tancs, scholarship.shu.edu

Fair use

As per copyright.gov: "Under the fair use doctrine of the U.S. copyright statute, it is permissible to use limited portions of a work including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports."

Wikimedia Commons does not allow fair use.^[1] The rationale is that the project intends to serve wikis in different languages and different countries support or interpret fair use differently.

Further reading:

Fair Use (FAQ) | U.S. Copyright Office, copyright.gov
Fair use, wikipedia.org
Wikisource: Copyright Act of 1976#§ 107. Limitations on exclusive rights: Fair use
Wikisource: United States Code/Title 17/Chapter 1/Section 107
Measuring Fair Use: The Four Factors, fairuse.stanford.edu

De minimis

Sources indicate there exists a de minimis defense in U.S. copyright law; the name is an abbreviation of the phrase de minimis non curat lex. It seems to be distinct from and not part of fair use. It remains to be clarified what it is exactly; different sources seem to use the phrase differently.

Further reading:

Meta: Wikilegal/De Minimis Use of Protected Works under US Copyright Law
Commons: Commons:De minimis
The de minimis defense in copyright law. De mini-what?, blogs.library.unt.edu
Measuring Fair Use: The Four Factors - Copyright Overview by Rich Stim, fairuse.stanford.edu
Who Speaks Latin Anymore? Translating De Minimis Use for Application To Music Copyright Infringement and Sampling by David S. Blessing, 2004, scholarship.law.wm.edu
Copyrightable Authorship: What Can Be Registered, copyright.gov

Compilation

As per copyright.gov, 'Compilations of data or compilations of preexisting works (also known as “collective works”) may also be copyrightable if the materials are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes a new work. When the collecting of the preexisting material that makes up the compilation is a purely mechanical task with no element of original selection, coordination, or arrangement, such as a white-pages telephone directory, copyright protection for the compilation is not available.'

Further reading:

compilation, law.cornell.edu
Copyright in Derivative Works and Compilation (Circular 14), copyright.gov
Help: Collective Works | U.S. Copyright Office, copyright.gov

Government works

Works of the U.S. government are not protected by copyright, per Wikisource: United States Code/Title 17/Chapter 1/Sections 105 and 106.

Copyright term

Term (duration) in the U.S.:

Generally the life of the author plus 70 years, but it depends on when the work was created and on other things.

Further reading:

Copyright law of the United States#Duration of copyright, wikipedia.org
Wikisource: United States Code/Title 17/Chapter 3/Sections 302 and 303
List of countries' copyright lengths, wikipedia.org

Example rulings

Example rulings:

Wikisource: Feist Publications v. Rural Telephone Service

U.S. constitution

The U.S. constitution on copyright, per Wikisource: Constitution of the United States of America:

"The Congress shall have Power [...] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;"

Above, the language is of "Writings"; thus, narrowly construed, e.g. paintings would not be protected.

Above, the language is of "useful" rather than "beautiful" or "pleasant"; narrowly construed, novels would not be protected, but technical and scientific writing would be protected.

Copyright notice

As per Circular 1, "Notice was required for works published in the United States before March 1, 1989. Works published without notice before that date may have entered the public domain in this country."

As per Circular 3, "Copyright notice is optional for works published on or after March 1, 1989, unpublished works, and foreign works; however, there are legal benefits for including notice on your work."

Further reading:

Copyright Basics (Circular 1), copyright.gov
Copyright Notice (Circular 3), copyright.gov
Copyright law of the United States#Copyright notices, wikipedia.org

Berne Convention

The Berne Convention is an international copyright treaty. Its main goal is the mutual protection of copyright between signatory countries.

As per Wikisource: Convention for the Protection of Literary and Artistic Works, the general minimum term of protection is the life of the author plus 50 years, but there are different conditions for certain classes of works.

Further reading:

Berne Convention, wikipedia.org
Wikisource: Convention for the Protection of Literary and Artistic Works
Berne Convention, britannica.com

Paraphrasing

According to Wikipedia:Close paraphrasing, paraphrasing the source does not necessarily avoid copyright violation. This seems strange since it is the expression that is protected, not the fact or idea. This requires more deliberation and research.

Meta: Wikilegal/Close Paraphrasing indicates the answer to "Is close paraphrasing of a copyrighted work a copyright infringement?" is yes. But then, there is something like close paraphrasing and something like non-close paraphrasing. The page does not provide any explanation for what "close" means, nor any examples. The Wikilegal page takes an exception to the answer it has given, admitting that if one closely paraphrases a copyrighted work published under CC-BY-SA in a work under CC-BY-SA and traces the sentence to that work, there is no copyright violation. The proper proofreading and vetting of that page is unclear; the page was created in 2012 by an anonymous IP address.

Further reading:

Paraphrasing of copyrighted material, wikipedia.org
Wikipedia:Close paraphrasing, wikipedia.org -- largely unsourced essay
Meta: Wikilegal/Close Paraphrasing

Plagiarism

The concept of plagiarism relates to copyright violation, but is not exactly the same thing.

According to Britannica 1911, plagiarism is "an appropriation or copying from the work of another, in literature or art, and the passing off of the same as original or without acknowledgment of the real authorship or source."

One point of contrast to copyright violation: If a publisher publishes a copyrighted work without the author's (or the rightful publisher's) permission but correctly states the author of the work, it would be a copyright violation but not plagiarism since the transgressing publisher does not misrepresent or hide the authorship.

Even passing off someone's ideas as one's own is plagiarism, according to multiple sources^[2]^[3]^[4]^[5] but not Britannica online^[6]. That is a point of contrast to copyright, which does not protect ideas. How and to what extent giving credit for ideas to other people and sources is practicable and practiced is unclear, since authors would surely all too often read something somewhere, forget where, and later use the ideas with forgotten provenance in their writing or other production.

At least two sources count using someone's information without attribution as plagiarism.^[7]^[8] One may wonder whether information, and also data, are species of ideas and thus covered by the above paragraphs.

As for data, article Plagiarism quotes Kennedy 2006 as stating: "Plagiarism is the illegal practice of taking someone else's ideas, data, findings, the language, illustrative material, images, or writing, and presenting them as if they were your own." Following this definition, even using data without attribution is plagiarism. However, the quotation states that plagiarism is illegal, at odds with the following paragraph. If we accept using someone's data without attribution as plagiarism, Wikidata will probably end up being largely plagiaristic; this character is reduced by entries listing certain sources in the "Identifiers" section. Many Wiktionaries would be plagiaristic as well; the German Wiktionary extensively listing sources less so.

Plagiarism is not illegal in the U.S., unlike copyright violation^[5]; plagiarism is an ethical concern. This is corroborated by considering plagiarism of public domain materials without attribution: it meets the definition of plagiarism but is not copyright violation in the U.S. sense. This is not to be understood as saying that no act that is plagiarism is ever illegal; an act is not illegal as plagiarism, but it may be illegal as copyright violation (some plagiarism is at the same time copyright violation).

Using the above (sourced) notion that unattributed use of ideas (rather than e.g. word sequences) of others is plagiarism, we get that large language models (LLM) are high-volume plagiarists. That is so in so far as they do not cite their sources, which they usually do not do. One may counter that an LLM learns an idea only if it is expressed in fairly many sources. This raises the question: if one takes an idea not from one but from many sources and does not state the source, is it still plagiarism? Or can one rather argue that, in that case, it is part of common knowledge or store of ideas, which bars the attribution requirement?

A rather different concept of plagiarism is used by Masaryk University in Czechia.^[9] According to them, "Plagiarism constitutes the intentional copying of another author's text and the representation and publication of such a test as one's own original work, careless or inaccurate citation of source literature and/or the omission of required bibliographical information (however unintentional)." Here, taking others' ideas is not covered. Moreover, they state that "Sanctions for plagiarism are determined by the Copyright Act", which is at odds with the notion of the preceding paragraphs that copyright violation and plagiarism are fundamentally orthogonal concepts and that plagiarism is not illegal in the U.S. Perhaps Czechs have a different understanding of the concept of plagiarism; perhaps the cited source is wrong.

Further reading:

Plagiarism, wikipedia.org
Wikipedia:Plagiarism, wikipedia.org
Wikisource:1911 Encyclopædia Britannica/Plagiarism
IAC - Copyright: Copyright vs. Plagiarism, cws.auburn.edu

Translation pairs

Translation pairs, e.g. "cat" --> "Katze", are one application of copyright law. Arguably, they often constitute facts expressed in an obvious canonical form (a pair), and thereby would not be protected by copyright.

However, one can argue that in so far as the chosen translation pairs differ between different dictionaries, there is something original about them to an extent. This requires clarification.

Protection against copyright infringement I am using:

Use multiple sources for each translation pair rather than blindly copying a single source.
Double check and question rather than blindly accepting what the sources indicate.
Check definitions of the items in the pair for match.
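The first of these safeguards can be sketched as a small Python filter. This is only an illustration under stated assumptions: the function name corroborated_pairs, the source names and the entries are all hypothetical, and passing the filter does not by itself settle any copyright question or replace the manual checking described above.

```python
from collections import defaultdict

def corroborated_pairs(sources, min_sources=2):
    """Return translation pairs attested in at least `min_sources` sources."""
    attested = defaultdict(set)
    for source_name, pairs in sources.items():
        for pair in pairs:
            attested[pair].add(source_name)
    # Keep only pairs found independently in enough sources.
    return {pair: names for pair, names in attested.items()
            if len(names) >= min_sources}

# Hypothetical sources and entries, made up for illustration.
sources = {
    "dictionary_a": {("cat", "Katze"), ("dog", "Hund")},
    "dictionary_b": {("cat", "Katze"), ("dog", "Köter")},
    "dictionary_c": {("cat", "Katze"), ("dog", "Hund")},
}
result = corroborated_pairs(sources)
for pair, names in sorted(result.items()):
    print(pair, sorted(names))
```

A pair that appears in only one source is left out and would need separate verification before use.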

Dictionary definitions

Above, we touched on #Translation pairs, but there are other dictionary artifacts potentially subject to copyright, especially definitions.

Arguably, dictionary definitions are copyrighted, given the variation in their formulation across dictionaries. On the other hand, one could argue that they are not protected given the merger doctrine: there is only a handful of ways to accurately define a word, one might think. The Richards v. Merriam Webster, Inc. case suggests definitions are protected.

Dictionary quotations

Some dictionaries contain short quotations of word use from literature (a sentence or two), stating the work title, author and publication date. Since the authorship is properly credited, it is not plagiarism but it could still be a copyright violation in principle. This practice is probably allowed via fair use. Finding a good source on the subject would be worthwhile.

Slogans and short phrases

As per copyright.gov, "Copyright does not protect names, titles, slogans, or short phrases. In some cases, these things may be protected as trademarks."

Further reading:

What Does Copyright Protect? (FAQ), copyright.gov

Photographs

One could think that, unlike paintings, photographs are a straightforward capture of facts: how the world looks at a point in time at a certain place from a certain angle, etc. However, since photographs are copyright protected, there must be a rationale for doing so. Finding out the rationale requires more research.

A contrast can be drawn between 1) merely making a photograph and 2) arranging things to be photographed and then making a photograph. Thus, the lawgiver could consider "plain" photographs to be free from copyright protection.

Further reading:

What Photographers Should Know about Copyright, copyright.gov
Burrow-Giles Lithographic Co. v. Sarony, wikipedia.org
DACS - Knowledge Base - Factsheets, dacs.org.uk
Copyright for Photographers - Copyright, guides.libraries.indiana.edu
Copyright Act Amendment, Washington D.C. (1865), copyrighthistory.org

Charts and graphs

Charts and graphs would seem to be free from copyright in so far as they are a straightforward presentation of data and data and facts are not copyrightable. However, one might reckon that color schemes and similar somewhat arbitrary choices could make a chart copyrightable.

Wikimedia Commons has Template:PD-chart that labels charts that are considered uncopyrightable; one may investigate the particular charts to get an idea. Moreover, Commons:Threshold of originality#Charts links to multiple deletion discussions.

Further reading:

Commons: Commons:Threshold of originality#Charts
Copyrightability of Tables, Charts and Graphs, deepblue.lib.umich.edu

Computer-made art

It is unclear which computer-made art is subject to copyright, and to what extent. One can think e.g. of iterated function system fractals or Mandelbrot set images, which are drawn by the computer. Considering the Mandelbrot set, the human can set the location, the zoom level and the color palette; it is not clear to what extent this limited choice is creative. Arguably, Mandelbrot set images are not created but rather computationally discovered.
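How limited the human input is can be seen in a minimal Python sketch of the usual escape-time rendering, here drawn with text characters instead of colors. The function name mandelbrot_ascii, the default image size, the iteration limit and the character palette are assumptions chosen for illustration; the only human choices are the four arguments passed in.

```python
def mandelbrot_ascii(center_x, center_y, zoom, palette,
                     width=60, height=24, max_iter=50):
    """Render a view of the Mandelbrot set as a list of text lines."""
    lines = []
    for row in range(height):
        line = []
        for col in range(width):
            # Map the character cell to a point c in the complex plane.
            re = center_x + (col - width / 2) * (3.0 / zoom) / width
            im = center_y + (row - height / 2) * (2.0 / zoom) / height
            c = complex(re, im)
            z = 0j
            n = 0
            # Escape-time iteration: z -> z*z + c until |z| > 2 or the limit.
            while abs(z) <= 2 and n < max_iter:
                z = z * z + c
                n += 1
            # The palette maps the iteration count to a character.
            line.append(palette[n * (len(palette) - 1) // max_iter])
        lines.append("".join(line))
    return lines

# The human input: location, zoom level and palette.
image = mandelbrot_ascii(-0.5, 0.0, 1.0, " .:*#")
print("\n".join(image))
```

Given the same location, zoom and palette, the function returns exactly the same image every time, which fits the view that such images are computed from the parameters rather than composed.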

A rather different category is pseudo-AI-generated art, which we cover in a dedicated section below.

According to Compendium of U.S. Copyright Office Practices, human authorship is required for copyright: 'The copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the mind.”'

Further reading:

Art-istic or Art-ificial? Ownership and copyright concerns in AI-generated artwork – Center for Art Law, itsartlaw.org
Copyrightable Authorship: What Can Be Registered in Compendium of U.S. Copyright Office Practices, copyright.gov
Copyrights in Computer-Generated Works: Whom, if Anyone, Do We Reward?, 2001 Duke L. & Tech. Rev. 0024

AI-generated art

In pseudo-AI-generated art, the human can supply a brief textual prompt and the pseudo-AI produces an impressive-looking image. It is not clear to whom the copyright (if any) of the output image belongs.

Further reading:

Copyright Cases Visual Artists Should Know: Part 2, Authorship | Copyright Alliance, copyrightalliance.org

AI-generated text

The copyright questions about AI-generated text seem similar to those about AI-generated visual art.

In December 2023, The New York Times sued OpenAI and Microsoft, alleging copyright infringement in connection with ChatGPT.^[1]^[2]

Further reading:

Meta: Wikilegal/Copyright Analysis of ChatGPT

Screenshots of computer software

Screenshots of computer software are copyrighted. There may be a fair use defense for their use; this needs to be clarified.

The English Wikipedia has Template:Non-free video game screenshot to label non-free screenshots and provide fair use as rationale.

Further reading:

software - How does copyright work on screen captures?, academia.stackexchange.com
Screenshots of Software, pressbooks.library.torontomu.ca

Videogame long play videos

YouTube contains many videogame long play videos, showing someone play a video game from start to end. One would think that since these videos show graphical elements and music, they are copyright violations. On the other hand, one could argue that the videos do not provide as much entertainment as actual play and do not effectively reduce merchantability of the game, so ethically (as contrasted to legally) it is tolerable. Then again, even a single screenshot from a computer game is copyrighted, and an argument protecting a whole long play video from a lawsuit would equally well seem to protect a single screenshot. The long play publishing practice possibly rests on the game publishers not launching any lawsuits or requests for video removals given the videos do not reduce their revenue and probably cause them no other harm.

Further reading:

Is playing video games on YouTube a copyright infringement? No one wants to find out, cbc.ca

Game mechanics

Multiple experts on Quora indicate that mechanics of computer games are not copyrightable but are patentable. However, visual artwork and music are very likely copyright protected.

As for the mechanics, one has to be clear about what is meant by that. A general idea of a game may not be copyrightable, but specific room layouts could possibly be. This requires more research.

Further reading:

Intellectual property protection of video games, wikipedia.org
Are video game mechanics protected under copyright?, quora.com
How do video game copyrights work? - Red Points, redpoints.com

Internet and the web

The web presents some new questions concerning copyright. For instance, one can ask whether a browser's making RAM copies and even temporary files to view content sent by the web server is a copyright violation in that it is an unauthorized making of copies. Arguably, it is the intent of the web server provider that user software makes such copies and therefore, there is an implied license to make such copies, solely for the purpose of activity approved by the web server, that of viewing the content.

These fine considerations can play a role in an analysis of the legality of training large language models (LLM, a pseudo-AI) on content obtained from web pages. Arguably, as long as the web page provider did not grant anyone an express license to make copies on their servers for the purpose of the LLM training, making these copies is a copyright violation.

Database right

Database right is distinct from copyright. It is not available in the U.S., but it is available in the EU.

Database right had to be codified as distinct from copyright since in copyright, information is not protected, merely its expression, and a database is a collection of standardized structured records, arguably showing no originality in expression. (But, arguably, one might claim some originality in the data model.)

Further reading:

Database right, wikipedia.org
Wikibooks: UK Database Law
Database protection in the EU - Your Europe, europa.eu

Wikipedia

As per Wikipedia:Copyrights#Governing copyright law:

"The Wikimedia Foundation is based in the United States and accordingly governed by United States copyright law. Regardless, according to Jimbo Wales, the co-founder of Wikipedia, Wikipedia contributors should respect the copyright law of other nations, even if these do not have official copyright relations with the United States."

The above applies to the English Wikipedia. It probably applies to non-English Wikipedias as well since the organization and the servers are located in the U.S.

Further reading:

Wikipedia:Copyrights#Governing copyright law, wikipedia.org

Wikisource

If one assumes that the decisive factor for copyright jurisdiction is that the servers and the organization are located in the U.S., one might think that all Wikisource domains (de.wikisource.org, fr.wikisource.org, etc.) would be under the same U.S. jurisdiction. Nonetheless, some non-English works are hosted at wikisource.org rather than on the subdomain matching the language. For instance, there is https://wikisource.org/wiki/Słownik_geograficzny_Królestwa_Polskiego. This has template "Template:PD-US-1923-abroad/PL". There are other templates: Template:PD-US-1923-abroad/CS, Template:PD-US-1923-abroad/DE, etc. Therefore, the Foundation seems to be playing it safe and separates non-English works whose inclusion is based on PD-US rationale on a dedicated domain. However, how this separation can possibly be material from the standpoint of copyright jurisdiction is unclear: it is hard to understand how a mere switch of the domain from e.g. pl.wikisource.org to wikisource.org (keeping the language of the work the same) magically impacts the applicable copyright law jurisdiction. wikisource.org still seems to serve the pages to viewers in various jurisdictions across the world; by contrast, some U.S.-based websites responded to the EU-imposed GDPR by refusing to serve pages to viewers located in the EU. Thus, hosting the pages on a separate domain seems to be some kind of game more than anything else, but I (Dan Polansky) am not a lawyer.

Linguistic corpora

Linguistic corpora include Google Books, COCA, BNC and others. They usually present users with text snippets (for a sought word or phrase) found in a range of sources where the corpus operator does not hold the copyright in the individual snippets. This seems to be considered not a copyright violation of these sources, although it does seem to violate the copyright on snippet level. A defense could perhaps be fair use or de minimis.

Google web search is another service that shows text snippets in an aggregate form, where the search operator does not hold copyright to the snippets. To provide the snippets, Google stores (copies of) web pages on its servers, and it is unclear how this itself is not copyright violation. A speculation is that it is taken as fair use.

Google Books preview function often offers many full pages of text for modern copyrighted works. However, some books there offer no preview. How this is handled from a copyright standpoint is unclear. It seems impossible for Google to have obtained permission from all publishers so previewed.

Internet Archive and Wayback Machine

The Internet Archive's Wayback Machine provides archived copies of web pages it finds. One would think that, in general, the result is a copyright violation; the archiving is not based on the web page authors granting a license for such copies to exist.

The Wayback Machine is not the only way in which the Internet Archive makes material available online. The organization also scans various copyrighted books and makes them available online. One would think this is a copyright infringement unless they obtain consent of the publishers. Complaints have been made against the organization in that regard^[3]. A court decision has been made concerning a certain scope of infringement^[4]. There is more information in Wikipedia. Be that as it may, it seems unwise to infer from the availability of an item in the Internet Archive that it is not copyrighted.

Further reading:

Internet Archive#Copyright issues, wikipedia.org

References

↑ The New York Times sues OpenAI and Microsoft for copyright infringement, 27 Dec 2023, theverge.com
↑ pdf attachment: the NYT complaint, nytimes.com
↑ Internet Archive Continues To Harm Authors | Copyright Alliance, copyrightalliance.org
↑ Update: How to Tell Internet Archive to Remove Your Books, authorsguild.org