Should Wikiversity allow editors to post content generated by LLMs?

This debate is about whether contributing content generated by generative artificial intelligence (genAI) large language models (LLMs), such as ChatGPT, Google Gemini, Claude AI, Llama, and Mixtral, should be allowed on Wikiversity. Some items for consideration:

  • ease of text production
  • copyright
  • accuracy
  • learning by doing
  • overwhelming reviewers by volume

Wikiversity should allow editors to post LLM-generated content

  • Pro Contribution of LLM-generated material which is consistent with the mission of Wikiversity should be permitted. There is nothing in this project's mission to indicate that it is only for human-generated content.
  • Pro LLMs allow editors to produce and contribute material that they would not otherwise be able to by themselves or not so quickly.
    • Objection Ease of production works against diligence in this case. Allowing LLM content will encourage laziness.
      • Objection On the other hand, it may take considerable effort (e.g., multiple prompts and iterations, use of different LLMs, etc.) to produce useful content. The implied assumption that LLM-generated content is "easy" and therefore somehow "bad" or unwanted is naive. An alternative perspective is that the easy-difficult dimension could be orthogonal to the quality dimension.
    • Objection Simply copy-pasting LLM output must be viewed as plagiarism.
      • Objection There are currently no legal rulings that support the argument that LLM-generated content is copyrighted. The prompter owns the copyright and, as such, can choose to contribute that material under a license that is accepted by Wikiversity.
  • Pro Editors who use LLMs to produce material should use best practices in interacting with LLMs, and contribute valid content to Wikiversity.
  • Pro Editors should be able to post LLM content provided that the output is clearly marked as originating from a specific LLM. The reader then knows to take into account an increased risk of mistakes.
    • Objection LLM content should not be allowed because initial LLM responses are often only starting points that may not be worthwhile.
      • Objection Wikis are iterative by nature. It is acceptable to start in a basic way. LLM-generated content can provide useful starting points that can then be improved upon.
    • Objection LLM content should not be allowed because specific responses of an AI may change over time or under other uncontrolled conditions.[explain?]
      • Objection Human responses may also change over time or across different conditions. But this doesn't mean that humans should be disallowed from editing.
  • Pro Until the arguments made in opposition to the failed LLM policy on Wikipedia are addressed, a complete ban (rather than nuanced/selective use of LLMs and other generative AI) is draconian/overreactive.
    • Objection A clear and laconic policy, such as a complete ban on AI-generated content, may increase the platform's credibility with a wide audience. A tangled policy may, conversely, erode that trust.
      • Objection Trust should be based on the actual quality of resources rather than hypothetical reasoning.
  • Con LLM output can be false, inaccurate, misleading, useless, or hallucinated[1][2]. As such, its use beyond a mere writing aid should be prohibited on Wikiversity.
    • Objection Humans should fact-check all material before contributing.
    • Objection Most of what frontier LLMs output is useful and correct.
      • Objection That is false and depends on what they are asked. Their responses often only seem correct and useful but are actually partly misleading or inaccurate.
        • Objection LLMs and humans are similarly imperfect, although in different ways; thus, combining them often leads to better results than using either one alone (e.g., [1]).
  • Con Unless the editor is a subject matter expert, it will often be hard for the editor to detect mistakes, as pointed out by Stack Overflow.[3]
    • Objection LLM-generated content should be added by subject matter experts and reviewed by the Wikiversity community.
  • Con LLMs, having "read" much more material than most humans, seem especially adept at feigning competence that they actually lack.
    • Objection It is not necessary for an LLM to know what it is doing or to be correct all the time. The only thing that matters is whether it can create some sequences of text that are useful to Wikiversity.
    • Objection Humans should fact-check all material before contributing.
  • Con The copyright status of LLM output is not settled: there is an ongoing lawsuit. See Is the output of ChatGPT copyrighted?
    • Objection LLM-generated text is not copyrighted. People are also allowed to learn from copyrighted texts and then write similar things, including using minor parts of them, or write things that are heavily inspired by those texts.
      • Objection This is not universal. There are GPTs trained exclusively on proprietary material, and their output obviously inherits the copyright, like wikiHow's AI.
        • Objection There are currently no legal rulings that support the argument that LLM-generated content is copyrighted. The prompter owns the copyright and, as such, can choose to contribute that material under a license that is accepted by Wikiversity.
    • Objection This could also be interpreted as an argument for allowing such content (i.e., until LLM content is determined to be copyrighted, it is free to use).
  • Con An editor learns something by trying to research a topic and formulate text on it, even if the editor makes mistakes or the writing has poor style. By contrast, by submitting a query to LLMs, the editor may learn very little: there is no longer learning by doing.
    • Objection Irrelevant. The debate topic is not about whether an editor is learning. The debate topic is whether LLM-generated content should be allowed on Wikiversity.
    • Objection Using an LLM does not preclude learning. Many people learn by prompting LLMs, reading the responses, and iteratively entering further prompts.
    • Objection Learning can occur by reading text that one hasn't authored. For example, many people learn from reading a book, even though they weren't actively involved in writing the book.
    • Objection Wikiversity does not provide room for learning by doing in the physical world, but it does allow for the manipulation of digital representations. This has implications for learning across different fields, such as chemistry (physical) vs journalism (digital) or anatomy (physical) vs programming (digital).
      • Objection Irrelevant. The debate topic is not about learning. The debate topic is whether LLM-generated content should be allowed on Wikiversity.
  • Con Editors using LLMs can insert vast amounts of material into a Wikiversity page, at a rate so fast that humans will be unable to properly evaluate the quality of the contributions.
    • Objection There is not yet any evidence of this occurring.
    • Objection Editors can try to use an LLM as part of the content evaluation process (see the sketch after this list).[explain?]
  • Con Stack Overflow bans the use of generative AI, even for rewording purposes[3]. They are smart and competent, and their policy page contains ample discussion. Inconclusive yet suggestive.
    • Objection Wikiversity is an independent community with a different mission. The approach used by other websites is interesting and useful. However, an independent decision should be made for Wikiversity.
    • Objection The Stack Overflow policy page[factual?] indicates they had thousands of answers produced by genAI. By contrast, Wikiversity does not face a problem of this scale.
      • Comment: It does not face the problem yet, but it may eventually. On the other hand, one may wait until the problem becomes acute.
    • Objection The Stack Overflow company decided not to ban genAI across their network (they operate other Stack Exchange sites besides Stack Overflow)[4]. Thus, one would think there is something specific about Stack Overflow and its domain of software development.
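
One possible shape for the LLM-assisted evaluation step mentioned above is sketched below. This is a minimal illustration, assuming the OpenAI Python client; the model name, prompt wording, and flag_for_review helper are hypothetical choices for illustration, not an established Wikiversity workflow.

```python
# Minimal sketch of LLM-assisted review (assumes the OpenAI Python client;
# any comparable LLM API could be substituted). The model name, prompt, and
# helper name are illustrative assumptions, not an endorsed workflow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def flag_for_review(passage: str) -> str:
    """Ask an LLM to list claims in a draft passage that a human should verify."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute whatever is available
        messages=[
            {
                "role": "system",
                "content": (
                    "You review draft wiki content. List any factual claims that "
                    "look dubious or need a citation. Be concise."
                ),
            },
            {"role": "user", "content": passage},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    draft = "The Eiffel Tower was completed in 1899 and is 450 metres tall."
    print(flag_for_review(draft))  # a human reviewer still makes the final call
```

Even with such tooling, the LLM's output is only a pointer for the human editor, who still decides what is published.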

See also


References

  1. https://link.springer.com/article/10.1007/s10676-024-09775-5
  2. https://www.nature.com/articles/s41537-023-00379-4
  3. Policy: Generative AI (e.g., ChatGPT) is banned, meta.stackoverflow.com, https://meta.stackoverflow.com/questions/421831/temporary-policy-generative-ai-e-g-chatgpt-is-banned
  4. Ban ChatGPT network-wide, meta.stackexchange.com