
Copyright and AI


How could copyright laws and the legal intellectual property framework impact the development of AI technology?

Generative AI has revived questions about intellectual property rights in an era of AI-based automation. Several GenAI companies (OpenAI, Stability AI, and Midjourney) are already facing copyright infringement lawsuits, and the precedents set by these and future cases will shape how copyright law is interpreted in the context of AI technology. Ultimately, any constraints the law imposes could affect how the technology continues to develop.

Here I capture brief notes on US copyright law and how it might affect technology development. This is intended as an informative opinion piece only.


Conceptually, US copyright law is designed to promote progress and technological advancement. While the general principles are laid out in broad strokes, the nuances are domain-specific and have been spelled out by courts over the years. Music, art, jewelry, books, entertainment, and other domains each have their own body of cases defining what constitutes infringement. Interestingly, what counts as fair use depends not only on the peculiarities of the domain but also on the times: changes in technology and markets create new equilibria, and the same questions are often re-litigated in the courts. For example, the home videotape recorder, and the consumer behavior it enabled, led to repeated litigation over recording technology, most famously Sony Corp. of America v. Universal City Studios (1984), the "Betamax" case.

The new era of automated creative generation, which produces substantially different work, poses an interesting challenge to the current copyright framework.

In the end, I believe the legal framework will draw new boundaries that accommodate the evolving technology while continuing to ensure safe and fair progress, balancing the right to profit from human creative labor against public access to information and artwork.


Three concepts, drawn from the Copyright Clause of the US Constitution (Article I, Section 8), underlie US copyright law:

  • Promote progress: encourage the progress of science and useful arts, balancing creators' financial incentives against the public good of accessing their work and building on top of it.
  • For limited times: the copyright holder's statutory monopoly is limited in duration, balancing the right to profit against broad accessibility.
    For works created on or after January 1, 1978, the term is generally the life of the author plus 70 years, regardless of when the work was registered with the Copyright Office. Works made for hire and older works follow different rules.
  • To authors and inventors: this seems obvious now, but in earlier times it was publishers who benefited from protection.
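To make the life-plus-70 rule concrete, here is a minimal Python sketch that computes the year a work enters the US public domain. It assumes a work created on or after January 1, 1978 by one or more known human authors (not a work made for hire), and it relies on the rule that copyright terms run to the end of the calendar year in which they would otherwise expire. The function name `us_public_domain_year` is hypothetical.

```python
from typing import Sequence


def us_public_domain_year(author_death_years: Sequence[int]) -> int:
    """Return the year a work enters the US public domain (on January 1).

    Assumes a work created on or after January 1, 1978 by known human
    author(s), not a work made for hire. For a joint work, the clock
    starts at the death of the last surviving author.
    """
    if not author_death_years:
        raise ValueError("need at least one author's year of death")
    # Protection lasts through December 31 of (last death year + 70) ...
    last_protected_year = max(author_death_years) + 70
    # ... so the work becomes public domain on January 1 of the next year.
    return last_protected_year + 1


print(us_public_domain_year([2000]))        # 2071
print(us_public_domain_year([1990, 2005]))  # 2076
```

Note that the registration date plays no role in the calculation; only the authors' lifetimes matter.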

Key Philosophy: As the Supreme Court put it in Twentieth Century Music Corp. v. Aiken (1975), the immediate effect of our copyright law is to secure a fair return for creative labor, but the ultimate aim is to stimulate artistic creativity for the general public good.

What rights are afforded by copyright protection?

Copyright is a bundle of legally defined exclusive rights. An infringement claim must show that one or more of these rights was violated; the detriment caused to the copyright holder then bears on the damages.

  1. Right to publish (distribute) your content
    First-sale doctrine: once a particular copy has been lawfully sold, the copyright owner no longer controls its further distribution, as long as none of the other rights are infringed. This allows, for instance, a second-hand market for goods.
  2. Right to adapt your content
  3. Right to perform it publicly (read, dance, sing)
  4. Right to display it publicly
  5. Right to reproduce it (make copies)

For something to be copyrightable, the creation must be

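  1. fixed in a tangible medium of expression, so that it can be seen, reproduced, or recorded
  2. original: independently created by the author with at least a minimal degree of creativity (publication is not required; unpublished, private material is protected too)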

For instance, the following are not copyrightable: ideas, facts, names and short phrases, recipes as mere lists of ingredients and steps, and the functional design of clothing. Facts belong to everyone. An idea is not copyrightable because a pure idea is intangible, but a particular expression of that idea can attract copyright protection.

Derivative vs Transformative Work

An adaptation is referred to as a "derivative work", and the right to create one belongs to the copyright holder. However, others are allowed to make transformative works based on the original without permission.

Transformative means the material is used in a way that results in a new work that is fully copyrightable on its own. While the rules vary by domain, a good example is parody.

An unauthorized derivative work is infringement, whereas transformative work is much more likely to qualify as fair use.

Takeaway

  1. If a human didn't make it, it can't be copyrighted.
  2. Copyright protection extends only to expression fixed in a tangible form, not to ideas.
  3. Things that lack an element of creativity or artistry, i.e. some human intellectual input, can't be copyrighted.

Fair Use

While the power to grant copyright comes from the Constitution, fair use is statutory (Section 107 of the Copyright Act). It permits use of copyrighted work for purposes such as criticism, comment, news reporting, teaching, scholarship, and research.

The internet wouldn't work without the fair-use doctrine: every time you load a web page, your browser makes a local copy of it.

Four main factors to determine fair use are:

  1. purpose and character of the use (commercial, nonprofit, reference, ...)
  2. nature of the copyrighted work
  3. amount and substantiality of the portion used
  4. effect of the use on the value of the work, or on its potential market

A court will ask if the use is transformative:

  1. has the material been transformed by adding new expression or meaning?
  2. was value added to the original by creating new information, aesthetics, or understanding?

Since the goal is to promote science and the arts, transformative work that builds on and expands the copyrighted work adds value. The objectives underlying the constitutionally derived copyright laws, fostering creativity and the arts, are advanced by the transformative-works exception.

Modern Technology

The Digital Millennium Copyright Act of 1998 ("DMCA") revamped copyright protections for the internet era. While the DMCA protects copyright owners against circumvention of their rights through technology, it also provides a "safe harbor" for online services that host third-party content, such as social media sites, provided they remove infringing material once notified.

Further, Facebook/Meta's Terms of Service acknowledge that users own the intellectual property rights in the content they share, but users grant Meta a "non-exclusive, transferable, royalty-free, and worldwide license to host, use, distribute, modify, run, copy, publicly perform or display, translate, and create derivative works of user's content", which is what allows Meta to store and analyze users' data and train ML algorithms on it. The key distinction here is between ownership and a license: a license doesn't transfer the copyright in a work but instead grants permission to use it, much like a homeowner granting a renter the right to occupy and use the house.

Similarly, a computer program is the result of human creative effort, and copyright protection extends to all of the expression embodied in the program, including source code, object (compiled) code, and images. However, it does not protect the functional aspects of the program, such as its algorithms, functionality, logic, or design.

Podcasts are another common example: they are protected as creative expression under the provisions for "sound recordings".


Two key questions exist in relation to GenAI and copyright:

  1. Do GenAI technologies infringe copyright in the sources from which they pull data?
  2. Does the user of GenAI own the copyright in the output of the technology?

On the first question, the likely answer is that there is no infringement, because the technology typically draws data from numerous sources and its output would therefore be transformative. Developers should nonetheless take care to design their systems so that they don't draw too heavily on any single source or produce language that closely parallels the original work.
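One way a developer might check for such close parallels is to measure verbatim overlap between a model's output and a known source. The sketch below is a simple heuristic, not a legal test: it computes the share of the output's word n-grams (5-word sequences by default, an arbitrary choice) that also appear in the source. The function names and the 0.2 review threshold are hypothetical.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Lowercase, split on whitespace, and collect every run of n consecutive words."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(generated: str, source: str, n: int = 5) -> float:
    """Fraction of the generated text's n-grams that also occur in the source."""
    gen = word_ngrams(generated, n)
    if not gen:  # output shorter than n words: nothing to compare
        return 0.0
    return len(gen & word_ngrams(source, n)) / len(gen)


source = "the quick brown fox jumps over the lazy dog"
generated = "the quick brown fox jumps over a sleeping cat"
score = verbatim_overlap(generated, source)
print(score)        # 0.4
print(score > 0.2)  # True: flag for human review (hypothetical threshold)
```

A high score only flags text for review. A low score does not show that a use is transformative, since close paraphrase and stylistic imitation escape exact n-gram matching.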

On the second question, the challenge for the existing copyright framework is that AI-generated text and images are not created by a human, unlike work in other media, and on those terms are not copyrightable. Further, creators may claim "moral rights" of attribution in the work they create (in the US, these are limited mostly to works of visual art), but GenAI picks up its information from so many sources that requiring it to cite references is impractical. As a result, the end user (a human) has no attribution obligations, and the chain of copyright obligations back to the original author is broken.

On the other hand, detailed prompts could be considered creative human input, in which case the output of GenAI may be copyrightable. This creates a potentially absurd situation in which the original owners have no copyright claim over outputs built from their works, while end users do hold copyright in those outputs.

However, there is complexity around the question of whether GenAI output infringes the copyright in its source content. It is unclear whether GenAI output should be treated as derivative or transformative, though there is an argument that much of it is indeed transformative: it adds new expression and meaning to the original creative work. This may be evaluated case by case, considering all of the human creative input that went into the final work and how closely it captures the value embedded in the original. For instance, a GenAI image of a character protected by IP (say, Bart Simpson) is likely to infringe. Where the material comes from multiple sources, this assessment becomes harder.

Even so, the cost of an infringement would depend on the usage (a meme versus printing on a T-shirt) and on the copyright holder's loss of value and market. In some cases, the net effect for the holder may even be positive. By contrast, training on content behind a paywall or a login is likely to eat into the holder's profit potential (value or market impact), which could lead to legal consequences for infringement.

Books and news articles contain a great deal of knowledge and facts. Reformulating and using those facts is likely to be permitted, since facts are not copyrightable. However, reproducing the storytelling, creative choices, and phrasing of books and articles is likely to infringe. Recipes are a good precedent: the recipe formula itself is not protected, but any artistic creativity, format, and sequencing is. So you can't republish a recipe book as-is, but you can reuse and even reprint the formulae.

Ultimately, it is possible that GenAI platforms will soon have safe harbors similar to the DMCA's, and that users of these tools (prompt engineers) will be responsible for fair use, permission, and attribution. I expect regulation around the identification and provenance of AI-generated content (i.e., content with substantially less human input), which would also let users add back-pointers to sources for their commercial uses of GenAI content.
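The kind of provenance label with back-pointers described above might look something like the following minimal sketch. Everything in it is hypothetical: the `ProvenanceRecord` fields, the `label_output` helper, and the example model name and URL are illustrations, not an existing standard or regulation. It fingerprints the exact output with SHA-256 so the label can later be matched to the content it describes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance label attached to a piece of AI-generated content."""
    content_sha256: str        # fingerprint of the exact output being labeled
    generator: str             # name of the model or tool that produced it
    ai_generated: bool = True  # marks content made with substantially less human input
    human_input: str = ""      # summary of the human contribution (prompts, edits)
    source_pointers: List[str] = field(default_factory=list)  # back-pointers to sources


def label_output(content: bytes, generator: str, human_input: str = "",
                 sources: Optional[List[str]] = None) -> ProvenanceRecord:
    """Build a provenance record for one generated artifact."""
    return ProvenanceRecord(
        content_sha256=hashlib.sha256(content).hexdigest(),
        generator=generator,
        human_input=human_input,
        source_pointers=list(sources or []),
    )


record = label_output(b"generated image bytes", generator="example-model",
                      human_input="detailed prompt plus manual color edits",
                      sources=["https://example.com/original-artwork"])
print(json.dumps(asdict(record), indent=2))
```

Recording the human input alongside the output would also help with the copyrightability question above, since it documents how much creative contribution a person made.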

What happens on the training side when the platform generates a profit? Should creators get paid for training on their content?

The creator and copyright holder are afforded certain rights, such as the rights to publish, distribute, and adapt. One could argue that training on their content for commercial benefit violates these rights.

Training in itself produces nothing tangible: a derivative or transformative work is only created at inference time, when an output is actually produced.

Given the core philosophy of balancing in favor of progress, I believe the lines will be redrawn. My prediction is that courts will cry foul over taking content from behind a paywall, since its licensing terms restrict usage, but will allow openly accessible content to be crawled and ingested as long as it is not substantially reproduced. I also expect regulation of practices around transparency and provenance.

As with the DMCA, I expect liability for sharing infringing content to fall on the GenAI user. Note, however, that the practical exposure comes mainly from publishing or sharing the work, rather than from privately generating copyrighted content through AI.