Copyright Throughout a Creative AI Pipeline

By Sancho McCann

This is an im­age I cre­at­ed us­ing the AI tool, Deep Dream Generator. I used it to ap­ply the style of Van Gogh’s A Wheatfield with Cypresses to a pho­to that I took of a bay at Cortes Island. That web­site does not claim own­er­ship of the AI out­put.

My ar­ti­cle, “Copyright Throughout a Creative AI Pipeline,” was just pub­lished by the Canadian Journal of Law & Technology. It is avail­able open-ac­cess here.

This work is increasingly relevant as AI tools such as Dall-E, Stable Diffusion, and ChatGPT (and other large language models, or LLMs) produce arguably novel outputs. The question of who owns the copyright to the model weights or parameters has also become relevant given the leak of the model parameters behind one instance of Facebook’s LLaMa (Large Language Model Meta AI). One Twitter user asks, “Is redistributing the LLaMa weights [] even legal? Can copyright cover a big table of machine generated numbers?” I hope this paper provides a starting point for thinking about these problems.

Abstract

Consider the fol­low­ing fact pat­tern.

Alex paints some original works on canvas and posts photos of them online. Becca downloads those images and uses them to train an AI (training configures the AI’s model parameters to useful values). Becca posts the resulting trained parameter values on her website under a licence that reserves to Becca the right to use the parameters commercially. Cory uses those parameter values in a program that is designed to produce artwork. Cory clicks create and the program produces a work. This work is new to Cory, but it looks a lot like one of Alex’s original canvas images. Cory sells the work. Advise Cory about their potential copyright liability to Alex (for the substantially similar work that the program produced and that Cory subsequently sold) and to Becca (for taking Becca’s parameters and using them commercially, contrary to the licence).

Cory clicks create again. The program produces another work, this time quite different from any of Alex’s original paintings. Cory shares the new work on Instagram. Danny copies this image from Cory’s Instagram feed and sells a bunch of postcards that feature that image. Advise Danny about their copyright liability to Cory.

These sce­nar­ios are not as con­trived as they might ini­tial­ly seem. People fre­quent­ly use copy­right­ed works when train­ing an AI (more pre­cise­ly: when train­ing an AI’s pa­ra­me­ters). The re­sult­ing trained pa­ra­me­ters are be­ing shared un­der li­cences that as­sume the pa­ra­me­ters are the sub­ject of copy­right. People do use these pa­ra­me­ters in pro­grams that can pro­duce nov­el con­tent. The re­sult­ing work can be quite sur­pris­ing to the end-user and there are gen­er­al­ly no checks in place to en­sure that the new works do not take too di­rect­ly from the orig­i­nal train­ing data. However, many of the new works will be quite different from any con­tent al­ready in the world. And the end-users of the cre­ative pro­gram of­ten claim copy­right own­er­ship over the re­sult­ing nov­el work.
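To make “training an AI’s parameters” concrete, here is a minimal sketch in Python. It is a toy, not how any real creative AI is built: the model has only two parameters, and the function name `train`, the example data, and the learning-rate settings are all assumptions made for illustration. The point it shows is that training is an automatic numerical procedure, and that what the trainer ends up holding (and possibly sharing) is just a set of numbers.

```python
def train(examples, steps=2000, lr=0.05):
    """Fit y ~ w*x + b to the examples by gradient descent on squared error.

    Hypothetical toy model: a real neural network has millions or billions
    of parameters, but they are adjusted by the same kind of automatic loop.
    """
    w, b = 0.0, 0.0  # untrained parameter values
    n = len(examples)
    for _ in range(steps):
        # Average gradient of the squared error with respect to each parameter.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Nudge each parameter in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up training set that follows y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
params = train(examples)  # the "trained parameters": a small table of numbers
```

In this sketch the person running `train` chooses the examples and the settings, but never writes the resulting values of `w` and `b` directly; the loop computes them.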

I will first present the train­ing and use of a cre­ative pro­gram based on a neur­al net­work, a pop­u­lar mod­el that forms the ba­sis of state-of-the-art cre­ative AIs. Then, I will ex­am­ine each of the is­sues just raised:

1. Does the per­son man­ag­ing the au­to­mat­ic train­ing of a neur­al net­work’s pa­ra­me­ters ob­tain a copy­right in the re­sult­ing trained pa­ra­me­ters?

2. Does a per­son us­ing a pro­gram that pro­duces artis­tic out­put ob­tain a copy­right in that out­put?

3. The au­to­mat­ic train­ing of a neur­al net­work re­quires large amounts of ex­am­ple data (a train­ing set). Can im­ages from around the in­ter­net be copied for the pur­pose of train­ing a neur­al net­work?

4. What if a per­son uses an AI to pro­duce a work that looks sub­stan­tial­ly sim­i­lar to one of the train­ing ex­am­ples? Is that an in­fringe­ment? And who is in­fring­ing?

Today’s state-of-the-art “creative” AI tools are based on a technology (neural networks) that serves to separate the programmer and trainer from any of the eventual expression, even the expression stored in the automatically-learned network parameters. It would be very rare that a programmer or trainer might obtain copyright in the output from an automatically trained “creative” AI. However, there are a multitude of ways to use such an AI to produce output, many of which would very well justify awarding copyright to the end-user, especially when they use the AI as an elaborate brush with which to bring their own ideas to life in expression.

The current methods of training these creative AI tools require large amounts of training data: existing works often protected by copyright. It is unclear whether Canada’s fair dealing user right allows for such copying for the purpose of training a neural network, particularly when not for private purposes. When a fair dealing user right is not available, this copying would be copyright infringement: unauthorized reproduction of existing works. Canada should clarify or expand the fair dealing user right to allow for such copying.

Trainers must be care­ful that they have not sim­ply em­bed­ded a rep­re­sen­ta­tion of the train­ing ex­am­ples in the AI. If the AI effectively con­tains “di­rect reflections” of the train­ing data such that it reg­u­lar­ly re­pro­duces them, dis­trib­ut­ing such an AI would be copy­right in­fringe­ment. The train­er has a bur­den to ver­i­fy that they are not dis­trib­ut­ing copies of the train­ing data.
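One way a trainer might begin to check for this is sketched below in Python. It is an assumption-laden illustration, not a method taken from the paper: it treats each image as a flat list of pixel values, uses plain Euclidean distance as a crude stand-in for similarity, and relies on an arbitrary `threshold`. The names `distance` and `flag_near_copies` are hypothetical. Pixel distance is not the legal test for substantial similarity; a check like this can only flag outputs that deserve a closer look.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length lists of pixel values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def flag_near_copies(outputs, training_set, threshold):
    """Flag each output whose nearest training example is within `threshold`.

    Returns a list of (output_index, training_index, distance) tuples.
    The threshold is an arbitrary, hypothetical cut-off chosen by the trainer.
    """
    flagged = []
    for i, out in enumerate(outputs):
        # Find the training example closest to this output.
        j, d = min(
            ((j, distance(out, t)) for j, t in enumerate(training_set)),
            key=lambda pair: pair[1],
        )
        if d <= threshold:
            flagged.append((i, j, d))
    return flagged


# Made-up four-pixel "images" for illustration only.
training_set = [[0.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
outputs = [[0.0, 0.05, 1.0, 0.95], [0.5, 0.5, 0.5, 0.5]]
flagged = flag_near_copies(outputs, training_set, threshold=0.1)
```

Here the first output is nearly identical to the first training example and is flagged, while the second is far from both and is not. A trainer who regularly saw flags like the first would have reason to suspect that the parameters contain “direct reflections” of the training data.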

This analy­sis al­lo­cates copy­right in a man­ner con­sis­tent with a prag­mat­ic con­cep­tion of cre­ativ­i­ty and art. It keeps the fo­cus on hu­man ex­pres­sion and al­lows for free dis­tri­bu­tion of the ma­te­r­i­al need­ed for more peo­ple to have more prac­tice with cre­ative tools while pre­serv­ing pro­tec­tion for orig­i­nal ex­pres­sion.

Acknowledgements

I would like to thank Professor Jon Festinger, Q.C., for many help­ful dis­cus­sions while su­per­vis­ing this work and Professor Graham Reynolds for valu­able feed­back on an ear­li­er draft.