ChatGPT is a conversational artificial intelligence (AI) service developed by OpenAI and based on the Generative Pre-trained Transformer (GPT)¹, a language model built on artificial neural networks. Launched in 2022, ChatGPT is trained on large volumes of general text data. Its strength lies in its ability to model language patterns and structures accurately, which has led to its sensational popularity around the world. Many companies have announced plans to introduce ChatGPT into their workflows. Even within the legal industry, there is discussion of using ChatGPT for tasks that require human thinking, such as drafting briefs, writing patent specifications and searching prior art, well beyond drafting simple template contracts.
At present, however, laws and regulations governing AI are not clearly established, which raises various legal issues in the context of ChatGPT. In the area of intellectual property (IP) in particular, vigorous discussions are underway on multiple issues, including data mining for training ChatGPT, the copyrightability of ChatGPT-generated answers and the precautions necessary when using such answers.
This newsletter will examine IP-related issues pertaining to ChatGPT and the current status of legislative enactments, including proposed amendments, concerning AI in Korea as well as their potential implications.
1. Legal Issues Associated with Data Mining of AI Models
A. Korean Courts’ Position on Web Scraping and Crawling
Training AI models such as ChatGPT requires the collection of text data on a massive scale. Training data are collected using various methods, including web scraping or crawling, whereby text data are automatically extracted from websites. However, indiscriminate collection of online content by way of web scraping or crawling may trigger liability under Korean law, such as the Copyright Act and the Unfair Competition Prevention Act (UCPA). In representative cases involving web scraping or crawling, Korean courts have recognized infringement of database rights or unauthorized use of others’ achievements under Article 2(1)(m) of the UCPA.
① Dispute on Recruiting Information – JobKorea v. Saramin HR
JobKorea, Korea’s leading job search platform service provider, brought a lawsuit against its competitor Saramin HR, which crawled, without permission, recruiting information on JobKorea’s website. The recruiting information displayed on JobKorea’s site was classified using certain categories (job types, business types, regions and company types), with further classification based on details such as careers, majors, foreign language skills, preferential attributes, certificates of qualification and advanced degrees. The court held that Saramin HR’s repeated and systematic copying and display of the crawled information for its own business use infringed JobKorea’s database rights and therefore violated the Copyright Act.
② Dispute on Wiki Information – Rigveda Wiki v. Enha Wiki Mirror
Enha Wiki Mirror, a Korean wiki, set up and operated a complete mirror site of Rigveda Wiki, crawling and using content from the latter. The court found, on the merits, that Enha Wiki Mirror infringed the reproduction and transmission rights of Rigveda Wiki as the database author. The court also held, in a preliminary injunction action, that Enha Wiki Mirror’s conduct amounted to unauthorized use of others’ achievements.
③ Dispute on Accommodation Information – Yanolja v. GC Company
A crawling dispute arose between Yanolja and GC Company, two competing Korean online travel platform service providers. The accommodation information at issue included various items, such as unique number identifiers, names, accommodation types (i.e., motel, hotel, guest house or pension), accommodation addresses and rates for individual rooms (both normal and discounted). The information was displayed on the website operated by Yanolja, and was crawled and used by GC Company. In the civil lawsuit, the court found that GC Company’s operational strategy of copying Yanolja’s database through an internally developed crawling program constituted unauthorized use of others’ achievements prohibited under the UCPA.
These cases show that infringement of database rights under the Copyright Act or unauthorized use of others’ achievements under the UCPA may be recognized if it can be shown that training data for an AI model such as ChatGPT were collected by repeatedly and systematically copying a substantial portion of other companies’ databases through web scraping or crawling. Further caution is needed because many websites now block automated crawlers under the robots exclusion standard (robots.txt). Thus, a company planning to develop a ChatGPT-type AI model should ensure that its training data are lawfully collected.
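As a purely technical illustration of the robots.txt point, the following minimal sketch shows how a crawler might check a site's robots.txt rules before collecting a page. It uses Python's standard urllib.robotparser module. The crawler name "ExampleTrainingBot", the sample rules and the example.com URLs are hypothetical and are not taken from any of the cases above. Honoring robots.txt is only one technical precaution and does not by itself establish that data collection is lawful.

```python
from urllib.robotparser import RobotFileParser


def is_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines, so no network
    # access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: blocks one named crawler from the entire site and
# blocks every other crawler from the /private/ directory only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    checks = [
        ("ExampleTrainingBot", "https://example.com/jobs/1"),
        ("OtherBot", "https://example.com/jobs/1"),
        ("OtherBot", "https://example.com/private/report"),
    ]
    for agent, url in checks:
        allowed = is_fetch_allowed(SAMPLE_ROBOTS_TXT, agent, url)
        print(f"{agent} -> {url}: {'allowed' if allowed else 'blocked'}")
```

In practice, a crawler would download the live robots.txt file of each target site (for example, with the parser's set_url() and read() methods) and skip any URL for which the check fails.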
B. The Proposed Amendment to the Copyright Act
On January 15, 2021, an amendment to the Copyright Act was proposed to address legal uncertainties faced by AI business operators and to support the growth of the AI industry. The proposed amendment aims to provide guidance and increased predictability as to which activities would not constitute copyright infringement, by expressly setting forth the grounds on which IP rights may be limited when copyrighted works are used in the course of AI and big data analysis.
Specifically, the relevant language provides that analyzing information “for the purpose of creating additional information or value through analysis of a massive amount of information using computer-aided, automated analytical technology” does not trigger copyright infringement liability if (i) it does not use thoughts or emotions expressed in the copyrighted works, (ii) the copyrighted works are copied and transmitted only to the extent necessary and (iii) such copyrighted works can be lawfully accessed. The proposed amendment states that this limitation applies equally to database rights.
This proposed amendment is meaningful in that, if it is passed by the National Assembly and comes into effect, it will legislatively reduce, at least to a certain extent, the risk of violating the Copyright Act during AI training.
2. Legal Issues Related to ChatGPT-Generated Answers
A. Copyrightability of ChatGPT-Generated Answers
Since ChatGPT is an AI language model, it is capable of creating text-based content, such as song lyrics, novels, poems, social media posts and catchphrases for advertisements. However, a question arises as to whether ChatGPT-generated answers can qualify as copyrighted works protected under the Copyright Act.
In most countries, including Korea, outputs generated by AI are not recognized as copyrighted works. However, this conclusion appears to have resulted from the stance that AI does not fall within the meaning of the ‘human author,’ rather than because of the issue of ‘creativity.’ In fact, in Korea, the Korea Music Copyright Association ceased payment of copyright use fees for a song that was created by AI on the ground that there was no legal basis for continuing to make the payments. Further, it announced through a public notice that “AI is not an (human) author and is ineligible to apply for copyright registration, and AI-created works are not the subject matter of registration since they are not copyrighted works in the first place.”
B. Precautions when Using ChatGPT-Generated Answers
ChatGPT is trained on a wide variety of text, and the answers it generates are based on the information it was given during training. Given this mechanism, ChatGPT-generated answers, which are output in line with the user’s directions, may turn out to resemble literary works that receive copyright protection. In such a scenario, there is a risk of violating the Copyright Act if a ChatGPT-generated answer that is substantially similar to an existing copyrighted work is used by means of reproduction, distribution, etc. Accordingly, when reproducing, distributing or making commercial use of ChatGPT-generated answers, it is necessary to thoroughly review whether there is substantial similarity between the ChatGPT-generated answers and existing copyrighted works.
3. Legislative Status of the Framework Act on AI
On February 14, 2023, the legislation sub-committee of the Committee on Science, ICT, Broadcasting and Communication within the National Assembly passed the “Bill on Fostering AI Industry and Creating Foundations for Trust” (the AI Framework Bill), a development that has attracted considerable attention. The AI Framework Bill consolidates seven separate bills on AI and, if ultimately enacted, will be the first-ever framework act on AI in Korea. In particular, it will serve as the starting point for establishing and/or amending specific, detailed laws and regulations governing AI and is expected to generate further momentum for subsequent legislative activities.
The AI Framework Bill, for the most part, defines the roles and responsibilities of the government in supporting the growth of the AI industry. Of note, business operators that utilize AI in high-risk areas (e.g., healthcare, public health and criminal investigation) are subject to a set of obligations, including the obligation to provide advance notice and to maintain measures for securing reliability and safety. In this regard, business operators in these areas that are already using AI or planning to use it are advised to exercise extra caution.
4. Implications
With the latest ChatGPT craze, discussions on how to introduce and utilize ChatGPT have gained significant momentum among various actors, including the government, public organizations, businesses and academic institutions. In parallel, vigorous discussions are underway to lay the legal foundation that would enable people to freely use ChatGPT while simultaneously protecting the rights of the related right holders.
It will take some time until the related laws and regulations are in place and the courts’ position on them becomes clearer. However, as AI technologies, including ChatGPT, have become highly sophisticated over time and capable of generating high-quality content, a good understanding of the IP issues that may arise in the course of using AI technology, as well as their implications, will be crucial not only to industry players but also to users. It is therefore necessary to keep a watchful eye on legislative developments and trends, and on court decisions that will be issued in the coming years.
¹ GPT-4, the fourth generation of the GPT series, was launched on March 14, 2023.
If you have any questions regarding this article, please contact the following:
Un Ho Kim (unho.kim@leeko.com)
John Kim (john.kim@leeko.com)
Julie Shin (julie.shin@leeko.com)
Junghwan (Justin) Maeng (justin.maeng@leeko.com)
Jaewoo Kwak (jaewoo.kwak@leeko.com)
For more information, please visit our website: www.leeko.com