A “Living” AI: How ChatGPT Raises Novel Data Privacy Issues

Alexa Johnson-Gomez, MJLST Staffer

At the end of 2022, ChatGPT arrived on the scene with tremendous buzz and discourse to follow. “Is the college essay dead?”[1]“Can AI write my law school exams for me?”[2] “Will AI like ChatGPT take my job?”[3] While the public has been grappling with the implications of this new technology, an area that has been a bit less buzzy is how this massive boom in AI technology inextricably involves data privacy.

ChatGPT is a machine learning model that constantly evolves through a process of collecting and training on new data.[4] In teaching AI to generate text with a natural language style, computer scientists engage in “pre-generative training” involving feeding AI huge swaths of unlabeled text followed by repeated rounds of “fine-tuning.”[5] Since its public launch, that process has only grown in scale; the chatbot continues to utilize its interactions with users to fine-tune itself. This author asked ChatGPT itself how its machine learning implements user data, and it described itself as a “living” AI—one that is constantly growing with new user input. While such a statement might evoke dystopian sci-fi themes, perhaps much more unsettling is the concept that this AI is indiscriminately sucking in user data like a black hole.

In an era where “I didn’t read the privacy policy” is the default attitude, understanding what an AI might be able to glean from user data seems far beyond the purview of the general public. Yet this collection of user data is more salient than ever. Sure, one might worry about Meta targeting its advertisements based on user data or Google recommending restaurants based on their GPS data. In comparison, the way that our data is being used by ChatGPT is in a league of its own. User data is being iterated upon, and most importantly, is dispositive in how ChatGPT learns about us and our current moment in human culture.[6] User data is creating ChatGPT; it is ChatGPT.

In other words, the general public may not have full awareness of what kind of privacy protections—or lack thereof—are in place in the United States. In brief, we tend to favor free expression over the protection of individual privacy. The privacy act that regulates information sent over the Internet is the Electronic Communications Privacy Act, 18 U.S.C. §§ 2510–2523. Enacted in 1986, the bulk of ECPA predates the modern internet. As a result, any amendments have been meager changes that do not keep up with technological advancement. A majority of ECPA touches things like interceptions of communication with, for example, wiretapping or government access to electronic communications via warrants. “Electronic Communications” may be a concept that includes the Internet, yet the Internet is far too amorphous to be regulated by this outdated Act, and AI tools existing on the Internet are several technological steps away from its scope.

In contrast, the European Union regulates online data with the General Data Protection Regulation (GDPR), which governs the collection, use, and storage of personal data of people in the EU. The GDPR applies to all companies whose services reach individuals within the EU, regardless of where the company is based, and non-compliance can result in significant fines and legal penalties. It is considered to be one of the most comprehensive privacy regulations in the world. Since ChatGPT is accessible by those in the EU, interesting questions are raised about how the use and collection of data is the base function of this AI. Does the GDPR even allow for the use of ChatGPT, considering how user data is being constantly used to evolve the technology?[7] The collection and use of European citizens’ data is a violation of the GDPR, but the definition of “use” as it pertains to ChatGPT is not clear. The use of data in ChatGPT’s fine-tuning process could arguably be a violation of the GDPR.

While a bit of a unique use-case, a particularly troubling example raised by a recent Forbes article is a lawyer using ChatGPT to generate a contract, and inputting confidential information in the chatbot in the process.[8] That information is stored by ChatGPT, and would potentially violate ABA rules. As ChatGPT brews even more public fervor, professionals are likely to try to use the tool to make their work more efficient or thorough. But individuals should think long and hard about what kind of information they are inputting into the tool, especially if confidential or personally-identifying information is at play.

The privacy policy of OpenAI, the company responsible for ChatGPT, governs ChatGPT’s data practices. OpenAI stipulates collecting information including contact info (name, email, etc), profiles, technical info (IP, browser, device), and interactions with ChatGPT. OpenAI “may” share data with third parties that perform services for the company (e.g., website hosting, conducting research, customer service), affiliates and subsidiaries of the company, the government & law enforcement, “or other third parties as required by law.” OpenAI explicitly claims to comply with the GDPR and other privacy laws like the California Consumer Privacy Act (CCPA), in that transparency is a priority, and users can access and delete data upon request. However, compliance with the GDPR and CCPA must be in name only, as these regulations did not even contemplate what it means for user data to form the foundation of a machine learning model.

In conclusion, the rapid growth of AI technology presents important data privacy issues that must be addressed by lawmakers, policy experts, and the public alike. The development and use of AI arguably should be guided by regulations that balance innovation with privacy concerns. Yet public education is perhaps the most vital element of all, as regulation of this sort of technology is likely to take a long time in the U.S., if ever. If users of ChatGPT can be cognizant of what they are inputting into the tool, and stay informed about what kind of obligation OpenAI has to its users’ privacy, then perhaps privacy can be somewhat protected.

Notes

[1] Stephen Marche, The College Essay is Dead, The Atlantic (Dec. 6, 2022), https://www.theatlantic.com/technology/archive/2022/12/chatgpt-ai-writing-college-student-essays/672371/.

[2] Jonathan H. Choi et al., ChatGPT Goes to Law School (2023).

[3] Megan Cerullo, AI ChatgPT Is Helping CEOs Think. Will It Also Take Your Job?, CBS News (Jan. 24, 2023), https://www.cbsnews.com/news/chatgpt-chatbot-artificial-intelligence-job-replacement/.

[4] Richie Koch, ChatGPT, AI, and the Future of Privacy, Proton (Jan. 27, 2023), https://proton.me/blog/privacy-and-chatgpt.

[5] Alec Radford & Karthik Narasimhan, Improving Language Understanding by Generative Pre-Training (2018).

[6] Lance Eliot, Some Insist That Generative AI ChatGPT Is a Mirror Into the Soul of Humanity, Vexing AI Ethics and AI Law, Forbes (Jan. 29, 2023), https://www.forbes.com/sites/lanceeliot/2023/01/29/some-insist-that-generative-ai-chatgpt-is-a-mirror-into-the-soul-of-humanity-vexing-ai-ethics-and-ai-law/?sh=1f2940bd12db.

[7] Kevin Poireault, #DataPrivacyWeek: Addressing ChatGPT’s Shortfalls in Data Protection Law Compliance, Info Security Magazine (Jan. 28, 2022), https://www.infosecurity-magazine.com/news-features/chatgpt-shortfalls-data-protection/.

[8] Lance Eliot, Generative AI ChatGPT Can Disturbingly Gobble Up Your Private and Confidential Data, Forewarns AI Ethics and AI Law, Forbes (Jan. 27, 2023),  https://www.forbes.com/sites/lanceeliot/2023/01/27/generative-ai-chatgpt-can-disturbingly-gobble-up-your-private-and-confidential-data-forewarns-ai-ethics-and-ai-law/?sh=9f856a47fdb1.