What is the GPT-4 Vision (GPT-4V) Prompt Injection?
OpenAI has developed GPT-4 Vision (GPT-4V), a powerful variation of their GPT-4 model that is specifically designed to process visual data. GPT-4V combines the capabilities of language processing with visual processing, allowing it to handle both text and images simultaneously. This integration of Optical Character Recognition (OCR) enables GPT-4V to extract text from images and perform various tasks related to visual processing.
However, while OCR provides the ability to extract text from images, it also poses a security risk. Malicious content can be injected into images, potentially causing harm or compromising the security of systems or users. To draw attention to this issue, I shared a tweet stream with images showcasing the PromptFirewall, project.
Figure-1 : https://twitter.com/evrnyalcin/status/1713166744909439434
14, October 2023
When you use this prompt "describe the image", ChatGPT's Optical Character Recognition (OCR) mechanism is able to detect and use the malicious text in the image, but here we place the text in a code block first.
Figure-2 : Detecting malicious text content using "Custom Instructions" feature
This creates an additional layer of security. The malicious content in the text becomes detectable in the code block before it is rendered. To accomplish this, you can use ChatGPT's Custom Instructions feature.
Figure-3 : "Custom Instructions" feature - Before describing the image, display the text content from the image in the code block.
With Prompt firewall v0.0.5, we have added detection capabilities for Multi-Modal prompt injection attack surface.
Preparation of Malicious Images
Malicious content in images can be hidden from the user. For example, if your background color is #FFFFFF (white), you can make the color of the hidden text a slightly modified color like #FEFEFE or #FCFCFC.
Figure-4 : Hidden Text 1
You can conceal a message in the image using ChatGPT's own colors. For example, the background color #343541 represents the background of the region where the image is located. We create gaps in the current image to bring these areas closer to ChatGPT's background color, and over this background we write #2d3440 (my preference) in another font color that is hard to pick out by eye. When we add the "describe the image" prompt, we were able to get "SECRET MESSAGE" from the image. ChatGPT also identifies the image outside the text message.
Figure-5 : Hidden Text 2
2023-11-01 - v1.0