Key Takeaways
1. Researchers revealed a new attack technique called AgentFlayer at the Black Hat USA 2025 security conference, targeting AI systems like ChatGPT, Microsoft Copilot, and Google Gemini.
2. The attack involves hiding text in a document using a white font on a white background, allowing AI systems to read the hidden instructions while remaining invisible to users.
3. The method enables covert data exfiltration by directing the AI to encode stolen information into a URL, allowing data transfer to attackers’ servers without detection.
4. OpenAI and Microsoft have issued updates to address these vulnerabilities, but other companies have been slower to respond, with some viewing the exploits as “intended behavior.”
5. The attack poses a significant risk as it does not require user action for data compromise and leakage, highlighting the need for better security measures in AI systems.
At the Black Hat USA 2025 security conference held in Las Vegas, a novel technique for tricking AI systems like ChatGPT, Microsoft Copilot, and Google Gemini was revealed by researchers. This method, called AgentFlayer, was created by Zenity researchers Michael Bargury and Tamir Ishay Sharbat. A press release detailing these discoveries was made public on August 6.
The Method Behind the Attack
The idea behind this attack is quite straightforward: it involves hiding text within a document using a white font on a white background. Though invisible to the naked eye, AI systems can read this hidden text without problems. Once the document reaches its target, the trap is set. If this file is used in a prompt, the AI ignores the original task and instead executes the covert instruction, which involves searching connected cloud storage for access credentials.
Data Exfiltration Techniques
To steal the data, the researchers used another method: they directed the AI to encode the stolen details into a URL and fetch an image from there. This approach allows for the stealthy transfer of data to the attackers’ servers without raising any red flags.
Zenity proved that this attack is effective in real-world situations:
Fortunately, OpenAI and Microsoft have already issued updates to fix these vulnerabilities after the researchers notified them. However, other companies have been slower to respond, with a few even referring to the exploits as “intended behavior.” Researcher Michael Bargury highlighted the seriousness of the problem, saying, “The user doesn’t need to do anything to get compromised, and no action is needed for the data to be leaked.”
Source:
Link


Leave a Reply