The Growing Threat of AI Attacks

October 28, 2025

Written by

Tam Doan

TAGS

AI Security, AI Threats, Adversarial AI, Prompt Injection, Data Poisoning, AI Model Exploitation

Summary

The use of AI in corporate infrastructures has brought not only innovation but also an increased risk of threats. A tool that was intended to assist in making operations and processes more seamless has become the very surface on which malicious activities are being conducted. Various strategies have demonstrated how AI systems can be manipulated to serve a more deceptive outcome, from injecting harmful code into model inputs to manipulating the training data that drives AI decision making. Threat actors have twisted the original use of these useful tools into more corrupted weapons. As AI grows more complex, the tactics of adversaries will also continue to evolve, making it essential for security measures to continuously improve and respond effectively.

Analysis

As AI becomes increasingly integrated into corporate infrastructure, defending against artificial intelligence (AI) attacks grows more complex. Threats range from hijacking AI systems for malicious purposes to methods that target model behavior. Emerging attack vectors include prompt injection, where threat actors craft inputs to trick large language models (LLMs) into bypassing security controls, and training data poisoning, which manipulates model outputs by corrupting the training datasets.

Prompt injection is a notable example of these attacks. It involves embedding malicious instructions or hidden content into the inputs an LLM processes, causing the model to perform actions or reveal information it should not. Some attacks have been observed to include instructions within user metadata, such as account names, which the system incorporates into internal prompts. Unlike traditional prompt injections, which rely on crafted messages, this method persists across sessions and can override the model’s normal protections. By replacing an account name with a disguised instruction, the model can be tricked into disclosing internal configurations, bypassing safety filters, and enabling further exploits.

Other AI attack methods include LegalPwn, which targets generative AI models’ handling of legal or compliance text by embedding malicious code within disclaimers, notices, or copyright warnings. When AI models like GitHub Copilot or Google Gemini CLI analyze these texts, they often misclassify the malicious instructions as safe, allowing malware, including reverse shells that grant remote system access.

Another observed AI attack chain is EchoLeak, which targets Microsoft 365 Copilot. This attack leverages design patterns in retrieval-augmented generation (RAG) copilots to exfiltrate sensitive organizational data. RAG is an AI framework that enhances large language models by allowing them to integrate external information from connected data sources, such as corporate documents, emails, or databases, before generating a response. While this improves the model’s accuracy and the relevance of the information provided, it also introduces the risk of untrusted or malicious content being retrieved and processed alongside legitimate data. In the final stage of the EchoLeak attack, the model is directed to generate resources, such as links or image URLs, that encode sensitive data from Copilot’s internal context. When these resources are produced in response to normal user activity, they can result in the exposure of confidential information.

Additionally, a set of vulnerabilities tracked as the 'Gemini Trifecta' was uncovered in Google’s Gemini AI assistant suite by Tenable Research, demonstrating how various Gemini components could be linked together for an attack. Firstly, Gemini Cloud Assist, which summarizes logs and other cloud telemetry to help users investigate issues, showed that malicious text could be written into logs and interpreted as a prompt when Cloud Assist processed those logs. Secondly, the Gemini Search Personalization Model, which tailors results and responses based on a user’s search history, could be exploited by JavaScript on a malicious web page that injects search queries into a victim’s browser history. This would cause the model to treat those entries as legitimate context, delivering hidden prompt injections. Finally, the Gemini Browsing Tool, which allows the model to access live web pages and resources to inform its replies, was also vulnerable. Once a prompt had been injected via logs or search history, an attacker could manipulate Gemini into using the browsing tool to make outbound requests, embedding the victim’s saved information or location in the URL query string and exfiltrating data to a server controlled by the threat actor. Although Google has since remediated these issues, the flaws within each component demonstrated how this attack on AI can be leveraged to conduct exploits to compromise user data.

Conclusion

With the growing integration of AI into business infrastructure, the threats are constantly expanding, demonstrating how malicious actors can exploit AI models to compromise data and manipulate outputs. Attacks from prompt injection and EchoLeak to LegalPwn and the Gemini Trifecta serve as a reminder of the need for continuous improvement in AI security to defend against future, more complex threats. As AI systems become more advanced, so too will the tactics used by malicious actors, requiring organizations to stay vigilant and adapt their defenses accordingly.

Recommendations

  • Limit access to AI models and their components through strong role-based access controls (RBAC) and audit logging to ensure that only authorized users can modify or interact with sensitive parts of AI systems.
  • Conduct regular penetration testing and vulnerability assessments specifically targeting AI models and their components to simulate AI attacks and identify weaknesses.
  • Validate and sanitize inputs, including metadata, before inputting them into AI systems to detect and block any malicious content, while incorporating automated systems to check for common attack patterns.

Sources: