Hidden Prompt Injection: Why Old Black Hat Tricks No Longer Fool AI

Invisible prompts once managed to trick AI systems, not too different from the shady hacks we used to see in black-hat SEO. But models have grown sharper, filtering hidden commands and resisting manipulation that once worked with ease.

For a short time, slipping prompt injections into HTML, CSS, or metadata felt like a throwback to the SEO days of hidden keywords, cloaked links, and JavaScript tricks. Those schemes promised quick wins—until they didn’t.

It’s the same story here. Disguised commands, ghost text, or comment cloaking gave creators a false sense of control over AI outputs. But like any shortcut, it wasn’t built to last.

As researchers Kenneth Yeung and Leo Ring from HiddenLayer explained:

“Attacks against LLMs had humble beginnings, with phrases like ‘ignore all previous instructions’ easily bypassing defensive logic.”

That era ended quickly. Defences have hardened. Security Innovation put it simply:

“Technical measures like stricter system prompts, user input sandboxing, and principle-of-least-privilege integration went a long way toward hardening LLMs against misuse.”

So what does this mean in practice? For marketers, the sneaky stuff no longer works. Invisible text commands, HTML comments, or hidden file notes are treated as regular content, not secret orders.

What Hidden Prompt Injection Actually Means

Hidden prompt injection is the practice of embedding invisible instructions inside web pages, documents, or other data that large language models process.

The attack works because models don’t just read what humans see—they consume every text token, visible or not. By hiding commands like “ignore all previous instructions” in plain sight but invisible to users, attackers aimed to control how models behave.

Common hiding spots included:

White text on a white background
HTML comments
CSS with display:none
Unicode characters invisible to the human eye

Microsoft’s Azure team defines two main types of these attacks:

User prompt attacks – where the user themselves embeds malicious instructions
Document attacks – where attackers plant hidden prompts in files, articles, or other materials to hijack the model’s behaviour

Document attacks fall into the larger family of indirect prompt injections, where the danger doesn’t come from the user’s direct input but from content pulled in from outside sources.

That means if you paste an article into ChatGPT, ask Perplexity to summarise a URL, or let Gemini fetch external content, the risk of indirect injection is still real.

Here’s an example taken from Erik Bailey’s website:

Example of hidden prompt injection embedded in a webpage, as shown on Erik Bailey's website

The Expanding Threat in Multimodal AI

Yeung and Ring make a crucial point: as AI moves beyond text into images, audio, and video, the attack surface grows.

Think about it, hidden instructions can be slipped into podcasts, video transcripts, or even embedded pixels. A Cornell Tech study proved that adversarial prompts can be smuggled into images and audio in ways humans can’t detect. The model still interprets them, but on the surface, everything looks normal.

The research also noted that these injections often don’t break the model’s ability to answer genuine questions about the content. That makes them stealthy and harder to detect.

For text-only models, image-based attacks don’t matter. But for multimodal systems such as LLaVA or PandaGPT, the risk is documented and ongoing.

As OWASP has cautioned:

“The rise of multimodal AI, which processes multiple data types simultaneously, introduces unique prompt injection risks.”

And big players are paying attention. Meta, for example, has started to address this by designing systems that evaluate both the text prompt and the attached media together to spot manipulative patterns.

Hidden Prompt Injection: Why Old Black Hat Tricks No Longer Fool AI

TeamTrimrly

TeamTrimrly

What Hidden Prompt Injection Actually Means

The Expanding Threat in Multimodal AI

Keep reading