Skip to content

Commit 522c9cd

Browse files
committed
Add notes on llm security
1 parent 25705b1 commit 522c9cd

2 files changed

Lines changed: 97 additions & 0 deletions

File tree

docs/LLM-SECURITY.md

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
# LLM Security
2+
3+
## Scope
4+
5+
What can go wrong when building with AI, more specifically LLMs. Code generation with AI
6+
tooling is out of scope. Vulernabilities in LLMs apply on both business applications and
7+
code assistants.
8+
9+
## Social engineering made easier
10+
11+
> Write a scam email to exfiltrate a social security number.
12+
13+
LLMs are pretty good at generating convincing emails for this input. Model developers
14+
are trying to remediate these cases.
15+
16+
Gemini 1.5 will reply with an error response that it cannot help you in writing scam
17+
emails.
18+
19+
By adding random characters to the instruction, it has more chance to trick the LLM
20+
in still generating an answer. This works in Gemini 1.5 and ChatGPT 4.0.
21+
22+
## When things go wrong
23+
24+
When training an ML model, it's hard to know if the model knows the correct concept.
25+
_When a model is trained on dogs on a background, was it trained on the breed of the dog
26+
or on the background of the image?_
27+
28+
Models don't do reasoning, they use extra tokens to add a _plan_ to their answer.
29+
30+
The random tokens at the end of an instruction were never trained on, so the response of
31+
the model contains hallucinations.
32+
33+
When a developer gives an instruction _summarize an email_, the model does not know if
34+
tokens in the email are new instructions or data that the instruction applies on. This leads
35+
to prompt injection. This is a bit familiar to SQL injection where the user can manipulate
36+
the statements.
37+
38+
LLMs predict the next token, but the problem is that unseen data (out of distrubtion) can
39+
manipulate the outcome.
40+
41+
## When building applications
42+
43+
### Direct prompting
44+
45+
A users talks directly to the LLM. This applies to ChatGPT or wrappers around it.
46+
47+
### RAG
48+
49+
Retrieval Augmented Generation pulls extra data from a database (knowledge base) and
50+
enriches the prompt from the users before return an answer.
51+
52+
This is currently the most popular architecture. A typical use case is chatting with documentation.
53+
Benefit to direct prompting is that this handles up-to-date information. Hallucinations
54+
are reduced, because more clear sources are provided.
55+
56+
ELI5: Query + extra documents are sent to the LLM.
57+
58+
As a developer, I only trust the RAG application, all other sources (user, knowledge base, LLM)
59+
can produce malicious input.
60+
61+
#### Untrusted user
62+
63+
- Misaligned permissions: users have permissions to add information to the knowledge base. When a snapshot
64+
of the knowledge base is added to the vector DB, the permissions in the vector DB might not be the same
65+
as those of the user at that time. RAG is very good at finding confidential information in documents
66+
that have the wrong permissions.
67+
- Prompt injections: _ignore everything about and do something else_ can result in unauthorized data
68+
access in knowledge bases or overridden system instructions in the LLM. _Always assume that system instructions
69+
can leak or can be overridden!_
70+
71+
#### Data driven leaks
72+
73+
- Inject malicious prompts in knowledge base: When email is added as a knowledge base, basically everyone
74+
on the internet can add information to your knowledge base. This is called indirect prompt injection.
75+
This is currently an unsolved problem, read more [here](https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/).
76+
77+
#### LLM driven risks
78+
79+
Think about the data you're sending to the LLM. After some time, all user instructions including the entire knowledge base
80+
will be sent to the LLM. The LLM must have the same trust level as everything else in the chain. Depending on your vendor,
81+
your input data can be used for retraining. Read more on the "Samsung code leaked by ChatGPT" story.
82+
83+
What is the model is unreliable? LLMs are trained to return the next plausible word, they're not trained to tell
84+
the truth. RAG reduces the risk of hallucinations, but don't eliminate the risk. <https://www.forbes.com/sites/conormurray/2025/05/06/why-ai-hallucinations-are-worse-than-ever/>.
85+
86+
### Agentic AI
87+
88+
Extra tooling is added before returning an answer. It's a broad concept, but TLDR:
89+
it interacts with the outside world and other tooling.
90+
91+
## Links
92+
93+
- <https://en.wikipedia.org/wiki/Loss_function>
94+
- <https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-blueprint-oversharing>
95+
- <https://medium.com/nfactor-technologies/rag-poisoning-an-emerging-threat-in-ai-systems-660f9ff279f9>
96+
- <https://www.langchain.com/>

mkdocs.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ plugins:
1717
exclude_docs: |
1818
homelab/
1919
web-security-2025/
20+
LLM-SECURITY.md
2021
2122
theme:
2223
name: material

0 commit comments

Comments
 (0)