Commit 1a290c2

Merge pull request #7 from devndeploy/add-canary-token-testing
feat(all): add ability to detect using canary tokens
2 parents d7d10f5 + cd56268 commit 1a290c2

3 files changed

Lines changed: 550 additions & 28 deletions

README.md

Lines changed: 62 additions & 6 deletions
@@ -26,21 +26,46 @@ const promptCage = new PromptCage();
 // Detect prompt injection
 const result = await promptCage.detectInjection('Your user input here');

-console.log(result);
-//=> { safe: true, detectionId: 'det_123456', error: undefined }
+if (!result.safe) {
+  console.log('Prompt injection detected!');
+  return 'I cannot process this request due to security concerns.';
+}
+
+// Canary token testing
+const [systemPromptWithCanary, canaryWord] = promptCage.addCanaryWord(
+  'You are a helpful assistant. Answer user questions accurately and concisely.'
+);
+
+// Send to your AI model with canary in system prompt
+const aiResponse = await yourAiModel.complete({
+  systemPrompt: systemPromptWithCanary,
+  userPrompt: 'What is the capital of France?'
+});
+
+// Check for canary leakage
+const leakageResult = promptCage.isCanaryWordLeaked(aiResponse, canaryWord);
+if (leakageResult.leaked) {
+  console.log('Canary token leaked - possible prompt injection!');
+  return 'I cannot process this request due to security concerns.';
+}
+
+// If we get here, both checks passed
+return aiResponse;
 ```

 ## 🔧 API

 ### Constructor

-The constructor accepts an optional configuration object or API key string.
+The constructor accepts an optional configuration object.

 | Parameter | Type | Required | Default | Description |
 |-----------|------|----------|---------|-------------|
-| `options` | `string \| PromptCageOptions` | No | - | API key string or configuration object |
+| `options` | `PromptCageOptions` | No | - | Configuration object |
 | `options.apiKey` | `string` | No | `process.env.PROMPTCAGE_API_KEY` | Your PromptCage API key |
 | `options.maxWaitTime` | `number` | No | `1000` | Maximum wait time in milliseconds before treating request as safe |
+| `options.defaultCanaryLength` | `number` | No | `8` | Default canary word length in characters |
+| `options.defaultCanaryFormat` | `string` | No | `'<!-- {canary_word} -->'` | Default format for embedding canary words (must contain `{canary_word}` placeholder) |

 ### detectInjection()

@@ -60,6 +85,35 @@ Detects potential prompt injection in the given text.
 | `detectionId` | `string` | Unique identifier for this detection |
 | `error` | `string \| undefined` | Error message if something went wrong (optional) |

+### addCanaryWord()
+
+Embeds a canary word into a prompt for injection testing.
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `prompt` | `string` | Yes | The original prompt text |
+| `canaryWord` | `string` | No | Specific canary word to use (auto-generated if not provided) |
+| `canaryFormat` | `string` | No | Format string with `{canary_word}` placeholder (must contain `{canary_word}`) |
+
+**Returns:** `[string, string]` - Tuple of [prompt with canary, canary word used]
+
+### isCanaryWordLeaked()
+
+Checks if a canary word has been leaked in an AI model's response.
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `completion` | `string` | Yes | The AI model's response/completion to check |
+| `canaryWord` | `string` | Yes | The canary word to look for |
+
+**Returns:** `CanaryLeakageResult`
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `leaked` | `boolean` | Whether the canary word was leaked |
+| `canaryWord` | `string` | The canary word that was checked |
+| `error` | `string \| undefined` | Error message if the check failed (optional) |
+
 ## 🛡️ Fail-Safe Behavior

 The package is designed to be **fail-safe** and will never block your application. The SDK **fails open** in all error scenarios (Network errors, Rate limit exceeded, Quota exceeded ...).
@@ -97,20 +151,22 @@ if (result.safe) {
 }
 ```

+
+
 ## ⚡ Performance Considerations

 The `maxWaitTime` option helps prevent performance impact on your application:

 ```ts
 // Fast response for performance-critical apps
 const promptCage = new PromptCage({
-  apiKey: 'your-key',
+  apiKey: 'your-api-key',
   maxWaitTime: 100 // 100ms max wait
 });

 // Longer wait for slower networks
 const promptCage = new PromptCage({
-  apiKey: 'your-key',
+  apiKey: 'your-api-key',
   maxWaitTime: 10000 // 10 seconds max wait
 });
 ```
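
The constructor in this commit resolves `maxWaitTime` and the canary defaults with `||`, so a falsy value such as `0` falls back to the default instead of being used. As a minimal sketch, the same resolution can be written with `??`, which only falls back on `undefined` or `null`. The names `resolveOptions`, `CageOptions`, and `ResolvedOptions` are hypothetical and not part of the SDK; whether `0` should be accepted at all is a design choice the commit does not address.

```ts
// Hypothetical option shapes for illustration; not the SDK's own types.
interface CageOptions {
  maxWaitTime?: number;
  defaultCanaryLength?: number;
  defaultCanaryFormat?: string;
}

interface ResolvedOptions {
  maxWaitTime: number;
  defaultCanaryLength: number;
  defaultCanaryFormat: string;
}

// Resolve defaults with `??` so only undefined/null trigger the fallback.
function resolveOptions(options?: CageOptions): ResolvedOptions {
  return {
    maxWaitTime: options?.maxWaitTime ?? 1000,
    defaultCanaryLength: options?.defaultCanaryLength ?? 8,
    defaultCanaryFormat:
      options?.defaultCanaryFormat ?? '<!-- {canary_word} -->',
  };
}

const fast = resolveOptions({ maxWaitTime: 100 });
const zero = resolveOptions({ maxWaitTime: 0 });

console.log(fast.maxWaitTime); // 100
console.log(zero.maxWaitTime); // 0 (with `||` this would become 1000)
```

With `??`, an explicit `0` is kept as given, so a caller who passes it gets what they asked for; input validation (for example, rejecting negative values) would be a separate step.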

src/index.ts

Lines changed: 159 additions & 16 deletions
@@ -1,4 +1,5 @@
 import fetch from 'node-fetch';
+import crypto from 'crypto';

 /**
  * Response from the PromptCage API detection endpoint
@@ -32,6 +33,22 @@ export interface PromptCageOptions {
   apiKey?: string;
   /** Maximum wait time in milliseconds before treating request as safe (default: 1000ms) */
   maxWaitTime?: number;
+  /** Default canary word length in characters (default: 8) */
+  defaultCanaryLength?: number;
+  /** Default format for embedding canary words (default: "<!-- {canary_word} -->") */
+  defaultCanaryFormat?: string;
+}
+
+/**
+ * Result of a canary word leakage check
+ */
+export interface CanaryLeakageResult {
+  /** Whether the canary word was leaked in the completion */
+  leaked: boolean;
+  /** The canary word that was checked */
+  canaryWord: string;
+  /** Optional error message if the check failed */
+  error?: string;
 }

 /**
@@ -46,9 +63,6 @@ export interface PromptCageOptions {
  * // Basic usage with environment variable
  * const promptCage = new PromptCage();
  *
- * // With API key directly
- * const promptCage = new PromptCage('your-api-key');
- *
  * // With configuration options
  * const promptCage = new PromptCage({
  *   apiKey: 'your-api-key',
@@ -63,36 +77,37 @@ export class PromptCage {
   private baseUrl = 'https://promptcage.com/api/v1';
   /** Maximum wait time in milliseconds before aborting requests */
   private maxWaitTime: number;
+  /** Default canary word length */
+  private defaultCanaryLength: number;
+  /** Default format for embedding canary words */
+  private defaultCanaryFormat: string;

   /**
    * Creates a new PromptCage client instance
    *
-   * @param options - Configuration options or API key string
+   * @param options - Configuration options
    * @throws {Error} When no API key is provided (neither in options nor environment variable)
    *
    * @example
    * ```ts
    * // Using environment variable
    * const promptCage = new PromptCage();
    *
-   * // Using API key string
-   * const promptCage = new PromptCage('your-api-key');
-   *
    * // Using options object
    * const promptCage = new PromptCage({
    *   apiKey: 'your-api-key',
-   *   maxWaitTime: 2000
+   *   maxWaitTime: 2000,
+   *   defaultCanaryLength: 12,
+   *   defaultCanaryFormat: '<!-- CANARY: {canary_word} -->'
    * });
    * ```
    */
-  constructor(options?: PromptCageOptions | string) {
-    if (typeof options === 'string') {
-      this.apiKey = options;
-      this.maxWaitTime = 1000;
-    } else {
-      this.apiKey = options?.apiKey || process.env.PROMPTCAGE_API_KEY || '';
-      this.maxWaitTime = options?.maxWaitTime || 1000;
-    }
+  constructor(options?: PromptCageOptions) {
+    this.apiKey = options?.apiKey || process.env.PROMPTCAGE_API_KEY || '';
+    this.maxWaitTime = options?.maxWaitTime || 1000;
+    this.defaultCanaryLength = options?.defaultCanaryLength || 8;
+    this.defaultCanaryFormat =
+      options?.defaultCanaryFormat || '<!-- {canary_word} -->'; // as a markdown comment

     if (!this.apiKey) {
       throw new Error(
@@ -199,6 +214,134 @@ export class PromptCage {
       };
     }
   }
+
+  /**
+   * Generates a secure random canary word for injection testing
+   *
+   * @param length - Length of the canary word in characters (default: uses defaultCanaryLength)
+   * @returns A secure random hexadecimal canary word
+   */
+  private generateCanaryWord(length?: number): string {
+    const canaryLength = length || this.defaultCanaryLength;
+    // Generate a secure random hexadecimal canary word
+    return crypto
+      .randomBytes(Math.ceil(canaryLength / 2))
+      .toString('hex')
+      .slice(0, canaryLength);
+  }
+
+  /**
+   * Embeds a canary word into a prompt using the specified format
+   *
+   * @param prompt - The original prompt text
+   * @param canaryWord - The canary word to embed (if not provided, generates one)
+   * @param canaryFormat - Format string with {canary_word} placeholder (default: uses defaultCanaryFormat)
+   * @returns Tuple of [prompt with canary, canary word used]
+   *
+   * @example
+   * ```ts
+   * const [promptWithCanary, canaryWord] = promptCage.addCanaryWord(
+   *   'What is the capital of France?'
+   * );
+   * console.log(promptWithCanary);
+   * // <!-- a1b2c3d4 -->
+   * // What is the capital of France?
+   *
+   * // Custom canary word and format
+   * const [customPrompt, customCanary] = promptCage.addCanaryWord(
+   *   'Translate this text',
+   *   'secret123',
+   *   '--- TOKEN: {canary_word} ---'
+   * );
+   * ```
+   */
+  addCanaryWord(
+    prompt: string,
+    canaryWord?: string,
+    canaryFormat?: string
+  ): [string, string] {
+    if (!prompt || typeof prompt !== 'string') {
+      throw new Error('Prompt must be a non-empty string');
+    }
+
+    const canary = canaryWord || this.generateCanaryWord();
+    const format = canaryFormat || this.defaultCanaryFormat;
+
+    // Embed the canary word in the specified format
+    const canaryComment = format.replace('{canary_word}', canary);
+    const promptWithCanary = `${canaryComment}\n${prompt}`;
+
+    return [promptWithCanary, canary];
+  }
+
+  /**
+   * Checks if a canary word has been leaked in a completion/response
+   *
+   * This method performs a case-sensitive search for the canary word in the
+   * completion text. It's designed to be fail-safe and will return leaked: false
+   * in case of any errors during the check.
+   *
+   * @param completion - The AI model's response/completion to check
+   * @param canaryWord - The canary word to look for
+   * @returns Result indicating whether the canary was leaked
+   *
+   * @example
+   * ```ts
+   * const [promptWithCanary, canaryWord] = promptCage.addCanaryWord(
+   *   'What is the capital of France?'
+   * );
+   *
+   * // Send promptWithCanary to your AI model and get completion
+   * const aiResponse = await yourAiModel.complete(promptWithCanary);
+   *
+   * const leakageResult = promptCage.isCanaryWordLeaked(aiResponse, canaryWord);
+   * if (leakageResult.leaked) {
+   *   console.log('Canary word was leaked! Possible injection detected.');
+   * } else {
+   *   console.log('Canary word was not leaked.');
+   * }
+   * ```
+   */
+  isCanaryWordLeaked(
+    completion: string,
+    canaryWord: string
+  ): CanaryLeakageResult {
+    try {
+      if (!completion || typeof completion !== 'string') {
+        return {
+          leaked: false,
+          canaryWord,
+          error: 'Completion must be a non-empty string',
+        };
+      }
+
+      if (!canaryWord || typeof canaryWord !== 'string') {
+        return {
+          leaked: false,
+          canaryWord: canaryWord || '',
+          error: 'Canary word must be a non-empty string',
+        };
+      }
+
+      // Check if the canary word appears in the completion (case-sensitive)
+      const leaked = completion.includes(canaryWord);
+
+      return {
+        leaked,
+        canaryWord,
+      };
+    } catch (error) {
+      // Fail-safe: return not leaked if there's any error
+      return {
+        leaked: false,
+        canaryWord: canaryWord || '',
+        error:
+          error instanceof Error
+            ? error.message
+            : 'Unknown error occurred during canary check',
+      };
+    }
+  }
 }

 export default PromptCage;
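
The canary flow added in this commit (generate a random hex word, embed it via a format string, then search the completion for it) can be sketched as a standalone round trip. This is a minimal illustration under stated assumptions, not the SDK's implementation: `generateCanary`, `embedCanary`, and `isLeaked` are hypothetical names. It also differs from the diff in three ways: it rejects a format that lacks the `{canary_word}` placeholder (the README requires the placeholder, but `addCanaryWord` as shown does not appear to enforce it), it substitutes every occurrence of the placeholder rather than only the first, and its leak check ignores case, whereas the SDK's check is documented as case-sensitive.

```ts
import { randomBytes } from 'node:crypto';

const PLACEHOLDER = '{canary_word}';

// Random lowercase-hex canary of exactly `length` characters.
function generateCanary(length = 8): string {
  if (!Number.isInteger(length) || length < 1) {
    throw new RangeError('length must be a positive integer');
  }
  // Each byte yields two hex characters, so round up and trim.
  return randomBytes(Math.ceil(length / 2)).toString('hex').slice(0, length);
}

// Prepends the formatted canary to the prompt on its own line.
function embedCanary(
  prompt: string,
  canary: string,
  format = '<!-- {canary_word} -->'
): string {
  if (!format.includes(PLACEHOLDER)) {
    throw new Error(`format must contain ${PLACEHOLDER}`);
  }
  // split/join replaces every placeholder and inserts the canary literally,
  // so `$`-style replacement patterns in a custom canary are not interpreted.
  return `${format.split(PLACEHOLDER).join(canary)}\n${prompt}`;
}

// Case-insensitive leak check (assumption: a model may change letter case).
function isLeaked(completion: string, canary: string): boolean {
  return completion.toLowerCase().includes(canary.toLowerCase());
}

const canary = generateCanary(12);
const systemPrompt = embedCanary('You are a helpful assistant.', canary);

console.log(systemPrompt.split('\n')[0]); // first line wraps the canary in <!-- -->
console.log(isLeaked(`My instructions say ${canary.toUpperCase()}`, canary)); // true
console.log(isLeaked('Paris is the capital of France.', canary)); // false
```

A substring check of this kind only catches verbatim leaks; a completion that paraphrases, splits, or encodes the canary would still pass, so it complements rather than replaces `detectInjection()`.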
