# Automatic Prompt Optimization

A system that assesses the quality of a prompt and updates it for better performance.

## Types

- Without Dataset
    - [OpenAI gpt-5 prompt optimizer](https://platform.openai.com/chat/edit?models=gpt-5&optimize=true)
    - [Anthropic prompt improver](https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prompt-improver)
    - [Google VertexAI prompt optimizer](https://discuss.google.dev/t/vertex-ai-prompt-optimizer-new-algorithm-sdk-for-the-general-availability/253359)
- With Dataset (recommended)
    - [dspy's optimizers](https://dspy.ai/learn/optimization/optimizers/)
        - COPRO, MIPRO, SIMBA, GEPA, etc.
12+
13+
## Understanding GEPA (Genetic Pareto)
14+
15+
The goal is to improve the valset performance using the feedbacks from trainset
16+
17+
### Terminologies
18+
19+
Program - An entire system with one or more predictors
20+
Predictor - The prompt
21+
22+
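As a mental model, this terminology can be sketched with plain data structures (the class names here are illustrative assumptions, not dspy's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Predictor:
    """A single prompt inside a program."""
    name: str
    prompt: str

@dataclass
class Program:
    """An entire system made of one or more predictors."""
    predictors: list = field(default_factory=list)

# A two-predictor program, e.g. a RAG pipeline
program = Program(predictors=[
    Predictor("query_gen", "Rewrite the user question as a search query."),
    Predictor("answer_gen", "Answer using only the retrieved documents."),
])
print(len(program.predictors))  # 2
```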
### Workflow

- State Initialization
    - Best programs on the valset - [{P0}, {P0}, {P0}]
- While budget is available, do the following steps:
    1) Pareto selection (select a program)
    2) Mini-batch selection from the trainset (random sampling by default)
    3) Generate and evaluate the program on the mini batch; collect scores and feedback
    4) Module selection (select the predictors to update, round robin by default)
    5) Make a reflective dataset
        - For each example in the mini batch, frame Prompt + Generated output + Evaluation feedback in context
    6) Construct a prompt with the reflective dataset as context and use a reflection model to propose a new prompt
    7) Generate and evaluate the program with the new prompt on the same mini batch from step #2
    8) Add the program with the new prompt to the candidate pool if its scores are higher than those from step #3
    9) If programs have more than one predictor and a common ancestor,
        - merge them (genetic algorithm) to get a new program and repeat steps #2 to #8
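The loop above can be sketched in a few lines of Python. Everything here is a toy stand-in: the scoring and reflection functions are hypothetical placeholders, not GEPA's real implementation, and the single-candidate selection skips the Pareto and module-selection steps:

```python
import random

random.seed(0)

trainset = [("2+2", "4"), ("3+3", "6"), ("10-1", "9")]

def evaluate(prompt, batch):
    """Toy metric: longer, more specific prompts score higher.
    Returns (mean score, per-example feedback strings)."""
    score = min(len(prompt) / 60.0, 1.0)
    feedback = [f"{q}: prompt too vague" if score < 1.0 else f"{q}: ok"
                for q, _ in batch]
    return score, feedback

def reflect(prompt, feedback):
    """Toy reflection model: append an instruction addressing the feedback."""
    return prompt + " Show your working step by step."

pool = ["Answer the question."]                      # state init: best program(s)
budget = 5
while budget > 0:
    budget -= 1
    parent = random.choice(pool)                     # 1) selection (Pareto in real GEPA)
    batch = random.sample(trainset, 2)               # 2) mini-batch selection
    old_score, feedback = evaluate(parent, batch)    # 3) evaluate + collect feedback
    child = reflect(parent, feedback)                # 5-6) reflective proposal
    new_score, _ = evaluate(child, batch)            # 7) re-evaluate on same batch
    if new_score > old_score:                        # 8) keep only if better
        pool.append(child)

print(len(pool) > 1)  # True -- at least one improved prompt was accepted
```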

### Understanding Pareto Sampling

|     | E1  | E2  | E3  | Average |
|-----|-----|-----|-----|---------|
| P1  | 0.5 | 0.3 | 0.6 | 0.46    |
| P2  | 0.5 | 0.4 | 0.7 | 0.53    |
| P3  | 0.8 | 0.1 | 0.2 | 0.36    |

If we selected the candidate for optimization by the maximum average score, we would always pick P2, and the information that makes P3 perform better on E1 would be lost.

So here the candidates are P2 and P3 (P1 is left out since P2 is at least as good on every example and strictly better on some, i.e. P1 is dominated). From these candidates, weighted sampling is used to pick the one to optimize: P2 is the best on 2 examples and P3 on 1 example, so

```python
import random

random.choice(["P3", "P2", "P2"])
```

P2 has a ~67% chance of being selected and P3 a ~33% chance. This avoids getting stuck in a local optimum.
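The selection above can be computed directly from the score table. This is a minimal sketch of Pareto-frontier filtering plus frequency-weighted sampling; GEPA's actual bookkeeping is more involved:

```python
import random

scores = {  # program -> per-example scores (E1, E2, E3)
    "P1": [0.5, 0.3, 0.6],
    "P2": [0.5, 0.4, 0.7],
    "P3": [0.8, 0.1, 0.2],
}

def dominated(a, b):
    """True if program a is dominated by b: b is at least as good on
    every example and strictly better on at least one."""
    sa, sb = scores[a], scores[b]
    return (all(y >= x for x, y in zip(sa, sb))
            and any(y > x for x, y in zip(sa, sb)))

# Keep only non-dominated programs (the Pareto front)
front = [p for p in scores
         if not any(dominated(p, q) for q in scores if q != p)]
print(front)  # ['P2', 'P3'] -- P1 is dominated by P2

# Weight each frontier program by how many examples it is best on
best_counts = {p: 0 for p in front}
for i in range(3):
    best = max(front, key=lambda p: scores[p][i])
    best_counts[best] += 1
print(best_counts)  # {'P2': 2, 'P3': 1}

random.seed(0)
candidate = random.choices(front, weights=[best_counts[p] for p in front])[0]
```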

### Understanding the Genetic Algorithm

A [genetic algorithm](https://medium.com/@AnasBrital98/genetic-algorithm-explained-76dfbc5de85d) is used to fuse the best-performing features from the parents to create a child.

Let's say we have a RAG system that involves two prompts (predictors):

1) Query generation to retrieve documents
2) Final answer generation from the retrieved documents

Let's say we have new programs P2 and P3 which are both derived from P1 (a common ancestor). We can create P4 by using predictor #1 from P2 and predictor #2 from P3, and P5 vice versa - and many more combinations as the number of predictors in a program increases.

These merged programs are evaluated, and if one is a best performer, it is added to the candidate pool.
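The merge above can be sketched as a simple crossover over predictor lists. This is illustrative only: the prompts are made up, and the real GEPA merge also tracks ancestry and scores:

```python
# Each program is a list of predictor prompts: [query_gen, answer_gen].
# P2 and P3 are assumed to descend from a common ancestor P1.
P2 = ["Rewrite the question as three search queries.",        # predictor 1
      "Answer briefly from the documents."]                   # predictor 2
P3 = ["Turn the question into one keyword query.",
      "Answer with citations from the retrieved documents."]

def crossover(a, b):
    """Child takes predictor #1 from parent a and predictor #2 from parent b."""
    return [a[0], b[1]]

P4 = crossover(P2, P3)   # predictor 1 from P2, predictor 2 from P3
P5 = crossover(P3, P2)   # vice versa

print(P4 == [P2[0], P3[1]])  # True
```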

### Things to Keep in Mind

- Focus on building a robust system before trying to optimize prompts
- The quality of the dataset is very important
- The feedback function should explain in detail what went wrong
- Try your best to come up with a good base prompt
- A SOTA reflection model is preferred
- The validation dataset should be minimal and diverse
- The more iterations you run, the better the results will be
- Review the optimized prompt

## References

- [Research Paper](https://arxiv.org/pdf/2507.19457)
- [GEPA dspy Tutorial](https://youtu.be/gstt7E65FRM?si=n9QdDHgaXdMzPTQK)
