# Automatic Prompt Optimization
A system that assesses the quality of a prompt and updates it for better performance.

## Types
- Without Dataset
  - [OpenAI gpt-5 prompt optimizer](https://platform.openai.com/chat/edit?models=gpt-5&optimize=true)
  - [Anthropic prompt improver](https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prompt-improver)
  - [Google VertexAI prompt optimizer](https://discuss.google.dev/t/vertex-ai-prompt-optimizer-new-algorithm-sdk-for-the-general-availability/253359)
- With Dataset (recommended)
  - [dspy's optimizers](https://dspy.ai/learn/optimization/optimizers/)
    - COPRO, MIPRO, SIMBA, GEPA, etc.

## Understanding GEPA (Genetic Pareto)

The goal is to improve performance on the validation set (valset) using feedback collected on the training set (trainset).

### Terminologies

- Program - an entire system with one or more predictors
- Predictor - a single prompt (one LM call) within the program

### Workflow
- State Initialization
  - Per-example best programs on the valset, initialized with the base program - [{P0}, {P0}, {P0}]
- While budget is available, repeat the following steps
  1) Pareto selection (select a program)
  2) Minibatch selection from the trainset (random sampling by default)
  3) Generate and evaluate the program on the minibatch; collect scores and feedback
  4) Module selection (select the predictor to update, round robin by default)
  5) Build a reflective dataset
     - For each example in the minibatch, frame prompt + generated output + evaluation feedback in context
  6) Construct a prompt with the reflective dataset as context and use a reflection model to propose a new prompt
  7) Generate and evaluate the program with the new prompt on the same minibatch from step #2
  8) Add the program with the new prompt to the candidate pool if its scores are higher than those from step #3
  9) If programs have more than one predictor and share a common ancestor,
     - merge them (genetic algorithm) to get a new program and repeat steps #2 to #8

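The loop above can be sketched in Python. This is a simplified, illustrative sketch only, not the dspy API: `evaluate` and `propose_new_prompt` are hypothetical stand-ins for the LM calls, Pareto selection and module selection are collapsed into simple placeholders, and the merge step is omitted.

```python
import random

# Minimal stand-ins so the sketch runs end to end; a real system would call an LM.
def evaluate(program, batch):
    # Score every minibatch example and return textual feedback alongside (step 3).
    scores = [program["quality"] for _ in batch]
    feedback = [f"feedback for {ex}" for ex in batch]
    return scores, feedback

def propose_new_prompt(parent, reflective_examples):
    # Reflection step (steps 5-6): simulate a child whose prompt may be better or worse.
    child = dict(parent)
    child["quality"] = parent["quality"] + random.uniform(-0.1, 0.2)
    return child

def gepa_loop(base_program, trainset, budget, minibatch_size=2):
    pool = [base_program]  # candidate pool, seeded with the base program P0
    while budget > 0:
        # 1) select a parent (real GEPA uses Pareto selection; best-so-far here)
        parent = max(pool, key=lambda p: p["quality"])
        # 2) sample a minibatch from the trainset
        batch = random.sample(trainset, minibatch_size)
        # 3) evaluate the parent, collecting scores and feedback
        scores, feedback = evaluate(parent, batch)
        # 5-6) build a reflective dataset and propose a new prompt
        child = propose_new_prompt(parent, feedback)
        # 7-8) re-evaluate on the same minibatch; keep the child only if it improves
        child_scores, _ = evaluate(child, batch)
        if sum(child_scores) > sum(scores):
            pool.append(child)
        budget -= 1
    # return the best program found
    return max(pool, key=lambda p: p["quality"])

best = gepa_loop({"prompt": "v0", "quality": 0.5}, ["ex1", "ex2", "ex3"], budget=10)
```

Because only improving children enter the pool, the returned program is never worse than the base program on the minibatch metric.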
### Understanding Pareto Sampling

|     | E1  | E2  | E3  | Average |
|-----|-----|-----|-----|---------|
| P1  | 0.5 | 0.3 | 0.6 | 0.47    |
| P2  | 0.5 | 0.4 | 0.7 | 0.53    |
| P3  | 0.8 | 0.1 | 0.2 | 0.37    |

Here, if we selected the candidate for optimization by the maximum average score, we would always pick P2,
and the information that makes P3 perform better on E1 would be lost.

So the candidates are P2 and P3 (P1 is left out since it is dominated by P2 on every example).
From these candidates, weighted sampling is used to pick the one to optimize:
P2 is best on 2 examples and P3 is best on 1 example, so
```python
import random
random.choice(["P3", "P2", "P2"])
```
P2 has a ~67% chance of being selected and P3 a ~33% chance.
This sampling is what helps GEPA avoid local optima.
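Both steps, dropping dominated programs and then sampling in proportion to how many examples each survivor wins, can be sketched directly from the table above (a minimal illustration, not the dspy implementation):

```python
import random

# Per-example scores from the table above
scores = {"P1": [0.5, 0.3, 0.6], "P2": [0.5, 0.4, 0.7], "P3": [0.8, 0.1, 0.2]}

def dominated(a, b):
    """True if program a is dominated by b: b scores >= a everywhere, > somewhere."""
    pairs = list(zip(scores[a], scores[b]))
    return all(y >= x for x, y in pairs) and any(y > x for x, y in pairs)

# Keep only non-dominated programs (the Pareto front): P1 drops out here
front = [p for p in scores if not any(dominated(p, q) for q in scores if q != p)]

# Weight each surviving program by the number of examples it is best on
n_examples = len(next(iter(scores.values())))
best_counts = {p: 0 for p in front}
for i in range(n_examples):
    winner = max(front, key=lambda p: scores[p][i])
    best_counts[winner] += 1

# Weighted sampling: P2 gets weight 2, P3 gets weight 1
candidate = random.choices(front, weights=[best_counts[p] for p in front])[0]
```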

### Understanding Genetic Algorithm
A [genetic algorithm](https://medium.com/@AnasBrital98/genetic-algorithm-explained-76dfbc5de85d) is used to fuse the best-performing features from parents to create a child.

Let's say we have a RAG system which involves two prompts (predictors):
1) Query generation to retrieve documents
2) Final answer generation from the retrieved documents

Let's say we have new programs P2 and P3 which are both derived from P1 (a common ancestor).
We can create P4 by using predictor #1 from P2 and predictor #2 from P3,
and P5 the other way around; many more combinations become possible as the number of predictors in a program grows.

These merged programs are evaluated, and if one of them is the best performer, it is added to the candidate pool.

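The crossover for the RAG example can be shown concretely. Representing a program as a dict mapping predictor name to prompt text is an assumption made purely for illustration; the names `query_gen` and `answer_gen` are hypothetical:

```python
# Hypothetical programs: each maps predictor name -> its evolved prompt text
p2 = {"query_gen": "query prompt evolved in P2", "answer_gen": "answer prompt evolved in P2"}
p3 = {"query_gen": "query prompt evolved in P3", "answer_gen": "answer prompt evolved in P3"}

# Crossover: predictor #1 from P2 combined with predictor #2 from P3 ...
p4 = {"query_gen": p2["query_gen"], "answer_gen": p3["answer_gen"]}
# ... and the reverse combination
p5 = {"query_gen": p3["query_gen"], "answer_gen": p2["answer_gen"]}
```

With k predictors and two parents there are 2^k possible combinations, which is why the space of merge candidates grows quickly as programs gain predictors.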
### Things to Keep in Mind
- Focus on building a robust system before trying to optimize the prompt
- The quality of the dataset is very important
- The feedback function should explain in detail what went wrong
- Try your best to come up with a good base prompt
- A SOTA reflection model is preferred
- The validation dataset should be small but diverse
- The more iterations you run, the better the results
- Review the optimized prompt

## References
- [Research Paper](https://arxiv.org/pdf/2507.19457)
- [GEPA dspy Tutorial](https://youtu.be/gstt7E65FRM?si=n9QdDHgaXdMzPTQK)