We can extract meaningful patterns from the workflow library code and state_results, such as the mean KL divergence between agent prompts, prompt length, prompt complexity and agent types, and measure how they relate to the score/performance. Storing these patterns after each workflow execution would give us information such as:
Correlations with Final Reward
| Feature | Pearson r | Spearman ρ | p-value | Interpretation |
|---|---|---|---|---|
| num_agents | -0.158 | -0.117 | 0.0007*** | More agents = worse |
| mean_js | -0.148 | -0.129 | 0.0015** | Lower JS divergence = better |
| entropy_gradient | -0.097 | -0.080 | 0.0393* | Negative gradient = better |
| max_kl | -0.104 | -0.094 | 0.0267* | Avoid sharp peaks |
| mean_kl | -0.095 | -0.087 | 0.0424* | Lower divergence = better |
| mean_cosine | +0.094 | +0.089 | 0.0446* | Higher similarity = better |
Combined with value thresholds, these correlations can be used to seed the workflow generation prompt with tips such as:
- Increase prompt length
- Ideally, use 3 agents
- Reduce prompt complexity
- Use more similar agents

This provides a search direction for generating the next workflow variant.
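One way to turn the table into such tips, sketched under stated assumptions: a feature produces a tip only when its correlation is significant (p < 0.05, an assumed cutoff) and its current value sits on the side of a threshold that the sign of r marks as harmful. The r and p-values below are copied from the table. The threshold values, the tip wording and the function names `tips_for` and `tips_section` are hypothetical placeholders, not tuned numbers. Tips about prompt length or complexity would need their own features, which the table does not include.

```python
from typing import Dict, List, Tuple


def tips_for(
    features: Dict[str, float],
    correlations: Dict[str, Tuple[float, float]],
    thresholds: Dict[str, float],
    alpha: float = 0.05,
) -> List[str]:
    """Emit a tip for each significant feature that sits on the harmful side of its threshold."""
    tips = []
    for name, (r, p_value) in correlations.items():
        if p_value >= alpha or name not in features or name not in thresholds:
            continue  # not significant, or no value/threshold to compare against
        value, limit = features[name], thresholds[name]
        if r < 0 and value > limit:
            tips.append(f"Lower {name} (currently {value:.3g}, aim for <= {limit:.3g})")
        elif r > 0 and value < limit:
            tips.append(f"Raise {name} (currently {value:.3g}, aim for >= {limit:.3g})")
    return tips


def tips_section(tips: List[str]) -> str:
    """Format tips as a block to prepend to the workflow generation prompt."""
    if not tips:
        return ""
    return "Tips from previous workflow runs:\n" + "\n".join(f"- {t}" for t in tips)


# (Pearson r, p-value) from the correlation table above.
CORRELATIONS = {
    "num_agents": (-0.158, 0.0007),
    "mean_js": (-0.148, 0.0015),
    "entropy_gradient": (-0.097, 0.0393),
    "max_kl": (-0.104, 0.0267),
    "mean_kl": (-0.095, 0.0424),
    "mean_cosine": (0.094, 0.0446),
}
# Hypothetical thresholds; real values would come from the stored workflow results.
THRESHOLDS = {"num_agents": 3, "mean_cosine": 0.5}
```

For a workflow with 5 agents and a mean cosine similarity of 0.3, `tips_section(tips_for({"num_agents": 5, "mean_cosine": 0.3}, CORRELATIONS, THRESHOLDS))` yields a header line followed by "- Lower num_agents (currently 5, aim for <= 3)" and "- Raise mean_cosine (currently 0.3, aim for >= 0.5)". Keying tips on the sign of r keeps the rules in sync with the stored correlations as they are recomputed.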