Summary
Carry source, decision, and output provenance through the main workflow so downstream agents can audit and cite it.
This issue was generated from an org-wide EvalOps mining pass on 2026-05-10 07:57 UTC. It combines live GitHub repo signals with a per-repo arXiv search. Treat the research links as grounding for a concrete implementation, not as a request for a literature review.
Repo Evidence
- Repository description: State-of-the-art prompting techniques implementation with DSpy - Manager-style prompts, role personas, meta-prompting, and more
- Tree signals: 0 docs files, 0 workflows, 0 proto files, 1 test-like file.
README.md:58 includes latent-spec language: ### 10. Evaluation Framework - Test cases more valuable than prompts
README.md:255 includes latent-spec language: # Run evaluation framework python -m src.evaluations.evaluation_framework
README.md:261 includes latent-spec language: Each technique includes built-in evaluation metrics: - Accuracy: How well the prompt performs its intended task
README.md:288 includes latent-spec language: ### Building Evaluation Suites
README.md:359 includes latent-spec language: 1. Prompts as Onboarding Docs: Treat prompts like you're onboarding a new employee 2. Test Cases > Prompts: Evaluation frameworks are more valuable than the prompts themselves 3. Uncertainty is Good: Better to admit uncertainty than hallucinate
CONTRIBUTING.md:17 includes latent-spec language: - Add tests for new techniques - Update documentation as needed - Include examples in your implementations
Research Grounding
Repo axes: infra, governance, security, evaluation
Search keywords: prompts, techniques, evaluation, examples, api, dspy, src, prompt, import, uncertainty, your, test
- arXiv:2506.11019v1 Mind the Metrics: Patterns for Telemetry-Aware In-IDE AI Application Development using the Model Context Protocol (MCP) (Vincent Koc, Jacques Verre, Douglas Blank, Abigail Morgan), 2025.
- arXiv:2507.03620v1 Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy (Francisca Lemos, Victor Alves, Filipa Ferraz), 2025.
- arXiv:2412.15298v1 A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation (Bhaskarjit Sarmah, Kriti Dutta, Anna Grigoryan, Sachin Tiwari, Stefano Pasquali, Dhagash Mehta), 2024.
- arXiv:2604.04869v1 Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning (Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj), 2026.
- arXiv:2506.02032v2 Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges (Raj Patel, Himanshu Tripathi, Jasper Stone, Noorbakhsh Amiri Golilarz, Sudip Mittal, Shahram Rahimi), 2025.
- arXiv:2307.13473v1 Exploring MLOps Dynamics: An Experimental Analysis in a Real-World Machine Learning Project (Awadelrahman M. A. Ahmed), 2023.
- arXiv:2503.15577v1 Navigating MLOps: Insights into Maturity, Lifecycle, Tools, and Careers (Jasper Stone, Raj Patel, Farbod Ghiasi, Sudip Mittal, Shahram Rahimi), 2025.
- arXiv:2601.20415v1 An Empirical Evaluation of Modern MLOps Frameworks (Jon Marcos-Mercadé, Unai Lopez-Novoa, Mikel Egaña Aranguren), 2026.
- arXiv:2001.07935v2 CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking (Grigori Fursin, Herve Guillou, Nicolas Essayan), 2020.
- arXiv:2407.09107v1 MLOps: A Multiple Case Study in Industry 4.0 (Leonhard Faubel, Klaus Schmid), 2024.
What To Build
- Add stable identifiers for source records, derived decisions, and emitted outputs (see the data-model sketch after this list).
- Thread those identifiers through logs/events/API responses without leaking secrets.
- Provide a query or debug surface that reconstructs the chain for one completed workflow (see the reconstruction sketch below).
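
A minimal sketch of what the stable identifiers and the log/event threading could look like in Python. Every name here is hypothetical (`SourceRecord`, `Decision`, `EmittedOutput`, `provenance_log_fields` do not exist in this repo yet), and the prefix-plus-UUID id scheme is only one option; the point is that the identifiers carry no prompt text or secrets and can be attached to any log line or API response.

```python
# Hypothetical provenance data model -- a sketch, not the repo's actual API.
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _new_id(prefix: str) -> str:
    """Stable, prefix-tagged identifier, e.g. 'src_1f3c...'."""
    return f"{prefix}_{uuid.uuid4().hex}"


@dataclass
class SourceRecord:
    uri: str                                   # where the input came from
    source_id: str = field(default_factory=lambda: _new_id("src"))


@dataclass
class Decision:
    source_ids: list[str]                      # inputs this decision depended on
    rationale: str
    decision_id: str = field(default_factory=lambda: _new_id("dec"))


@dataclass
class EmittedOutput:
    decision_id: str
    content: str
    output_id: str = field(default_factory=lambda: _new_id("out"))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def provenance_log_fields(output: EmittedOutput, decision: Decision) -> dict:
    """Identifiers only -- no prompt text, no secrets -- safe to attach to
    log lines, emitted events, and API responses."""
    return {
        "output_id": output.output_id,
        "decision_id": decision.decision_id,
        "source_ids": decision.source_ids,
    }
```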
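
A minimal sketch, under the same assumptions, of the reconstruction/debug surface. `ProvenanceStore` and `explain_output` are hypothetical names; a real implementation would read from whatever store the main workflow already persists to rather than an in-memory dict.

```python
# Hypothetical debug surface that rebuilds the source -> decision -> output
# chain for one completed workflow, using the records sketched above.
from dataclasses import dataclass, field


@dataclass
class ProvenanceStore:
    sources: dict[str, SourceRecord] = field(default_factory=dict)
    decisions: dict[str, Decision] = field(default_factory=dict)
    outputs: dict[str, EmittedOutput] = field(default_factory=dict)

    def explain_output(self, output_id: str) -> dict:
        """Reconstruct the full chain behind a single emitted output."""
        output = self.outputs[output_id]
        decision = self.decisions[output.decision_id]
        return {
            "output_id": output.output_id,
            "decision": {
                "decision_id": decision.decision_id,
                "rationale": decision.rationale,
            },
            "sources": [
                {"source_id": sid, "uri": self.sources[sid].uri}
                for sid in decision.source_ids
            ],
        }
```

Calling `store.explain_output(output_id)` on one completed run is what the audit-and-cite goal in the summary asks for: each upstream source URI comes back alongside the decision that consumed it, keyed by stable ids rather than free text, so downstream agents can cite the ids directly.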
Acceptance Criteria
Notes
- Generated issue 2/5 for evalops/dspy-advanced-prompting by evalops_org_miner.py.
- Before implementation, confirm the sampled latent-spec snippets still match main; this issue intentionally cites exact file paths/lines where the mining pass saw them.