Commit cc7ca75

leestott and Copilot committed
update
Co-authored-by: Copilot <copilot@github.com>
1 parent a226bf4 commit cc7ca75

21 files changed

Lines changed: 1238 additions & 305 deletions
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
name: 🐛 Bug report
description: Something is broken or behaving unexpectedly in the evaluation toolkit.
title: "[Bug]: "
labels: ["bug", "needs-triage"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to file a bug report! Please fill in as much of the form below as you can — the more detail you provide, the faster we can help.

        > **Before you submit:** Please search [existing issues](https://github.com/microsoft-foundry/Model-Router-Auto-Evaluation/issues?q=is%3Aissue) to avoid duplicates, and check the [FAQ](../blob/main/docs/faq.md) for known issues.

  - type: checkboxes
    id: prereqs
    attributes:
      label: Pre-flight checklist
      description: Please confirm the following before submitting.
      options:
        - label: I have searched existing issues and this is not a duplicate.
          required: true
        - label: I have read the [FAQ](../blob/main/docs/faq.md).
          required: true
        - label: I have removed any API keys, endpoints, or other secrets from logs and config snippets I paste below.
          required: true

  - type: textarea
    id: summary
    attributes:
      label: Summary
      description: A clear, concise description of the bug.
      placeholder: When I run `python scripts/run_eval.py --dry-run`, the script crashes with a KeyError.
    validations:
      required: true

  - type: textarea
    id: reproduce
    attributes:
      label: Steps to reproduce
      description: Exact commands and inputs we can run to see the bug ourselves.
      placeholder: |
        1. Clone the repo at commit <SHA>
        2. Create `.env` with the following variables (values redacted): ...
        3. Run `python scripts/run_eval.py --dataset datasets/sample_custom.jsonl --sample-size 5`
        4. See the error
    validations:
      required: true

  - type: textarea
    id: expected
    attributes:
      label: Expected behaviour
      description: What did you expect to happen?
    validations:
      required: true

  - type: textarea
    id: actual
    attributes:
      label: Actual behaviour
      description: What actually happened? Include error messages, stack traces, or unexpected output. Use code fences (```) for readability.
    validations:
      required: true

  - type: dropdown
    id: pipeline
    attributes:
      label: Which part of the pipeline is affected?
      options:
        - Local evaluation (scripts/run_eval.py)
        - Foundry cloud evaluation (scripts/run_foundry_eval.py)
        - Comparison or export scripts (compare_results.py / export_results.py)
        - WALKTHROUGH.ipynb (Jupyter notebook)
        - Configuration (configs/*.yaml)
        - Dataset loading (JSONL / CSV / SQL)
        - Reporting / dashboard / charts
        - Tests (pytest)
        - Documentation
        - Other / not sure
    validations:
      required: true

  - type: input
    id: python-version
    attributes:
      label: Python version
      description: Output of `python --version`
      placeholder: "3.11.7"
    validations:
      required: true

  - type: input
    id: os
    attributes:
      label: Operating system
      placeholder: "Windows 11 / macOS 14.4 / Ubuntu 22.04"
    validations:
      required: true

  - type: input
    id: package-version
    attributes:
      label: Repo commit or release
      description: Output of `git rev-parse --short HEAD` or the release tag you are using.
      placeholder: "abc1234 or v1.0.0"

  - type: textarea
    id: config
    attributes:
      label: Relevant configuration
      description: A redacted snippet of the YAML config or environment variables involved. **Do not include API keys.**
      render: yaml

  - type: textarea
    id: logs
    attributes:
      label: Logs and screenshots
      description: Paste any relevant terminal output, stack traces, or screenshots. Redact secrets first.
      render: shell

  - type: textarea
    id: additional
    attributes:
      label: Additional context
      description: Anything else we should know — workarounds tried, related issues, hypotheses, etc.

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
blank_issues_enabled: false
contact_links:
  - name: Question or general feedback
    url: https://aka.ms/foundry/discord
    about: For questions about usage, sharing experience, or general feedback, please use the Microsoft Foundry Discord.
  - name: Microsoft Foundry documentation
    url: https://learn.microsoft.com/azure/ai-foundry/
    about: For questions about the Microsoft Foundry product itself (not this evaluation tool), see the official docs.
  - name: Security vulnerabilities
    url: https://www.microsoft.com/msrc
    about: Please report security vulnerabilities privately via the Microsoft Security Response Center, not as public issues.
Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
name: 📚 Documentation issue
description: Something in the README, QUICKSTART, docs/, or notebook is wrong, missing, or unclear.
title: "[Docs]: "
labels: ["documentation", "needs-triage"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for helping improve the documentation. Clear docs are a feature — please tell us what tripped you up.

  - type: input
    id: page
    attributes:
      label: Which page or file?
      description: Path or URL of the doc that needs attention.
      placeholder: "docs/how-to-run-live-eval.md, README.md, WALKTHROUGH.ipynb cell 5, ..."
    validations:
      required: true

  - type: textarea
    id: issue
    attributes:
      label: What is wrong, missing, or unclear?
      placeholder: |
        - The instructions assume `az login` already works, but I had no Azure account yet.
        - The example config refers to a model deployment name that doesn't exist in the default config.
    validations:
      required: true

  - type: textarea
    id: suggestion
    attributes:
      label: Suggested improvement
      description: Optional — if you have wording in mind, drop it here.

  - type: dropdown
    id: audience
    attributes:
      label: Reader audience this affects
      options:
        - First-time / beginner user
        - Developer extending the toolkit
        - Operator running large-scale evaluations
        - Foundry / cloud-eval user
        - Other
    validations:
      required: true
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
name: ✨ Feature request
description: Suggest a new capability, dataset format, grader, or improvement.
title: "[Feature]: "
labels: ["enhancement", "needs-triage"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for suggesting an improvement! Please describe both the **problem** you're trying to solve and the **outcome** you'd like, so we can consider alternative solutions too.

  - type: checkboxes
    id: prereqs
    attributes:
      label: Pre-flight checklist
      options:
        - label: I have searched existing issues and discussions for similar requests.
          required: true
        - label: This request is about the evaluation toolkit, not about Microsoft Foundry as a product.
          required: true

  - type: textarea
    id: problem
    attributes:
      label: What problem are you trying to solve?
      description: Describe the use case or pain point. Don't lead with the proposed solution.
      placeholder: I want to evaluate Model Router on a 50,000-prompt dataset stored in BigQuery, but the current loader only supports JSONL/CSV/SQLAlchemy URLs.
    validations:
      required: true

  - type: textarea
    id: proposal
    attributes:
      label: Proposed solution
      description: What would you like the toolkit to do? Be as specific as you can — CLI flags, config keys, output format, etc.
    validations:
      required: true

  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives considered
      description: Other approaches you thought about, and why they don't quite fit.

  - type: dropdown
    id: area
    attributes:
      label: Area of the toolkit
      multiple: true
      options:
        - Local evaluation pipeline
        - Foundry cloud evaluation
        - Dataset loading / format support
        - Judge / grader prompts
        - Cost or latency methodology
        - Reporting / dashboard / charts
        - CLI / configuration
        - Documentation / tutorials
        - Tests / CI
        - Other
    validations:
      required: true

  - type: textarea
    id: additional
    attributes:
      label: Additional context
      description: Examples, mock outputs, links to related issues or external docs.

  - type: checkboxes
    id: contribute
    attributes:
      label: Contribution
      options:
        - label: I would be willing to contribute a pull request for this feature.
        - label: I'd like to discuss the design first before any implementation work.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
name: 💬 Feedback
description: Share your experience using the toolkit — what worked, what didn't, what surprised you.
title: "[Feedback]: "
labels: ["feedback"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for sharing! Feedback helps us understand how the toolkit is used in the real world. There are no required fields — fill in whatever is useful.

        > For free-form questions, please use [Microsoft Foundry Discord](https://aka.ms/foundry/discord) instead.

  - type: textarea
    id: use-case
    attributes:
      label: How are you using the toolkit?
      placeholder: "Comparing Model Router against gpt-5 on ~500 internal customer-support prompts."

  - type: textarea
    id: worked
    attributes:
      label: What worked well?

  - type: textarea
    id: friction
    attributes:
      label: Where did you hit friction?
      description: Confusing docs, broken behaviour, missing features, surprising results — anything.

  - type: textarea
    id: outcome
    attributes:
      label: Did the evaluation help you make a decision?
      description: e.g. adopted Model Router, kept the baseline, found a quality regression, identified a cost saving.

  - type: textarea
    id: additional
    attributes:
      label: Anything else?
