Learning Objectives • Scenario • Lab Flow • Prerequisites • Success Criteria
- Who is this for: Software engineers, data engineers, data scientists, and technical professionals who want to move beyond basic GitHub Copilot usage and apply it effectively to difficult SQL tasks.
- What you'll learn: How to collaborate with GitHub Copilot on complex SQL by clarifying requirements, improving prompts, validating output, debugging issues, and refining queries based on evidence.
- What you'll build: SQL queries against a realistic transactional dataset that apply business rules, handle edge cases, and pass validation checks.
In this workshop, you will:
- Use advanced GitHub Copilot interaction techniques to deal with complex SQL problems.
- Iterate, validate, and refine Copilot-generated answers to improve accuracy.
- Apply general problem-solving strategies that improve the quality of AI-assisted work.
- Recognize poor prompting techniques and explain how they affect GitHub Copilot output.
Important
This lab is not about getting the first query that runs. It is about learning how to collaborate with GitHub Copilot while still verifying, debugging, and reasoning about SQL yourself.
You are supporting a payment operations team that needs a reliable daily settlement report from transactional data.
The report must:
- include only valid settlement records,
- exclude transaction types that should not be counted,
- aggregate results correctly,
- hold up under validation and edge cases.
Your goal is to use GitHub Copilot to help you build the SQL, while reviewing every suggestion critically. A query that looks correct is not enough — it must also be explainable, testable, and validated.
Tip
Treat GitHub Copilot as a collaborator, not a source of truth.
This lab is organized into six phases:
- Understand the requirements
- Explore the dataset
- Generate a baseline query
- Refine the query with better prompts
- Validate and debug the output
- Reflect on how GitHub Copilot helped — and where your judgment mattered
Each phase includes:
- a clear goal,
- suggested Copilot prompts,
- validation guidance,
- a checkpoint.
- Enable your GitHub Copilot service
- Open this repository with Codespaces
Before joining the workshop, make sure you have:
- a public GitHub account,
- access to GitHub Copilot,
- basic familiarity with SQL filtering, joins, and aggregation,
- comfort running scripts or tests from the command line.
All resources, dependencies, and data required for the workshop are included in the repository.
Note
You do not need to know the final SQL solution in advance. You do need to be willing to inspect results, revise prompts, and validate output carefully.
By the end of the lab, you should have:
- a working SQL solution that satisfies the business rules,
- evidence that the result was validated,
- at least one example of a weak prompt and an improved version,
- notes on at least one defect, edge case, or incorrect Copilot assumption you found during the lab,
- a short reflection on what improved your results.
Use GitHub Copilot to help with:
- translating requirements into SQL structure,
- exploring the schema,
- drafting initial queries,
- identifying edge cases,
- debugging failed validations,
- improving readability,
- suggesting alternative approaches.
Do not use GitHub Copilot as:
- a substitute for understanding the data,
- proof that a query is correct,
- a replacement for testing,
- a replacement for business-rule review.
A strong workflow for this lab looks like this:
- Restate the requirement in plain English
- Ask Copilot for a first pass
- Inspect the SQL carefully
- Run it and inspect the results
- Compare against expected behavior
- Revise the prompt or query
- Validate again
You will use several prompting patterns throughout the lab.
Use this when the task is ambiguous.
Example prompt
Before writing SQL, summarize the requirement as filters, exclusions, joins, aggregations, and expected output columns.
Use this when the problem is too large for one step.
Example prompt
Break this SQL problem into smaller steps using CTEs for filtering, normalization, and aggregation.
Use this when a query seems plausible but needs review.
Example prompt
List the edge cases or data anomalies that could cause this query to return incorrect totals.
Use this when tests or outputs fail.
Example prompt
This query returns more rows than expected. Identify likely causes such as duplicate joins, incorrect filters, or grouping issues, and suggest how to test each one.
Use this when most of the query is correct.
Example prompt
Do not rewrite the whole query. Only revise the logic responsible for excluding invalid settlement records.
Translate the reporting need into a precise SQL problem before generating code.
Review the scenario and identify:
- what the query must return,
- what records must be excluded,
- how results should be grouped,
- what assumptions need clarification.
Summarize this reporting requirement as a SQL design checklist. Include filters, exclusions, joins, grouping, output columns, and possible ambiguities.
Document:
- required filters,
- required exclusions,
- aggregation rules,
- any assumptions about date handling, duplicate transactions, or null values.
Tip
If you cannot explain the business rule in plain English, you are not ready to trust the SQL.
You are ready to move on if you can explain:
- which records count,
- which records do not count,
- how the final output should be shaped.
Understand the schema and identify data characteristics before writing the final query.
Inspect the available tables and answer:
- What tables are relevant?
- What columns appear necessary?
- What values exist in key filtering columns?
- Are there nulls, duplicates, or suspicious values?
Help me explore the transaction dataset. I want SQL to inspect row counts, date range, distinct status values, distinct batch types, and null counts for important columns.
Based on this schema, which columns are most likely needed for a daily settlement report and why?
Write and run SQL to:
- count total rows,
- inspect the date range,
- list distinct statuses,
- list distinct batch types,
- inspect nulls in amount or key identifiers,
- identify possible duplicates.
You are ready to move on if you can answer:
- which columns drive filtering,
- which columns drive aggregation,
- which values are likely edge cases.
Create a simple first-pass query that is correct in structure, even if incomplete.
Write a baseline query that:
- filters to the target date,
- includes only valid record status values,
- returns raw rows before aggregation.
Do not optimize yet.
Write a simple SQL query for the daily settlement dataset that returns only valid records for the reporting date. Do not aggregate yet. Add comments explaining each WHERE clause.
Before executing the query, check:
- Did Copilot assume a date column without evidence?
- Did it invent a status value?
- Did it exclude records not mentioned in the requirements?
- Is the logic readable enough to debug?
Warning
A working query can still be wrong. Always verify assumptions against the actual schema and data.
You are ready to move on if:
- the baseline query runs,
- you understand each filter,
- the result set is smaller and more focused than the raw input.
Improve the SQL by strengthening the prompt and explicitly handling business constraints.
Refine the baseline query to:
- exclude invalid settlement categories,
- group correctly,
- produce the expected report shape.
Weak prompt
Write a SQL query for today's transaction totals.
Better prompt
Write a SQL query for a daily settlement report. Return one row per merchant and transaction type for the reporting date. Exclude rows with invalid batch types such as off-us and offset, include only settled records, and sum amount as total_amount. Add comments explaining each rule.
Use both styles once and compare the results.
Document:
- what the weak prompt missed,
- what the improved prompt clarified,
- what Copilot still got wrong or assumed.
Refine this query to exclude invalid batch types and aggregate by merchant and transaction type. Keep the SQL readable and explain each business rule in comments.
Rewrite this using CTEs so the filtering, exclusion logic, and final aggregation are easier to validate separately.
You are ready to move on if:
- your query reflects the stated business rules,
- you can explain why each exclusion exists,
- the query is organized clearly enough for targeted debugging.
Prove that the SQL is correct using evidence, not confidence.
Validate the query by checking:
- row counts,
- totals,
- expected shape,
- automated tests if provided,
- edge-case behavior.
- Run the query and inspect the output
- Compare columns and row counts to expectations
- Validate totals against sample or known cases
- Run the provided tests
- If a test fails, isolate the problem to one stage of the query
- Revise only the failing logic
List the most likely reasons this settlement query could overcount totals.
This query failed a validation case involving duplicate transactions. Suggest how to detect and fix the problem without rewriting the whole query.
Explain how to test each CTE in this query separately so I can isolate the defect.
If your query is wrong:
- inspect intermediate CTE outputs,
- test one filter at a time,
- compare pre-aggregation and post-aggregation row counts,
- look for duplicate joins,
- check whether nulls or reversals are handled correctly.
You are ready to move on if:
- your query passes validation,
- you can explain why it is correct,
- you identified at least one issue that required revision.
Capture what improved your results and where your own reasoning was essential.
Answer the following:
- What did GitHub Copilot help with most?
- What did it get wrong, assume, or oversimplify?
- Which prompt improvement led to the biggest quality increase?
- What validation step caught an issue you might otherwise have missed?
- What would you do differently next time when using Copilot on a complex SQL task?
Capture:
- one weak prompt,
- one improved prompt,
- one incorrect Copilot suggestion,
- one fix you made after validation,
- one lesson learned.
Possible causes:
- duplicate joins,
- missing exclusions,
- duplicate transaction rows,
- grouping at the wrong level.
Try:
- checking row counts before and after joins,
- grouping by transaction identifier temporarily,
- validating one business rule at a time.
Possible causes:
- overly restrictive date logic,
- incorrect status filter,
- wrong join type,
- mismatch between assumed and actual values.
Try:
- removing one filter at a time,
- checking distinct values in filter columns,
- validating the reporting date logic.
Try:
Do not rewrite the full query. Only revise the WHERE clause to fix the exclusion logic.
Try:
Refactor this query into clearly named CTEs with one responsibility per CTE. Do not change the logic.
If you complete the core lab, try one or more of these:
Revise the query to handle:
- null amounts,
- duplicate transaction IDs,
- reversal records,
- date-boundary ambiguity.
Refactor the final query for readability without changing behavior.
Ask Copilot to suggest possible performance improvements, then evaluate whether they preserve correctness.
Ask Copilot to propose additional validation cases for the final query. Review whether those tests actually cover the business rules.
Example prompt
Suggest five targeted test cases for this settlement query, including one for duplicates, one for null amounts, and one for excluded batch types.
- Define the objective before generating SQL.
- Break hard problems into smaller, testable pieces.
- Use domain vocabulary to improve Copilot output.
- Iterate instead of expecting a perfect first answer.
- Use examples and constraints to reduce ambiguity.
- Think about edge cases early.
- Validate every meaningful output.
- Treat Copilot as a collaborator that must be reviewed.
Although not required, some of the features this workshop covers are in these Microsoft Learn modules:
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (for example, status check or comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third-parties' policies.