Skip to content

Latest commit

 

History

History
310 lines (270 loc) · 7.3 KB

File metadata and controls

310 lines (270 loc) · 7.3 KB

Subqueries and Common Table Expressions (CTEs)

This lesson covers advanced SQL concepts: subqueries and Common Table Expressions (CTEs), which are essential for writing complex, efficient, and maintainable queries.

Subqueries

Subqueries (nested queries) are queries embedded within another query. They can be used in different parts of a SQL statement.

Subqueries in WHERE Clause

Subqueries in WHERE clauses filter results based on a nested query's results.

-- Find books borrowed more than average
SELECT Title, BorrowCount
FROM Books
WHERE BorrowCount > (
    SELECT AVG(BorrowCount)
    FROM Books
);

-- Find members who have borrowed books in the last month
SELECT FirstName, LastName
FROM Members
WHERE MemberID IN (
    SELECT DISTINCT MemberID
    FROM Loans
    WHERE LoanDate >= DATEADD(month, -1, GETDATE())
);

Subqueries in SELECT Clause

Subqueries in SELECT statements can compute values for each row.

-- Show each book's borrow count and its difference from average
SELECT 
    Title,
    BorrowCount,
    BorrowCount - (
        SELECT AVG(BorrowCount) 
        FROM Books
    ) as DifferenceFromAverage
FROM Books;

-- Calculate percentage of total loans for each member
SELECT 
    m.FirstName,
    m.LastName,
    COUNT(l.LoanID) as LoanCount,
    (COUNT(l.LoanID) * 100.0) / (
        SELECT COUNT(*) 
        FROM Loans
    ) as PercentageOfTotal
FROM Members m
LEFT JOIN Loans l ON m.MemberID = l.MemberID
GROUP BY m.MemberID, m.FirstName, m.LastName;

Subqueries in FROM Clause

Subqueries in FROM clauses create derived tables that can be queried like regular tables.

-- Calculate average loans per active member
SELECT AVG(LoanCount) as AvgLoansPerMember
FROM (
    SELECT 
        MemberID,
        COUNT(LoanID) as LoanCount
    FROM Loans
    GROUP BY MemberID
) as MemberLoans;

-- Find books with above-average ratings in each category
SELECT b.Title, b.Category, b.Rating
FROM Books b
INNER JOIN (
    SELECT 
        Category,
        AVG(Rating) as AvgRating
    FROM Books
    GROUP BY Category
) as CategoryAvg
ON b.Category = CategoryAvg.Category
WHERE b.Rating > CategoryAvg.AvgRating;

Common Table Expressions (CTEs)

CTEs provide a way to write auxiliary statements for use in a larger query. They make complex queries more readable and maintainable.

Basic CTE Syntax

WITH CTE_Name AS (
    SELECT column1, column2
    FROM Table
    WHERE condition
)
SELECT *
FROM CTE_Name;

Practical CTE Examples

-- Calculate member borrowing statistics
WITH MemberStats AS (
    SELECT 
        m.MemberID,
        m.FirstName,
        m.LastName,
        COUNT(l.LoanID) as LoanCount,
        AVG(DATEDIFF(day, l.LoanDate, l.ReturnDate)) as AvgLoanDuration
    FROM Members m
    LEFT JOIN Loans l ON m.MemberID = l.MemberID
    GROUP BY m.MemberID, m.FirstName, m.LastName
)
SELECT *
FROM MemberStats
WHERE LoanCount > 0
ORDER BY LoanCount DESC;

-- Find popular book categories by age group
WITH MemberAgeGroups AS (
    SELECT 
        MemberID,
        CASE 
            WHEN Age < 20 THEN 'Teen'
            WHEN Age BETWEEN 20 AND 30 THEN 'Young Adult'
            WHEN Age BETWEEN 31 AND 50 THEN 'Adult'
            ELSE 'Senior'
        END as AgeGroup
    FROM Members
),
CategoryLoans AS (
    SELECT 
        mag.AgeGroup,
        b.Category,
        COUNT(*) as LoanCount
    FROM MemberAgeGroups mag
    JOIN Loans l ON mag.MemberID = l.MemberID
    JOIN Books b ON l.BookID = b.BookID
    GROUP BY mag.AgeGroup, b.Category
)
SELECT 
    AgeGroup,
    Category,
    LoanCount,
    RANK() OVER(PARTITION BY AgeGroup ORDER BY LoanCount DESC) as Rank
FROM CategoryLoans
WHERE LoanCount > 10
ORDER BY AgeGroup, LoanCount DESC;

Multiple CTEs

-- Analyze book popularity and availability
WITH BookLoans AS (
    SELECT 
        BookID,
        COUNT(*) as LoanCount
    FROM Loans
    GROUP BY BookID
),
BookAvailability AS (
    SELECT 
        BookID,
        CASE 
            WHEN CurrentStock > 5 THEN 'High'
            WHEN CurrentStock > 0 THEN 'Low'
            ELSE 'Out of Stock'
        END as Availability
    FROM Books
)
SELECT 
    b.Title,
    bl.LoanCount,
    ba.Availability
FROM Books b
JOIN BookLoans bl ON b.BookID = bl.BookID
JOIN BookAvailability ba ON b.BookID = ba.BookID
ORDER BY bl.LoanCount DESC;

Best Practices

  1. Subquery Guidelines:

    • Use meaningful aliases
    • Keep subqueries focused and simple
    • Consider performance implications
    • Use CTEs for complex subqueries
  2. CTE Best Practices:

    • Use descriptive names
    • Break complex queries into logical parts
    • Consider using multiple CTEs for clarity
    • Don't overuse - simple queries don't need CTEs

Exercise 🔴

-- Create sample tables
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100),
    DepartmentID INT,
    Salary DECIMAL(10,2)
);

CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY,
    DepartmentName VARCHAR(50),
    Location VARCHAR(50)
);

-- Insert sample data
INSERT INTO Departments VALUES
(1, 'IT', 'New York'),
(2, 'HR', 'London'),
(3, 'Finance', 'Tokyo');

INSERT INTO Employees VALUES
(1, 'John Doe', 1, 75000),
(2, 'Jane Smith', 1, 85000),
(3, 'Bob Johnson', 2, 65000),
(4, 'Alice Brown', 3, 90000);

-- Tasks:
-- 1. Write a query using a subquery to find employees who earn more than
--    their department's average salary

-- 2. Create a CTE to rank departments by average salary and show
--    employees in the top-paying department

-- 3. Use multiple CTEs to analyze salary distribution across locations

-- Solutions:

-- 1. Employees above department average
SELECT 
    e.Name,
    e.Salary,
    d.DepartmentName
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID
WHERE e.Salary > (
    SELECT AVG(Salary)
    FROM Employees e2
    WHERE e2.DepartmentID = e.DepartmentID
);

-- 2. Top-paying department employees
WITH DeptRanking AS (
    SELECT 
        d.DepartmentID,
        d.DepartmentName,
        AVG(e.Salary) as AvgSalary,
        RANK() OVER(ORDER BY AVG(e.Salary) DESC) as SalaryRank
    FROM Departments d
    JOIN Employees e ON d.DepartmentID = e.DepartmentID
    GROUP BY d.DepartmentID, d.DepartmentName
)
SELECT 
    e.Name,
    e.Salary,
    dr.DepartmentName,
    dr.AvgSalary
FROM Employees e
JOIN DeptRanking dr ON e.DepartmentID = dr.DepartmentID
WHERE dr.SalaryRank = 1;

-- 3. Salary distribution by location
WITH LocationStats AS (
    SELECT 
        d.Location,
        MIN(e.Salary) as MinSalary,
        MAX(e.Salary) as MaxSalary,
        AVG(e.Salary) as AvgSalary
    FROM Departments d
    JOIN Employees e ON d.DepartmentID = e.DepartmentID
    GROUP BY d.Location
),
SalaryCategories AS (
    SELECT
        e.Name,
        e.Salary,
        d.Location,
        CASE
            WHEN e.Salary > ls.AvgSalary THEN 'Above Average'
            WHEN e.Salary = ls.AvgSalary THEN 'Average'
            ELSE 'Below Average'
        END as SalaryCategory
    FROM Employees e
    JOIN Departments d ON e.DepartmentID = d.DepartmentID
    JOIN LocationStats ls ON d.Location = ls.Location
)
SELECT *
FROM SalaryCategories
ORDER BY Location, Salary DESC;

This exercise demonstrates:

  • Using subqueries for comparative analysis
  • Creating CTEs for complex data transformations
  • Combining multiple CTEs for detailed analysis
  • Practical application of window functions with CTEs