-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathInterviews.sql
More file actions
311 lines (251 loc) · 12 KB
/
Interviews.sql
File metadata and controls
311 lines (251 loc) · 12 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
Purpose of the query
To retrieve contests used in colleges along with their total submissions, accepted submissions, views, and unique views, excluding contests where all these totals are zero.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
select c.contest_id,hacker_id,name,
sum(t_s),sum(t_a_s),sum(t_v),sum(t_u_v)
from contests c
join colleges cg on c.contest_id=cg.contest_id
join challenges cl on cg.college_id=cl.college_id
left join (select challenge_id,sum(coalesce(total_views,0)) as t_v,
sum(coalesce(total_unique_views,0)) as t_u_v
from view_stats
group by challenge_id
) as v_s
on v_s.challenge_id=cl.challenge_id
left join(select challenge_id, sum(coalesce(total_submissions,0)) as t_s,
sum(coalesce(total_accepted_submissions,0)) as t_a_s
from submission_stats
group by challenge_id)
as s_s
on s_s.challenge_id=cl.challenge_id
group by 1,2,3
having sum(t_s)+sum(t_a_s)+sum(t_v)+sum(t_u_v)>0
order by c.contest_id asc
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Output
845 579 Rose 1987 580 1635 566
858 1053 Angela 703 160 1002 384
883 1055 Frank 1121 319 1217 338
1793 2655 Patrick 1337 360 1216 412
2374 2765 Lisa 2733 815 3368 904
2963 2845 Kimberly 4306 1221 3603 1184
3584 2873 Bonnie 2492 652 3019 954
4044 3067 Michael 1323 449 1722 528
4249 3116 Todd 1452 376 1767 463
4269 3256 Joe 1018 372 1766 530
4483 3386 Earl 1911 572 1644 477
4541 3608 Robert 1886 516 1694 504
4601 3868 Amy 1900 639 1738 548
4710 4255 Pamela 2752 639 2378 705
4982 5639 Maria 2705 759 2558 711
5913 5669 Joe 2646 790 3181 835
5994 5713 Linda 3369 967 3048 954
6939 6550 Melissa 2842 859 3574 1004
7266 6947 Carol 2758 665 3044 835
7280 7030 Paula 1963 554 886 259
7484 7033 Marilyn 3217 934 3795 1061
7734 7386 Jennifer 3780 1015 3637 1099
7831 7787 Harry 3190 883 2933 1012
7862 8029 David 1738 476 1475 472
8812 8147 Julia 1044 302 819 266
8825 8438 Kevin 2624 772 2187 689
9136 8727 Paul 4205 1359 3125 954
9613 8762 James 3438 943 3620 1046
10568 8802 Kelly 1907 620 2577 798
11100 8809 Robin 1929 613 1883 619
12742 9203 Ralph 1523 413 1344 383
12861 9644 Gloria 1596 536 2089 623
12865 10108 Victor 2076 597 1259 418
13503 10803 David 924 251 584 167
13537 11390 Joyce 1381 497 1784 538
13612 12592 Donna 1981 550 1487 465
14502 12923 Michelle 1510 463 1830 545
14867 13017 Stephanie 2471 676 2291 574
15164 13256 Gerald 2570 820 2085 607
15804 13421 Walter 1454 459 1396 476
15891 13569 Christina 2188 710 2266 786
16063 14287 Brandon 1804 580 1621 521
16415 14311 Elizabeth 4535 1366 3631 1071
18477 14440 Joseph 1320 391 1419 428
18855 16973 Lawrence 2967 1020 3371 1011
19097 17123 Marilyn 2956 807 2554 750
19575 17562 Lori 2590 863 2627 760
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
why s_s.challenge_id=cl.challenge_id but not s_s.challenge_id=v_s.challenge_id because only if you view the challenges you can submit the challenges why not join matching rows of it
💡 Why join both on cl.challenge_id and not s_s.challenge_id = v_s.challenge_id?
1. Relational Integrity:
The challenges table (cl) is the main anchor table.
view_stats and submission_stats are side data about the same challenge.
2. Join target is cl (not v_s):
You're joining both view and submission stats back to the main list of challenges.
If you did:
ON s_s.challenge_id = v_s.challenge_id
That would:
Only match submission stats if there is also a view stat (which violates LEFT JOIN's purpose).
Drop challenges that have submissions but no views — possibly incorrect depending on data quality.
3. LEFT JOIN preserves all challenges:
The goal is probably to get a full list of all challenges, even those:
That were never viewed (i.e., v_s.challenge_id IS NULL)
That were never submitted to (s_s.challenge_id IS NULL)
🔎 About your thought: "you can only submit if you view"
Logically true, but SQL doesn’t enforce this unless your data guarantees it.
Maybe some challenge got submissions via API or bulk tools and didn’t count as a "view".
Maybe views were tracked after submissions started.
The view_stats or submission_stats might be incomplete.
So, SQL joins don’t assume causality — they stick to what’s explicitly present in the data.
✅ Final takeaway:
sql
Copy
Edit
s_s.challenge_id = cl.challenge_id
is used to map submission data to each challenge, not necessarily depending on view data.
If you want to filter to only challenges that were viewed, you could add this after the joins:
sql
Copy
Edit
WHERE v_s.challenge_id IS NOT NULL
then some times to join we use 1=1 doesnt this violate the data
ChatGPT said:
Excellent follow-up, Shuna! 👏
Let’s demystify the 1=1 join condition — when it’s used, why it doesn’t violate the data (in the right context), and when it can go wrong.
✅ What does 1 = 1 in a join mean?
sql
Copy
Edit
FROM table_a
JOIN table_b ON 1 = 1
This is a Cartesian Join (also called a Cross Join).
It means:
“Join every row of table_a with every row of table_b.”
✅ When is this valid and useful?
Controlled use, like:
Generating combinations (e.g., all dates × products)
Pivoting or simulating rows
Joining with a one-row table for defaults
Temporary table joining for calculations
Example:
sql
Copy
Edit
SELECT e.name, d.holiday_name
FROM employees e
JOIN holidays d ON 1=1
→ This gives every employee × every holiday, which might be what you want if you’re calculating leave eligibility for all dates.
❌ When does it go wrong?
When you accidentally use 1=1 instead of a real join condition, causing huge result sets (n × m rows).
If you forget a WHERE filter afterward, you might mess up totals or logic.
📌 Does it violate data?
No, not technically. SQL lets you do Cartesian joins — it assumes you know what you're doing.
But if you're expecting row alignment like a normal join, yes, logically, it could "violate" your intention.
🔥 Real-life danger example:
sql
Copy
Edit
SELECT c.name, o.order_id
FROM customers c
JOIN orders o ON 1=1
This will join every customer to every order → 💥 messy, slow, incorrect data.
✅ TL;DR
JOIN ON 1=1 Should you use it?
For generating all combinations? ✅ Yes
For actual data relationships? ❌ No
Accidental use in place of proper key join? 🔥 Big probl
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
why use left join for
we are uing left join because we are not leaving behind any challenges that are not assigned to colleges
we are getting all challenges irrepective of whether it is adssigned to college or not
in case if it is not asssiged to any colleges, challenge would have no views and stats in this case left join is used
if the challenge_id is just created it would have no view or stats
even it is assigned to the company but no one has viewed yet - but for this it would get matched rows
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
"Why not just join everything and group by challenge_id at the end — won't it give the same result?"
Short answer:
❌ No, it won’t give the same result in most real-world cases.
🔥 Here’s why — the "Row Explosion Problem"
When you join multiple one-to-many relationships directly (without subqueries or aggregation), the number of rows multiplies.
📊 Example Scenario
Let’s say:
View_Stats has:
3 rows for challenge_id = 101
| challenge\_id | total\_views |
| ------------- | ------------ |
| 101 | 10 |
| 101 | 5 |
| 101 | 3 |
Submission_Stats has:
4 rows for challenge_id = 101
| challenge\_id | total\_submissions |
| ------------- | ------------------ |
| 101 | 2 |
| 101 | 4 |
| 101 | 6 |
| 101 | 1 |
If you join them directly (without subqueries), the database creates a cross join effect for matching keys:
Since every row in View_Stats matches every row in Submission_Stats (same challenge_id), SQL creates all possible combinations
| vs.total\_views | ss.total\_submissions |
| --------------- | --------------------- |
| 10 | 2 |
| 10 | 4 |
| 10 | 6 |
| 10 | 1 |
| 5 | 2 |
| 5 | 4 |
| 5 | 6 |
| 5 | 1 |
| 3 | 2 |
| 3 | 4 |
| 3 | 6 |
| 3 | 1 |
3 View_Stats rows × 4 Submission_Stats rows = 12 rows
This multiplies the totals — your SUM(total_views) and SUM(total_submissions) will be inflated because each value appears multiple times.
and if you do sum it adds up all SUM(total_views) = (10+10+10+10 + 5+5+5+5 + 3+3+3+3) = 132 which is wrong
instead of 10+5+3 =18 (correct)
but if you do join in subquery then
o/p be for sum
| challenge\_id | total\_views | total\_submissions |
| ------------- | ------------ | ------------------ |
| 101 | 18 | 13 |
only if we want to do aggrgation like sum,min,max,count using subquery in join is Good praticse but if you want to join without any aggregation it gives same outp
as above of 12 rows so using subquery is redundant
🛡 Why subqueries fix this:
With subqueries:
You aggregate once per challenge (one row per challenge)
Then join those single rows — no row multiplication
Your SUM() at the end is accurate
🧩 When subqueries (or CTEs) become necessary:
When you need to summarize or filter data first before joining
To break complex logic into chunks
To avoid multiplication of rows (especially in SUM, AVG, COUNT situations)
When doing multi-level analytics or rankings
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Your query:
FROM contests c
JOIN colleges cl ON c.contest_id = cl.contest_id
JOIN challenges cg ON c.college_id = cg.college_id
Lets assume this table structure:
contests(contest_id, contest_name)
colleges(college_id, contest_id, college_name)
challenges(challenge_id, college_id, challenge_name)
Correct way to write the JOIN:
FROM contests c
JOIN colleges cl ON c.contest_id = cl.contest_id
JOIN challenges cg ON cl.college_id = cg.college_id
Explanation:
First JOIN: Get colleges that are linked to contests.
Second JOIN: Get challenges that are linked to those colleges.
-----------------------------------------------------------------------------------------------------------------------------------------------------------
🔍 Why not all challenges may have stats?
Because some challenges might not have received any views or submissions yet — especially if:
The contest was just created.
No one attempted the challenge.
Data entry in view_stats or submission_stats is missing for that challenge.
So the challenge exists in the challenges table, but there’s no matching row in the view_stats or submission_stats table — which means the JOIN would return NULL
for that challenge’s stats unless we use a LEFT JOIN.
🎯 Real-World Analogy
Think of:
challenges = list of all practice questions.
view_stats = how many students viewed them.
submission_stats = how many tried submitting answers.
Now:
Just because a question exists doesn’t mean a student viewed or submitted it.
So unless you left join, you will lose questions that were never touched.