Refine SQL questions and add one more
Showing 14 changed files with 273 additions and 22 deletions.
52 changes: 52 additions & 0 deletions
docs/challenging-sql-problems/problems/silver/predicting-values.md
@@ -0,0 +1,52 @@
# Predicting values 🎱

> [!SUCCESS] Scenario
>
> Some students are studying [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet) and have been asked to predict the `y` values for a given set of `x` values for each of the four datasets using [linear regression](https://en.wikipedia.org/wiki/Linear_regression).

> [!QUESTION]
>
> For each of the four datasets in Anscombe's quartet, use linear regression to predict the `y` values for `x` values `16`, `17`, and `18`.
>
> The output should have a row for each `x` value (`16`, `17`, `18`), with the columns:
>
> - `x`
> - `dataset_1` as the predicted value for dataset 1, rounded to 1 decimal place
> - `dataset_2` as the predicted value for dataset 2, rounded to 1 decimal place
> - `dataset_3` as the predicted value for dataset 3, rounded to 1 decimal place
> - `dataset_4` as the predicted value for dataset 4, rounded to 1 decimal place
>
> Order the output by `x`.

<details>
<summary>Expand for the DDL</summary>
--8<-- "docs/challenging-sql-problems/problems/silver/predicting-values.sql"
</details>

There are plenty of resources online that walk through the maths behind linear regression, such as:

- [https://www.youtube.com/watch?v=GAmzwIkGFgE](https://www.youtube.com/watch?v=GAmzwIkGFgE)

The solution can be found at:

- [predicting-values.md](../../solutions/silver/predicting-values.md)

---

<!-- prettier-ignore -->
>? INFO: **Sample output**
>
> |   x | dataset_1 | dataset_2 | dataset_3 | dataset_4 |
> |----:|----------:|----------:|----------:|----------:|
> |  16 |      11.0 |      11.0 |      11.0 |      11.0 |
> | ... |       ... |       ... |       ... |       ... |

<!-- prettier-ignore -->
>? TIP: **Hint 1**
>
> Unpivot the datasets so that you have a table with headers `dataset`, `x`, and `y`, then apply the linear regression, and finally pivot the results back.

<!-- prettier-ignore -->
>? TIP: **Hint 2**
>
> For databases that support them, use the `regr_slope` and `regr_intercept` functions (or equivalent) to calculate the slope and intercept of the regression line. Otherwise, you'll need to calculate these manually 😄
26 changes: 26 additions & 0 deletions
docs/challenging-sql-problems/problems/silver/predicting-values.sql
@@ -0,0 +1,26 @@
```sql
create table anscombes_quartet (
    dataset_1__x int,
    dataset_1__y decimal(5, 2),
    dataset_2__x int,
    dataset_2__y decimal(5, 2),
    dataset_3__x int,
    dataset_3__y decimal(5, 2),
    dataset_4__x int,
    dataset_4__y decimal(5, 2),
);
insert into anscombes_quartet
values
    (10,  8.04, 10, 9.14, 10,  7.46,  8,  6.58),
    ( 8,  6.95,  8, 8.14,  8,  6.77,  8,  5.76),
    (13,  7.58, 13, 8.74, 13, 12.74,  8,  7.71),
    ( 9,  8.81,  9, 8.77,  9,  7.11,  8,  8.84),
    (11,  8.33, 11, 9.26, 11,  7.81,  8,  8.47),
    (14,  9.96, 14, 8.10, 14,  8.84,  8,  7.04),
    ( 6,  7.24,  6, 6.13,  6,  6.08,  8,  5.25),
    ( 4,  4.26,  4, 3.10,  4,  5.39, 19, 12.50),
    (12, 10.84, 12, 9.13, 12,  8.15,  8,  5.56),
    ( 7,  4.82,  7, 7.26,  7,  6.42,  8,  7.91),
    ( 5,  5.68,  5, 4.74,  5,  5.73,  8,  6.89)
;
```
32 changes: 32 additions & 0 deletions
docs/challenging-sql-problems/solutions/bronze/temperature-anomaly-detection--sqlite.sql
@@ -0,0 +1,32 @@
```sql
with temperatures as (
    select
        site_id,
        reading_datetime,
        temperature,
        avg(temperature) over rows_around_site_reading as average_temperature,
        count(*) over rows_around_site_reading as count_of_rows
    from readings
    window rows_around_site_reading as (
        partition by site_id
        order by reading_datetime
        rows between 2 preceding
                 and 2 following
             exclude current row
    )
)

select
    site_id,
    reading_datetime,
    temperature,
    round(average_temperature, 4) as average_temperature,
    round(100.0 * (temperature - average_temperature) / average_temperature, 4) as percentage_increase
from temperatures
where 1=1
    and count_of_rows = 4
    and (temperature - average_temperature) / average_temperature > 0.1
order by
    site_id,
    reading_datetime
```
42 changes: 42 additions & 0 deletions
docs/challenging-sql-problems/solutions/silver/predicting-values--duckdb--manual.sql
@@ -0,0 +1,42 @@
```sql
with

unpivoted as (
    unpivot anscombes_quartet
    on
        (dataset_1__x, dataset_1__y) as dataset_1,
        (dataset_2__x, dataset_2__y) as dataset_2,
        (dataset_3__x, dataset_3__y) as dataset_3,
        (dataset_4__x, dataset_4__y) as dataset_4,
    into
        name dataset
        value x, y
),

coefficients as (
    select
        dataset,
        avg(x) as avg_x,
        avg(y) as avg_y,
        avg(x * x) as avg_xx,
        avg(x * y) as avg_xy,
        (avg_x * avg_y - avg_xy) / (avg_x * avg_x - avg_xx) as m,
        avg_y - m * avg_x as c,
    from unpivoted
    group by dataset
),

predictions as (
    select
        dataset,
        x,
        round(m * x + c, 1) as y
    from coefficients
        cross join (values (16), (17), (18)) as v(x)
)

pivot predictions
on dataset
using any_value(y)
order by x
```
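
The `m` and `c` expressions in the manual query are the ordinary least-squares estimates rewritten in terms of averages. A short derivation, where $\bar{x}$, $\bar{y}$, $\overline{x^2}$, and $\overline{xy}$ are notation introduced here for `avg_x`, `avg_y`, `avg_xx`, and `avg_xy` (they do not appear in the commit):

```latex
\begin{aligned}
m &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
   = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
   = \frac{\bar{x}\,\bar{y} - \overline{xy}}{\bar{x}^2 - \overline{x^2}}, \\
c &= \bar{y} - m\,\bar{x}.
\end{aligned}
```

The last form of $m$ simply negates numerator and denominator, which is how the query writes it. For dataset 1, $\bar{x} = 9$, $\bar{y} \approx 7.5009$, $\overline{x^2} = 91$, and $\overline{xy} \approx 72.5091$, giving $m \approx 0.5001$ and $c \approx 3.0001$.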
38 changes: 38 additions & 0 deletions
docs/challenging-sql-problems/solutions/silver/predicting-values--duckdb--regr.sql
@@ -0,0 +1,38 @@
```sql
with

unpivoted as (
    unpivot anscombes_quartet
    on
        (dataset_1__x, dataset_1__y) as dataset_1,
        (dataset_2__x, dataset_2__y) as dataset_2,
        (dataset_3__x, dataset_3__y) as dataset_3,
        (dataset_4__x, dataset_4__y) as dataset_4,
    into
        name dataset
        value x, y
),

coefficients as (
    select
        dataset,
        regr_slope(y, x) as m,
        regr_intercept(y, x) as c,
    from unpivoted
    group by dataset
),

predictions as (
    select
        dataset,
        x,
        round(m * x + c, 1) as y
    from coefficients
        cross join (values (16), (17), (18)) as v(x)
)

pivot predictions
on dataset
using any_value(y)
order by x
```
39 changes: 39 additions & 0 deletions
docs/challenging-sql-problems/solutions/silver/predicting-values.md
@@ -0,0 +1,39 @@
# Predicting values 🎱

> [!TIP]
>
> Solution to the following problem:
>
> - [predicting-values.md](../../problems/silver/predicting-values.md)

## Result Set

Regardless of the database, the result set should look like:

|   x | dataset_1 | dataset_2 | dataset_3 | dataset_4 |
| --: | --------: | --------: | --------: | --------: |
|  16 |      11.0 |      11.0 |      11.0 |      11.0 |
|  17 |      11.5 |      11.5 |      11.5 |      11.5 |
|  18 |      12.0 |      12.0 |      12.0 |      12.0 |

This is one of the interesting things about Anscombe's quartet (and is the reason Anscombe created it): the four datasets have nearly identical summary statistics, including the fitted regression line, but look very different when plotted!
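
To see why every column of the result set matches, the summary statistics can be checked directly. The sketch below is a plain-Python illustration rather than part of the commit: the data is copied from the `anscombes_quartet` DDL, while `summarise`, `QUARTET`, and `SUMMARY` are hypothetical names introduced here.

```python
# Anscombe's quartet, copied from the `anscombes_quartet` DDL in this commit.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values for datasets 1 to 3

QUARTET = {
    "dataset_1": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "dataset_2": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "dataset_3": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "dataset_4": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def summarise(xs, ys):
    """Return the means and the least-squares line, rounded to 2 decimal places."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations around the means.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return {
        "mean_x": round(mean_x, 2),
        "mean_y": round(mean_y, 2),
        "slope": round(slope, 2),
        "intercept": round(intercept, 2),
    }


SUMMARY = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARY.items():
    print(name, stats)
```

At two decimal places all four datasets share the same means and the same fitted line, so the rounded predictions for `16`, `17`, and `18` agree as well.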

<details>
<summary>Expand for the DDL</summary>
--8<-- "docs/challenging-sql-problems/solutions/silver/predicting-values.sql"
</details>

## Solution

Some SQL solutions per database are provided below.

<!-- prettier-ignore -->
> SUCCESS: **DuckDB**
>
> Here's a solution using the `regr_slope` and `regr_intercept` functions:
>
--8<-- "docs/challenging-sql-problems/solutions/silver/predicting-values--duckdb--regr.sql"
>
> ...and one doing this manually:
>
--8<-- "docs/challenging-sql-problems/solutions/silver/predicting-values--duckdb--manual.sql"
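
The commit only ships DuckDB solutions. As a hedged sketch of how the manual approach might carry over to a database without `unpivot`, `pivot`, or the `regr_*` functions built in, the script below runs an equivalent query on SQLite through Python's standard `sqlite3` module. It is not part of the commit: the `union all` unpivot, the extra `averages` CTE, the `max(case ...)` pivot, and the names `DDL`, `QUERY`, and `ROWS` are all assumptions made for illustration.

```python
import sqlite3

# Table and data copied from the problem's DDL (trailing comma dropped for SQLite).
DDL = """
create table anscombes_quartet (
    dataset_1__x int, dataset_1__y decimal(5, 2),
    dataset_2__x int, dataset_2__y decimal(5, 2),
    dataset_3__x int, dataset_3__y decimal(5, 2),
    dataset_4__x int, dataset_4__y decimal(5, 2)
);
insert into anscombes_quartet values
    (10,  8.04, 10, 9.14, 10,  7.46,  8,  6.58),
    ( 8,  6.95,  8, 8.14,  8,  6.77,  8,  5.76),
    (13,  7.58, 13, 8.74, 13, 12.74,  8,  7.71),
    ( 9,  8.81,  9, 8.77,  9,  7.11,  8,  8.84),
    (11,  8.33, 11, 9.26, 11,  7.81,  8,  8.47),
    (14,  9.96, 14, 8.10, 14,  8.84,  8,  7.04),
    ( 6,  7.24,  6, 6.13,  6,  6.08,  8,  5.25),
    ( 4,  4.26,  4, 3.10,  4,  5.39, 19, 12.50),
    (12, 10.84, 12, 9.13, 12,  8.15,  8,  5.56),
    ( 7,  4.82,  7, 7.26,  7,  6.42,  8,  7.91),
    ( 5,  5.68,  5, 4.74,  5,  5.73,  8,  6.89);
"""

QUERY = """
with
-- Unpivot with `union all`, one branch per dataset.
unpivoted as (
    select 'dataset_1' as dataset, dataset_1__x as x, dataset_1__y as y from anscombes_quartet
    union all
    select 'dataset_2', dataset_2__x, dataset_2__y from anscombes_quartet
    union all
    select 'dataset_3', dataset_3__x, dataset_3__y from anscombes_quartet
    union all
    select 'dataset_4', dataset_4__x, dataset_4__y from anscombes_quartet
),
-- SQLite cannot reuse a select-list alias in the same select, hence a separate CTE.
averages as (
    select
        dataset,
        avg(x) as avg_x,
        avg(y) as avg_y,
        avg(x * x) as avg_xx,
        avg(x * y) as avg_xy
    from unpivoted
    group by dataset
),
slopes as (
    select
        dataset, avg_x, avg_y,
        (avg_x * avg_y - avg_xy) / (avg_x * avg_x - avg_xx) as m
    from averages
),
predictions as (
    select
        v.x,
        slopes.dataset,
        round(m * v.x + (avg_y - m * avg_x), 1) as y
    from slopes
        cross join (select 16 as x union all select 17 union all select 18) as v
)
-- Pivot back with conditional aggregation.
select
    x,
    max(case when dataset = 'dataset_1' then y end) as dataset_1,
    max(case when dataset = 'dataset_2' then y end) as dataset_2,
    max(case when dataset = 'dataset_3' then y end) as dataset_3,
    max(case when dataset = 'dataset_4' then y end) as dataset_4
from predictions
group by x
order by x
"""

connection = sqlite3.connect(":memory:")
connection.executescript(DDL)
ROWS = connection.execute(QUERY).fetchall()
connection.close()

for row in ROWS:
    print(row)
```

The three rows returned should match the result set above. The same `union all` and `max(case ...)` pattern is plain SQL, so it is likely to port to other engines with little change, though that has not been checked here.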
8 changes: 8 additions & 0 deletions
docs/challenging-sql-problems/solutions/silver/predicting-values.sql
@@ -0,0 +1,8 @@
```sql
select *
from values
    (16, 11.0, 11.0, 11.0, 11.0),
    (17, 11.5, 11.5, 11.5, 11.5),
    (18, 12.0, 12.0, 12.0, 12.0)
    as solution(x, dataset_1, dataset_2, dataset_3, dataset_4)
```