Merge pull request #17
Add challenging SQL questions
Bilbottom authored Jun 7, 2024
2 parents 4a0978f + 3fb3c34 commit 5d1a152
Showing 42 changed files with 1,966 additions and 1 deletion.
30 changes: 30 additions & 0 deletions docs/challenging-sql-problems/challenging-sql-problems.md
@@ -0,0 +1,30 @@
# Challenging SQL questions

> [!WARNING]
>
> These questions are not for people new to SQL! They expect you to use advanced SQL techniques that most people don't know.

## Problems

### 🟤 Bronze Tier

As long as you know your database features, these are straightforward.

1. [Fibonacci sequence](problems/bronze/fibonacci-sequence.md)
2. [Temperature anomaly detection](problems/bronze/temperature-anomaly-detection.md)
3. [UK bank holidays](problems/bronze/uk-bank-holidays.md)

### ⚪ Silver Tier

These require a bit more thinking.

1. [Suspicious login activity](problems/silver/suspicious-login-activity.md)
2. [Funnel analytics](problems/silver/funnel-analytics.md)
3. [Customer sales running totals](problems/silver/customer-sales-running-totals.md)

### 🟡 Gold Tier

Expect to spend a bit of time on these.

1. [Loan repayment schedule](problems/gold/loan-repayment-schedule.md)
2. [Travel plans](problems/gold/travel-plans.md)
@@ -0,0 +1,48 @@
# Fibonacci sequence 🔢

> [!QUESTION]
>
> Generate the first 45 terms of the [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_sequence).
>
> The output should have one row per term in the sequence and the columns:
>
> - `n` as the term number
> - `f_n` as the corresponding Fibonacci number
>
> Order the output by `n`.

The [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_sequence) is defined as _f<sub>n</sub> = f<sub>n-1</sub> + f<sub>n-2</sub>_, where _f<sub>1</sub> = f<sub>2</sub> = 1_.

For example:

- The third term is _f<sub>3</sub> = f<sub>2</sub> + f<sub>1</sub> = 1 + 1 = 2_
- The fourth term is _f<sub>4</sub> = f<sub>3</sub> + f<sub>2</sub> = 2 + 1 = 3_
- ...
- The tenth term is _f<sub>10</sub> = f<sub>9</sub> + f<sub>8</sub> = 34 + 21 = 55_
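
As a quick sanity check of the definition (not a solution to the problem), the sketch below takes the first ten terms typed in by hand and compares each one with the sum of the two terms before it. The CTE name `terms` and the column `sum_of_previous_two` are made up for illustration.

```sql
-- Hand-typed terms f_1 to f_10: this checks the recurrence, it does not generate the sequence.
with terms(n, f_n) as (
    values
        (1, 1), (2, 1), (3, 2), (4, 3), (5, 5),
        (6, 8), (7, 13), (8, 21), (9, 34), (10, 55)
)

select
    n,
    f_n,
    -- f_(n-1) + f_(n-2); null for the first two terms, which are defined directly
    lag(f_n, 1) over (order by n) + lag(f_n, 2) over (order by n) as sum_of_previous_two
from terms
order by n
```

From the third row onwards, `sum_of_previous_two` should equal `f_n`.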

The solution can be found at:

- [fibonacci-sequence.md](../../solutions/bronze/fibonacci-sequence.md)

---

<!-- prettier-ignore -->
>? INFO: **Sample output**
>
> | n | f_n |
> | --: | --: |
> | 1 | 1 |
> | 2 | 1 |
> | 3 | 2 |
> | 4 | 3 |
> | ... | ... |
<!-- prettier-ignore -->
>? TIP: **Hint 1**
>
> Use a [recursive CTE](../../../from-excel-to-sql/advanced-concepts/recursive-ctes.md) to generate the sequence.
<!-- prettier-ignore -->
>? TIP: **Hint 2**
>
> Use the columns `n`, `f_n`, and `f_m` to keep track of the current term, the current Fibonacci number, and the previous Fibonacci number.
@@ -0,0 +1,48 @@
# Temperature anomaly detection 🔍

> [!QUESTION]
>
> Given the temperature data below, identify the temperature readings that are at least 10% higher than the average of the previous 2 and following 2 readings.
>
> Do _not_ include the current reading in the average calculation, and use the calculated average temperature as the denominator for the 10% calculation.
>
> If there are fewer than 2 readings before or 2 after the current reading, do not include the reading in the output.
>
> The output should have the columns:
>
> - `site_id`
> - `reading_datetime`
> - `temperature`
> - `average_temperature` as the average of the 4 readings around the current reading (2 each side), rounded to 4 decimal places
> - `percentage_increase` as the percentage increase of the current reading over the `average_temperature`, rounded to 4 decimal places
>
> Order the output by `site_id` then `reading_datetime`.

<details>
<summary>Expand for the DDL</summary>
--8<-- "docs/challenging-sql-problems/problems/bronze/temperature-anomaly-detection.sql"
</details>
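
To make the arithmetic concrete, here is a minimal sketch for a single reading: the site 1 reading of 22.43 at `2021-01-02 02:01:17`, with its two preceding and two following temperatures copied by hand from the DDL. It only illustrates the average and percentage calculation and deliberately leaves out the windowing, which is the actual challenge. The CTE name `surrounding` is hypothetical.

```sql
-- The four readings around the 22.43 reading for site 1 (two before, two after).
with surrounding(temperature) as (
    values (20.02), (20.17), (19.91), (20.11)
)

select
    22.43 as temperature,
    round(avg(temperature), 4) as average_temperature,
    -- the average is the denominator, as the question requires
    round(100.0 * (22.43 - avg(temperature)) / avg(temperature), 4) as percentage_increase
from surrounding
```

The result should agree with the site 1 row shown in the sample output.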

The solution can be found at:

- [temperature-anomaly-detection.md](../../solutions/bronze/temperature-anomaly-detection.md)

---

<!-- prettier-ignore -->
>? INFO: **Sample output**
>
> | site_id | reading_datetime | temperature | average_temperature | percentage_increase |
> |--------:|:--------------------|------------:|--------------------:|--------------------:|
> | 1 | 2021-01-02 02:01:17 | 22.43 | 20.0525 | 11.8564 |
> | ... | ... | ... | ... | ... |
<!-- prettier-ignore -->
>? TIP: **Hint 1**
>
> Use a [window function](../../../from-excel-to-sql/main-concepts/window-functions.md) (or two!) to calculate the average temperature of the surrounding readings.
<!-- prettier-ignore -->
>? TIP: **Hint 2**
>
> Use another [window function](../../../from-excel-to-sql/main-concepts/window-functions.md) to identify rows with at least 4 surrounding readings (2 before and 2 after).
@@ -0,0 +1,35 @@
```sql
create table readings (
    site_id          integer,
    reading_datetime timestamp,
    temperature      decimal(5, 2) not null,
    primary key (site_id, reading_datetime)
);

insert into readings
values
    (1, '2021-01-01 03:26:23', 20.02),
    (1, '2021-01-01 19:52:46', 20.17),
    (1, '2021-01-02 02:01:17', 22.43),
    (1, '2021-01-02 21:02:34', 19.91),
    (1, '2021-01-03 04:12:56', 20.11),
    (1, '2021-01-03 20:23:12', 20.22),
    (1, '2021-01-04 05:34:23', 20.04),
    (1, '2021-01-04 21:45:34', 22.69),
    (1, '2021-01-05 06:56:45', 20.50),
    (1, '2021-01-05 22:07:56', 20.32),
    (1, '2021-01-06 07:18:07', 20.17),
    (1, '2021-01-06 23:29:18', 23.58),
    (2, '2021-01-01 04:30:10', 21.52),
    (2, '2021-01-01 17:12:25', 21.48),
    (2, '2021-01-02 01:59:43', 23.10),
    (2, '2021-01-02 20:05:53', 18.19),
    (2, '2021-01-03 03:17:12', 21.23),
    (2, '2021-01-03 19:25:20', 21.27),
    (2, '2021-01-04 04:33:34', 21.51),
    (2, '2021-01-04 20:41:45', 21.49),
    (2, '2021-01-05 05:49:56', 21.52),
    (2, '2021-01-05 21:58:07', 21.48),
    (2, '2021-01-06 07:06:18', 21.50),
    (2, '2021-01-06 23:14:29', 21.52)
;
```
47 changes: 47 additions & 0 deletions docs/challenging-sql-problems/problems/bronze/uk-bank-holidays.md
@@ -0,0 +1,47 @@
# UK bank holidays 📅

> [!QUESTION]
>
> Using [DuckDB](https://duckdb.org/), parse the [UK bank holiday endpoint](https://www.gov.uk/bank-holidays.json) into a table.
>
> - [https://www.gov.uk/bank-holidays.json](https://www.gov.uk/bank-holidays.json)
>
> Each row in the output should correspond to a single event, and the column headers (below) should map directly to the JSON properties with the same names:
>
> - `division`
> - `title`
> - `date`
> - `notes`
> - `bunting`

Here's a starting point:

```sql
from 'https://www.gov.uk/bank-holidays.json'
```

The solution can be found at:

- [uk-bank-holidays.md](../../solutions/bronze/uk-bank-holidays.md)

---

<!-- prettier-ignore -->
>? INFO: **Sample output**
>
> | division | title | date | notes | bunting |
> |:------------------|:---------------|:-----------|:------|:--------|
> | england-and-wales | New Year’s Day | 2018-01-01 | | true |
> | england-and-wales | Good Friday | 2018-03-30 | | false |
> | england-and-wales | Easter Monday | 2018-04-02 | | true |
> | ... | ... | ... | ... | ... |
<!-- prettier-ignore -->
>? TIP: **Hint 1**
>
> Use [`UNPIVOT`](https://duckdb.org/docs/sql/statements/unpivot.html) to move the separate columns for each division into a single column.
<!-- prettier-ignore -->
>? TIP: **Hint 2**
>
> Use [`UNNEST`](https://duckdb.org/docs/sql/query_syntax/unnest.html) to explode the event JSON into separate rows and columns.
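
To get a feel for what Hint 2 describes before touching the real endpoint, `UNNEST` can be tried on literal data first. The sketch below uses two made-up event structs (the values are placeholders, not taken from the endpoint) together with DuckDB's `recursive := true` option, which expands a list of structs into one row per element and one column per field.

```sql
-- Toy data only: two hand-written structs standing in for event objects.
select
    unnest(
        [
            {'title': 'Example holiday', 'date': '2018-01-01', 'bunting': true},
            {'title': 'Another holiday', 'date': '2018-03-30', 'bunting': false}
        ],
        recursive := true
    )
```

This is only a sketch of the technique; applying it to the endpoint still requires getting the events for each division into a single list column first.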
141 changes: 141 additions & 0 deletions docs/challenging-sql-problems/problems/gold/loan-repayment-schedule.md
@@ -0,0 +1,141 @@
# Loan repayment schedule 💰

> [!QUESTION]
>
> Generate the repayment schedule for each of the loans detailed below.
>
> The output should have a row per loan per repayment and the columns:
>
> - `loan_id`
> - `repayment_number` as the repayment number
> - `repayment_date` as the date of the repayment
> - `interest` as the interest component of the repayment
> - `principal` as the principal component of the repayment
> - `total` as the total repayment
> - `balance` as the outstanding balance _after_ the repayment
>
> Order the output by `loan_id` and `repayment_number`.

<details>
<summary>Expand for the DDL</summary>
--8<-- "docs/challenging-sql-problems/problems/gold/loan-repayment-schedule.sql"
</details>

The loans have the following details:

- `loan_id`: The unique identifier for the loan
- `loan_value`: The total value of the loan
- `interest_rate`: The monthly interest rate
- `repayments`: The number of monthly repayments to make on the loan
- `start_date`: The date the loan was taken out

Repayments are due exactly one month apart (no need to account for weekends or holidays), and the first repayment is due one month after the `start_date`. The `start_date` will never be on the 29th, 30th, or 31st of the month.

Every monthly repayment for a loan is the same amount, except the final one. You need to calculate this amount yourself (or check **Hint 0** below). The monthly repayment must be rounded to two decimal places, and any accumulated rounding error should be absorbed by the final repayment so that the outstanding balance is exactly zero after it.

For each repayment, the interest is calculated and added to the balance first, and then the repayment is subtracted. The interest is calculated on the current outstanding balance and rounded to two decimal places.

Each monthly repayment is made up of two parts: the interest and the principal. The interest is calculated as described above, and the principal is the difference between the monthly repayment and the interest, which is the amount that actually goes towards paying off the loan.

The solution can be found at:

- [loan-repayment-schedule.md](../../solutions/gold/loan-repayment-schedule.md)

A worked example is provided below to help illustrate the loan calculations.

---

<!-- prettier-ignore -->
>? INFO: **Sample output**
>
> | loan_id | repayment_number | repayment_date | interest | principal | total | balance |
> |--------:|-----------------:|:---------------|---------:|----------:|---------:|---------:|
> | 1 | 1 | 2024-02-01 | 1600.00 | 12682.06 | 14282.06 | 67317.94 |
> | 1 | 2 | 2024-03-01 | 1346.36 | 12935.70 | 14282.06 | 54382.24 |
> | 1 | 3 | 2024-04-01 | 1087.64 | 13194.42 | 14282.06 | 41187.82 |
> | 1 | 4 | ... | ... | ... | ... | ... |
> | 2 | 1 | 2024-02-02 | 1125.00 | 5751.00 | 6876.00 | 69249.00 |
> | 2 | 2 | 2024-03-02 | 1038.74 | 5837.26 | 6876.00 | 63411.74 |
> | 2 | 3 | 2024-04-02 | 951.18 | 5924.82 | 6876.00 | 57486.92 |
> | 2 | 4 | ... | ... | ... | ... | ... |
> | 3 | 1 | 2024-02-03 | 1000.00 | 3707.35 | 4707.35 | 96292.65 |
> | 3 | 2 | 2024-03-03 | 962.93 | 3744.42 | 4707.35 | 92548.23 |
> | 3 | 3 | 2024-04-03 | 925.48 | 3781.87 | 4707.35 | 88766.36 |
> | 3 | 4 | ... | ... | ... | ... | ... |
<!-- prettier-ignore -->
>? TIP: **Hint 0**
>
> The formula for calculating the monthly repayment is:
>
> - `(1 + interest_rate)` to the power of `repayments` as `amortised_rate`, then
> - `loan_value * interest_rate * amortised_rate / (amortised_rate - 1)` as `monthly_repayment`
<!-- prettier-ignore -->
>? TIP: **Hint 1**
>
> Use a [recursive CTE](../../../from-excel-to-sql/advanced-concepts/recursive-ctes.md) to generate and calculate the rows for the repayment schedule.
<!-- prettier-ignore -->
>? TIP: **Hint 2**
>
> For the recursive CTE's anchor statement, start with a dummy row for each loan with only the loan value and the start date. Then, recursively calculate the interest, principal, and balance for each repayment in the recursive statement.
<!-- prettier-ignore -->
>? TIP: **Hint 3**
>
> Calculate the final repayment's details separately to account for any rounding errors.
---

### Worked example

To help illustrate the loan calculations, consider the following loan:

- `loan_value`: 10,000.00
- `interest_rate`: 0.01
- `repayments`: 6
- `start_date`: 2024-01-01

A loan with these details will have a monthly repayment value of 1,725.48 (rounded to 2 decimal places).
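
As a rough check of that figure, the sketch below applies the standard amortisation formula (the one given in Hint 0) to this example loan. The CTE names `loan` and `rates` are made up for illustration, and the literal values are the ones listed above.

```sql
-- Example loan from the worked example: 10,000.00 at 1% per month over 6 repayments.
with loan(loan_value, interest_rate, repayments) as (
    values (10000.00, 0.01, 6)
),

rates as (
    select
        loan_value,
        interest_rate,
        -- (1 + interest_rate) to the power of the number of repayments
        power(1 + interest_rate, repayments) as amortised_rate
    from loan
)

select
    round(loan_value * interest_rate * amortised_rate / (amortised_rate - 1), 2) as monthly_repayment
from rates
```

The unrounded value is roughly 1,725.4837, which is why the final repayment later needs a small adjustment.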

Let's walk through a few repayments.

#### The first repayment

- The first repayment is due on 2024-02-01
- The interest is calculated on the outstanding balance of 10,000.00
- The interest is 1%, so the interest for the month is 100.00 (10,000.00 \* 0.01)
- The repayment is 1,725.48, so the outstanding balance after the repayment is 8,374.52 (10,000.00 + 100.00 - 1,725.48)
- We note that the principal component of the repayment is 1,625.48 (1,725.48 - 100.00)

#### The second repayment

- The second repayment is due on 2024-03-01
- The interest is calculated on the outstanding balance of 8,374.52
- The interest is 1%, so the interest for the month is 83.75 (8,374.52 \* 0.01)
- The repayment is 1,725.48, so the outstanding balance after the repayment is 6,732.79 (8,374.52 + 83.75 - 1,725.48)
- We note that the principal component of the repayment is 1,641.73 (1,725.48 - 83.75)
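
The arithmetic for a single (non-final) repayment can be sketched as follows, using the second repayment's figures. This is an illustration of one step only, not the recursive solution; the names `step` and `with_interest` are hypothetical.

```sql
-- Inputs for the second repayment: balance after the first repayment, monthly rate, monthly repayment.
with step(balance_before, interest_rate, monthly_repayment) as (
    values (8374.52, 0.01, 1725.48)
),

with_interest as (
    select
        *,
        -- interest on the outstanding balance, rounded to two decimal places
        round(balance_before * interest_rate, 2) as interest
    from step
)

select
    interest,
    monthly_repayment - interest as principal,
    balance_before + interest - monthly_repayment as balance
from with_interest
```

Rounding the interest before it is used matters: the principal and the new balance are both derived from the rounded value.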

#### The third, fourth, and fifth repayments

- The interest and principal components are calculated in the same way as above
- The outstanding balance after the fifth repayment is 1,708.43

#### The final repayment

- The final repayment is due on 2024-07-01
- The interest is calculated on the outstanding balance of 1,708.43
- The interest is 1%, so the interest for the month is 17.08 (1,708.43 \* 0.01)
- Since this is the final repayment and we want to account for any rounding errors, the repayment is the outstanding balance plus the interest: 1,725.51 (1,708.43 + 17.08)
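
The final repayment follows a different rule, which can be sketched like this (again with a hypothetical CTE name, `final_step`, and the figures from the list above): the total is whatever clears the balance, rather than the usual monthly repayment.

```sql
-- Inputs for the final repayment: balance after the fifth repayment and the monthly rate.
with final_step(balance_before, interest_rate) as (
    values (1708.43, 0.01)
)

select
    round(balance_before * interest_rate, 2) as interest,
    -- the whole remaining balance is paid off as principal
    balance_before as principal,
    balance_before + round(balance_before * interest_rate, 2) as total,
    0.00 as balance
from final_step
```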

Therefore, the repayment schedule for this loan would look like:

| loan_id | repayment_number | repayment_date | interest | principal | total | balance |
| ------: | ---------------: | :------------- | -------: | --------: | ------: | ------: |
| 1 | 1 | 2024-02-01 | 100.00 | 1625.48 | 1725.48 | 8374.52 |
| 1 | 2 | 2024-03-01 | 83.75 | 1641.73 | 1725.48 | 6732.79 |
| 1 | 3 | 2024-04-01 | 67.33 | 1658.15 | 1725.48 | 5074.64 |
| 1 | 4 | 2024-05-01 | 50.75 | 1674.73 | 1725.48 | 3399.91 |
| 1 | 5 | 2024-06-01 | 34.00 | 1691.48 | 1725.48 | 1708.43 |
| 1 | 6 | 2024-07-01 | 17.08 | 1708.43 | 1725.51 | 0.00 |
@@ -0,0 +1,15 @@
```sql
create table loans (
    loan_id       integer primary key,
    loan_value    decimal(10, 2) not null,
    interest_rate decimal(5, 4) not null,
    repayments    integer not null,
    start_date    date not null
);

insert into loans
values
    (1, 80000.00, 0.020, 6, '2024-01-01'),
    (2, 75000.00, 0.015, 12, '2024-01-02'),
    (3, 100000.00, 0.010, 24, '2024-01-03')
;
```