Skip to content

Commit

Permalink
Add min(d), max(d) & hash functions: md5(), sha1(), sha256() (#98)
Browse files Browse the repository at this point in the history
  • Loading branch information
goodroot authored Dec 4, 2024
1 parent 12c497a commit 335a9ac
Show file tree
Hide file tree
Showing 3 changed files with 230 additions and 0 deletions.
147 changes: 147 additions & 0 deletions documentation/reference/function/hash.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
---
title: Hash Functions
sidebar_label: Hash
description: Hash (md5, sha1 and sha256) SQL functions reference documentation and explanation.
---

Hash functions generate fixed-size string outputs from variable-length inputs.

These functions are useful for data integrity verification, checksums, and data anonymization.

## Supported functions

- [`md5()`](#md5) – Generates a 128-bit (32 character) hash value
- [`sha1()`](#sha1) – Generates a 160-bit (40 character) hash value
- [`sha256()`](#sha256) – Generates a 256-bit (64 character) hash value

## Function reference

### md5()

Calculates an MD5 hash of the input value and returns it as a hexadecimal string.

**Arguments:**
- String, varchar, or binary value

**Return value:**
- A 32-character hexadecimal string representing the MD5 hash
- NULL if the input is NULL

**Examples:**
```questdb-sql title="md5() with string input" demo
SELECT md5('abc');
-- Returns: '900150983cd24fb0d6963f7d28e17f72'
SELECT md5('');
-- Returns: 'd41d8cd98f00b204e9800998ecf8427e'
```

```questdb-sql title="md5() with UTF-8 input" demo
SELECT md5('Hello, world!');
-- Returns: '6cd3556deb0da54bca060b4c39479839'
```

### sha1()

Calculates a SHA1 hash of the input value and returns it as a hexadecimal string.

**Arguments:**
- String, varchar, or binary value

**Return value:**
- A 40-character hexadecimal string representing the SHA1 hash
- NULL if the input is NULL

**Examples:**
```questdb-sql title="sha1() with string input" demo
SELECT sha1('abc');
-- Returns: 'a9993e364706816aba3e25717850c26c9cd0d89d'
SELECT sha1('');
-- Returns: 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
```

```questdb-sql title="sha1() with UTF-8 input" demo
SELECT sha1('Hello, world!');
-- Returns: '943a702d06f34599aee1f8da8ef9f7296031d699'
```

### sha256()

Calculates a SHA256 hash of the input value and returns it as a hexadecimal string.

**Arguments:**
- String, varchar, or binary value

**Return value:**
- A 64-character hexadecimal string representing the SHA256 hash
- NULL if the input is NULL

**Examples:**
```questdb-sql title="sha256() with string input" demo
SELECT sha256('abc');
-- Returns: 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'
SELECT sha256('');
-- Returns: 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
```

```questdb-sql title="sha256() with UTF-8 input" demo
SELECT sha256('Hello, world!');
-- Returns: '315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3'
```

## Notes and restrictions

### Input handling
- All hash functions support string, varchar, and binary inputs
- Empty strings produce a valid hash value
- NULL inputs always return NULL outputs
- UTF-8 strings are fully supported

### Thread safety
- Hash functions are not thread-safe
- Each function instance maintains its own internal state

### Output characteristics
- Output is always lowercase hexadecimal
- Output length is fixed regardless of input size:
- MD5: 32 characters
- SHA1: 40 characters
- SHA256: 64 characters

### Implementation details
- Uses Java's built-in MessageDigest implementations
- Supported algorithms are guaranteed to be available on all Java platforms
- Processes input in a single pass

### Common use cases

#### Data integrity verification

```questdb-sql
SELECT
filename,
sha256(content) = expected_hash as is_valid
FROM files;
```

#### Anonymizing sensitive data

```questdb-sql
SELECT
md5(email) as hashed_email,
count(*) as user_count
FROM users
GROUP BY hashed_email;
```

#### Binary data hashing

```questdb-sql
SELECT
file_id,
sha1(binary_content) as content_hash
FROM binary_files;
```

82 changes: 82 additions & 0 deletions documentation/reference/function/window.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,10 @@ Where:

- [`first_value()`](#first_value) – Retrieves the first value in a window

- [`max()`](#max) – Returns the maximum value within a window

- [`min()`](#min) – Returns the minimum value within a window

- [`rank()`](#rank) – Assigns a rank to rows

- [`row_number()`](#row_number) – Assigns sequential numbers to rows
Expand Down Expand Up @@ -520,6 +524,84 @@ SELECT
FROM trades;
```

### max()

In the context of window functions, `max(value)` calculates the maximum value within the set of rows defined by the window frame.

**Arguments:**

- `value`: Any numeric value.

**Return value:**

- The maximum value (excluding null) for the rows in the window frame.

**Description**

When used as a window function, `max()` operates on a "window" of rows defined by the `OVER` clause. The rows in this window are determined by the `PARTITION BY`, `ORDER BY`, and frame specification components of the `OVER` clause.

The `max()` function respects the frame clause, meaning it only includes rows within the specified frame in the calculation. The result is a separate value for each row, based on the corresponding window of rows.

Note that the order of rows in the result set is not guaranteed to be the same with each execution of the query. To ensure a consistent order, use an `ORDER BY` clause outside of the `OVER` clause.

**Syntax:**
```questdb-sql title="max() syntax"
max(value) OVER (window_definition)
```

**Example:**
```questdb-sql title="max() example" demo
SELECT
symbol,
price,
timestamp,
max(price) OVER (
PARTITION BY symbol
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS highest_price
FROM trades;
```

### min()

In the context of window functions, `min(value)` calculates the minimum value within the set of rows defined by the window frame.

**Arguments:**

- `value`: Any numeric value.

**Return value:**

- The minimum value (excluding null) for the rows in the window frame.

**Description**

When used as a window function, `min()` operates on a "window" of rows defined by the `OVER` clause. The rows in this window are determined by the `PARTITION BY`, `ORDER BY`, and frame specification components of the `OVER` clause.

The `min()` function respects the frame clause, meaning it only includes rows within the specified frame in the calculation. The result is a separate value for each row, based on the corresponding window of rows.

Note that the order of rows in the result set is not guaranteed to be the same with each execution of the query. To ensure a consistent order, use an `ORDER BY` clause outside of the `OVER` clause.

**Syntax:**
```questdb-sql title="min() syntax"
min(value) OVER (window_definition)
```

**Example:**
```questdb-sql title="min() example" demo
SELECT
symbol,
price,
timestamp,
min(price) OVER (
PARTITION BY symbol
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS lowest_price
FROM trades;
```

### rank()

In the context of window functions, `rank()` assigns a unique rank to each row
Expand Down
1 change: 1 addition & 0 deletions documentation/sidebars.js
Original file line number Diff line number Diff line change
Expand Up @@ -266,6 +266,7 @@ module.exports = {
"reference/function/conditional",
"reference/function/date-time",
"reference/function/finance",
"reference/function/hash",
"reference/function/meta",
"reference/function/numeric",
"reference/function/parquet",
Expand Down

0 comments on commit 335a9ac

Please sign in to comment.