
lib/utils.ts: Improve Levenshtein performance with typed arrays #10755

Open · wants to merge 2 commits into master

Conversation

DieterReinert
Contributor

This PR refactors the levenshtein function to use a single Uint16Array for storing the distance matrix, rather than a 2D array of arrays. By doing this, we improve cache locality and reduce overhead, leading to faster computations. All logic and outputs remain unchanged.

Performance Improvement:

  • Original implementation took ~25ms
  • Improved implementation took ~15ms

This represents approximately a 40% performance improvement.
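
The PR diff itself is not shown on this page; as a rough sketch of the technique the description names (the function name `levenshteinFlat` and the exact indexing are illustrative, not the actual diff), a flattened `Uint16Array` stores cell (i, j) at index `i * (m + 1) + j`, so each row is contiguous in one allocation:

```typescript
// Sketch of a Levenshtein distance using one flattened Uint16Array of
// (n + 1) * (m + 1) cells instead of an array of arrays. Cell (i, j)
// lives at index i * w + j with w = m + 1, so rows are contiguous in
// memory. The optional limit parameter `l` mirrors the signature
// discussed in this thread. Note: Uint16Array caps values at 65535,
// which is fine for the short strings used here.
export function levenshteinFlat(s: string, t: string, l: number): number {
    const n = s.length;
    const m = t.length;

    if (n === 0) return m;
    if (m === 0) return n;
    if (l && Math.abs(m - n) > l) return Math.abs(m - n);

    const w = m + 1; // row width of the flattened matrix
    const d = new Uint16Array((n + 1) * w);
    for (let i = 0; i <= n; i++) d[i * w] = i; // first column
    for (let j = 0; j <= m; j++) d[j] = j;     // first row

    for (let i = 1; i <= n; i++) {
        const si = s.charCodeAt(i - 1);
        for (let j = 1; j <= m; j++) {
            const cost = si === t.charCodeAt(j - 1) ? 0 : 1;
            const del = d[(i - 1) * w + j] + 1;       // deletion
            const ins = d[i * w + (j - 1)] + 1;       // insertion
            const sub = d[(i - 1) * w + (j - 1)] + cost; // substitution
            let mi = del;
            if (ins < mi) mi = ins;
            if (sub < mi) mi = sub;
            d[i * w + j] = mi;
        }
    }
    return d[n * w + m];
}
```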

@DieterReinert DieterReinert marked this pull request as ready for review December 15, 2024 14:42
@urkerab
Contributor

urkerab commented Dec 15, 2024

I do wonder whether it would be better just to switch to a more efficient implementation that doesn't unnecessarily create the whole array when you just need the last row.

@DieterReinert
Contributor Author

DieterReinert commented Dec 15, 2024

Performance-wise, the approaches are about the same. Memory-wise, this new approach does seem better:

export function improvedLevenshtein(s: string, t: string, l: number): number {
    const n = s.length;
    const m = t.length;

    if (n === 0) return m;
    if (m === 0) return n;
    if (l && Math.abs(m - n) > l) return Math.abs(m - n);

    let prevRow: number[] = new Array(m + 1);
    for (let j = 0; j <= m; j++) {
        prevRow[j] = j;
    }

    for (let i = 1; i <= n; i++) {
        const si = s.charAt(i - 1);
        const currRow: number[] = new Array(m + 1);
        currRow[0] = i;

        for (let j = 1; j <= m; j++) {
            // Early-exit check carried over from the original implementation.
            // Note: currRow[j] has not been assigned yet at this point, so the
            // comparison is always false (undefined > 4) and this never fires.
            if (i === j && currRow[j] > 4) return n;

            const tj = t.charAt(j - 1);
            const cost = (si === tj) ? 0 : 1;

            let mi = prevRow[j] + 1;      // d[i-1][j] + 1
            const b = currRow[j - 1] + 1; // d[i][j-1] + 1
            const c = prevRow[j - 1] + cost; // d[i-1][j-1] + cost

            if (b < mi) mi = b;
            if (c < mi) mi = c;

            currRow[j] = mi;
        }

        prevRow = currRow;
    }

    return prevRow[m];
}

@larry-the-table-guy
Contributor

> Performance Improvement:
>
> * Original implementation took ~25ms
> * Improved implementation took ~15ms
>
> This represents approximately a 40% performance improvement.

How did you produce these measurements? What inputs, especially.

It would seem to me that dataSearch in sim/dex.ts is the primary consumer of levenshtein. In that context, we'd expect very small strings.

When I stick a simple timer into dataSearch and run some /dt commands with misspelled pokemon/items, I see times - for the same input - oscillating between 28ms and 4.5ms. There's a lot of potential noise here.

Certainly, in principle, flattened arrays outperform nested arrays. But it's not clear whether it produces a measurable impact in the actual uses of levenshtein here.
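
One way to damp that kind of noise is to warm up the function and report a median over many samples. A hypothetical harness (not part of the PR; `bench`, the warmup count, and the sample count are all illustrative) might look like:

```typescript
// Hypothetical micro-benchmark sketch for timing a levenshtein-style
// function (s, t, limit) => number over a fixed set of input pairs.
// Warmup iterations let the JIT optimize the hot code first, and
// reporting the median over many samples damps outliers like the
// 28ms-vs-4.5ms swings described above.
import { performance } from "perf_hooks";

function median(xs: number[]): number {
    const sorted = [...xs].sort((a, b) => a - b);
    return sorted[Math.floor(sorted.length / 2)];
}

export function bench(
    fn: (s: string, t: string, limit: number) => number,
    pairs: Array<[string, string]>,
    samples = 50
): number {
    // Warmup: run the workload a few times before measuring.
    for (let k = 0; k < 10; k++) {
        for (const [s, t] of pairs) fn(s, t, 0);
    }
    // Measure: time one full pass over all pairs, `samples` times.
    const times: number[] = [];
    for (let k = 0; k < samples; k++) {
        const start = performance.now();
        for (const [s, t] of pairs) fn(s, t, 0);
        times.push(performance.now() - start);
    }
    return median(times); // ms per pass over all pairs
}
```

Running both the nested-array and flattened implementations through the same harness, on inputs shaped like real `dataSearch` queries, would make the comparison much less sensitive to one-off timer readings.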

@DieterReinert
Contributor Author

> How did you produce these measurements? What inputs, especially.

The flattened Uint16Array implementation still offers memory and cache-locality advantages, even if they are hard to isolate in noisy timings. The strings involved today are small, but the optimization keeps levenshtein efficient if larger inputs are ever passed, and my measurements, despite the variability you describe, consistently favored the flattened version.

3 participants