A CS 231n-style port of this project, implementing LLMs solely with NumPy #784
I was inspired by this repo and the related videos. But instead of speed and efficiency, my goal was educational. It's implemented in the style of the Stanford CS 231n homework assignments (well, what I remember from 2018), where the only dependency is NumPy, so it's very readable and easy to follow along. To that end, it has a lot of good Python hygiene: modular, typed & type-checked, formatted, linted, >90% test coverage. I'm sharing in the event others find it helpful to follow along.

It also includes a reasonably fast implementation of BPE using Cython optimizations, in the style of minBPE (I was following the exercises :-)), with the only dependency being regex.
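For anyone who hasn't worked through minBPE: the heart of BPE training is repeatedly merging the most frequent adjacent pair of token ids into a new id. Here is a minimal pure-Python sketch of one such step, following the usual minBPE formulation; the function names are illustrative, not the project's actual API, and the repo's version is Cython-optimized for speed:

```python
from collections import Counter

def get_pair_counts(ids: list[int]) -> Counter:
    """Count occurrences of each adjacent (id, id) pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# One training iteration: find the top pair and mint a new token id for it.
ids = list("aaabdaaabac".encode("utf-8"))          # start from raw bytes (ids 0..255)
pair = get_pair_counts(ids).most_common(1)[0][0]   # most frequent adjacent pair
ids = merge(ids, pair, 256)                        # first new token gets id 256
```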
Together, this is a minimalist implementation of LLMs with support for training the tokenizer, training the model, doing generation, and even serving behind an API.
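To give a flavor of what "LLMs solely with NumPy" looks like in practice, here is a hedged sketch of single-head causal self-attention; the names, shapes, and signatures are my own assumptions for illustration, not the project's actual code:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x: np.ndarray, w_qkv: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    """Single-head causal self-attention over a (T, d) sequence.

    w_qkv has shape (d, 3*d) so one matmul produces Q, K, V; w_out is (d, d).
    """
    T, d = x.shape
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)         # each (T, d)
    scores = (q @ k.T) / np.sqrt(d)                   # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True at future positions
    scores = np.where(mask, -np.inf, scores)          # block attention to the future
    return softmax(scores) @ v @ w_out                # weighted values, then project

# Quick shape check with random weights.
rng = np.random.default_rng(0)
T, d = 8, 16
x = rng.normal(size=(T, d))
y = causal_self_attention(x, rng.normal(size=(d, 3 * d)), rng.normal(size=(d, d)))
assert y.shape == (T, d)
```

Everything reduces to a few matmuls, a mask, and a softmax, which is what makes a NumPy-only implementation so readable to step through.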