A CS 231n-style port of this project, implementing LLMs solely with NumPy #784
I was inspired by this repo and the related videos. But instead of speed and efficiency, my goal was educational. It's implemented in the style of the Stanford CS 231n homework assignments (well, what I remember from 2018), where the only dependency is NumPy, so it's very readable and easy to follow along. To that end, it has a lot of good Python hygiene: modular, typed & type-checked, formatted, linted, >90% test coverage. I'm sharing in the event others find it helpful to follow along.

It also includes a reasonably fast implementation of BPE using Cython optimizations, in the style of minBPE (I was following the exercises :-)), with the only dependency being regex.
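For anyone who hasn't worked through minBPE: the heart of BPE training is repeatedly merging the most frequent adjacent pair of token ids into a new id. Here is a minimal pure-Python sketch of one such step, following the usual minBPE formulation; the function names are illustrative, not the project's actual API, and the repo's version is Cython-optimized for speed:

```python
from collections import Counter

def get_pair_counts(ids: list[int]) -> Counter:
    """Count occurrences of each adjacent (id, id) pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# One training iteration: find the top pair and mint a new token id for it.
ids = list("aaabdaaabac".encode("utf-8"))          # start from raw bytes (ids 0..255)
pair = get_pair_counts(ids).most_common(1)[0][0]   # most frequent adjacent pair
ids = merge(ids, pair, 256)                        # first new token gets id 256
```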
Together, this is a minimalist implementation of LLMs with support for training the tokenizer, training the model, doing generation, and even serving behind an API.
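To give a flavor of what "LLMs solely with NumPy" looks like in practice, here is a hedged sketch of single-head causal self-attention; the names, shapes, and signatures are my own assumptions for illustration, not the project's actual code:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x: np.ndarray, w_qkv: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    """Single-head causal self-attention over a (T, d) sequence.

    w_qkv has shape (d, 3*d) so one matmul produces Q, K, V; w_out is (d, d).
    """
    T, d = x.shape
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)         # each (T, d)
    scores = (q @ k.T) / np.sqrt(d)                   # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True at future positions
    scores = np.where(mask, -np.inf, scores)          # block attention to the future
    return softmax(scores) @ v @ w_out                # weighted values, then project

# Quick shape check with random weights.
rng = np.random.default_rng(0)
T, d = 8, 16
x = rng.normal(size=(T, d))
y = causal_self_attention(x, rng.normal(size=(d, 3 * d)), rng.normal(size=(d, d)))
assert y.shape == (T, d)
```

Everything reduces to a few matmuls, a mask, and a softmax, which is what makes a NumPy-only implementation so readable to step through.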