doc update
matthewdouglas committed Dec 4, 2024
1 parent 3d595f1 commit 03fcabd
Showing 4 changed files with 54 additions and 3 deletions.
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -32,6 +32,8 @@
title: Papers, resources & how to cite
- title: API reference
sections:
- title: Functional
local: reference/functional
- title: Optimizers
sections:
- local: reference/optim/optim_overview
2 changes: 1 addition & 1 deletion docs/source/explanations/resources.mdx
@@ -49,7 +49,7 @@ Authors: Tim Dettmers, Luke Zettlemoyer
}
```

## [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Nov 2022)](https://arxiv.org/abs/2208.07339)
## [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Nov 2022)](https://arxiv.org/abs/2208.07339) [[llm-int8]]
Authors: Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer

- [LLM.int8() Blog Post](https://huggingface.co/blog/hf-bitsandbytes-integration)
48 changes: 48 additions & 0 deletions docs/source/reference/functional.mdx
@@ -0,0 +1,48 @@
# Overview
The `bitsandbytes.functional` API provides the low-level building blocks for the library's features.

## When to Use `bitsandbytes.functional`

* When you need direct control over quantized operations and their parameters.
* To build custom layers or operations leveraging low-bit arithmetic.
* To integrate with other ecosystem tooling.
* For experimental or research purposes requiring non-standard quantization or performance optimizations.

## LLM.int8()
[[autodoc]] functional.int8_double_quant

[[autodoc]] functional.int8_linear_matmul

[[autodoc]] functional.int8_mm_dequant

[[autodoc]] functional.int8_vectorwise_dequant

[[autodoc]] functional.int8_vectorwise_quant
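
As an illustration, here is a minimal sketch of how these primitives fit together for a single matrix multiplication. Tensor shapes are hypothetical, the outlier decomposition that LLM.int8() layers on top is omitted, and exact signatures may differ between releases.

```python
import torch
import bitsandbytes.functional as F

# Hypothetical fp16 activations and weights for a linear layer
A = torch.randn(16, 64, dtype=torch.float16, device="cuda")  # activations [batch, in]
B = torch.randn(32, 64, dtype=torch.float16, device="cuda")  # weights [out, in]

# Row-wise (vector-wise) int8 quantization of both operands
A_i8, A_stats, _ = F.int8_vectorwise_quant(A)
B_i8, B_stats, _ = F.int8_vectorwise_quant(B)

# Int8 matmul accumulating in int32, then dequantization back to fp16
# using the scaling statistics collected during quantization
C_i32 = F.int8_linear_matmul(A_i8, B_i8)
C = F.int8_mm_dequant(C_i32, A_stats, B_stats)  # roughly A @ B.T in fp16
```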


## 4-bit
[[autodoc]] functional.dequantize_4bit

[[autodoc]] functional.dequantize_fp4

[[autodoc]] functional.dequantize_nf4

[[autodoc]] functional.gemv_4bit

[[autodoc]] functional.quantize_4bit

[[autodoc]] functional.quantize_fp4

[[autodoc]] functional.quantize_nf4

[[autodoc]] functional.QuantState
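
For example, a weight tensor can be compressed to NF4 and restored with its matching `QuantState`. This is a minimal sketch with hypothetical shapes; the round trip is lossy and exact defaults may vary by release.

```python
import torch
import bitsandbytes.functional as F

# A hypothetical fp16 weight tensor to compress
W = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

# Quantize to 4-bit NF4; quant_state records the per-block absmax,
# blocksize, and original dtype/shape needed to reverse the quantization
W_4bit, quant_state = F.quantize_4bit(W, blocksize=64, quant_type="nf4")

# Dequantize back to fp16 (approximate reconstruction)
W_fp16 = F.dequantize_4bit(W_4bit, quant_state)
```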

## General Quantization
[[autodoc]] functional.dequantize_blockwise

[[autodoc]] functional.quantize_blockwise
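
A short sketch of an 8-bit blockwise round trip (tensor size and blocksize are illustrative):

```python
import torch
import bitsandbytes.functional as F

x = torch.randn(1024, 1024, device="cuda")

# 8-bit blockwise quantization; the returned state holds the
# per-block absmax values used to scale each block
x_q, state = F.quantize_blockwise(x, blocksize=4096)

# Reverse the quantization (approximate reconstruction of x)
x_hat = F.dequantize_blockwise(x_q, state)
```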

## Utility
[[autodoc]] functional.get_ptr

[[autodoc]] functional.is_on_gpu
5 changes: 3 additions & 2 deletions docs/source/reference/nn/linear8bit.mdx
@@ -1,6 +1,7 @@
# 8-bit quantization
# LLM.int8()
[LLM.int8()](https://hf.co/papers/2208.07339) is a quantization method that aims to make large language model inference more accessible without significant degradation in accuracy. Unlike naive 8-bit quantization, which can discard critical information and hurt accuracy, LLM.int8() dynamically adapts so that sensitive components of the computation retain higher precision when needed. The key idea is to extract the outliers from the inputs and weights and multiply them in 16-bit. All other values are multiplied in 8-bit before being dequantized back to 16-bit. The outputs of the 16-bit and 8-bit multiplications are combined to produce the final output.

[LLM.int8()](https://hf.co/papers/2208.07339) is a quantization method that doesn't degrade performance which makes large model inference more accessible. The key is to extract the outliers from the inputs and weights and multiply them in 16-bit. All other values are multiplied in 8-bit and quantized to Int8 before being dequantized back to 16-bits. The outputs from the 16-bit and 8-bit multiplication are combined to produce the final output.
[Further Resources](../../explanations/resources#llm-int8)
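
A rough sketch of typical usage follows. The layer sizes are hypothetical, and quantization of the weights happens when the module is moved to the GPU.

```python
import torch
import bitsandbytes as bnb

fp16_linear = torch.nn.Linear(1024, 4096).half()

int8_linear = bnb.nn.Linear8bitLt(
    1024,
    4096,
    has_fp16_weights=False,  # keep the weights in int8 for inference
    threshold=6.0,           # outlier threshold from the LLM.int8() paper
)
int8_linear.load_state_dict(fp16_linear.state_dict())
int8_linear = int8_linear.cuda()  # weights are quantized on this call

x = torch.randn(8, 1024, dtype=torch.float16, device="cuda")
out = int8_linear(x)  # outliers computed in fp16, the rest in int8
```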

## Linear8bitLt

