docs : remove obsolete make references, scripts, examples
ggml-ci
ggerganov committed Dec 2, 2024
1 parent c536c07 commit 328ded3
Showing 8 changed files with 1 addition and 726 deletions.
7 changes: 0 additions & 7 deletions docs/backend/BLIS.md
@@ -27,13 +27,6 @@ We recommend using openmp since it's easier to modify the cores being used.

### llama.cpp compilation

Makefile:

```bash
make GGML_BLIS=1 -j
# make GGML_BLIS=1 llama-benchmark-matmult
```

CMake:

```bash
```
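
The CMake block that remains in BLIS.md is cut off in this view. A minimal sketch of a BLIS-backed configure/build, assuming the generic `GGML_BLAS` backend options described in BLIS.md (`GGML_BLAS_VENDOR=FLAME` selects BLIS):

```bash
# Sketch of a BLIS-backed build; verify the option names against BLIS.md.
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=FLAME
cmake --build build --config Release -j 8
```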
7 changes: 0 additions & 7 deletions docs/build.md
@@ -18,7 +18,6 @@ In order to build llama.cpp you have four different options.

**Notes**:

- For `Q4_0_4_4` quantization type build, add the `-DGGML_LLAMAFILE=OFF` cmake option. For example, use `cmake -B build -DGGML_LLAMAFILE=OFF`.
- For faster compilation, add the `-j` argument to run multiple jobs in parallel. For example, `cmake --build build --config Release -j 8` will run 8 jobs in parallel.
- For faster repeated compilation, install [ccache](https://ccache.dev/).
- For debug builds, there are two cases:
@@ -337,9 +336,3 @@ For detailed info, such as model/device supports, CANN install, please refer to
### Android

To read documentation for how to build on Android, [click here](./android.md)

### Arm CPU optimized mulmat kernels

Llama.cpp includes a set of optimized mulmat kernels for the Arm architecture, leveraging Arm® Neon™, int8mm and SVE instructions. These kernels are enabled at build time through the appropriate compiler cpu-type flags, such as `-DCMAKE_C_FLAGS=-march=armv8.2a+i8mm+sve`. Note that these optimized kernels require the model to be quantized into one of the formats: `Q4_0_4_4` (Arm Neon), `Q4_0_4_8` (int8mm) or `Q4_0_8_8` (SVE). The SVE mulmat kernel specifically requires a vector width of 256 bits. When running on devices with a different vector width, it is recommended to use the `Q4_0_4_8` (int8mm) or `Q4_0_4_4` (Arm Neon) formats for better performance. Refer to [examples/quantize/README.md](../examples/quantize/README.md) for more information on the quantization formats.

To support `Q4_0_4_4`, you must build with `GGML_NO_LLAMAFILE=1` (`make`) or `-DGGML_LLAMAFILE=OFF` (`cmake`).
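
As an illustration of the removed guidance above, a hedged sketch of an Arm-optimized build and requantization; the model file names are placeholders and the `-march` string must match the target CPU:

```bash
# Sketch only: enable Arm int8mm kernels, disable LLAMAFILE as noted above,
# then requantize an f16 model into the Q4_0_4_8 layout.
cmake -B build -DGGML_LLAMAFILE=OFF -DCMAKE_C_FLAGS="-march=armv8.2a+i8mm"
cmake --build build --config Release -j 8
./build/bin/llama-quantize ggml-model-f16.gguf ggml-model-q4_0_4_8.gguf Q4_0_4_8
```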
61 changes: 0 additions & 61 deletions examples/base-translate.sh

This file was deleted.

5 changes: 1 addition & 4 deletions examples/convert-llama2c-to-ggml/README.md
@@ -2,11 +2,8 @@

This example reads weights from project [llama2.c](https://github.com/karpathy/llama2.c) and saves them in ggml compatible format. The vocab that is available in `models/ggml-vocab.bin` is used by default.

To convert the model first download the models from the [llama2.c](https://github.com/karpathy/llama2.c) repository:
To convert the model first download the models from the [llama2.c](https://github.com/karpathy/llama2.c) repository.

`$ make -j`

After successful compilation, the following usage options are available:
```
usage: ./llama-convert-llama2c-to-ggml [options]
```
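
The usage listing above is truncated here. A rough sketch of a typical conversion call; the flag and file names are illustrative and should be checked against `./llama-convert-llama2c-to-ggml --help`:

```bash
# Illustrative invocation: convert a llama2.c checkpoint to GGUF,
# reusing the vocab from an existing GGUF model (file names assumed).
./llama-convert-llama2c-to-ggml \
  --copy-vocab-from-model models/ggml-vocab-llama.gguf \
  --llama2c-model stories42M.bin \
  --llama2c-output-model stories42M.gguf
```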
2 changes: 0 additions & 2 deletions examples/imatrix/README.md
@@ -25,8 +25,6 @@ For faster computation, make sure to use GPU offloading via the `-ngl` argument
## Example

```bash
GGML_CUDA=1 make -j

# generate importance matrix (imatrix.dat)
./llama-imatrix -m ggml-model-f16.gguf -f train-data.txt -ngl 99

```
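
The example block is cut off above. For completeness, a hedged sketch of the usual follow-up step, passing the generated matrix to quantization via the `--imatrix` option of `llama-quantize`:

```bash
# Sketch: quantize using the importance matrix produced by llama-imatrix.
./llama-quantize --imatrix imatrix.dat ggml-model-f16.gguf ggml-model-q4_k_m.gguf Q4_K_M
```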
15 changes: 0 additions & 15 deletions examples/server/README.md
@@ -188,12 +188,6 @@ services:
`llama-server` is built alongside everything else from the root of the project

- Using `make`:

```bash
make llama-server
```

- Using `CMake`:

```bash
```

@@ -207,15 +201,6 @@

`llama-server` can also be built with SSL support using OpenSSL 3

- Using `make`:

```bash
# NOTE: For non-system openssl, use the following:
# CXXFLAGS="-I /path/to/openssl/include"
# LDFLAGS="-L /path/to/openssl/lib"
make LLAMA_SERVER_SSL=true llama-server
```

- Using `CMake`:

```bash
```
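
The CMake variant of the SSL build is truncated above. A minimal sketch, assuming the `LLAMA_SERVER_SSL` CMake option mirrors the old make variable:

```bash
# Sketch: configure with OpenSSL support and build only the server target.
cmake -B build -DLLAMA_SERVER_SSL=ON
cmake --build build --config Release -t llama-server
```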
212 changes: 0 additions & 212 deletions scripts/pod-llama.sh

This file was deleted.

