docs : remove obsolete make references, scripts, examples
ggml-ci
ggerganov committed Dec 2, 2024
1 parent c536c07 commit 328ded3
Showing 8 changed files with 1 addition and 726 deletions.
7 changes: 0 additions & 7 deletions docs/backend/BLIS.md
@@ -27,13 +27,6 @@ We recommend using openmp since it's easier to modify the cores being used.

### llama.cpp compilation

Makefile:

```bash
make GGML_BLIS=1 -j
# make GGML_BLIS=1 llama-benchmark-matmult
```

CMake:

```bash
```
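
The CMake block that remains in BLIS.md is cut off in this view. A minimal sketch of a BLIS-backed configure/build, assuming the generic `GGML_BLAS` backend options described in BLIS.md (`GGML_BLAS_VENDOR=FLAME` selects BLIS):

```bash
# Sketch of a BLIS-backed build; verify the option names against BLIS.md.
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=FLAME
cmake --build build --config Release -j 8
```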
7 changes: 0 additions & 7 deletions docs/build.md
@@ -18,7 +18,6 @@ In order to build llama.cpp you have four different options.

**Notes**:

- For `Q4_0_4_4` quantization type build, add the `-DGGML_LLAMAFILE=OFF` cmake option. For example, use `cmake -B build -DGGML_LLAMAFILE=OFF`.
- For faster compilation, add the `-j` argument to run multiple jobs in parallel. For example, `cmake --build build --config Release -j 8` will run 8 jobs in parallel.
- For faster repeated compilation, install [ccache](https://ccache.dev/).
- For debug builds, there are two cases:
@@ -337,9 +336,3 @@ For detailed info, such as model/device supports, CANN install, please refer to
### Android

To read documentation for how to build on Android, [click here](./android.md)

### Arm CPU optimized mulmat kernels

Llama.cpp includes a set of optimized mulmat kernels for the Arm architecture, leveraging Arm® Neon™, int8mm and SVE instructions. These kernels are enabled at build time through the appropriate compiler cpu-type flags, such as `-DCMAKE_C_FLAGS=-march=armv8.2a+i8mm+sve`. Note that these optimized kernels require the model to be quantized into one of the formats: `Q4_0_4_4` (Arm Neon), `Q4_0_4_8` (int8mm) or `Q4_0_8_8` (SVE). The SVE mulmat kernel specifically requires a vector width of 256 bits. When running on devices with a different vector width, it is recommended to use the `Q4_0_4_8` (int8mm) or `Q4_0_4_4` (Arm Neon) formats for better performance. Refer to [examples/quantize/README.md](../examples/quantize/README.md) for more information on the quantization formats.

To support `Q4_0_4_4`, you must build with `GGML_NO_LLAMAFILE=1` (`make`) or `-DGGML_LLAMAFILE=OFF` (`cmake`).
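
As an illustration of the removed guidance above, a hedged sketch of an Arm-optimized build and requantization; the model file names are placeholders and the `-march` string must match the target CPU:

```bash
# Sketch only: enable Arm int8mm kernels, disable LLAMAFILE as noted above,
# then requantize an f16 model into the Q4_0_4_8 layout.
cmake -B build -DGGML_LLAMAFILE=OFF -DCMAKE_C_FLAGS="-march=armv8.2a+i8mm"
cmake --build build --config Release -j 8
./build/bin/llama-quantize ggml-model-f16.gguf ggml-model-q4_0_4_8.gguf Q4_0_4_8
```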
61 changes: 0 additions & 61 deletions examples/base-translate.sh

This file was deleted.

5 changes: 1 addition & 4 deletions examples/convert-llama2c-to-ggml/README.md
@@ -2,11 +2,8 @@

This example reads weights from project [llama2.c](https://github.com/karpathy/llama2.c) and saves them in ggml compatible format. The vocab that is available in `models/ggml-vocab.bin` is used by default.

To convert the model first download the models from the [llama2.c](https://github.com/karpathy/llama2.c) repository:
To convert the model first download the models from the [llama2.c](https://github.com/karpathy/llama2.c) repository.

`$ make -j`

After successful compilation, the following usage options are available:
```
usage: ./llama-convert-llama2c-to-ggml [options]
```
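
The usage listing above is truncated here. A rough sketch of a typical conversion call; the flag and file names are illustrative and should be checked against `./llama-convert-llama2c-to-ggml --help`:

```bash
# Illustrative invocation: convert a llama2.c checkpoint to GGUF,
# reusing the vocab from an existing GGUF model (file names assumed).
./llama-convert-llama2c-to-ggml \
  --copy-vocab-from-model models/ggml-vocab-llama.gguf \
  --llama2c-model stories42M.bin \
  --llama2c-output-model stories42M.gguf
```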
2 changes: 0 additions & 2 deletions examples/imatrix/README.md
@@ -25,8 +25,6 @@ For faster computation, make sure to use GPU offloading via the `-ngl` argument
## Example

```bash
GGML_CUDA=1 make -j

# generate importance matrix (imatrix.dat)
./llama-imatrix -m ggml-model-f16.gguf -f train-data.txt -ngl 99

```
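
The example block is cut off above. For completeness, a hedged sketch of the usual follow-up step, passing the generated matrix to quantization via the `--imatrix` option of `llama-quantize`:

```bash
# Sketch: quantize using the importance matrix produced by llama-imatrix.
./llama-quantize --imatrix imatrix.dat ggml-model-f16.gguf ggml-model-q4_k_m.gguf Q4_K_M
```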
15 changes: 0 additions & 15 deletions examples/server/README.md
@@ -188,12 +188,6 @@ services:
`llama-server` is built alongside everything else from the root of the project

- Using `make`:

```bash
make llama-server
```

- Using `CMake`:

```bash
```

@@ -207,15 +201,6 @@

`llama-server` can also be built with SSL support using OpenSSL 3

- Using `make`:

```bash
# NOTE: For non-system openssl, use the following:
# CXXFLAGS="-I /path/to/openssl/include"
# LDFLAGS="-L /path/to/openssl/lib"
make LLAMA_SERVER_SSL=true llama-server
```

- Using `CMake`:

```bash
```
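
The CMake variant of the SSL build is truncated above. A minimal sketch, assuming the `LLAMA_SERVER_SSL` CMake option mirrors the old make variable:

```bash
# Sketch: configure with OpenSSL support and build only the server target.
cmake -B build -DLLAMA_SERVER_SSL=ON
cmake --build build --config Release -t llama-server
```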
212 changes: 0 additions & 212 deletions scripts/pod-llama.sh

This file was deleted.

