[Draft] Tensor Parallel support to llama.cpp #9648
base: master
Conversation
Signed-off-by: Chen Xi <[email protected]>
Refer to issue #9086 for the detailed design.
@ClarkChin08 Is it possible to update the guide/docs to explain how to use this feature?
Thank you!
@@ -566,6 +566,17 @@ if (GGML_SYCL)
        list(APPEND GGML_EXTRA_LIBS_PRIVATE DNNL::dnnl)
    endif()

    set(oneCCL_DIR "/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL")
The real oneAPI path is not always /opt/intel/oneapi/.
Please use ENV{ONEAPI_ROOT}, which is a mandatory environment variable, in the CMake file.
The same applies to the following script.
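A minimal sketch of this suggestion, assuming the subdirectory layout under ONEAPI_ROOT mirrors the hard-coded path above:

```cmake
# Prefer the oneAPI install location from the environment over a hard-coded path.
if (DEFINED ENV{ONEAPI_ROOT})
    set(oneCCL_DIR "$ENV{ONEAPI_ROOT}/ccl/latest/lib/cmake/oneCCL")
else()
    message(FATAL_ERROR "ONEAPI_ROOT is not set; source the oneAPI environment (setvars.sh) before configuring.")
endif()
```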
    find_library(MPI_LIBRARY mpi HINTS ${MPI_LIBRARY_PATH})
    find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
    # find_package(oneCCL REQUIRED)
    message("-- oneCCL found")
Add handling for the case where oneCCL is not found.
oneCCL is not included in the oneAPI Base Toolkit, so please print a message that guides the user on how to install it.
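A minimal sketch of such a not-found branch (the exact wording of the install hint is an assumption):

```cmake
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
if (NOT ONECCL_LIBRARY)
    # oneCCL ships separately from the oneAPI Base Toolkit, so tell the user what to do.
    message(FATAL_ERROR
        "oneCCL not found. oneCCL is not part of the oneAPI Base Toolkit; "
        "install the Intel oneCCL package and re-run CMake.")
else()
    message(STATUS "oneCCL found: ${ONECCL_LIBRARY}")
endif()
```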
@@ -870,7 +873,12 @@ namespace dpct
            }
            return -1;
        }

        inline int get_rank() { return _rank; }
These new functions have no relationship with DPCT.
It would be better to move them into the ggml-sycl sources.
I recommend reducing the dependence on the DPCT code.
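A hypothetical sketch of keeping the rank/world-size state inside the SYCL backend instead of dpct; the namespace placement, struct, and field names here are assumptions, not the PR's actual code:

```cpp
// ggml-sycl/tp.hpp (hypothetical): tensor-parallel process info owned by the backend.
namespace ggml_sycl {
    struct tp_context {
        int rank       = 0; // index of this process in the tensor-parallel group
        int world_size = 1; // total number of tensor-parallel processes
    };

    inline tp_context & get_tp_context() {
        static tp_context ctx; // filled in by the backend's CCL init, not by dpct
        return ctx;
    }

    inline int get_rank()       { return get_tp_context().rank; }
    inline int get_world_size() { return get_tp_context().world_size; }
}
```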
@@ -1050,6 +1083,7 @@ namespace dpct
                _cpu_device = _devs.size() - 1;
            }
        }
        init_ccl();
Please move this init() function to ggml-sycl/src as well.
enum tensor_parallel_mode {
    TENSOR_NO_CHANGE,
    TENSOR_SPLIT_BY_ROW,
    TENSOR_SPLIT_BY_COLUMN,
    TENSOR_KEEPED_ON_MASTER
};
Changes to the common ggml code should not be made unless absolutely necessary, which is not likely to be the case here. We already have a way to handle this with custom buffer types, like the existing CUDA and SYCL split buffer types. You can extend this model instead by creating a different buffer type for tensors split by column. The "tensors kept on master" case is just the default buffer type.
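A hypothetical sketch of what that could look like in the SYCL backend header, mirroring the existing row-split buffer type; the *_split_col_* name and the exact signatures are assumptions, not an existing API:

```cpp
// ggml-sycl.h (sketch): a second split buffer type for column-wise sharding.
#include "ggml-backend.h"

// existing row-split entry point (signature approximate):
// GGML_API ggml_backend_buffer_type_t ggml_backend_sycl_split_buffer_type(const float * tensor_split);

// proposed: weights split by column across devices
GGML_API ggml_backend_buffer_type_t ggml_backend_sycl_split_col_buffer_type(const float * tensor_split);

// "tensors kept on master" needs no new type: allocate those tensors with the
// default buffer type of the master device.
```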
    find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
    # find_package(oneCCL REQUIRED)
    message("-- oneCCL found")
    set(GGML_EXTRA_LIBS ${GGML_EXTRA_LIBS} ${MPI_LIBRARY_PATH} ${ONECCL_LIBRARY_PATH})
GGML_EXTRA_LIBS was recently split into GGML_EXTRA_LIBS_PUBLIC and GGML_EXTRA_LIBS_PRIVATE, so I think the line above won't work anymore.
Also, why does this variable contain the paths to the lib directories instead of the found MPI/oneCCL libraries?
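A minimal sketch of the suggested change, assuming the libraries should be linked as private dependencies like DNNL earlier in this file:

```cmake
find_library(MPI_LIBRARY    mpi HINTS ${MPI_LIBRARY_PATH})
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
# append the found libraries themselves, not the directories they were searched in
list(APPEND GGML_EXTRA_LIBS_PRIVATE ${MPI_LIBRARY} ${ONECCL_LIBRARY})
```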
@@ -8880,6 +8948,10 @@ static int llama_model_load(const std::string & fname, llama_model & model, llam
    llama_model_loader ml(fname, params.use_mmap, params.check_tensors, params.kv_overrides);

    model.hparams.vocab_only = params.vocab_only;
    if (params.tensor_split == LLAMA_SPLIT_MODE_TENSOR) {
Shouldn't it be params.split_mode instead of params.tensor_split?
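A sketch of the presumed intent (not the PR's actual code): split_mode holds the llama_split_mode enum, while tensor_split is the array of per-device proportions, so the check would read:

```cpp
if (params.split_mode == LLAMA_SPLIT_MODE_TENSOR) {
    // take the tensor-parallel loading path
}
```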
Add tensor parallel support to llama.cpp; this is still draft code for now.