[Draft] Tensor Parallel support to llama.cpp #9648
base: master
Conversation
Signed-off-by: Chen Xi <[email protected]>
Refer to issue #9086 for the detailed design.
@ClarkChin08 Is it possible to update the guide/docs to explain how to use this feature?
Thank you!
@@ -566,6 +566,17 @@ if (GGML_SYCL)
        list(APPEND GGML_EXTRA_LIBS_PRIVATE DNNL::dnnl)
    endif()

    set(oneCCL_DIR "/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL")
The real oneAPI path is not always /opt/intel/oneapi/.
Please use ENV{ONEAPI_ROOT}, which is a mandatory environment variable, in the CMake file.
The same applies to the following script.
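A minimal sketch of this suggestion, assuming the subdirectory layout under ONEAPI_ROOT mirrors the hard-coded path above:

```cmake
# Prefer the oneAPI install location from the environment over a hard-coded path.
if (DEFINED ENV{ONEAPI_ROOT})
    set(oneCCL_DIR "$ENV{ONEAPI_ROOT}/ccl/latest/lib/cmake/oneCCL")
else()
    message(FATAL_ERROR "ONEAPI_ROOT is not set; source the oneAPI environment (setvars.sh) before configuring.")
endif()
```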
    find_library(MPI_LIBRARY mpi HINTS ${MPI_LIBRARY_PATH})
    find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
    # find_package(oneCCL REQUIRED)
    message("-- oneCCL found")
Add handling for the case where oneCCL is not found.
oneCCL is not included in the oneAPI Base Toolkit, so please print a message that guides the user on how to install it.
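A minimal sketch of such a not-found branch (the exact wording of the install hint is an assumption):

```cmake
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
if (NOT ONECCL_LIBRARY)
    # oneCCL ships separately from the oneAPI Base Toolkit, so tell the user what to do.
    message(FATAL_ERROR
        "oneCCL not found. oneCCL is not part of the oneAPI Base Toolkit; "
        "install the Intel oneCCL package and re-run CMake.")
else()
    message(STATUS "oneCCL found: ${ONECCL_LIBRARY}")
endif()
```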
@@ -870,7 +873,12 @@ namespace dpct
            }
            return -1;
        }

        inline int get_rank() { return _rank; }
These new functions have no relationship with DPCT.
It would be better to move them into the ggml-sycl sources.
I recommend reducing the dependence on the DPCT code.
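A hypothetical sketch of keeping the rank/world-size state inside the SYCL backend instead of dpct; the namespace placement, struct, and field names here are assumptions, not the PR's actual code:

```cpp
// ggml-sycl/tp.hpp (hypothetical): tensor-parallel process info owned by the backend.
namespace ggml_sycl {
    struct tp_context {
        int rank       = 0; // index of this process in the tensor-parallel group
        int world_size = 1; // total number of tensor-parallel processes
    };

    inline tp_context & get_tp_context() {
        static tp_context ctx; // filled in by the backend's CCL init, not by dpct
        return ctx;
    }

    inline int get_rank()       { return get_tp_context().rank; }
    inline int get_world_size() { return get_tp_context().world_size; }
}
```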
@@ -1050,6 +1083,7 @@ namespace dpct
                _cpu_device = _devs.size() - 1;
            }
        }
        init_ccl();
Please move this init() function to ggml-sycl/src as well.
enum tensor_parallel_mode {
    TENSOR_NO_CHANGE,
    TENSOR_SPLIT_BY_ROW,
    TENSOR_SPLIT_BY_COLUMN,
    TENSOR_KEEPED_ON_MASTER
};
Changes to the common ggml code should not be made unless absolutely necessary, which is not likely to be the case here. We already have a way to handle this with custom buffer types, like the existing CUDA and SYCL split buffer types. You can extend this model instead by creating a different buffer type for tensors split by column. The "tensors kept on master" case is just the default buffer type.
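A hypothetical sketch of what that could look like in the SYCL backend header, mirroring the existing row-split buffer type; the *_split_col_* name and the exact signatures are assumptions, not an existing API:

```cpp
// ggml-sycl.h (sketch): a second split buffer type for column-wise sharding.
#include "ggml-backend.h"

// existing row-split entry point (signature approximate):
// GGML_API ggml_backend_buffer_type_t ggml_backend_sycl_split_buffer_type(const float * tensor_split);

// proposed: weights split by column across devices
GGML_API ggml_backend_buffer_type_t ggml_backend_sycl_split_col_buffer_type(const float * tensor_split);

// "tensors kept on master" needs no new type: allocate those tensors with the
// default buffer type of the master device.
```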
    find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
    # find_package(oneCCL REQUIRED)
    message("-- oneCCL found")
    set(GGML_EXTRA_LIBS ${GGML_EXTRA_LIBS} ${MPI_LIBRARY_PATH} ${ONECCL_LIBRARY_PATH})
GGML_EXTRA_LIBS was recently split into GGML_EXTRA_LIBS_PUBLIC and GGML_EXTRA_LIBS_PRIVATE, so I think the line above won't work anymore.
Also, why does this variable contain the paths to the lib directories instead of the found MPI/oneCCL libraries?
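A minimal sketch of the suggested change, assuming the libraries should be linked as private dependencies like DNNL earlier in this file:

```cmake
find_library(MPI_LIBRARY    mpi HINTS ${MPI_LIBRARY_PATH})
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
# append the found libraries themselves, not the directories they were searched in
list(APPEND GGML_EXTRA_LIBS_PRIVATE ${MPI_LIBRARY} ${ONECCL_LIBRARY})
```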
@@ -8880,6 +8948,10 @@ static int llama_model_load(const std::string & fname, llama_model & model, llam
    llama_model_loader ml(fname, params.use_mmap, params.check_tensors, params.kv_overrides);

    model.hparams.vocab_only = params.vocab_only;
    if (params.tensor_split == LLAMA_SPLIT_MODE_TENSOR) {
Shouldn't it be params.split_mode instead of params.tensor_split?
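A sketch of the presumed intent (not the PR's actual code): split_mode holds the llama_split_mode enum, while tensor_split is the array of per-device proportions, so the check would read:

```cpp
if (params.split_mode == LLAMA_SPLIT_MODE_TENSOR) {
    // take the tensor-parallel loading path
}
```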
Add tensor parallel support to llama.cpp; this is still draft code for now.