
[Draft] Tensor Parallel support to llama.cpp #9648

Open · wants to merge 2 commits into base: master

Conversation

@ClarkChin08 (Contributor)

  • I have read the contributing guidelines
  • Self-reported review complexity:
    • Low
    • [x] Medium
    • High

Add tensor parallel support to llama.cpp; this is still draft code for now.

@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL, a GPU programming language) labels on Sep 26, 2024.
@ClarkChin08 (Contributor, Author)

Refer to #9086 for the detailed design.

@NeoZhangJianyu (Collaborator) commented on Sep 26, 2024:

@ClarkChin08
It's great to see this feature implemented.

Could you update the guide/docs to explain how to use it:

  1. How to enable it.
  2. What the benefits are.
  3. Which cases this feature is suited for.
  4. Update the installation steps for the dependent packages (oneCCL, MPI) in oneAPI.

Thank you!

@@ -566,6 +566,17 @@ if (GGML_SYCL)
list(APPEND GGML_EXTRA_LIBS_PRIVATE DNNL::dnnl)
endif()

set(oneCCL_DIR "/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL")
@NeoZhangJianyu (Collaborator) commented on Sep 26, 2024:

The real oneAPI path is not always /opt/intel/oneapi/. Please use $ENV{ONEAPI_ROOT}, which is a mandatory environment variable, in the CMake file.

The same applies to the script that follows.
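A minimal sketch of the suggested change for the line above, assuming ONEAPI_ROOT has been set (for example by sourcing the oneAPI setvars script); the fatal-error fallback is an illustrative choice, not part of the PR:

if (NOT DEFINED ENV{ONEAPI_ROOT})
    message(FATAL_ERROR "ONEAPI_ROOT is not set; please source the oneAPI setvars script first.")
endif()
# Derive the oneCCL CMake config path from the environment instead of a hard-coded prefix.
set(oneCCL_DIR "$ENV{ONEAPI_ROOT}/ccl/latest/lib/cmake/oneCCL")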

find_library(MPI_LIBRARY mpi HINTS ${MPI_LIBRARY_PATH})
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
# find_package(oneCCL REQUIRED)
message("-- oneCCL found")
A Collaborator commented:

Add handling for the case where oneCCL is not found.

oneCCL is not included in the oneAPI Base Toolkit, so please print a message that guides the user on how to install it.
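A minimal sketch of what that not-found handling could look like (the exact message wording and the install URL are suggestions, not part of the PR):

find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
if (NOT ONECCL_LIBRARY)
    # oneCCL ships separately from the oneAPI Base Toolkit, so point the user at it.
    message(FATAL_ERROR
        "oneCCL not found. It is not part of the oneAPI Base Toolkit; "
        "install it separately, e.g. from https://github.com/oneapi-src/oneCCL.")
else()
    message(STATUS "oneCCL found: ${ONECCL_LIBRARY}")
endif()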

@@ -870,7 +873,12 @@ namespace dpct
}
return -1;
}

inline int get_rank() { return _rank; }
A Collaborator commented:

These new functions have no relationship with DPCT. It would be better to move them to ggml-sycl/src; we recommend reducing the dependence on the DPCT code.

@@ -1050,6 +1083,7 @@ namespace dpct
_cpu_device = _devs.size() - 1;
}
}
init_ccl();
@NeoZhangJianyu (Collaborator) commented on Sep 26, 2024:

Move this init() function to ggml-sycl/src.

Comment on lines +598 to +603
enum tensor_parallel_mode {
TENSOR_NO_CHANGE,
TENSOR_SPLIT_BY_ROW,
TENSOR_SPLIT_BY_COLUMN,
TENSOR_KEEPED_ON_MASTER
};
A Collaborator commented:

Changes to the common ggml code should not be made unless absolutely necessary, which is not likely to be the case here. We already have a way to handle this with custom buffer types like the existing CUDA and SYCL split buffer types. You can extend this model instead by creating a different buffer type for tensors split by column. The "tensors kept on master" is just the default buffer type.
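To make the suggestion concrete, here is a hedged sketch of how the three modes could map onto buffer types instead of a new enum in common ggml code. ggml_backend_sycl_buffer_type() and ggml_backend_sycl_split_buffer_type() exist today, while ggml_backend_sycl_split_buffer_type_col() is a hypothetical addition for the column case:

#include "ggml-backend.h"
#include "ggml-sycl.h"

// Sketch only: choose a buffer type per tensor instead of tagging tensors
// with a tensor_parallel_mode enum in the common ggml code.
static ggml_backend_buffer_type_t pick_buft(bool split_by_row, bool split_by_col,
                                            const float * tensor_split) {
    if (split_by_row) {
        // existing SYCL split buffer type (tensors split by row across devices)
        return ggml_backend_sycl_split_buffer_type(tensor_split);
    }
    if (split_by_col) {
        // hypothetical column-split buffer type this PR could add
        return ggml_backend_sycl_split_buffer_type_col(tensor_split);
    }
    // "kept on master" is simply the default buffer type on device 0
    return ggml_backend_sycl_buffer_type(0);
}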

find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
# find_package(oneCCL REQUIRED)
message("-- oneCCL found")
set(GGML_EXTRA_LIBS ${GGML_EXTRA_LIBS} ${MPI_LIBRARY_PATH} ${ONECCL_LIBRARY_PATH})
A Collaborator commented:

GGML_EXTRA_LIBS was recently split into GGML_EXTRA_LIBS_PUBLIC and GGML_EXTRA_LIBS_PRIVATE, so I think the line above won't work anymore.
Also, why does this variable contain the paths to the lib directories instead of the found mpi/ccl libraries?
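A sketch of the likely intended fix, appending the libraries that find_library located (rather than their search paths) to the private list introduced by that split:

# Link the found MPI and oneCCL libraries privately, matching the new convention.
list(APPEND GGML_EXTRA_LIBS_PRIVATE ${MPI_LIBRARY} ${ONECCL_LIBRARY})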

@@ -8880,6 +8948,10 @@ static int llama_model_load(const std::string & fname, llama_model & model, llam
llama_model_loader ml(fname, params.use_mmap, params.check_tensors, params.kv_overrides);

model.hparams.vocab_only = params.vocab_only;
if (params.tensor_split == LLAMA_SPLIT_MODE_TENSOR) {
A Collaborator commented:

Shouldn't it be params.split_mode instead of params.tensor_split?
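Assuming the reviewer is right that the split-mode enum (not the tensor_split float pointer) should be tested, the condition would presumably read:

// compare the split-mode enum, not the tensor_split pointer
if (params.split_mode == LLAMA_SPLIT_MODE_TENSOR) {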
