add k8s docs for getting started, K8s Manifest and Helm #179
Conversation
some suggested edits
Also, when you add new documents, they need to be linked into the table of contents structure. There's an index.rst file in this folder you can edit to add these two documents.
I'd suggest you add an edit to the index.rst doc in this deploy folder, and replace the existing Kubernetes section with this:
```rst
Kubernetes
**********

.. toctree::
   :maxdepth: 1

   k8s_getting_started
   TGI on Xeon with Helm Charts <k8s_helm>
```

* Xeon & Gaudi with GMC
* Xeon & Gaudi without GMC
LGTM, thanks!
Some of the stuff I see in the docs is just a tutorial on things that already have docs, like TGI/TEI, Helm, and Kubernetes. It feels a lot like we're overexplaining concepts that could be answered by a link to the source docs of another tool, plus a command showing how it's relevant to use with ChatQnA.
For reference, this is the most handholding I would do in the case of deploying TGI:
Configure Model Server

Before we deploy a model, we need to configure the model server with information like what model to use and the maximum number of tokens. We will be using the tgi-on-intel Helm chart. This chart normally serves the model on XPU, but we are going to configure it to use Gaudi2 instead.

First, look at the configuration files in the tgi directory and add/remove any configuration options relevant to your workflow:

```bash
cd tgi
# Create a new configmap for your model server to use
kubectl apply -f cm.yaml
```

Tip: Here is the reference to the Huggingface Launcher Environment Variables and the TGI-Gaudi Environment Variables.
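For what it's worth, the cm.yaml in that flow would presumably be an ordinary ConfigMap of launcher environment variables, roughly like this (the name, model, and values below are illustrative assumptions, not taken from the chart):

```yaml
# Illustrative sketch only: a ConfigMap of TGI launcher settings.
# The name, model, and token limit are assumptions, not chart defaults.
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-server-config
data:
  MODEL_ID: "Intel/neural-chat-7b-v3-3"   # which model to serve
  MAX_TOTAL_TOKENS: "2048"                # cap on prompt + generated tokens
```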
Deploy Model Server

Now that we have configured the model server, we can deploy it to Kubernetes using the provided config.yaml file in the tgi directory. Modify any values like resources or replicas in the config.yaml file to suit your needs. Then, deploy the model server:

```bash
# Encode HF Token for secret.encodedToken
echo -n '<token>' | base64

# Install Chart
git clone https://github.com/intel/ai-containers
helm install model-server -f config.yaml ai-containers/workflows/charts/tgi

# Check the pod status
kubectl get pod
kubectl logs -f <pod-name>
```
Please use a tool like markdownlint to ensure consistent styling.
I've got a script in docs/scripts/checkmd.sh that uses pymarkdown (lint) to scan markdown files, with a bunch of checks disabled. Alas, adding a markdown linter to the CI checks was on my list, but I'm retiring today. :)
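For anyone following along, a typical invocation would be something like this (assuming the pymarkdownlnt package, which provides the pymarkdown CLI):

```bash
# Install the PyMarkdown linter (PyPI package name: pymarkdownlnt)
pip install pymarkdownlnt

# Scan one of the new docs directly...
pymarkdown scan examples/ChatQnA/deploy/k8s_helm.md

# ...or run the repo helper script mentioned above (check it for usage)
bash docs/scripts/checkmd.sh
```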
The initial setup is confusing, as the docs point to setting up k8s in multiple ways, whereas the tested configuration is minikube. We need to document that clearly. Other than that, there are some more clarifications required.
> **Update Dependencies:**
>
> - A script called **./update_dependency.sh** is provided which is used to update chart dependencies, ensuring all nested charts are at their latest versions.
It's not clear whether this script is standard for all Helm charts or unique to OPEA.
Unique to OPEA. Normally one just calls the Helm dep command directly.
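For reference, the standard way would be something like this (the chart directory below is an illustrative local path):

```bash
# Standard Helm commands for chart dependencies
helm dependency list ./helm-charts/chatqna     # show declared deps and their status
helm dependency update ./helm-charts/chatqna   # re-download/refresh them
```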
Then we should use the standard tools instead of creating our own mechanisms.
I'm rather new to Helm, so all of this is very much AFAIK:
- The script is there because OPEA has many of the dependencies locally instead of in a Helm repo
- Normally users would update the repo first, and that would basically do what the script is needed for (refresh the deps to the latest versions, not just the minimum version)
- OPEA Helm charts are moving to a repo, so I think in the future the script will be needed only when one wants to use a local checkout

@poussa ?
> ### Kubernetes Cluster and Development Environment
>
> **Setting Up the Kubernetes Cluster:** Before beginning deployment for the ChatQnA application, ensure that a Kubernetes cluster is ready. For guidance on setting up your Kubernetes cluster, please refer to the comprehensive setup instructions available on the [Opea Project deployment guide](https://opea-project.github.io/latest/deploy/index.html).
We need to provide guidance on the memory/disk requirements for the k8s cluster, otherwise ChatQnA may not run.
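For the tested minikube configuration, that guidance could be as simple as a sized start command; the numbers below are an illustrative sketch, not validated minimums:

```bash
# Illustrative resource sizing for a single-node minikube cluster;
# actual ChatQnA requirements need to be measured and documented.
minikube start --cpus=16 --memory=32g --disk-size=100g
```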
examples/ChatQnA/deploy/k8s_helm.md (outdated)
> Set a new [namespace](#create-and-set-namespace) and switch to it if needed
>
> To enable UI, uncomment the lines `56-62` in `GenAIInfra/helm-charts/chatqna/values.yaml`:
This should be lines 58-62:

```text
56 # If you would like to switch to traditional UI image
57 # Uncomment the following lines
58 # chatqna-ui:
59 #   image:
60 #     repository: "opea/chatqna-ui"
61 #     tag: "1.1"
62 #     containerPort: "5173"
```
Also, just uncommenting may not be sufficient, as that messes up the formatting. There is an additional space that needs to be deleted too.
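For illustration, the block should presumably end up like this after uncommenting (the exact indentation is an assumption based on standard Helm values files):

```yaml
# Sketch of the uncommented section; indentation is assumed
chatqna-ui:
  image:
    repository: "opea/chatqna-ui"
    tag: "1.1"
    containerPort: "5173"
```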
@arun-gupta The lines that need to be uncommented have now changed to 59-63. I'll remove the pointer to the line numbers and just say "uncomment the following."
Could you tell me which space needs to be deleted?
```text
chatqna-retriever-usvc-6695979d67-z5jgx   1/1   Running   0   5m7s
chatqna-tei-769dc796c-gh5vx               1/1   Running   0   5m7s
chatqna-teirerank-54f58c596c-76xqz        1/1   Running   0   5m7s
chatqna-tgi-7b5556d46d-pnzph              1/1   Running   0   5m7s
```
Could not get all pods to run, filed opea-project/GenAIExamples#1202
@arun-gupta I was thinking of adding separate resource requirements for on-prem (bare-metal) servers and cloud instances.
Does that make sense?
examples/ChatQnA/deploy/k8s_helm.md (outdated)
> Set the necessary environment variables to setup the use case
>
> ```bash
> export MODELDIR="/mnt/opea-models" # export MODELDIR="null" if you don't want to cache the model.
> ```
This caused an error:

```text
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    10m                 default-scheduler  Successfully assigned default/chatqna-tei-645548d7f-7crjj to minikube
  Warning  FailedMount  17s (x13 over 10m)  kubelet            MountVolume.SetUp failed for volume "model-volume" : hostPath type check failed: /mnt/opea-models is not a directory
```

Using `export MODELDIR="null"` solved the issue.
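Alternatively, the FailedMount event suggests the hostPath directory simply doesn't exist inside the node; on minikube you could presumably create it first (untested sketch):

```bash
# Create the model cache directory inside the minikube node so the
# hostPath type check passes (assumed fix, untested)
minikube ssh "sudo mkdir -p /mnt/opea-models"
```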
I would not recommend setting global.modelUseHostPath=${MODELDIR} in helm install in a multi-node environment. This option is meant to share pre-downloaded model data files in a single-node environment, for a quick setup to test. To use it in a multi-node environment, you need to go to every K8s worker node and make sure a ${MODELDIR} directory exists and is writable. Just don't set global.modelUseHostPath and leave it at the default value (empty string), which will download the models without sharing.
Another option to share the model files is to set global.modelUsePVC to use a K8s persistent volume to share the model data files. In order to use that, users need to do some preparation work out-of-band:
- Set up the K8s storage class and persistent volumes.
- Create a K8s persistent volume claim which can be shared by multiple pods.
- Pass the created K8s persistent volume claim to global.modelUsePVC (see the sketch below).
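A minimal sketch of that preparation, assuming the cluster already has a storage class supporting shared (ReadWriteMany) volumes; the namespace, names, size, and chart path below are illustrative:

```bash
# Create a PVC that multiple pods can share (names/size illustrative;
# requires a storage class supporting ReadWriteMany, e.g. NFS-backed)
kubectl apply -n chatqna -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: opea-models
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 100Gi
EOF

# Point the chart at the claim (chart path illustrative)
helm install chatqna ./helm-charts/chatqna --set global.modelUsePVC=opea-models
```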
I would not recommend setting it empty either, because not sharing the data will much more easily cause a node to run out of disk space, meaning that k8s evicts pods from it and things in general break.
The linked instruction is fairly OK, but it assumes the user knows something about setting up NFS:
- the security implications of non-authenticated access
- the need to replace the IP address with the correct one for the NFS server in use

Some knowledge of how PVCs work may also be good in case one needs to debug why pods are in `Pending` state (is the problem in the NFS settings, a PVC not in the same namespace as the pod using it, too many PVCs for a given PV, etc.).
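For that kind of debugging, the usual starting points would be something like the following (namespace and names are illustrative):

```bash
# Is the claim Bound or Pending, and why?
kubectl get pvc -n chatqna
kubectl describe pvc opea-models -n chatqna

# Why is the pod Pending?
kubectl describe pod <pod-name> -n chatqna
kubectl get events -n chatqna --sort-by=.lastTimestamp
```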
> | Command | Description |
> |---------|-------------|
> | `kubectl describe pod <pod-name>` | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. |
> | `kubectl delete deployments --all` | Deletes all deployments in the current namespace, which effectively removes all the managed pods and associated resources. |
> | `kubectl get pods -o wide` | Retrieves a detailed list of all pods in the current namespace, including additional information like IP addresses and the nodes they are running on. |
The use of "current" in all of these is wrong. When a namespace is not specified, the operation applies to the "default" namespace, not the "current" one, whatever that is.
=> I think all the examples should include the namespace option.
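That is, something like the following (namespace name is illustrative):

```bash
kubectl describe pod <pod-name> -n chatqna
kubectl delete deployments --all -n chatqna
kubectl get pods -o wide -n chatqna
```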
Moved the "Create and Set Namespace" section above this one to indicate what the "current" namespace means.
> `your_embedding` dimension equal to it.
>
> ```bash
> export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
> ```
Both `export` and the `embedding` var are redundant:

```diff
-export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
+your_embedding=$(python3 -c "import random; print([random.uniform(-1, 1) for _ in range(768)])")
```
@eero-t I believe it's more readable this way for the user, and the source for this is the dev team's readme.
This PR contains the following docs:
- Getting Started for k8s - installation, a basic introduction to k8s, and a section each for Helm and K8s manifests. As more k8s deployment modes are added, corresponding sections will be created in this doc.
- Deploy using Helm charts - a doc that follows the xeon.md template as much as possible to deploy ChatQnA on k8s using Helm.
- Deploy using K8s Manifest - a doc that follows the xeon.md template as much as possible to deploy ChatQnA on k8s using a K8s manifest yaml.