[ENH] Add CRD backed SegmentDirectory. (#1207)
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
   - Move from docker-compose to k8s manifests. These are lightweight for now and will be replaced with Helm charts.
   - Modify the workflow action for distributed mode to use minikube.
   - Add tenacity for retry/backoff logic.
   - Move segment_directory to /distributed, since it is an interface specific to distributed segment types.
   - Remove the xfail from the workflow and only run passing tests.
 - New functionality
   - Adds a custom-resource-backed memberlist implementation that fetches the memberlist and watches it for changes; the watch runs in a separate thread.
   - Plumbs this memberlist into the rendezvous hashing segment directory. For now this is still hardcoded to return the segment-server service. (Sketches of both pieces follow this list.)

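For the rendezvous hashing side, here is a minimal illustration of the technique in general — not Chroma's actual `RendezvousHashSegmentDirectory` — which picks the owning member for a key by hashing every (member, key) pair and taking the maximum; the member names in the usage line are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_member(key: str, members: Sequence[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: each member gets a score
    for this key and the highest score wins. Adding or removing a member
    only remaps the keys that member owns."""
    if not members:
        raise ValueError("memberlist is empty")

    def score(member: str) -> int:
        digest = hashlib.sha256(f"{member}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    return max(members, key=score)


# Example with hypothetical member names:
print(assign_member("segment-id-123", ["segment-server-0", "segment-server-1"]))
```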
## Test plan
*How are these changes tested?*
Tests were added that:
1. Test fetch
2. Test streaming update
3. Test killing the watcher
You can run `bin/cluster-test.sh chromadb/test/segment/distributed/test_memberlist_provider.py` to run them yourself (or start minikube, apply the manifests, and then run `pytest chromadb/test/segment/distributed/test_memberlist_provider.py`).

## Documentation Changes
None required. I added a WARNING.md in k8s/ to make it clear that these manifests are not intended for general use.
HammadB authored Oct 10, 2023
1 parent e81cc9f commit 764ffe2
Showing 24 changed files with 1,113 additions and 435 deletions.
18 changes: 11 additions & 7 deletions .github/workflows/chroma-cluster-test.yml
@@ -16,12 +16,8 @@ jobs:
matrix:
python: ['3.7']
platform: [ubuntu-latest]
testfile: ["--ignore-glob 'chromadb/test/property/*' --ignore='chromadb/test/test_cli.py'",
"chromadb/test/property/test_add.py",
"chromadb/test/property/test_collections.py",
"chromadb/test/property/test_embeddings.py",
"chromadb/test/property/test_filtering.py",
"chromadb/test/property/test_persist.py"]
testfile: ["chromadb/test/ingest/test_producer_consumer.py",
"chromadb/test/segment/distributed/test_memberlist_provider.py",]
runs-on: ${{ matrix.platform }}
steps:
- name: Checkout
@@ -32,6 +28,14 @@ jobs:
python-version: ${{ matrix.python }}
- name: Install test dependencies
run: python -m pip install -r requirements.txt && python -m pip install -r requirements_dev.txt
- name: Start minikube
id: minikube
uses: medyagh/setup-minikube@latest
with:
minikube-version: latest
kubernetes-version: latest
driver: docker
addons: ingress, ingress-dns
start-args: '--profile chroma-test'
- name: Integration Test
run: bin/cluster-test.sh ${{ matrix.testfile }}
continue-on-error: true # Mark the job as successful even if the tests fail for now (Xfail)
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -33,4 +33,4 @@ repos:
hooks:
- id: mypy
args: [--strict, --ignore-missing-imports, --follow-imports=silent, --disable-error-code=type-abstract]
additional_dependencies: ["types-requests", "pydantic", "overrides", "hypothesis", "pytest", "pypika", "numpy", "types-protobuf"]
additional_dependencies: ["types-requests", "pydantic", "overrides", "hypothesis", "pytest", "pypika", "numpy", "types-protobuf", "kubernetes"]
40 changes: 38 additions & 2 deletions bin/cluster-test.sh
@@ -3,14 +3,50 @@
set -e

function cleanup {
docker compose -f docker-compose.cluster.test.yml down --rmi local --volumes
# Restore the previous kube context
kubectl config use-context $PREV_CHROMA_KUBE_CONTEXT
# Kill the tunnel process
kill $TUNNEL_PID
minikube delete -p chroma-test
}

trap cleanup EXIT

docker compose -f docker-compose.cluster.test.yml up -d --wait
# Save the current kube context into a variable
export PREV_CHROMA_KUBE_CONTEXT=$(kubectl config current-context)

# Create a new minikube cluster for the test
minikube start -p chroma-test

# Add the ingress addon to the cluster
minikube addons enable ingress -p chroma-test
minikube addons enable ingress-dns -p chroma-test

# Setup docker to build inside the minikube cluster and build the image
eval $(minikube -p chroma-test docker-env)
docker build -t server:latest -f Dockerfile .

# Apply the kubernetes manifests
kubectl apply -f k8s/deployment
kubectl apply -f k8s/crd
kubectl apply -f k8s/cr
kubectl apply -f k8s/test

# Wait for the pods in the chroma namespace to be ready
kubectl wait --namespace chroma --for=condition=Ready pods --all --timeout=300s

# Run mini kube tunnel in the background to expose the service
minikube tunnel -p chroma-test &
TUNNEL_PID=$!

# Wait for the tunnel to be ready. There isn't an easy way to check if the tunnel is ready. So we just wait for 10 seconds
sleep 10

export CHROMA_CLUSTER_TEST_ONLY=1
export CHROMA_SERVER_HOST=$(kubectl get svc server -n chroma -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
export PULSAR_BROKER_URL=$(kubectl get svc pulsar -n chroma -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Chroma Server is running at port $CHROMA_SERVER_HOST"
echo "Pulsar Broker is running at port $PULSAR_BROKER_URL"

echo testing: python -m pytest "$@"
python -m pytest "$@"
15 changes: 9 additions & 6 deletions chromadb/config.py
@@ -69,7 +69,8 @@
"chromadb.ingest.Consumer": "chroma_consumer_impl",
"chromadb.db.system.SysDB": "chroma_sysdb_impl",
"chromadb.segment.SegmentManager": "chroma_segment_manager_impl",
"chromadb.segment.SegmentDirectory": "chroma_segment_directory_impl",
"chromadb.segment.distributed.SegmentDirectory": "chroma_segment_directory_impl",
"chromadb.segment.distributed.MemberlistProvider": "chroma_memberlist_provider_impl",
}


@@ -89,9 +90,11 @@ class Settings(BaseSettings): # type: ignore
chroma_segment_manager_impl: str = (
"chromadb.segment.impl.manager.local.LocalSegmentManager"
)
chroma_segment_directory_impl: str = (
"chromadb.segment.impl.manager.segment_directory.DockerComposeSegmentDirectory"
)

# Distributed architecture specific components
chroma_segment_directory_impl: str = "chromadb.segment.impl.distributed.segment_directory.RendezvousHashSegmentDirectory"
chroma_memberlist_provider_impl: str = "chromadb.segment.impl.distributed.segment_directory.CustomResourceMemberlistProvider"
worker_memberlist_name: str = "worker-memberlist"

tenant_id: str = "default"
topic_namespace: str = "default"
@@ -108,8 +111,8 @@ class Settings(BaseSettings): # type: ignore
chroma_server_cors_allow_origins: List[str] = [] # eg ["http://localhost:3000"]

pulsar_broker_url: Optional[str] = None
pulsar_admin_port: Optional[str] = None
pulsar_broker_port: Optional[str] = None
pulsar_admin_port: Optional[str] = "8080"
pulsar_broker_port: Optional[str] = "6650"

chroma_server_auth_provider: Optional[str] = None

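For reference, the two new component keys and the memberlist name above are ordinary Settings fields, so a deployment can override them programmatically. A minimal sketch, assuming the fields are accepted as keyword arguments (as pydantic BaseSettings fields normally are):

```python
from chromadb.config import Settings

settings = Settings(
    # Defaults shown in the diff above; override to swap implementations.
    chroma_segment_directory_impl=(
        "chromadb.segment.impl.distributed.segment_directory.RendezvousHashSegmentDirectory"
    ),
    chroma_memberlist_provider_impl=(
        "chromadb.segment.impl.distributed.segment_directory.CustomResourceMemberlistProvider"
    ),
    worker_memberlist_name="worker-memberlist",
)
```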
95 changes: 49 additions & 46 deletions chromadb/proto/chroma_pb2.py

Some generated files are not rendered by default.

