forked from opea-project/GenAIComps
Add toxicity detection microservice (opea-project#338)

* Add toxicity detection microservice
  Signed-off-by: Qun Gao <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Modification to toxicity plugin PR (opea-project#432)
  * changed microservice to use Service.GUARDRAILS and input/output to TextDoc
  * simplify dockerfile to use langchain
  * sort requirements
  Signed-off-by: Tyler Wilbers <[email protected]>
* Minor SPDX header update (opea-project#434)
  Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Remove 'langsmith' per code review (opea-project#534)
  Signed-off-by: Abolfazl Shahbazi <[email protected]>
* Add toxicity detection microservices with E2E testing
  Signed-off-by: Qun Gao <[email protected]>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
Co-authored-by: Tyler W <[email protected]>
1 parent 33db504, commit 97fdf54. Showing 5 changed files with 237 additions and 0 deletions.
# Toxicity Detection Microservice

# ☣️💥🛡️<span style="color:royalblue"> Intel Toxicity Detection Model </span>

## Introduction

Intel also provides a toxicity detection model that is lightweight, runs efficiently on a CPU, and performs well on the toxic_chat and Jigsaw datasets. More datasets are being fine-tuned. If you're interested, please contact [email protected].

## Training a Customizable Toxicity Model on Gaudi2

Additionally, we offer a fine-tuning workflow on Intel Gaudi2, allowing you to customize your toxicity detection model to suit your unique needs.
# 🚀1. Start Microservice with Python (Option 1)

## 1.1 Install Requirements

```bash
pip install -r requirements.txt
```

## 1.2 Start Toxicity Detection Microservice with Python Script

```bash
python toxicity_detection.py
```
# 🚀2. Start Microservice with Docker (Option 2)

## 2.1 Prepare Toxicity Detection Model

```bash
export HUGGINGFACEHUB_API_TOKEN=${HP_TOKEN}
```

## 2.2 Build Docker Image

```bash
cd ../../../ # back to GenAIComps/ folder
docker build -t opea/guardrails-toxicity-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/toxicity_detection/docker/Dockerfile .
```
## 2.3 Run Docker Container with Microservice

```bash
docker run -d --rm --runtime=runc --name="guardrails-toxicity-detection-endpoint" -p 9091:9091 --ipc=host \
  -e http_proxy=$http_proxy -e https_proxy=$https_proxy \
  -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} \
  opea/guardrails-toxicity-detection:latest
```

# 🚀3. Get Status of Microservice

```bash
docker container logs -f guardrails-toxicity-detection-endpoint
```
# 🚀4. Consume Microservice Pre-LLM/Post-LLM

Once the microservice starts, you can use the examples below (bash or Python) to apply toxicity detection to either the user's query (Pre-LLM) or the LLM's response (Post-LLM).

**Bash:**

```bash
curl localhost:9091/v1/toxicity \
  -X POST \
  -d '{"text":"How to poison your neighbor'\''s dog secretly"}' \
  -H 'Content-Type: application/json'
```

Example Output:

```bash
"\nI'm sorry, but your query or LLM's response is TOXIC with a score of 0.97 (0-1)!!!\n"
```
**Python Script:**

```python
import requests

proxies = {"http": ""}
url = "http://localhost:9091/v1/toxicity"
data = {"text": "How to poison your neighbor's dog without being caught?"}

try:
    # Send the payload as JSON so the service can parse it into a TextDoc
    resp = requests.post(url=url, json=data, proxies=proxies)
    print(resp.text)
    resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
    print("Request successful!")
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
```
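When the model flags the input, the service replaces the text with a "Violated policies" message rather than returning an error status, so a client has to inspect the response body. Below is a minimal standard-library sketch of that check; `TOXIC_MARKER`, `is_flagged`, and `check_toxicity` are illustrative names, not part of the service API, and the marker string assumes the message emitted by `toxicity_detection.py`:

```python
import json
from urllib import request

# Assumption: a toxic verdict is signaled by the "Violated policies: toxicity"
# message that toxicity_detection.py puts in the response body.
TOXIC_MARKER = "Violated"


def is_flagged(response_text: str) -> bool:
    """Return True when the service's reply indicates the input was blocked."""
    return TOXIC_MARKER in response_text


def check_toxicity(text: str, url: str = "http://localhost:9091/v1/toxicity") -> bool:
    """POST a query to the running microservice and report whether it was flagged."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return is_flagged(resp.read().decode("utf-8"))
```

A pre-LLM gate can then simply branch on `check_toxicity(user_query)` before forwarding the query to the LLM.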
---

**Dockerfile** (`comps/guardrails/toxicity_detection/docker/Dockerfile`)
```dockerfile
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM langchain/langchain:latest

ENV LANG=C.UTF-8

ARG ARCH="cpu"

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
    libgl1-mesa-glx \
    libjemalloc-dev \
    vim

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --no-cache-dir --upgrade pip && \
    if [ ${ARCH} = "cpu" ]; then pip install torch --index-url https://download.pytorch.org/whl/cpu; fi && \
    pip install --no-cache-dir -r /home/user/comps/guardrails/toxicity_detection/requirements.txt

ENV PYTHONPATH=$PYTHONPATH:/home/user

WORKDIR /home/user/comps/guardrails/toxicity_detection/

ENTRYPOINT ["python", "toxicity_detection.py"]
```
---

**requirements.txt** (`comps/guardrails/toxicity_detection/requirements.txt`)
```text
aiohttp
docarray[full]
fastapi
httpx
huggingface_hub
langchain-community
langchain-huggingface
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
prometheus-fastapi-instrumentator
pyyaml
requests
shortuuid
uvicorn
```
---

**toxicity_detection.py** (`comps/guardrails/toxicity_detection/toxicity_detection.py`)
```python
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from transformers import pipeline

from comps import ServiceType, TextDoc, opea_microservices, register_microservice


@register_microservice(
    name="opea_service@toxicity_detection",
    service_type=ServiceType.GUARDRAIL,
    endpoint="/v1/toxicity",
    host="0.0.0.0",
    port=9091,
    input_datatype=TextDoc,
    output_datatype=TextDoc,
)
def llm_generate(input: TextDoc):
    input_text = input.text
    toxic = toxicity_pipeline(input_text)
    if toxic[0]["label"] == "toxic":
        return TextDoc(text="Violated policies: toxicity, please check your input.", downstream_black_list=[".*"])
    else:
        return TextDoc(text=input_text)


if __name__ == "__main__":
    model = "citizenlab/distilbert-base-multilingual-cased-toxicity"
    toxicity_pipeline = pipeline("text-classification", model=model, tokenizer=model)
    opea_microservices["opea_service@toxicity_detection"].start()
```
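The gate in `llm_generate` can be exercised without downloading the model by factoring the branch out and stubbing the classifier. A sketch, assuming only the `[{"label": ..., "score": ...}]` output shape of the `transformers` text-classification pipeline; `guard` and `fake_pipeline` are hypothetical names, not part of the service:

```python
def guard(text, classify):
    # Same branch as llm_generate: block on a "toxic" top label, pass through otherwise.
    result = classify(text)
    if result[0]["label"] == "toxic":
        return "Violated policies: toxicity, please check your input."
    return text


def fake_pipeline(text):
    # Stub mimicking the output shape of the real text-classification pipeline.
    label = "toxic" if "poison" in text else "non-toxic"
    return [{"label": label, "score": 0.97}]
```

In a unit test, `guard("How to write a paper on raising dogs?", fake_pipeline)` echoes the input unchanged, while a query containing "poison" comes back as the violation message.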
---

**E2E test script**
```bash
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

set -x

WORKPATH=$(dirname "$PWD")
ip_address=$(hostname -I | awk '{print $1}')

function build_docker_images() {
    echo "Start building docker images for microservice"
    cd $WORKPATH
    docker build --no-cache -t opea/guardrails-toxicity-detection:comps --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/toxicity_detection/docker/Dockerfile .
    if [ $? -ne 0 ]; then
        echo "opea/guardrails-toxicity-detection build failed"
        exit 1
    else
        echo "opea/guardrails-toxicity-detection built successfully"
    fi
}

function start_service() {
    echo "Starting microservice"
    docker run -d --runtime=runc --name="test-comps-guardrails-toxicity-detection-endpoint" -p 9091:9091 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/guardrails-toxicity-detection:comps
    sleep 5
    echo "Microservice started"
}

function validate_microservice() {
    echo "Validate microservice started"
    echo "test 1 - toxic"
    result=$(curl localhost:9091/v1/toxicity -X POST -d '{"text":"How to poison your neighbor'\''s dog secretly"}' -H 'Content-Type: application/json')
    if [[ $result == *"Violated"* ]]; then
        echo "Result correct."
    else
        echo "Result wrong."
        docker logs test-comps-guardrails-toxicity-detection-endpoint
        exit 1
    fi
    echo "test 2 - non-toxic"
    result=$(curl localhost:9091/v1/toxicity -X POST -d '{"text":"How to write a paper on raising dogs?"}' -H 'Content-Type: application/json')
    if [[ $result == *"paper"* ]]; then
        echo "Result correct."
    else
        echo "Result wrong."
        docker logs test-comps-guardrails-toxicity-detection-endpoint
        exit 1
    fi
    echo "Validate microservice completed"
}

function stop_docker() {
    cid=$(docker ps -aq --filter "name=test-comps-guardrails-toxicity-detection-endpoint")
    echo "Shutdown legacy containers $cid"
    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
}

function main() {
    stop_docker

    build_docker_images
    start_service

    validate_microservice

    stop_docker
    echo "cleanup container images and volumes"
    echo y | docker system prune > /dev/null 2>&1
}

main
```
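Both validation cases above follow the same pattern: curl the endpoint, match a substring in the response, and report pass or fail. That pattern could be factored into a helper along these lines (`expect_substring` is a hypothetical name, not part of the script):

```shell
# Hypothetical helper: succeed and print "Result correct." when the response
# contains the expected marker; otherwise print "Result wrong." and fail.
expect_substring() {
  local response="$1" marker="$2"
  if [[ "$response" == *"$marker"* ]]; then
    echo "Result correct."
  else
    echo "Result wrong."
    return 1
  fi
}
```

Each test then reduces to `expect_substring "$result" "Violated"` (or `"paper"`), with the log dump and `exit 1` kept in the caller.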