microsoft · Mac0q · Apr 26, 2024 · Mac0q · Apr 26, 2024
diff --git a/model_worker/README.md b/model_worker/README.md
@@ -3,7 +3,7 @@ The lite version of the prompt is not fully optimized. To achieve better results
 ### If you use QWEN as the Agent
 
 1. QWen (Tongyi Qianwen) is a LLM developed by Alibaba. Go to [QWen](https://dashscope.aliyun.com/) and register an account and get the API key. More details can be found [here](https://help.aliyun.com/zh/dashscope/developer-reference/activate-dashscope-and-create-an-api-key?spm=a2c4g.11186623.0.0.7b5749d72j3SYU) (in Chinese).
-2. Install the required packages dashscope or run the `setup.py` with `-qwen` options.
+2. Uncomment the required packages in requirements.txt or install them separately.
 ```bash
 pip install dashscope
 ```
@@ -23,7 +23,7 @@ You can find the model name in the [QWen LLM model list](https://help.aliyun.com
 We provide a short example to show how to configure the ollama in the following, which might change if ollama makes updates.
 
 ```bash title="install ollama and serve LLMs in local" showLineNumbers
-## Install ollama on Linux & WSL2 or run the `setup.py` with `-ollama` options
+## Install ollama on Linux & WSL2.
 curl https://ollama.ai/install.sh | sh
 ## Run the serving
 ollama serve
@@ -45,19 +45,64 @@ When serving LLMs via Ollama, it will by default start a server at `http://local
     "API_MODEL": "YOUR_MODEL"
 }
 ```
-NOTE: `API_BASE` is the URL started in the Ollama LLM server and `API_MODEL` is the model name of Ollama LLM, it should be same as the one you served before. In addition, due to model limitations, you can use lite version of prompt to have a taste on UFO which can be configured in `config_dev.yaml`. Attention to the top ***note***.
+NOTE: `API_BASE` is the URL started in the Ollama LLM server and `API_MODEL` is the model name of Ollama LLM, it should be same as the one you served before. In addition, due to model limitations, you can use lite version of prompt to have a taste on UFO which can be configured in `config_dev.yaml`. Attention to the top ***NOTE***.
 
 #### If you use your custom model as the Agent
 1. Start a server with your model, which will later be used as the API base in `config.yaml`.
 
 2. Add following configuration to `config.yaml`:
 ```json showLineNumbers
 {
-    "API_TYPE": "custom_model" ,
+    "API_TYPE": "Custom" ,
     "API_BASE": "YOUR_ENDPOINT", 
     "API_KEY": "YOUR_KEY",  
     "API_MODEL": "YOUR_MODEL"
 }
 ```
 
-NOTE: You should create a new Python script <custom_model>.py in the ufo/llm folder like the format of the <placeholder>.py, which needs to inherit `BaseService` as the parent class, as well as the `__init__` and `chat_completion` methods. At the same time, you need to add the dynamic import of your file in the `get_service` method of `BaseService`.
+NOTE: You should create a new Python script `custom_model.py` in the ufo/llm folder like the format of the `placeholder.py`, which needs to inherit `BaseService` as the parent class, as well as the `__init__` and `chat_completion` methods. At the same time, you need to add the dynamic import of your file in the `get_service` method of `BaseService`.
+
+####EXAMPLE
+Also, ufo provides the usage of ***LLaVA-1.5*** and ***CogAgent*** as the example.
+
+1.1 Download the essential libs of your custom model.
+
+#### If you use LLaVA-1.5 as the Agent
+
+Please refer to the [LLaVA](https://github.com/haotian-liu/LLaVA) project to download and prepare the LLaVA-1.5 model, for example:
+
+```bash
+git clone https://github.com/haotian-liu/LLaVA.git
+cd LLaVA
+conda create -n llava python=3.10 -y
+conda activate llava
+pip install --upgrade pip  # enable PEP 660 support
+pip install -e .
+```
+
+#### If you use CogAgent as the Agent
+
+Please refer to the [CogVLM](https://github.com/THUDM/CogVLM) project to download and prepare the CogAgent model. Download the sat version of the CogAgent weights `cogagent-chat.zip` from [here](https://huggingface.co/THUDM/CogAgent/tree/main), unzip it.
+
+1.2 Start your custom model. You must customize your model to support the interface of the UFO.
+For simplicity, you have to configure `YOUR_ENDPOINT/chat/completions`.
+
+#### If you use LLaVA as the Agent
+Add the `direct_generate_llava` method and a new post interface `/chat/completions` from the `custom_model_worker.py` to the into the `llava/serve/model_worker.py` And start it with the following command:
+```bash
+python -m llava.serve.llava_model_worker --host YOUR_HOST --port YOUR_POINT --worker YOUR_ENDPOINT --model-path liuhaotian/llava-v1.5-13b --no-register
+```
+
+#### If you use CogAgent as the Agent
+You can modify the model generate from the `basic_demo/cli_demo.py` with a new post interface `/chat/completions` to enjoy it with UFO.
+
+3. Add following configuration to `config.yaml`:
+```json showLineNumbers
+{
+    "API_TYPE": "Custom" ,
+    "API_BASE": "YOUR_ENDPOINT",   
+    "API_MODEL": "YOUR_MODEL"
+}
+```
+
+***Note***: Only LLaVA and CogAgent are supported as open source models for now. If you want to use your own model, remember to modify the dynamic import of your model file in the `get_service` method of `BaseService` in `ufo/llm/base.py`.
diff --git a/model_worker/custom_worker.py b/model_worker/custom_worker.py
@@ -0,0 +1,67 @@
+#Method to generate response from prompt and image using the Llava model
+@torch.inference_mode()
+def direct_generate_llava(self, params):
+    tokenizer, model, image_processor = self.tokenizer, self.model, self.image_processor
+
+    prompt = params["prompt"]
+    image = params.get("image", None)
+    if image is not None:
+        if DEFAULT_IMAGE_TOKEN not in prompt:
+            raise ValueError("Number of image does not match number of <image> tokens in prompt")
+
+        image = load_image_from_base64(image)
+        image = image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0]
+        image = image.to(self.model.device, dtype=self.model.dtype)
+        images = image.unsqueeze(0)
+
+        replace_token = DEFAULT_IMAGE_TOKEN
+        if getattr(self.model.config, 'mm_use_im_start_end', False):
+            replace_token = DEFAULT_IM_START_TOKEN + replace_token + DEFAULT_IM_END_TOKEN
+        prompt = prompt.replace(DEFAULT_IMAGE_TOKEN, replace_token)
+
+        num_image_tokens = prompt.count(replace_token) * model.get_vision_tower().num_patches
+    else:
+        return {"text": "No image provided", "error_code": 0}
+
+    temperature = float(params.get("temperature", 1.0))
+    top_p = float(params.get("top_p", 1.0))
+    max_context_length = getattr(model.config, 'max_position_embeddings', 2048)
+    max_new_tokens = min(int(params.get("max_new_tokens", 256)), 1024)
+    stop_str = params.get("stop", None)
+    do_sample = True if temperature > 0.001 else False
+    input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).to(self.device)
+    keywords = [stop_str]
+    max_new_tokens = min(max_new_tokens, max_context_length - input_ids.shape[-1] - num_image_tokens)
+
+    input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).to(self.device)
+
+    input_seq_len = input_ids.shape[1]
+
+    generation_output = self.model.generate(
+        inputs=input_ids, 
+        do_sample=do_sample,
+        temperature=temperature,
+        top_p=top_p,
+        max_new_tokens=max_new_tokens,
+        images=images, 
+        use_cache=True,
+    )
+
+    generation_output = generation_output[0, input_seq_len:]
+    decoded = tokenizer.decode(generation_output, skip_special_tokens=True)
+
+    response = {"text": decoded}
+    print("response", response)
+    return response
+
+
+# The API is included in llava and cogagent installations. If you customize your model, you can install fastapi via pip or uncomment the library in the requirements.
+# import FastAPI
+# app = FastAPI()
+
+#For llava
+@app.post("/chat/completions")
+async def generate_llava(request: Request):
+    params = await request.json()
+    response_data = worker.direct_generate_llava(params)
+    return response_data
diff --git a/requirements.txt b/requirements.txt
@@ -4,7 +4,7 @@ langchain==0.1.11
 langchain_community==0.0.27
 msal==1.25.0
 openai==1.13.3
-Pillow==10.2.0
+Pillow==10.3.0
 pywin32==306
 pywinauto==0.6.8
 PyYAML==6.0.1

diff --git a/ufo/config/config.yaml.template b/ufo/config/config.yaml.template
@@ -15,6 +15,12 @@ HOST_AGENT: {
   # API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
   # API_MODEL: "YOUR_MODEL",  # The only OpenAI model by now that accepts visual input
   # API_DEPLOYMENT_ID: "gpt-4-visual-preview", # The deployment id for the AOAI API
+
+  ### Comment above and uncomment these according to your need if using "Qwen", "Ollama" or "Custom".
+  # API_TYPE: "Custom", 
+  # API_BASE: "YOUR_ENDPOINT", 
+  # API_KEY: "YOUR_KEY", 
+  # API_MODEL: "YOUR_MODEL",
 
   ### For Azure_AD
   # AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
@@ -39,6 +45,12 @@ APP_AGENT: {
   # API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
   # API_MODEL: "YOUR_MODEL",  # The only OpenAI model by now that accepts visual input
   # API_DEPLOYMENT_ID: "gpt-4-visual-preview", # The deployment id for the AOAI API
+
+  ### Comment above and uncomment these according to your need if using "Qwen", "Ollama" or "Custom".
+  # API_TYPE: "Custom", 
+  # API_BASE: "YOUR_ENDPOINT", 
+  # API_KEY: "YOUR_KEY", 
+  # API_MODEL: "YOUR_MODEL",
 
   ### For Azure_AD
   # AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
@@ -63,6 +75,12 @@ BACKUP_AGENT: {
   # API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
   # API_MODEL: "YOUR_MODEL",  # The only OpenAI model by now that accepts visual input
   # API_DEPLOYMENT_ID: "gpt-4-visual-preview", # The deployment id for the AOAI API
+
+  ### Comment above and uncomment these according to your need if using "Qwen", "Ollama" or "Custom".
+  # API_TYPE: "Custom", 
+  # API_BASE: "YOUR_ENDPOINT", 
+  # API_KEY: "YOUR_KEY", 
+  # API_MODEL: "YOUR_MODEL",
 
   ### For Azure_AD
   # AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model

diff --git a/ufo/llm/base.py b/ufo/llm/base.py
@@ -14,22 +14,47 @@ def chat_completion(self, *args, **kwargs):
         pass
 
     @staticmethod
-    def get_service(name):
+    def get_service(name, model_name=None):
+        """
+        Get the service based on the given name and custom model.
+        Args:
+            name (str): The name of the service.
+            model_name (str, optional): The model name.
+        Returns:
+            object: The service object.
+        Raises:
+            ValueError: If the given service name or model name is not supported.
+        """
         service_map = {
                 'openai': 'OpenAIService',
                 'aoai': 'OpenAIService',
                 'azure_ad': 'OpenAIService',
                 'qwen': 'QwenService',
                 'ollama': 'OllamaService',
                 'placeholder': 'PlaceHolderService',
+                'custom': 'CustomService',
                 }
+        custom_service_map = {
+            'llava': 'LlavaService',
+            'cogagent': 'CogAgentService',
+        }
         service_name = service_map.get(name, None)
         if service_name:
             if name in ['aoai', 'azure_ad']:
                 module = import_module('.openai', package='ufo.llm')
+            elif service_name == 'CustomService':
+                custom_model = 'llava' if 'llava' in model_name else model_name
+                custom_service_name = custom_service_map.get('llava' if 'llava' in custom_model else custom_model, None)
+                if custom_service_name:
+                    module = import_module('.'+custom_model, package='ufo.llm')
+                    service_name = custom_service_name
+                else:
+                    raise ValueError(f'Custom model {custom_model} not supported')
             else:
                 module = import_module('.'+name.lower(), package='ufo.llm')
-        return getattr(module, service_name)
+            return getattr(module, service_name)
+        else:
+            raise ValueError(f'Model {name} not supported')
 
     def get_cost_estimator(self, api_type, model, prices, prompt_tokens, completion_tokens) -> float:
         """

diff --git a/ufo/llm/cogagent.py b/ufo/llm/cogagent.py
@@ -0,0 +1,81 @@
+import time
+from typing import Any, Optional
+
+import requests
+
+from ufo.utils import print_with_color
+from .base import BaseService
+
+
+class CogAgentService(BaseService):
+    def __init__(self, config, agent_type: str):
+        self.config_llm = config[agent_type]
+        self.config = config
+        self.max_retry = self.config["MAX_RETRY"]
+        self.timeout = self.config["TIMEOUT"]
+        self.max_tokens = 2048 #default max tokens for cogagent for now
+
+    def chat_completion(
+        self,
+        messages,
+        n,
+        temperature: Optional[float] = None,
+        max_tokens: Optional[int] = None,
+        top_p: Optional[float] = None,
+        **kwargs: Any,
+    ):
+        """
+        Generate chat completions based on given messages.
+        Args:
+            messages (list): A list of messages.
+            n (int): The number of completions to generate.
+            temperature (float, optional): The temperature for sampling. Defaults to None.
+            max_tokens (int, optional): The maximum number of tokens in the completion. Defaults to None.
+            top_p (float, optional): The cumulative probability for top-p sampling. Defaults to None.
+            **kwargs: Additional keyword arguments.
+        Returns:
+            tuple: A tuple containing the generated texts and None.
+        """
+
+        temperature = temperature if temperature is not None else self.config["TEMPERATURE"]
+        max_tokens = max_tokens if max_tokens is not None else self.config["MAX_TOKENS"]
+        top_p = top_p if top_p is not None else self.config["TOP_P"]
+
+        texts = []
+        for i in range(n):
+            image_base64 = None
+            if self.config_llm["VISUAL_MODE"]:
+                image_base64 = messages[1]['content'][-2]['image_url']\
+                    ['url'].split('base64,')[1]
+            prompt = messages[0]['content'] + messages[1]['content'][-1]['text']
+
+            payload = {
+                'model': self.config_llm['API_MODEL'],
+                'prompt': prompt,
+                'temperature': temperature,
+                'top_p': top_p,
+                'max_new_tokens': self.max_tokens,
+                "image":image_base64
+            }
+
+            for _ in range(self.max_retry):
+                try:
+                    response = requests.post(self.config_llm['API_BASE']+"/chat/completions", json=payload)
+                    if response.status_code == 200:
+                        response = response.json()
+                        text = response["text"]
+                        texts.append(text)
+                        break
+                    else:
+                        raise Exception(
+                    f"Failed to get completion with error code {response.status_code}: {response.text}",
+                )
+                except Exception as e:
+                    print_with_color(f"Error making API request: {e}", "red")
+                    try:
+                        print_with_color(response, "red")
+                    except:
+                        _
+                    time.sleep(3)
+                    continue
+        return texts, None