Update up-to-date public information for the avatar chatbot example (#228)

* Fix invalid link

fixed link for LivePortrait

Signed-off-by: Chun Tao <[email protected]>

* remove sharepoint locations

Signed-off-by: Chun Tao <[email protected]>

* update avatar project doc

Signed-off-by: Chun Tao <[email protected]>

* update

Signed-off-by: Chun Tao <[email protected]>

* add YT video link

Signed-off-by: Chun Tao <[email protected]>

* update

Signed-off-by: Chun Tao <[email protected]>

* add idz article link

Signed-off-by: Chun Tao <[email protected]>

---------

Signed-off-by: Chun Tao <[email protected]>
ctao456 authored Oct 29, 2024
1 parent e732569 commit 909657b
Showing 3 changed files with 22 additions and 14 deletions.
36 changes: 22 additions & 14 deletions community/rfcs/24-08-02-OPEA-AIAvatarChatbot.md
@@ -1,17 +1,24 @@
# 24-08-02-OPEA-AIAvatarChatbot

-A RAG-Powered Human-Like AI Avatar Audio Chatbot integrated with OPEA AudioQnA
<!-- The short description of the feature you want to contribute -->
+A Human-Like AI Avatar Audio Chatbot integrated with OPEA AudioQnA

+Code contributions:
+"animation" component: https://github.com/opea-project/GenAIComps/tree/main/comps/animation/wav2lip
+"AvatarChatbot" examples: https://github.com/opea-project/GenAIExamples/tree/main/AvatarChatbot

+Intel Developer Zone Article "Create an AI Avatar Talking Bot with PyTorch* and Open Platform for Enterprise AI (OPEA)": https://www.intel.com/content/www/us/en/developer/articles/technical/ai-avatar-talking-bot-with-pytorch-and-opea.html

+YouTube tech-talk video: https://youtu.be/OjaElyUB8Z0?si=6-IdxwTg0YFMraFl

## Author
<!-- List all contributors of this RFC. -->
[ctao456](https://github.com/ctao456), [alexsin368](https://github.com/alexsin368), [YuningQiu](https://github.com/YuningQiu), [louie-tsai](https://github.com/louie-tsai)

## Status
<!-- Change the PR status to Under Review | Rejected | Accepted. -->
-v0.1 - ASMO Team sharing on Fri 6/28/2024
-[GenAIComps pr #400](https://github.com/opea-project/GenAIComps/pull/400) (Under Review)
-[GenAIExamples pr #523](https://github.com/opea-project/GenAIExamples/pull/523) (Under Review)
+v0.1 - ASMO Team sharing on Thursday 10/24/2024
+* [GenAIComps pr #775](https://github.com/opea-project/GenAIComps/pull/775) | <span style="color: green;">Merged</span>
+* [GenAIExamples pr #923](https://github.com/opea-project/GenAIExamples/pull/923) | <span style="color: green;">Merged</span>

## Objective
<!-- List what problem will this solve? What are the goals and non-goals of this RFC? -->
@@ -39,10 +46,10 @@ The chatbot will:
* Use multimodal retrieval-augmented generation (RAG) to generate more accurate, in-domain responses, in v0.2

New microservices include:
-* animation
+* [animation](https://github.com/opea-project/GenAIComps/tree/main/comps/animation/wav2lip)

New megaservices include:
-* AvatarChatbot
+* [AvatarChatbot](https://github.com/opea-project/GenAIExamples/tree/main/AvatarChatbot)
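
To make the new component concrete, here is a minimal client sketch for the `animation` microservice. The service address, endpoint path, and payload field (`byte_str`) are assumptions for illustration only; the wav2lip component README linked above has the authoritative request/response contract.

```python
# Hypothetical client for the new "animation" microservice (illustration only).
# Assumes the wav2lip service is reachable at the address below and accepts
# base64-encoded speech audio; check the component README for the real API.
import base64
import requests

ANIMATION_URL = "http://localhost:9066/v1/animation"  # placeholder address

def animate(audio_path: str) -> dict:
    """Send speech audio to the animation service and return its JSON reply."""
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {"byte_str": audio_b64}          # illustrative payload shape
    resp = requests.post(ANIMATION_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()                         # e.g., path to the rendered mp4

if __name__ == "__main__":
    print(animate("sample_question.wav"))
```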

## Motivation
<!-- List why this problem is valuable to solve? Whether some related work exists? -->
@@ -60,9 +67,9 @@ Related works include [Nvidia Audio2Face](https://docs.nvidia.com/ace/latest/mod
### Avatar Chatbot design
<!-- Removed PPT slides -->

-![avatar chatbot design](assets/design.png)
+![avatar chatbot design](assets/avatar_design.png)

-Currently, the RAG feature using the `embedding` and `dataprep` microservices is missing in the above design, including uploading relevant documents/weblinks, storing them in the database, and retrieving them for the LLM model. These features will be added in v0.2.
+Currently, the RAG feature using the `embedding`, `retrieval`, `reranking` and `dataprep` microservices and VectorDB is missing in the above design, including uploading relevant documents/weblinks, storing them in the database, and retrieving them for the LLM model. These features will be added in v0.2.

Flowchart: AvatarChatbot Megaservice
<!-- Insert Mermaid flowchart here -->
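
As a reading aid for the flowchart, the sketch below models the megaservice as a simple stage pipeline (asr -> llm -> tts -> animation) and marks where the planned v0.2 retrieval stage would slot in. The stage bodies are stand-in stubs, not the project's implementation.

```python
# Dataflow sketch of the AvatarChatbot megaservice with the planned v0.2 RAG
# stage marked. Stage bodies are stand-ins for the real microservice calls.
from dataclasses import dataclass

@dataclass
class Turn:
    audio_in: bytes
    question: str = ""
    context: str = ""
    answer: str = ""
    speech: bytes = b""
    video: bytes = b""

def asr(t: Turn) -> Turn:
    t.question = "<transcribed user speech>"
    return t

def retrieve(t: Turn) -> Turn:  # v0.2: dataprep/embedding/retrieval/reranking + vector DB
    t.context = "<top-k supporting passages>"
    return t

def llm(t: Turn) -> Turn:
    t.answer = f"<answer grounded in: {t.context or 'model knowledge only'}>"
    return t

def tts(t: Turn) -> Turn:
    t.speech = b"<synthesized speech>"
    return t

def animate(t: Turn) -> Turn:
    t.video = b"<lip-synced mp4>"
    return t

PIPELINE_V01 = [asr, llm, tts, animate]            # current design
PIPELINE_V02 = [asr, retrieve, llm, tts, animate]  # with the RAG stage added

def run(audio: bytes, pipeline=PIPELINE_V01) -> Turn:
    turn = Turn(audio_in=audio)
    for stage in pipeline:
        turn = stage(turn)
    return turn

print(run(b"...", PIPELINE_V02).answer)
```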
@@ -217,13 +224,14 @@ End-to-End Inference Time for AvatarChatbot Megaservice (asr -> llm -> tts -> an

On SPR:
~30 seconds for AudioQnA on SPR,
-~40-200 seconds for AvatarAnimation on SPR
+~30-200 seconds for AvatarAnimation on SPR

On Gaudi 2:
~5 seconds for AudioQnA on Gaudi,
-~10-50 seconds for AvatarAnimation on Gaudi, depending on:
+~10-40 seconds for AvatarAnimation on Gaudi, depending on:
1) Whether the input is an image or a multi-frame, fixed-fps video
-1) LipSync Animation DL model used: Wav2Lip_only or Wav2Lip+GFPGAN or SadTalker
-2) Resolution and FPS rate of the resulting mp4 video
+2) The `max_tokens` parameter used in LLM text generation
+3) LipSync Animation DL model used: Wav2Lip_only or Wav2Lip+GFPGAN or SadTalker
+4) Resolution and FPS rate of the resulting mp4 video

-All reported latencies are as of 8/2/2024.
+All reported latencies are as of 10/24/2024.
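
As a starting point for reproducing the figures above, here is a rough timing harness around a single chatbot turn. The gateway URL and request schema are assumptions for illustration (adapt them to the deployed AvatarChatbot megaservice); `max_tokens` is swept because it is one of the latency factors listed above.

```python
# Rough end-to-end timing harness (illustration only). The gateway address and
# payload schema are placeholders; adapt them to the deployed megaservice.
import base64
import time
import requests

GATEWAY_URL = "http://localhost:3009/v1/avatarchatbot"  # placeholder

def time_one_turn(audio_path: str, max_tokens: int) -> float:
    """Return wall-clock seconds for one asr -> llm -> tts -> animation turn."""
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {"audio": audio_b64, "max_tokens": max_tokens}  # illustrative schema
    start = time.perf_counter()
    resp = requests.post(GATEWAY_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return time.perf_counter() - start

for max_tokens in (32, 64, 128):  # longer generations lengthen the LLM stage
    print(max_tokens, f"{time_one_turn('sample_question.wav', max_tokens):.1f} s")
```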
Binary file added community/rfcs/assets/avatar_design.png
Binary file removed community/rfcs/assets/design.png
