From 4584b4bf000095e0d029a9826d6869b62c8edddd Mon Sep 17 00:00:00 2001 From: Aymeric Date: Fri, 17 May 2024 19:00:42 +0200 Subject: [PATCH] Implement streaming run in react agent --- docs/source/en/agents.md | 16 ++-- src/transformers/agents/agents.py | 111 ++++++++++++++++++-------- src/transformers/agents/llm_engine.py | 4 - src/transformers/agents/prompts.py | 21 +++-- 4 files changed, 99 insertions(+), 53 deletions(-) diff --git a/docs/source/en/agents.md b/docs/source/en/agents.md index ae9e5db2b7897b..c42aecb9d86cd4 100644 --- a/docs/source/en/agents.md +++ b/docs/source/en/agents.md @@ -28,8 +28,8 @@ An agent is a system that uses an LLM as its engine, and it has access to functi These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them. The agent can be programmed to: -- devise a series of actions/tools and run them all at once like the `CodeAgent` for example -- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one like the `ReactJsonAgent` for example +- devise a series of actions/tools and run them all at once like the [`CodeAgent`] for example +- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one like the [`ReactJsonAgent`] for example ### Types of agents @@ -42,8 +42,8 @@ This agent has a planning step, then generates python code to execute all its ac This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations. We implement two versions of ReactJsonAgent: -- [`~ReactJsonAgent`] generates tool calls as a JSON in its output. -- [`~ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance. +- [`ReactJsonAgent`] generates tool calls as a JSON in its output. +- [`ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance. > [!TIP] > Read [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more the ReAct agent. @@ -124,7 +124,7 @@ You could use any `llm_engine` method as long as: You also need a `tools` argument which accepts a list of `Tools`. You can provide an empty list for `tools`, but use the default toolbox with the optional argument `add_base_tools=True`. -Now you can create an agent, like `CodeAgent`, and run it. For convenience, we also provide the `HfEngine` class that uses `huggingface_hub.InferenceClient` under the hood. +Now you can create an agent, like [`CodeAgent`], and run it. For convenience, we also provide the [`HfEngine`] class that uses `huggingface_hub.InferenceClient` under the hood. ```python from transformers import CodeAgent, HfEngine @@ -187,7 +187,7 @@ The execution will stop at any code trying to perform an illegal operation or if ### The system prompt -An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the `ReactCodeAgent` (below version is slightly simplified). +An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. 
The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the [`ReactCodeAgent`] (the version below is slightly simplified).

```text
You will be given a task to solve as best you can.
@@ -246,7 +246,7 @@
of the available tools.

A tool is an atomic function to be used by an agent.

-You can for instance check the [~PythonInterpreterTool]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action.
+You can for instance check the [`PythonInterpreterTool`]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action.

When the agent is initialized, the tool attributes are used to generate a tool description which is baked into the agent's system prompt. This lets the agent know which tools it can use and why.

@@ -259,7 +259,7 @@ Transformers comes with a default toolbox for empowering agents, that you can ad
- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper))
- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5))
- **Translation**: translates a given sentence from source language to target language.
-- **Python code interpreter**: runs your the LLM generated Python code in a secure environment. This tool will only be added to [~ReactJsonAgent] if you use `add_base_tools=True`, since code-based tools can already execute Python code
+- **Python code interpreter**: runs the LLM-generated Python code in a secure environment. This tool will only be added to [`ReactJsonAgent`] if you use `add_base_tools=True`, since code-based tools can already execute Python code.

You can manually use a tool by calling the [`load_tool`] function and a task to perform.

diff --git a/src/transformers/agents/agents.py b/src/transformers/agents/agents.py
index 18b162c2d4378b..c0e28f8a669679 100644
--- a/src/transformers/agents/agents.py
+++ b/src/transformers/agents/agents.py
@@ -597,7 +597,31 @@ def __init__(
        if "final_answer" not in self._toolbox.tools:
            self._toolbox.add_tool(FinalAnswerTool())

-    def run(self, task: str, **kwargs):
+
+    def provide_final_answer(self, task) -> str:
+        """
+        This method provides a final answer to the task, based on the logs of the agent's interactions.
+        """
+        self.prompt = [
+            {
+                "role": MessageRole.SYSTEM,
+                "content": "An agent tried to answer a user query but it failed to do so. You are tasked with providing an answer instead. Here is the agent's memory:",
+            }
+        ]
+        self.prompt += self.write_inner_memory_from_logs()[1:]
+        self.prompt += [
+            {
+                "role": MessageRole.USER,
+                "content": f"Based on the above, please provide an answer to the following user request:\n{task}",
+            }
+        ]
+        try:
+            return self.llm_engine(self.prompt, stop_sequences=["<end_action>", "Observation:"])
+        except Exception as e:
+            return f"Error in generating final llm output: {e}."
+
+
+    def run(self, task: str, stream: bool = False, **kwargs):
        """
        Runs the agent for the given task.
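The hunk above adds an optional `stream` flag to `run()` and a `provide_final_answer()` fallback for when the agent exhausts its iterations. As a purely illustrative sketch (not part of the patch), here is how the streaming path could be consumed, assuming `ReactCodeAgent`, `HfEngine` and `add_base_tools` behave as described in the documentation changes above; the dictionary keys are the ones `step()` records later in this patch:

```python
from transformers import HfEngine, ReactCodeAgent

# Hypothetical consumer of the streaming API added by this patch.
agent = ReactCodeAgent(tools=[], llm_engine=HfEngine(), add_base_tools=True)

final_answer = None
# With stream=True, run() dispatches to stream_run(), a generator that yields
# one log dictionary per ReAct step instead of blocking until completion.
for step_log in agent.run("What is the result of 2 power 3.7384?", stream=True):
    print("Thought:", step_log.get("rationale", ""))
    print("Observation:", step_log.get("observation", ""))
    if "final_answer" in step_log:
        final_answer = step_log["final_answer"]

print("Final answer:", final_answer)
```

Note that in this version, when `max_iterations` is reached the answer produced by `provide_final_answer()` becomes the generator's return value rather than a yielded step log, so a consumer iterating as above will not see it.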
@@ -614,41 +638,59 @@ def run(self, task: str, **kwargs): agent.run("What is the result of 2 power 3.7384?") ``` """ + if stream: + return self.stream_run(task, **kwargs) + else: + return self.direct_run(task, **kwargs) + + + def stream_run(self, task: str, **kwargs): self.initialize_for_run(task, **kwargs) final_answer = None iteration = 0 while final_answer is None and iteration < self.max_iterations: try: - final_answer = self.step() + step_logs = self.step() + if 'final_answer' in step_logs: + final_answer = step_logs['final_answer'] except AgentError as e: self.logger.error(e, exc_info=1) self.logs[-1]["error"] = e finally: iteration += 1 + yield self.logs[-1] if final_answer is None and iteration == self.max_iterations: error_message = "Reached max iterations." self.logs.append({"error": AgentMaxIterationsError(error_message)}) self.logger.error(error_message, exc_info=1) + final_answer = self.provide_final_answer(task) + + return final_answer + + + def direct_run(self, task: str, **kwargs): + self.initialize_for_run(task, **kwargs) - self.prompt = [ - { - "role": MessageRole.SYSTEM, - "content": "An agent tried to answer a user query but it failed to do so. You are tasked with providing an answer instead. Here is the agent's memory:", - } - ] - self.prompt += self.write_inner_memory_from_logs()[1:] - self.prompt += [ - { - "role": MessageRole.USER, - "content": f"Based on the above, please provide an answer to the following user request:\n{task}", - } - ] + final_answer = None + iteration = 0 + while final_answer is None and iteration < self.max_iterations: try: - final_answer = self.llm_engine(self.prompt, stop_sequences=["", "Observation:"]) - except Exception as e: - final_answer = f"Error in generating final llm output: {e}." + step_logs = self.step() + if 'final_answer' in step_logs: + final_answer = step_logs['final_answer'] + except AgentError as e: + self.logger.error(e, exc_info=1) + self.logs[-1]["error"] = e + finally: + iteration += 1 + + if final_answer is None and iteration == self.max_iterations: + error_message = "Reached max iterations." 
+ self.logs.append({"error": AgentMaxIterationsError(error_message)}) + self.logger.error(error_message, exc_info=1) + final_answer = self.provide_final_answer(task) return final_answer @@ -683,12 +725,14 @@ def step(self): """ agent_memory = self.write_inner_memory_from_logs() - self.logs[-1]["agent_memory"] = agent_memory.copy() self.prompt = agent_memory self.logger.debug("===== New step =====") # Add new step in logs - self.logs.append({}) + current_step_logs = {} + self.logs.append(current_step_logs) + current_step_logs["agent_memory"] = agent_memory.copy() + self.logger.info("===== Calling LLM with this last message: =====") self.logger.info(self.prompt[-1]) @@ -698,7 +742,7 @@ def step(self): raise AgentGenerationError(f"Error in generating llm output: {e}.") self.logger.debug("===== Output message of the LLM: =====") self.logger.debug(llm_output) - self.logs[-1]["llm_output"] = llm_output + current_step_logs["llm_output"] = llm_output # Parse self.logger.debug("===== Extracting action =====") @@ -709,8 +753,8 @@ def step(self): except Exception as e: raise AgentParsingError(f"Could not parse the given action: {e}.") - self.logs[-1]["rationale"] = rationale - self.logs[-1]["tool_call"] = {"tool_name": tool_name, "tool_arguments": arguments} + current_step_logs["rationale"] = rationale + current_step_logs["tool_call"] = {"tool_name": tool_name, "tool_arguments": arguments} # Execute self.logger.warning(f"Calling tool: '{tool_name}' with arguments: {arguments}") @@ -740,8 +784,8 @@ def step(self): updated_information = f"Stored '{observation_name}' in memory." self.logger.info(updated_information) - self.logs[-1]["observation"] = updated_information - return None + current_step_logs["observation"] = updated_information + return current_step_logs class ReactCodeAgent(ReactAgent): @@ -782,14 +826,15 @@ def step(self): The errors are raised here, they are caught and logged in the run() method. """ agent_memory = self.write_inner_memory_from_logs() - self.logs[-1]["agent_memory"] = agent_memory.copy() self.prompt = agent_memory.copy() self.logger.debug("===== New step =====") # Add new step in logs - self.logs.append({}) + current_step_logs = {} + self.logs.append(current_step_logs) + current_step_logs["agent_memory"] = agent_memory.copy() self.logger.info("===== Calling LLM with these last messages: =====") self.logger.info(self.prompt[-2:]) @@ -801,7 +846,7 @@ def step(self): self.logger.debug("===== Output message of the LLM: =====") self.logger.debug(llm_output) - self.logs[-1]["llm_output"] = llm_output + current_step_logs["llm_output"] = llm_output # Parse self.logger.debug("===== Extracting action =====") @@ -813,8 +858,8 @@ def step(self): error_msg = f"Error in code parsing: {e}. 
Make sure to provide correct code"
            raise AgentParsingError(error_msg)

-        self.logs[-1]["rationale"] = rationale
-        self.logs[-1]["tool_call"] = {"tool_name": "code interpreter", "tool_arguments": code_action}
+        current_step_logs["rationale"] = rationale
+        current_step_logs["tool_call"] = {"tool_name": "code interpreter", "tool_arguments": code_action}

        # Execute
        self.log_code_action(code_action)
@@ -824,7 +869,7 @@ def step(self):
            information = self.state["print_outputs"]
            self.logger.warning("Print outputs:")
            self.logger.log(32, information)
-            self.logs[-1]["observation"] = information
+            current_step_logs["observation"] = information
        except Exception as e:
            error_msg = f"Failed while trying to execute the code below:\n{CustomFormatter.reset + code_action + CustomFormatter.reset}\nThis failed due to the following error:\n{str(e)}"
            if "'dict' object has no attribute 'read'" in str(e):
@@ -834,5 +879,5 @@ def step(self):
            if line[: len("final_answer")] == "final_answer":
                self.logger.warning(">>> Final answer:")
                self.logger.log(32, result)
-                return result
-        return None
+                current_step_logs["final_answer"] = result
+        return current_step_logs
diff --git a/src/transformers/agents/llm_engine.py b/src/transformers/agents/llm_engine.py
index 50929940959161..a6dd99891ab4b2 100644
--- a/src/transformers/agents/llm_engine.py
+++ b/src/transformers/agents/llm_engine.py
@@ -72,10 +72,6 @@ def __init__(self, model: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
        self.client = InferenceClient(model=self.model, timeout=120)

    def __call__(self, messages: List[Dict[str, str]], stop_sequences=[]) -> str:
-        if "Meta-Llama-3" in self.model:
-            if "<|eot_id|>" not in stop_sequences:
-                stop_sequences.append("<|eot_id|>")
-
        # Get clean message list
        messages = get_clean_message_list(messages, role_conversions=llama_role_conversions)

diff --git a/src/transformers/agents/prompts.py b/src/transformers/agents/prompts.py
index 3ebbfe8160bee7..829b6dcf0723e4 100644
--- a/src/transformers/agents/prompts.py
+++ b/src/transformers/agents/prompts.py
@@ -129,7 +129,7 @@ def download_prompt(prompt_or_repo_id, agent_name, mode="run"):
Be sure to provide a 'Code:\n```' sequence before the code and '```' after, else you will get an error.
DO NOT pass the arguments as a dict as in 'answer = ask_search_agent({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = ask_search_agent(query="What is the place where James Bond lives?")'.

-Now Begin!
+Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
"""


@@ -255,7 +255,11 @@ def download_prompt(prompt_or_repo_id, agent_name, mode="run"):
Above example were using notional tools that might not exist for you. You only have acces to those tools:
<<tool_descriptions>>

-ALWAYS provide a 'Thought:' and an 'Action:' sequence. You MUST provide at least the 'Action:' sequence to move forward.
+Here are the rules you should always follow to solve your task:
+1. ALWAYS provide a 'Thought:' sequence, and an 'Action:' sequence that ends with <end_action>, else you will fail.
+2. Always use the right arguments for the tools. Never use variable names in the 'action_input' field, use the value instead.
+3. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.
+4. Never re-do a tool call that you previously did with the exact same parameters.

Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
""" @@ -348,12 +352,13 @@ def download_prompt(prompt_or_repo_id, agent_name, mode="run"): You also can perform computations in the python code you generate. -These are the rules you should always follow to solve your task: -1. Always provide a 'Thought:' and an 'Code:\n```py' sequence ending with '```' sequence, else you will get an error. -2. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = ask_search_agent({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = ask_search_agent(query="What is the place where James Bond lives?")'. -3. Make sure the variable you use are all defined. -3. Do not perform too many operations in a single code block. Split the task into intermediate code blocks. Then use print() to save the intermediate result. Finally, use final_answer() to return the final result. -4. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself. Never re-do a tool call that you previously did with the exact same parameters. +Here are the rules you should always follow to solve your task: +1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```' sequence, else you will fail. +2. Make sure the variable you use are all defined. +3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = ask_search_agent({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = ask_search_agent(query="What is the place where James Bond lives?")'. +4. Do not perform too many operations in a single code block. Split the task into intermediate code blocks. Then use print() to save the intermediate result. Finally, use final_answer() to return the final result. +5. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself. +6. Never re-do a tool call that you previously did with the exact same parameters. Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000. """