
Add streaming, various fixes #30838

Merged 11 commits on May 31, 2024
16 changes: 8 additions & 8 deletions docs/source/en/agents.md
@@ -28,8 +28,8 @@ An agent is a system that uses an LLM as its engine, and it has access to functions called *tools*.
These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them.

The agent can be programmed to:
- devise a series of actions/tools and run them all at once, like the `CodeAgent`
- plan and execute actions/tools one by one, waiting for the outcome of each action before launching the next one, like the `ReactJsonAgent`
- devise a series of actions/tools and run them all at once, like the [`CodeAgent`]
- plan and execute actions/tools one by one, waiting for the outcome of each action before launching the next one, like the [`ReactJsonAgent`]
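The two strategies can be contrasted with a toy sketch in plain Python (not the transformers API; all names here are illustrative, with bare functions standing in for tools):

```python
# Toy contrast of the two execution strategies described above.
def run_all_at_once(actions):
    # CodeAgent-style: the whole plan is produced first, then executed in one go.
    return [action() for action in actions]

def run_step_by_step(choose_next_action, max_steps=10):
    # ReAct-style: pick one action, observe its outcome, then decide the next.
    observations = []
    for _ in range(max_steps):
        action = choose_next_action(observations)
        if action is None:  # the agent decides it is done
            break
        observations.append(action())
    return observations
```

The step-by-step loop trades extra LLM calls for the ability to condition each action on the previous observation, which is what makes the ReAct framework effective on reasoning tasks.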

### Types of agents

@@ -42,8 +42,8 @@ This agent has a planning step, then generates python code to execute all its actions at once.
This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations.

We implement two versions of ReactJsonAgent:
- [`~ReactJsonAgent`] generates tool calls as a JSON in its output.
- [`~ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance.
- [`ReactJsonAgent`] generates tool calls as a JSON in its output.
- [`ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance.

> [!TIP]
> Read the [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more about the ReAct agent.
@@ -124,7 +124,7 @@ You could use any `llm_engine` method as long as:

You also need a `tools` argument which accepts a list of `Tools`. You can provide an empty list for `tools`, or use the default toolbox by passing the optional argument `add_base_tools=True`.
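For illustration, a minimal custom engine satisfying this interface might look like the following sketch (the `custom_llm_engine` name and echo behavior are made up for the example; a real engine would call an actual model):

```python
# Sketch of a custom engine: a callable that takes a list of
# {"role": ..., "content": ...} message dicts (plus optional stop
# sequences) and returns a string.
def custom_llm_engine(messages, stop_sequences=None):
    # A real engine would query an LLM here; we just echo the last message.
    reply = "You said: " + messages[-1]["content"]
    # Honor stop sequences by truncating the output, as the agent expects.
    for stop in stop_sequences or []:
        if stop in reply:
            reply = reply.split(stop)[0]
    return reply
```

Any callable with this shape can be passed as `llm_engine` when constructing an agent.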

Now you can create an agent, like `CodeAgent`, and run it. For convenience, we also provide the `HfEngine` class that uses `huggingface_hub.InferenceClient` under the hood.
Now you can create an agent, like [`CodeAgent`], and run it. For convenience, we also provide the [`HfEngine`] class that uses `huggingface_hub.InferenceClient` under the hood.

```python
from transformers import CodeAgent, HfEngine
@@ -187,7 +187,7 @@ The execution will stop at any code trying to perform an illegal operation or if

### The system prompt

An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the `ReactCodeAgent` (below version is slightly simplified).
An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the [`ReactCodeAgent`] (below version is slightly simplified).

```text
You will be given a task to solve as best you can.
@@ -246,7 +246,7 @@ of the available tools.

A tool is an atomic function to be used by an agent.

You can for instance check the [~PythonInterpreterTool]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action.
You can for instance check the [`PythonInterpreterTool`]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action.

When the agent is initialized, the tool attributes are used to generate a tool description which is baked into the agent's system prompt. This lets the agent know which tools it can use and why.
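As a sketch of that idea (a hypothetical class, not the actual transformers `Tool` base class), a tool's attributes can be rendered into a system-prompt line like so:

```python
# Hypothetical minimal tool mirroring the attributes described above.
class AdditionTool:
    name = "adder"
    description = "Adds two numbers and returns their sum."
    inputs = {"a": "first number", "b": "second number"}
    output_type = "number"

    def __call__(self, a, b):
        return a + b

def describe_tool(tool):
    # Sketch of baking tool attributes into a system-prompt description.
    args = ", ".join(f"{k} ({v})" for k, v in tool.inputs.items())
    return f"- {tool.name}: {tool.description} Inputs: {args}. Returns: {tool.output_type}."
```

The agent never sees the tool's code, only the generated description, which is why accurate names and descriptions matter so much.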

@@ -259,7 +259,7 @@ Transformers comes with a default toolbox for empowering agents that you can add
- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper))
- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5))
- **Translation**: translates a given sentence from source language to target language.
- **Python code interpreter**: runs the LLM-generated Python code in a secure environment. This tool will only be added to [~ReactJsonAgent] if you use `add_base_tools=True`, since code-based tools can already execute Python code.
- **Python code interpreter**: runs the LLM-generated Python code in a secure environment. This tool will only be added to [`ReactJsonAgent`] if you use `add_base_tools=True`, since code-based tools can already execute Python code.


You can manually use a tool by calling the [`load_tool`] function with a task to perform.
124 changes: 86 additions & 38 deletions src/transformers/agents/agents.py
@@ -347,6 +347,7 @@ def toolbox(self) -> Toolbox:
return self._toolbox

def initialize_for_run(self, task: str, **kwargs):
self.token_count = 0
self.task = task
if len(kwargs) > 0:
self.task += f"\nYou have been provided with these initial arguments: {str(kwargs)}."
@@ -544,7 +545,7 @@ def run(self, task: str, return_generated_code: bool = False, **kwargs):
self.prompt = [prompt_message, task_message]
self.logger.info("====Executing with this prompt====")
self.logger.info(self.prompt)
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_code>"])
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_action>"])

if return_generated_code:
return llm_output
@@ -597,7 +598,29 @@ def __init__(
if "final_answer" not in self._toolbox.tools:
self._toolbox.add_tool(FinalAnswerTool())

def run(self, task: str, **kwargs):
def provide_final_answer(self, task) -> str:
"""
This method provides a final answer to the task, based on the logs of the agent's interactions.
"""
self.prompt = [
{
"role": MessageRole.SYSTEM,
"content": "An agent tried to answer a user query but it got stuck and failed to do so. You are tasked with providing an answer instead. Here is the agent's memory:",
}
]
self.prompt += self.write_inner_memory_from_logs()[1:]
self.prompt += [
{
"role": MessageRole.USER,
"content": f"Based on the above, please provide an answer to the following user request:\n{task}",
}
]
try:
return self.llm_engine(self.prompt)
except Exception as e:
return f"Error in generating final llm output: {e}."

def run(self, task: str, stream: bool = False, **kwargs):
"""
Runs the agent for the given task.

@@ -614,41 +637,62 @@ def run(self, task: str, **kwargs):
agent.run("What is the result of 2 power 3.7384?")
```
"""
if stream:
return self.stream_run(task, **kwargs)
else:
return self.direct_run(task, **kwargs)

def stream_run(self, task: str, **kwargs):
self.initialize_for_run(task, **kwargs)

final_answer = None
iteration = 0
while final_answer is None and iteration < self.max_iterations:
try:
final_answer = self.step()
step_logs = self.step()
if "final_answer" in step_logs:
final_answer = step_logs["final_answer"]
except AgentError as e:
self.logger.error(e, exc_info=1)
self.logs[-1]["error"] = e
finally:
iteration += 1
yield self.logs[-1]

if final_answer is None and iteration == self.max_iterations:
error_message = "Reached max iterations."
self.logs.append({"error": AgentMaxIterationsError(error_message)})
final_step_log = {"error": AgentMaxIterationsError(error_message)}
self.logs.append(final_step_log)
self.logger.error(error_message, exc_info=1)
final_answer = self.provide_final_answer(task)
final_step_log["final_answer"] = final_answer
yield final_step_log

yield final_answer

def direct_run(self, task: str, **kwargs):
self.initialize_for_run(task, **kwargs)

self.prompt = [
{
"role": MessageRole.SYSTEM,
"content": "An agent tried to answer a user query but it failed to do so. You are tasked with providing an answer instead. Here is the agent's memory:",
}
]
self.prompt += self.write_inner_memory_from_logs()[1:]
self.prompt += [
{
"role": MessageRole.USER,
"content": f"Based on the above, please provide an answer to the following user request:\n{task}",
}
]
final_answer = None
iteration = 0
while final_answer is None and iteration < self.max_iterations:
try:
final_answer = self.llm_engine(self.prompt, stop_sequences=["Observation:"])
except Exception as e:
final_answer = f"Error in generating final llm output: {e}."
step_logs = self.step()
if "final_answer" in step_logs:
final_answer = step_logs["final_answer"]
except AgentError as e:
self.logger.error(e, exc_info=1)
self.logs[-1]["error"] = e
finally:
iteration += 1

if final_answer is None and iteration == self.max_iterations:
error_message = "Reached max iterations."
final_step_log = {"error": AgentMaxIterationsError(error_message)}
self.logs.append(final_step_log)
self.logger.error(error_message, exc_info=1)
final_answer = self.provide_final_answer(task)
final_step_log["final_answer"] = final_answer

return final_answer
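The `run`/`stream_run`/`direct_run` split above can be illustrated with a stripped-down sketch (toy step logic and three hard-coded steps; not the actual agent implementation):

```python
# Toy version of the streaming vs. direct dispatch pattern: stream_run is a
# generator yielding one log dict per step, direct_run consumes the same
# steps internally and returns only the final answer.
def stream_run(task):
    for i in range(3):
        entry = {"step": i}
        if i == 2:  # last step carries the answer
            entry["final_answer"] = f"answer to {task!r}"
        yield entry  # the caller sees each step's log as it is produced

def direct_run(task):
    for entry in stream_run(task):
        if "final_answer" in entry:
            return entry["final_answer"]

def run(task, stream=False):
    # Same dispatch as the PR: stream=True returns a generator of step logs.
    return stream_run(task) if stream else direct_run(task)
```

This is why `step()` now returns the per-step log dict instead of the final answer alone: the streaming path can yield it verbatim, while the direct path just inspects it for a `"final_answer"` key.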

@@ -683,22 +727,24 @@ def step(self):
"""
agent_memory = self.write_inner_memory_from_logs()

self.logs[-1]["agent_memory"] = agent_memory.copy()
self.prompt = agent_memory
self.logger.debug("===== New step =====")

# Add new step in logs
self.logs.append({})
current_step_logs = {}
self.logs.append(current_step_logs)
current_step_logs["agent_memory"] = agent_memory.copy()

self.logger.info("===== Calling LLM with this last message: =====")
self.logger.info(self.prompt[-1])

try:
llm_output = self.llm_engine(self.prompt, stop_sequences=["Observation:"])
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_action>", "Observation:"])
except Exception as e:
raise AgentGenerationError(f"Error in generating llm output: {e}.")
self.logger.debug("===== Output message of the LLM: =====")
self.logger.debug(llm_output)
self.logs[-1]["llm_output"] = llm_output
current_step_logs["llm_output"] = llm_output

# Parse
self.logger.debug("===== Extracting action =====")
@@ -709,8 +755,8 @@ def step(self):
except Exception as e:
raise AgentParsingError(f"Could not parse the given action: {e}.")

self.logs[-1]["rationale"] = rationale
self.logs[-1]["tool_call"] = {"tool_name": tool_name, "tool_arguments": arguments}
current_step_logs["rationale"] = rationale
current_step_logs["tool_call"] = {"tool_name": tool_name, "tool_arguments": arguments}

# Execute
self.logger.warning(f"Calling tool: '{tool_name}' with arguments: {arguments}")
@@ -721,7 +767,8 @@ def step(self):
answer = arguments
if answer in self.state: # if the answer is a state variable, return the value
answer = self.state[answer]
return answer
current_step_logs["final_answer"] = answer
return current_step_logs
else:
observation = self.execute_tool_call(tool_name, arguments)
observation_type = type(observation)
@@ -740,8 +787,8 @@ def step(self):
updated_information = f"Stored '{observation_name}' in memory."

self.logger.info(updated_information)
self.logs[-1]["observation"] = updated_information
return None
current_step_logs["observation"] = updated_information
return current_step_logs


class ReactCodeAgent(ReactAgent):
@@ -782,26 +829,27 @@ def step(self):
The errors are raised here, they are caught and logged in the run() method.
"""
agent_memory = self.write_inner_memory_from_logs()
self.logs[-1]["agent_memory"] = agent_memory.copy()

self.prompt = agent_memory.copy()

self.logger.debug("===== New step =====")

# Add new step in logs
self.logs.append({})
current_step_logs = {}
self.logs.append(current_step_logs)
current_step_logs["agent_memory"] = agent_memory.copy()

self.logger.info("===== Calling LLM with these last messages: =====")
self.logger.info(self.prompt[-2:])

try:
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_code>", "Observation:"])
llm_output = self.llm_engine(self.prompt, stop_sequences=["<end_action>", "Observation:"])
except Exception as e:
raise AgentGenerationError(f"Error in generating llm output: {e}.")

self.logger.debug("===== Output message of the LLM: =====")
self.logger.debug(llm_output)
self.logs[-1]["llm_output"] = llm_output
current_step_logs["llm_output"] = llm_output

# Parse
self.logger.debug("===== Extracting action =====")
@@ -813,8 +861,8 @@ def step(self):
error_msg = f"Error in code parsing: {e}. Make sure to provide correct code"
raise AgentParsingError(error_msg)

self.logs[-1]["rationale"] = rationale
self.logs[-1]["tool_call"] = {"tool_name": "code interpreter", "tool_arguments": code_action}
current_step_logs["rationale"] = rationale
current_step_logs["tool_call"] = {"tool_name": "code interpreter", "tool_arguments": code_action}

# Execute
self.log_code_action(code_action)
@@ -824,7 +872,7 @@ def step(self):
information = self.state["print_outputs"]
self.logger.warning("Print outputs:")
self.logger.log(32, information)
self.logs[-1]["observation"] = information
current_step_logs["observation"] = information
except Exception as e:
error_msg = f"Failed while trying to execute the code below:\n{CustomFormatter.reset + code_action + CustomFormatter.reset}\nThis failed due to the following error:\n{str(e)}"
if "'dict' object has no attribute 'read'" in str(e):
@@ -834,5 +882,5 @@ def step(self):
if line[: len("final_answer")] == "final_answer":
self.logger.warning(">>> Final answer:")
self.logger.log(32, result)
return result
return None
current_step_logs["final_answer"] = result
return current_step_logs
6 changes: 0 additions & 6 deletions src/transformers/agents/llm_engine.py
@@ -72,12 +72,6 @@ def __init__(self, model: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
self.client = InferenceClient(model=self.model, timeout=120)

def __call__(self, messages: List[Dict[str, str]], stop_sequences=[]) -> str:
if "Meta-Llama-3" in self.model:
if "<|eot_id|>" not in stop_sequences:
stop_sequences.append("<|eot_id|>")
if "!!!!!" not in stop_sequences:
stop_sequences.append("!!!!!")

# Get clean message list
messages = get_clean_message_list(messages, role_conversions=llama_role_conversions)
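A side note on why dropping that Llama-specific branch is healthy: the removed code appended to the mutable default `stop_sequences=[]`, a classic Python pitfall where the default list is created once and shared across all calls. A standalone demonstration (our example, not code from the PR):

```python
# The pitfall: in-place appends to a mutable default leak between calls.
def call_model(prompt, stop_sequences=[]):
    stop_sequences.append("<|eot_id|>")  # mutates the shared default list
    return len(stop_sequences)

# Safer pattern: default to None and build a fresh list on every call.
def call_model_safe(prompt, stop_sequences=None):
    stop_sequences = list(stop_sequences or []) + ["<|eot_id|>"]
    return len(stop_sequences)
```

With the default-list version, every call without an explicit `stop_sequences` grows the same list, so stop tokens silently accumulate across invocations.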
