diff --git a/automata-embedding-data b/automata-embedding-data index bf3046ac8..878c9fbc4 160000 --- a/automata-embedding-data +++ b/automata-embedding-data @@ -1 +1 @@ -Subproject commit bf3046ac8890cd3f81719fa5b00d8f4c041bfa9f +Subproject commit 878c9fbc4395fb7cad22cdf032fef211d6f83afa diff --git a/docs/agent/agent.rst b/docs/agent/agent.rst index f8478eadf..f9bba3cb9 100644 --- a/docs/agent/agent.rst +++ b/docs/agent/agent.rst @@ -1,25 +1,89 @@ -- The ``Agent`` class is typically used as a base for creating - autonomous agents that interact with different APIs or data sources. - These agents can be used for a variety of tasks including but not - limited to text generation, translation, summarization or any task - that involves natural language processing. - -- The ``OpenAIAutomataAgent`` is one specific implementation of the - ``Agent`` class. Depending on the library, there might be other - concrete implementations to interact with different APIs. - -- A custom database provider for the ``set_database_provider`` method - could be any class that implements a common interface for database - operations. For instance, this could be a provider that interacts - with a SQL database, a NoSQL database like MongoDB, or a simple - in-memory database for testing purposes. - -- The ``LLMConversation`` typically represents a series of exchanges or - “turns” between the agent and user, where each “turn” includes a user - message and an assistant message. The ``LLMIterationResult`` - typically contains the result of a single iteration of processing, - which includes the assistant’s message for the current turn and when - implemented, could include other metadata such as response time, - temperature for generation, use of p, etc. Kindly note that the - actual implementation might differ based on specific implementation - of ``Agent`` and the context it is being used in. 
+Agent Class +=========== + +Overview +-------- + +``Agent`` is an abstract base class for creating autonomous agents. These +agents can perform actions and communicate with other providers. During +instantiation, an agent is initialized with a set of instructions and +can optionally be linked with a database provider. + +An ``Agent`` works by advancing through a sequence of tasks. It +implements iterator methods (``__iter__`` and ``__next__``) for this +purpose. Each iteration corresponds to a step of the task that the +``Agent`` has been designed to accomplish. This step could be a +conversation turn, which involves generating a new message from the +‘assistant’ and then parsing the reply from the ‘user’. The ``run`` +method can be used to execute these tasks until completion, with the +task being deemed complete when the ``__next__`` method returns +``None``. + +It declares abstract properties for fetching its responses, associated +conversation, and tools; concrete subclasses must provide their +implementations. It also has an abstract method for setting a database +provider, essential for managing conversations with the user. + +Usage Example: +-------------- + +The following example shows a basic subclass of ``Agent``: + +.. code:: python + + class SimpleAgent(Agent): + """Implements the abstract Agent class for a simple specific agent.""" + + def __init__(self, instructions: str) -> None: + super().__init__(instructions) + + def __iter__(self): + ... + + def __next__(self) -> str: + ... + + @property + def conversation(self) -> LLMConversation: + ... + + @property + def agent_responses(self) -> List[LLMChatMessage]: + ... + + @property + def tools(self) -> Sequence[Tool]: + ... + + def run(self) -> str: + ... + +This example shows a simple implementation of the ``Agent`` abstract +class. The ``...`` represents sections of code that must be implemented +to define the specific behaviour of the ``SimpleAgent``. 
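The iterate-until-``None`` contract described above can be exercised with a small, self-contained sketch. Note that ``ToyAgent`` below is an illustrative stand-in, not part of the library; it only mimics the iteration pattern the abstract class prescribes:

```python
from typing import Optional

class ToyAgent:
    """A toy stand-in that mimics the iterate-until-None pattern of ``Agent``."""

    def __init__(self, instructions: str, max_steps: int = 3) -> None:
        self.instructions = instructions
        self.max_steps = max_steps
        self._step = 0

    def __iter__(self) -> "ToyAgent":
        return self

    def __next__(self) -> Optional[str]:
        # Each iteration is one conversation turn; None signals completion.
        if self._step >= self.max_steps:
            return None
        self._step += 1
        return f"assistant message for step {self._step}"

    def run(self) -> str:
        # Drive the iterator until __next__ returns None, keeping the
        # last assistant message as the final result.
        result = ""
        while (message := next(self)) is not None:
            result = message
        return result

agent = ToyAgent("perform a three-step task")
print(agent.run())  # assistant message for step 3
```

Here ``run`` simply loops over ``__next__`` until it yields ``None``, which is the completion signal the overview describes.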
+ +Related Symbols +--------------- + +- ``LLMChatMessage``, ``LLMConversation``: Models for handling and + representing chat messages and conversations. +- ``Tool``: An abstraction for different types of tools associated with + the agent. +- ``LLMConversationDatabaseProvider``: Abstract base class for database + providers. + +Limitations +----------- + +The ``Agent`` abstract class doesn’t provide an easy method to modify or +control the flow of execution. It assumes that all tasks are to be +performed in a cyclical manner and that they complete after a specific +number of steps. + +Follow-up Questions: +-------------------- + +- How to handle more complex workflows that require non-linear + execution paths? +- Is it possible to dynamically adjust the maximum number of iterations + based on the task complexity? diff --git a/docs/agent/agent/index.rst b/docs/agent/agent/index.rst index 764986ff9..0b92dc729 100644 --- a/docs/agent/agent/index.rst +++ b/docs/agent/agent/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/agent/agent_database_error.rst b/docs/agent/agent_database_error.rst index 0e4fa6f87..0972e343e 100644 --- a/docs/agent/agent_database_error.rst +++ b/docs/agent/agent_database_error.rst @@ -1,17 +1,2 @@ -- Given that ``AgentDatabaseError`` will be raised in cases of various - database-related issues, it would be beneficial to include specific - error messages or codes. This would make it easier for users to - determine the source of the problem and know how to fix it, rather - than needing to manually inspect the code and database setup. -- Whether ``AgentDatabaseError`` is being used in test coverage would - depend on the specific practices of the developers or team using it. - As a best practice, it is usually a good idea to include error - handling and exception throwing in testing to ensure that your code - can gracefully handle any runtime issues. 
If ``AgentDatabaseError`` - is used in testing, it should be documented in the test cases to make - it clear to other developers and future users exactly when and why - it’s being raised. -- The documentation for ``AgentDatabaseError`` should also ideally - include examples of situations where it could be raised, to provide a - better understanding for users who aren’t intimately familiar with - the database provider setup. +AgentDatabaseError +================== + +.. code:: python + + class AgentDatabaseError(AutomataError): + """An exception raised when the agent fails to set the database provider.""" diff --git a/docs/agent/agent_general_error.rst b/docs/agent/agent_general_error.rst index b2fae105f..03cf8b322 100644 --- a/docs/agent/agent_general_error.rst +++ b/docs/agent/agent_general_error.rst @@ -1,16 +1,2 @@ -``AgentGeneralError`` could potentially have subclasses for further - granulation of errors. However, whether such subclasses exist or not - depends on the specifics of the error handling design in the package. - Subclasses could be helpful in distinguishing between different types - of errors, providing more detailed information to the users and - developers. - -- More specific errors can indeed be beneficial in certain cases. If - the system has known points of potential failure, then having more - specific exceptions allows for more precise error handling and easier - debugging. However, the ``AgentGeneralError`` should still be present - to catch all the unexpected errors that do not fit into any of the - specific cases. - -Thus, a balance between general and specific errors should be maintained -for a robust error handling system. 
+AgentGeneralError +================= + +.. code:: python + + class AgentGeneralError(AutomataError): + """An exception raised when a general error arises with the agent.""" diff --git a/docs/agent/agent_max_iter_error.rst b/docs/agent/agent_max_iter_error.rst index bb58b0eab..33e8e5876 100644 --- a/docs/agent/agent_max_iter_error.rst +++ b/docs/agent/agent_max_iter_error.rst @@ -1,27 +1,2 @@ -As an AI developed by OpenAI, I do not have direct access to the -``automata`` module, a third-party package. I cannot retrieve its -concrete implementation details, making it challenging to answer the -follow-up questions concretely. Furthermore, my training doesn’t include -specific knowledge of this package. Nonetheless, I can offer a general -perspective based on common programming principles and practices: - -1. **Default iteration count**: With most libraries that include a - maximum iteration count, a default value is often defined. The actual - default value can vary according to the specific parameters of the - agent. - -2. **Modifying iteration count**: Depending on how the library is - implemented, the maximum iteration count could potentially be - modified during an agent’s lifecycle. However, typically, such - settings are defined during initialization and might remain constant. - -3. **Recovery from AgentMaxIterError**: Typically, when a maximum - iteration count is exceeded, the task that the agent was trying to - fulfill is considered unsuccessful. However, the way this case is - handled can vary. In some cases, it might be designed to attempt the - task again with adjusted parameters; in others, it could fail - outright and require manual intervention. - -To obtain the most accurate answers, consulting the specific -implementation details of the ``automata`` library or reaching out to -its developers would be recommended. 
+AgentMaxIterError +================= + +.. code:: python + + class AgentMaxIterError(AutomataError): + """An exception raised when the agent exceeds the maximum number of iterations.""" diff --git a/docs/agent/agent_result_error.rst b/docs/agent/agent_result_error.rst index 0a199e27e..828958c3d 100644 --- a/docs/agent/agent_result_error.rst +++ b/docs/agent/agent_result_error.rst @@ -1,17 +1,2 @@ -``AgentResultError`` is typically thrown when an agent in the Automata -project fails to deliver an expected outcome after performing its task. -This could be due to an internal error, a communication failure with -other interfaces, or reaching the maximum number of iterations without a -result. - -How the system recovers from this error greatly depends on the context -and the system design. In general, raising an ``AgentResultError`` would -be followed by diagnostic logging that provides useful information for -troubleshooting. The system might also catch this exception and attempt -a retry cycle, switch to a different agent, or notify the user about the -incident if the error persists. - -For a specific recovery process, developers should consider the -trade-off between system complexity and resilience. It’s crucial to -strike a balance to avoid overcomplicating the system or making it -excessively prone to errors. 
+AgentResultError +================ + +.. code:: python + + class AgentResultError(AutomataError): + """An exception raised when the agent fails to produce a result.""" diff --git a/docs/agent/agent_stop_iteration_error.rst b/docs/agent/agent_stop_iteration_error.rst new file mode 100644 index 000000000..1c5cf02c0 --- /dev/null +++ b/docs/agent/agent_stop_iteration_error.rst @@ -0,0 +1,2 @@ +AgentStopIterationError +======================= + +.. code:: python + + class AgentStopIterationError(AutomataError): + """An exception raised when the agent iteration process terminates.""" diff --git a/docs/agent/agent_toolkit_builder.rst b/docs/agent/agent_toolkit_builder.rst index 3fe5e30c7..d7960d2a2 100644 --- a/docs/agent/agent_toolkit_builder.rst +++ b/docs/agent/agent_toolkit_builder.rst @@ -1,94 +1,13 @@ -AgentToolkitBuilder -=================== +AgentToolkitBuilder +=================== + +.. code:: python + + class AgentToolkitBuilder(ABC): + """ + AgentToolkitBuilder is an abstract class for building tools for providers. + + Each builder builds the tools associated with a specific AgentToolkitNames. + """ + + TOOL_NAME: Optional[AgentToolkitNames] = None + LLM_PROVIDER: Optional[LLMProvider] = None -``AgentToolkitBuilder`` is an abstract class used for building tools for -various providers. These tools, once built, are associated with the -respective ``AgentToolkitNames``. - -Overview -------- - -The fundamental purpose of ``AgentToolkitBuilder`` is to offer a -standardized way to create a collection of tools that can be used with -different types of agents, as defined by the ``AgentToolkitNames``. - -Given the abstract nature of this class, it doesn’t instantiate any -object on its own, but outlines the requirements for -sub-classes/offspring of the ``AgentToolkitBuilder``. 
- -Related Symbols ---------------- - -Here are some related classes that build upon or interact with -``AgentToolkitBuilder``: - -- ``ContextOracleOpenAIToolkitBuilder`` -- ``SymbolSearchOpenAIToolkitBuilder`` -- ``PythonAgentToolkit`` -- ``OpenAIAgentToolkitBuilder`` -- ``PyWriterOpenAIToolkitBuilder`` - -Mandatory Methods ------------------ - -The ``AgentToolkitBuilder`` possesses an abstract method named -``build``: - -.. code:: python +:: @abstractmethod - def build(self) -> List[Tool]: + def build(self) -> List['Tool']: + 'Builds the tools associated with the `AgentToolkitBuilder`.' pass - -This method, once implemented in the subclasses, is expected to return a -list of ``Tool`` objects. - -Example -------- - -Let’s provide an example of a class ``PythonAgentToolkit`` which -inherits from ``AgentToolkitBuilder``. - -.. code:: python - - from automata.tools.base import Tool - - class PythonAgentToolkit: - def __init__(self, python_agent: PythonAgent): - self.python_agent = python_agent - - def build(self) -> List[Tool]: - def python_agent_python_task(): - pass - - tools = [ - Tool( - "automata-task", - python_agent_python_task, - "Execute a Python task using the PythonAgent. Provide the task description in plain English.", - ) - ] - return tools - -In this example, the subclass ``PythonAgentToolkit`` implements the -``build`` method to generate a list of ``Tool`` items. - -Limitations and Considerations ------------------------------- - -Since ``AgentToolkitBuilder`` is an abstract class, it should not be -instantiated directly. Instead, create a subclass that implements the -``build`` method. The usage and appropriateness of this class and its -subclasses will depend on the corresponding agent context where this -toolkit would be used. - -Follow-up Questions: --------------------- - -- Are there existing subclasses of ``AgentToolkitBuilder`` apart from - the ones identified? 
-- Are there any additional methods that could be part of the - ``AgentToolkitBuilder``, to be implemented by subclasses? -- Any specific structures to be maintained in the ``Tool`` objects - built by the subclasses? How are these ``Tool`` objects expected to - interact with agents? diff --git a/docs/agent/agent_toolkit_names.rst b/docs/agent/agent_toolkit_names.rst index 2304ed82f..40aee898d 100644 --- a/docs/agent/agent_toolkit_names.rst +++ b/docs/agent/agent_toolkit_names.rst @@ -1,28 +1,75 @@ -Being an enumeration class, ``AgentToolkitNames`` provides ease in using -toolkit identifiers for managing agent tools. While it simplifies the -process of tool selection in the current setup, it is inherently static -and **update to this enum class is required when new toolkits are -introduced or existing ones are removed**. - -In such a scenario, a new enum value corresponding to the new toolkit -must be added to ``AgentToolkitNames``, and a new builder class for the -toolkit must be defined in ``automata/core/agent/builder/``. - -If an enum in ``AgentToolkitNames`` doesn’t find a matching builder, -this would result in a **KeyError** at runtime when trying to access the -builder from the ``AgentToolkitBuilder.TOOL_TYPE`` dictionary. - -To avoid this, developers should ensure that all enum values within -``AgentToolkitNames`` have a corresponding builder class in the -``automata/core/agent/builder/`` directory. Implementation ideally needs -to include a check whenever new toolkits are added, to ensure that -associated enum and builder exist. - -On the same note, they should handle removal of toolkits cautiously to -avoid runtime errors. Deletion of any toolkit should include removal of -associated enum in ``AgentToolkitNames`` and deletion of the -corresponding builder class. - -These changes should ideally go hand-in-hand and should be a part of the -same commit in version control systems to avoid conflicts due to partial -updates. 
+AgentToolkitNames +================= + +``AgentToolkitNames`` is an enumerated class that defines the different +types of agent tools available. These names correspond to various types +of agent tools. This enum provides an easy way to identify an agent tool +through its name. + +The associated builders, which construct corresponding agent tools, can +be found in the ``automata/core/agent/builder/*`` directory. + +Overview +-------- + +``AgentToolkitNames`` is a ``Python Enum`` that provides symbolic names +to the agent tools used within the OpenAI Automata system. It helps in +maintaining a clean, clear enumeration and handling of agent toolkit +names. + +This enum consists of several members: + +- SYMBOL_SEARCH +- ADVANCED_CONTEXT_ORACLE +- DOCUMENT_ORACLE +- PY_READER +- PY_WRITER +- PY_INTERPRETER +- AGENTIFIED_SEARCH + +These names, when used, are replaced by their respective string values +``'symbol-search'``, ``'advanced-context-oracle'``, +``'document-oracle'``, ``'py-reader'``, ``'py-writer'``, +``'py-interpreter'``, and ``'agent-search'``. + +Related Symbols +--------------- + +- ``automata.agent.openai_agent.OpenAIAutomataAgent`` +- ``automata.tools.agent_tool_factory.AgentToolFactory`` +- ``automata.agent.openai_agent.OpenAIAgentToolkitBuilder`` +- ``automata.tools.tool_base.Tool`` +- ``automata.llm.providers.openai_llm.OpenAITool`` +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder`` +- ``automata.experimental.tools.builders.document_oracle_builder.DocumentOracleOpenAIToolkitBuilder`` +- ``automata.experimental.tools.builders.agentified_search_builder.AgentifiedSearchOpenAIToolkitBuilder`` + +Example +------- + +The following example demonstrates how to access one of the enumerations +in the AgentToolkitNames Enum. + +.. 
code:: python + + from automata.agent.agent import AgentToolkitNames + + agent_tool = AgentToolkitNames.SYMBOL_SEARCH + print(agent_tool) # outputs: AgentToolkitNames.SYMBOL_SEARCH + print(agent_tool.value) # outputs: 'symbol-search' + +Limitations +----------- + +While the ``AgentToolkitNames`` enum offers a convenient way to maintain +a list of agent toolkit names, one limitation is that the associated +agent tools need to be implemented and their builders need to be located +in the ``automata/core/agent/builder/*`` directory. + +Follow-up Questions: +-------------------- + +- Is it possible to dynamically add new agent tool names to this enum + at runtime? +- How are builders associated with the agent tools and how are they + retrieved given an ``AgentToolkitNames`` value? diff --git a/docs/agent/index.rst b/docs/agent/index.rst index 9f9332dcb..de955effa 100644 --- a/docs/agent/index.rst +++ b/docs/agent/index.rst @@ -35,6 +35,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. @@ -47,6 +49,7 @@ Summary of content agent_max_iter_error agent_result_error agent_stop_iteration + agent_stop_iteration_error agent_task_general_error agent_task_git_error agent_task_instructions diff --git a/docs/agent/instances/index.rst b/docs/agent/instances/index.rst index d06cf9f37..f7468ad1b 100644 --- a/docs/agent/instances/index.rst +++ b/docs/agent/instances/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/agent/open_ai_agent_toolkit_builder.rst b/docs/agent/open_ai_agent_toolkit_builder.rst index 836037eb6..099a1e85a 100644 --- a/docs/agent/open_ai_agent_toolkit_builder.rst +++ b/docs/agent/open_ai_agent_toolkit_builder.rst @@ -1,64 +1,86 @@ OpenAIAgentToolkitBuilder ========================= -``OpenAIAgentToolkitBuilder`` is an abstract class for building OpenAI -agent tools. 
It is used to define ``build_for_open_ai`` and -``can_handle`` as abstract methods. Developers intending to use OpenAI -for agents should subclass from ``OpenAIAgentToolkitBuilder`` and -provide implementations for these methods. - Overview -------- -Some classes that have implemented the ``OpenAIAgentToolkitBuilder`` -abstract class include -``automata.tools.builders.context_oracle.ContextOracleOpenAIToolkitBuilder``, -``automata.tools.builders.symbol_search.SymbolSearchOpenAIToolkitBuilder``, -and ``automata.tools.builders.py_writer.PyWriterOpenAIToolkitBuilder`` -among others. Every implementing class must define the -``build_for_open_ai`` method which returns a list of ``OpenAITool`` -objects, and the ``can_handle`` method which checks if the class can -handle a given tool manager. +``OpenAIAgentToolkitBuilder`` is an abstract class, which means it +cannot be instantiated directly. It is used to create robust classes +that manage and build tools for OpenAI agents. + +It is an essential piece in building tools for different types of +toolkits. For instance, builders for SymbolSearch, DocumentOracle, +AgentifiedSearch and Python code reading are subclasses of +``OpenAIAgentToolkitBuilder``. Each of these specific builders is able +to construct a list of appropriate ``OpenAITool`` objects, which can be +used by the OpenAI agents for various purposes such as code symbol +search, document search using oracles, or executing tasks using Python +code. + +The ``OpenAIAgentToolkitBuilder`` class defines two required methods: + +- ``build_for_open_ai()``: an abstract method that, when implemented, + builds and returns a list of ``OpenAITool`` objects. +- ``can_handle()``: a class method that checks if the builder matches + the expected tool manager type. 
Related Symbols --------------- -- ``automata.llm.providers.openai.OpenAITool`` -- ``automata.agent.agent.AgentToolkitBuilder`` -- ``automata.tests.unit.test_automata_agent_builder.test_builder_creates_proper_instance`` -- ``automata.tools.builders.context_oracle.ContextOracleOpenAIToolkitBuilder`` -- ``automata.tools.builders.symbol_search.SymbolSearchOpenAIToolkitBuilder`` +- ``automata.llm.providers.openai_llm.OpenAITool``: A class that + represents a tool which can be used by the OpenAI agent. +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder``: + A class for interacting with the SymbolSearch API, to search indexed + python codebase. +- ``automata.tools.agent_tool_factory.AgentToolFactory``: The class is + responsible for creating tools from a given agent tool name. +- ``automata.tools.agent_tool_factory.AgentToolFactory.create_tools_from_builder``: + A static method that uses the Builder Registry to create tools from a + given agent tool name. +- ``automata.agent.agent.AgentToolkitNames``: An enum for the different + types of agent tools. Example ------- +Due to the fact that ``OpenAIAgentToolkitBuilder`` is an abstract class +and cannot be instantiated on its own, we will show the instantiation of +its subclass ``SymbolSearchToolkitBuilder``: + .. 
code:: python - from automata.tools.builders.context_oracle import ContextOracleOpenAIToolkitBuilder + from automata.experimental.tools.builders.symbol_search_builder import SymbolSearchToolkitBuilder + from automata.llm.providers.symbol_search import SymbolSearch - class MyOpenAIToolkitBuilder(ContextOracleOpenAIToolkitBuilder): - TOOL_TYPE = "my-type" + # initialize symbol search + symbol_search = SymbolSearch() - def build_for_open_ai(self): - # Create a list of OpenAITools here - openai_tools = [OpenAITool(...), OpenAITool(...)] - return openai_tools + # initialize builder + symbol_search_builder = SymbolSearchToolkitBuilder(symbol_search=symbol_search) -The above example creates a subclass of -``ContextOracleOpenAIToolkitBuilder`` that builds OpenAI tools for the -``"my-type"`` toolkit. +The ``build`` method can then be called on the ``symbol_search_builder`` +object to construct a set of tools for that purpose: + +.. code:: python + + tools = symbol_search_builder.build() Limitations ----------- -As ``OpenAIAgentToolkitBuilder`` is an abstract class, it cannot be -instantiated directly and must be subclassed with implementation -provided for the ``build_for_open_ai`` and ``can_handle`` methods. +The ``OpenAIAgentToolkitBuilder`` itself is an abstract class, which +requires it to be subclassed and its abstract methods to be implemented +in order to be fully utilized. This design may limit the flexibility for +direct usage. + +A further consequence is that the behaviour of abstract methods such as +``build_for_open_ai`` depends entirely on what is defined in +subclasses. This implies that the tasks that the created tools can +perform are limited to the responsibilities defined by the subclasses. -Follow-up Questions: --------------------- +Follow-up Questions +------------------- -- What happens if the subclass does not provide implementation for the - ``build_for_open_ai`` and ``can_handle`` methods? 
-- Can there be multiple ways to implement the ``build_for_open_ai`` - method or is there a particular way it should be done? +- What other concrete builders can be created by inheriting + ``OpenAIAgentToolkitBuilder``? +- How may additional functionality be added to the builder without + changing the current subclasses? diff --git a/docs/agent/open_ai_automata_agent.rst b/docs/agent/open_ai_automata_agent.rst index f7e57a50d..4be4ce83c 100644 --- a/docs/agent/open_ai_automata_agent.rst +++ b/docs/agent/open_ai_automata_agent.rst @@ -1,23 +1,77 @@ -- The ``OpenAIAutomataAgent`` uses internal mechanisms to handle errors - or timeouts. If the OpenAI API reports an error or timeout, the agent - captures this as an exception and handles it accordingly in the host - system. The exact handling might depend on the nature of the error. - For instance, the program may choose to retry the request, skip to - the next iteration, or terminate the process based on the error. - -- Extending the ``OpenAIAutomataAgent`` could potentially involve - augmenting it to handle multi-stage tasks, tasks with more complex - branching logic, or incorporating additional functionality such as - sending emails, interacting with databases, etc. This would entail - extending the class and adding new methods or modifying the existing - ``_run_iteration()`` method. However, such changes would need to - consider the implications on token usage, runtime, and other resource - constraints. - -- The tools in the context of ``OpenAIAutomataAgent`` refer to a set of - helper structures or external resources that assist the agent in - executing its instruction. For instance, these tools could include - APIs, databases, file systems, or other assets that the agent uses to - perform its tasks. They can extend the capabilities of the agent - beyond simple message generation, allowing it to interact with - external systems and execute more complex tasks. 
+OpenAIAutomataAgent +=================== + +``OpenAIAutomataAgent`` is a specific type of ``Agent`` tailored for +executing tasks using the OpenAI engine. This autonomous agent takes +instructions, performs actions based on them, and reports the results. +The interactions with the various tools are managed via the OpenAI API +for generating responses. + +Overview +-------- + +An instance of ``OpenAIAutomataAgent`` is initialized with a set of +``instructions`` along with a ``config`` object of type +``OpenAIAutomataAgentConfig`` detailing the operational parameters of +the agent. During its life cycle, the agent executes a series of +iterations, each of which consists of generating a new assistant +message, processing it, and incrementing an iteration counter. The agent +also manages a conversation with the OpenAI API, storing the history of +user and assistant messages. + +Usage Example +------------- + +In this sample, an ``OpenAIAutomataAgent`` is created with a specific +set of ``instructions`` and a configuration object: + +:: + + from automata.agent.openai_agent import OpenAIAutomataAgent + from automata.config.openai_config import OpenAIAutomataAgentConfigBuilder + + instructions = 'Translate the following English text to French: {TEXT_TO_TRANSLATE}.' + config = OpenAIAutomataAgentConfigBuilder.from_name('automata').with_stream(False).with_system_template_formatter({}).build() + + agent = OpenAIAutomataAgent(instructions, config) + agent.run() + +After instantiating the ``OpenAIAutomataAgent``, the ``run`` method can +be called to start executing the task. 
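The ``{TEXT_TO_TRANSLATE}`` placeholder in the sample instructions must be filled in before the agent runs; plain ``str.format`` suffices for that step. This substitution is shown purely as an illustration and is not a documented library API:

```python
# Fill the placeholder in the instruction template before building the agent.
instructions_template = 'Translate the following English text to French: {TEXT_TO_TRANSLATE}.'
instructions = instructions_template.format(TEXT_TO_TRANSLATE='Good morning')
print(instructions)  # Translate the following English text to French: Good morning.
```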
+ +Related Symbols +--------------- + +- ``automata.config.openai_config.OpenAIAutomataAgentConfig`` +- ``automata.llm.providers.openai_llm.OpenAITool`` +- ``automata.llm.providers.openai_llm.OpenAIChatMessage`` +- ``automata.memory_store.conversation_database_providers.OpenAIAutomataConversationDatabase`` +- ``automata.agent.agent.Agent`` +- ``automata.tasks.task_registry.AutomataTaskRegistry`` +- ``automata.tasks.automata_task.AutomataTask`` + +Limitations +----------- + +- While ``OpenAIAutomataAgent`` supports executing larger tasks + interactively over multiple iterations, exceeding the maximum number + of iterations or tokens raises an ``AgentStopIterationError``. +- Interactions with tools are currently only executed sequentially, + without support for hierarchical or parallel invocation of tools or + asynchronous tool execution. +- The agent only supports a conversation model where the user and + assistant take turns in communicating, with the assistant always + responding to the user’s prompts or responses to previous assistant + messages. + +Follow-up Questions: +-------------------- + +- Can ``OpenAIAutomataAgent`` be customized or extended for different + interaction models with the assistant? +- Very long instructions are currently truncated while generating + status notes. Would deeper support for long instructions be + beneficial? +- How would ``OpenAIAutomataAgent`` be used in a live, interactive + setting with real users providing inputs to the agent, as opposed to + pre-set instructions? 
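Because exceeding the iteration or token budget raises an ``AgentStopIterationError`` (per the limitations above), callers may want a defensive retry wrapper. The sketch below is illustrative only: the exception is redefined locally as a stand-in, and the retry policy is an assumption, not library behaviour:

```python
class AgentStopIterationError(Exception):
    """Local stand-in for the library error raised when iteration limits are hit."""

def run_with_retries(run_once, max_attempts: int = 3):
    """Invoke a zero-argument agent runner, retrying when the iteration budget is exceeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except AgentStopIterationError:
            if attempt == max_attempts:
                raise
            # A real caller might rebuild the agent with a larger
            # iteration budget before the next attempt.

attempts = {"count": 0}

def flaky_run():
    """Fails once with AgentStopIterationError, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise AgentStopIterationError("iteration budget exceeded")
    return "final result"

print(run_with_retries(flaky_run))  # final result
```

Whether retrying, switching agents, or surfacing the error to the user is appropriate depends on the surrounding system design.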
diff --git a/docs/cli/custom_logger.rst b/docs/cli/custom_logger.rst new file mode 100644 index 000000000..511587296 --- /dev/null +++ b/docs/cli/custom_logger.rst @@ -0,0 +1,12 @@ +CustomLogger +============ + +.. code:: python + + class CustomLogger(logging.Logger): + """A custom logger which adheres to input specifications.""" + + def __init__(self, name, level=logging.NOTSET): + super().__init__(name, level) + + def cli_output(self, message: str, *args, **kwargs) -> None: + """Logs a message at the CLI_OUTPUT level.""" + if self.isEnabledFor(CLI_OUTPUT_LEVEL): + self._log(CLI_OUTPUT_LEVEL, message, args, **kwargs) diff --git a/docs/cli/index.rst b/docs/cli/index.rst new file mode 100644 index 000000000..69fea070a --- /dev/null +++ b/docs/cli/index.rst @@ -0,0 +1,23 @@ +cli +=== + +**Automata** is a Python library for autonomous providers. + +Check out the :doc:`usage` section for further information, including +how to :ref:`installation` the project. + + + +.. AUTO-GENERATED CONTENT START +.. + + .. toctree:: + :maxdepth: 1 + + custom_logger + +.. AUTO-GENERATED CONTENT END +.. + + + diff --git a/docs/code_handling/index.rst b/docs/code_handling/index.rst index 812d0069b..f8c7d6643 100644 --- a/docs/code_handling/index.rst +++ b/docs/code_handling/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/code_handling/py/index.rst b/docs/code_handling/py/index.rst index f3f49ea6a..efa1bfbf7 100644 --- a/docs/code_handling/py/index.rst +++ b/docs/code_handling/py/index.rst @@ -21,6 +21,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/code_handling/py/writer/index.rst b/docs/code_handling/py/writer/index.rst index 831c99472..309fa4987 100644 --- a/docs/code_handling/py/writer/index.rst +++ b/docs/code_handling/py/writer/index.rst @@ -19,6 +19,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. 
diff --git a/docs/code_parsers/directory.rst b/docs/code_parsers/directory.rst index 96800f6ed..795110d1d 100644 --- a/docs/code_parsers/directory.rst +++ b/docs/code_parsers/directory.rst @@ -1,22 +1,75 @@ -- Implementing a recursive retrieval of files and subdirectories could - be done with a simple recursive function. If a directory contains - other directories, the function would call itself on those - subdirectories. For each subdirectory, the function will list its - files and subdirectories and then call itself again and so forth. -- The addition of a remove function depends on the expected use cases - of this class. If the directory class is simply meant to read the - structure of a pre-defined and unchanging directory, there is no need - for a removal function. However, if the class is meant to manage and - manipulate a directory’s structure, a removal function would be - important. -- The handling of symbolic links depends on the specific requirements - of the project. It would likely be user defined – some users may need - to treat them as files while others may need to treat them as - directories. -- For file/directory permissions or inaccessible files/directories, the - class should handle these cases gracefully. This could include - catching relevant exceptions and then either ignoring inaccessible - files/directories, logging an error message, or potentially even - asking the user for credentials to access these files/directories. - The specific implementation will largely depend on the needs of the - users. +Directory +========= + +Overview +-------- + +The ``Directory`` class represents a directory in a file tree system +within the context of an automata software. A directory is a container +for various types of nodes that can be other directories or files, which +can be nested to form a hierarchical structure. 
The ``Directory`` class +inherits from the ``Node`` class and includes additional methods for +managing child nodes, checking the type of the directory, and getting +all file names or subdirectory names. + +Features +-------- + +- Support for adding child nodes to a directory via the ``add_child()`` + method. +- Ability to check whether the directory is the root directory + (``is_root_dir()``) or a leaf directory (``is_leaf_dir()``). +- Provides methods to fetch the names of all files (``get_file_names()``) + or subdirectories (``get_subdirectories()``) in the directory. + +Related Symbols +--------------- + +- ``automata.code_parsers.directory.DirectoryManager`` +- ``automata.code_parsers.directory.File`` +- ``automata.code_parsers.directory.Node`` + +Example +------- + +Here is an example of how to create and manage a ``Directory`` +object. + +.. code:: python + + from automata.code_parsers.directory import Directory, File + + base_dir = Directory('base') + child_dir = Directory('child', base_dir) + base_dir.add_child(child_dir) + + test_file = File('test.txt', child_dir) + child_dir.add_child(test_file) + + print(base_dir.get_subdirectories()) # ['child'] + print(child_dir.get_file_names()) # ['test.txt'] + print(base_dir.is_root_dir()) # True + print(child_dir.is_leaf_dir()) # True + +This example creates a base directory as the root and then creates a child +directory within it. It adds a test file to the child directory, and +then verifies that the child directory and test file exist under ‘base’ and +‘child’ respectively. It also checks that ‘base’ is the root directory and +that the child directory is a leaf directory. + +Limitations +----------- + +The ``Directory`` class does not provide direct methods for file +operations such as reading, writing or deleting files, and has no provisions for +error handling in case of invalid operations such as duplicating child node +names.
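The missing duplicate-name handling noted in the Limitations can be guarded against by the caller. The sketch below relies only on the documented ``add_child()``, ``get_file_names()`` and ``get_subdirectories()`` methods; the ``Node``, ``Directory`` and ``File`` classes here are minimal stand-ins for illustration, not the actual ``automata.code_parsers.directory`` implementations.

```python
from typing import List, Optional


class Node:
    # Minimal stand-in for automata.code_parsers.directory.Node
    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent


class File(Node):
    # Stand-in for a file node
    pass


class Directory(Node):
    # Stand-in exposing the documented child-management methods
    def __init__(self, name: str, parent: Optional[Node] = None) -> None:
        super().__init__(name, parent)
        self.children: List[Node] = []

    def add_child(self, child: Node) -> None:
        self.children.append(child)

    def get_file_names(self) -> List[str]:
        return [c.name for c in self.children if isinstance(c, File)]

    def get_subdirectories(self) -> List[str]:
        return [c.name for c in self.children if isinstance(c, Directory)]


def add_child_safely(directory: Directory, child: Node) -> None:
    """Reject duplicate child names before delegating to add_child()."""
    taken = set(directory.get_file_names()) | set(directory.get_subdirectories())
    if child.name in taken:
        raise ValueError(f"{child.name!r} already exists in {directory.name!r}")
    directory.add_child(child)
```

Wrapping ``add_child()`` this way keeps the class itself unchanged while making name collisions an explicit error at the call site.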
+ +Follow-up Questions: +-------------------- + +- What happens if we try to add two children with the same name to a + directory? +- How does the ``Directory`` class handle symbolic links? +- Does the ``Directory`` class support operations on hidden files and + directories? diff --git a/docs/code_parsers/directory_manager.rst b/docs/code_parsers/directory_manager.rst index 6f298faa2..22de9ae89 100644 --- a/docs/code_parsers/directory_manager.rst +++ b/docs/code_parsers/directory_manager.rst @@ -1,14 +1,88 @@ -- DirectoryManager doesn’t appear to handle threadsafety by itself. For - multi-threaded scenarios where the same directory could potentially - be accessed by different threads, synchronization mechanisms might - need to be employed outside of DirectoryManager to prevent issues - like race conditions. - -- As for the limitations on the size of the directory or depth of the - subdirectories, the main constraints would likely come from the file - system and the resources of the machine it’s running on. The Python - code itself doesn’t seem to place any explicit limits. Large - directories or deep nesting could potentially slow down operations, - and extremely large amounts might cause problems like stack - overflows. These situations could be mitigated by using methods like - iterative deepening if they become an issue. +DirectoryManager +================ + +Overview +-------- + +``DirectoryManager`` is a class designed to handle operations related to +directory structures. It provides a convenient interface for loading a +directory structure into memory as a structured object, getting file +names in a directory, getting subdirectories within a directory, +ensuring a directory exists by creating it if necessary, all by +manipulating and traversing the structured object. + +Initialising the DirectoryManager object involves loading a directory +structure from a specified base path. 
The loaded directory structure is +then available for various operations such as fetching the list of files +or subdirectories. + +Related Symbols +--------------- + +- ``_load_directory_structure``: a private method to load the directory + structure into objects. +- ``get_files_in_dir``: a method to get a list of files in the given + directory. +- ``get_subdirectories``: a method to list subdirectories in the given + directory. +- ``ensure_directory_exists``: a method to create a directory if it + does not exist already. +- ``_get_node_for_path``: a utility method to find the node corresponding + to a given path. + +Usage Example +------------- + +Assuming a directory structure where ‘root_dir’ is the root directory +and ‘sub_dir1’ and ‘sub_dir2’ are subdirectories: + +:: + + root_dir + | + |------sub_dir1 + | | + | |--file1.txt + | + |------sub_dir2 + | + |--file2.txt + +The following example shows how to use the ``DirectoryManager``: + +.. code:: python + + from automata.code_parsers.directory import DirectoryManager + + # Initialise DirectoryManager with a base directory + mgr = DirectoryManager('root_dir') + + # Get list of files in a directory + files_in_subdir1 = mgr.get_files_in_dir('sub_dir1') # returns ['file1.txt'] + + # Get list of subdirectories in a directory + subdirs_in_root = mgr.get_subdirectories('root_dir') # returns ['sub_dir1', 'sub_dir2'] + + # Ensure a directory exists + mgr.ensure_directory_exists('/root_dir/sub_dir3') # Creates 'sub_dir3' if it doesn't exist + +Limitations +----------- + +``DirectoryManager`` reads directories synchronously and may not be +ideal for large directory structures due to performance issues. Also, +changes to the file system aren’t automatically reflected by the +``DirectoryManager`` instance, unless ``_load_directory_structure`` is +called after the changes. No measures are built in to handle issues +with file permissions or broken symbolic links.
Additionally, the class +is designed to operate only locally and does not support operations over +networked file systems. + +Follow-up Questions: +-------------------- + +- How can we extend ``DirectoryManager`` to support networked file + systems? +- Would it be possible to update the loaded directory structure in + real-time without having to manually call + ``_load_directory_structure`` after every change? diff --git a/docs/code_parsers/file.rst b/docs/code_parsers/file.rst index e57455907..224a50f6a 100644 --- a/docs/code_parsers/file.rst +++ b/docs/code_parsers/file.rst @@ -1,40 +1,6 @@ -1. FileNotFondError exceptions during File object initialization could - be handled in the constructor of the File class. Exceptions handling - can be used to check if a file exists before trying to open it. If - the file doesn’t exist, the function can either create it or alert - the user that the file wasn’t found. The appropriate action will - depend on the specific use-case requirements. +class File(Node): ‘Represents a file in the tree’ -.. code:: python +:: - import os - - class File: - def __init__(self, name, parent=None): - if not os.path.isfile(name): - raise FileNotFoundError(f"The file {name} does not exist.") - self.name = name - self.parent = parent - -2. To handle more advanced file and directory interactions, additional - methods could be added to the File class. Some potential enhancements - could include: - -- A method to read the file’s contents. -- Methods to change or retrieve the file’s permissions. -- A method to rename or move the file. -- A method to delete the file. -- Attribute to store the file’s size. -- Attribute to store the file’s creation, modification, and access - times. - -Each of these functionalities would require using appropriate system -calls, like ``os`` or ``shutil`` modules in Python. 
- -Note that if more advanced functionalities are needed, it might be more -appropriate to use or build upon existing libraries designed to interact -with the file system in a more comprehensive way, such as ``os``, -``shutil``, or ``pathlib`` in Python. - -Remember also to always consider security implications when dealing with -file operations, and properly handle any potential exceptions. + def __init__(self, name: str, parent: Optional['Node']=None) -> None: + super().__init__(name, parent) diff --git a/docs/code_parsers/index.rst b/docs/code_parsers/index.rst index 8737721f1..8f6074777 100644 --- a/docs/code_parsers/index.rst +++ b/docs/code_parsers/index.rst @@ -21,6 +21,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/code_parsers/node.rst b/docs/code_parsers/node.rst index d205ba87e..9e661345a 100644 --- a/docs/code_parsers/node.rst +++ b/docs/code_parsers/node.rst @@ -1,21 +1,7 @@ -1. ``Node`` provides a template that all derived classes must adhere to. - By defining required attributes and methods in the ``Node`` abstract - base class, it ensures any derived classes will have the necessary - functionality to be used as a node in the file tree. However, - concrete methods in the derived classes can be varied as per the - unique requirement of these classes, provided they do not violate the - structure outlined by ``Node``. +class Node(): ‘Abstract base class for a node in the file tree’ -2. The parent-child relationship between nodes is typically maintained - through references. Each ``Node`` has a ``parent`` attribute which - holds a reference to its parent node. When a ``Node`` is created, it - is assigned a ``parent``, and it may also add itself to the parent’s - list of children (if such functionality is implemented in the child - class). This creates a two-way link between the parent and child - nodes. 
When a ``Node`` is deleted, these links are typically also - removed to ensure the integrity of the tree. +:: - However, as the ``Node`` class is an abstract base class, it does not - directly handle these relationships - this would be the - responsibility of the concrete classes that inherit from ``Node``, - such as ``Directory`` or ``File``. + def __init__(self, name: str, parent: Optional['Node']=None) -> None: + self.name = name + self.parent = parent diff --git a/docs/code_parsers/py/context_processing/index.rst b/docs/code_parsers/py/context_processing/index.rst index 2aa010ce8..8cd2e62cf 100644 --- a/docs/code_parsers/py/context_processing/index.rst +++ b/docs/code_parsers/py/context_processing/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/code_parsers/py/dot_path_map.rst b/docs/code_parsers/py/dot_path_map.rst index 595479c2b..79f9d70dd 100644 --- a/docs/code_parsers/py/dot_path_map.rst +++ b/docs/code_parsers/py/dot_path_map.rst @@ -1,24 +1,89 @@ -- For the first question, as ``DotPathMap`` is based on the filesystem, - it may not handle file or directory renaming or movements - automatically. Once a module is renamed or moved to a different - location, the mapper will lose track of it. If you are modifying the - codebase, you would likely need to re-instantiate the ``DotPathMap`` - to rebuild the map with the updated module structure. - -- As for the second question, based on the provided information, it - isn’t clear if ``DotPathMap`` directly supports live updates to - namespace changes in the filesystem. Given that it’s reliant on the - filesystem’s structure, it may not automatically track dynamic - changes including module additions, deletions or modifications. Any - such changes would likely require a re-instantiation to update the - map. 
- -- For the third question, a possible solution could be having a - separate ``DotPathMap`` instance for each application that references - the global module. However, it’s not stated explicitly how global or - installation-wide modules are handled. This would likely depend on - where these modules reside in the filesystem and how they’re - referenced within individual applications. Changes to these global - modules would need careful coordination to ensure all referencing - applications update their instances of ``DotPathMap`` to reflect - accurate paths. +DotPathMap +========== + +Overview +-------- + +``DotPathMap`` is a class that creates a mapping from module dot paths +to module file paths. The aim of this class is to facilitate the +efficient retrieval and manipulation of files within a project or module +hierarchy structure. This is achieved by creating a dictionary where the +keys are module dot paths and the values are corresponding file paths. + +The class initializes with a provided file path and project name, +converting the project name into a prefix by replacing path separators +with dots. It then builds two dictionaries to store the dot path to file +path mapping and vice versa. + +There are also methods to add (``put_module``) or remove +(``delete_module``) module information in these dictionaries. + +Related Symbols +--------------- + +- ``automata.code_parsers.py.dotpath_map.DotPathMap.get_module_fpath_by_dotpath`` +- ``automata.code_parsers.py.dotpath_map.DotPathMap.items`` +- ``automata.code_parsers.py.dotpath_map.DotPathMap.contains_dotpath`` +- ``automata.code_parsers.py.dotpath_map.DotPathMap.contains_fpath`` +- ``automata.code_parsers.py.dotpath_map.DotPathMap.get_module_dotpath_by_fpath`` +- ``automata.code_parsers.py.dotpath_map.convert_fpath_to_module_dotpath`` + +Example +------- + +The following is an example demonstrating the usage of ``DotPathMap``.
+Assume a project structure as follows: + +:: + + my_project + │ main.py + │ + └───core + │ │ calculator.py + │ │ calculator2.py + │ │ + │ └───extended + │ │ calculator3.py + │ + └───utils + │ util1.py + +Now, to create a ``DotPathMap`` instance for this project structure, and +perform some operations: + +.. code:: python + + from automata.code_parsers.py.dotpath_map import DotPathMap + + # Create an instance of DotPathMap + dotpath_map = DotPathMap(path='/path/to/my_project', project_name='my_project') + + # Fetch file path of a module using dot path + file_path = dotpath_map.get_module_fpath_by_dotpath('my_project.core.calculator') + print(file_path) # Output: /path/to/my_project/core/calculator.py + + # Check if a dot path exists in the map + exists = dotpath_map.contains_dotpath('my_project.core.calculator') + print(exists) # Output: True + + # Remove a module using dot path + dotpath_map.delete_module('my_project.core.calculator') + + # Add a module using dot path + dotpath_map.put_module('my_project.new_module.new_file') + +Limitations +----------- + +The ``DotPathMap`` class assumes all files in the project are Python +files (‘.py’). It can’t map other file types. + +Follow-up Questions: +-------------------- + +- Will the design allow for other file types besides ‘.py’? +- What happens if identical module names exist in different + directories? +- Is there a way to reload the DotPathMap in the event of changes to + the filesystem? diff --git a/docs/code_parsers/py/index.rst b/docs/code_parsers/py/index.rst index 4514da013..7da50f17c 100644 --- a/docs/code_parsers/py/index.rst +++ b/docs/code_parsers/py/index.rst @@ -23,6 +23,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. @@ -36,6 +38,7 @@ Summary of content dot_path_map line_item py_context_retriever + py_reader context_processing/index .. 
AUTO-GENERATED CONTENT END diff --git a/docs/code_parsers/py/py_reader.rst b/docs/code_parsers/py/py_reader.rst new file mode 100644 index 000000000..dfb3ee2f3 --- /dev/null +++ b/docs/code_parsers/py/py_reader.rst @@ -0,0 +1,75 @@ +PyReader +======== + +Overview +-------- + +The ``PyReader`` class is designed to fetch Python code from specified +modules, classes, or methods. It provides comprehensive tools for +scanning and extracting requested information, and supports a range of +operations including fetching source code, extracting docstrings, and +retrieving source code without docstrings for specified sections of +code. + +The class also includes comparison operations and contains static +utility methods for extracting docstrings from specific AST nodes. + +Related Symbols +--------------- + +- ``automata.code_writers.py.py_code_writer.PyCodeWriter`` : A utility + class for writing Python code along AST nodes +- ``automata.tools.builders.py_reader_builder.PyReaderToolkitBuilder``: + A builder for a toolkit associated with directly retrieving python + code. +- ``automata.symbol.symbol_parser.parse_symbol``: Parses a ``Symbol`` + given a ``Symbol`` URI. +- ``automata.code_writers.py.py_code_writer.PyCodeWriter.__init__``: + Initializer for PyCodeWriter. +- ``automata.tools.builders.py_reader_builder.PyReaderToolkitBuilder.__init__``: + Initializer for PyReaderToolkitBuilder. +- ``automata.symbol.symbol_base.Symbol``: Base class for symbols. +- ``automata.code_writers.py.py_doc_writer.PyDocWriter``: Class for + writing documentation for Python modules. +- ``automata.experimental.code_parsers.py.context_processing.context_retriever.ContextComponent``: + Enum class representing context components. +- ``automata.tools.tool_base.Tool``: Exposes a function or coroutine + directly. +- ``automata.eval.agent.code_writing_eval.CodeWritingAction``: + Represents written code. 
+ +Example +------- + +Below is an example of how to use the ``PyReader`` class to retrieve the +source code and docstring from a module. + +.. code:: python + + from automata.code_parsers.py.py_reader import PyReader + + # Initialize the PyReader class + pyreader = PyReader() + + # Get the source code from a specific module + source_code = pyreader.get_source_code('sample_module') + + # Get the docstring of a specific class in a module + docstring = pyreader.get_docstring('sample_module', 'SampleClass') + +Limitations +----------- + +While ``PyReader`` provides extensive functionality for extracting +Python code and metadata, it is reliant on the specific syntax tree +structure of Python and as such may not correctly interpret non-standard +or complex code structures. + +Follow-up Questions: +-------------------- + +- Is it possible to expand ``PyReader`` to account for non-standard + Python structures? +- How does ``PyReader`` handle errors or exceptions when the specified + code cannot be found? Are there specific tools for debugging such + situations with ``PyReader``? diff --git a/docs/code_writers/index.rst b/docs/code_writers/index.rst index 1430a3fc6..b70a2330b 100644 --- a/docs/code_writers/index.rst +++ b/docs/code_writers/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/code_writers/py/code_writer/index.rst b/docs/code_writers/py/code_writer/index.rst index a8c46ecb9..56dfbc86f 100644 --- a/docs/code_writers/py/code_writer/index.rst +++ b/docs/code_writers/py/code_writer/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/code_writers/py/index.rst b/docs/code_writers/py/index.rst index 0b0eb9b78..5077ddcca 100644 --- a/docs/code_writers/py/index.rst +++ b/docs/code_writers/py/index.rst @@ -8,12 +8,16 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. .. 
toctree:: :maxdepth: 1 + py_code_writer + py_doc_writer code_writer/index py_code_writer/index diff --git a/docs/code_writers/py/py_code_writer.rst b/docs/code_writers/py/py_code_writer.rst new file mode 100644 index 000000000..da519a041 --- /dev/null +++ b/docs/code_writers/py/py_code_writer.rst @@ -0,0 +1,88 @@ +PyCodeWriter +============ + +``PyCodeWriter`` is a Python utility class designed to interpret and +manipulate Python code in the form of Abstract Syntax Tree (AST) nodes. +It enables creating, updating, deleting, and writing Python modules +using AST, providing an interface for programmatically generating and +modifying Python source code. + +Overview +-------- + +``PyCodeWriter`` uses the Python built-in ``ast`` module and +interactions with ``py_module_loader``, an instance of the +``PyModuleLoader`` class, for operations. This class consists of methods +that map to typical file operations. Unique exceptions labeled +``ModuleNotFoundError``, ``StatementNotFoundError`` and +``InvalidArgumentsError``, each provide specific error handling for +common pitfalls. + +Related Symbols +--------------- + +- ``PyCodeWriter.ModuleNotFoundError`` +- ``PyCodeWriter.StatementNotFoundError`` +- ``PyCodeWriter.InvalidArgumentsError`` +- ``automata.tools.builders.py_writer_builder.PyCodeWriterToolkitBuilder`` +- ``automata.singletons.dependency_factory.DependencyFactory`` + +Example Usage +------------- + +.. code:: python + + from automata.singletons.dependency_factory import DependencyFactory + from automata.code_writers.py.py_code_writer import PyCodeWriter + from ast import parse + + # Create an instance of PyCodeWriter + dep_factory = DependencyFactory() + py_writer = dep_factory.create_py_writer() + + # Create a new module with a function "foo" + source_code = """ + def foo(): + return 'Hello, world!' 
+ """ + + py_writer.create_new_module('sample_module', parse(source_code), do_write=True) + + # Update the module "foo" function logic + source_code_update = """ + def foo(): + return 'Hello from updated world!' + """ + py_writer.upsert_to_module( + parse(source_code), + parse(source_code_update) + ) + +In this example, the method ``create_new_module`` was used to create a +new Python module ``sample_module`` with a function ``foo``. Following +this, the function ``foo``\ ’s logic was updated with +``upsert_to_module`` to change its return string. + +Limitations +----------- + +``PyCodeWriter`` has strong dependencies on the project and file +structure. It requires the modules to be setup in a specific way. As +such, it may not work accurately if the project structure is not aligned +with its expectations. + +Also, ``PyCodeWriter`` heavily relies on the ``ast`` module. It would +not be effective for changes not supported by the ``ast`` module. + +In certain operations, such as ``delete_from_module``, there’s a need +for deletion items to already exist in the module, else it will throw an +error. Care must be taken to ensure the preconditions for each operation +are met before execution. + +Follow-up Questions: +-------------------- + +- Are there safeguards for handling common user errors such as + attempting to modify a non-existent file or node? +- How can this class be extended/modified to support different file and + module structures? diff --git a/docs/code_writers/py/py_code_writer/index.rst b/docs/code_writers/py/py_code_writer/index.rst index 5f0b51745..5cb4be29d 100644 --- a/docs/code_writers/py/py_code_writer/index.rst +++ b/docs/code_writers/py/py_code_writer/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. @@ -15,8 +17,10 @@ how to :ref:`installation` the project. :maxdepth: 1 invalid_arguments + invalid_arguments_error module_not_found statement_not_found + statement_not_found_error .. 
AUTO-GENERATED CONTENT END .. diff --git a/docs/code_writers/py/py_code_writer/invalid_arguments_error.rst b/docs/code_writers/py/py_code_writer/invalid_arguments_error.rst new file mode 100644 index 000000000..a09d2763e --- /dev/null +++ b/docs/code_writers/py/py_code_writer/invalid_arguments_error.rst @@ -0,0 +1,2 @@ +class InvalidArgumentsError(Exception): ‘Raised when invalid arguments +are passed to a method’ pass diff --git a/docs/code_writers/py/py_code_writer/statement_not_found_error.rst b/docs/code_writers/py/py_code_writer/statement_not_found_error.rst new file mode 100644 index 000000000..1e1b440f6 --- /dev/null +++ b/docs/code_writers/py/py_code_writer/statement_not_found_error.rst @@ -0,0 +1,2 @@ +class StatementNotFoundError(Exception): ‘Raised when a provided +ast.Statement is not found in the module’ pass diff --git a/docs/code_writers/py/py_doc_writer.rst b/docs/code_writers/py/py_doc_writer.rst new file mode 100644 index 000000000..62ea5cb6a --- /dev/null +++ b/docs/code_writers/py/py_doc_writer.rst @@ -0,0 +1,81 @@ +PyDocWriter +=========== + +``PyDocWriter`` is a Python class built to facilitate automated +documentation writing for Python modules. It is capable of generating +RestructuredText (``.rst``) files for each module, creating an +``index.rst`` file for each directory that contains subdirectories or +``.rst`` files, and producing a summary of the whole module. + +Overview +-------- + +The ``PyDocWriter`` class simplifies the process of generating +module-level summaries and documentation. Given a base path of a Python +project, the class scans through each directory and generates ``.rst`` +files using the content read from each of these files. Summaries are +created not just for individual modules, but also for entire directories +based on the existing ``.rst`` files. The class also ensures that the +directory structure is updated appropriately during this process. 
+ +Some key features of this class include: + +- **Flexibility**: ``PyDocWriter`` can work with any Python project + simply by supplying the project’s base path during the class + instantiation. +- **Efficiency**: The class scans and processes multiple directories + and files in a project concurrently, resulting in faster + documentation. +- **Detailed documentation**: ``PyDocWriter`` generates not only + individual ``.rst`` files for each module but also an ``index.rst`` + file that serves as a summary documentation for all items within a + directory. + +Related Symbols +--------------- + +- ``Symbol``: Represents a detected Python code symbol in a Python + module or script for which documentation is to be generated. +- ``SymbolDocEmbedding``: A data structure representing an embedded + documentation of a ``Symbol``. +- ``DirectoryManager``: A utility class used by ``PyDocWriter`` for + directory management. + +Usage Example +------------- + +The following example demonstrates how to use the ``PyDocWriter`` class: + +.. code:: python + + from automata.code_writers.py.py_doc_writer import PyDocWriter + + docs_dir = '/path/to/docs' + base_path = '/path/to/project' + symbols = [] # This would ordinarily be a list of Symbol instances + docs = {} # This would ordinarily be a mapping of Symbol instances to SymbolDocEmbedding instances + + writer = PyDocWriter(base_path) + writer.write_documentation(docs, symbols, docs_dir) + +Limitations +----------- + +``PyDocWriter`` currently has a few limitations: + +- The efficiency of the class heavily relies on the filesystem I/O. + Therefore, the speed of the documentation generation process may vary + across different environments. +- The ``generate_summary`` method is not implemented yet. This method + is expected to generate a summary from the content provided. +- The class assumes that all ``.rst`` files in a directory correspond + to the same module. This assumption might not always hold. 
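Since ``generate_summary`` is noted above as unimplemented, its eventual behaviour is an open question. One plausible, purely hypothetical interpretation — a sketch, not the project's actual design — is a naive extractive summary that keeps the leading paragraph of the supplied content:

```python
def generate_summary(content: str, max_chars: int = 500) -> str:
    """Naive extractive summary: return the first paragraph, truncated.

    Hypothetical sketch of the unimplemented PyDocWriter method; the real
    implementation may take an entirely different (e.g. model-based) approach.
    """
    # Split the document into paragraphs on blank lines, dropping empty ones.
    paragraphs = [p.strip() for p in content.split("\n\n") if p.strip()]
    if not paragraphs:
        return ""
    summary = paragraphs[0]
    if len(summary) > max_chars:
        # Cut at the last whole word that fits, then mark the truncation.
        summary = summary[:max_chars].rsplit(" ", 1)[0] + "..."
    return summary
```

A caller could feed it the raw text of an ``.rst`` file and place the result in the directory's ``index.rst``.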
+ +Follow-up Questions: +-------------------- + +- What is the expected behavior of the ``generate_summary`` method? +- How could this tool be improved to handle different project + structures where .rst files might not correspond to the same module? +- Could there be performance improvements in the file reading process + that would make the documentation generation more efficient? diff --git a/docs/config/agent_config_builder.rst b/docs/config/agent_config_builder.rst index 086e189fa..1d93560b6 100644 --- a/docs/config/agent_config_builder.rst +++ b/docs/config/agent_config_builder.rst @@ -1,73 +1,105 @@ AgentConfigBuilder ================== -``AgentConfigBuilder`` is a builder class that helps in the creation of -Agent configurations. It extends the generic type ``T`` and requires the -implementor to implement methods for creating the specific configuration -object and associating the correct model with the agent. +The ``AgentConfigBuilder`` class in the ``automata.config.config_base`` +module is an abstract base class used to build configuration objects +used by agents. In this context, agent refers to the Agent instances in +the automata system - this could include symbol search agents, python +code retrieval agents, and more. Overview -------- -``AgentConfigBuilder`` primarily functions by taking an optional -``config`` object, upon instantiation which defaults to the result of -the ``create_config`` function if not provided. The configuration object -can be constructed from scratch or from existing configurations by using -the ``from_config`` or ``from_name`` methods, respectively. +The ``AgentConfigBuilder`` helps in setting up agent configurations +through its various methods that allow adding or modifying properties +such as model, tools, stream, verbosity, max iterations, tokens and +temperature. The built configuration is used to tailor the functionality +of an agent. Setting different configurations can affect how the agent +performs and functions. 
-This configuration builder also has the capability to set specific -parameters related to the Agent including the tools it will use, the -model it should run, whether it will stream output, verbosity of -logging, maximum iterations the agent should run, and others. The -validity of all parameter types is thoroughly checked before being -updated in the builder configuration. +This class is intended to be subclassed, with the subclasses providing +specific implementations for particular types of agents. As such, some +methods (such as ``create_config`` and ``with_model``) are abstract and +require a concrete implementation in the subclass. Related Symbols --------------- -- ``automata.tests.unit.test_automata_agent_builder.test_builder_default_config`` -- ``automata.config.openai_agent.OpenAIAutomataAgentConfigBuilder`` -- ``automata.tests.unit.test_automata_agent_builder.test_builder_creates_proper_instance`` -- ``automata.tests.unit.test_automata_agent_builder.test_builder_provided_parameters_override_defaults`` -- ``automata.agent.agent.AgentInstance.Config`` -- ``automata.tests.unit.test_automata_agent_builder.test_builder_accepts_all_fields`` -- ``automata.agent.instances.OpenAIAutomataAgentInstance.Config`` -- ``automata.config.base.AgentConfigName`` -- ``automata.tools.base.Tool`` +- ``automata.symbol.graph.symbol_graph.SymbolGraph``: The graph + containing the symbols and relationships between them. +- ``automata.embedding.embedding_base.EmbeddingBuilder``: An abstract + class to build embeddings. +- ``automata.code_writers.py.py_code_writer.PyCodeWriter``: A utility + class for writing Python code along AST nodes. +- ``automata.tools.builders.py_reader_builder.PyReaderToolkitBuilder``: + A class for interacting with the PythonIndexer API, which provides + functionality to retrieve python code. -Usage Example ------------- -..
code:: python - - from automata.config.openai_agent import OpenAIAutomataAgentConfigBuilder - from automata.config.base import AgentConfigName +As AgentConfigBuilder is an abstract base class, we cannot create an +instance of it directly. Instead, we will show an example of a +hypothetical subclass named ``AutomataAgentConfigBuilder``. - # Using builder to construct with default config - builder_default_config = OpenAIAutomataAgentConfigBuilder() - config_default = builder_default_config.build() - - # Using builder to construct with existing config - builder_from_config = OpenAIAutomataAgentConfigBuilder.from_config(config_default) - config_from_config = builder_from_config.build() +.. code:: python - # Using builder to construct from named config - builder_from_name = OpenAIAutomataAgentConfigBuilder.from_name(AgentConfigName.TEST) - config_from_name = builder_from_name.build() + from automata.config.config_base import AgentConfigBuilder + from typing import TypeVar, Optional + from automata.singletons.tokenizer.single_tokenizer import SingleTokenizer + + T = TypeVar('T') + + class AutomataAgentConfigBuilder(AgentConfigBuilder[T]): + + def create_config(self, config_name: Optional[str]=None) -> T: + + # In this hypothetical example, the create_config method + # returns an instance of a hypothetical AutomataAgentConfig. + return AutomataAgentConfig(config_name) + + def with_model(self, model: str) -> 'AutomataAgentConfigBuilder': + + # In this example, the 'model' attribute may determine + # the internal workings of the AutomataAgentConfig. + self._config.model = model + return self + + # Usage: + builder = AutomataAgentConfigBuilder() + config = (builder.with_model("model_v1") + .with_stream(True) + .with_max_iterations(100) + .build()) + +This example demonstrates how a subclass of ``AgentConfigBuilder`` could +be implemented and used. The ``AutomataAgentConfigBuilder`` implements +the ``create_config`` and ``with_model`` methods specific to its needs. 
+When building the ``AutomataAgentConfig``, the ``with_model`` method is
+used to specify the model, and the ``with_stream`` and
+``with_max_iterations`` methods are used to specify other attributes.

Limitations
-----------

-The builder pattern, while providing a clean API, can lead to over
-complicated code since each attribute is set individually. Be careful of
-overusing builders and consider passing a single object with many
-parameters. This can also make it more difficult to understand as the
-logical groups of parameters can be broken up.
+Since ``AgentConfigBuilder`` is an abstract base class, it cannot be
+used on its own and requires subclasses to provide implementations for
+the ``create_config`` and ``with_model`` methods. It is also tightly
+coupled to the structure and functionality of Agent objects and other
+related entities in the ``automata`` system.

Follow-up Questions:
--------------------

-- Is there a way to populate the builder with a group of related
-  parameters at once?
-- How can we ensure each attribute is being updated in a consistent
-  manner?
+- What specific Agent configurations are required in the different
+  subclasses of ``AgentConfigBuilder``?
+- Are there any restrictions when setting up the configurations?
+  Should properties be set in a certain sequence, or are there
+  dependencies among properties?
+
+This documentation was written based on the provided context in code
+comments, method signatures, related tests and related symbols. Without
+actual source code or sample responses, the specifics of method
+implementations and returned results are hypothetical. Further
+clarification may be necessary for a complete understanding of the class
+and its use.
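The fluent chaining described above (``with_model(...).with_stream(...).build()``) works because each setter returns the builder itself. The following is a minimal, self-contained sketch of that pattern; the ``SketchConfig`` and ``SketchConfigBuilder`` names are hypothetical stand-ins, not part of the automata API:

```python
# Minimal sketch of the fluent builder pattern discussed above.
# SketchConfig / SketchConfigBuilder are illustrative names only.
from dataclasses import dataclass


@dataclass
class SketchConfig:
    model: str = "default-model"
    stream: bool = False
    max_iterations: int = 10


class SketchConfigBuilder:
    def __init__(self) -> None:
        self._config = SketchConfig()

    def with_model(self, model: str) -> "SketchConfigBuilder":
        if not isinstance(model, str):
            raise TypeError("model must be a string")
        self._config.model = model
        return self  # returning self is what enables method chaining

    def with_stream(self, stream: bool) -> "SketchConfigBuilder":
        self._config.stream = stream
        return self

    def with_max_iterations(self, n: int) -> "SketchConfigBuilder":
        self._config.max_iterations = n
        return self

    def build(self) -> SketchConfig:
        return self._config


config = (SketchConfigBuilder()
          .with_model("model_v1")
          .with_stream(True)
          .with_max_iterations(100)
          .build())
```

Each ``with_*`` method validates or assigns one attribute and returns the builder, so the calls compose left to right and ``build()`` hands back the finished configuration object.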
diff --git a/docs/config/agent_config_name.rst b/docs/config/agent_config_name.rst index e2b89abe2..c0139ddd1 100644 --- a/docs/config/agent_config_name.rst +++ b/docs/config/agent_config_name.rst @@ -1,20 +1,4 @@
-- To add new configuration files and ensure they can be loaded with
-  ``AgentConfigName``, follow these steps:
-
-  - The new configuration file should be added to the
-    ``automata/config/agent/`` directory.
-  - Update the ``AgentConfigName`` enumeration class with a new
-    enumeration value that is the name of the new file (excluding the
-    ``.yaml`` extension).
-  - In the ``AgentConfig`` class, update the ``load`` classmethod to
-    properly load the new configuration file based on the given
-    enumeration value.
-
-- Configurations are typically validated by using a validation schema
-  or a set of rules defined in the code. An example can be checking for
-  required fields, ensuring that values are within a certain range, or
-  that they adhere to a specific format. If a configuration does not
-  match these set rules, an error is thrown when attempting to load it.
-  This process is not detailed for ``AgentConfigName`` specifically but
-  generally happens in conjunction with the ``AgentConfig``
-  implementation.
+.. code:: python
+
+   class AgentConfigName(PathEnum):
+       """AgentConfigName: Enum of agent config names.
+       Corresponds to files in automata/config/agent/*.yaml"""
+
+       DEFAULT = 'default'
+       TEST = 'test'
+       AUTOMATA_MAIN = 'automata-main'
diff --git a/docs/config/base/index.rst b/docs/config/base/index.rst index cfe60cad4..eb2350c56 100644 --- a/docs/config/base/index.rst +++ b/docs/config/base/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/config/config_base/config.rst b/docs/config/config_base/config.rst index d850f3968..5b90f5292 100644 --- a/docs/config/config_base/config.rst +++ b/docs/config/config_base/config.rst @@ -1,24 +1,2 @@ -1. 
The ``arbitrary_types_allowed = True`` within the class definition
-  allows BaseModel (from pydantic) to serialize and validate any type
-  of properties within the data model. This means you can use types in
-  your model fields that Pydantic’s BaseModel would not be able to
-  handle out of the box. This grants developers a good deal of
-  flexibility when defining their data models.
-
-2. The design of the ``AgentConfig`` class might be improved to provide
-   more straightforward and flexible configuration by employing the
-   Strategy Pattern. The ‘load’, ‘setup’, and ‘get_llm_provider’ methods
-   could be encapsulated inside a Strategy interface, with subclasses
-   implementing these methods according to their specific configuration
-   needs. This way, each AgentConfig can dynamically select its
-   strategy, simplifying the customisation process and eliminating the
-   need to continuously create subclasses for each new AgentConfig
-   scenario.
-
-3. ``Automata`` is a Python library that employs a command system to
-   interact with ‘intelligent agents’, which execute tasks on behalf of
-   automated systems. The name ``Automata`` is inspired from the
-   theoretical concept in computer science and formal language theory
-   called ‘automata theory’, which studies abstract machines and
-   automata, as well as the computational problems that can be solved
-   using them.
+.. code:: python
+
+   class Config:
+       arbitrary_types_allowed = True
+       provider = LLMProvider.OPENAI
diff --git a/docs/config/config_base/index.rst b/docs/config/config_base/index.rst index 8e419bfa6..b4545b315 100644 --- a/docs/config/config_base/index.rst +++ b/docs/config/config_base/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. 
diff --git a/docs/config/config_category.rst b/docs/config/config_category.rst index 46fc22990..41a83feb6 100644 --- a/docs/config/config_category.rst +++ b/docs/config/config_category.rst @@ -1,40 +1,5 @@
-- To add a new category in ``ConfigCategory``, you simply define a new
-  attribute in the ``ConfigCategory`` Enum class and assign it a unique
-  value. Following is an example of how to do this:
-
-::
-
-   from enum import Enum
-
-   class ConfigCategory(Enum):
-       AGENT = 1
-       PROMPT = 2
-       SYMBOL = 3
-       INSTRUCTION = 4
-       NEWCATEGORY = 5  # new category
-
-- As for the situation where there is a need for a sub-category,
-  ``ConfigCategory`` does not directly support sub-categories. If there
-  is a need for this, it may be an indicator that your code needs to be
-  restructured instead. However, you could potentially implement this
-  by storing another Enum as the value of a category. This might look
-  like this:
-
-::
-
-   from enum import Enum
-
-   class Subcategory(Enum):
-       SUBCATEGORY1 = 1
-       SUBCATEGORY2 = 2
-
-   class ConfigCategory(Enum):
-       AGENT = 1
-       PROMPT = 2
-       SYMBOL = 3
-       INSTRUCTION = 4
-       NEWCATEGORY = Subcategory  # new category with sub-categories
-
-Keep in mind though that this would be somewhat unconventional and might
-make the code harder to understand. It would be better to consider other
-ways to structure your code if sub-categories are necessary.
+.. code:: python
+
+   class ConfigCategory(PathEnum):
+       """A class to represent the different categories of configuration options.
+       Corresponds to folders in automata/configs/*"""
+
+       AGENT = 'agent'
+       PROMPT = 'prompt'
+       SYMBOL = 'symbol'
+       INSTRUCTION = 'instruction-configs'
diff --git a/docs/config/embedding_data_category.rst b/docs/config/embedding_data_category.rst index a58c8d801..c6a676614 100644 --- a/docs/config/embedding_data_category.rst +++ b/docs/config/embedding_data_category.rst @@ -1,23 +1,5 @@ -1. 
The ``EmbeddingDataCategory`` class can be improved to be more
-  flexible with both the location and type of the embedding data by
-  utilizing dynamic configuration loading. For instance, there can be a
-  configuration file that lists the embedding categories and their
-  corresponding directories. The class would then read this
-  configuration file, and set its constants accordingly.
-
-2. Incorporating new types of embedding data categories can be achieved
-   programmatically if a dynamic configuration approach is used as
-   described above. Each time a new category is added to the
-   configuration file, the class would automatically detect it and make
-   it available for use in the rest of the system.
-
-3. Yes, it is feasible to add a functionality for automatic detection of
-   categories based on the folder structure within
-   ``automata/configs/*``. This approach would involve iterating over
-   the folders in the directory, and creating a category for each one.
-   However, this method has its own challenges, as it requires that the
-   naming and structure of the folders follow a strict convention.
-   Additionally, it might be less efficient than the current approach if
-   there are a large number of folders or nested directories to
-   traverse. It could also potentially introduce unintended categories
-   if not handled correctly.
+.. code:: python
+
+   class EmbeddingDataCategory(PathEnum):
+       """A class to represent the different categories of configuration options.
+       Corresponds to folders in automata/configs/*"""
+
+       CODE_EMBEDDING = 'code-embedding'
+       DOC_EMBEDDING = 'doc-embedding-l2'
+       INDICES = 'indices'
diff --git a/docs/config/index.rst b/docs/config/index.rst index 3d36712b0..f1f0815da 100644 --- a/docs/config/index.rst +++ b/docs/config/index.rst @@ -29,6 +29,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. 
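The ``PathEnum`` subclasses above store kebab-case folder names as their values. A minimal sketch of how such an enum can map those values onto snake_case paths; the ``convert_kebab_to_snake_case`` helper here is a plain-Python stand-in for the library's own conversion utility:

```python
# Illustrative sketch of the PathEnum pattern used by the config enums.
# The real automata classes delegate to a convert_kebab_to_snake_case
# helper; this stand-in implements the same simple substitution.
from enum import Enum


def convert_kebab_to_snake_case(value: str) -> str:
    # kebab-case ('doc-embedding-l2') -> snake_case ('doc_embedding_l2')
    return value.replace("-", "_")


class PathEnum(Enum):
    """A base class for enums that represent paths."""

    def to_path(self) -> str:
        return convert_kebab_to_snake_case(self.value)


class EmbeddingDataCategory(PathEnum):
    CODE_EMBEDDING = "code-embedding"
    DOC_EMBEDDING = "doc-embedding-l2"
    INDICES = "indices"


print(EmbeddingDataCategory.DOC_EMBEDDING.to_path())  # doc_embedding_l2
```

Because ``PathEnum`` defines no members of its own, it can safely be subclassed by each category enum, which then inherits ``to_path`` for free.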
@@ -42,9 +44,11 @@ Summary of content embedding_data_category instruction_config_version llm_provider + model_information open_ai_automata_agent_config open_ai_automata_agent_config_builder path_enum + serialized_data_category template_formatter base/index config_base/index diff --git a/docs/config/instruction_config_version.rst b/docs/config/instruction_config_version.rst index c0698abd1..7f3f1b121 100644 --- a/docs/config/instruction_config_version.rst +++ b/docs/config/instruction_config_version.rst @@ -1,14 +1,5 @@
-- New versions can be added to the ``InstructionConfigVersion`` enum by
-  adding a new enum member in the definition of
-  ``InstructionConfigVersion``. This will correspond to a filename in
-  ``automata/configs/instruction_configs/``.
-- Changes in instruction versions potentially can have effects on the
-  Automated Agent’s performance. If the instructions in a new version
-  differ significantly from the old version, the agent might behave
-  differently. Hence, whenever a new version is incorporated, a
-  thorough testing and tuning of the agent might be necessary.
-- If an instruction version does not have a corresponding YAML
-  configuration file, an error will likely be raised when the agent
-  tries to use that instruction set. The application won’t be able to
-  locate the file and will fail to start properly. Thus, it’s crucial
-  to ensure that the necessary files exist when creating new versions.
+.. code:: python
+
+   class InstructionConfigVersion(PathEnum):
+       """InstructionConfigVersion: Enum of instruction versions.
+       Corresponds to files in automata/configs/instruction_configs/*.yaml"""
+
+       AGENT_INTRODUCTION = 'agent-introduction'
diff --git a/docs/config/llm_provider.rst b/docs/config/llm_provider.rst index 228093500..c3ec0bcab 100644 --- a/docs/config/llm_provider.rst +++ b/docs/config/llm_provider.rst @@ -1,38 +1 @@ -To address the Follow-up Questions: - -1. 
To decouple an LLM Provider when a new provider needs to be added, - one potential method is to implement a factory pattern. This pattern - allows the creation of objects without exposing the creation logic to - the client and use a common interface. This gives the flexibility to - add new providers while keeping the rest of the system unaware of the - specific types of providers. - -2. Using a factory pattern or a registration mechanism can allow for - more flexibility in accepting different providers without modifying - the enumeration class. Instead of defining the providers in the enum, - a method can be created to register new providers. Each provider - could be implemented with a unique identifier string, which can be - used in place of the enum. - -3. ``LLMProvider`` parameter is used in the configuration of an - AutomataAgent to specify the provider for performing language model - tasks. Below is a high-level example of how it could be used: - -.. code:: python - - from automata.config.config_base import AgentConfig, LLMProvider - - config = AgentConfig( - client_id='xyz', - client_access_token='123', - LLM_provider=LLMProvider.OPENAI - ) - - - agent = AutomataAgent(config) - -In the above example, the ``LLMProvider`` parameter in the -``AgentConfig`` tells the ``AutomataAgent`` that “when I have a task -that requires the use of the Language Model (LLM), use the provider -specified (OPENAI in this case)”. This could translate to different API -calls depending on the provider selected. 
+.. code:: python
+
+   class LLMProvider(PathEnum):
+       OPENAI = 'openai'
diff --git a/docs/config/model_information.rst b/docs/config/model_information.rst new file mode 100644 index 000000000..e9b387762 --- /dev/null +++ b/docs/config/model_information.rst @@ -0,0 +1,3 @@
+.. code:: python
+
+   @dataclass
+   class ModelInformation:
+       """A class to represent the model information."""
+
+       prompt_token_cost: float
+       completion_token_cost: float
+       abs_max_tokens: int
diff --git a/docs/config/open_ai_automata_agent_config.rst b/docs/config/open_ai_automata_agent_config.rst index 6a814d062..7894bed56 100644 --- a/docs/config/open_ai_automata_agent_config.rst +++ b/docs/config/open_ai_automata_agent_config.rst @@ -1,82 +1,82 @@ OpenAIAutomataAgentConfig =========================
-``OpenAIAutomataAgentConfig`` is an agent configuration class for the
-Automata OpenAI Agent. It extends the ``AgentConfig`` abstract base
-class and is designed to hold the configuration settings for the OpenAI
-agent. These settings include the agent’s system template, template
-variables and formatter, instruction version, system instruction, and
-other configuration parameters.
+``OpenAIAutomataAgentConfig`` is a configuration class for Automata
+agents interacting with the OpenAI API. This class extends the
+``AgentConfig`` base class and provides specific configurations related
+to the OpenAI API.

Overview
--------

-The ``OpenAIAutomataAgentConfig`` class defines the necessary
-configuration settings for an OpenAI-powered Automata agent. The
-parameters for this agent configuration include the system template,
-system template variables, instruction version, and a system
-instruction, among others. This class also includes a ``setup`` method
-that ensures the necessary class attributes are properly initialized.
-
-The class also consists of a ``TemplateFormatter`` and a ``load``
-method. 
The ``TemplateFormatter`` is a static class that provides a -method to create a default formatter for the given configuration while -the ``load`` method is used to load the configuration for an agent based -on the given ``config_name``. - -However, usage of the class revolves generally around initializing it, -calling the ``setup`` and ``load`` methods when necessary, and utilizing -the configuration in an OpenAI Automata Agent. - -Related Symbols ---------------- - -- ``automata.tests.unit.test_automata_agent_builder.test_builder_creates_proper_instance``: - This method tests whether the config builder can correctly create an - instance of the OpenAIAutomataAgentConfig class. -- ``automata.tests.conftest.automata_agent_config_builder``: This is a - fixture that provides a builder for the OpenAIAutomataAgentConfig - class. -- ``automata.agent.providers.OpenAIAutomataAgent``: This class utilizes - the OpenAIAutomataAgentConfig for operational settings during its - instantiation and behavior. -- ``automata.tests.unit.test_automata_agent_builder.test_automata_agent_init``: - This method tests the initialization of the AutomataAgent with the - corresponding configuration. - -Example -------- - -Here is an example of creating an instance of -``OpenAIAutomataAgentConfig`` using a predefined configuration name. +The purpose of the ``OpenAIAutomataAgentConfig`` class is to maintain +all configuration settings required for automata agents talking to +OpenAI. It stores key information like system templates, template +variables, and provides ways to perform necessary setup and +configuration loading. + +Attributes +---------- + +- ``system_template (str)``: A string template that guides the initial + message of the system. +- ``system_template_variables (List[str])``: A list of string variable + names indicating the placeholders in the system template. +- ``system_template_formatter (Dict[str, str])``: A dictionary that + formats the system template. 
+- ``instruction_version (InstructionConfigVersion)``: The instruction
+  configuration version.
+- ``system_instruction (Optional[str])``: System instruction.
+
+These configurations control the interaction of an automata agent with
+the OpenAI API in terms of conversation initiation, system instructions
+and more.
+
+Methods
+~~~~~~~
+
+- ``setup() -> None``: Performs setup for the agent, such as computing
+  the session_id, setting up the ``system_template_formatter`` and
+  creating system instructions.
+- ``load(config_name: AgentConfigName) -> OpenAIAutomataAgentConfig``:
+  Class method to load the configuration for the agent.
+- ``get_llm_provider() -> LLMProvider``: Provides the type of
+  LLMProvider that the agent uses.
+- ``_formatted_instruction() -> str``: Transforms the system template
+  into a system instruction.
+
+Example Usage
+-------------
+
+The following code shows how to load an ``OpenAIAutomataAgentConfig``
+configuration for the ``DEFAULT`` agent:

.. code:: python

-   from automata.config.openai_agent import OpenAIAutomataAgentConfig
-   from automata.config.base import AgentConfigName
+   from automata.config.openai_config import OpenAIAutomataAgentConfig
+   from automata.config.config_base import AgentConfigName

+   # Load default configuration for OpenAI Automata Agent
+   config = OpenAIAutomataAgentConfig.load(AgentConfigName.DEFAULT)

-   config_name = AgentConfigName.AUTOMATA_MAIN
-   config = OpenAIAutomataAgentConfig.load(config_name)

+Further manipulation of the config object allows customization of the
+agent’s behaviors:
+
+.. code:: python
+
+   # Change system template
+   config.system_template = 'This is a template with {variable}.'

Limitations
-----------

-The main limitation of the ``OpenAIAutomataAgentConfig`` is its
-dependence on specific configurations and enum values. It’s currently
-limited to a set of supported models, and the ``load`` method relies on
-the ``AgentConfigName`` enum for loading the configuration. Hence,
-custom configuration beyond these narrow bounds may not be possible. 
-Furthermore, it’s particularly critical that the keys in the -``system_template_formatter`` match exactly with -``system_template_variables``, resulting in a potential source of error -if not met. - -Follow-up Questions: --------------------- - -1. Is there a way to extend the list of supported models? -2. How can we ensure safe and error-free usage with the stipulation of - the exact match between ``system_template_formatter`` and - ``system_template_variables``? -3. Can functionality be expanded to accept custom configurations beyond - the listed ``AgentConfigName`` values? +``OpenAIAutomataAgentConfig`` cannot be initialized with arbitrary +configuration values, and must be loaded from a predefined selection +(``AgentConfigName``). This can limit customization potential as any +additional configurations will have to be manually added after loading. + +Follow-up Questions +------------------- + +- How can users add bespoke configuration settings not covered under + the ``AgentConfigName`` enumeration? +- Can there be a mechanism for users to specify and load configuration + from their own files? diff --git a/docs/config/openai_agent/index.rst b/docs/config/openai_agent/index.rst index cc56056e4..2edbd039a 100644 --- a/docs/config/openai_agent/index.rst +++ b/docs/config/openai_agent/index.rst @@ -18,6 +18,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/config/path_enum.rst b/docs/config/path_enum.rst index e18030cd3..6a65bdba2 100644 --- a/docs/config/path_enum.rst +++ b/docs/config/path_enum.rst @@ -1,27 +1,6 @@ -- The conversion from ``kebab-case`` to ``snake_case`` is a convention - used in Python to name variables and functions. ``kebab-case`` is - commonly used for filenames or URLs, whereas ``snake_case`` is used - for Python’s variable and function names. This practice fosters - readability and helps to avoid syntax errors. 
+.. code:: python
+
+   class PathEnum(Enum):
+       """A base class for enums that represent paths."""
-- The use of enums for organizing configuration options makes the code
-  more readable, maintainable and reliable as these enums group related
-  values together. It also provides type safety since enums are
-  essentially a fixed set of constants. While there may be other
-  methods to organize these options (using dictionaries or lists),
-  using enums could be preferable for the reasons mentioned above. The
-  desired method may depend on the specific needs of the software and
-  the team’s coding convention.
+::
-- Currently, it does not seem like ``PathEnum`` is being used outside
-  of the ``automata.config.base`` context, but there is nothing
-  stopping it from being used elsewhere if the need arises. If other
-  parts of the codebase have similar needs to handle path-related
-  enums, it could be beneficial to utilize ``PathEnum``.
-
-- As for additional utility methods, it could be useful to have methods
-  for handling path concatenation, checking path existence, creating
-  directories, etc. However, these are general file handling tasks not
-  specific to enums and might be better suited for other classes or
-  utilities. The current ``to_path`` function seems to serve its
-  purpose for ``PathEnum``\ ’s intended use. 
+   def to_path(self) -> str:
+       return convert_kebab_to_snake_case(self.value)
diff --git a/docs/config/serialized_data_category.rst b/docs/config/serialized_data_category.rst new file mode 100644 index 000000000..aba0f7e27 --- /dev/null +++ b/docs/config/serialized_data_category.rst @@ -0,0 +1,6 @@
+.. code:: python
+
+   class SerializedDataCategory(PathEnum):
+       """A class to represent the different categories of serialized data.
+       Corresponds to folders in automata/automata-embedding-data/*"""
+
+       PICKLED_DATA_PATH = 'graphs'
+       PICKLED_SYMBOL_GRAPH = 'symbol_graph.pkl'
+       PICKLED_SYMBOL_SUBGRAPH = 'symbol_subgraph.pkl'
diff --git a/docs/config/template_formatter.rst b/docs/config/template_formatter.rst index 4a19246d3..c4ac6718e 100644 --- a/docs/config/template_formatter.rst +++ b/docs/config/template_formatter.rst @@ -1,16 +1,77 @@
-1. As of now, there are no official announcements or documentation that
-   suggests an update towards supporting additional ``AgentConfigName``
-   types in the ``create_default_formatter`` method of the
-   ``TemplateFormatter``. However, as per the demand or customization
-   requirements of users or for broader applicability, there might be
-   potential updates to this.
-
-2. Any newly introduced instruction configurations would typically need
-   to have their template definitions added to this class, so as to
-   ensure that the new instructions are properly formatted. This could
-   potentially affect existing functionality as it would need to be
-   updated or adapted to accommodate the new instructions. However,
-   without specific details about what these new instruction
-   configurations might be, it’s hard to ascertain the exact impact. The
-   developers would have to ensure that any new additions maintain
-   compatibility with the existing architecture and functionality. 
+TemplateFormatter +================= + +``TemplateFormatter`` is a utility class that helps in formatting agent +configurations in a dictionary format which enhances code readability +and maintainability. It supports operations for creating a default +formatter given a configuration, symbol ranking and maximum default +overview symbols. + +Overview +-------- + +``TemplateFormatter`` is used for the preparation of formatted +configurations that are used by the system. This class offers a +``create_default_formatter`` static method which takes an +``AgentConfig``, a ``SymbolRank`` object, and an integer specifying the +maximum default overview symbols. This method builds a dictionary that +provides an overview of the top symbols in the system, as well as +important configuration settings including maximum iterations and +tokens. + +The class is set up as a static utility, and all its methods are +available as static methods. This makes it an efficient tool to format +and present configurations and symbol rankings in a meaningful way which +is essential for debugging and system comprehension. + +Related Symbols +--------------- + +- ``config.automata_agent_config.AgentConfig`` +- ``experimental.search.symbol_rank.SymbolRank`` +- ``config.config_enums.AgentConfigName`` + +Example +------- + +The following serves as an example for creating a dictionary for a +formatter with a default setup. + +.. 
code:: python

+   from automata.config.automata_agent_config import AgentConfig
+   from automata.experimental.search.symbol_rank import SymbolRank
+   from automata.config.formatter import TemplateFormatter
+   from automata.config.config_enums import AgentConfigName
+
+   # Initialize the AgentConfig
+   agent_config = AgentConfig(AgentConfigName.AUTOMATA_MAIN)
+   # Initialize the SymbolRank object
+   symbol_rank = SymbolRank()
+
+   # Create a TemplateFormatter
+   formatter = TemplateFormatter.create_default_formatter(agent_config, symbol_rank)
+
+In the above example, the ``create_default_formatter`` method of
+``TemplateFormatter`` is utilized to generate a format that presents key
+configurations and symbol rankings in a readable manner.
+
+Limitations
+-----------
+
+The TemplateFormatter assumes a specific configuration setup within the
+system and is rigid in the input parameters it accepts. Also, the method
+``create_default_formatter`` only works if the ``config_name`` of the
+passed ``AgentConfig`` is ``AUTOMATA_MAIN``; otherwise it simply
+generates an empty dictionary.
+
+Follow-up Questions:
+--------------------
+
+- How could additional class methods for alternate formatting styles be
+  implemented?
+- How should the TemplateFormatter handle incorrect or unexpected input
+  parameters for the ``create_default_formatter`` method? Should
+  specific exceptions be defined?
+- What is the use case for the ``create_default_formatter`` method when
+  the ``AgentConfig``\ ’s ``config_name`` is not ``AUTOMATA_MAIN``?
diff --git a/docs/context_providers/index.rst b/docs/context_providers/index.rst index 6808be0ae..9e38f9170 100644 --- a/docs/context_providers/index.rst +++ b/docs/context_providers/index.rst @@ -18,6 +18,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. 
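The ``TemplateFormatter`` is described as building a dictionary with an overview of the top-ranked symbols plus limits such as maximum iterations and tokens. The following is a rough, hypothetical sketch of that shape; the function name, parameters, and key names are assumptions, since the real method works from an ``AgentConfig`` and a ``SymbolRank`` instance:

```python
# Hypothetical sketch of assembling a default-formatter dictionary as
# described in the TemplateFormatter overview. All names here are
# illustrative assumptions, not the automata API.
from typing import Dict, List, Tuple


def build_default_formatter(
    ranked_symbols: List[Tuple[str, float]],
    max_iterations: int,
    max_tokens: int,
    max_default_overview_symbols: int = 10,
) -> Dict[str, str]:
    # Keep only the top-ranked symbols for the overview section.
    top = sorted(ranked_symbols, key=lambda pair: pair[1], reverse=True)
    top = top[:max_default_overview_symbols]
    overview = "\n".join(name for name, _rank in top)
    # All values are strings so they can be substituted into templates.
    return {
        "symbol_rank_overview": overview,
        "max_iterations": str(max_iterations),
        "max_tokens": str(max_tokens),
    }


formatter = build_default_formatter(
    [("automata.agent.Agent", 0.9), ("automata.config.AgentConfig", 0.7)],
    max_iterations=100,
    max_tokens=4096,
)
```

The resulting dictionary maps template variable names to string values, which is the shape a template formatter needs in order to fill placeholders in a system template.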
diff --git a/docs/context_providers/symbol_provider_registry.rst b/docs/context_providers/symbol_provider_registry.rst index c4761c833..f4de838f8 100644 --- a/docs/context_providers/symbol_provider_registry.rst +++ b/docs/context_providers/symbol_provider_registry.rst @@ -4,70 +4,81 @@ SymbolProviderRegistry

Overview
--------

-``SymbolProviderRegistry`` is a central registry management class which
-keeps track of the multiple symbol providers in the system. The primary
-role of the ``SymbolProviderRegistry`` is to synchronize the symbols
-supported by several symbol providers and maintain them in a sorted
-order. It ensures that only the common symbols across the system are
-taken into account by the application.
-
-This class operates primarily on singleton methods to maintain and
-provide a single central registry.
+``SymbolProviderRegistry`` is a class for managing instances of
+``ISymbolProvider`` objects. It provides methods for registering,
+tracking, and synchronizing symbol providers across numerous parts of
+the system. This class uses the singleton design pattern, so there is
+only one ``SymbolProviderRegistry`` instance during the runtime. The
+registry keeps track of two primary attributes: a set ``_providers`` of
+all registered symbol providers, and a list ``sorted_supported_symbols``
+of all symbols supported by every registered provider. 
Related Symbols
---------------

-- ``automata.symbol.base.Symbol``
-- ``automata.symbol.base.ISymbolProvider``
-- ``automata.tests.unit.test_symbol_graph.test_get_all_symbols``
-- ``automata.tests.unit.test_symbol_graph.test_build_real_graph``
-- ``automata.context_providers.symbol_synchronization.SymbolProviderSynchronizationContext.register_provider``
-- ``automata.context_providers.symbol_synchronization.SymbolProviderSynchronizationContext.synchronize``
+- ``automata.context_providers.symbol_synchronization_context.SymbolProviderSynchronizationContext.register_provider``
+- ``automata.context_providers.symbol_synchronization_context.SymbolProviderSynchronizationContext.synchronize``
+- ``automata.singletons.dependency_factory.DependencyFactory._synchronize_provider``
+- ``automata.symbol.symbol_base.ISymbolProvider``
+- ``automata.cli.scripts.run_doc_embedding.initialize_providers``

Usage Example
-------------

-.. code:: python
+Synchronization of symbol providers and their supported symbols is an
+important operation in applications involving symbolic representations
+of data. Here is an example of using ``SymbolProviderRegistry`` for
+registering and synchronizing symbol providers:

-   from automata.symbol.base import ISymbolProvider, Symbol
-   from automata.context_providers.symbol_synchronization import SymbolProviderRegistry

+.. code:: python

-   # Define a custom symbol provider
-   class CustomSymbolProvider(ISymbolProvider):
-       ... 
+   from automata.context_providers.symbol_synchronization_context import SymbolProviderRegistry
+   from your_module import SymbolProviderExample  # Assume this class implements the ISymbolProvider interface

-   custom_provider = CustomSymbolProvider()
+   provider_one = SymbolProviderExample()
+   provider_two = SymbolProviderExample()

-   # Register your custom provider
-   SymbolProviderRegistry.register_provider(custom_provider)
+   SymbolProviderRegistry.register_provider(provider_one)
+   SymbolProviderRegistry.register_provider(provider_two)

-   # Synchronize the symbols across all providers
    SymbolProviderRegistry.synchronize()

-   # Get the sorted list of supported symbols
-   symbols = SymbolProviderRegistry.get_sorted_supported_symbols()
+   filtered_supported_symbols = SymbolProviderRegistry.get_sorted_supported_symbols()

-   # Your code with the symbol
-   ...
+In the above example, we first register ``provider_one`` and
+``provider_two``. Subsequently, we synchronize all registered providers
+using ``synchronize()``. After synchronizing, we use
+``get_sorted_supported_symbols()`` to retrieve the supported symbols,
+which now includes only those symbols supported by all providers.

Limitations
-----------

-``SymbolProviderRegistry`` relies on the symbol providers in the system
-implementing the ``ISymbolProvider`` interface correctly. If a symbol
-provider provides incorrect or incomplete information about supported
-symbols, it may introduce errors in the sorted symbols list.
-Additionally, the registry assumes that all symbol providers will be
-registered before any get or synchronize operation is performed. If a
-new symbol provider is added after synchronization, it will not be
-considered until the next synchronization.
+``SymbolProviderRegistry`` enforces a common subset of symbols across
+all registered providers: if the providers share no common subset, an
+exception is raised. 
This
+approach might not be desirable for cases where it’s completely
+acceptable to have providers with no overlapping symbols.
+
+Furthermore, the functionality provided by this class is limited by the
+correct implementation of ``ISymbolProvider``\ ’s methods by symbol
+providers. If methods like ``set_synchronized``, ``filter_symbols``, or
+``_get_sorted_supported_symbols`` are not implemented correctly, the
+``SymbolProviderRegistry`` may not operate as expected.
+
+Lastly, it’s important to note that this class uses a Singleton pattern
+and can only support one distinct instance of ``SymbolProviderRegistry``
+during the system’s runtime.

Follow-up Questions
-------------------

-- How does adding a new symbol provider after synchronization affect
-  the result? Is there a watch mechanism or notification setup in
-  symbol providers when new symbols get added?
-- What precautions or considerations should be taken while implementing
-  a custom symbol provider to ensure compatibility with
-  ``SymbolProviderRegistry``?
+- How well does this class handle cases where there is a need to
+  maintain different registries for different symbol providers?
+- How well can the synchronization mechanism handle future extensions
+  to the ISymbolProvider interface, such as the addition of new
+  initialization states?
+- What happens when a registered symbol provider does not implement the
+  ISymbolProvider methods correctly? Are there sufficient error
+  handling mechanisms in place to inform the user about potential
+  issues caused by this? 
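The common-subset behaviour described above can be illustrated with a toy registry. The class and method names mirror the documentation, but the implementation details (plain strings instead of ``Symbol`` objects, a ``RuntimeError`` for the empty-intersection case) are assumptions for the sketch:

```python
# Toy sketch of the registry's common-subset synchronization. The real
# SymbolProviderRegistry works with ISymbolProvider instances and Symbol
# objects rather than plain strings.
from typing import List, Set


class ToyProvider:
    def __init__(self, symbols: Set[str]) -> None:
        self._symbols = symbols

    def get_sorted_supported_symbols(self) -> List[str]:
        return sorted(self._symbols)


class ToyRegistry:
    _providers: Set[ToyProvider] = set()
    sorted_supported_symbols: List[str] = []

    @classmethod
    def register_provider(cls, provider: ToyProvider) -> None:
        cls._providers.add(provider)

    @classmethod
    def synchronize(cls) -> None:
        # Intersect the supported symbols of every registered provider;
        # only symbols common to all providers survive.
        symbol_sets = [set(p.get_sorted_supported_symbols()) for p in cls._providers]
        common = set.intersection(*symbol_sets) if symbol_sets else set()
        if not common:
            raise RuntimeError("No common symbols between providers.")
        cls.sorted_supported_symbols = sorted(common)


ToyRegistry.register_provider(ToyProvider({"a.Foo", "a.Bar"}))
ToyRegistry.register_provider(ToyProvider({"a.Foo", "a.Baz"}))
ToyRegistry.synchronize()
print(ToyRegistry.sorted_supported_symbols)  # ['a.Foo']
```

Note how a provider pair with disjoint symbol sets would make ``synchronize`` raise, which is exactly the limitation the Limitations section calls out.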
diff --git a/docs/context_providers/symbol_provider_synchronization_context.rst b/docs/context_providers/symbol_provider_synchronization_context.rst index 08486a2f8..fb0cc9dc9 100644 --- a/docs/context_providers/symbol_provider_synchronization_context.rst +++ b/docs/context_providers/symbol_provider_synchronization_context.rst @@ -1,19 +1,72 @@ -- If symbol providers are not synchronized before exiting the context, - it can cause inconsistent data representation among different symbol - providers. This can potentially lead to errors and unexpected - behavior of the software system that relies on these symbol - providers. For example, a procedure might not see the latest - modifications of a symbol or might incorrectly assume two symbols are - identical when they aren’t. - -- If a symbol provider does not implement ``ISymbolProvider`` - correctly, it may lead to a range of issues, especially when used in - a ``SymbolProviderSynchronizationContext``. For instance, if the - synchronization method is not implemented correctly, trying to - synchronize the given provider could fail or produce an incorrect - state. This can lead to inconsistencies in symbol representation, - corruption of data, or unexpected runtime errors. It might also - violate the dependencies and contracts between different parts of the - code that rely on the symbol provider. Hence why proper - implementation of the ``ISymbolProvider`` interface by each symbol - provider is fundamental for the functioning of the system. +SymbolProviderSynchronizationContext +==================================== + +``SymbolProviderSynchronizationContext`` is a Python class designed to +manage synchronization tasks for symbol providers in the Automata +codebase. This context manager class ensures that symbol providers are +able to register and sync effectively to maintain expected code +performance and correctness in symbol processing procedures. 
+ +Overview +-------- + +``SymbolProviderSynchronizationContext`` manages the registration and +synchronization of symbol providers using context management protocol +methods ``__enter__`` and ``__exit__``. This class makes sure to raise +an exception when a symbol provider has not been synchronized within the +synchronization context. + +This class provides two primary methods: - ``register_provider``: +Registers a symbol provider into ``SymbolProviderRegistry``. - +``synchronize``: Synchronizes all registered symbol providers in +``SymbolProviderRegistry``. + +Related Symbols +--------------- + +- automata.symbol.symbol_base.ISymbolProvider.\__init\_\_ +- automata.symbol.symbol_base.SymbolReference +- automata.symbol_embedding.symbol_embedding_base.SymbolEmbedding.symbol + +Usage Example +------------- + +.. code:: python + + from automata.context_providers.symbol_synchronization_context import SymbolProviderSynchronizationContext + + # Assume `MySymbolProvider` is a class that implements the `ISymbolProvider` interface. + my_provider = MySymbolProvider() + + with SymbolProviderSynchronizationContext() as sync_context: + sync_context.register_provider(my_provider) + # Attempt to register more providers (if any). + sync_context.synchronize() # Synchronize all registered providers. + +Implementation Details and Limitations +-------------------------------------- + +- The class uses an internal attribute ``_was_synchronized`` to keep + track of whether symbol providers have been synchronized within the + context. This design decision could limit the usability of the class + in distributed scenarios. In such cases where multiple threads or + processes are using the same context, race conditions might occur. +- When ``__exit__`` is called, the class raises a ``RuntimeError`` if + no synchronization of symbol providers has occurred. 
This means + achieving graceful context exit relies on the client code to call + ``synchronize`` method at least once before exiting the context. + +Follow-up Questions: +-------------------- + +- How can we better adapt ``SymbolProviderSynchronizationContext`` to + multi-threaded or distributed applications? +- With the current design, a ``RuntimeError`` is raised if providers + are registered but not synchronized within the context. Could there + be situations where this strict rule might be overbearing? How can we + achieve more flexibility while maintaining effectiveness of the + class? +- Could there be a better alternative design for the + ``register_provider`` and ``synchronize`` methods to ensure all + symbol providers are always synchronized correctly after being + registered? diff --git a/docs/core/base/automata_error.rst b/docs/core/base/automata_error.rst new file mode 100644 index 000000000..741554a86 --- /dev/null +++ b/docs/core/base/automata_error.rst @@ -0,0 +1,70 @@ +AutomataError +============= + +Overview +-------- + +``AutomataError`` is an essential base class for all exceptions defined +in the Automata framework. It inherits directly from Python’s built-in +``Exception`` class, and adds a few additional properties that provide +greater context when an error is thrown. + +A unique element of ``AutomataError`` is that in addition to the +standard exception message, it allows for the inclusion of extra details +in the form of another field, ``details``. This added context can +greatly simplify error handling and debugging in complex project +environments. + +The ``user_message`` property is designed to return the ``Exception`` +message, providing a useful, human-readable error message. If no message +is provided, it defaults to ``""``. + +Related Symbols +--------------- + +``AutomataError`` is used as a base class for multiple exception classes +across the Automata project. 
Some of these classes include:
+
+- ``automata.tasks.task_error.TaskStateError``
+- ``automata.tasks.task_error.TaskGeneralError``
+- ``automata.eval.agent.code_writing_eval.CodeExecutionError``
+- ``automata.eval.agent.code_writing_eval.VariableNotFoundError``
+
+Usage Example
+-------------
+
+During the development of tasks, if a task is not in the correct state
+for the operation, a ``TaskStateError`` would be raised:
+
+.. code:: python
+
+   from automata.tasks.task_error import TaskStateError
+
+   try:
+       # Code that fails because the task is in the wrong state
+       task = AutomataTask()
+       task.execute()
+   except TaskStateError as e:
+       print(f"Encountered an error: {e.user_message}. Details: {e.details}")
+
+Note: ``AutomataTask`` and its ``execute`` method are used as
+placeholders for this example.
+
+Limitations
+-----------
+
+``AutomataError`` itself has few inherent limitations. However, neither
+``message`` nor ``details`` is required to adhere to any particular
+format, which could lead to inconsistent error messages in a larger
+codebase. It might also be perceived as a limitation that this error
+does not include built-in support for richer error logging or
+serialization.
+
+Follow-up Questions:
+--------------------
+
+- Is there a need for a consistent format or schema for additional
+  ``details`` in the ``AutomataError`` exceptions?
+- Would it be beneficial to integrate ``AutomataError`` with a logging
+  or a monitoring system?
+- Could ``AutomataError`` benefit from being equipped with a feature
+  allowing it to be serialized to JSON or another format?
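Based on the behavior described above (inherits from ``Exception``, carries an extra ``details`` field, and exposes a ``user_message`` property that defaults to ``""``), a minimal sketch of an ``AutomataError``-style base exception might look as follows. ``SketchAutomataError`` is a hypothetical name used for illustration; the real class lives in the Automata codebase:

```python
from typing import Any


class SketchAutomataError(Exception):
    """Illustrative base exception carrying an optional `details` payload."""

    def __init__(self, message: str = "", details: Any = None) -> None:
        super().__init__(message)
        self.message = message
        self.details = details

    @property
    def user_message(self) -> str:
        # Human-readable message; falls back to "" when none was given.
        return self.message or ""


try:
    raise SketchAutomataError("task failed", details={"task_id": 42})
except SketchAutomataError as e:
    print(f"{e.user_message} | {e.details}")  # task failed | {'task_id': 42}
```

Subclasses such as the task or evaluation errors listed above would then inherit both the message and the ``details`` payload for free.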
diff --git a/docs/core/base/database/chroma_vector_database.rst b/docs/core/base/database/chroma_vector_database.rst index 405fce2b6..8e2e9e969 100644 --- a/docs/core/base/database/chroma_vector_database.rst +++ b/docs/core/base/database/chroma_vector_database.rst @@ -1,35 +1,99 @@ -In a real-world context, a concrete subclass of ``ChromaVectorDatabase`` -might be created to manage a database of vector representations for -specific types of data, such as AI or ML models. - -The subclass would implement the abstract methods depending on the -specific use case. For example, ``add()`` and ``batch_add()`` might be -implemented to insert new vectors to the database, ``update_entry()`` to -modify existing vectors, ``entry_to_key()`` to create a unique -identifier for each vector, and ``get_ordered_keys()`` and -``get_all_ordered_embeddings()`` to retrieve the vectors in a specific -order. - -The choice of ``duckdb+parquet`` as the Chroma’s DB implementation -suggests that the database is designed for efficient handling of large -amounts of read-oriented analytical workloads. DuckDB is an in-memory -analytical database, and Parquet is a columnar storage file format -optimized for big data processing. - -This choice would make ChromaVectorDatabase efficient for operations -like filtering and aggregation but less suitable for write-heavy -workloads due to the overhead of converting the data into the Parquet -format. It is also probable that the database would work seamlessly with -tools that support the Parquet format, such as Pandas and Apache Arrow. - -The ordering of keys in ``get_ordered_keys()`` method would depend on -the specific needs of the application. For example, keys could be -ordered based on the timestamp of their insertion to the database, their -semantic meaning, or their closeness to a specific reference vector. 
- -To get the ordered entries efficiently in -``get_all_ordered_embeddings()`` method, the database could use an index -on the columns that are used for ordering. The exact strategy would -depend on the chosen DB implementation and the specific requirements of -the application, such as the need for real-time responses or the -acceptable level of accuracy in the order of the returned entries. +ChromaVectorDatabase +==================== + +``ChromaVectorDatabase`` is a concrete class defined in the +``automata.core.base.database.vector_database`` module. The main purpose +of this class is to use Chroma, a live vector database, for persistent +storage of vectors. Its functionalities include common database +operations like adding either a single entry or a batch of entries, +updating entries, deleting entries by key, and performing organized +retrieval of keys and embeddings. The class also offers provisions to +create and set up a client for Chroma using the persistence directory. + +Overview +-------- + +The ``ChromaVectorDatabase`` is an implementation of the +``VectorDatabaseProvider`` and ``Generic[K, V]`` interfaces, aiming to +establish and manage a connection with a Chroma database per collection. +The constructor (``__init__``) sets up the Chroma client and the +collection to be used according to the input parameters. Provided +utility methods like ``load``, ``save``, ``clear``, and ``contains`` +allow for efficient management of the database and its entries. +Furthermore, ``ChromaVectorDatabase`` specifies abstract methods (to be +implemented in subclasses) for specific database operations that depend +on the type of keys and the order of entries. + +Related Symbols +--------------- + +- ``VectorDatabaseProvider``: Interface that ``ChromaVectorDatabase`` + class implements. +- ``Generic[K, V]``: Python’s Generic class used for flexible type + hints. +- ``chromadb``: Chroma client library that gets imported as part of + Chroma setup. 
+- ``Settings``: Chroma DB settings object.
+
+Example
+-------
+
+Below is an example of how you can use the ``ChromaVectorDatabase``:
+
+.. code:: python
+
+   from automata.core.base.database.vector_database import ChromaVectorDatabase
+
+   # Instantiate ChromaVectorDatabase with a collection name and persistent directory.
+   collection_name = "my_collection"
+   persist_dir = "/path/to/persistent/directory"
+   chroma_db = ChromaVectorDatabase(collection_name=collection_name, persist_directory=persist_dir)
+
+   # Add data to Chroma DB (data format depends on K and V types defined in the subclass)
+   # chroma_db.add(data)
+
+   # Check if a specific key exists in the collection
+   # exists = chroma_db.contains(key)
+
+   # Clear data in the Chroma DB collection
+   chroma_db.clear()
+
+Please note that the ``ChromaVectorDatabase`` class is abstract and
+requires several methods, such as ``add(data)`` and ``contains(key)``,
+to be implemented in a subclass to specify the behavior according to
+the required key and value types.
+
+Limitations
+-----------
+
+The ``ChromaVectorDatabase`` class depends on the Chroma client library
+(``chromadb``), which may need to be installed
+(``pip install chromadb``) before use. Additionally, the class relies
+heavily on the specific way the Chroma client manages and interacts
+with collections; any changes to the Chroma client’s functionality may
+require corresponding changes in this class. Furthermore, this class is
+abstract and cannot be instantiated directly.
+
+Detailed descriptions and usage examples for the abstract methods could
+be added to provide further guidance on how to implement them in a
+subclass. Other operations, such as concurrent or multi-threaded
+writes, might require advanced handling and additional care at the
+implementation level.
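To illustrate the subclassing pattern discussed above without depending on the ``chromadb`` client, the following sketch uses a stand-in abstract base of the same general shape. ``VectorDatabaseSketch`` and ``MyVectorDatabase`` are hypothetical names for illustration; a real subclass would extend ``ChromaVectorDatabase`` and back its methods with a Chroma collection instead of a dict:

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class VectorDatabaseSketch(ABC, Generic[K, V]):
    """Stand-in base mimicking the abstract-method pattern described above."""

    def __init__(self) -> None:
        self._store: Dict[K, V] = {}

    @abstractmethod
    def add(self, entry: V) -> None:
        """Insert a single entry into the store."""

    @abstractmethod
    def entry_to_key(self, entry: V) -> K:
        """Derive the unique key for an entry."""

    def contains(self, key: K) -> bool:
        return key in self._store

    def clear(self) -> None:
        self._store.clear()


class MyVectorDatabase(VectorDatabaseSketch[str, dict]):
    def add(self, entry: dict) -> None:
        self._store[self.entry_to_key(entry)] = entry

    def entry_to_key(self, entry: dict) -> str:
        return entry["symbol"]


db = MyVectorDatabase()
db.add({"symbol": "mod.ClassA", "vector": [0.1, 0.2]})
print(db.contains("mod.ClassA"))  # True
```

The key design point is the same as in ``ChromaVectorDatabase``: the base class fixes the interface and shared utilities, while the subclass decides the concrete key and value types and how entries map to keys.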
+ +Follow-up Questions: +-------------------- + +- Is there an example of a subclass implementation from this abstract + base class (``ChromaVectorDatabase``) in the project? +- What is the behavior when a simultaneously read and write operation + occurs in this database? To what extent does it handle concurrency? +- Are there mechanisms in place for handling cases when the Chroma + client or the persistence directory is not accessible or fails? +- What are the specific formatting or restrictions on the key (K) and + value (V) types, especially when considering the need for ordered + keys and embeddings? +- Given that this class deals with storing and retrieving vector data, + how do we handle high-dimensional vectors or large amounts of vector + data? +- How does this class interact with the rest of the system (agents, + handlers, etc.)? diff --git a/docs/core/base/database/index.rst b/docs/core/base/database/index.rst index 16eec95de..3f0e81640 100644 --- a/docs/core/base/database/index.rst +++ b/docs/core/base/database/index.rst @@ -21,6 +21,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. @@ -32,6 +34,7 @@ Summary of content relational_database sql_database vector_database_provider + relational_database/index .. AUTO-GENERATED CONTENT END .. diff --git a/docs/core/base/database/json_vector_database.rst b/docs/core/base/database/json_vector_database.rst index a9639bd2c..fe2cc7252 100644 --- a/docs/core/base/database/json_vector_database.rst +++ b/docs/core/base/database/json_vector_database.rst @@ -1,84 +1,80 @@ JSONVectorDatabase ================== -``JSONVectorDatabase`` is a concrete class providing a vector database -that implements storage and retrieval operations into a JSON file. - Overview -------- -The ``JSONVectorDatabase`` performs the following operations: +``JSONVectorDatabase`` is an abstraction that provides a vector database +which saves its elements in a JSON file. 
It’s a simple yet effective way
+to utilize the file system as storage for vectors. It’s designed to be
+general, implementing the ``VectorDatabaseProvider`` interface, and is
+also modular, with types ``K`` for keys and ``V`` for values provided
+in a generic fashion.
+
+It can handle basic database operations like adding, getting, and
+discarding entries individually or in batches, as well as the ability
+to update entries. Additional functionalities include checking if a key
+exists, getting all ordered embeddings, and clearing all entries in the
+database.

-- Initialize an empty vector database or load an existing one from a
-  JSON file.
-- Add and discard entries in the vector database.
-- Check if a certain entry exists in the vector database.
-- Get a specific entry from the vector database based on its key.
-- Load the vector database from a JSON file and save it back to the
-  JSON file.
-- Update an existing entry in the vector database.
+Note that ``JSONVectorDatabase`` was not designed with efficiency in
+mind and might become slow when handling a large number of vectors.

 Related Symbols
 ---------------

-- ``automata.core.base.database.vector.VectorDatabaseProvider``: An
-  abstract base class that ``JSONVectorDatabase`` inherits from, which
-  lays out the fundamental methods a vector database should implement.
-- ``automata.tests.unit.test_database_vector.test_init_vector``,
-  ``automata.tests.unit.test_database_vector.test_load``,
-  ``automata.tests.unit.test_database_vector.test_save``,
-  ``automata.tests.unit.test_database_vector.test_delete_symbol``,
-  ``automata.tests.unit.test_database_vector.test_add_symbol``,
-  ``automata.tests.unit.test_database_vector.test_add_symbols``,
-  ``automata.tests.unit.test_database_vector.test_lookup_symbol``: Unit
-  test files that provide examples on how to utilize
-  ``JSONVectorDatabase``\ ’s methods.
+- ``VectorDatabaseProvider``: The interface implemented by
+  ``JSONVectorDatabase``.
+- ``jsonpickle``: Used in encoding and decoding objects for JSON
+  representation.

 Example
 -------

-The following is an example demonstrating the usage of
+The following is an example demonstrating how to use
 ``JSONVectorDatabase``.

 .. code:: python

-   from automata.core.base.database.vector import JSONVectorDatabase
+   from automata.core.base.database.vector_database import JSONVectorDatabase

-   file_path = "db.json"
-   vector_db = JSONVectorDatabase(file_path)
+   # Define a custom database with string keys and int vector values
+   class CustomDatabase(JSONVectorDatabase[str, int]):
+       def get_ordered_keys(self):
+           return sorted(self.index.keys())
+
+       def entry_to_key(self, entry):
+           return str(entry)

-   # Add an entry
-   vector_db.add("apple")
-   assert vector_db.contains("apple")
+   db = CustomDatabase("/path/to/your/database.json")

-   # Save the database to the json file
-   vector_db.save()
+   # Add entries to the database
+   db.add(5)
+   db.add(7)
+   db.add(2)

-   # Discard an entry
-   vector_db.discard("apple")
-   assert not vector_db.contains("apple")
+   # Save the database to the JSON file
+   db.save()

-   # Load the database from the json file
-   vector_db.load()
+   # Load the database from the JSON file
+   db.load()

-   # Update an entry
-   vector_db.update_database("banana")
+   # Prints [2, 5, 7]
+   print(db.get_all_ordered_embeddings())

 Limitations
 -----------

-``JSONVectorDatabase`` currently only supports JSON files and does not
-maintain order when loading back from the file due to the inherent
-property of JSON objects. Additionally, the entry keys in the vector
-database are strictly hashable, limiting the types of objects you can
-add in the database.
+``JSONVectorDatabase`` has some limitations. The JSON file format is
+not designed to support large datasets, so performance may degrade when
+handling a large number of vectors. It is also not designed with
+concurrency in mind, so concurrent writes and reads might lead to
+inconsistent data.
Follow-up Questions: -------------------- -- Is it possible to extend the functionality of ``JSONVectorDatabase`` - to support other file types, i.e., csv or yaml? -- What happens when we try to add an object to the database that is not - hashable as an entry? -- Can we consider using ordered dictionaries (collections.OrderedDict - in Python) to maintain the order when loading and saving databases? +- What is a good alternative to JSON for handling larger databases more + efficiently? +- How can we modify ``JSONVectorDatabase`` to support concurrent writes + and reads? diff --git a/docs/core/base/database/relational_database.rst b/docs/core/base/database/relational_database.rst index 2edcb2734..c6c65f2da 100644 --- a/docs/core/base/database/relational_database.rst +++ b/docs/core/base/database/relational_database.rst @@ -1,65 +1,93 @@ RelationalDatabase ================== -``RelationalDatabase`` is an abstract base class for different types of -relational databases. The class definition includes several abstract -methods intended to be overridden by the subclasses. These methods -primarily facilitate core database operations. +``RelationalDatabase`` serves as an abstract base class to represent +various types of relational databases. It organizes data into one or +more tables with designated fields. Overview -------- -``RelationalDatabase`` provides a pattern for designing various types of -relational databases. It contains abstract methods that outline the -fundamental operations of a relational database. These operations -contain connecting, closing, creating tables on the database, and the -CRUD (Create, Read, Update, Delete) operation methods which are -``insert``, ``select``, ``update_database`` and ``delete``. +``RelationalDatabase`` primarily provides methods for basic database +operations such as connecting, closing the connection, creating tables, +inserting data, selecting data, updating entries, and deleting data. 
+Given its status as an abstract base class, it only defines the +interface for these operations. The implementation details must be +provided by concrete subclasses, such as an SQL database class that +implements these operations specific to SQL databases. Related Symbols --------------- -- ``automata.tests.unit.test_task_database.db`` -- ``automata.tests.unit.test_conversation_database.db`` -- ``automata.core.base.database.relational.SQLDatabase`` -- ``automata.tests.unit.test_task_database.test_database_lifecycle`` -- ``automata.llm.foundation.LLMConversationDatabaseProvider`` -- ``automata.tests.unit.test_conversation_database.test_get_last_interaction_id_when_no_interactions`` -- ``automata.tasks.agent_database.AutomataAgentTaskDatabase`` -- ``automata.tests.unit.test_task_database.test_get_tasks_by_query`` -- ``automata.memory_store.agent_conversation_database.AgentConversationDatabase`` -- ``automata.tests.unit.test_task_database.test_contains`` +- ``automata.core.base.database.relational_database.SQLDatabase``: This + is a concrete class that provides an SQL database. It inherits from + ``RelationalDatabase`` and thus has the same methods, but with + specific implementations for an SQL database. +- ``automata.core.base.database.vector_database.VectorDatabaseProvider``: + This is an abstract base class for different types of vector database + providers. +- ``automata.eval.agent.agent_eval_database.AgentEvalResultDatabase``: + This class writes evaluation results to an SQLite database. Example ------- -Due to its abstract nature, ``RelationalDatabase`` cannot be -instantiated. However, subsequent code shows an example of a concrete -subclass ``SQLDatabase`` which inherits from this class: +As ``RelationalDatabase`` is an abstract base class, below is an example +of a hypothetical subclass ``MySQLDatabase`` implementing the methods in +``RelationalDatabase``: .. 
code:: python

-   from automata.core.base.database.relational import SQLDatabase
+   from automata.core.base.database.relational_database import RelationalDatabase

-   db_instance = SQLDatabase()
-   db_instance.connect('path/to/database.db')
-   db_instance.create_table('test_table', {'id': int, 'name': str, 'email': str})
-   db_instance.insert('test_table', {'id': 1, 'name': 'Test', 'email': 'test@email.com'})
-   data = db_instance.select('test_table', ['id', 'name', 'email'], {'id': 1})
-   print(data)
-   db_instance.close()
+   class MySQLDatabase(RelationalDatabase):
+       def connect(self, db_path):
+           ...  # implementation for MySQL connect
+
+       def close(self):
+           ...  # implementation for MySQL close
+
+       def create_table(self, table_name, fields):
+           ...  # implementation for MySQL create table
+
+       def insert(self, table_name, data):
+           ...  # implementation for MySQL insert
+
+       def select(self, table_name, fields, conditions):
+           ...  # implementation for MySQL select
+
+       def update_entry(self, table_name, data, conditions):
+           ...  # implementation for MySQL update_entry
+
+       def delete(self, table_name, conditions):
+           ...  # implementation for MySQL delete
+
+You would use the subclass similarly to how you would use any class:
+
+.. code:: python
+
+   db = MySQLDatabase()
+   db.connect("/path/to/db")
+   db.create_table("MyTable", {"name": "VARCHAR(100)", "age": "INT"})
+   db.insert("MyTable", {"name": "John Doe", "age": 30})
+   results = db.select("MyTable", ["name"], {"age": 30})
+   db.close()

 Limitations
 -----------

-The ``RelationalDatabase`` class is abstract and cannot be used directly
-to create a database. It is intended to serve as a base class to be
-subclassed by classes that implement specific databases.
+The ``RelationalDatabase`` class itself does not provide any actual
+implementation details. Thus, instances of ``RelationalDatabase`` cannot
+be directly used for operations. 
Also, any class that inherits from +``RelationalDatabase`` must provide concrete implementations for the +abstract methods defined in the ``RelationalDatabase`` class. -Follow-up Questions: +Follow-Up Questions: -------------------- -- What specific databases are implemented from this abstract bases - class? -- How does the implementation vary from different subclasses of this - abstract class? +- Are there default implementations for any of the methods defined in + ``RelationalDatabase`` in common scenarios? +- How does error handling work at this level? For example, what happens + if one tries to select data from a table that does not exist? +- What type of databases other than SQL might make use of the abstract + ``RelationalDatabase`` class in a typical application’s use case? diff --git a/docs/core/base/database/relational_database/index.rst b/docs/core/base/database/relational_database/index.rst new file mode 100644 index 000000000..2aa5ef31d --- /dev/null +++ b/docs/core/base/database/relational_database/index.rst @@ -0,0 +1,24 @@ +relational_database +=================== + +**Automata** is a Python library for autonomous providers. + +Check out the :doc:`usage` section for further information, including +how to :ref:`installation` the project. + + + +.. AUTO-GENERATED CONTENT START +.. + + .. toctree:: + :maxdepth: 1 + + null_connection + null_cursor + +.. AUTO-GENERATED CONTENT END +.. + + + diff --git a/docs/core/base/database/relational_database/null_connection.rst b/docs/core/base/database/relational_database/null_connection.rst new file mode 100644 index 000000000..96fe67353 --- /dev/null +++ b/docs/core/base/database/relational_database/null_connection.rst @@ -0,0 +1,7 @@ +class NullConnection(): ‘A null connection to a database.’ + +:: + + def commit(self) -> Any: + 'Commit a transaction.' 
+ raise NotImplementedError('This is a null connection.') diff --git a/docs/core/base/database/relational_database/null_cursor.rst b/docs/core/base/database/relational_database/null_cursor.rst new file mode 100644 index 000000000..7dc6e4039 --- /dev/null +++ b/docs/core/base/database/relational_database/null_cursor.rst @@ -0,0 +1,11 @@ +class NullCursor(): ‘A null cursor to a database.’ + +:: + + def execute(self, *args, **kwargs) -> Any: + 'Execute a query.' + raise NotImplementedError('This is a null cursor.') + + def fetchall(self) -> Any: + 'Fetch all results from a query.' + raise NotImplementedError('This is a null cursor.') diff --git a/docs/core/base/database/sql_database.rst b/docs/core/base/database/sql_database.rst index 59b02f57c..9e026675b 100644 --- a/docs/core/base/database/sql_database.rst +++ b/docs/core/base/database/sql_database.rst @@ -1,93 +1,94 @@ SQLDatabase =========== -``SQLDatabase`` is a concrete class that enables interaction with an -SQLite database. It encapsulates various common operations that one -needs to perform on a SQL database such as creation and deletion of -tables, data insertion, selection, and deletion as well as closing the -database connection. - Overview -------- -``SQLDatabase`` opens a connection to a SQLite database file and -provides a set of methods to execute SQL queries for Data Definition -Language (DDL) like ``create_table`` and Data Manipulation Language -(DML) like ``delete``, ``insert``, ``select``, and ``update_database``. - -Import Statements ------------------ - -.. code:: python - - import sqlite3 - from abc import ABC, abstractmethod - from typing import Dict, List - from automata.config import CONVERSATION_DB_PATH +``SQLDatabase`` is a concrete implementation class derived from +``RelationalDatabase`` to manage operations with SQLite databases. 
It
+abstracts basic operations such as creating a table, inserting data
+into a table, selecting data from a table, updating an entry in a
+table, and deleting data from a table. Two utility classes,
+``NullConnection`` and ``NullCursor``, represent a null database
+connection and a null cursor, respectively.

 Related Symbols
 ---------------

-- ``automata.tests.unit.test_task_database.db``
-- ``automata.tests.unit.test_conversation_database.db``
-- ``automata.tasks.agent_database.AutomataAgentTaskDatabase``
-- ``automata.core.base.database.relational.RelationalDatabase``
-- ``automata.memory_store.agent_conversation_database.AgentConversationDatabase``
+- ``sqlite3.Connection, sqlite3.Cursor``
+- ``automata.core.base.database.relational_database.RelationalDatabase``
+  - The base class of ``SQLDatabase``.

 Example
 -------

-This example demonstrates the way to use the ``SQLDatabase`` for
-performing the basic SQL operations.
+This example demonstrates creating an SQLite database and performing
+simple operations like creating a table, inserting and selecting data.

 .. 
code:: python - # Create instance of SQLDatabase - database = SQLDatabase() - - # Connect to the SQLite database - database.connect('example.db') - - # Create a table - database.create_table('students', {'name': 'TEXT', 'age': 'INTEGER'}) - - # Insert data into the table - database.insert('students', {'name': 'John', 'age': 20}) + from automata.core.base.database.relational_database import SQLDatabase - # Select data from the table - data = database.select('students', ['name', 'age']) - - print(data) + # Initialize SQLDatabase Object + database = SQLDatabase() - # Delete data from the table - database.delete('students', {'name': 'John'}) + # Connect to Database + database.connect(db_path="my_database.sqlite3") + + # Define Table Name and Fields + table_name = "Employees" + fields = { + "ID": "INTEGER PRIMARY KEY", + "NAME": "TEXT", + "AGE": "INT", + "ADDRESS": "CHAR(50)", + "SALARY": "REAL" + } + + # Create new table + database.create_table(table_name, fields) + + # Insert data + data = { + "NAME": "Paul", + "AGE": 32, + "ADDRESS": "California", + "SALARY": 20000.00 + } + database.insert(table_name, data) + + # Select data + fields = ["NAME", "AGE"] + conditions = {"AGE": 32} + employees = database.select(table_name, fields, conditions) + + # Update data + data = {"SALARY": 25000.00} + conditions = {"NAME": "Paul"} + database.update_entry(table_name, data, conditions) # Close the database connection database.close() +Please ensure that the file path and data used are modified as per your +system and needs. + Limitations ----------- -The ``SQLDatabase`` class specifically designed for SQLite databases. It -is not explicitly designed to work with other types of SQL databases, -for example MySQL or PostgreSQL. +- The ``SQLDatabase`` class is designed specifically to work with + SQLite databases and hence may not be compatible with other types of + SQL databases like MySQL, PostgreSQL, etc. 
+- The ``commit`` method of ``NullConnection`` and the ``execute`` and
+  ``fetchall`` methods of ``NullCursor`` raise ``NotImplementedError``.
+  These classes serve as placeholders for a null connection and cursor
+  and are not expected to have these methods operational.

 Follow-up Questions:
 --------------------

-- Is ``SQLDatabase`` compatible with all versions of SQLite or only
-  with particular ones?
-- How is exception handling managed in the ``SQLDatabase`` class?
-- Are there any plans to extend ``SQLDatabase`` class for compatibility
-  with other types of SQL databases?
-
-Notes
------
-
-- Mock objects referenced in test files have been replaced with actual
-  objects for this documentation.
-- Information provided for
-  ``automata.tests.unit.sample_modules.sample_module_write.CsSWU``
-  class has been excluded from this documentation, as it appears to be
-  unrelated to the primary symbol. If this class is important to
-  understanding ``SQLDatabase``, more context would be helpful.
+- Does ``SQLDatabase`` support SQL databases other than SQLite?
+- Can ``SQLDatabase`` handle composite primary keys while creating
+  tables?
+- How does ``SQLDatabase`` handle SQL injection attacks in its current
+  form?
diff --git a/docs/core/base/database/vector_database_provider.rst b/docs/core/base/database/vector_database_provider.rst
index 7fbe55f48..770ab4b50 100644
--- a/docs/core/base/database/vector_database_provider.rst
+++ b/docs/core/base/database/vector_database_provider.rst
@@ -4,53 +4,108 @@ VectorDatabaseProvider
 Overview
 --------
 
-The ``VectorDatabaseProvider`` is an abstract base class for different
-types of vector database providers. These providers are responsible for
-managing databases holding vector data. They have various methods that
-need to be implemented in the concrete classes inheriting from the
-``VectorDatabaseProvider``. 
+``VectorDatabaseProvider`` is an abstract base class that provides an +interface for different types of vector database providers. It defines a +standard set of operations for interacting with a vector database, +including CRUD operations such as saving, loading, clearing, and +getting entries. + +The ``VectorDatabaseProvider`` relies on concrete implementations of +these operations, which can be tailored to the specific +vector database in use, such as ``SQLiteVectorDatabaseProvider``, +``MemoryVectorDatabaseProvider``, ``RedisVectorDatabaseProvider``, etc. Related Symbols --------------- -- ``automata.symbol_embedding.base.JSONSymbolEmbeddingVectorDatabase`` -- ``automata.core.base.database.vector.JSONVectorDatabase`` -- ``automata.embedding.base.EmbeddingVectorProvider`` +- ``automata.embedding.embedding_base.EmbeddingHandler`` +- ``automata.embedding.embedding_base.EmbeddingBuilder.__init__`` +- ``automata.embedding.embedding_base.EmbeddingSimilarityCalculator`` +- ``automata.symbol_embedding.symbol_embedding_builders.SymbolCodeEmbeddingBuilder`` +- ``automata.memory_store.symbol_code_embedding_handler.SymbolCodeEmbeddingHandler.process_embedding`` Example ------- -As ``VectorDatabaseProvider`` is an abstract base class, an instance of -it cannot be created. However, its methods can be implemented by -inheriting this class. One of the example concrete classes is -``JSONSymbolEmbeddingVectorDatabase`` +Although ``VectorDatabaseProvider`` is an abstract base class and cannot +be instantiated directly, the following example demonstrates how it +might be extended and used in practice: ..
code:: python - from automata.symbol_embedding.base import JSONSymbolEmbeddingVectorDatabase + from automata.core.base.database.vector_database import VectorDatabaseProvider + from typing import Any, List + + class MyDatabaseProvider(VectorDatabaseProvider[Any, Any]): + """Custom vector database provider.""" - class ExampleVectorDatabase(JSONSymbolEmbeddingVectorDatabase): - def __init__(self, file_path: str): - super().__init__(file_path) + def __len__(self) -> int: + ... + + def save(self) -> None: + ... + + def load(self) -> None: + ... + + def clear(self) -> None: + ... + + def get_ordered_keys(self) -> List[Any]: + ... + + def get_all_ordered_embeddings(self) -> List[Any]: + ... + + def add(self, entry: Any) -> None: + ... + + def batch_add(self, entries: Any) -> None: + ... + + def update_entry(self, entry: Any) -> None: + ... + + def batch_update(self, entries: List[Any]) -> None: + ... + + def entry_to_key(self, entry: Any) -> Any: + ... + + def contains(self, key: Any) -> bool: + ... + + def get(self, key: Any) -> Any: + ... + + def batch_get(self, keys: List[Any]) -> List[Any]: + ... + + def discard(self, key: Any) -> None: + ... + + def batch_discard(self, keys: List[Any]) -> None: + ... - # Implement all the abstract methods required by the parent class - # The methods signature should be same as that in VectorDatabaseProvider +Note: The ``Any`` type is used as a placeholder for the actual key type +``K`` and vector type ``V``. Replace it with the appropriate types based +on your specific context and vector database connection requirements. Limitations ----------- -The primary limitation of the ``VectorDatabaseProvider`` base class is -that it does not provide any concrete implementations of the methods. -The specific implementations for vector database operations are deferred -to the classes that inherit it. It is essential to understand that each -concrete class may have its own limitations depending on the implemented -data source or methodology.
+The ``VectorDatabaseProvider`` doesn’t perform any operations itself; +it simply declares the methods that all vector database providers should +implement. The efficiency and effectiveness of these operations are +completely dependent on the implementation in the concrete classes that +extend ``VectorDatabaseProvider``. Follow-up Questions: -------------------- -- How can we ensure efficient implementation of the abstract methods - across different classes inheriting ``VectorDatabaseProvider``? -- Are there some implementations of ``VectorDatabaseProvider`` which - are specifically optimized for specific types of vector data or - specific data sources? +- How should the handling of exceptions in the concrete subclasses be + standardized? +- Is there a provision to define a custom hashing algorithm in the + ``entry_to_key`` method for unique key generation? +- How can we introduce asynchronous capabilities to increase + performance where necessary? diff --git a/docs/core/base/index.rst b/docs/core/base/index.rst index d24d19bd5..df1b8cb75 100644 --- a/docs/core/base/index.rst +++ b/docs/core/base/index.rst @@ -18,12 +18,15 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. .. toctree:: :maxdepth: 1 + automata_error database/index patterns/index diff --git a/docs/core/base/patterns/index.rst b/docs/core/base/patterns/index.rst index 3835f435c..a6399138f 100644 --- a/docs/core/base/patterns/index.rst +++ b/docs/core/base/patterns/index.rst @@ -18,6 +18,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START ..
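One of the follow-up questions above asks about custom hashing in ``entry_to_key``. As a rough sketch of how a concrete provider might derive stable keys by hashing entry content (``InMemoryVectorStore`` is an invented illustration, not part of the Automata codebase):

```python
import hashlib
from typing import Any, Dict


class InMemoryVectorStore:
    """Hypothetical, dict-backed stand-in for a concrete VectorDatabaseProvider."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def entry_to_key(self, entry: Any) -> str:
        # Custom hashing: derive a stable key from the entry's string form
        return hashlib.sha256(str(entry).encode("utf-8")).hexdigest()

    def add(self, entry: Any) -> None:
        self._data[self.entry_to_key(entry)] = entry

    def contains(self, key: str) -> bool:
        return key in self._data

    def get(self, key: str) -> Any:
        return self._data[key]


store = InMemoryVectorStore()
store.add("some document")
key = store.entry_to_key("some document")
print(store.contains(key))  # True
```

Because the key is derived deterministically from the entry's content, re-adding the same entry overwrites the existing record rather than duplicating it.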
diff --git a/docs/core/base/patterns/observer.rst b/docs/core/base/patterns/observer.rst index 3d0bd4a1d..600338cff 100644 --- a/docs/core/base/patterns/observer.rst +++ b/docs/core/base/patterns/observer.rst @@ -1,54 +1,8 @@ -Observer -======== +class Observer(ABC): ‘An abstract class for implementing an observer.’ -``Observer`` is an abstract base class used for implementing the -Observer design pattern in Python. subclasses of the ``Observer`` class -should implement the ``update`` method, which is called whenever a -subject the observer is watching changes. +:: -Overview --------- - -The ``Observer`` class provides a template for creating objects that -watch or monitor other objects (subjects). If a subject changes, it -calls the ``update`` method in its observers. - -Related Symbols ---------------- - -- ``automata.tests.unit.test_task.test_callback`` -- ``automata.tests.unit.sample_modules.sample.Person`` -- ``automata.llm.foundation.LLMConversation.register_observer`` -- ``automata.tests.unit.sample_modules.sample_module_write.CsSWU`` -- ``automata.tests.conftest.MockRepositoryClient`` -- ``automata.llm.foundation.LLMConversation.unregister_observer`` -- ``automata.llm.foundation.LLMConversation.notify_observers`` -- ``automata.tests.unit.test_py_reader.getter`` -- ``automata.llm.foundation.LLMConversationDatabaseProvider`` - -Example -------- - -The following is an example demonstrating how to implement an instance -of ``Observer`` class by implementing the ``update`` method. - -.. code:: python - - class CustomObserver(Observer): - def update(self, subject: Any): - print(f"Subject {subject} has changed.") - -Limitations ------------ - -The ``Observer`` class is an abstract base class, so it cannot be -instantiated on its own. Instead, you need to create a subclass and -implement the ``update`` method. - -Follow-up Questions: -~~~~~~~~~~~~~~~~~~~~ - -- What are the specification and role of the subject parameter in the - ``update`` method? 
-- What exact changes in the subject cause the ``update`` method to be - called? + @abstractmethod + def update(self, subject: Any): + 'When the subject changes, this method is called to notify the observer.' + pass diff --git a/docs/core/base/patterns/singleton.rst b/docs/core/base/patterns/singleton.rst index 1e7a84a19..c47ff64bd 100644 --- a/docs/core/base/patterns/singleton.rst +++ b/docs/core/base/patterns/singleton.rst @@ -1,89 +1,11 @@ -Singleton -========= +class Singleton(abc.ABCMeta, type): ‘Singleton metaclass for ensuring +only one instance of a class.’ +_instances: Dict[str, Any] = {} -Overview -------- +:: -The ``Singleton`` class is a metaclass designed to ensure only one -instance of a class is created. It follows a creational pattern which is -commonly used in situations where a class must control the number of -instances created, e.g. for memory management or ensuring unique -communication points in a system. - -The core functionality of this class resides under the ``__call__`` -method, which checks if an instance of the class already exists before -creating a new one. If the instance already exists, it returns the -existing instance. - -This class is a part of ``automata.core.base.patterns`` and is defined -using abstract base class (``abc``) module of Python’s standard library -for creating abstract base classes. - -Related Symbols ---------------- - -- ``automata.tests.unit.sample_modules.sample_module_write.CsSWU``, a - unit test sample module. -- ``automata.tests.unit.sample_modules.sample.EmptyClass``, a unit test - sample module with an empty class. -- ``automata.context_providers.symbol_synchronization.SymbolProviderSynchronizationContext``, - a context provider for symbol synchronization. -- ``automata.tests.unit.sample_modules.sample_module_write.CsSWU.__init__``, - initializer for the ``CsSWU`` class. -- ``automata.tests.unit.sample_modules.sample.Person``, a sample class - for unit testing.
-- ``automata.symbol.base.ISymbolProvider.__init__``, initializer for - the ``ISymbolProvider`` Interface. -- ``automata.tests.unit.sample_modules.sample.OuterClass``, a sample - outer class with an inner class for unit testing. -- ``automata.symbol.base.Symbol``, core class for creating, managing, - and manipulating symbols. -- ``automata.tests.unit.sample_modules.sample.OuterClass.InnerClass``, - a sample inner class located within an outer class for unit testing. - -Example -------- - -The following is an example demonstrating how to create a class with -Singleton as metaclass. - -.. code:: python - - import abc - from automata.core.base.patterns.singleton import Singleton - - class MyClass(metaclass=Singleton): - def __init__(self, name): - self.name = name - - # Create a new instance - instance1 = MyClass("MyClass1") - print(instance1.name) # Outputs: MyClass1 - - # Try to create another instance - instance2 = MyClass("MyClass2") - print(instance2.name) # Outputs: MyClass1 - - # Confirm both instances are the same - print(instance1 is instance2) # Outputs: True - -Limitations ------------ - -The Singleton pattern restricts the instantiation of a class to a single -object. It ensures a class has only one instance and provides a global -point of access to it. - -One main drawback is that you can’t create a second instance of your -Singleton class. If your application needs to have multiple instances of -a class, then the Singleton pattern is not suitable. Also, complex tests -can become difficult with Singleton if not handled carefully. - -Follow-up Questions: -^^^^^^^^^^^^^^^^^^^^ - -- Are there instances when the Singleton pattern might not be desired, - or possibly harmful? -- Does Singleton thread safe? -- How do initializers (``__init__``) behave when used with the - Singleton pattern? + def __call__(self, *args, **kwargs): + 'Call method for the singleton metaclass.' 
+ if (self not in self._instances): + self._instances[self] = super(Singleton, self).__call__(*args, **kwargs) + return self._instances[self] diff --git a/docs/core/bounding_box.rst b/docs/core/bounding_box.rst index 5851fd3ef..defc887e0 100644 --- a/docs/core/bounding_box.rst +++ b/docs/core/bounding_box.rst @@ -1,24 +1,2 @@ -1. Integrating location information onto the Bounding Box could prove - beneficial in certain use-cases, such as running a specific check in - a particular portion of the AST or highlighting an error. Instead of - looking up the location details separately, these could then be - directly retrieved from the BoundingBox instance. - -2. Bounding box intersection or overlap could be implemented by - extending this class or by creating a new utility class. The most - suitable approach would depend on the project’s design philosophy. If - there’s a need for making ``BoundingBox`` instances ‘aware’ of each - other and capable of making intersection or overlap checks, then - extending this class would make sense. Conversely, if this added - functionality does not conceptually belong to the idea of a - ‘BoundingBox’, a utility class that takes two bounding boxes and - checks for intersection or overlap would be appropriate. - -3. A ``BoundingBox`` is conventionally a rectangle, derived from the - ‘minimum bounding rectangle’ concept in spatial analysis where it - represents the smallest rectangle (oriented along the axes) within - which all points lie. However, depending on the higher-level - abstraction, a “bounding box” might not always be rectangular, but - that could complicate the computation of overlap, intersection, etc. - and would likely involve the creation of a significantly more complex - class or set of classes to handle these cases. 
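The bounding-box intersection/overlap idea discussed above can be sketched as a small standalone utility. This is illustrative only: ``boxes_overlap`` is not part of the library, and real source-code spans are not pure 2D rectangles (a multi-line span covers whole intermediate lines), so treat this as the simplified rectangular case:

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    """A (line, column) position, mirroring the LineItem described in the docs."""
    line: int
    column: int


@dataclass
class BoundingBox:
    """An axis-aligned box given by its top-left and bottom-right positions."""
    top_left: LineItem
    bottom_right: LineItem


def boxes_overlap(a: BoundingBox, b: BoundingBox) -> bool:
    # Two axis-aligned boxes overlap unless one lies entirely
    # before/after the other along the line axis or the column axis
    if a.bottom_right.line < b.top_left.line or b.bottom_right.line < a.top_left.line:
        return False
    if a.bottom_right.column < b.top_left.column or b.bottom_right.column < a.top_left.column:
        return False
    return True


box1 = BoundingBox(LineItem(1, 0), LineItem(5, 10))
box2 = BoundingBox(LineItem(4, 5), LineItem(8, 20))
print(boxes_overlap(box1, box2))  # True
```

Keeping this as a free function (rather than a ``BoundingBox`` method) matches the utility-class option weighed in the discussion above.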
+@dataclass class BoundingBox(): ‘A class to represent the bounding box +of a symbol.’ top_left: LineItem bottom_right: LineItem diff --git a/docs/core/docstring_remover.rst b/docs/core/docstring_remover.rst new file mode 100644 index 000000000..7e61d105e --- /dev/null +++ b/docs/core/docstring_remover.rst @@ -0,0 +1,10 @@ +class DocstringRemover(NodeTransformer): ‘Removes docstrings from a +class or function.’ + +:: + + def visit(self, node: AST) -> Optional[AST]: + 'Visits a node in the AST.' + if (isinstance(node, (AsyncFunctionDef, ClassDef, FunctionDef, Module)) and isinstance(node.body[0], Expr) and isinstance(node.body[0].value, Str)): + node.body.pop(0) + return super().visit(node) diff --git a/docs/core/handler_dict.rst b/docs/core/handler_dict.rst index 8e6ab295b..04e262c21 100644 --- a/docs/core/handler_dict.rst +++ b/docs/core/handler_dict.rst @@ -1,71 +1,2 @@ -HandlerDict -=========== - -``HandlerDict`` is a special dictionary type that represents a logging -handler in the logging configuration structure. Handlers are responsible -for delivering a log record (LogRecord instance) to its destination. -Hence ``HandlerDict`` plays a crucial role in controlling how and where -every logging event is handled. - -Overview --------- - -``HandlerDict`` extends Python’s built-in ``dict`` class and it is used -within the larger scope of a ``LoggingConfig`` which is formatted as a -dictionary. The ``LoggingConfig`` includes the handler definitions, -which are represented as ``HandlerDict``. - -Related Symbols ---------------- - -- ``automata.core.utils.RootDict`` -- ``automata.core.utils.LoggingConfig`` -- ``automata.tests.unit.sample_modules.sample_module_write.CsSWU.__init__`` - -Example usage of HandlerDict ----------------------------- - -Although the ``HandlerDict`` is usually a part of a LoggingConfig, it -can be used independently as well. - -In a ``LoggingConfig``, the HandlerDict usually is present in something -similar to the following structure: - -.. 
code:: python - - logging_config = { - 'version': 1, - 'disable_existing_loggers': False, - 'handlers': { - 'console': { - 'class': 'logging.StreamHandler', - 'level': 'INFO', - 'formatter': 'standard', - 'stream': 'ext://sys.stdout', - } - } - } - -Here, ‘console’ is a HandlerDict which further contains handler-specific -settings in its own dictionary. - -It is important to note as this an integral part of logging -configuration, using it independently might not yield much significant -results. - -Limitations: ------------- - -``HandlerDict`` is a subset of the larger ``LoggingConfig`` dictionary -and hence does not offer any specialized functionality or methods apart -from the dictionary structure it provides to store configuration. It -must be accurately formatted and used in a valid logging configuration -structure to function as intended. Furthermore, it assumes that proper, -valid handler properties are used as keys in the HandlerDict. - -Follow-Up Questions -------------------- - -- What advanced customization options are available for - ``HandlerDict``? -- Can it be used outside a Logging configuration context effectively? +class HandlerDict(TypedDict): ‘A dictionary representing a logging +handler’ class\_: str formatter: str level: int filename: Optional[str] diff --git a/docs/core/import_remover.rst b/docs/core/import_remover.rst new file mode 100644 index 000000000..18beb157f --- /dev/null +++ b/docs/core/import_remover.rst @@ -0,0 +1,9 @@ +class ImportRemover(NodeTransformer): ‘Removes import statements from a +module, class or function.’ + +:: + + def visit(self, node): + if (isinstance(node, (AsyncFunctionDef, ClassDef, FunctionDef, Module)) and isinstance(node.body[0], (Import, ImportFrom))): + node.body.pop(0) + return super().visit(node) diff --git a/docs/core/index.rst b/docs/core/index.rst index d36d5e3e4..b47f15b6d 100644 --- a/docs/core/index.rst +++ b/docs/core/index.rst @@ -20,6 +20,8 @@ how to :ref:`installation` the project. 
Summary of content + + .. AUTO-GENERATED CONTENT START .. @@ -27,7 +29,10 @@ Summary of content :maxdepth: 1 bounding_box + docstring_remover handler_dict + import_remover + line_item logging_config root_dict base/index diff --git a/docs/core/line_item.rst b/docs/core/line_item.rst new file mode 100644 index 000000000..1a753b1be --- /dev/null +++ b/docs/core/line_item.rst @@ -0,0 +1,2 @@ +@dataclass class LineItem(): ‘A class to represent a line item in a +bounding box.’ line: int column: int diff --git a/docs/core/logging_config.rst b/docs/core/logging_config.rst index c2e467724..da016717b 100644 --- a/docs/core/logging_config.rst +++ b/docs/core/logging_config.rst @@ -1,86 +1,4 @@ -LoggingConfig -============= - -``LoggingConfig`` is a flexible dictionary-like configuration object -introduced in the ``automata.core.utils`` module. It provides a -structured form to define the logging configuration including the -logger’s version, control over disabling existing loggers, definitions -for formatters and handlers, and base root dictionary for logging. - -Overview --------- - -``LoggingConfig`` is a subclass of Python’s ``TypedDict`` that allows -you to have dictionary, where keys, values are restricted to specific -types. It is less error-prone and provides a higher quality of tooling -support. The ``total=False`` specification in the subclass definition -means that not all dictionary keys need to be present in initialized -instances of ``LoggingConfig``. - -The ``LoggingConfig`` values can range from fundamental datatypes like -``int`` and ``bool`` to more compound ones like dictionaries of custom -types like ``HandlerDict`` and ``RootDict``. - -Related Symbols ---------------- - -- ``automata.core.utils.HandlerDict``: A dictionary representing a - logging handler. -- ``automata.core.utils.RootDict``: A dictionary representing the root - logger. -- ``automata.cli.commands.reconfigure_logging``: Methods to reconfigure - logging. 
-- ``automata.core.utils.get_logging_config``: Returns the logging - configuration. -- ``automata.tests.unit.test_task.test_task_inital_state, test_register_task, test_execute_automata_task_fail, test_execute_automata_task_success``: - Unit tests validating various functionalities of ``LoggingConfig``. - -Example -------- - -Here is an illustrative example to create a ``LoggingConfig`` object: - -.. code:: python - - from automata.core.utils import LoggingConfig - - log_config = LoggingConfig( - version=1, - disable_existing_loggers=False, - formatters={}, - handlers={"handler": {"class": "logging.StreamHandler", "formatter": "default", "level": "DEBUG"}}, - root={"handlers": ["handler"], "level": "DEBUG"}, - ) - -Set ``disable_existing_loggers = False`` to permit the functionality of -existing loggers. Handlers can use a ``StreamHandler`` with a level of -“DEBUG”. In the root logger, this handler will be used at the debugging -level. - -Limitations ------------ - -The ``LoggingConfig`` object is structured to conform to the -``TypedDict`` constraints including the type specifications of keys and -their associated values. As such, it may not support dynamic -configurations if the keys and/or their attribute types vary beyond the -predefined schema. - -Another potential limitation is the absence of some keys in the -dictionary due to the ``TypedDict`` was defined with ``total=False``. -That could lead to runtime errors if the code assumes the existence of -keys that might not be present. - -Follow-up Questions: --------------------- - -- Is there a way for ``LoggingConfig`` to accept a broader range of - logging formatters and handlers? -- What happens if specified logging handlers aren’t available or - configured incorrectly? -- How are exceptions handled when logging configuration issues occur, - due to reasons like missing keys in the dict or data-type mismatch? -- Can ``LoggingConfig`` be extended to accept or support custom logging - handlers and formatters? 
-- What are performance implications of using ``TypedDict`` vs a - standard dictionary in Python? +class LoggingConfig(TypedDict, total=False): ‘A dictionary representing +the logging configuration’ version: int disable_existing_loggers: bool +formatters: dict handlers: dict[str, Union[HandlerDict, dict]] root: +RootDict diff --git a/docs/core/root_dict.rst b/docs/core/root_dict.rst index 49beace8a..d40596a98 100644 --- a/docs/core/root_dict.rst +++ b/docs/core/root_dict.rst @@ -1,124 +1,2 @@ -RootDict -======== - -**Import Statements**: - -.. code:: python - - import json - import logging - import os - import colorlog - import networkx as nx - import openai - import yaml - from copy import deepcopy - from typing import Any, Dict, List, Optional, TypedDict, Union, cast - from automata.symbol.base import Symbol - from automata.config import OPENAI_API_KEY - -**Class Docstring**: ``RootDict`` is a dictionary representing the root -logger - -Overview: --------- - -The ``RootDict`` class is part of the ``automata.core.utils`` module and -is used to represent a dictionary-like data structure for the root -logger. This root logger dictionary is typically used for logger -configuration. - -Related Symbols: ----------------- - -1. ``automata.tests.unit.sample_modules.sample.EmptyClass`` - -2. ``automata.tests.unit.sample_modules.sample_module_write.CsSWU.__init__`` - - .. code:: python - - def __init__(self): - pass - -3. ``automata.core.utils.LoggingConfig`` - - .. code:: python - - class LoggingConfig(TypedDict, total=False): - """A dictionary representing the logging configuration""" - - version: int - disable_existing_loggers: bool - formatters: dict - handlers: dict[str, Union[HandlerDict, dict]] - root: RootDict - -4. ``automata.tests.unit.sample_modules.sample_module_write.CsSWU`` - - .. code:: python - - class CsSWU: - """hWrByOIFxNMacOLrgszg""" - - def __init__(self): - pass - -5.
``automata.core.utils.HandlerDict`` - - - A dictionary representing a logging handler - -6. ``automata.tests.unit.test_directory_manager.test_load_directory_structure`` - -7. ``automata.llm.foundation.LLMChatMessage.to_dict`` - -8. ``automata.tests.unit.sample_modules.sample.OuterClass`` - -9. ``automata.llm.providers.openai.OpenAIChatMessage.to_dict`` - -10. ``automata.tests.unit.test_task_environment.TestURL`` - -Example -------- - -While no direct usage of ``RootDict`` has been provided in the context, -we can still infer an example usage from the given context: - -.. code:: python - - from typing import Any - from automata.core.utils import RootDict - - # Initialize a root logger dictionary - logger_dict: RootDict = {"level": "INFO", "handlers": ["console"]} - - # Usage in a logging configuration - logging_config = { - "version": 1, - "disable_existing_loggers": False, - "handlers": { - "console": { - "class": "logging.StreamHandler", - "level": "INFO", - "formatter": "default" - } - }, - "root": logger_dict - } - -Limitations ------------ - -There are no notable limitations identified for this class from the -provided context. As this class essentially behaves like a dictionary, -the operations and limitations consistent with typical Python dictionary -objects will apply here. More specific limitations may be -context-dependent. - -Follow-up Questions: --------------------- - -- What are the mandatory and optional fields for ``RootDict``? -- How does one link or bind the root logger dictionary to the actual - logger? -- Is ``RootDict`` typically used in certain types of applications or in - specific scenarios? 
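A ``RootDict``-shaped mapping can be exercised with Python's standard ``logging.config.dictConfig``; here is a minimal, self-contained sketch (the ``RootDict`` shape is restated locally so the snippet runs on its own):

```python
import logging
import logging.config
from typing import List, TypedDict


class RootDict(TypedDict):
    """A dictionary representing the root logger (restated for this sketch)."""
    handlers: List[str]
    level: int


root: RootDict = {"handlers": ["console"], "level": logging.INFO}

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # A single console handler; the root logger refers to it by name
        "console": {"class": "logging.StreamHandler", "level": "INFO"},
    },
    "root": root,
}

# dictConfig binds the root dict to the actual root logger
logging.config.dictConfig(config)
logging.getLogger().info("root logger configured")
```

This also answers one of the follow-up questions above: the binding between the root dictionary and the live root logger happens inside ``dictConfig``.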
+class RootDict(TypedDict): ‘A dictionary representing the root logger’ +handlers: List[str] level: int diff --git a/docs/embedding/embedding.rst b/docs/embedding/embedding.rst index af968938a..770875675 100644 --- a/docs/embedding/embedding.rst +++ b/docs/embedding/embedding.rst @@ -1,75 +1,13 @@ -Embedding -========= +class Embedding(abc.ABC): ‘Abstract base class for different types of +embeddings’ -Overview --------- +:: -``Embedding`` is an abstract base class that lays the groundwork for -different embedding objects in the Automata codebase. This class manages -the embedding vector and provides methods to convert the instance to a -string, and vice versa. + def __init__(self, key: Any, document: str, vector: np.ndarray): + self.key = key + self.document = document + self.vector = vector -The ``Embedding`` class is typically used as a base class that specific -types of embeddings inherit from. An embedding takes an input object and -transforms it into a vector form that can be easily manipulated by -machine learning models. The class holds a key, an input object, and its -corresponding vector representation. - -Related Symbols ---------------- - -Primary Symbol -~~~~~~~~~~~~~~ - -- automata.embedding.base.Embedding - -Others -~~~~~~ - -- automata.core.base.database.vector.VectorDatabaseProvider -- automata.symbol.base.Symbol -- automata.symbol.symbol_utils.convert_to_fst_object -- automata.symbol_embedding.base.SymbolCodeEmbedding -- automata.symbol_embedding.base.SymbolDocEmbedding -- automata.memory_store.symbol_doc_embedding.SymbolDocEmbeddingHandler -- automata.tests.unit.test_symbol_embedding.test_update_embeddings -- automata.tests.unit.test_database_vector.test_load - -Example -------- - -As an abstract base class, ``Embedding`` is not directly instantiated in -most cases. Instead, it is extended by other classes, which implement -the specific type of embedding. 
Here is an example of a hypothetical -class ``ExampleEmbedding`` that extends ``Embedding``: - -.. code:: python - - class ExampleEmbedding(Embedding): - def __init__(self, key: Any, input_object: str, vector: np.ndarray): - super().__init__(key, input_object, vector) - - def __str__(self): - description = f'Example Embedding for the object {self.input_object} with key {self.key}' - return description - -Limitations ------------ - -As an abstract base class, ``Embedding`` doesn’t provide any -implementations. The ``__str__`` method is expected to be overridden in -child classes since it’s decorated with ``@abc.abstractmethod``. It’s -also assumed the embedding vector will be of the type numpy.ndarray, -though this isn’t enforced in the Embedding class itself. - -Follow-up Questions -------------------- - -- What are the requirements for the key and input_object during the - initialization of the Embedding object? -- What practical implementations are used in the Automata codebase, and - what are specific use-cases? -- What error handling is used if the vector object passed during - initialization is not a numpy.ndarray? -- Are there size or dimension requirements for the array, or can it be - of any shape? + @abc.abstractmethod + def __str__(self) -> str: + pass diff --git a/docs/embedding/embedding_builder.rst b/docs/embedding/embedding_builder.rst index 92cb2c91b..d88e809d0 100644 --- a/docs/embedding/embedding_builder.rst +++ b/docs/embedding/embedding_builder.rst @@ -1,67 +1,79 @@ EmbeddingBuilder ================ -``EmbeddingBuilder`` is an abstract class that defines interfaces for -building embeddings for symbols. It is typically extended by other -classes that provide specific implementations for building the -embeddings. +The ``EmbeddingBuilder`` class is an abstract base class used to create +embeddings. It contains abstract methods (to be implemented by +subclasses) that build the embeddings from source text and a provided +symbol. 
Two types of embeddings can be created - a single instance-based +embedding and batch-based embeddings. Overview -------- -``EmbeddingBuilder`` is an important part of the automata.embedding -module. It provides the foundation for building symbol embeddings in an -abstract way, allowing for different methods of building embeddings to -be developed and used interchangeably. The class contains abstract -methods that are intended to be implemented by child classes. +The ``EmbeddingBuilder`` takes an ``EmbeddingVectorProvider`` as an +input during its instantiation. This provider supplies the algorithms to +generate vector representations (embeddings) from source code text. + +The main functionalities of the ``EmbeddingBuilder`` are defined by two +main methods - ``build()`` and ``batch_build()``. These are abstract +methods, implying that their precise implementation should be provided +in subclasses of ``EmbeddingBuilder``. + +The ``build()`` method builds an embedding for a single symbol from +source text. The ``batch_build()`` generates embeddings for a batch of +symbols simultaneously. + +In addition, there’s a helper method ``fetch_embedding_source_code()``, +which transforms a given symbol into its respective source code. The +transformed code is used as a context during the embedding generation. 
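As a runnable toy illustration of the ``build()``/``batch_build()`` contract described above (``CharCountProvider`` and ``ToyEmbeddingBuilder`` are invented stand-ins, not the Automata API):

```python
from typing import List


class CharCountProvider:
    """Toy stand-in for an EmbeddingVectorProvider: text -> tiny 'vector'."""

    def build_embedding_vector(self, text: str) -> List[int]:
        # A fake two-dimensional "embedding": total length and space count
        return [len(text), text.count(" ")]


class ToyEmbeddingBuilder:
    """Illustrates the build()/batch_build() contract only."""

    def __init__(self, provider: CharCountProvider) -> None:
        self.provider = provider

    def build(self, source_text: str, symbol: str) -> List[int]:
        # Single-symbol embedding built from the symbol's source text
        return self.provider.build_embedding_vector(source_text)

    def batch_build(self, source_texts: List[str], symbols: List[str]) -> List[List[int]]:
        # Batch embeddings, one per (source_text, symbol) pair
        return [self.build(t, s) for t, s in zip(source_texts, symbols)]


builder = ToyEmbeddingBuilder(CharCountProvider())
print(builder.build("def foo(): pass", "foo"))  # [15, 2]
```

The point of the sketch is the division of labor: the provider turns text into a vector, while the builder decides what text to embed for each symbol and how batches are assembled.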
Related Symbols --------------- -- ``automata.tests.unit.test_context_oracle_tool.context_oracle_tool_builder`` -- ``automata.symbol_embedding.builders.SymbolCodeEmbeddingBuilder`` -- ``automata.symbol_embedding.builders.SymbolDocEmbeddingBuilder`` -- ``automata.memory_store.symbol_doc_embedding.SymbolDocEmbeddingHandler.__init__`` -- ``automata.memory_store.symbol_code_embedding.SymbolCodeEmbeddingHandler.__init`` - -Example -------- +- ``automata.experimental.symbol_embedding.symbol_doc_embedding_builder.SymbolDocEmbeddingBuilder.build`` +- ``automata.symbol_embedding.symbol_embedding_builders.SymbolCodeEmbeddingBuilder.build`` +- ``automata.experimental.tools.builders.advanced_context_oracle_builder.AdvancedContextOracleToolkitBuilder.build`` +- ``automata.experimental.symbol_embedding.symbol_doc_embedding_builder.SymbolDocEmbeddingBuilder.build_non_class`` +- ``automata.experimental.memory_store.symbol_doc_embedding_handler.SymbolDocEmbeddingHandler._create_new_embedding`` +- ``automata.singletons.dependency_factory.DependencyFactory.create_embedding_similarity_calculator`` +- ``automata.memory_store.symbol_code_embedding_handler.SymbolCodeEmbeddingHandler._build_and_add_embeddings`` +- ``automata.experimental.tools.builders.document_oracle_builder.DocumentOracleToolkitBuilder.build`` +- ``automata.embedding.embedding_base.EmbeddingNormType`` +- ``automata.symbol_embedding.symbol_embedding_base.SymbolEmbedding.from_args`` -Here’s an example of how a class might implement ``EmbeddingBuilder`` -providing the actual implementation for the ``build`` method. +Usage Example +------------- .. code:: python - class ConcreteEmbeddingBuilder(EmbeddingBuilder): - def build(self, source_text: str, symbol: Symbol) -> Any: - # concrete implementaion of building the embedding. 
+ # Concrete implementation of EmbeddingBuilder class + class MyEmbeddingBuilder(EmbeddingBuilder): + def build(self, source_text, symbol): + # Implementation of embedding generation for a single symbol + pass + def batch_build(self, source_texts, symbols): + # Implementation of embedding generation for a batch of symbols + pass -Please note that this is a mock example. Replace -‘ConcreteEmbeddingBuilder’ with the actual class that you want to use as -an ``EmbeddingBuilder``. + # Now MyEmbeddingBuilder can be used in our models + my_embedding_builder = MyEmbeddingBuilder(embedding_provider) Limitations ----------- -As an abstract base class, ``EmbeddingBuilder`` does not provide any -functionality itself, it merely outlines the methods that need to be -implemented by any concrete subclasses. It involves designing these -subclasses to actually build the embeddings, and the design of these -subclasses can significantly affect the performance and accuracy of -symbol recognition. - -Dependencies ------------- - -- ``automata.embedding.base.EmbeddingVectorProvider`` -- ``automata.symbol.base.Symbol`` -- ``automata.symbol.symbol_utils.convert_to_fst_object`` +Being an abstract base class, ``EmbeddingBuilder`` doesn’t provide any +concrete implementation of its methods; it merely defines an interface +to be implemented by its subclasses. Therefore, it’s not usable on its own +and requires a subclass to define the ``build`` and ``batch_build`` +methods. Follow-up Questions: -------------------- -- When creating subclasses of ``EmbeddingBuilder``, what are the common - pitfalls that one should be mindful of? -- What are the typical strategies to build a good embedding and how do - we evaluate the effectiveness of the strategies? +- What embedding techniques/algorithms (e.g., Word2Vec, GloVe, + FastText, etc.) are available with the ``EmbeddingVectorProvider``?
+- How is the quality of the generated embedding ensured, and is it + possible to customize the embedding generation process according to + the needs of the specific task? Besides, how can one handle source + texts that may have varying language styles, especially in the + context of different programming languages? diff --git a/docs/embedding/embedding_handler.rst b/docs/embedding/embedding_handler.rst index 273f6603a..699429074 100644 --- a/docs/embedding/embedding_handler.rst +++ b/docs/embedding/embedding_handler.rst @@ -1,93 +1,87 @@ EmbeddingHandler ================ -``EmbeddingHandler`` is an abstract base class designed to handle -embeddings in the Automata library. It acts as an interface that -dictates the basic functions an embedding handler must implement. +``EmbeddingHandler`` is an abstract base class (abc) to handle batch +embeddings. This abstract class lays out the structure and expected +methods for any derived class that is to handle batch embeddings. Overview -------- -The ``EmbeddingHandler`` symbol provides a standardised interface for -embedding handling. It is designed to be used as a base class for other -specific implementations like ``SymbolEmbeddingHandler`` and -``SymbolDocEmbeddingHandler``. - -It handles the interaction with both the ``embedding_db`` -(VectorDatabaseProvider instance) and ``embedding_builder`` -(EmbeddingBuilder instance), requiring these as parameters during the -class initialisation. +``EmbeddingHandler`` provides a structured interface with four abstract +methods to be implemented by any derived class. These methods provide a +mechanism to get embeddings for a list of symbols, get all the +embeddings in a sorted order, process the embeddings for a list of +symbols, as well as perform any remaining updates following completion +of full batch processing. 
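The four methods described in the overview above can be sketched as an abstract interface plus a toy in-memory implementation. This is a hedged illustration only: the class name ``BatchEmbeddingHandler``, the toy embedding, and the staging logic are hypothetical, and the real signatures in ``automata.embedding.embedding_base`` may differ.

```python
import abc
from typing import Any, Dict, List

# Hypothetical sketch of the four-method interface described above.
class BatchEmbeddingHandler(abc.ABC):
    @abc.abstractmethod
    def get_embeddings(self, symbols: List[str]) -> List[Any]:
        """Return embeddings for a list of symbols."""

    @abc.abstractmethod
    def get_all_ordered_embeddings(self) -> List[Any]:
        """Return all stored embeddings in sorted key order."""

    @abc.abstractmethod
    def process_embedding(self, symbol: str) -> None:
        """Build or refresh the embedding for one symbol."""

    @abc.abstractmethod
    def flush(self) -> None:
        """Persist any updates accumulated during batch processing."""

# Minimal in-memory implementation showing the batch-then-flush flow.
class InMemoryHandler(BatchEmbeddingHandler):
    def __init__(self) -> None:
        self._staged: Dict[str, Any] = {}
        self._store: Dict[str, Any] = {}

    def get_embeddings(self, symbols: List[str]) -> List[Any]:
        return [self._store[s] for s in symbols]

    def get_all_ordered_embeddings(self) -> List[Any]:
        return [self._store[s] for s in sorted(self._store)]

    def process_embedding(self, symbol: str) -> None:
        # Toy "embedding": a real handler would call an embedding builder here.
        self._staged[symbol] = [float(len(symbol))]

    def flush(self) -> None:
        self._store.update(self._staged)
        self._staged.clear()
```

Note how ``process_embedding`` only stages work and ``flush`` performs the remaining updates after the full batch, matching the division of labour described above.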
Related Symbols --------------- -- ``automata.core.base.database.vector.VectorDatabaseProvider`` -- ``automata.embedding.base.EmbeddingBuilder`` -- ``automata.symbol.base.Symbol`` -- ``automata.symbol_embedding.base.SymbolEmbeddingHandler`` -- ``automata.memory_store.symbol_doc_embedding.SymbolDocEmbeddingHandler`` -- ``automata.memory_store.symbol_code_embedding.SymbolCodeEmbeddingHandler`` - -Methods +- ``automata.cli.scripts.run_code_embedding.process_embeddings`` +- ``automata.experimental.tools.builders.advanced_context_oracle_builder.AdvancedContextOracleToolkitBuilder.__init__`` +- ``automata.experimental.tools.builders.document_oracle_builder.DocumentOracleToolkitBuilder.__init__`` +- ``automata.experimental.memory_store.symbol_doc_embedding_handler.SymbolDocEmbeddingHandler.process_embedding`` +- ``automata.core.utils.HandlerDict`` +- ``automata.cli.scripts.run_code_embedding.main`` +- ``automata.core.utils.LoggingConfig`` +- ``automata.cli.commands.run_doc_embedding`` +- ``automata.symbol_embedding.symbol_embedding_handler.SymbolEmbeddingHandler.process_embedding`` +- ``automata.config.config_base.EmbeddingDataCategory`` +- ``automata.embedding.embedding_base.EmbeddingHandler.flush`` +- ``automata.embedding.embedding_base.EmbeddingHandler.get_embeddings`` +- ``automata.embedding.embedding_base.EmbeddingHandler.get_all_ordered_embeddings`` +- ``automata.embedding.embedding_base.EmbeddingHandler.process_embedding`` + +Example ------- -``__init__`` -~~~~~~~~~~~~ - -This is the constructor method for the ``EmbeddingHandler`` and it is -responsible for initialising the instance with the provided -``embedding_db`` and ``embedding_builder``. As an abstract method, it -simply sets these two properties without any further processing. +As ``EmbeddingHandler`` is an abstract base class, it cannot be +instantiated directly and doesn’t provide any functionality on its own. 
+The following code sample is a mock example of how a subclass may look +when ``EmbeddingHandler`` is extended: .. code:: python - def __init__(self, embedding_db: VectorDatabaseProvider, embedding_builder: EmbeddingBuilder) -> None: - self.embedding_db = embedding_db - self.embedding_builder = embedding_builder + from typing import Any, List + from automata.common.symbol import Symbol + from automata.embedding.embedding_base import EmbeddingHandler -``get_embedding`` -~~~~~~~~~~~~~~~~~ + class MyEmbeddingHandler(EmbeddingHandler): + def get_embeddings(self, symbols: List[Symbol]) -> List[Any]: + # Implement method to return embeddings for a list of symbols + pass -This abstract method is designed to return the embedding for a specific -symbol. The specific implementation will be dependent on the child -class. - -.. code:: python + def get_all_ordered_embeddings(self) -> List[Any]: + # Implement method to return all embeddings in a sorted order + pass - @abc.abstractmethod - def get_embedding(self, symbol: Symbol) -> Any: - pass - -``process_embedding`` -~~~~~~~~~~~~~~~~~~~~~ - -This abstract method is designed to process the embedding for a specific -symbol. The specific implementation will be dependent on the child -class. - -.. code:: python + def process_embedding(self, symbol: Symbol) -> None: + # Implement method to process the embedding for a single symbol + pass - @abc.abstractmethod - def process_embedding(self, symbol: Symbol) -> None: - pass + def flush(self) -> None: + # Implement method to perform any remaining updates + pass Limitations ----------- -As ``EmbeddingHandler`` is an abstract class, it can’t be instantiated -directly. Instead, it must be subclassed, and at least ``get_embedding`` -and ``process_embedding`` methods must be implemented in the child -class. - -Follow-up Questions: --------------------- - -- How is the ``get_embedding`` method expected to behave?
Does it - always access live data, cache results, or some combination of the - two? -- How is the ``process_embedding`` method expected to behave? What sort - of preprocessing might it do? -- Are there expected side-effects to either the ``get_embedding`` or - ``process_embedding`` methods? -- What is the expected type of the returned embeddings? -- How are symbols identified for embedding processing? +The primary limitations of the ``EmbeddingHandler`` are tied to specific +implementations in derived classes. Since ``EmbeddingHandler`` is an +abstract base class, it doesn’t pose limitations on its own but it gives +a layout to be followed, meaning that the limitations of its +implementations are up to the specific subclass. + +Follow-up Questions +------------------- + +- Are there more specific templates or guidelines for each of the + abstract methods to be implemented for better consistency across + different implementations? +- Could type hints be provided for the return types of the methods of + ``EmbeddingHandler`` to enhance usage clarity of the respective + methods? +- How are errors and exceptions handled across derived classes of + ``EmbeddingHandler`` considering it doesn’t define any error handling + procedures in its interface? diff --git a/docs/embedding/embedding_norm_type.rst b/docs/embedding/embedding_norm_type.rst index 78b3a0d22..bf03b9271 100644 --- a/docs/embedding/embedding_norm_type.rst +++ b/docs/embedding/embedding_norm_type.rst @@ -1,19 +1 @@ -As an AI model, I don’t have access to the real-time codebase or -updates. Based on general programming practice: - -1. Expanding the ``EmbeddingNormType`` enumeration with more - normalization options is generally possible. It would involve - defining additional constants within the enumeration to represent the - new normalization types. However, it also requires that the - corresponding normalization algorithms are implemented in the - embedding calculations where this enumeration is used. - -2. 
In order to use a user-defined norm type in the embedding calculation - process, the code must be modular enough to accept a custom function - that performs the normalization. If the code is not currently - designed in this way, it would need to be refactored to enable this - functionality. This could involve changes to the - ``EmbeddingSimilarityCalculator`` class and wherever the - ``EmbeddingNormType`` enumeration is used in the calculation logic. - Please check the documentation or contact a maintainer to understand - the current possibilities. +class EmbeddingNormType(Enum):
+    L1 = 'l1'
+    L2 = 'l2'
+    SOFTMAX = 'softmax'
diff --git a/docs/embedding/embedding_similarity_calculator.rst b/docs/embedding/embedding_similarity_calculator.rst index c03ed6974..9887826db 100644 --- a/docs/embedding/embedding_similarity_calculator.rst +++ b/docs/embedding/embedding_similarity_calculator.rst @@ -1,84 +1,84 @@ EmbeddingSimilarityCalculator ============================= -``EmbeddingSimilarityCalculator`` is a class in the -``automata.embedding.base`` module. It takes an instance of -``EmbeddingVectorProvider`` and calculates the similarity score between -a query text and symbol embeddings. +``EmbeddingSimilarityCalculator`` is a class that computes similarity +scores between embedding vectors. Specifically, it calculates the dot +product similarity between a query vector and a set of vectors +corresponding to symbols. Overview -------- -``EmbeddingSimilarityCalculator`` leverages embeddings representation to -quantify the similarity between code symbols and a given query text. It -uses the dot product of the query embedding and the symbol embeddings. -If required, the resulting similarity scores can be sorted in descending -order by default. +At its core, ``EmbeddingSimilarityCalculator`` provides an interface to +calculate similarity scores between a query and multiple embeddings.
The +query is first converted into an embedding vector using an +``EmbeddingVectorProvider``, and then dot product similarity scores are +calculated between this query vector and a sequence of symbol +embeddings. The results can be sorted in descending order of similarity +scores. -Every instance of ``EmbeddingSimilarityCalculator`` is initialized with -an ``EmbeddingVectorProvider`` and a type of norm for vector -normalization (``EmbeddingNormType``). Initially, it sets these -parameters with the corresponding values. - -The main method in this class is ``calculate_query_similarity_dict``. -This method retrieves the embedding for a provided query text, -calculates the similarity scores with the existing symbol embeddings, -constructs a dictionary with these scores indexed by the symbols and -optionally sorts the dictionary. +The class also offers normalization methods to normalize the embeddings +according to specified norm types: L1, L2, and Softmax. Related Symbols --------------- -- ``automata.embedding.base.EmbeddingVectorProvider`` -- ``automata.embedding.base.EmbeddingNormType`` -- ``automata.embedding.base.Embedding`` -- ``automata.core.base.database.vector.VectorDatabaseProvider`` -- ``automata.symbol.base.Symbol`` - -Example: --------- +- ``EmbeddingVectorProvider`` +- ``Symbol`` +- ``EmbeddingNormType`` +- ``Embedding`` -In this example, ``EmbeddingSimilarityCalculator`` is utilized to find -the symbol most similar to a given query text: +Usage Example +------------- .. 
code:: python - from automata.embedding.base import EmbeddingSimilarityCalculator, EmbeddingVectorProvider - from automata.symbol.base import Symbol - from numpy import array + from automata.embedding.embedding_base import EmbeddingSimilarityCalculator, EmbeddingNormType + from automata.embedding.embedding_vector_provider import EmbeddingVectorProvider + from automata.symbol import Symbol + from automata.embedding.embedding import Embedding + import numpy as np + + # Assuming an instance of EmbeddingVectorProvider + embedding_provider = EmbeddingVectorProvider(model_name='bert-base-uncased', do_lower_case=True) - # Create an instance of the class - mock_provider = EmbeddingVectorProvider() - embedding_sim_calc = EmbeddingSimilarityCalculator(mock_provider) + # Initialize EmbeddingSimilarityCalculator + similarity_calculator = EmbeddingSimilarityCalculator(embedding_provider, EmbeddingNormType.L2) - # Define query_text, and embeddings - query_text = "symbol1" - ordered_embeddings = [Embedding(Symbol('symbol1'), 'symbol1', array([1,0,0,0])), - Embedding(Symbol('symbol2'), 'symbol2', array([0,1,0,0])), - Embedding(Symbol('symbol3'), 'symbol3', array([0,0,1,0]))] + # Assume some embeddings + ordered_embeddings = [ + Embedding(vector=np.array([1, 0, 0]), key=Symbol(name='Sym1')), + Embedding(vector=np.array([0, 1, 0]), key=Symbol(name='Sym2')), + Embedding(vector=np.array([0, 0, 1]), key=Symbol(name='Sym3')), + ] - # Use the calculate_query_similarity_dict method - result = embedding_sim_calc.calculate_query_similarity_dict(ordered_embeddings, query_text) + # Query text + query_text = 'house' - print(result) + # Calculate query similarity dictionary + similarity_dict = similarity_calculator.calculate_query_similarity_dict(ordered_embeddings, query_text, return_sorted=True) + print(similarity_dict) -**Note:** In real scenario ``EmbeddingVectorProvider`` would be an -instance of class that provides actual embeddings like -``OpenAIEmbedding``.
+Please note that in practice, embeddings are typically high-dimensional +and are computed from trained language models. This example is greatly +simplified for demonstration purposes. Limitations ----------- -The accuracy of ``EmbeddingSimilarityCalculator`` heavily depends on the -quality of the embeddings produced by ``EmbeddingVectorProvider``. Poor -embeddings can result in inaccurate similarity scores. Additionally, it -does not inherently handle cases where symbols might have the same -embedding values. +The key limitation of ``EmbeddingSimilarityCalculator`` is that it +relies on an ``EmbeddingVectorProvider`` to convert the query into an +embedding vector. Therefore, the effectiveness of +``EmbeddingSimilarityCalculator`` is contingent upon the quality of the +underlying language model used in ``EmbeddingVectorProvider``. Another +limitation is the presence of only three types of normalization methods. +Depending on the use case, users might need to employ other +normalization techniques. Follow-up Questions: -------------------- -- If two symbols end up having the same embedding, how does the - ``EmbeddingSimilarityCalculator`` differentiate between them? -- How are the results affected if a different norm type - (``EmbeddingNormType``) is used? +- Is it possible to include custom embedding providers? +- Can we extend the class to support more types of normalization + techniques? +- What specific similarity measures (beyond dot product) could be + implemented to provide better results in certain contexts? 
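To make the dot-product similarity and the three norm types concrete, here is a small self-contained NumPy sketch. It is illustrative only, not the library's actual implementation; ``normalize`` and ``query_similarity`` are hypothetical helper names, and the real class normalizes via ``EmbeddingNormType`` internally.

```python
import numpy as np

def normalize(v: np.ndarray, norm: str = "l2") -> np.ndarray:
    # Illustrative normalizations matching the three EmbeddingNormType values.
    if norm == "l1":
        return v / np.sum(np.abs(v))
    if norm == "l2":
        return v / np.linalg.norm(v)
    if norm == "softmax":
        e = np.exp(v - np.max(v))
        return e / e.sum()
    raise ValueError(f"unknown norm: {norm}")

def query_similarity(query_vec, symbol_vecs, norm="l2"):
    # Dot product of the normalized query against each normalized symbol vector.
    q = normalize(np.asarray(query_vec, dtype=float), norm)
    return {
        name: float(np.dot(q, normalize(np.asarray(v, dtype=float), norm)))
        for name, v in symbol_vecs.items()
    }

scores = query_similarity([1.0, 0.0], {"sym1": [1.0, 0.0], "sym2": [0.0, 1.0]})
```

With L2 normalization this reduces to cosine similarity, which is why the orthogonal ``sym2`` vector scores zero against the query.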
diff --git a/docs/embedding/embedding_vector_provider.rst b/docs/embedding/embedding_vector_provider.rst index a67842d23..42a8576b8 100644 --- a/docs/embedding/embedding_vector_provider.rst +++ b/docs/embedding/embedding_vector_provider.rst @@ -1,73 +1,14 @@ -EmbeddingVectorProvider -======================= +class EmbeddingVectorProvider(abc.ABC): ‘A class to provide embeddings +for symbols’ -``EmbeddingVectorProvider`` is an abstract base class that provides a -way to create embedding vectors for specified symbols in the automata -library. This vector provider returns vector embeddings in numpy array -format, which get utilized in both the OpenAI API and the internal -automata embedding layer. +:: -Overview --------- + @abc.abstractmethod + def build_embedding_vector(self, document: str) -> np.ndarray: + 'An abstract method to build the embedding vector for a document.' + pass -As an abstract base class, ``EmbeddingVectorProvider`` doesn’t provide a -specific implementation. Instead, it defines a standardized interface -for all types of embedding vector providers. These providers process -symbols to convert them into embedding vectors. The class mainly defines -one method, ``build_embedding_vector``, which needs to be implemented by -any subclasses. - -Key symbols in relation to ``EmbeddingVectorProvider`` include -``EmbeddingBuilder``, ``OpenAIEmbeddingProvider``, -``JSONSymbolEmbeddingVectorDatabase``, ``SymbolCodeEmbedding``, and -associated unit testing files. - -Related Symbols ---------------- - -- ``automata.embedding.base.EmbeddingBuilder`` -- ``automata.llm.providers.openai.OpenAIEmbeddingProvider`` -- ``automata.symbol_embedding.base.JSONSymbolEmbeddingVectorDatabase`` -- ``automata.symbol_embedding.base.SymbolCodeEmbedding`` -- ``automata.tests.unit.test_symbol_embedding`` - -Example -------- - -``EmbeddingVectorProvider`` is an abstract base class and is thus not -directly usable. 
However, library classes that make use of -``EmbeddingVectorProvider`` (for example, the ``EmbeddingBuilder`` or -``OpenAIEmbeddingProvider``), provide more concrete examples of usage. -Here is an example involving the ``OpenAIEmbeddingProvider``: - -.. code:: python - - from automata.llm.providers.openai import OpenAIEmbeddingProvider - - embed_provider = OpenAIEmbeddingProvider() - - symbol_source = "Text from which to generate the embedding" - embedding_vector = embed_provider.build_embedding_vector(symbol_source) - -This example requires proper configuration of the OpenAI API and -importing the required objects. - -Limitations ------------ - -The primary limitations of ``EmbeddingVectorProvider`` stem from it -being an abstract base class. It does not provide a practical -implementation by itself. Also, the extent to which it can generate -effective embeddings heavily depends on the algorithms and libraries -used in the implementation of its subclasses. - -Follow-up Questions: --------------------- - -- In testing cases where ``EmbeddingVectorProvider`` is used, it seems - that mock examples are being used. Are there certain assumptions or - configurations that should be considered when designing tests for it, - considering that it’s a mock object? -- Are there specific providers that are known to perform better or - worse with certain types of symbols or classes? If so, are there ways - to optimize these situations? + @abc.abstractmethod + def batch_build_embedding_vector(self, documents: List[str]) -> List[np.ndarray]: + 'An abstract method to build the embedding vector for a list of documents.' + pass diff --git a/docs/embedding/index.rst b/docs/embedding/index.rst index da17a25ed..d1dcf6d41 100644 --- a/docs/embedding/index.rst +++ b/docs/embedding/index.rst @@ -22,6 +22,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. 
diff --git a/docs/eval/action.rst b/docs/eval/action.rst new file mode 100644 index 000000000..c4ef9c083 --- /dev/null +++ b/docs/eval/action.rst @@ -0,0 +1,19 @@ +class Action(ABC): 'An arbitrary action to be taken by an LLM, like an +OpenAI function call' + +:: + + @abstractmethod + def __init__(self) -> None: + pass + + @abstractmethod + def to_payload(self) -> Payload: + 'Converts the Action to a dictionary.' + pass + + @staticmethod + @abstractmethod + def from_payload(dct: Payload) -> 'Action': + 'Creates an Action from a dictionary.' + pass diff --git a/docs/eval/agent/agent_eval.rst b/docs/eval/agent/agent_eval.rst new file mode 100644 index 000000000..62c6b9b97 --- /dev/null +++ b/docs/eval/agent/agent_eval.rst @@ -0,0 +1,98 @@ +AgentEval +========= + +Overview +-------- + +``AgentEval`` is an abstract class designed for evaluating the +performance of large language models (LLMs) in the Automata library. +It operates by generating evaluation results for a specified set of +instructions and expected actions. “Evaluation” here includes processing +the results of a session, and comparing these results against an +expected sequence of actions to evaluate how closely the model’s actions +followed the expected sequence. Subclasses of this class should +implement the ``generate_eval_result`` and ``process_result`` methods. + +Interface Methods +----------------- + +generate_eval_result(self, exec_input: AutomataTask, expected_output: List[Action], executor: AutomataTaskExecutor, \*args, \**kwargs) -> EvalResult +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This method is used to generate an evaluation result for a given set of +instructions (``exec_input``) and expected actions +(``expected_output``). The ``executor`` parameter is an instance of +``AutomataTaskExecutor`` that is used to execute the task.
+ +process_result(self, expected_actions: List[Action], process_input: Sequence[LLMChatMessage], \*args, \**kwargs) -> EvalResult +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This method processes the result of an evaluation. It takes in an +expected list of actions and a sequence of ``LLMChatMessage`` instances +to process the evaluation. + +Related Symbols +--------------- + +- ``automata.eval.agent.agent_eval.AgentEvalResult``: A specialized + ``EvalResult`` created and returned by the execution of + ``AgentEval``. It provides granular results of an evaluation + including match results, extra actions, session id, and other + attributes. + +- ``automata.eval.agent.agent_eval_composite.AgentEvalComposite``: + Combines multiple ``AgentEval`` evaluators into a composite evaluator + to allow evaluation with multiple criteria. + +- ``automata.eval.eval_base.EvalResult``: An abstract class that + represents the result of an evaluation. ``AgentEvalResult`` is a + concrete implementation of this class. + +- ``automata.eval.agent.agent_eval_harness.AgentEvaluationHarness.evaluate``: + Returns the evaluation metrics for a list of tasks given their + expected actions. + +- ``automata.core.run_handlers.run_with_eval``: A function to perform a + task run with the provided parameters and evaluates the results. + +Usage Example +------------- + +.. code:: python + + from automata.eval.agent.agent_eval import AgentEval + from automata.eval.agent.agent_eval_result import AgentEvalResult + from automata.tasks.task_executor import AutomataTaskExecutor + from typing import List + from automata.common.types import Action, AutomataTask + + class MyAgentEval(AgentEval): + + def generate_eval_result(self, exec_input: AutomataTask, expected_output: List[Action], executor: AutomataTaskExecutor) -> AgentEvalResult: + # you need to implement this method based on how you want to evaluate. 
+ pass + + def process_result(self, expected_actions: List[Action], process_input: Sequence[LLMChatMessage]) -> EvalResult: + # you need to implement this method based on how you want to process the evaluation. + pass + + # Create an instance of MyAgentEval + my_agent_eval = MyAgentEval() + +Limitations +----------- + +The primary limitations associated with ``AgentEval`` are the need for +each inheritor to implement its own versions of ``generate_eval_result`` +and ``process_result`` methods. This requires a clear understanding of +the specific evaluation process required for each unique learning model. +This evaluation process must also be implementable in a manner +compatible with ``AgentEval``\ ’s methods. + +Follow-up Questions: +-------------------- + +- How can we generalize ``AgentEval`` evaluation methods to be + applicable to a wider range of learning models? +- Can we simplify ``AgentEval`` interfaces while maintaining their + function for evaluations? diff --git a/docs/eval/agent/agent_eval_composite.rst b/docs/eval/agent/agent_eval_composite.rst new file mode 100644 index 000000000..ad41da558 --- /dev/null +++ b/docs/eval/agent/agent_eval_composite.rst @@ -0,0 +1,89 @@ +AgentEvalComposite +================== + +``AgentEvalComposite`` is a utility class that facilitates multiple +evaluations in a composite manner. It is used when you need to perform a +set of evaluations and aggregate their results. This class is essential +when dealing with complex evaluation setups where the final evaluation +result depends on the results of several different evaluators. + +Overview +-------- + +``AgentEvalComposite`` is a subclass of ``Eval``, intended to use +several ``AgentEval`` instances for a multi-faceted evaluation approach. +At initialization, the uniqueness of the evaluators is checked and a +list of agent evaluators is compiled. 
The class provides functionality +to generate evaluation results, extract actions and filter actions +(although the last function is not implemented). This composite class is +primarily designed to be a flexible manager object that operates with +multiple evaluators to provide a comprehensive evaluation. + +Related Symbols +--------------- + +- ``automata.eval.agent.agent_eval_composite.check_eval_uniqueness`` +- ``automata.cli.scripts.run_tool_eval.run_eval_harness`` +- ``automata.eval.agent.openai_function_eval.OpenAIFunctionEval.__repr__`` +- ``automata.tools.tool_executor.ToolExecution`` +- ``automata.cli.scripts.run_agent_eval.run_eval_harness`` +- ``automata.cli.scripts.run_agent_eval.main`` +- ``automata.llm.llm_base.LLMChatCompletionProvider`` + +Example +------- + +Below is an example demonstrating how to use the ``AgentEvalComposite`` +class. In this example, two dummy evaluators (``CustomAgentEval1`` and +``CustomAgentEval2``, which are hypothetical subclasses of +``AgentEval``) are combined using the ``AgentEvalComposite``. Please +replace the ``CustomAgentEval`` classes with actual ``AgentEval`` +subclasses according to your application. + +Note: This is a simplified example and does not cover all the possible +uses and features of ``AgentEvalComposite``. + +.. code:: python + + from automata.eval.agent.agent_eval_composite import AgentEvalComposite + from automata.eval.agent.agent_eval import AgentEval + + class CustomAgentEval1(AgentEval): + pass + + class CustomAgentEval2(AgentEval): + pass + + evaluator1 = CustomAgentEval1() + evaluator2 = CustomAgentEval2() + composite_evaluator = AgentEvalComposite([evaluator1, evaluator2]) + + # Additional implementation of the evaluators and the composite evaluator is required to demonstrate the complete operation. + +Limitations +----------- + +Though providing the flexibility needed to combine multiple evaluators, +``AgentEvalComposite`` does not implement action filtering +(``_filter_actions``). 
This could limit its capacity in scenarios where +filtering actions based on certain conditions is needed after extracting +action from the given message. Implementing this in a subclass might be +necessary based on the use case. + +Another limitation comes into play when the evaluators return a type +that is not an ``AgentEvalResult``. Since the composite evaluator +strictly checks the type to be ``AgentEvalResult``, it throws a +``ValueError`` in case the type returned is incorrect. Hence, +subclassing ``AgentEval`` demands discipline ensuring that the output +type is always as expected. + +Follow-up Questions: +-------------------- + +- Is there a way to make ’\_filter_actions’ method in + ``AgentEvalComposite`` more flexible or adaptable to the specific + cases where action filtering is needed? +- How could type checking be made more robust, or handled in a more + pythonic way, rather than checking after results are computed? +- In which cases are composite evaluations particularly beneficial, and + could example cases be provided in the documentation? diff --git a/docs/eval/agent/agent_eval_result.rst b/docs/eval/agent/agent_eval_result.rst new file mode 100644 index 000000000..3879cff35 --- /dev/null +++ b/docs/eval/agent/agent_eval_result.rst @@ -0,0 +1,104 @@ +AgentEvalResult +=============== + +``AgentEvalResult`` is a Python class serving as a designated container +for storing the outcome from an evaluation of an agent. It’s a concrete +class that inherits from the ``EvalResult``, and is designed +specifically to accommodate results following agent evaluations. +Fundamentally, it holds the match results, extra actions taken by the +agent, and the associated session ID. + +Overview +-------- + +The ``AgentEvalResult`` takes a dictionary of match results, a list of +extra actions, and an optional session id in its constructor. 
The match +results represent outcomes of each action taken by the agent as either +True or False (True being a successful match and False being a missed +match), while the extra actions contain any additional actions performed +by the agent that were not specified in the original instruction. The +session id is a unique identifier for the agent’s session. + +The class also provides properties ``is_full_match`` and +``is_partial_match``, which are utilities to quickly determine if the +result is a full or partial match. A full match is when the agent +performs all of the expected actions according to the given instruction, +and a partial match is at least one of the expected actions was +performed. + +The class provides methods to create a ``payload``, i.e., a dictionary +of the result, and to create an ``AgentEvalResult`` from a payload, +enabling serialization and deserialization of the objects. + +Related Symbols +--------------- + +- ``agent.agent_eval.AgentEvalResult.__repr__``: Represents the class + object as a string. +- ``eval_base.Action.to_payload``: Converts the Action to a dictionary. +- ``eval_base.parse_action_from_payload``: Parses out the corresponding + action from a raw dictionary. +- ``eval.agent.agent_eval.AgentEvalResult.is_full_match``: Checks if + the result is a full match. +- ``eval.agent.agent_eval.AgentEvalResult.is_partial_match``: Checks if + the result is a partial match. +- ``eval.agent.agent_eval.AgentEvalResult.from_payload``: Creates an + evaluation result from a dictionary (or other serialized format). +- ``eval.eval_base.EvalResult.__init__``: Initializes the EvalResult + class. + +Usage Example +------------- + +.. 
code:: python + + from automata.eval.agent.agent_eval import AgentEvalResult + from automata.eval.eval_base import Action + + # Define action and match results + actions = [{"type":"read","payload":{"text":"Read the document."},"time_to_live":5}] + match_results = {Action.from_payload(action): True for action in actions} + extra_actions = [] + + # Define the agent evaluation result + session_id = "123456" + agent_result = AgentEvalResult(match_results, extra_actions, session_id) + + # Use the agent evaluation result + is_full_match = agent_result.is_full_match + +In this example, we first create a dictionary of match results with one +action “Read” which has successfully matched (True). We also specify no +extra actions (``extra_actions = []``). We then initialize an +``AgentEvalResult`` instance with the match results, empty +``extra_actions``, and a ``session_id``. ``is_full_match`` is a boolean +value indicating whether all actions were successfully matched. + +Limitations +----------- + +One of the limitations of the ``AgentEvalResult`` is that it assumes the +results to be in the form of a dictionary where actions are keys and the +match results are values (boolean). As such, if match results come in a +different format, conversion to the expected format is necessary before +initializing ``AgentEvalResult``. + +Furthermore, the class is tightly coupled with the ``Action`` class, as +it expects actions to be instances of the ``Action`` class or its +subclass, which may limit the flexibility of using this class with +different action representations. + +In the ``from_payload`` method, it raises a ``ValueError`` when the +payload contains invalid match results or session_id. This means +``AgentEvalResult`` assumes certain data hygiene of the inputs which +needs to be ensured by the calling class/function. + +Follow-up Questions: +-------------------- + +- How does the ``is_full_match`` property handle invalid or incomplete + match results? 
+- How are ``extra_actions`` utilized in the agent’s operations, and how + does including them in AgentEvalResult aid in result analysis? +- Could the handling of invalid match results or session IDs within + ``from_payload`` method be better managed? diff --git a/docs/eval/agent/agent_eval_result_database.rst b/docs/eval/agent/agent_eval_result_database.rst index 7986e2ecd..1ed173ed3 100644 --- a/docs/eval/agent/agent_eval_result_database.rst +++ b/docs/eval/agent/agent_eval_result_database.rst @@ -1,28 +1,81 @@ -1. SQLite databases are typically used in this context because they are - lightweight, requiring minimal setup and configuration, and are - embedded within the application, eliminating the need for a separate - server process. This makes SQLite databases ideal for local storage - and testing environments. However, for a production environment or - for larger scale applications, it might be necessary to support more - robust databases like PostgreSQL or MySQL. The decision would largely - depend on the specific requirements of the project. - -2. Making the table schema dynamic could indeed provide greater - flexibility. However, this also introduces greater complexity and - potential for inconsistencies, especially if different instances or - versions of the application attempt to write different schemas to the - same database. Additionally, changing the schema after data has - already been written could result in data loss or corruption. - Therefore, it’s important to carefully consider the specific needs - and tradeoffs of the project before deciding to implement a dynamic - schema. - -3. Persisting the database file path in memory could make using the - ``AgentEvalResultDatabase`` class more convenient in some situations. - However, this would also make the class stateful, which could lead to - unexpected behavior in certain scenarios. 
For example, if the - application were to crash and restart, or if multiple instances of - the class were being used concurrently, the stored path might not be - what the user expects. As with the above points, whether to store the - path in memory depends on the specific needs and tradeoffs of the - project. +AgentEvalResultDatabase +======================= + +``AgentEvalResultDatabase`` is a subclass of ``SQLDatabase`` that is +specifically designed to write agent evaluation results to a SQLite +database. It serves as a reliable storage system for recording and +retrieving evaluation results from different sessions and runs. + +Attributes of the class include a table name, entry name, and a table +schema which constitutes session ID and run ID. These properties are +used to structure the SQLite database that the class interacts with. + +Overview +-------- + +The ``AgentEvalResultDatabase`` class has two key methods: +``write_result`` and ``get_results``. + +The ``write_result`` method takes an ``AgentEvalResult`` object as input +and writes it to the database. During this process, it checks if a +session ID has been set for the evaluation result, raising a +``ValueError`` if none exist. + +The ``get_results`` method provides a way to retrieve evaluation results +from the database, accepting either a ``session_id`` or ``run_id`` as +parameters. Without these parameters, the method raises a +``ValueError``. If successful, the method returns a list of +``AgentEvalResult`` objects. 
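The write/read cycle described above can be sketched with the standard-library ``sqlite3`` module. The table layout, the ``EvalResultRecord`` shape, and the method names below are illustrative assumptions for this sketch, not the actual ``AgentEvalResultDatabase`` schema:

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class EvalResultRecord:
    """Stand-in for a serialized evaluation result (assumed shape)."""
    session_id: str
    run_id: str
    payload: dict


class ResultStore:
    """Minimal sketch of a session/run-keyed SQLite result store."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS eval_results "
            "(session_id TEXT, run_id TEXT, payload TEXT)"
        )

    def write_result(self, result: EvalResultRecord) -> None:
        # Mirrors the documented check: refuse results without a session ID.
        if not result.session_id:
            raise ValueError("Evaluation result has no session ID.")
        self.conn.execute(
            "INSERT INTO eval_results VALUES (?, ?, ?)",
            (result.session_id, result.run_id, json.dumps(result.payload)),
        )

    def get_results(self, session_id=None, run_id=None):
        # Mirrors the documented behavior: require at least one key.
        if session_id is None and run_id is None:
            raise ValueError("Provide a session_id or a run_id.")
        column = "session_id" if session_id is not None else "run_id"
        value = session_id if session_id is not None else run_id
        rows = self.conn.execute(
            f"SELECT session_id, run_id, payload FROM eval_results "
            f"WHERE {column} = ?",
            (value,),
        ).fetchall()
        return [EvalResultRecord(s, r, json.loads(p)) for s, r, p in rows]
```

Keying the table on both ``session_id`` and ``run_id`` lets ``get_results`` filter on either, which is the retrieval behavior the overview describes.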
+ +Related Symbols +--------------- + +- ``automata.eval.tool.tool_eval_metrics.ToolEvaluationMetrics.total_evaluations`` +- ``automata.experimental.tools.builders.agentified_search_builder.AgentifiedSearchToolkitBuilder._agent_selected_best_match`` +- ``automata.eval.tool.search_eval.SymbolSearchEval.to_tool_result`` +- ``automata.eval.agent.agent_eval_metrics.AgentEvaluationMetrics.__str__`` + +Example +------- + +Here is a simple example explaining how to use +``AgentEvalResultDatabase``: + +.. code:: python + + from automata.eval.agent.agent_eval_database import AgentEvalResultDatabase + from automata.eval.agent.agent_eval_result import AgentEvalResult + + # Initialization + db = AgentEvalResultDatabase(db_path="/path/to/database") + + # Creating an AgentEvalResult object + eval_result = AgentEvalResult(session_id="123", run_id="456", total_evaluations=10) + + # Writing the result to the database + db.write_result(eval_result) + + # Getting the results from the database by session_id + results = db.get_results(session_id="123") + +Please replace “/path/to/database” with the actual path where you want +to store your SQLite database file. + +Limitations +----------- + +The ``AgentEvalResultDatabase`` class only accepts ``AgentEvalResult`` +objects. Therefore, it cannot directly handle other types of evaluation +results unless they are transformed into ``AgentEvalResult`` instances. +Furthermore, this class does not support concurrent database access +which may result in a locked database error. + +Follow-up Questions: +-------------------- + +- Could there be a form of support for handling other kinds of + evaluation results directly? +- How might concurrent database access be supported by + ``AgentEvalResultDatabase`` to prevent database lock issues? +- What happens if a non-existent ``session_id`` or ``run_id`` is used + in ``get_results`` function? 
diff --git a/docs/eval/agent/agent_eval_set_loader.rst b/docs/eval/agent/agent_eval_set_loader.rst new file mode 100644 index 000000000..135d9f42e --- /dev/null +++ b/docs/eval/agent/agent_eval_set_loader.rst @@ -0,0 +1,82 @@ +AgentEvalSetLoader +================== + +Overview +-------- + +The ``AgentEvalSetLoader`` provides a mechanism for loading a set of +tasks from a specified JSON file. These tasks are used for agent +evaluation in test scenarios, with each task representing a particular +scenario that the agent must execute. + +A task consists of an instruction string and an expected actions list, +the format of which are validated during the loading process. If there’s +any inconsistency in the format, ``ValueError`` exceptions are raised. +The class only supports JSON files and will raise ``ValueError`` if a +non-JSON file is passed to it. + +The payload of tasks and their expected actions are loaded from the JSON +file in its constructor, and they can be accessed through the ``tasks`` +and ``tasks_expected_actions`` instance properties respectively. + +Related Symbols +--------------- + +- ``automata.eval.tool.tool_eval_harness.ToolEvalSetLoader.load_json`` +- ``automata.eval.tool.tool_eval_harness.ToolEvalSetLoader.__init__`` +- ``automata.cli.cli_utils.initialize_py_module_loader`` +- ``automata.cli.commands.run_tool_eval`` +- ``automata.llm.providers.openai_llm.OpenAIEmbeddingProvider.__init__`` +- ``automata.experimental.scripts.run_update_tool_eval.process_payload`` +- ``automata.experimental.scripts.run_update_tool_eval.main`` +- ``automata.eval.tool.tool_eval.ToolEval._filter_actions`` +- ``automata.experimental.tools.builders.symbol_search_builder.SearchTool`` + +Example +------- + +The following is an example of creating an instance of +``AgentEvalSetLoader``, it assumes that there is a valid JSON file with +the necessary tasks data. + +.. 
code:: python
+
+   from automata.eval.agent.agent_eval_harness import AgentEvalSetLoader
+
+   file_path = "filepath_to_json"
+   agent_eval_set_loader = AgentEvalSetLoader(filepath=file_path)
+
+Upon instantiation, the provided JSON file is read and validated; the
+tasks and their associated expected actions are loaded into the
+``tasks`` and ``tasks_expected_actions`` properties respectively, ready
+for use.
+
+Limitations
+-----------
+
+The ``AgentEvalSetLoader`` currently only supports task data in JSON
+format, and the task data must adhere to a particular structure: an
+``instructions`` field containing a string and an ``expected_actions``
+field containing a list of dictionaries.
+
+It also doesn't support variants of JSON such as JSONL
+(newline-delimited JSON) or other data formats such as XML or CSV.
+These limitations may restrict the applicability of the module for use
+cases that store task data in other formats, or structure it
+differently.
+
+Follow-up Questions:
+--------------------
+
+- Is there a plan for supporting other data formats like XML or CSV, or
+  variants of JSON like JSONL?
+- Could we consider extending the functionality to allow for some
+  flexibility in the structure of the tasks data?
+
+The class contains an embedded function (``format_values``) that is only
+used within ``load_json()`` and is not directly accessible outside the
+class instance. This design choice may or may not be ideal; a follow-up
+question could be:
+
+- What was the rationale behind not making the ``format_values``
+  function a separate method in the ``AgentEvalSetLoader`` class?
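The loading and validation flow described above can be sketched with only the standard library. The field names ``instructions`` and ``expected_actions`` follow the description in this section, while the function name and exact error messages are assumptions:

```python
import json
from pathlib import Path


def load_eval_set(filepath: str) -> list:
    """Load and validate an evaluation task set from a JSON file (sketch)."""
    # Only plain JSON is supported, as described in the limitations above.
    if not filepath.endswith(".json"):
        raise ValueError(f"Only JSON files are supported, got: {filepath}")
    payload = json.loads(Path(filepath).read_text())
    tasks = []
    for entry in payload:
        # Each task needs an 'instructions' string...
        if not isinstance(entry.get("instructions"), str):
            raise ValueError("Each task needs an 'instructions' string.")
        # ...and an 'expected_actions' list of dictionaries.
        if not isinstance(entry.get("expected_actions"), list):
            raise ValueError("Each task needs an 'expected_actions' list.")
        tasks.append(entry)
    return tasks
```

A loader like this fails fast on malformed entries, which matches the ``ValueError``-on-inconsistency behavior the overview describes.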
diff --git a/docs/eval/agent/agent_evaluation_harness.rst b/docs/eval/agent/agent_evaluation_harness.rst new file mode 100644 index 000000000..7137fca26 --- /dev/null +++ b/docs/eval/agent/agent_evaluation_harness.rst @@ -0,0 +1,96 @@ +AgentEvaluationHarness +====================== + +Overview +-------- + +``AgentEvaluationHarness`` is a class that provides functionalities for +performing evaluation of a list of instructions against a set of +expected actions. It does so by comparing the commenced actions of an +agent to an expected result set. The core function ``evaluate`` takes +tasks, their expected actions, and an executor and provides an +aggregation of AgentEvaluationMetrics as output. + +The class is initialized with a list of ``AgentEval`` objects and a +``AgentEvalResultDatabase`` object, which is used for writing the +results into a data store. The evaluation is done for each task and its +corresponding set of instructions by processing the task through an +evaluator. The results are then aggregated (if specified) and written to +the database. + +Related Symbols +--------------- + +- ``automata.eval.agent.agent_eval.AgentEval`` +- ``automata.eval.agent.agent_eval_result_database.AgentEvalResultDatabase`` +- ``automata.eval.agent.agent_evaluation_metrics.AgentEvaluationMetrics`` +- ``automata.tasks.task_executor.IAutomataTaskExecution.execute`` + +Usage Example +------------- + +.. 
code:: python + + from automata.eval.agent.agent_eval_harness import AgentEvaluationHarness + from automata.eval.agent.agent_eval import SomeCustomAgentEval + from automata.eval.agent.agent_eval_result_database import SomeCustomAgentEvalResultDatabase + from automata.tasks.task_executor import SomeCustomAutomataTaskExecutor + from dataclasses import dataclass + from typing import List + + @dataclass + class AutomataTask: + # Custom task definition + task_detail: str # Simplified for example + + @dataclass + class Action: + # Custom action definition + action_detail: str # Simplified for example + + evals: List[SomeCustomAgentEval] = [eval1, eval2] + database = SomeCustomAgentEvalResultDatabase() + harness = AgentEvaluationHarness(evals, database) + + tasks: List[AutomataTask] = [task1, task2] + tasks_expected_actions: List[List[Action]] = [[action1, action2], [action3, action4]] + executor = SomeCustomAutomataTaskExecutor() + + metrics = harness.evaluate(tasks, tasks_expected_actions, executor, aggregate=True) + +In this simplified example, custom agent evaluation, agent result +database, automata task executor, task, and action classes are assumed. +These should be replaced with actual implementations according to the +use case. + +Limitations +----------- + +The ``AgentEvaluationHarness`` assumes that the evaluator defined in the +``AgentEval`` objects returns an ``AgentEvalResult`` type of result. In +case it doesn’t, it will raise a ValueError exception, limiting its +usability with erroneous evaluators. + +Due to its dependency on the ``AgentEval`` and +``AgentEvalResultDatabase`` classes, implementing custom evaluation or +database storage methods would require defining new classes that adhere +to these two interfaces. The code encapsulation provided by this class +makes extensive customizations slightly more tedious due to the need to +maintain consistent interfaces. 
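The harness behaviour described in the overview — run each task, score it with each evaluator, persist the result, and raise ``ValueError`` on a wrongly-typed evaluator result — can be sketched as follows. The ``EvalResult`` shape, the callable evaluators, and the ``run_task`` executor are simplifying assumptions, not the real ``AgentEvaluationHarness`` interfaces:

```python
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    """Assumed minimal result shape: which expected actions were matched."""
    match_results: dict                      # action -> bool
    extra_actions: list = field(default_factory=list)

    @property
    def is_full_match(self) -> bool:
        return all(self.match_results.values())


def evaluate(tasks, tasks_expected_actions, run_task, evaluators, store):
    """Run each task, score it with every evaluator, and persist the results."""
    results = []
    for task, expected in zip(tasks, tasks_expected_actions):
        observed = run_task(task)  # the executor produces the agent's actions
        for evaluator in evaluators:
            result = evaluator(observed, expected)
            # Mirrors the documented constraint on the evaluator's return type.
            if not isinstance(result, EvalResult):
                raise ValueError("Evaluator must return an EvalResult.")
            store.append(result)   # stand-in for writing to the database
            results.append(result)
    return results
```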
+ +The execution is stopped if there is an exception occurring during the +evaluation of a task. While it ensures the integrity of the test run, it +also entails that no further tests will be conducted beyond an erring +one. + +Follow-up Questions: +-------------------- + +- For larger sets of tests, would it be beneficial to implement a + recovery or skip mechanism for faulty tasks to enable the completion + of the entire test suite? +- Could there be opportunities to allow more flexible evaluators that + do not strictly have to return ``AgentEvalResult`` objects? Could + this be accommodated with wrapper or adaptor patterns? +- What amendments would be needed to handle asynchronous task + executions to potentially increase throughput? diff --git a/docs/eval/agent/agent_evaluation_metrics.rst b/docs/eval/agent/agent_evaluation_metrics.rst new file mode 100644 index 000000000..79fd01af2 --- /dev/null +++ b/docs/eval/agent/agent_evaluation_metrics.rst @@ -0,0 +1,68 @@ +AgentEvaluationMetrics +====================== + +``AgentEvaluationMetrics`` is a class designed to compute and store +various metrics derived from a list of ``AgentEvalResult`` objects. +These results are the output of evaluating the performance of an agent. +The metrics calculated include the total number of actions, successful +actions, full matches, partial matches, extra actions, as well as the +frequency of extra, successful, and failed actions. Moreover, this class +provides a method for calculating success rates for action, full match, +and partial match. + +Overview +-------- + +``AgentEvaluationMetrics`` provides a way to assess and quantify the +agent’s performance during its operation. The measures include plain +counts (e.g., total number of actions, successful actions) and more +complex metrics (e.g., success rates for different types of matches and +actions). 
Properties and methods of ``AgentEvaluationMetrics`` lazily +compute these values when accessed and then cache it for future access. + +Related Symbols +--------------- + +- ``automata.eval.agent.agent_eval_result.AgentEvalResult`` +- ``python.collections.Counter`` + +Example +------- + +The following shows an example of how to use ``AgentEvaluationMetrics`` +to compute metrics from a list of ``AgentEvalResult`` instances. + +.. code:: python + + from automata.eval.agent.agent_eval_metrics import AgentEvaluationMetrics + from automata.eval.agent.agent_eval_result import AgentEvalResult + # Assume we have a list of AgentEvalResult instances as results + metrics = AgentEvaluationMetrics(results) + + # We can now access various metrics + print(f"Total actions: {metrics.total_actions}") + print(f"Total successful actions: {metrics.total_successful_actions}") + print(f"Total full matches: {metrics.total_full_matches}") + print(f"Total partial matches: {metrics.total_partial_matches}") + print(f"Action success rate: {metrics.action_success_rate}") + # etc. + +Limitations +----------- + +``AgentEvaluationMetrics`` does not detect changes in the underlying +``AgentEvalResult`` list, i.e., once a metric is accessed and computed, +adding more ``AgentEvalResults`` to the list won’t change the computed +metrics. In addition, this class assumes that the results passed during +the instance creation are comprehensive and final. If the evaluation +results are updated or change dynamically, a new instance of +``AgentEvaluationMetrics`` needs to be created. + +Follow-up Questions: +-------------------- + +- Is there a way to make ``AgentEvaluationMetrics`` more dynamic, i.e., + enabling it to handle updates or changes in the ``AgentEvalResult`` + list? +- How can we make the information retrieval (property access) less + verbose, considering the many metrics it can provide? 
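The lazy compute-and-cache behaviour described above can be sketched with ``functools.cached_property``; the sketch also reproduces the limitation that results added after the first access are not reflected. The metric names mirror the ones above, but the result shape is an assumed simplification:

```python
from functools import cached_property


class EvaluationMetrics:
    """Metrics over a list of result dicts; values are computed once, then cached."""

    def __init__(self, results):
        # Assumed result shape: {"matches": {action_name: bool}}
        self._results = results

    @cached_property
    def total_actions(self) -> int:
        return sum(len(r["matches"]) for r in self._results)

    @cached_property
    def total_successful_actions(self) -> int:
        return sum(sum(r["matches"].values()) for r in self._results)

    @cached_property
    def action_success_rate(self) -> float:
        if not self.total_actions:
            return 0.0
        return self.total_successful_actions / self.total_actions
```

Because ``cached_property`` stores the computed value on the instance, appending to the underlying result list after a metric has been read leaves the cached value unchanged — exactly the staleness limitation noted above.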
diff --git a/docs/eval/agent/code_execution_error.rst b/docs/eval/agent/code_execution_error.rst index 577b7f2a3..0d84b0366 100644 --- a/docs/eval/agent/code_execution_error.rst +++ b/docs/eval/agent/code_execution_error.rst @@ -1,20 +1,2 @@ -- To provide more description of the error when raising a - ``CodeExecutionError``, you could customize the message given when - the error is raised. For example, if a ``NameError`` occurs during - execution, you might raise a ``CodeExecutionError`` with a message - that further details this case. E.g. - ``raise CodeExecutionError("Code execution failed due to an undefined variable.") from e``. - Additionally, you could create subclasses of ``CodeExecutionError`` - that cover specific error cases (e.g., ``UndefinedVariableError``, - ``SyntaxError``), each with their own custom messages. - -- The specific list of errors that should raise a - ``CodeExecutionError`` would depend on the context and application. - Generally, any errors that occur specifically during code execution, - and can not be classified under a more specific error, could - potentially raise a ``CodeExecutionError``. This could include, but - is not limited to, ``SyntaxError``, ``TypeError``, ``ValueError``, - ``NameError``, and ``AttributeError``. It’s important to note that - the exception handling logic should be specific enough to handle - different types of errors appropriately and not unnecessarily broadly - use ``CodeExecutionError``. 
+.. code:: python
+
+   class CodeExecutionError(AutomataError):
+       """Exception raised when there's an error executing the code."""
+       pass
 diff --git a/docs/eval/agent/code_writing_action.rst b/docs/eval/agent/code_writing_action.rst new file mode 100644 index 000000000..872fc6390 --- /dev/null +++ b/docs/eval/agent/code_writing_action.rst @@ -0,0 +1,63 @@
+CodeWritingAction
+=================
+
+Overview
+--------
+
+``CodeWritingAction`` is a concrete implementation of the abstract
+``Action`` class that represents a code-writing action performed by a
+language model; it captures the Python code written by the model. The
+class also includes mechanisms for checking the equality of two
+instances of ``CodeWritingAction``, hashing the instances, and
+converting the actions to and from payload format for easy
+serialization and deserialization.
+
+Related Symbols
+---------------
+
+- ``automata.eval.agent.code_writing_eval.CodeWritingEval``
+- ``automata.eval.eval_base.Action``
+- ``automata.eval.agent.code_writing_eval.CodeWritingEval.extract_action``
+- ``automata.eval.agent.code_writing_eval.CodeExecutionError``
+- ``automata.eval.agent.code_writing_eval.CodeWritingEval._parse_code_snippet``
+- ``automata.memory_store.symbol_code_embedding_handler.SymbolCodeEmbeddingHandler``
+- ``automata.eval.tool.search_eval.SymbolSearchEvalResult``
+- ``automata.eval.agent.openai_function_eval.OpenAIFunctionEval.extract_action``
+- ``automata.symbol_embedding.symbol_embedding_base.SymbolCodeEmbedding``
+
+Example
+-------
+
+The example below demonstrates how to create an instance of a
+``CodeWritingAction``.
+
+..
code:: python
+
+   from automata.eval.agent.code_writing_eval import CodeWritingAction
+
+   py_object = "x = 1"
+   error = None
+   action = CodeWritingAction(py_object=py_object, error=error)
+
+   # Using the methods
+   payload = action.to_payload()  # convert the action into a payload
+   same_action = CodeWritingAction.from_payload(payload)  # recreate the same action from the payload
+   assert action == same_action  # check equality of the original and recreated actions
+
+Limitations
+-----------
+
+The current implementation of ``CodeWritingAction`` assumes that the
+represented code is Python only. Using this class for other programming
+languages may require additional constraints or checks to guarantee
+correctness.
+
+Follow-up Questions:
+--------------------
+
+- Is it possible to extend ``CodeWritingAction`` to handle other
+  languages besides Python? If so, what potential issues might be
+  encountered?
+- How does ``CodeWritingAction`` handle multiline Python scripts? Are
+  there any problems related to encoding and decoding such scripts?
 diff --git a/docs/eval/agent/code_writing_eval.rst b/docs/eval/agent/code_writing_eval.rst new file mode 100644 index 000000000..2ed42b1e7 --- /dev/null +++ b/docs/eval/agent/code_writing_eval.rst @@ -0,0 +1,78 @@
+CodeWritingEval
+===============
+
+Overview
+--------
+
+The ``CodeWritingEval`` class is designed to evaluate a large language
+model's (LLM's) code-writing ability.
+
+The class provides functionality to extract coding actions using the
+``extract_action`` method, parse code snippets using the
+``_parse_code_snippet`` method, and filter actions based on a specified
+condition using the ``_filter_actions`` method.
+
+The class inherits from ``AgentEval``, an abstract base class for
+evaluating an agent's performance. ``CodeWritingEval`` must implement
+the parent class's abstract methods in order to function correctly.
+
+Related Symbols
+---------------
+
+- ``automata.eval.agent.agent_eval.AgentEval``: Base class for agent
+  evaluation.
+- ``automata.eval.agent.openai_function_eval.OpenAIFunctionEval``: A
+  concrete class for evaluating OpenAI messages for function call
+  actions.
+
+Example
+-------
+
+Below is an example of how to use the ``CodeWritingEval`` class:
+
+.. code:: python
+
+   from automata.eval.agent.code_writing_eval import CodeWritingEval
+   from automata.llm.base_llm_message import LLMChatMessage
+
+   # Initialize CodeWritingEval with target variables
+   code_eval = CodeWritingEval(target_variables=['x', 'y'])
+
+   # Create a mock LLMChatMessage
+   chat_message = LLMChatMessage(role='mock_role', content='x = 10; y = 20')
+
+   # Extract coding actions
+   actions = code_eval.extract_action(chat_message)
+   print(actions)  # prints the list of extracted actions
+
+Note that the ``LLMChatMessage`` above is constructed by hand for
+illustration; in production, messages are generally received from an
+actual LLM rather than created manually.
+
+Limitations
+-----------
+
+Currently, the ``CodeWritingEval`` class assumes that the raw content
+passed for parsing code snippets is in Markdown format, so it may fail
+if the format differs. Furthermore, it expects the target variable names
+to be available before initializing ``CodeWritingEval``.
+
+In terms of error handling, more specific exceptions could be raised for
+different error cases; currently most errors raise
+``CodeExecutionError``, which might not provide enough context about the
+error.
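The Markdown assumption noted above can be illustrated with a small fence-extraction helper. This is a hypothetical stand-in, not the actual ``_parse_code_snippet`` implementation:

```python
import re

# Matches triple-backtick fences, optionally tagged as python,
# and captures the code between them (non-greedy, across newlines).
FENCE_RE = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)


def parse_code_snippets(raw_content: str) -> list:
    """Pull fenced code blocks out of a Markdown-formatted message."""
    return [block.strip() for block in FENCE_RE.findall(raw_content)]
```

A helper like this silently returns an empty list when the content is not Markdown-fenced, which illustrates why non-Markdown input can make the evaluation fail to find any code.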
+ +Follow-up Questions: +-------------------- + +- Is there a way for the ``CodeWritingEval`` class to handle data formats other than Markdown? +- What is the correct approach when the target variables are not known in advance? +- Can there be a mechanism to provide more specific exceptions for different error cases? diff --git a/docs/eval/agent/index.rst b/docs/eval/agent/index.rst index 96e9ae4f1..a8d34caef 100644 --- a/docs/eval/agent/index.rst +++ b/docs/eval/agent/index.rst @@ -8,14 +8,25 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. .. toctree:: :maxdepth: 1 + agent_eval + agent_eval_composite + agent_eval_result agent_eval_result_database + agent_eval_set_loader + agent_evaluation_harness + agent_evaluation_metrics code_execution_error + code_writing_action + code_writing_eval + open_ai_function_call_action open_ai_function_eval variable_not_found_error diff --git a/docs/eval/agent/open_ai_function_call_action.rst b/docs/eval/agent/open_ai_function_call_action.rst new file mode 100644 index 000000000..604d0ff5e --- /dev/null +++ b/docs/eval/agent/open_ai_function_call_action.rst @@ -0,0 +1,74 @@
+OpenAIFunctionCallAction
+========================
+
+``OpenAIFunctionCallAction`` is a concrete ``Action`` class that
+represents a function call from the OpenAI API. The class has ``name``
+and ``arguments`` attributes, which store the function name and the
+arguments that the function requires. It is used for controlling and
+managing function-call actions within the automata evaluation framework.
+
+Overview
+--------
+
+``OpenAIFunctionCallAction`` implements functionality for comparing
+instances, hashing, converting itself into a readable string format, and
+payload processing, which includes conversion to and from payload data.
+The payload-related functionality is critical when persisting and
+retrieving instances from storage.
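The equality, hashing, and payload round-trip behaviour described above can be sketched as follows. The class below is an illustrative stand-in for ``OpenAIFunctionCallAction``, not its actual implementation:

```python
import json


class FunctionCallAction:
    """Sketch of an action wrapping a named function call with arguments."""

    def __init__(self, name: str, arguments: dict):
        self.name = name
        self.arguments = arguments

    def __eq__(self, other) -> bool:
        return (
            isinstance(other, FunctionCallAction)
            and self.name == other.name
            and self.arguments == other.arguments
        )

    def __hash__(self) -> int:
        # Hash on a stable serialization so equal actions hash identically.
        return hash((self.name, json.dumps(self.arguments, sort_keys=True)))

    def __repr__(self) -> str:
        return f"FunctionCallAction(name={self.name!r}, arguments={self.arguments!r})"

    def to_payload(self) -> dict:
        # Only the name and arguments are persisted, mirroring the
        # limitation noted below about partial state capture.
        return {"name": self.name, "arguments": self.arguments}

    @classmethod
    def from_payload(cls, payload: dict):
        return cls(payload["name"], payload["arguments"])
```

Defining ``__eq__`` together with a consistent ``__hash__`` is what allows such actions to serve as dictionary keys, as in the match-result mappings used elsewhere in these docs.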
+ +Related Symbols +--------------- + +- ``automata.eval.agent.openai_function_eval.OpenAIFunctionEval._filter_actions`` +- ``automata.llm.providers.openai_llm.OpenAITool.__init__`` +- ``automata.llm.providers.openai_llm.OpenAIChatCompletionResult.from_args`` +- ``automata.llm.providers.openai_llm.OpenAIChatCompletionResult.get_function_call`` +- ``automata.eval.tool.tool_eval.ToolEval.extract_action`` +- ``automata.llm.providers.openai_llm.OpenAIChatMessage.__init__`` +- ``automata.tools.tool_executor.ToolExecution.execute`` +- ``automata.llm.providers.openai_llm.OpenAIChatCompletionProvider.__init__`` +- ``automata.llm.llm_base.FunctionCall.from_response_dict`` +- ``automata.llm.providers.openai_llm.OpenAIChatCompletionProvider.get_next_assistant_completion`` + +Examples +-------- + +Here’s an example of how to instantiate and use +``OpenAIFunctionCallAction``. + + Please note that actual OpenAI function names and arguments have not + been provided in this example. + +.. code:: python + + from automata.eval.agent.openai_function_eval import OpenAIFunctionCallAction + + name = 'fake_openai_function' + arguments = {'arg1': 'value1', 'arg2': 'value2'} + + OpenAI_action = OpenAIFunctionCallAction(name, arguments) + + print(OpenAI_action) + +Limitations +----------- + +As a specialized ``Action`` subclass, ``OpenAIFunctionCallAction`` is +tailored for function calls in the OpenAI API, thus it won’t work for +other kinds of actions or function calls in different contexts. The +seamless use of this class assumes a working understanding of the OpenAI +API and its function calls. Furthermore, while it provides a +``from_payload`` method for creating instances from saved data, the +corresponding ``to_payload`` method doesn’t save the whole state of the +object, just the action name and arguments. + +Follow-up Questions: +-------------------- + +- What are all the available functions for the OpenAI API that can be + used with ``OpenAIFunctionCallAction``? 
+- What are some specific arguments some OpenAI functions take and in + what format should they be provided? +- How can the process of saving and retrieving + ``OpenAIFunctionCallAction`` instances be improved to capture more + aspects of the object’s state? diff --git a/docs/eval/agent/open_ai_function_eval.rst b/docs/eval/agent/open_ai_function_eval.rst index 055993b35..07d44d037 100644 --- a/docs/eval/agent/open_ai_function_eval.rst +++ b/docs/eval/agent/open_ai_function_eval.rst @@ -1,19 +1,71 @@ -- Yes, ``OpenAIFunctionEval`` is designed to be extensible and can be - subclassed to handle other types of actions. However, doing so would - require careful design to ensure that the additional functionality - does not break the existing functionality and conforms to the - expected behavior of the parent class and other existing subclasses. - -- In a standard interaction in the ``OpenAIAutomataAgent``, messages - are received and passed to the ``OpenAIFunctionEval`` object to - extract the pertinent actions. These actions are then passed to the - respective handlers within the agent to be executed. This allows for - a clean separation of concerns within the agent, ensuring modularity - and maintainability of the code. - -- If there are any errors or exceptions while extracting function call - actions, ``OpenAIFunctionEval`` logs them and returns an empty list. - This means that in these circumstances no action can be taken. - Additional error handling could potentially be implemented in the - future to attempt recovery or provide more detailed error information - depending on system requirements. +OpenAIFunctionEval +================== + +``OpenAIFunctionEval`` is an agent evaluator that interacts with OpenAI +messages for function call actions, stemming from the base class +AgentEval. + +Overview +-------- + +``OpenAIFunctionEval`` provides an implementation to evaluate OpenAI +messages that include a function call action. 
The evaluator extracts the +function call action from the message and filters irrelevant actions, +returning a list of necessary actions meant for the OpenAI function in +the message. + +Related Symbols +--------------- + +- ``automata.llm.providers.openai_llm.OpenAIChatMessage.__str__`` +- ``automata.experimental.code_parsers.py.context_processing.context_utils.get_all_methods`` +- ``automata.agent.openai_agent.OpenAIAutomataAgent._get_next_user_response`` +- ``automata.experimental.code_parsers.py.context_processing.context_utils.is_private_method`` +- ``automata.tools.tool_base.Tool.run`` +- ``automata.llm.providers.openai_llm.OpenAIFunction.prompt_format`` +- ``automata.llm.providers.openai_llm.OpenAIChatCompletionResult.__str__`` +- ``automata.core.ast_handlers.find_syntax_tree_node.find_syntax_tree_node_pyast`` +- ``automata.llm.providers.openai_llm.OpenAIChatMessage.from_completion_result`` + +Example +------- + +Below is an example of how to use ``OpenAIFunctionEval``: + +.. code:: python + + from automata.eval.agent.openai_function_eval import OpenAIFunctionEval + from automata.llm.providers.openai_llm import OpenAIChatMessage + from automata.llm.model.llm import FunctionCallLLM + + # Instantiate an evaluator + evaluator = OpenAIFunctionEval() + + # Create an OpenAIChatMessage with a function call + message = OpenAIChatMessage( + function_call=FunctionCallLLM(name="print_hello", arguments={}) + ) + + # Extract actions from the message + actions = evaluator.extract_action(message) + + # Now actions contains the action for the function call in the OpenAIChatMessage + +Limitations +----------- + +- The ``OpenAIFunctionEval`` class depends on the ``OpenAIChatMessage`` + format and is tailored specifically for extracting function call + actions. If a message does not conform to this format or if a + function call is not included, it will not return any actions. 
+- Since the evaluation is based on the assumption that the function + call is found in a message, the presence of actions other than OpenAI + function calls would not be recognized. + +Follow-up Questions: +-------------------- + +- How can this class be adapted or extended to handle different or more + complex scenarios? +- Is it possible to modify ``OpenAIFunctionEval`` to handle other types + of actions beyond function calls? diff --git a/docs/eval/agent/variable_not_found_error.rst b/docs/eval/agent/variable_not_found_error.rst index dfb53abd9..23dfa2e51 100644 --- a/docs/eval/agent/variable_not_found_error.rst +++ b/docs/eval/agent/variable_not_found_error.rst @@ -1,16 +1,2 @@ -- ``VariableNotFoundError`` does not have specific logging or reporting - methods to aid in identifying the source of the error. However, the - error message itself usually comes with the line code that caused the - problem and this could help in identifying the source of the problem. - -- To prevent ``VariableNotFoundError`` from being raised, it is key to - ensure all variables are properly initialized and declared before - they are used. Code review checkpoints and appropriate testing can - also prevent this error. - -- The ``VariableNotFoundError`` is often raised by the interpreter when - a piece of code is trying to access an undefined variable. This - usually happens due to mistakes in code writing, when a particular - variable is referenced before it is defined, or when it’s out of - scope. Understanding the control flow and correctly structuring the - code can help in minimizing such errors. 
+class VariableNotFoundError(AutomataError): ‘Exception raised when the +target variable is not found.’ pass diff --git a/docs/eval/eval.rst b/docs/eval/eval.rst new file mode 100644 index 000000000..c0208225b --- /dev/null +++ b/docs/eval/eval.rst @@ -0,0 +1,79 @@ +Eval +==== + +Overview +-------- + +``Eval`` is an abstract class that provides a blueprint for evaluating +the performance of Language Learning Models (LLMs). The class is +designed to be very flexible and accommodates different kinds of +evaluators through method overriding. It requires implementing three +primary methods: ``generate_eval_result``, ``extract_action``, and +``_filter_actions``. The ``generate_eval_result`` is used to produce an +evaluation result given a set of instructions, expected actions, and an +execution mechanism. The ``extract_action`` method is for pulling out a +list of actions from a given message, and ``_filter_actions`` is for +refining the action list according to the needs of the evaluation. + +Related Symbols +--------------- + +- ``automata.singletons.py_module_loader.PyModuleLoader.__init__`` +- ``automata.eval.tool.tool_eval_metrics.ToolEvaluationMetrics.__init__`` +- ``automata.llm.llm_base.LLMConversation.__init__`` +- ``automata.embedding.embedding_base.Embedding.__str__`` +- ``automata.tasks.task_base.TaskEnvironment.validate`` +- ``automata.tasks.task_base.Task.status`` +- ``automata.core.ast_handlers.DocstringRemover.visit`` +- ``automata.tasks.automata_task.AutomataTask.__init__`` +- ``automata.llm.llm_base.LLMCompletionResult`` +- ``automata.core.base.patterns.observer.Observer`` + +Example +------- + +Since ``Eval`` is an abstract class, you cannot create an instance of it +directly. Instead, you need to create a subclass that implements the +required methods: ``generate_eval_result``, ``extract_action``, and +``_filter_actions``. Below is an example of how to create a subclass of +Eval: + +.. 
code:: python
+
+    from automata.eval.eval_base import Eval
+
+    class MyCustomEval(Eval):
+        def __init__(self, *args, **kwargs):
+            super().__init__(*args, **kwargs)
+
+        def generate_eval_result(self, exec_input, expected_output, executor, *args, **kwargs):
+            # Implement the method to generate eval result
+            pass
+
+        def extract_action(self, input):
+            # Implement the method to extract actions from the input
+            pass
+
+        def _filter_actions(self, inputs):
+            # Implement the method to filter the extracted actions
+            pass
+
+Limitations
+-----------
+
+The main limitation of the ``Eval`` class is that it is an abstract base
+class (ABC) and it cannot be used on its own without providing concrete
+implementations of the ``generate_eval_result``, ``extract_action``, and
+``_filter_actions`` methods. This means that the usefulness of the
+``Eval`` class is dependent on how these methods are implemented in the
+subclass.
+
+Follow-up Questions:
+--------------------
+
+- What are some strategies for implementing the
+  ``generate_eval_result``, ``extract_action``, and ``_filter_actions``
+  methods?
+- Is it possible to provide a default implementation of these methods
+  in the ``Eval`` class to make it usable out of the box, while still
+  allowing for customization via subclassing?
diff --git a/docs/eval/eval_execution_error.rst b/docs/eval/eval_execution_error.rst
index caa5da245..c431c86ca 100644
--- a/docs/eval/eval_execution_error.rst
+++ b/docs/eval/eval_execution_error.rst
@@ -1,25 +1,2 @@
-- Yes, in some cases, there may be more specific exceptions that can be
-  used instead of ``EvalExecutionError``. For instance, if an exception
-  occurs specifically during the execution of code,
-  ``CodeExecutionError`` would be more appropriate. Therefore, while
-  ``EvalExecutionError`` can act as a broad exception for any type of
-  execution error, it should not replace more specific exceptions that
-  would provide more detailed information about the error. 
-
-- As for handling ``EvalExecutionError`` within the automata
-  environment, the best practices would largely depend on the specific
-  context and the error-handling strategy decided for the application.
-  However, some general recommendations would include:
-
-  - Catching and logging the exception information for debugging
-    purposes. This can provide valuable insights and help in
-    identifying the root cause of the error.
-  - For errors that can be handled gracefully, appropriate error
-    handling logic should be added. This could be retry attempts,
-    defaulting to a different task execution strategy, or even
-    notifying the user about the issue.
-  - For errors that cannot be handled, it would be a best practice to
-    fail fast and propagate the exception up the call stack. This can
-    help in avoiding any further execution of tasks with an incorrect
-    state. Please keep in mind, these are broad guidelines and the
-    appropriate strategy can vary based on the specific use cases.
+.. code:: python
+
+    class EvalExecutionError(AutomataError):
+        """Raised when there's an issue during task execution."""
+
+        pass
diff --git a/docs/eval/eval_loading_error.rst b/docs/eval/eval_loading_error.rst
index 1475c27b4..77849e8c4 100644
--- a/docs/eval/eval_loading_error.rst
+++ b/docs/eval/eval_loading_error.rst
@@ -1,10 +1,2 @@
-I’m sorry for the confusion, but the information about the
-``EvalLoadingError`` and related symbols seems to be hypothetical or
-from a private or hypothetical source code that doesn’t have publicly
-available documentation. It’s a common practice in Python programming to
-define custom exceptions for specific error handling. In this context,
-``EvalLoadingError`` appears to be a custom exception designed to handle
-errors specifically occurred during the evaluation loading process in an
-undefined ‘Automata’ project. However, without specific source code or
-documentation, I’m unable to provide the exact answers to your follow-up
-questions. 
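``EvalLoadingError`` marks failures while loading evaluations. A minimal sketch of how a loader might surface it is shown below; ``AutomataError`` is a local stand-in and ``load_evals`` is a hypothetical helper, not the library's actual loader.

```python
import json

class AutomataError(Exception):  # local stand-in for the library base error
    pass

class EvalLoadingError(AutomataError):
    """Exception raised when there's an issue with loading evaluations."""

def load_evals(raw: str) -> list:
    # Wrap low-level parse failures in the domain-specific loading error,
    # so callers can catch a single exception type at the eval boundary.
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        raise EvalLoadingError(f"Could not load evaluations: {e}") from e

evals = load_evals('[{"query": "SymbolGraph"}]')
# evals == [{"query": "SymbolGraph"}]
```

Chaining with ``from e`` preserves the original parse error in the traceback for debugging.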
+.. code:: python
+
+    class EvalLoadingError(AutomataError):
+        """Exception raised when there's an issue with loading evaluations."""
+
+        pass
diff --git a/docs/eval/eval_result.rst b/docs/eval/eval_result.rst
new file mode 100644
index 000000000..ffde0ff2f
--- /dev/null
+++ b/docs/eval/eval_result.rst
@@ -0,0 +1,84 @@
+EvalResult
+==========
+
+``EvalResult`` is an abstract class that represents the result of an
+evaluation. This class gets a unique random string as ``run_id`` on
+instantiation and has abstract properties like ``is_full_match`` and
+``is_partial_match``. It also contains two abstract methods,
+``to_payload`` and ``from_payload``, for serialization and
+deserialization of evaluation results.
+
+Overview
+--------
+
+``EvalResult`` serves as a base for the creation of specific evaluation
+result objects in the Automata platform. All evaluation result classes
+should inherit from ``EvalResult`` and implement its properties and
+methods. The primary purpose of the ``EvalResult`` class is to provide a
+unified interface for dealing with evaluation results. The ``run_id``
+attribute uniquely identifies each run of evaluation.
+
+Related Symbols
+---------------
+
+- ``automata.llm.llm_base.LLMCompletionResult.get_content``
+- ``automata.agent.openai_agent.OpenAIAutomataAgent.get_result``
+- ``automata.llm.llm_base.LLMCompletionResult.get_role``
+- ``automata.experimental.tools.builders.agentified_search_builder.AgentifiedSearchToolkitBuilder._get_formatted_search_results``
+- ``automata.experimental.search.symbol_search.SymbolSearch.symbol_references``
+- ``automata.experimental.search.symbol_search.SymbolSearch.process_query``
+
+Example
+-------
+
+Since ``EvalResult`` is an abstract base class, you would not typically
+instantiate it directly. Instead, you would create a new class that
+inherits from ``EvalResult`` and implements its abstract methods and
+properties.
+
+Here is an example of how you might define such a class:
+
+.. 
code:: python + + from automata.eval.eval_base import EvalResult + from typing import Any + + class CustomEvalResult(EvalResult): + + # Implement the abstract properties + @property + def is_full_match(self) -> bool: + # Implement the logic here + pass + + @property + def is_partial_match(self) -> bool: + # Implement the logic here + pass + + # Implement the abstract methods + def to_payload(self) -> dict: + # Convert the object into a dict or other serializable format + pass + + @classmethod + def from_payload(cls, payload: dict) -> 'CustomEvalResult': + # Create a new object from a dict or other serialized format + pass + +Limitations +----------- + +Being an abstract base class, ``EvalResult`` is not meant to be used +directly. The main limitation is that it defines a common interface but +does not provide an implementation. The actual functionality must be +provided by subclasses, meaning errors can occur if subclasses do not +properly implement all required methods and properties. + +Follow-up Questions: +-------------------- + +- How are ``run_ids`` used in the larger context of the Automata + application? +- What are the typical return types and key-value pairs expected in the + ``to_payload`` and ``from_payload`` methods? diff --git a/docs/eval/index.rst b/docs/eval/index.rst index b7e0a828d..398784f7b 100644 --- a/docs/eval/index.rst +++ b/docs/eval/index.rst @@ -8,16 +8,21 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. .. toctree:: :maxdepth: 1 + action agent_eval code_execution_error + eval eval_execution_error eval_loading_error + eval_result variable_not_found_error agent/index tool/index diff --git a/docs/eval/tool/eval_execution_error.rst b/docs/eval/tool/eval_execution_error.rst index 4709f79cf..d80b54f0c 100644 --- a/docs/eval/tool/eval_execution_error.rst +++ b/docs/eval/tool/eval_execution_error.rst @@ -1,15 +1 @@ -1. 
Whether there’s a need for more specialized exception classes to
-   handle different types of errors during an execution evaluation
-   depends on the complexity of the execution process and the system’s
-   needs regarding error handling. If having more specific error classes
-   would help in identifying and resolving issues quickly, then it could
-   be beneficial to have them.
-
-2. ``EvalExecutionError`` is used in conjunction with other exception
-   classes in the system to denote specific types of errors. Guidelines
-   or strategies for using one exception type over another usually
-   center around the principle of specificity: use the most specific
-   exception type that accurately describes the error at hand. This
-   enables efficient error handling and debugging. If an error does not
-   fit any of the more specific exception classes, then a more general
-   type, like ``EvalExecutionError``, might be used.
+.. code:: python
+
+    class EvalExecutionError(Exception):
+        pass
diff --git a/docs/eval/tool/eval_loading_error.rst b/docs/eval/tool/eval_loading_error.rst
index a82ff16d3..899560228 100644
--- a/docs/eval/tool/eval_loading_error.rst
+++ b/docs/eval/tool/eval_loading_error.rst
@@ -1,9 +1 @@
-- Could there be more specific subclasses of ``EvalLoadingError``? For
-  example, could there be a ``EvalFileNotFoundError``,
-  ``EvalInvalidFormatError``, or ``EvalEmptyError`` to differentiate
-  between different types of loading issues?
-- Are there ways to handle ``EvalLoadingError`` in the code itself,
-  perhaps by attempting to reload the evaluation or proceeding with a
-  backup evaluation if the primary one fails to load?
-- Where should the logs for ``EvalLoadingError`` be stored? Are they
-  recorded in a consistent way that allows for easy debugging later on? 
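One idea raised above is retrying a failed load and falling back to a backup evaluation. A hedged sketch of that pattern follows; ``load_with_fallback`` and the toy loaders are hypothetical helpers, not part of the library.

```python
class EvalLoadingError(Exception):
    pass

def load_with_fallback(primary_loader, backup_loader, retries: int = 2):
    # Try the primary loader a few times, then fall back to the backup;
    # re-raise the original EvalLoadingError only when both fail.
    last_error = None
    for _ in range(retries):
        try:
            return primary_loader()
        except EvalLoadingError as e:
            last_error = e
    try:
        return backup_loader()
    except EvalLoadingError:
        raise last_error

# Usage with toy loaders:
def flaky():
    raise EvalLoadingError("primary eval file is corrupt")

result = load_with_fallback(flaky, lambda: ["backup-eval"])
# result == ["backup-eval"]
```

Logging ``last_error`` before falling back (omitted here) would address the debugging question above.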
+.. code:: python
+
+    class EvalLoadingError(Exception):
+        pass
diff --git a/docs/eval/tool/index.rst b/docs/eval/tool/index.rst
index cfab94502..882b2ea4a 100644
--- a/docs/eval/tool/index.rst
+++ b/docs/eval/tool/index.rst
@@ -8,6 +8,8 @@ how to :ref:`installation` the project.
+
+
 .. AUTO-GENERATED CONTENT START
 ..
@@ -16,6 +18,14 @@ how to :ref:`installation` the project.
 
     eval_execution_error
     eval_loading_error
+    symbol_search_action
+    symbol_search_eval
+    symbol_search_eval_result
+    tool_eval
+    tool_eval_result
+    tool_eval_set_loader
+    tool_evaluation_harness
+    tool_evaluation_metrics
 
 .. AUTO-GENERATED CONTENT END
 ..
diff --git a/docs/eval/tool/symbol_search_action.rst b/docs/eval/tool/symbol_search_action.rst
new file mode 100644
index 000000000..73fe1d56a
--- /dev/null
+++ b/docs/eval/tool/symbol_search_action.rst
@@ -0,0 +1,76 @@
+SymbolSearchAction
+==================
+
+Overview
+--------
+
+``SymbolSearchAction`` is a concrete class that represents a symbol
+search operation in the codebase. It can be initialized with a query
+string and an optional list of search results. The class enables
+comparison of ``SymbolSearchAction`` instances and uniquely identifies
+each instance based on its hash.
+
+This class provides two key methods, ``to_payload()`` and
+``from_payload()``. The ``to_payload()`` method generates a serializable
+payload from the instance’s query and search results, suitable for
+storage or transmission. The ``from_payload()`` method, a class method,
+takes such a payload and reconstructs the ``SymbolSearchAction``
+instance from it.
+
+Related Symbols
+---------------
+
+- DependencyFactory.create_symbol_search: Creates a ``SymbolSearch``
+  instance.
+- SymbolSearchToolkitBuilder.\__init\_\_: Creates an instance of
+  ``SymbolSearchToolkitBuilder``.
+- SymbolDocEmbeddingBuilder._generate_search_list: Generates a search
+  list.
+- AgentifiedSearchToolkitBuilder.\__init\_\_: Creates an instance of
+  ``AgentifiedSearchToolkitBuilder``. 
+- SymbolSearch.exact_search: Performs an exact search across the + indexed codebase. + +Usage Example +------------- + +The following example illustrates how to create and work with an +instance of SymbolSearchAction. + +.. code:: python + + from automata.eval.tool.search_eval import SymbolSearchAction + + # Create a SymbolSearchAction + sym_search_action = SymbolSearchAction(query="MyQuery") + + # Now, let's simulate a search operation that returned some results + sym_search_action.search_results = ["result1", "result2"] + + # Create a payload from the SymbolSearchAction + payload = sym_search_action.to_payload() + + # The payload should look something like {'type': 'SymbolSearchAction', 'query': 'MyQuery', 'search_results': 'result1,result2'} + print(payload) + + # Now, create a SymbolSearchAction from the payload + sym_search_action_reconstructed = SymbolSearchAction.from_payload(payload) + + # The original and reconstructed SymbolSearchAction should be equivalent + assert sym_search_action == sym_search_action_reconstructed + +Limitations and Follow-up Questions +----------------------------------- + +``SymbolSearchAction`` doesn’t actually perform any search operations - +it is simply a representation of a search action that can be serialised +or deserialised. + +This class makes a strong assumption about the payload format, +specifically that ‘query’ and ‘search_results’ are both strings. This +may limit its compatibility with other systems or future extensions. + +- How does this class interact with the rest of the search + functionality provided in Automata for codebase exploration? +- Is there a need for more flexible payload schemas? +- How should this class handle errors in payload conversion? 
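On the last question, one defensive option is to validate the payload before reconstruction. The helper below is hypothetical (it is not the library's ``from_payload``) and assumes the schema shown in the example above, where ``query`` and ``search_results`` are strings.

```python
from typing import List, Tuple

def decode_symbol_search_payload(payload: dict) -> Tuple[str, List[str]]:
    # Validate the assumed schema before reconstructing the action.
    query = payload.get("query")
    results = payload.get("search_results", "")
    if not isinstance(query, str) or not isinstance(results, str):
        raise ValueError(f"Malformed SymbolSearchAction payload: {payload!r}")
    # The example payload stores results as a comma-separated string.
    return query, results.split(",") if results else []

query, results = decode_symbol_search_payload(
    {"type": "SymbolSearchAction", "query": "MyQuery", "search_results": "result1,result2"}
)
# query == "MyQuery", results == ["result1", "result2"]
```

Raising ``ValueError`` on a malformed payload fails fast instead of silently producing a half-built action.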
diff --git a/docs/eval/tool/symbol_search_eval.rst b/docs/eval/tool/symbol_search_eval.rst
new file mode 100644
index 000000000..1669dc48b
--- /dev/null
+++ b/docs/eval/tool/symbol_search_eval.rst
@@ -0,0 +1,91 @@
+SymbolSearchEval
+================
+
+``SymbolSearchEval`` is a class for evaluating a Language
+Learning Model’s (LLM’s) symbol-searching ability. It is part of
+``automata.eval.tool.search_eval`` in the codebase. Instances of this
+class are responsible for evaluating the ability of a correctly
+configured Automata system to accurately perform symbol-based searches.
+
+Overview
+--------
+
+The ``SymbolSearchEval`` class inherits from ``ToolEval`` and implements
+the ability to evaluate the effectiveness of symbol search operations.
+It performs this evaluation based on an expected action (which must be
+an instance of ``SymbolSearchAction``) and an observed action, which
+could either be a ``SymbolSearchAction`` instance or a ``None`` value.
+
+This class facilitates the extraction of search actions implicitly from
+input actions and transforms them into ``ToolEvalResult`` objects by
+comparing expected and observed actions.
+
+Important methods in this class include ``extract_action`` and
+``to_tool_result``. 
+
+Related Symbols
+---------------
+
+- ``automata.symbol.graph.symbol_navigator.SymbolGraphNavigator._get_symbol_references_in_scope``
+- ``automata.symbol.symbol_base.Symbol.from_string``
+- ``automata.symbol.symbol_utils.get_rankable_symbols``
+- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder._symbol_code_similarity_search_processor``
+- ``automata.experimental.search.symbol_search.SymbolSearch.retrieve_source_code_by_symbol``
+- ``automata.symbol.symbol_parser._SymbolParser.accept_identifier``
+- ``automata.symbol.graph.symbol_navigator.process_symbol_bounds``
+- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder.process_query``
+- ``automata.symbol.graph.symbol_graph.SymbolGraph.get_references_to_symbol``
+- ``automata.symbol.symbol_parser.new_local_symbol``
+
+Example
+-------
+
+The following is an example demonstrating how to use the
+``SymbolSearchEval`` class.
+
+.. code:: python
+
+    from automata.eval.tool.search_eval import SymbolSearchEval
+    from automata.common.action import FunctionCall
+
+    # Example FunctionCall and query result
+    func_call = FunctionCall(name='symbol-search', arguments={'query': 'symbol_xyz'})
+    search_result = "Searching for symbol...\n'xyz': {'rank': 1, 'symbol': 'symbol_xyz'}"
+    input_action_tuple = (func_call, search_result)
+
+    # Instantiate SymbolSearchEval
+    symbol_search_eval = SymbolSearchEval()
+
+    # Extract action
+    symbol_search_action = symbol_search_eval.extract_action(input_action_tuple)
+
+    # To tool result
+    tool_eval_result = symbol_search_eval.to_tool_result(expected_action=symbol_search_action, observed_action=None)
+
+This example demonstrates how the ``SymbolSearchEval`` class can be used
+to evaluate a symbol search operation. It first sets up a tuple of a
+``FunctionCall`` and the expected result of the search. 
It then +instantiates the ``SymbolSearchEval`` class, and uses this to extract +the expected action from the input tuple, and to evaluate the expected +versus the observed action (in this case, None was used for simplicity). + +Limitations +----------- + +The ``SymbolSearchEval`` class currently only supports +``symbol-rank-search``, ``symbol-similarity-search``, and +``llm-facilitated-search`` operations. Any other operation will raise a +``ValueError``. + +Follow-up Questions: +-------------------- + +1. How can we extend the class to support other types of search + operations? +2. Do we have mechanisms in place to handle edge cases and errors during + the search process? +3. How can we improve the evaluation accuracy or provide comparative + analysis between different evaluation measures? +4. Are there plans in place for supporting parallel evaluations in + large-scale systems, and if so, how will potential synchronisation + issues be handled? diff --git a/docs/eval/tool/symbol_search_eval_result.rst b/docs/eval/tool/symbol_search_eval_result.rst new file mode 100644 index 000000000..9620fb62f --- /dev/null +++ b/docs/eval/tool/symbol_search_eval_result.rst @@ -0,0 +1,77 @@ +SymbolSearchEvalResult +====================== + +Overview +-------- + +The SymbolSearchEvalResult class inherits from ToolEvalResult. It is a +specific implementation tailored to represent the result of a symbol +search evaluation. Instances of this class store the expected and +observed actions which pertain to symbol searches as well as the top +match and top k matches from the observed action’s search results. It +also stores the first match from expected action’s search results as +expected match. + +Throughout its methods and properties, the checks and interactions are +primarily concerned with these aforementioned actions and matches. 
+Furthermore, it offers functionality
+to check for full and partial match
+occurrences, to convert a SymbolSearchEvalResult instance into a
+serializable format, and to create a SymbolSearchEvalResult instance from a
+serializable format.
+
+Related Symbols
+---------------
+
+- ``automata.cli.scripts.run_code_embedding.collect_symbols``
+- ``automata.symbol.graph.symbol_navigator.SymbolGraphNavigator.get_references_to_symbol``
+- ``automata.symbol.symbol_parser._SymbolParser.parse_descriptors``
+- ``automata.symbol.symbol_base.Symbol.py_kind``
+- ``automata.eval.tool.tool_eval_metrics.ToolEvaluationMetrics.total_partial_matches``
+- ``automata.symbol.symbol_parser._SymbolParser.error``
+- ``automata.symbol.symbol_base.Symbol.parent``
+- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder.build``
+- ``automata.symbol.graph.symbol_references.ReferenceProcessor``
+
+Example
+-------
+
+The following is an example demonstrating instantiation of a
+SymbolSearchEvalResult object with an expected and observed action of
+type SymbolSearchAction.
+
+.. code:: python
+
+    from automata.eval.tool.search_eval import SymbolSearchEvalResult
+    from automata.eval.action import SymbolSearchAction
+
+    expected_action = SymbolSearchAction(...)
+    observed_action = SymbolSearchAction(...)
+
+    result = SymbolSearchEvalResult(expected_action, observed_action)
+
+**Note:** This is the most basic usage example. It is assumed that you
+replace the ellipsis (…) with the required parameters to construct a
+SymbolSearchAction. The expected and observed actions must be of type
+SymbolSearchAction, and each should ideally hold a list of symbol search
+results sorted in the order of relevance.
+
+Limitations
+-----------
+
+Because this class depends on types such as ``Symbol`` and ``Action``
+from external modules, future changes to those dependencies
+can impact the working of this class. 
The class logic is strongly
+coupled with the order of the ``search_results`` list in the expected and
+observed actions, with the first entry treated as the most relevant
+search result. If this order is not maintained, the logic of this class
+might not work as expected.
+
+Follow-up Questions:
+--------------------
+
+- How can we handle instances where the expected or observed actions
+  are not of type ``SymbolSearchAction``? Is there a functionality to
+  fall back on a default action or should it strictly require a
+  ``SymbolSearchAction``?
+- Is there a way to customize the number of top matches
+  (``TOP_K_MATCHES``) that the class considers for a partial match?
diff --git a/docs/eval/tool/tool_eval.rst b/docs/eval/tool/tool_eval.rst
new file mode 100644
index 000000000..224209510
--- /dev/null
+++ b/docs/eval/tool/tool_eval.rst
@@ -0,0 +1,78 @@
+ToolEval
+========
+
+``ToolEval`` is an abstract class designed for evaluating the
+performance of tools by generating and comparing expected and observed
+results. It has several methods you can override to customize the
+evaluation process. It requires the expected output, the tool executor,
+and the function call to generate the eval result.
+
+Overview
+--------
+
+``ToolEval`` is a core part of the evaluation system in Automata. It
+provides a structure and means to evaluate how well a tool performs in
+its task. This class requires implementation of the ``extract_action``
+and ``to_tool_result`` methods, meaning you can give it specific
+evaluation behaviours such as how to translate operations and determine
+the equivalence between expected and observed actions. 
+ +Related Symbols +--------------- + +- ``automata.tasks.task_environment.AutomataTaskEnvironment.validate`` +- ``automata.core.base.patterns.singleton.Singleton`` +- ``automata.cli.env_operations.update_key_value`` +- ``automata.tasks.task_environment.EnvironmentMode`` +- ``automata.tasks.task_base.ITaskExecution`` +- ``automata.tasks.task_base.ITaskExecution.execute`` +- ``automata.eval.eval_base.Action.__init__`` +- ``automata.cli.env_operations.load_env_vars`` +- ``automata.tasks.task_base.TaskEnvironment.teardown`` +- ``automata.tools.tool_executor.ToolExecutor.__init__`` + +Example +------- + +Please note that ``ToolEval`` is an abstract base class and cannot be +instantiated directly. The following is an example demonstrating how to +create an implementation of ``ToolEval``. + +.. code:: python + + from automata.eval.tool.tool_eval import ToolEval + from automata.eval.eval_base import EvalResult, Action + from typing import Tuple, Optional, List + + class CustomToolEval(ToolEval): + + def extract_action(self, input_action_tuple: Tuple) -> Action: + # Custom implementation of action extraction + pass + + def to_tool_result(self, expected_action: Action, observed_action: Optional[Action]) -> EvalResult: + # Custom method of evaluating tool results + pass + + def _filter_actions(self, actions: List[Action]) -> List[Action]: + # Custom implementation to filter actions if necessary + pass + +Limitations +----------- + +The limitations of the ``ToolEval`` class are up to the implemented +class, as ``ToolEval`` is an abstract base class. However, it’s worth +noting that it does not inherently include any failure recovery or retry +mechanisms. If these are necessary for your use case, you should include +them in your implementation. + +Follow-up Questions: +-------------------- + +- What are some common strategies for implementing ``extract_action`` + and ``to_tool_result``? +- How can we handle cases where the tool execution fails? 
+- How can this be used in conjunction with other parts of the Automata
+  project? Is there a method to easily integrate this with existing
+  task environments or tool executors?
diff --git a/docs/eval/tool/tool_eval_result.rst b/docs/eval/tool/tool_eval_result.rst
new file mode 100644
index 000000000..4eb73b51c
--- /dev/null
+++ b/docs/eval/tool/tool_eval_result.rst
@@ -0,0 +1,9 @@
+.. code:: python
+
+    class ToolEvalResult(EvalResult):
+        """An abstract class to represent the result of a tool eval."""
+
+        def __init__(self, expected_action: Action, observed_action: Optional[Action], *args, **kwargs):
+            super().__init__(*args, **kwargs)
+            self.expected_action = expected_action
+            self.observed_action = observed_action
diff --git a/docs/eval/tool/tool_eval_set_loader.rst b/docs/eval/tool/tool_eval_set_loader.rst
new file mode 100644
index 000000000..bc7e1608c
--- /dev/null
+++ b/docs/eval/tool/tool_eval_set_loader.rst
@@ -0,0 +1,82 @@
+ToolEvalSetLoader
+=================
+
+``ToolEvalSetLoader`` is a class that loads a list of function calls and
+their expected actions from a JSON file. This class is mainly used for
+loading evaluation tasks for testing the functionality and performance
+of tools. It provides developers with an efficient way of loading,
+formatting, and accessing the JSON payloads for function calls and the
+expected actions associated with them.
+
+Overview
+--------
+
+``ToolEvalSetLoader`` performs two major operations: loading
+JSON payloads and parsing them to instantiate function calls and
+expected actions. It implements validation checks to ensure that JSON
+files are loaded and that the payloads formed are dictionaries. It also
+provides a formatter that is applied recursively to all string values
+in the loaded data. The class offers validation for the payloads, ensuring the
+loaded information is in the correct format for further processing. 
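The recursive string formatting mentioned above can be sketched as follows. This is an illustrative helper, not the library's actual implementation: it walks nested dicts and lists and applies a formatter to every string value.

```python
def apply_formatter(obj, formatter):
    # Recursively apply `formatter` to every string value in a nested
    # structure of dicts and lists, leaving other values untouched.
    if isinstance(obj, str):
        return formatter(obj)
    if isinstance(obj, dict):
        return {key: apply_formatter(value, formatter) for key, value in obj.items()}
    if isinstance(obj, list):
        return [apply_formatter(item, formatter) for item in obj]
    return obj

payload = {"query": "  SymbolGraph  ", "entries": [{"note": " trim me "}], "count": 3}
cleaned = apply_formatter(payload, str.strip)
# cleaned == {"query": "SymbolGraph", "entries": [{"note": "trim me"}], "count": 3}
```

Any ``str -> str`` callable can be passed as the formatter, e.g. template substitution instead of ``str.strip``.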
+
+In summary, ``ToolEvalSetLoader`` is a fundamental part of the
+tool evaluation process, loading the function calls and expected
+actions from a JSON file and validating the payloads.
+
+Related Symbols
+---------------
+
+- ``automata.symbol_embedding.vector_databases.JSONSymbolEmbeddingVectorDatabase.__init__``
+- ``automata.cli.cli_output_logger.CustomLogger.__init__``
+- ``automata.tools.tool_error.UnknownToolError.__init__``
+- ``automata.config.config_base.SerializedDataCategory``
+- ``automata.experimental.code_parsers.py.context_processing.context_utils.get_all_classes``
+- ``automata.singletons.py_module_loader.PyModuleLoader._load_module_from_fpath``
+- ``automata.symbol.graph.symbol_graph.SymbolGraph.__init__``
+- ``automata.singletons.dependency_factory.DependencyFactory.create_py_context_retriever``
+- ``automata.core.ast_handlers.ImportRemover``
+- ``automata.core.utils.set_openai_api_key``
+
+Example
+-------
+
+Here is an example demonstrating how to instantiate the
+``ToolEvalSetLoader`` and load a JSON file of function calls and
+expected actions.
+
+.. code:: python
+
+    from automata.eval.tool.tool_eval_harness import ToolEvalSetLoader
+
+    # Path to the JSON file of function calls and expected actions.
+    filepath = "path/to/json/file"
+
+    # Create an instance of ToolEvalSetLoader
+    tool_eval_set_loader = ToolEvalSetLoader(filepath)
+
+    # Load the JSON file
+    payloads = tool_eval_set_loader.load_json()
+
+The above code will load the JSON file and parse it into
+``FunctionCall`` and ``Action`` objects that can be further used in the
+evaluation process.
+
+Limitations
+-----------
+
+- ``ToolEvalSetLoader`` can only process JSON files. If any other file
+  type is provided, it will raise a ``ValueError``.
+- It assumes that each payload consists of a ``template`` and ``entries``.
+  If the structure of the payloads in the JSON file varies from this,
+  then an error may occur. 
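The JSON-only restriction noted above can be sketched as a small validation helper. This is a hypothetical helper illustrating the described behaviour, not the loader's actual code.

```python
import json
import os
import tempfile

def load_json_strict(filepath: str) -> dict:
    # Mirror the validation described above: only .json files are
    # accepted, and the top-level payload must be a dictionary.
    if not filepath.endswith(".json"):
        raise ValueError(f"Only JSON files are supported, got: {filepath}")
    with open(filepath) as f:
        payload = json.load(f)
    if not isinstance(payload, dict):
        raise ValueError("Expected the top-level payload to be a dictionary.")
    return payload

# Demo with a temporary file standing in for a real eval set.
path = os.path.join(tempfile.mkdtemp(), "tool_evals.json")
with open(path, "w") as f:
    json.dump({"template": {}, "entries": []}, f)
print(load_json_strict(path))  # {'template': {}, 'entries': []}
```

Passing a non-JSON path such as ``"tool_evals.yaml"`` raises ``ValueError`` immediately, before any file I/O.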
+ +Follow-up Questions: +-------------------- + +- What is the data structure of ‘template’ and ‘entries’ that are + expected in each payload? +- Is there a specific format for the function call and expected action + dictionaries? +- What exceptions does the class handle? What happens if an exception + is raised during the loading or parsing of the JSON file? +- Could other file types (e.g., YAML, XML) be supported in the future? diff --git a/docs/eval/tool/tool_evaluation_harness.rst b/docs/eval/tool/tool_evaluation_harness.rst new file mode 100644 index 000000000..ee3b83f4f --- /dev/null +++ b/docs/eval/tool/tool_evaluation_harness.rst @@ -0,0 +1,83 @@ +ToolEvaluationHarness +===================== + +Overview +-------- + +``ToolEvaluationHarness`` is a utility class designed to facilitate the +evaluation of a list of function calls against a set of expected +actions. It leverages various related tools and interfaces to perform +the evaluation, compute metrics, and report the results. The class +generates a unique run ID for each evaluation session and provides an +interface to conduct the evaluation with custom configurations. + +Related Symbols +--------------- + +- ``automata.tasks.task_executor.AutomataTaskExecutor.__init__`` +- ``automata.tools.tool_executor.ToolExecution.__init__`` +- ``automata.tasks.task_base.TaskEnvironment.setup`` +- ``automata.tasks.task_base.Task.__init__`` +- ``automata.tasks.task_environment.AutomataTaskEnvironment.teardown`` +- ``automata.agent.agent.AgentToolkitBuilder.build`` +- ``automata.experimental.symbol_embedding.symbol_doc_embedding_builder.SymbolDocEmbeddingBuilder._build_prompt`` +- ``automata.symbol.graph.symbol_caller_callees.CallerCalleeProcessor.process`` +- ``automata.cli.env_operations.show_key_value`` + +Usage Example +------------- + +Below is an example showcasing how ``ToolEvaluationHarness`` can be +integrated. 
The example assumes existing lists of
+``input_functions`` and ``expected_actions``, in addition to the
+``executor`` as an instance of ``ToolExecution``.
+
+.. code:: python
+
+    from automata.eval.tool.tool_eval_harness import ToolEvaluationHarness
+    from automata.tools.tool_executor import ToolExecution
+    from unittest.mock import MagicMock  # using mock objects for illustration
+
+    # Assuming these lists are predefined and filled with functions and actions respectively
+    input_functions = [MagicMock(name='function') for _ in range(10)]
+    expected_actions = [MagicMock(name='action') for _ in range(10)]
+    # Assuming executor is an instance of ToolExecution
+    executor = ToolExecution([MagicMock(name='tool')])
+
+    # Initialize a ToolEvaluationHarness instance
+    tool_eval_harness = ToolEvaluationHarness([])  # Empty evals list for this example
+
+    # Conduct an evaluation
+    metrics = tool_eval_harness.evaluate(input_functions, expected_actions, executor)
+
+    # `metrics` now contains the evaluation results
+
+Limitations
+-----------
+
+``ToolEvaluationHarness`` requires thorough understanding for its
+effective utilization due to its close association with other symbolic
+representations and automation tasks. The exact working and usefulness
+of this class becomes clearer when combined with related symbols like
+``AutomataTaskExecutor``, ``ToolExecution``, and
+``CallerCalleeProcessor``.
+
+While the class shuffles function-action pairs randomly to prevent
+ordering bias, this could potentially dilute important order nuances in
+certain evaluation scenarios.
+
+Finally, please note that exceptions thrown during evaluation will halt
+the process and need to be managed appropriately.
+
+Follow-up Questions:
+--------------------
+
+1. What’s the level of granularity for ``ToolEvaluationMetrics``? Any
+   chance to get evaluation breakdown per function call or action in the
+   results?
+2. 
Any alternatives to handle exceptions during function call executions + to maintain the continuity of the evaluation process? +3. Is there a way to override the default random seed for shuffling + function-action pairs in evaluation for specific use-cases? +4. What happens if input function list and expected action list don’t + have the same lengths? diff --git a/docs/eval/tool/tool_evaluation_metrics.rst b/docs/eval/tool/tool_evaluation_metrics.rst new file mode 100644 index 000000000..e678020ee --- /dev/null +++ b/docs/eval/tool/tool_evaluation_metrics.rst @@ -0,0 +1,70 @@ +ToolEvaluationMetrics +===================== + +``ToolEvaluationMetrics`` is a class developed to evaluate, quantify, +and provide detailed metrics accumulated from a sequence of Tool +Evaluation Results. It offers a comprehensive overview of evaluation +results from tool testing, gathered in one handy reference data object. + +Overview +-------- + +``ToolEvaluationMetrics`` class is initialized using a list of tool +evaluation results. It then provides various metrics such as the total +count of evaluations, total full matches, total partial matches and +their respective rates. The interpretive metrics available aid in +understanding, to a fine degree, the performance and effectiveness of +the tools being evaluated. + +Related Symbols +--------------- + +- ``automata.cli.scripts.run_agent.process_issues`` +- ``automata.cli.scripts.run_agent_config_validation.test_yaml_validation`` +- ``automata.singletons.dependency_factory.DependencyFactory.build_dependencies_for_tools`` +- ``automata.cli.install_indexing.generate_local_indices`` + +Example +------- + +Following is a simple example demonstrating the usage of +``ToolEvaluationMetrics``. + +.. 
code:: python
+
+   from automata.eval.tool.tool_eval_metrics import ToolEvaluationMetrics
+   from yourmodule import YourToolEvalResults  # Replace with actual module and class
+
+   evaluation_results = [YourToolEvalResults()]  # List of evaluation results
+
+   metrics = ToolEvaluationMetrics(evaluation_results)
+
+   print(f'Total Evaluations Conducted: {metrics.total_evaluations}')
+   print(f'Total Full Matches Observed: {metrics.total_full_matches}')
+   print(f'Total Partial Matches Observed: {metrics.total_partial_matches}')
+   print(f'Full Match Rate: {metrics.full_match_rate}')
+   print(f'Partial Match Rate: {metrics.partial_match_rate}')
+
+*Note: Replace ``YourToolEvalResults`` with the actual class that
+contains the results of your tool evaluations.*
+
+Limitations
+-----------
+
+``ToolEvaluationMetrics`` depends on result objects that expose
+``is_full_match`` and ``is_partial_match`` properties to compute the
+full and partial matches. If a tool’s evaluation results do not include
+these properties, ``ToolEvaluationMetrics`` cannot compute these
+metrics.
+
+Follow-up Questions:
+--------------------
+
+- Is there a way to handle the computation of metrics even when
+  ``is_full_match`` and ``is_partial_match`` are not present within the
+  tool’s evaluation results?
+- How are ``is_full_match`` and ``is_partial_match`` decided within the
+  tool’s evaluation results?
diff --git a/docs/experimental/code_parsers/index.rst b/docs/experimental/code_parsers/index.rst
new file mode 100644
index 000000000..625ddf44e
--- /dev/null
+++ b/docs/experimental/code_parsers/index.rst
@@ -0,0 +1,23 @@
+code_parsers
+============
+
+**Automata** is a Python library for autonomous providers.
+
+Check out the :doc:`usage` section for further information, including
+how to :ref:`installation` the project.
+
+
+
+.. 
AUTO-GENERATED CONTENT START +.. + + .. toctree:: + :maxdepth: 1 + + py/index + +.. AUTO-GENERATED CONTENT END +.. + + + diff --git a/docs/experimental/code_parsers/py/context_processing/base_context_component.rst b/docs/experimental/code_parsers/py/context_processing/base_context_component.rst new file mode 100644 index 000000000..a14ad4d6a --- /dev/null +++ b/docs/experimental/code_parsers/py/context_processing/base_context_component.rst @@ -0,0 +1,84 @@ +BaseContextComponent +==================== + +``BaseContextComponent`` is an abstract base class that provides a +foundation for creating components that handle symbol context processing +in the Automata software. + +Overview +-------- + +The ``BaseContextComponent`` class offers methods to process entries, +manage indentation levels, and generate data specific to the inheriting +subclasses. This class uses configurations such as ``spacer`` and +``indent_level`` for manipulating and formatting the context. Inheriting +classes, however, need to provide the implementation of the ``generate`` +method, as it’s abstract in the base class. + +Related Symbols +--------------- + +- ``automata.code_parsers.directory.File.__init__`` +- ``automata.core.ast_handlers.BoundingBox`` +- ``automata.symbol.symbol_base.SymbolPackage`` +- ``automata.core.ast_handlers.LineItem`` +- ``automata.llm.providers.openai_llm.OpenAIConversation.__init__`` +- ``automata.symbol.graph.symbol_caller_callees.CallerCalleeProcessor.__init__`` +- ``automata.symbol.graph.symbol_graph_base.GraphProcessor`` +- ``automata.symbol.graph.symbol_relationships.RelationshipProcessor`` +- ``automata.experimental.search.symbol_rank.SymbolRank.__init__`` +- ``automata.llm.providers.openai_llm.OpenAIFunction.__init__`` + +Usage Example +------------- + +Considering ``BaseContextComponent`` is an abstract base class, direct +instantiation is not possible. However, you can extend this class to +create a new component for manipulating symbol context. 
Here is an +example of a subclass: + +.. code:: python + + from typing import Any + from ast import AST + + from automata.experimental.code_parsers.py.context_processing.context_retriever import BaseContextComponent + + + class MyContextComponent(BaseContextComponent): + + def generate(self, symbol: 'Symbol', ast_object: AST, **kwargs: Any) -> str: + # Provide an implementation for the abstract generate method + return f"Symbol: {symbol}, AST Object: {ast_object}" + + # Instantiate and use the subclass + + component = MyContextComponent(spacer='--', indent_level=2) + processed_message = component.process_entry("Hello\nWorld") + print(processed_message) + + # Providing a Symbol and AST object to the generate method might need additional setup + +Please replace ``'Symbol'`` and ``AST`` with proper values fitting your +use case. + +Limitations +----------- + +While ``BaseContextComponent`` provides a flexible structure for context +processing, it has its restrictions. The class is abstract; thus, direct +instantiation isn’t possible. A concrete subclass providing +implementation for the abstract ``generate`` method is required. Also, +only basic context operations are supported: handling strings and +indentations. For more specific operations, further methods need to be +implemented in subclasses. + +Follow-up Questions: +-------------------- + +- How are subclasses of ``BaseContextComponent`` utilized within the + overall architecture of the Automata software and what’s their + interaction with other software components? +- Could there be a standard method for processing symbols and AST + objects within the ``BaseContextComponent`` class instead of the + abstract ``generate`` method? 
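To make the ``spacer``/``indent_level`` configuration described above more concrete, here is a minimal, self-contained sketch of how an entry-processing method like ``process_entry`` might prefix each line of a message. The class name and the prefixing logic are illustrative assumptions, not the actual Automata implementation:

```python
class SketchContextComponent:
    """Hypothetical stand-in illustrating spacer/indent_level handling."""

    def __init__(self, spacer: str = "  ", indent_level: int = 0) -> None:
        self.spacer = spacer
        self.indent_level = indent_level

    def process_entry(self, message: str) -> str:
        # Prefix every line of the entry with the spacer repeated per indent level.
        prefix = self.spacer * self.indent_level
        return "\n".join(prefix + line for line in message.splitlines())


component = SketchContextComponent(spacer="--", indent_level=2)
print(component.process_entry("Hello\nWorld"))  # prints '----Hello' then '----World'
```

Under this sketch, each line of the entry is indented by ``spacer * indent_level``, which matches the formatting role the base class description assigns to those two settings.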
diff --git a/docs/experimental/code_parsers/py/context_processing/context_component.rst b/docs/experimental/code_parsers/py/context_processing/context_component.rst
new file mode 100644
index 000000000..b9c5b862d
--- /dev/null
+++ b/docs/experimental/code_parsers/py/context_processing/context_component.rst
@@ -0,0 +1,7 @@
+class ContextComponent(Enum):
+
+::
+
+   HEADLINE = 'headline'
+   SOURCE_CODE = 'source_code'
+   INTERFACE = 'interface'
diff --git a/docs/experimental/code_parsers/py/context_processing/headline_context_component.rst b/docs/experimental/code_parsers/py/context_processing/headline_context_component.rst
new file mode 100644
index 000000000..fdc5a776e
--- /dev/null
+++ b/docs/experimental/code_parsers/py/context_processing/headline_context_component.rst
@@ -0,0 +1,7 @@
+class HeadlineContextComponent(BaseContextComponent):
+
+::
+
+   def generate(self, symbol: 'Symbol', ast_object: AST, *args, **kwargs) -> str:
+       'Convert a symbol into a headline.'
+       return self.process_entry(symbol.dotpath)
diff --git a/docs/experimental/code_parsers/py/context_processing/index.rst b/docs/experimental/code_parsers/py/context_processing/index.rst
new file mode 100644
index 000000000..00197cb98
--- /dev/null
+++ b/docs/experimental/code_parsers/py/context_processing/index.rst
@@ -0,0 +1,29 @@
+context_processing
+==================
+
+**Automata** is a Python library for autonomous providers.
+
+Check out the :doc:`usage` section for further information, including
+how to :ref:`installation` the project.
+
+
+
+.. AUTO-GENERATED CONTENT START
+..
+
+   .. toctree::
+      :maxdepth: 1
+
+      base_context_component
+      context_component
+      headline_context_component
+      interface_context_component
+      py_context_handler
+      py_context_handler_config
+      source_code_context_component
+
+.. AUTO-GENERATED CONTENT END
+..
+
+
+
diff --git a/docs/experimental/code_parsers/py/context_processing/interface_context_component.rst b/docs/experimental/code_parsers/py/context_processing/interface_context_component.rst
new file mode 100644
index 000000000..31b3361b7
--- /dev/null
+++ b/docs/experimental/code_parsers/py/context_processing/interface_context_component.rst
@@ -0,0 +1,95 @@
+InterfaceContextComponent
+=========================
+
+Overview
+--------
+
+The ``InterfaceContextComponent`` is a class within the context
+processing portion of the ``automata.experimental.code_parsers.py``
+module. It converts Python abstract syntax trees (``AST``) into an
+interface description that documents the functionality of code.
+
+The ``InterfaceContextComponent`` takes symbols and AST objects and
+processes them to generate comprehensive code-interface information.
+Its main entry point is the ``generate()`` method, which converts a
+symbol into an interface while processing or skipping private methods
+and classes as configured. Another essential method is
+``_process_classes_and_methods``, which walks the AST object and
+processes every class and method it contains.
+
+Additionally, this class guards against potential recursion errors
+through an adjustable maximum recursion depth. Private classes or
+methods can be included or excluded as required, and the documentation
+style, such as the headers used for interfaces and classes, can be
+customized.
+
+Related Symbols
+---------------
+
+- ``automata.cli.commands.install_indexing``
+- ``automata.cli.commands.run_doc_post_process``
+- ``automata.agent.openai_agent.OpenAIAutomataAgent.conversation``
+- ``automata.symbol.symbol_base.Symbol.__str__``
+- ``automata.llm.providers.openai_llm.OpenAIChatCompletionResult.__init__``
+- ``automata.llm.llm_base.LLMConversation.LLMEmptyConversationError``
+- ``automata.llm.providers.openai_llm.OpenAIChatCompletionProvider.standalone_call``
+- ``automata.symbol.symbol_base.SymbolDescriptor.__repr__``
+- ``automata.core.base.patterns.singleton.Singleton.__call__``
+- ``automata.tasks.task_environment.AutomataTaskEnvironment.reset``
+
+Example Usage
+-------------
+
+Here is an example of how to use the ``InterfaceContextComponent`` to
+generate an interface for a symbol and AST object:
+
+.. code:: python
+
+   from automata.experimental.code_parsers.py.context_processing.context_retriever import InterfaceContextComponent
+   import ast
+
+   # create InterfaceContextComponent object
+   context_gen = InterfaceContextComponent()
+
+   # supply a python file from which to extract the ast
+   with open('test.py', "r") as source:
+       tree = ast.parse(source.read())
+
+   # Generate the interface (None is passed in place of a Symbol for illustration)
+   interface = context_gen.generate(None, tree)
+
+   # The interface string now contains a documented overview of 'test.py'
+   print(interface)
+
+Note: The actual usage of this class might be more complex, given that
+it is generally combined with the use of ``Symbols`` and the various
+intricacies associated with ``AST`` objects.
+
+Limitations and Unknowns
+------------------------
+
+There is a maximum recursion depth (default of 2) beyond which the
+``InterfaceContextComponent`` will not continue iterating into nested
+classes or methods. This limitation can potentially restrict the
+coverage of a large or deeply nested codebase.
It is also
+worth noting that exceptions raised when individual methods fail to
+process during interface generation must be handled appropriately.
+
+Furthermore, the exact role and utility of this class becomes clearer
+when seen in the broader context of the library it resides in.
+
+Follow-up Questions:
+--------------------
+
+- How does this class behave with extremely complex and nested ``AST``
+  structures?
+- Could there be other ways of implementing the parsing of ``AST``
+  objects which might reduce the need for recursion depth limits and
+  the complexity of the code?
+- What are the potential use-cases for this class in practical software
+  development or documentation workflows?
+- Can it handle all different Python objects (e.g., decorated methods,
+  static methods, class methods, properties, etc.)? Or does it have any
+  specific restrictions?
diff --git a/docs/experimental/code_parsers/py/context_processing/py_context_handler.rst b/docs/experimental/code_parsers/py/context_processing/py_context_handler.rst
new file mode 100644
index 000000000..c7dd2d6d8
--- /dev/null
+++ b/docs/experimental/code_parsers/py/context_processing/py_context_handler.rst
@@ -0,0 +1,74 @@
+PyContextHandler
+================
+
+``PyContextHandler`` is the class in the ``automata`` framework
+responsible for handling the context linked to a symbol. It derives
+context from a given symbol and its relevant constituents in the code
+base.
+
+Overview
+--------
+
+The ``PyContextHandler`` class works with a ``PyContextHandlerConfig``
+configuration object and a ``PyContextRetriever``, and performs a
+``SymbolSearch`` within the Python codebase. Beyond processing the
+primary symbol and its related components, the handler also collects
+context for supplementary symbols through rank matches, dependencies,
+and associated tests.
+ +Related Symbols +--------------- + +- ``automata.experimental.code_parsers.py.context_processing.context_handler.PyContextHandlerConfig`` +- ``automata.experimental.code_parsers.py.context_processing.context_retriever.PyContextRetriever`` +- ``automata.symbol.search.SymbolSearch`` + +Usage Example +------------- + +Here’s an example of how to utilize the ``PyContextHandler`` with a mock +instance of a ‘Symbol’, showing how to construct a symbol’s context: + +.. code:: python + + from automata.experimental.code_parsers.py.context_processing.context_handler import PyContextHandler + from automata.experimental.code_parsers.py.context_processing.context_handler import PyContextHandlerConfig + from automata.experimental.code_parsers.py.context_processing.context_retriever import PyContextRetriever + from automata.symbol.search import SymbolSearch + from automata.data_structures.symbol import Symbol + + symbol_search = SymbolSearch() + context_retriever = PyContextRetriever() + config = PyContextHandlerConfig() + symbol = Symbol() + + # Instantiate PyContextHandler + context_handler = PyContextHandler(config, context_retriever, symbol_search) + + # Construct symbol context + symbol_context = context_handler.construct_symbol_context(symbol) + +Note: In this example ‘Symbol’ is only a placeholder. You should replace +‘Symbol’ instances with an actual instance of the ‘Symbol’ class or a +mock ‘Symbol’ object if you are using this in a testing suite. + +Limitations +----------- + +- One of the limitations of the ``PyContextHandler`` is that it does + not currently sort symbols that a given symbol depends on by any + specific criteria such as rank or similarity match. +- Current implementation only includes symbols from tests if they have + ‘automata.test’ in their path. Therefore, test coverage from other + locations would be missing. 
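The second limitation above, that test symbols are recognized only by the literal substring ``'automata.test'`` in their path, can be pictured with a small sketch. The helper name and the example dotpaths below are hypothetical illustrations, not Automata code:

```python
def select_test_symbols(dotpaths):
    """Keep only symbols whose dotpath contains 'automata.test' (sketch of the rule above)."""
    return [path for path in dotpaths if "automata.test" in path]


paths = [
    "automata.tests.unit.sample_modules.my_project.core.calculator",
    "automata.symbol.symbol_base.Symbol",
    "automata.tests.utils.factories",
]
print(select_test_symbols(paths))
```

A simple substring filter like this is cheap, but it explains the missing-coverage limitation: tests living outside any path containing ``automata.test`` are silently excluded.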
+
+Follow-up Questions:
+--------------------
+
+- Could there be more flexibility in terms of how the
+  ``PyContextHandler`` selects the secondary symbols and dependencies?
+- How can we incorporate a methodology to rank or sort the dependent
+  symbols?
+- What alternatives can be considered to make the test retrieval
+  process more inclusive?
diff --git a/docs/experimental/code_parsers/py/context_processing/py_context_handler_config.rst b/docs/experimental/code_parsers/py/context_processing/py_context_handler_config.rst
new file mode 100644
index 000000000..1cb501a77
--- /dev/null
+++ b/docs/experimental/code_parsers/py/context_processing/py_context_handler_config.rst
@@ -0,0 +1,10 @@
+class PyContextHandlerConfig():
+
+::
+
+   'The configuration for the PyContextHandlerConfig'
+
+   def __init__(self, top_n_test_matches: int=10, top_n_symbol_rank_matches: int=10, top_n_dependency_matches: int=20) -> None:
+       self.top_n_test_matches = top_n_test_matches
+       self.top_n_symbol_rank_matches = top_n_symbol_rank_matches
+       self.top_n_dependency_matches = top_n_dependency_matches
diff --git a/docs/experimental/code_parsers/py/context_processing/source_code_context_component.rst b/docs/experimental/code_parsers/py/context_processing/source_code_context_component.rst
new file mode 100644
index 000000000..b9c9564aa
--- /dev/null
+++ b/docs/experimental/code_parsers/py/context_processing/source_code_context_component.rst
@@ -0,0 +1,73 @@
+SourceCodeContextComponent
+==========================
+
+Overview
+--------
+
+The ``SourceCodeContextComponent`` class is part of the Automata
+library’s code parsers. Its primary function is to convert a symbol
+into its source code representation. The ``generate`` method in this
+class takes a symbol and an AST (Abstract Syntax Tree) object, along
+with optional parameters that control the inclusion of imports and
+docstrings. It then retrieves and returns the source code corresponding
+to the given
+symbol. 
Other parameters that can be altered include the maximum length
+of the source code to be returned.
+
+Related Symbols
+---------------
+
+- ``automata.experimental.memory_store.symbol_doc_embedding_handler.SymbolDocEmbeddingHandler._update_existing_embedding``
+- ``automata.core.ast_handlers.get_node_without_imports``
+- ``automata.symbol.symbol_base.Symbol.is_local``
+- ``automata.experimental.symbol_embedding.symbol_doc_embedding_builder.SymbolDocEmbeddingBuilder._build_class_document``
+- ``automata.memory_store.symbol_code_embedding_handler.SymbolCodeEmbeddingHandler._queue_for_building``
+- ``automata.symbol.graph.symbol_references.ReferenceProcessor.process``
+- ``automata.symbol.symbol_base.Symbol.is_meta``
+- ``automata.symbol.symbol_base.Symbol.is_parameter``
+- ``automata.symbol.graph.symbol_navigator.SymbolGraphNavigator._get_symbol_containing_file``
+- ``automata.experimental.scripts.run_update_tool_eval.call_completion_provider``
+
+Example
+-------
+
+The following example demonstrates how to use
+``SourceCodeContextComponent`` to retrieve the source code for a symbol.
+
+.. code:: python
+
+   from automata.experimental.code_parsers.py.context_processing.context_retriever import SourceCodeContextComponent
+   from ast import parse
+
+   # Assume that 'symbol' is a predefined instance of class 'Symbol'
+   source_code_context = SourceCodeContextComponent()
+
+   # Parse sample source code into an AST object
+   sample_code = "def square(number): return number ** 2"
+   ast_object = parse(sample_code)
+
+   # Generate the source code
+   source = source_code_context.generate(symbol, ast_object)
+
+   print(source)  # Prints: "def square(number): return number ** 2"
+
+Limitations
+-----------
+
+Please note that ``SourceCodeContextComponent`` relies on Abstract
+Syntax Trees (ASTs), which impose some limitations:
+
+- Dealing with deeply nested structures can be complex. 
+- The ``include_imports`` and ``include_docstrings`` parameters only + work on AST nodes that support imports and docstrings respectively. +- Source code conversion is limited to Python language constructs that + can be represented as an AST. + +Follow-up Questions: +-------------------- + +- How does ``SourceCodeContextComponent`` deal with objects other than + Python code symbols? +- How is the maximum length of the generated source code determined and + can it be customizable? +- How does ``SourceCodeContextComponent`` interact with other + components in the code parsing and symbol embedding process? diff --git a/docs/experimental/code_parsers/py/index.rst b/docs/experimental/code_parsers/py/index.rst new file mode 100644 index 000000000..5ed7322c5 --- /dev/null +++ b/docs/experimental/code_parsers/py/index.rst @@ -0,0 +1,23 @@ +py +== + +**Automata** is a Python library for autonomous providers. + +Check out the :doc:`usage` section for further information, including +how to :ref:`installation` the project. + + + +.. AUTO-GENERATED CONTENT START +.. + + .. toctree:: + :maxdepth: 1 + + context_processing/index + +.. AUTO-GENERATED CONTENT END +.. + + + diff --git a/docs/experimental/index.rst b/docs/experimental/index.rst index 33e685c48..2fa404fdb 100644 --- a/docs/experimental/index.rst +++ b/docs/experimental/index.rst @@ -17,13 +17,18 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. .. toctree:: :maxdepth: 1 + code_parsers/index + memory_store/index search/index + symbol_embedding/index tools/index .. AUTO-GENERATED CONTENT END diff --git a/docs/experimental/memory_store/index.rst b/docs/experimental/memory_store/index.rst new file mode 100644 index 000000000..03cf48bee --- /dev/null +++ b/docs/experimental/memory_store/index.rst @@ -0,0 +1,23 @@ +memory_store +============ + +**Automata** is a Python library for autonomous providers. 
+
+Check out the :doc:`usage` section for further information, including
+how to :ref:`installation` the project.
+
+
+
+.. AUTO-GENERATED CONTENT START
+..
+
+   .. toctree::
+      :maxdepth: 1
+
+      symbol_doc_embedding_handler
+
+.. AUTO-GENERATED CONTENT END
+..
+
+
+
diff --git a/docs/experimental/memory_store/symbol_doc_embedding_handler.rst b/docs/experimental/memory_store/symbol_doc_embedding_handler.rst
new file mode 100644
index 000000000..60168afc9
--- /dev/null
+++ b/docs/experimental/memory_store/symbol_doc_embedding_handler.rst
@@ -0,0 +1,71 @@
+SymbolDocEmbeddingHandler
+=========================
+
+``SymbolDocEmbeddingHandler`` is a class that handles the embedding of
+symbols. It inherits from ``SymbolEmbeddingHandler`` and adds the
+functionality to handle embeddings for the documentation associated
+with Python symbols, i.e., classes or functions.
+
+Overview
+--------
+
+The ``SymbolDocEmbeddingHandler`` processes the embedding for symbols:
+it fetches the embedding source code for a symbol and supports either
+``update_existing_embedding`` or ``_create_new_embedding``. It also has
+an ``overwrite`` attribute which dictates whether an existing embedding
+should be overwritten with a new
+one. 
+
+Related Symbols
+---------------
+
+- ``automata.experimental.symbol_embedding.symbol_doc_embedding_builder.SymbolDocEmbeddingBuilder``
+- ``automata.symbol_embedding.symbol_embedding_handler.SymbolEmbeddingHandler``
+- ``automata.singletons.dependency_factory.DependencyFactory.create_symbol_doc_embedding_handler``
+- ``automata.experimental.tools.builders.document_oracle_builder.DocumentOracleToolkitBuilder``
+- ``automata.experimental.search.symbol_search.SymbolSearch``
+- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase``
+- ``automata.embedding.embedding_base.EmbeddingHandler``
+- ``automata.cli.scripts.run_doc_embedding.initialize_providers``
+- ``automata.symbol_embedding.symbol_embedding_base.SymbolEmbedding``
+- ``automata.embedding.embedding_base.EmbeddingVectorProvider``
+
+Example
+-------
+
+Here is an example usage of the ``SymbolDocEmbeddingHandler``.
+
+.. code:: python
+
+   from automata.experimental.memory_store.symbol_doc_embedding_handler import SymbolDocEmbeddingHandler
+   from automata.experimental.symbol_embedding.symbol_doc_embedding_builder import SymbolDocEmbeddingBuilder
+   from automata.symbol_embedding.vector_databases import ChromaSymbolEmbeddingVectorDatabase
+   from automata.embedding.embedding_base import EmbeddingVectorProvider
+
+   # Assume 'symbol' is an instance of Symbol for a class or function
+   # Assume 'source_code' is a string containing Python code
+
+   # Create instance of SymbolDocEmbeddingHandler
+   embedding_db = ChromaSymbolEmbeddingVectorDatabase('PYTHON_CODE')
+   # A concrete EmbeddingVectorProvider implementation is assumed here
+   embedding_provider = EmbeddingVectorProvider()
+   embedding_builder = SymbolDocEmbeddingBuilder(embedding_provider)
+   sde_handler = SymbolDocEmbeddingHandler(embedding_db, embedding_builder, batch_size=1)
+
+   # Process embedding for symbol
+   sde_handler.process_embedding(symbol)
+
+Limitations
+-----------
+
+The ``SymbolDocEmbeddingHandler`` currently only supports a
+``batch_size`` of 1, meaning it processes one symbol at a time. If a
+different ``batch_size`` is used, it raises a ``ValueError``. 
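The batch-size restriction described above can be pictured with a small sketch. ``SingleBatchHandler`` is a hypothetical stand-in for the guard in the constructor, not Automata's actual class:

```python
class SingleBatchHandler:
    """Hypothetical sketch of a handler that only accepts batch_size=1."""

    def __init__(self, batch_size: int = 1) -> None:
        # Mirror the documented limitation: any other batch size is rejected.
        if batch_size != 1:
            raise ValueError("SingleBatchHandler only supports batch_size=1")
        self.batch_size = batch_size


handler = SingleBatchHandler()  # accepted
try:
    SingleBatchHandler(batch_size=4)  # rejected with ValueError
except ValueError as exc:
    print(exc)
```

Validating such invariants in the constructor means a misconfigured handler fails immediately at construction time rather than midway through an embedding run.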
+ +Follow-up Questions: +-------------------- + +- What is the reason for the ``batch_size`` being 1, and are there any + plans to add support for larger batch sizes? +- What strategies can be used for addressing cases where the symbol to + be processed does not have source code? Currently, a ``ValueError`` + is raised, but would there be value in handling these cases + differently, like default behavior or user-defined action? diff --git a/docs/experimental/search/index.rst b/docs/experimental/search/index.rst index 4beed1916..3763612bc 100644 --- a/docs/experimental/search/index.rst +++ b/docs/experimental/search/index.rst @@ -19,6 +19,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/experimental/search/symbol_rank.rst b/docs/experimental/search/symbol_rank.rst index 7b222c25e..9724d04c6 100644 --- a/docs/experimental/search/symbol_rank.rst +++ b/docs/experimental/search/symbol_rank.rst @@ -1,99 +1,86 @@ SymbolRank ========== -SymbolRank class applies the PageRank algorithm on a graph to rank -symbols such as methods and classes based on their semantic context and -structural relationships within a software. - -Symbols are the classes, methods or other elements in a code corpus. A -SymbolGraph is constructed where each symbol forms a node and -dependencies between symbols form edges. This SymbolGraph maps -structural information from the codebase and helps explore symbol -dependencies, relationships and hierarchy. - -Finally, a prepared similarity dictionary between symbols is used in -combination with the SymbolGraph to compute their SymbolRanks. This is -performed using an iterative computation analogous to Google’s PageRank -algorithm, considering symbols’ similarity scores and their connectivity -within the graph. - -For this a SymbolRankConfig is required which provides the necessary -parameters for the computations. 
- -Methods -------- - -``__init__(self, graph: nx.DiGraph, config: SymbolRankConfig) -> None:`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Initializes a SymbolRank instance with a given graph and a -SymbolRankConfig. If config is not provided, a default SymbolRankConfig -is initialized. +``SymbolRank`` is a class that implements a semantic code analyzer for +software corpora. Using techniques from language models and graph +theory, it assigns a rank to symbols such as classes and methods based +on their semantic context and structural relationships within the +software. This class is an implementation of the PageRank algorithm that +works on symbols in a graph. + +The primary method ``get_ordered_ranks`` executes an iterative +computation similar to Google’s PageRank, but considers both the +symbols’ similarity scores to the query and their connectivity within +the graph. The result is a ranking of code symbols that aids tasks like +code understanding, navigation, recommendation, and search. + +Overview +-------- -``get_ranks(self,query_to_symbol_similarity: Optional[Dict[Symbol, float]] = None,initial_weights: Optional[Dict[Symbol, float]] = None,dangling: Optional[Dict[Symbol, float]] = None,) -> List[Tuple[Symbol, float]]:`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The ``SymbolRank`` class is initialized with a directed graph and a +configuration that’s been validated. It calculates the SymbolRanks of +each node in the graph and allows retrieval of the top N symbols +according to their ranks. It also has methods to prepare the graph for +the SymbolRank algorithm, prepare initial rank values, prepare the +similarity input dictionary, prepare the dangling node weights, and get +the dangling nodes in the graph. 
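To make the iterative computation concrete, here is a self-contained sketch of a personalized PageRank-style update of the kind the Overview describes, blending a query-to-symbol similarity distribution with graph connectivity. The function name, damping convention, and data shapes are illustrative assumptions, not Automata's actual implementation:

```python
from collections import defaultdict


def rank_symbols(edges, similarity, alpha=0.25, max_iterations=100, tolerance=1e-6):
    """Personalized PageRank sketch: blend graph connectivity with query similarity."""
    out = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        out[src].append(dst)
        nodes.update((src, dst))
    nodes = sorted(nodes)

    # Normalize similarity scores into a personalization distribution.
    total = sum(similarity.get(v, 0.0) for v in nodes)
    pers = {v: similarity.get(v, 0.0) / total if total else 1.0 / len(nodes) for v in nodes}

    rank = dict(pers)
    for _ in range(max_iterations):
        # Mass on dangling nodes is redistributed via the personalization vector.
        dangling_mass = sum(rank[v] for v in nodes if not out[v])
        new_rank = {}
        for v in nodes:
            inflow = sum(rank[u] / len(out[u]) for u in nodes if v in out[u])
            new_rank[v] = (1 - alpha) * pers[v] + alpha * (inflow + dangling_mass * pers[v])
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tolerance * len(nodes)
        rank = new_rank
        if converged:
            break
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)


ranks = rank_symbols([("a", "b"), ("b", "c"), ("c", "a")], {"a": 1.0, "b": 1.0, "c": 1.0})
```

With uniform similarity on a symmetric cycle the ranks converge to equal values, while a skewed similarity vector pulls rank mass toward symbols related to the query, which is the behavior the class description attributes to its similarity-aware ranking.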
-Calculate the SymbolRanks of each node in the graph. +Related Symbols +--------------- -``get_top_symbols(self, n: int) -> List[Tuple[str, float]]:`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Some related symbols include: -Get the top ‘n’ symbols as per their ranks. Returns a list of tuples, -where each tuple contains the dotpath of a symbol and its rank. +- ``automata.experimental.search.symbol_search.SymbolSearch.symbol_rank`` +- ``automata.singletons.dependency_factory.DependencyFactory.create_symbol_rank`` +- ``automata.symbol.graph.symbol_graph.SymbolGraph.default_rankable_subgraph`` +- ``automata.symbol.graph.symbol_graph.SymbolGraph._build_rankable_subgraph`` +- ``automata.symbol.graph.symbol_references.ReferenceProcessor._process_symbol_roles`` +- ``automata.symbol.graph.symbol_navigator.SymbolGraphNavigator.get_symbol_relationships`` +- ``automata.symbol.graph.symbol_navigator.SymbolGraphNavigator.get_sorted_supported_symbols`` +- ``automata.symbol.graph.symbol_graph.SymbolGraph.get_symbol_relationships`` +- ``automata.symbol.symbol_parser._SymbolParser.__init__`` +- ``automata.symbol_embedding.vector_databases.JSONSymbolEmbeddingVectorDatabase.get_ordered_keys`` -Examples --------- +Usage Example +------------- .. code:: python - from automata.symbol.base import Symbol - from automata.experimental.search.rank import SymbolRank, SymbolRankConfig - import networkx as nx + # Note: This is an illustrative example. + # nx, Symbol, SymbolRankConfig are placeholders and should be replaced with actual imports. 
+    import networkx as nx
+    from your_module import Symbol, SymbolRankConfig, SymbolRank
 
-    # create a graph
-    G = nx.DiGraph()
-    G.add_edge(1, 2)
-    G.add_edge(2, 3)
-    G.add_edge(3, 1)
-
-    # initialize SymbolRankConfig and SymbolRank
+    # Assuming we have a directed graph 'graph' and a configuration 'config'
+    graph = nx.DiGraph()
     config = SymbolRankConfig()
-    sr = SymbolRank(G, config)
-    # retrieve SymbolRanks
-    ranks = sr.get_ranks()
+    symbol_rank = SymbolRank(graph, config)
 
+    query_to_symbol_similarity = None
+    initial_weights = None
+    dangling = None
+
+    ordered_ranks = symbol_rank.get_ordered_ranks(query_to_symbol_similarity, initial_weights, dangling)
 
-Related Modules
----------------
+    # Get top 10 symbols
+    top_symbols = symbol_rank.get_top_symbols(10)
 
-- automata.symbol.base.Symbol
-- automata.experimental.search.symbol_search.SymbolSearch
-- automata.tools.builders.symbol_search.SymbolSearchToolkitBuilder
 
 Limitations
 -----------
 
-- The SymbolRank class assumes that every node in the graph is a symbol
-  from an application’s corpus. Therefore, the graph should be prepared
-  accordingly.
-- SymbolRank uses an algorithm similar to the PageRank algorithm which
-  is iterative in nature. Hence, it may take significant time for large
-  graphs.
-- As the ranks depend on both the graph structure and symbol
-  similarity, inaccurate results can be returned when the graph is not
-  properly constructed or appropriate symbol similarity is not used.
+The ``SymbolRank`` algorithm assumes that every node in the graph is a
+symbol to be understood analytically. Misinterpreted or improperly
+parsed symbols can lead to inaccurate results. Moreover, it applies the
+same relevance weight to all types of symbol relationships, potentially
+oversimplifying complex dependency structures.
 
 Follow-up Questions:
 --------------------
 
-- What is default value of ``SymbolRankConfig`` if not provided while
-  initializing ``SymbolRank``?
-- Are there any specific assumptions or requirements for the format or - structure of ``Query_symbol_similarity`` , ``initial_weights`` , - ``dangling``? -- What is the depth up to which symbol dependencies are considered - while constructing the SymbolGraph? -- How are the weights of the edges in the SymbolGraph determined? -- How is the similarity between symbols computed? -- What happens if the ``get_ranks`` method does not converge in - ``max_iterations``? What approaches can be used to mitigate this? +- How can we modify ``SymbolRank`` to distinguish between different + types of symbol relationships? +- How does ``SymbolRank`` handle cases where some symbols are more + critical to the software’s functionality than others? +- How robustly does ``SymbolRank`` recover in scenarios where there are + parsing errors or misinterpreted symbols? diff --git a/docs/experimental/search/symbol_rank_config.rst b/docs/experimental/search/symbol_rank_config.rst index 0487c15d6..0dbd7ac31 100644 --- a/docs/experimental/search/symbol_rank_config.rst +++ b/docs/experimental/search/symbol_rank_config.rst @@ -1,69 +1,84 @@ SymbolRankConfig ================ -``SymbolRankConfig`` is a configuration class for the SymbolRank object. -It is derived from the BaseModel class and is used to set up -configurations such as alpha, max_iterations, tolerance, and weight_key -for SymbolRank. +``SymbolRankConfig`` is a configuration class meant for use with the +``SymbolRank`` module. Its purpose is to configure and manage various +aspects of the SymbolRank algorithm such as alpha (damping factor), +maximum iterations, tolerance, and weight key. This config allows the +users to manipulate the preprocessing parameters of the SymbolRank +algorithm. Overview -------- -``SymbolRankConfig`` allows for the setup of various parameters: +The ``SymbolRankConfig`` class provides a way to specify and validate +configuration options for the SymbolRank algorithm. 
The parameters for
+the algorithm include:

-- alpha: It affects the damping factor used in the calculation of the
-  SymbolRank. Default value is 0.25.
-- max_iterations: Sets the maximum number of iterations for the
-  SymbolRank calculation. Default value is 100.
-- tolerance: Specifies the tolerance for error in the SymbolRank
-  calculations. The default is 1.0e-6.
-- weight_key: Specifies the key for accessing edge weights. The default
-  is “weight”.
+- ``alpha``: This is the damping factor used in the algorithm, a float
+  in (0, 1). This influences how the algorithm balances between more
+  specific and more general symbols when forming a ranked list of
+  symbols. The default value for alpha is 0.25.

-An instance of ``SymbolRankConfig`` then validates these values to
-ensure that they are within certain bounds. If they fall outside these
-bounds, it raises a ValueError.
+- ``max_iterations``: This is the maximum number of iterations for the
+  algorithm to perform, an integer. The default value for
+  max_iterations is 100.
+
+- ``tolerance``: This is the tolerance for the calculation, a float in
+  (1e-8, 1e-4). When the difference between iteratively calculated
+  values falls below this threshold, the calculation is stopped. The
+  default value for tolerance is 1e-06.
+
+- ``weight_key``: This is the key used to retrieve weights from a
+  graph, a string. The default value for weight_key is ‘weight’.
+
+The ``validate_config`` function ensures the correctness of the
+specified configuration parameters, raising a ``ValueError`` where the
+parameters fall outside of their respective valid ranges.
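To make these parameters concrete, here is a small stand-alone sketch: a mock config that enforces the documented ranges, plus a generic damped PageRank-style loop showing how ``alpha``, ``max_iterations``, and ``tolerance`` steer the computation. Every name here is illustrative; this is not the actual automata implementation.

```python
from dataclasses import dataclass

@dataclass
class MockRankConfig:
    """Illustrative stand-in for SymbolRankConfig; not the automata API."""
    alpha: float = 0.25
    max_iterations: int = 100
    tolerance: float = 1e-6
    weight_key: str = "weight"

    def validate_config(self) -> None:
        # alpha is a damping factor, so it must lie strictly inside (0, 1)
        if not 0.0 < self.alpha < 1.0:
            raise ValueError(f"alpha must be in (0, 1), got {self.alpha}")
        # tolerance must fall between 1e-8 and 1e-4
        if not 1e-8 <= self.tolerance <= 1e-4:
            raise ValueError(f"tolerance must be in [1e-8, 1e-4], got {self.tolerance}")

def damped_rank(edges, config):
    """Generic damped PageRank-style iteration over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [v for (u, v) in edges if u == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(config.max_iterations):
        new = {}
        for n in nodes:
            # mass flowing in along edges, blended with a uniform teleport term
            incoming = sum(rank[u] / len(out[u]) for u in nodes if n in out[u])
            new[n] = (1 - config.alpha) / len(nodes) + config.alpha * incoming
        # stop once the per-iteration change drops below the tolerance
        if sum(abs(new[n] - rank[n]) for n in nodes) < config.tolerance:
            return new
        rank = new
    return rank

config = MockRankConfig()
config.validate_config()  # defaults pass validation
ranks = damped_rank([("a", "b"), ("b", "c"), ("c", "a")], config)
```

On the symmetric three-node cycle every node ends up with rank 1/3, and an out-of-range ``alpha`` (for example 1.5) makes ``validate_config`` raise ``ValueError``.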
Related Symbols --------------- -- ``automata.experimental.search.rank.SymbolRank`` -- ``automata.tests.unit.test_symbol_rank.test_get_ranks`` -- ``automata.tests.unit.test_symbol_rank.test_get_ranks_small_graph`` -- ``automata.experimental.search.symbol_search.SymbolSearch.symbol_rank`` -- ``automata.tests.unit.test_symbol_rank.test_prepare_initial_ranks`` -- ``automata.singletons.dependency_factory.DependencyFactory.create_symbol_rank`` -- ``automata.tests.unit.test_symbol_search_tool.test_symbol_rank_search`` -- ``automata.tests.unit.test_symbol_rank.test_pagerank_config_validation`` -- ``automata.singletons.dependency_factory.DependencyFactory.create_symbol_search`` -- ``automata.tests.regression.test_symbol_searcher_regression.test_symbol_rank_search_on_symbol`` +The following interfaces and procedures are related: -Example -------- +- ``automata.cli.cli_utils.get_custom_style`` +- ``automata.symbol_embedding.vector_databases.JSONSymbolEmbeddingVectorDatabase.get_all_ordered_embeddings`` +- ``automata.singletons.dependency_factory.DependencyFactory.create_symbol_graph`` +- ``automata.symbol_embedding.symbol_embedding_handler.SymbolEmbeddingHandler._get_sorted_supported_symbols`` +- ``automata.cli.scripts.run_agent_config_validation.yaml_schema`` +- ``automata.symbol.symbol_base.Symbol.__repr__`` +- ``automata.symbol.symbol_base.SymbolDescriptor.__init__`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase._sort_entries`` +- ``automata.cli.env_operations.select_graph_type`` +- ``automata.llm.llm_base.LLMChatMessage.to_dict`` -Below is a simple example on instantiation and validation of -SymbolRankConfig. +Usage Example +------------- .. 
code:: python - from automata.experimental.search.rank import SymbolRankConfig - config = SymbolRankConfig(alpha=0.5, max_iterations=100, tolerance=1.0e-6) + from automata.experimental.search.symbol_rank import SymbolRankConfig + config = SymbolRankConfig(alpha=0.3, max_iterations=200, tolerance=1e-06, weight_key='weight') config.validate_config(config) Limitations ----------- -``SymbolRankConfig`` is currently constrained to validate only alpha and -tolerance parameters. However, validation for other parameters such as -max_iterations and weight_key can also be crucial depending upon the -nature of graph and its edges. +``SymbolRankConfig`` mainly validates and ensures that the parameters +are in their respective valid ranges. However, it does not verify if +these parameters are suitable for the specific data or context in which +the SymbolRank algorithm is applied. It’s the user’s responsibility to +ensure that these parameters help the SymbolRank algorithm yield +meaningful and accurate results for their particular application. + +``SymbolRankConfig`` does not support dynamic reconfiguration. All +parameters must be correctly defined when an instance of this +configuration is created. Follow-up Questions: -------------------- -- Are there plans to add any further parameters or configurations in - the ``SymbolRankConfig`` class? -- Is there any specific reason to keep the default value of weight_key - as “weight”? -- What kind of use cases are typically supported by the - ``SymbolRankConfig`` class? +- Are there any safeguards to rectify or handle parameters that don’t + yield meaningful results? +- Would there be benefits to allowing dynamic reconfiguration of + parameters? diff --git a/docs/experimental/search/symbol_search.rst b/docs/experimental/search/symbol_search.rst index 4756422e6..e828c5e13 100644 --- a/docs/experimental/search/symbol_search.rst +++ b/docs/experimental/search/symbol_search.rst @@ -1,18 +1,104 @@ -1. 
Yes, SymbolSearch can be utilized in real-time as long as the code
-   being added or edited is properly indexed in the symbol graph. It
-   operates based on the current state of the graph, so any updates will
-   be taken into account during subsequent searches.
-
-2. The handling of renaming or refactoring is mainly dependent on the
-   indexing step. If symbols are renamed or refactored, the index (and
-   by extension, the symbol graph) should be updated to reflect these
-   changes. Once that’s done, SymbolSearch will be able to correctly
-   identify the refactored or renamed symbols.
-
-3. Yes, SymbolSearch is designed to support search methods beyond exact
-   matches. It can also search for symbols semantically, which can find
-   symbols related to a search pattern even if they don’t exactly match.
-   The semantic search is done primarily through the use of embeddings,
-   which capture the semantic relationships between different symbols.
-   This allows the search to find related symbols based on their
-   meanings, not just their names.
+SymbolSearch
+============
+
+Overview
+--------
+
+``SymbolSearch`` is a class that provides various search methods for
+symbols. It initiates a search through embeddings and ranks the results
+by their similarity to the query. It supports operations such as
+retrieving symbol rank results, fetching the source code associated
+with a symbol, and performing exact searches across the indexed
+codebase.
+
+``SymbolSearch`` accepts configuration objects that customize the
+behaviour of the search process. The class covers several search
+functionalities, including calculating similarity between embeddings,
+ranking, and reference finding. It interacts with classes like
+``SymbolGraph``, ``SymbolRankConfig``, and ``EmbeddingHandler`` for
+exhaustive and effective search operations.
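The embedding-based ranking described above — scoring each symbol's embedding against the query embedding and sorting by similarity — can be sketched with plain cosine similarity. The vectors and symbol names below are toy stand-ins; the real class delegates this work to its embedding handler and similarity calculator.

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy symbol embeddings; a real system would obtain these from a provider.
symbol_embeddings = {
    "module.ClassA": [0.9, 0.1, 0.0],
    "module.func_b": [0.2, 0.8, 0.1],
    "module.CONST_C": [0.1, 0.1, 0.9],
}

def rank_by_similarity(query_embedding, embeddings):
    # Score every symbol against the query and return best matches first.
    scores = {sym: cosine(query_embedding, vec) for sym, vec in embeddings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

results = rank_by_similarity([1.0, 0.0, 0.0], symbol_embeddings)
print(results[0][0])  # module.ClassA
```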
+ +Related Symbols +--------------- + +- ``automata.symbol.symbol_base.Symbol.__eq__`` +- ``automata.symbol.symbol_parser._SymbolParser.accept_escaped_identifier`` +- ``automata.symbol.symbol_parser._SymbolParser.peek_next`` +- ``automata.symbol.graph.symbol_navigator.SymbolGraphNavigator.get_symbol_dependencies`` +- ``automata.symbol.graph.symbol_graph.SymbolGraph.get_potential_symbol_callees`` +- ``automata.symbol.symbol_parser._SymbolParser.accept_space_escaped_identifier`` +- ``automata.symbol.symbol_parser._SymbolParser.current`` +- ``automata.symbol.symbol_base.SymbolDescriptor.get_escaped_name`` +- ``automata.symbol.symbol_parser._SymbolParser.accept_character`` +- ``automata.experimental.scripts.run_update_tool_eval.process_missing_symbols`` + +Example +------- + +Here is an example showcasing how to use SymbolSearch class. + +.. code:: python + + from automata.experimental.search.symbol_search import SymbolSearch + from automata.symbol.graph.symbol_graph import SymbolGraph + from automata.symbol.graph.embedding_handler import EmbeddingHandler + from automata.symbol.graph.embedding_similarity_calculator import EmbeddingSimilarityCalculator + from automata.symbol.graph.symbol_rank_config import SymbolRankConfig + + symbol_graph = SymbolGraph() # Assume SymbolGraph is initialized + embedding_handler = EmbeddingHandler() # Assume EmbeddingHandler is initialized + embedding_similarity_calculator = EmbeddingSimilarityCalculator() # Assume EmbeddingSimilarityCalculator is initialized + symbol_rank_config = SymbolRankConfig() # Assume SymbolRankConfig is initialized + + symbol_search = SymbolSearch( + symbol_graph, + symbol_rank_config, + embedding_handler, + embedding_similarity_calculator) + + query = "Insert your query here" + # Get the ranked results based on the symbol + symbol_rank_results = symbol_search.get_symbol_rank_results(query) + + # Get similar symbols based on the query + symbol_similarity_results = 
symbol_search.get_symbol_code_similarity_results(query) + + # Get references to a certain symbol + symbol_references = symbol_search.symbol_references("symbol_uri") + + # Retrieves the raw text of a module, class, method, or standalone function + source_code = symbol_search.retrieve_source_code_by_symbol("symbol_uri") + + # Performs an exact search across the indexed codebase + exact_search_result = symbol_search.exact_search("pattern") + +Please replace ``"Insert your query here"``, ``"symbol_uri"``, and +``"pattern"`` with your desired values. + +Limitations +----------- + +The main limitation of ``SymbolSearch`` lies in its dependency on the +correct initialization and functioning of ``SymbolGraph``, +``EmbeddingHandler``, ``EmbeddingSimilarityCalculator``, and +``SymbolRankConfig`` classes. If these classes are not correctly +initialized or have errors, ``SymbolSearch`` may not be able to function +as expected. + +In addition, the processing of NLP queries presumes a specific query +format with a ‘type:…’ and ‘query…’. Incorrectly formatted queries lead +to ``ValueError``. + +The behaviour of methods like ``symbol_references``, +``retrieve_source_code_by_symbol``, and ``_find_pattern_in_modules`` +relies on the quality of ``symbol_uri``, ``node``, and ``pattern`` given +to them. These methods may not behave as expected if the input values +are not as expected. + +Follow-up Questions: +-------------------- + +- How could we improve the error handling of ``SymbolSearch`` when + dependencies have errors or are not properly initialized? +- How can we optimize the ``SymbolSearch`` class when dealing with + large amounts of data or highly complex symbol relationships? 
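The exact-search operation — finding every occurrence of a pattern across the indexed codebase — can be sketched with a plain scan over module sources. The in-memory "index" below is hypothetical; the real class walks the modules known to its symbol graph.

```python
import re

# Hypothetical in-memory "index" of module sources.
modules = {
    "pkg.core": "def run(task):\n    return process(task)\n",
    "pkg.utils": "def process(task):\n    return task.upper()\n",
}

def exact_search(pattern, modules):
    """Map module name -> 1-based line numbers where the pattern occurs."""
    hits = {}
    for name, source in modules.items():
        lines = [i for i, line in enumerate(source.splitlines(), start=1)
                 if re.search(re.escape(pattern), line)]
        if lines:
            hits[name] = lines
    return hits

print(exact_search("process", modules))
# {'pkg.core': [2], 'pkg.utils': [1]}
```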
diff --git a/docs/experimental/symbol_embedding/index.rst b/docs/experimental/symbol_embedding/index.rst
new file mode 100644
index 000000000..605969e73
--- /dev/null
+++ b/docs/experimental/symbol_embedding/index.rst
@@ -0,0 +1,23 @@
+symbol_embedding
+================
+
+**Automata** is a Python library for autonomous providers.
+
+Check out the :doc:`usage` section for further information, including
+the :ref:`installation` of the project.
+
+
+
+.. AUTO-GENERATED CONTENT START
+..
+
+   .. toctree::
+      :maxdepth: 1
+
+      symbol_doc_embedding_builder
+
+.. AUTO-GENERATED CONTENT END
+..
+
+
+
diff --git a/docs/experimental/symbol_embedding/symbol_doc_embedding_builder.rst b/docs/experimental/symbol_embedding/symbol_doc_embedding_builder.rst
new file mode 100644
index 000000000..199d84dcc
--- /dev/null
+++ b/docs/experimental/symbol_embedding/symbol_doc_embedding_builder.rst
@@ -0,0 +1,107 @@
+SymbolDocEmbeddingBuilder
+=========================
+
+``SymbolDocEmbeddingBuilder`` is a dedicated class for generating
+embeddings from the documentation of symbols.
+
+Overview
+--------
+
+The ``SymbolDocEmbeddingBuilder`` class is designed to interact with and
+generate embeddings from the documentation of symbols. The class
+interacts with the documentation on two main scopes - those specifying a
+class, and those specifying a non-class type symbol.
+
+The class uses a number of helper methods, such as the
+``_build_class_document`` and ``_build_class_document_summary`` methods,
+to facilitate the generation of embeddings. It also interacts with an
+``EmbeddingVectorProvider`` to obtain the actual embeddings for symbols,
+and leverages a context handler for producing relevant context for
+symbols. The class itself is an implementation of the
+``EmbeddingBuilder`` abstract class.
+
+One key aspect of this class is its ability to also build non-class type
+symbols’ documentation and generate embeddings for them.
The +``build_non_class`` method is specifically tailored to handle non-class +type symbols. + +Apart from building individual embeddings, ``SymbolDocEmbeddingBuilder`` +can also build a batch of embeddings using its ``batch_build`` method, +but this feature has not been implemented yet. + +It’s also worth mentioning that the ``class_cut_size`` attribute +determines the threshold of the source code’s length, below which the +code is considered insufficient for processing and embedding generation. + +The ``SymbolDocEmbeddingBuilder`` class considers the context of a +symbol, meaning it includes related symbols, dependencies, and test +scripts in the construction of the symbol context. It may also generate +a search list by splicing the search results on the symbol with the +search results biased on automata.tests. + +Related Symbols +--------------- + +- ``EmbeddingBuilder`` +- ``EmbeddingVectorProvider`` +- ``LLMChatCompletionProvider`` +- ``SymbolSearch`` +- ``PyContextHandler`` +- ``SymbolDocEmbedding`` + +Example +------- + +Below is an example of how to create an instance of the +``SymbolDocEmbeddingBuilder`` and subsequently use it to build a +documentation embedding for a symbol: + +.. 
code:: python

+   from automata.experimental.symbol_embedding.symbol_doc_embedding_builder import SymbolDocEmbeddingBuilder
+   from automata.llm.providers.openai_llm import OpenAIEmbeddingProvider
+   from automata.llm.llm_base import LLMChatCompletionProvider
+   from automata.experimental.search.symbol_search import SymbolSearch
+   from automata.experimental.code_parsers.py.context_processing.context_handler import PyContextHandler
+   from automata.symbol.symbol_base import Symbol
+
+   # Initializing the necessary providers
+   embedding_provider = OpenAIEmbeddingProvider()
+   completion_provider = LLMChatCompletionProvider()  # Assume a concrete completion provider
+   symbol_search = SymbolSearch()  # Assume SymbolSearch is initialized
+   handler = PyContextHandler()  # Assume PyContextHandler is initialized
+
+   # Create builder instance
+   doc_embedding_builder = SymbolDocEmbeddingBuilder(embedding_provider, completion_provider, symbol_search, handler)
+
+   # Assume symbol is a generated symbol
+   # symbol = ...
+   doc_embedding = doc_embedding_builder.build(symbol.source_code, symbol)
+
+Limitations
+-----------
+
+One limitation of the ``SymbolDocEmbeddingBuilder`` is the
+``batch_build`` method, which is not yet implemented for building
+document embeddings.
+
+Moreover, the class requires an instance of
+``EmbeddingVectorProvider``, ``LLMChatCompletionProvider``,
+``SymbolSearch``, and ``PyContextHandler``. This implies that it can
+only function where these four classes are implemented and can provide
+the necessary functionalities.
+
+As with other classes that use machine learning models for generating
+embeddings, the quality of the output depends heavily on the underlying
+model and the input data. Badly written or incomplete documentation for
+a symbol may lead to poor embeddings, and consequently, unreliable
+outcomes when these embeddings are utilized.
+
+Follow-up Questions
+-------------------
+
+- How does the ``class_cut_size`` attribute influence the processing
+  and embedding generation of a given symbol?
+- How could we handle symbols whose source code length fails to reach + the ``class_cut_size`` mark, beyond skipping them or considering them + non-class type symbols? diff --git a/docs/experimental/tools/builders/advanced_context_oracle_open_ai_toolkit_builder.rst b/docs/experimental/tools/builders/advanced_context_oracle_open_ai_toolkit_builder.rst new file mode 100644 index 000000000..77a24cea7 --- /dev/null +++ b/docs/experimental/tools/builders/advanced_context_oracle_open_ai_toolkit_builder.rst @@ -0,0 +1,94 @@ +AdvancedContextOracleOpenAIToolkitBuilder +========================================= + +Overview +-------- + +``AdvancedContextOracleOpenAIToolkitBuilder`` is a class in the +``automata.experimental.tools.builders`` module designed to build tools +associated with the context oracle for the OpenAI API. This class +extends the functionality of both +``AdvancedContextOracleToolkitBuilder`` and +``OpenAIAgentToolkitBuilder``. The +``AdvancedContextOracleOpenAIToolkitBuilder`` is registered as a tool +manager with the ``OpenAIAutomataAgentToolkitRegistry``. + +This class primarily hosts the ``build_for_open_ai()`` method, which +builds a list of OpenAI tools. Each OpenAI tool is an instance of +``OpenAITool`` with customized properties and required parameters. + +Related Symbols +--------------- + +- ``automata.llm.providers.openai_llm.OpenAIIncorrectMessageTypeError``: + An exception type for incorrect message types. + +- ``automata.cli.install_indexing.install_indexing()``: A function to + execute the install indexing script. + +- ``automata.llm.providers.openai_llm.OpenAIIncorrectMessageTypeError.__init__``: + Initialization method for the OpenAIIncorrectMessageTypeError + exception. + +- ``automata.agent.openai_agent.OpenAIAutomataAgent._build_initial_messages``: + Method to build initial messages for the agent’s conversation. 
+ +- ``automata.memory_store.conversation_database_providers.OpenAIAutomataConversationDatabase.get_messages``: + Method to get all messages corresponded to the original session id. + +- ``automata.tasks.task_registry.AutomataTaskRegistry.fetch_task_by_id``: + Fetches a task by its recorded session id. + +- ``automata.llm.providers.openai_llm.OpenAIChatCompletionProvider._stream_message``: + Method to stream response message from the agent. + +- ``automata.tasks.task_registry.AutomataTaskRegistry.update_task``: + Updates a task in the registry. + +- ``automata.llm.llm_base.LLMChatCompletionProvider.get_next_assistant_completion``: + Abstract method to return the next assistant’s completion. + +- ``automata.tasks.task_environment.AutomataTaskEnvironment.setup``: + Method to set up the environment by cloning the repository into the + task directory. + +Example +------- + +Here is a simplified example demonstrating how to utilize +``AdvancedContextOracleOpenAIToolkitBuilder`` to build OpenAI tools. + +.. code:: python + + from automata.experimental.tools.builders.advanced_context_oracle_builder\ + import AdvancedContextOracleOpenAIToolkitBuilder + + # Instantiate the builder + builder = AdvancedContextOracleOpenAIToolkitBuilder() + + # Build the OpenAI Tools + openai_tools = builder.build_for_open_ai() + + # Explore the built tools + for tool in openai_tools: + print(f"Tool Function: {tool.function}") + print(f"Tool Name: {tool.name}") + print(f"Tool Description: {tool.description}") + +Limitations +----------- + +The ``AdvancedContextOracleOpenAIToolkitBuilder`` is specifically +designed to generate tools associated with the context oracle for the +OpenAI API. Therefore, it may not be suitable or efficient for building +tools associated with other APIs or non-OpenAI contexts. It also +requires the explicit definition of properties and required parameters +which could limit its flexibility. 
+ +Follow-up Questions: +-------------------- + +- What are the exact properties and required parameters needed in + OpenAITool instances? +- Can we make this class more flexible, allowing it to handle tools + associated with different APIs and contexts? diff --git a/docs/experimental/tools/builders/advanced_context_oracle_toolkit_builder.rst b/docs/experimental/tools/builders/advanced_context_oracle_toolkit_builder.rst new file mode 100644 index 000000000..0ae36c6e0 --- /dev/null +++ b/docs/experimental/tools/builders/advanced_context_oracle_toolkit_builder.rst @@ -0,0 +1,93 @@ +AdvancedContextOracleToolkitBuilder +=================================== + +``AdvancedContextOracleToolkitBuilder`` is a builder class in the +Automata framework that provides tools which translate Natural Language +Processing (NLP) queries into relevant context. It retrieves the context +by evaluating semantic similarity between a specified query and +documentation/code of available symbols. + +Overview +-------- + +The ``AdvancedContextOracleToolkitBuilder`` gets initiated with +``symbol_search``, ``symbol_doc_embedding_handler``, +``symbol_code_embedding_handler``, and +``embedding_similarity_calculator`` objects. These dependencies are used +for handling and translating symbols and their embeddings. + +The builder can create a list of ``Tool`` objects through its ``build`` +method. These tools utilize the ``EmbeddingSimilarityCalculator`` and +``SymbolSearch`` to provide context for a given query by computing +semantic similarity between the query and all available symbols’ +documentation and code. + +Finally, the ``_get_context`` method provides the core functionality of +the ``AdvancedContextOracleToolkitBuilder``. Given a query, it +constructs the context by concatenating the source code and +documentation of the most semantically similar symbol to the query. It +also includes documentation summaries of the most highly ranked symbols +which are similar to the query. 
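The ``_get_context`` flow described above — pick the most similar symbol, concatenate its source code and documentation, then append documentation summaries of the next most highly ranked symbols — can be sketched as follows. All of the records and scores here are hypothetical stand-ins for what the embedding handlers would supply.

```python
# Hypothetical per-symbol records standing in for the real embedding handlers.
corpus = {
    "app.Parser": {"score": 0.91, "code": "class Parser: ...", "doc": "Parses input files."},
    "app.Lexer":  {"score": 0.74, "code": "class Lexer: ...",  "doc": "Tokenizes raw text."},
    "app.Cache":  {"score": 0.12, "code": "class Cache: ...",  "doc": "Memoizes results."},
}

def get_context(corpus, top_k=2):
    # Rank symbols by their (precomputed) similarity to the query.
    ranked = sorted(corpus.items(), key=lambda kv: kv[1]["score"], reverse=True)
    best_name, best = ranked[0]
    # Primary context: full source and documentation of the best match.
    parts = [f"Symbol: {best_name}", best["code"], best["doc"]]
    # Secondary context: doc summaries of the next most similar symbols.
    for name, rec in ranked[1:top_k + 1]:
        parts.append(f"{name}: {rec['doc']}")
    return "\n".join(parts)

context = get_context(corpus)
```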
+ +Related Symbols +--------------- + +- ``automata.cli.options.common_options`` +- ``automata.singletons.github_client.GitHubClient.__init__`` +- ``automata.tasks.task_registry.AutomataTaskRegistry.get_all_tasks`` +- ``automata.cli.cli_utils.ask_choice`` +- ``automata.llm.llm_base.LLMConversation.get_messages_for_next_completion`` +- ``automata.tasks.task_environment.AutomataTaskEnvironment.__init__`` +- ``automata.symbol.symbol_base.SymbolPackage.__repr__`` +- ``automata.symbol.graph.symbol_relationships.RelationshipProcessor.__init__`` +- ``automata.tasks.task_database.AutomataAgentTaskDatabase.update_task`` +- ``automata.tasks.task_database.AutomataAgentTaskDatabase.insert_task`` + +Usage Example +------------- + +.. code:: python + + from automata.experimental.tools.builders.advanced_context_oracle_builder import AdvancedContextOracleToolkitBuilder + + symbol_search = # Initialized SymbolSearch object + symbol_doc_embedding_handler = # Initialized SymbolDocEmbeddingHandler object + symbol_code_embedding_handler = # Initialized SymbolCodeEmbeddingHandler object + embedding_similarity_calculator = # Initialized EmbeddingSimilarityCalculator object + + builder = AdvancedContextOracleToolkitBuilder( + symbol_search, + symbol_doc_embedding_handler, + symbol_code_embedding_handler, + embedding_similarity_calculator + ) + + tools = builder.build() + + # Get context for a query + query = "input processing" + context = tools[0].function(query) + + print(context) + +Ensure to replace ``# Initialized ... object`` comments with the actual +initialized objects. + +Limitations +----------- + +The ``AdvancedContextOracleToolkitBuilder`` relies on the semantic +similarity of the query with symbols’ documentation and source code to +produce the context. This can be a potential limitation as the accuracy +of the context provided depends on the similarity metrics and the +quality of the symbols’ documentation and source code. 
+ +Follow-up Questions: +-------------------- + +- What kinds of similarity metrics are used? +- How can we ensure the quality of the documentation and source code? +- Are there ways to improve the semantic similarity calculation for + more accurate context? +- How can we deal with symbols which don’t have meaningful or relevant + documentation or source code? diff --git a/docs/experimental/tools/builders/agentified_search_open_ai_toolkit_builder.rst b/docs/experimental/tools/builders/agentified_search_open_ai_toolkit_builder.rst new file mode 100644 index 000000000..f00c1e82b --- /dev/null +++ b/docs/experimental/tools/builders/agentified_search_open_ai_toolkit_builder.rst @@ -0,0 +1,68 @@ +AgentifiedSearchOpenAIToolkitBuilder +==================================== + +Overview +-------- + +``AgentifiedSearchOpenAIToolkitBuilder`` is a class extending the +``AgentifiedSearchToolkitBuilder`` and ``OpenAIAgentToolkitBuilder``, +associated with the Agent Toolkit ``AGENTIFIED_SEARCH``. This class is +designed to build tools for agentified search functionality using the +OpenAI API. + +The ``build_for_open_ai`` method is used to build and return a list of +OpenAITool objects based on the tools associated with agentified search. +The properties for the created ``OpenAITool`` objects include a query of +string type. + +Related Symbols +--------------- + +- ``automata.experimental.tools.builders.agentified_search_builder.AgentifiedSearchToolkitBuilder`` +- ``automata.llm.providers.openai_llm.OpenAIConversation.add_message`` +- ``automata.singletons.github_client.GitHubClient.create_issue`` +- ``automata.tasks.task_environment.AutomataTaskEnvironment.commit_task`` +- ``automata.tasks.task_database.AutomataAgentTaskDatabase.get_tasks_by_query`` + +Example +------- + +.. 
code:: python

+   from automata.experimental.tools.builders.agentified_search_builder import AgentifiedSearchOpenAIToolkitBuilder
+
+   # Create an instance of the class
+   tool_builder = AgentifiedSearchOpenAIToolkitBuilder()
+
+   # Build the tools for OpenAI
+   openai_tools = tool_builder.build_for_open_ai()
+
+This script begins by importing the necessary class
+``AgentifiedSearchOpenAIToolkitBuilder``. An instance of this class is
+then created. Finally, the tools for agentified search for OpenAI are
+built and returned, using the ``build_for_open_ai`` method.
+
+Limitations
+-----------
+
+``AgentifiedSearchOpenAIToolkitBuilder`` is a builder specifically
+designed for agentified search with an OpenAI LLM Provider. This implies
+that it may not be suited or compatible with other LLM Providers. The
+properties of the built ``OpenAITool`` contain only ‘query’. If you
+want to include additional properties, it may require changes in the
+builder or an extended implementation.
+
+Follow-up Questions:
+--------------------
+
+- How compatible is ``AgentifiedSearchOpenAIToolkitBuilder`` with other
+  LLM Providers?
+- How can additional properties, if required, be added to the built
+  OpenAITools?
+
+The context references symbols related to Git tasks, such as
+``commit_task`` and ``create_issue``, as well as ones related to paths
+and indexing, such as ``get_project_paths`` and ``_load_index_protobuf``.
+However, it’s unclear how these are connected or used in conjunction
+with ``AgentifiedSearchOpenAIToolkitBuilder``. Better clarity on these
+relationships is needed.
diff --git a/docs/experimental/tools/builders/agentified_search_toolkit_builder.rst b/docs/experimental/tools/builders/agentified_search_toolkit_builder.rst new file mode 100644 index 000000000..005a773d3 --- /dev/null +++ b/docs/experimental/tools/builders/agentified_search_toolkit_builder.rst @@ -0,0 +1,81 @@ +AgentifiedSearchToolkitBuilder +============================== + +Overview +-------- + +``AgentifiedSearchToolkitBuilder`` is a class responsible for +constructing tools used in agent facilitated search operations. Its +principal role is to create a list of ``Tool`` instances where each tool +represents a different operation in the codebase search process. These +tools perform operations such as fetching top N matches from symbol +search, retrieving the complete Python code for the best match among the +obtained results, and getting comprehensive documentation for the best +match if it exists. + +The class uses multiple components to facilitate its operations, such as +a ``SymbolSearch`` object for searching symbols, +``SymbolDocEmbeddingHandler`` for handling symbol document embeddings, +and an ``LLMChatCompletionProvider`` for providing completion prompts. + +The class inherits from the ``AgentToolkitBuilder`` abstract class and +overrides its abstract ``build`` method providing a custom +implementation for generating search specific tools. + +Related Symbols +--------------- + +- ``automata.experimental.search.symbol_search.SymbolSearch`` +- ``automata.experimental.symbol_embedding.symbol_embedding_handler.SymbolDocEmbeddingHandler`` +- ``automata.llm.providers.llm_chat.LLMChatCompletionProvider`` +- ``automata.agent.agent.AgentToolkitBuilder`` + +Example Usage +------------- + +Let’s set up a ``AgentifiedSearchToolkitBuilder`` and build its +corresponding tools: + +.. 
code:: python + + from automata.experimental.tools.builders.agentified_search_builder import AgentifiedSearchToolkitBuilder + from automata.experimental.search.symbol_search import SymbolSearch + from automata.experimental.symbol_embedding.symbol_embedding_handler import SymbolDocEmbeddingHandler + + # ...assuming we already have some pre-initialized symbol_search and symbol_doc_embedding_handler objects... + + toolkit_builder = AgentifiedSearchToolkitBuilder(symbol_search, symbol_doc_embedding_handler, top_n=5) + + tools = toolkit_builder.build() + + for tool in tools: + print(tool.name) # For instance, print out the name of each created tool + +This should generate tools for the agent-facilitated search feature and +print their names: ‘search-top-matches’, ‘search-best-match-code’, and +‘search-best-match-docs’. + +Limitations +----------- + +``AgentifiedSearchToolkitBuilder`` relies on the +``get_symbol_code_similarity_results`` function of the provided +SymbolSearch object to acquire search results. Any limitations to this +function or inaccurate results produced by this function will affect the +toolkit builder’s performance. + +Moreover, the builder assumes that documentation and code of the best +match are readily available and valid. In scenarios where these are +missing or improperly formatted, the tools generated may fail to perform +as expected. + +Follow-up Questions: +-------------------- + +- How does ``AgentifiedSearchToolkitBuilder`` handle cases where the + ``SymbolSearch`` object does not find any matching symbols? +- What alternatives are there if the + ``get_symbol_code_similarity_results`` function in the provided + ``SymbolSearch`` object is deficient or unavailable? +- How can ``AgentifiedSearchToolkitBuilder`` handle the absence or + improper formatting of documentation and code? 
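The "fetch top N matches" step described above can be sketched with ``heapq.nlargest`` over a score mapping. The scores below are toy values standing in for what ``get_symbol_code_similarity_results`` would return on a real ``SymbolSearch`` object.

```python
import heapq

# Toy similarity scores, standing in for get_symbol_code_similarity_results.
scores = {
    "pkg.alpha": 0.42,
    "pkg.beta": 0.91,
    "pkg.gamma": 0.77,
    "pkg.delta": 0.05,
}

def top_matches(scores, top_n):
    # nlargest avoids sorting the whole mapping when only N results are needed
    return heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])

print(top_matches(scores, 2))  # [('pkg.beta', 0.91), ('pkg.gamma', 0.77)]
```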
diff --git a/docs/experimental/tools/builders/document_oracle_open_ai_toolkit_builder.rst b/docs/experimental/tools/builders/document_oracle_open_ai_toolkit_builder.rst new file mode 100644 index 000000000..849ae3f08 --- /dev/null +++ b/docs/experimental/tools/builders/document_oracle_open_ai_toolkit_builder.rst @@ -0,0 +1,76 @@ +DocumentOracleOpenAIToolkitBuilder +================================== + +``DocumentOracleOpenAIToolkitBuilder`` is an intricate part of the +automata toolkit designed to function as a manager of document oracle +tools in the context of the OpenAI API. This tool builder inherits from +``DocumentOracleToolkitBuilder`` and ``OpenAIAgentToolkitBuilder``. It +is responsible for generating tools related to OpenAI’s document oracle +tasks such as query creation, data retrieval, and data processing. + +Overview +-------- + +The ``DocumentOracleOpenAIToolkitBuilder`` class registers itself within +the toolkit registry and defines its associated methods. Specifically, +the ``build_for_open_ai`` method builds the tools for the OpenAI API +context. These tools are packaged with properties like ‘query’, which +receives a string representing the query to search for in the document. +The ‘query’ property is mandatory for each tool. 
+
+Related Symbols
+---------------
+
+- ``automata.singletons.github_client.RepositoryClient.create_pull_request``
+- ``automata.symbol.symbol_utils.load_data_path``
+- ``automata.singletons.github_client.RepositoryClient.create_branch``
+- ``automata.llm.providers.openai_llm.OpenAIFunction.to_dict``
+- ``automata.llm.llm_base.LLMConversationDatabaseProvider.update``
+- ``automata.llm.llm_base.LLMConversationDatabaseProvider.save_message``
+- ``automata.singletons.github_client.GitHubClient.clone_repository``
+- ``automata.tasks.task_base.Task.notify_observer``
+- ``automata.symbol_embedding.symbol_embedding_handler.SymbolEmbeddingHandler.flush``
+- ``automata.symbol.symbol_base.Symbol.module_path``
+
+Usage Example
+-------------
+
+.. code:: python
+
+   from automata.experimental.tools.builders.document_oracle_builder import DocumentOracleOpenAIToolkitBuilder
+
+   doc_oracle_builder = DocumentOracleOpenAIToolkitBuilder()
+
+   # print the name of the toolkit associated with the builder
+   print(doc_oracle_builder.TOOL_NAME)
+
+   # print the LLM provider used by the builder
+   print(doc_oracle_builder.LLM_PROVIDER)
+
+   # Build tools for an OpenAI Document Oracle
+   tools = doc_oracle_builder.build_for_open_ai()
+
+   # print the properties and requirements of each tool
+   for tool in tools:
+       print(tool.properties)
+       print(tool.required)
+
+Limitations
+-----------
+
+The ``DocumentOracleOpenAIToolkitBuilder`` is currently limited by the
+predefined properties and methods it inherits from
+``DocumentOracleToolkitBuilder`` and ``OpenAIAgentToolkitBuilder``.
+Customizations or extensions might be limited or require significant
+alterations to the component’s base classes.
+
+Follow-up Questions:
+--------------------
+
+- How can we extend ``DocumentOracleOpenAIToolkitBuilder`` to
+  accommodate a broader range of OpenAI’s document oracle tasks?
+- Is there a way to customize or extend the properties of the tools + created by ``DocumentOracleOpenAIToolkitBuilder``? +- How does ``DocumentOracleOpenAIToolkitBuilder`` interact with + OpenAI’s document oracle in real-time applications? diff --git a/docs/experimental/tools/builders/document_oracle_toolkit_builder.rst b/docs/experimental/tools/builders/document_oracle_toolkit_builder.rst new file mode 100644 index 000000000..c36d52621 --- /dev/null +++ b/docs/experimental/tools/builders/document_oracle_toolkit_builder.rst @@ -0,0 +1,76 @@ +DocumentOracleToolkitBuilder +============================ + +Overview +-------- + +``DocumentOracleToolkitBuilder`` is a class derived from +``AgentToolkitBuilder`` that is primarily focused on providing tools +which translate a natural language processing (NLP) query to relevant +context. It accomplishes this by finding the most semantically similar +symbol’s documentation in a Python codebase. + +The builder utilizes two main components: ``SymbolSearch`` and +``SymbolDocEmbeddingHandler``. These components assist in identifying +the most related symbol in a given codebase, and returning its +corresponding class documentation which provides context for the query. + +The builder method constructs the tools associated with the +``DocumentOracleToolkitBuilder``, designated with the name +‘document-oracle’. 
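The core idea described above — mapping a query to the most semantically similar symbol’s documentation — can be sketched independently of the library. The embedding vectors and helper names below are invented for illustration only:

```python
# Illustrative sketch (not the automata implementation): pick the symbol
# whose documentation embedding is most similar to the query embedding.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar_symbol(query_vec, doc_embeddings):
    """doc_embeddings: mapping of symbol name -> documentation embedding."""
    return max(doc_embeddings, key=lambda s: cosine(query_vec, doc_embeddings[s]))

# Toy embeddings standing in for real documentation vectors.
docs = {
    "replace_key": [0.9, 0.1, 0.0],
    "get_key": [0.2, 0.8, 0.1],
}
print(most_similar_symbol([1.0, 0.0, 0.0], docs))  # replace_key
```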
+ +Related Symbols +--------------- + +- ``automata.cli.env_operations.replace_key`` +- ``automata.symbol.symbol_base.Symbol.is_protobuf`` +- ``automata.symbol.symbol_base.SymbolDescriptor.unparse`` +- ``automata.tasks.task_base.Task._get_log_dir`` +- ``automata.tasks.task_base.Task._get_task_dir`` +- ``automata.cli.env_operations.get_key`` +- ``automata.symbol_embedding.symbol_embedding_base.SymbolCodeEmbedding.metadata`` +- ``automata.llm.providers.openai_llm.OpenAIChatMessage.to_dict`` +- ``automata.cli.env_operations.log_cli_output`` +- ``automata.experimental.scripts.run_update_tool_eval.process_modules`` + +Usage Example +------------- + +.. code:: python + + from automata.experimental.tools.builders.document_oracle_builder import DocumentOracleToolkitBuilder + from automata.symbol_search import SymbolSearch + from automata.embedding_handlers.symbol_doc_embedding_handler import SymbolDocEmbeddingHandler + + symbol_search = SymbolSearch(symbol_database, symbol_ranker) + symbol_doc_embedding_handler = SymbolDocEmbeddingHandler(embedding_calculator, symbol_database) + + doc_oracle_builder = DocumentOracleToolkitBuilder(symbol_search, symbol_doc_embedding_handler) + + doc_oracle_tool = doc_oracle_builder.build()[0] + + query = "What is the purpose of the replace_key function?" + res = doc_oracle_tool.function(query) + +Limitations +----------- + +The primary limitation of the ``DocumentOracleToolkitBuilder`` is its +dependence on the quality and correctness of the embedded documentation +linked to code symbols. It inherently relies on the accuracy of natural +language understanding capabilities to find the right context from the +symbol’s documentation. Thus, poorly documented or ambiguous definitions +could lead to misleading context. Further, it might fail when trying to +generate embeddings for symbols; in such cases, it returns the +corresponding error. + +Lastly, ``DocumentOracleToolkitBuilder`` does not account for potential +changes in a codebase over time.
To function optimally, the tool assumes +an up-to-date codebase with complete and relevant documentation for each +symbol. + +Follow-up Questions: +-------------------- + +- Is there a feature to update the symbol embeddings in the + ``DocumentOracleToolkitBuilder`` in case of dynamic changes in the + codebase? diff --git a/docs/experimental/tools/builders/index.rst b/docs/experimental/tools/builders/index.rst index 97d7110b9..3ebfaa984 100644 --- a/docs/experimental/tools/builders/index.rst +++ b/docs/experimental/tools/builders/index.rst @@ -8,13 +8,23 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. .. toctree:: :maxdepth: 1 + advanced_context_oracle_open_ai_toolkit_builder + advanced_context_oracle_toolkit_builder + agentified_search_open_ai_toolkit_builder + agentified_search_toolkit_builder + document_oracle_open_ai_toolkit_builder + document_oracle_toolkit_builder search_tool + symbol_search_open_ai_toolkit_builder + symbol_search_toolkit_builder .. AUTO-GENERATED CONTENT END .. diff --git a/docs/experimental/tools/builders/search_tool.rst b/docs/experimental/tools/builders/search_tool.rst index ed42cfbfb..89ac95907 100644 --- a/docs/experimental/tools/builders/search_tool.rst +++ b/docs/experimental/tools/builders/search_tool.rst @@ -1,14 +1,6 @@ -- If new search tools are developed, they would need to be manually - added as new enumerators to the ``SearchTool`` class. As - ``SearchTool`` is an enumerator, it doesn’t provide built-in - functions to dynamically add or remove members. However, the - developers can extend the functionality as they continue to introduce - new search tools. - -- If a user tries to specify a search tool that isn’t available, it - would result in an ``AttributeError``. The Python Enum module will - raise this error when trying to access an attribute that does not - exist in the enumeration. Ideally, the application should handle this - exception to provide a user-friendly error message. 
It could be - further enhanced by providing a list of available search tools to the - user.
+SearchTool
+==========
+
+.. code:: python
+
+   class SearchTool(Enum):
+       """Available search tools."""
+
+       AGENT_FACILITATED_SEARCH = "llm-facilitated-search"
+       SYMBOL_SIMILARITY_SEARCH = "symbol-similarity-search"
+       SYMBOL_RANK_SEARCH = "symbol-rank-search"
+       SYMBOL_REFERENCES = "symbol-references"
+       RETRIEVE_SOURCE_CODE_BY_SYMBOL = "retrieve-source-code-by-symbol"
+       EXACT_SEARCH = "exact-search"
diff --git a/docs/experimental/tools/builders/symbol_search_open_ai_toolkit_builder.rst b/docs/experimental/tools/builders/symbol_search_open_ai_toolkit_builder.rst new file mode 100644 index 000000000..2bc099fe1 --- /dev/null +++ b/docs/experimental/tools/builders/symbol_search_open_ai_toolkit_builder.rst @@ -0,0 +1,81 @@ +SymbolSearchOpenAIToolkitBuilder +================================ + +``SymbolSearchOpenAIToolkitBuilder`` is a component utilized in building +OpenAI tools, specifically for symbol search operations. It extends +the ``SymbolSearchToolkitBuilder`` and ``OpenAIAgentToolkitBuilder`` +classes. The primary purpose of this builder is to allow OpenAI tools to +be constructed for specific symbol search operations. + +Overview +-------- + +The main responsibility of ``SymbolSearchOpenAIToolkitBuilder`` is to +define the build process for an OpenAI tool. It sets the ``TOOL_NAME`` +and ``LLM_PROVIDER`` properties to Symbol Search and OpenAI respectively, +inheriting from ``SymbolSearchToolkitBuilder`` and +``OpenAIAgentToolkitBuilder``. It then overrides the +``build_for_open_ai`` method, building the tools with specific +properties and requirements designed for OpenAI tools. + +Related Symbols +--------------- + +The ``SymbolSearchOpenAIToolkitBuilder`` is related to the following +symbols: + +- ``automata.singletons.py_module_loader.PyModuleLoader._load_all_modules``: + Loads all modules in the map.
+- ``automata.experimental.scripts.run_update_tool_eval.get_extra_symbols``: + Returns a list of the extra symbols. +- ``automata.symbol_embedding.symbol_embedding_handler.SymbolEmbeddingHandler.filter_symbols``: + Filters the symbols to only those in the new sorted_supported_symbols + set. +- ``automata.experimental.scripts.run_update_tool_eval.get_missing_symbols``: + Returns a list of the missing symbols. +- ``automata.symbol.graph.symbol_navigator.SymbolGraphNavigator._get_references_to_module``: + Gets all references to a module in the graph. +- ``automata.symbol.symbol_base.SymbolReference.__hash__``: Computes a + hash value for a symbol reference. + +Example +------- + +Before using the example below, please ensure that all necessary +packages and modules have been correctly installed and imported. + +.. code:: python + + from automata.experimental.tools.builders.symbol_search_builder import SymbolSearchOpenAIToolkitBuilder + from automata.tools.tool_metadata import ToolFunction + + # Initialize the tool builder + builder = SymbolSearchOpenAIToolkitBuilder() + + # Create new tools + tool = ToolFunction(function="my_function", name="my_tool", description="my custom tool") + builder.add_tool_function(tool) + + # Build tools for OpenAI + openai_tools = builder.build_for_open_ai() + +This example shows how to create an instance of +``SymbolSearchOpenAIToolkitBuilder``, add a new tool function, and then +build OpenAI tools. + +Limitations +----------- + +As the ``SymbolSearchOpenAIToolkitBuilder`` class is primarily designed +to build OpenAI tools for symbol search operations, it may not be +suitable or effective for building tools for non-symbol search +operations. It also relies on the structure and specifications of the +``OpenAITool`` class for building tools. + +Follow-up Questions: +-------------------- + +- Are there any specific constraints or requirements on the tool + functions added to the ``SymbolSearchOpenAIToolkitBuilder``?
+- Can the ``SymbolSearchOpenAIToolkitBuilder`` be extended or modified + to support the building of tools for other types of operations? diff --git a/docs/experimental/tools/builders/symbol_search_toolkit_builder.rst b/docs/experimental/tools/builders/symbol_search_toolkit_builder.rst new file mode 100644 index 000000000..2c5a67834 --- /dev/null +++ b/docs/experimental/tools/builders/symbol_search_toolkit_builder.rst @@ -0,0 +1,72 @@ +SymbolSearchToolkitBuilder +========================== + +Overview +-------- + +The ``SymbolSearchToolkitBuilder`` is a class responsible for the +interaction with the SymbolSearch API. Its main focus is to search an +indexed Python codebase. + +The class leverages SymbolSearch in different capacities, such as +agent-facilitated search, symbol similarity search, symbol rank search, +retrieving source code by symbol, locating symbol references, and +executing exact searches. + +Each type of search operation is encapsulated in its own method and +these are used during the building phase of the toolkit. Upon +construction, the toolkit is populated with a set of pre-defined tools. +The builder also provides a method (``process_query``) to process a +given query by routing it to the appropriate tool. 
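The routing behaviour of ``process_query`` described above can be illustrated with a self-contained toy dispatcher. The class, enum members, and processor names below are simplified stand-ins, not the real automata symbols:

```python
# Toy sketch of a dispatch table mapping tool types to processor methods,
# mirroring the idea behind process_query; names are illustrative only.
from enum import Enum

class SearchTool(Enum):
    EXACT_SEARCH = "exact-search"
    SYMBOL_REFERENCES = "symbol-references"

class ToyToolkit:
    def __init__(self):
        # Map each tool type to the method that handles it.
        self._processors = {
            SearchTool.EXACT_SEARCH: self._exact_search_processor,
            SearchTool.SYMBOL_REFERENCES: self._symbol_references_processor,
        }

    def process_query(self, tool_type: SearchTool, query: str) -> str:
        """Route a query to the processor registered for tool_type."""
        processor = self._processors.get(tool_type)
        if processor is None:
            raise ValueError(f"Unknown tool type: {tool_type}")
        return processor(query)

    def _exact_search_processor(self, query: str) -> str:
        return f"exact:{query}"

    def _symbol_references_processor(self, query: str) -> str:
        return f"refs:{query}"

toolkit = ToyToolkit()
print(toolkit.process_query(SearchTool.EXACT_SEARCH, "foo"))  # exact:foo
```

Raising on an unregistered tool type is one plausible answer to the follow-up question below about unknown ``tool_type`` values.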
+ +Related Symbols +--------------- + +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder._symbol_references_processor`` +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder._symbol_rank_search_processor`` +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder._symbol_code_similarity_search_processor`` +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder._symbol_agent_search_processor`` +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder._exact_search_processor`` +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder.__init__`` +- ``automata.experimental.tools.builders.symbol_search_builder.SymbolSearchToolkitBuilder._retrieve_source_code_by_symbol_processor`` +- ``automata.experimental.search.symbol_search.SymbolSearch.exact_search`` +- ``automata.experimental.search.symbol_search.SymbolSearch.retrieve_source_code_by_symbol`` + +Example +------- + +.. code:: python + + from automata.experimental.tools.builders.symbol_search_builder import SearchTool, SymbolSearchToolkitBuilder + from automata.experimental.search.symbol_search import SymbolSearch + + symbol_search = SymbolSearch() + builder = SymbolSearchToolkitBuilder(symbol_search) + + builder.build() # builds a suite of tools for searching the associated codebase + builder.process_query(SearchTool.SYMBOL_SIMILARITY_SEARCH, 'search_query') # processes a query by routing it to the symbol similarity search tool + +Limitations +----------- + +``SymbolSearchToolkitBuilder`` relies on the ``SymbolSearch`` object +supplied during initialization. Therefore, any limitations inherent to +SymbolSearch will carry over.
For instance, if the ``SymbolSearch`` +object is not correctly initialized or its source data is not properly +indexed, the search functionality provided by the +``SymbolSearchToolkitBuilder`` may fail or underperform. + +Particularly, for methods involving text processing such as +``_symbol_agent_search_processor``, the results rely heavily on the +model used for AI-based suggestions (currently GPT-4). The quality +and relevance of the results will depend on the capabilities and tuning +parameters of this model. + +Follow-up Questions: +-------------------- + +- What should happen if an unknown ``tool_type`` is supplied to the + ``process_query`` function? +- Is there a need for an update or refresh method if the + underlying source data of the ``SymbolSearch`` object changes after + the ``SymbolSearchToolkitBuilder`` has been initialized? diff --git a/docs/experimental/tools/index.rst b/docs/experimental/tools/index.rst index 7dfcefd80..4f561ce24 100644 --- a/docs/experimental/tools/index.rst +++ b/docs/experimental/tools/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/github_management/index.rst b/docs/github_management/index.rst index 2371bb4a6..b5c71715f 100644 --- a/docs/github_management/index.rst +++ b/docs/github_management/index.rst @@ -18,6 +18,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START ..
diff --git a/docs/index.rst b/docs/index.rst index ebb3754d7..6d206dd53 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -85,6 +85,8 @@ The following module documents are auto-generated via the run-doc-embedding pipe + + @@ -99,6 +101,7 @@ The following module documents are auto-generated via the run-doc-embedding pipe faq setup_guide agent/index + cli/index code_handling/index code_parsers/index code_writers/index diff --git a/docs/llm/eval/index.rst b/docs/llm/eval/index.rst index 6fb7de12a..e17d8d827 100644 --- a/docs/llm/eval/index.rst +++ b/docs/llm/eval/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/llm/foundation/index.rst b/docs/llm/foundation/index.rst index 0910ff279..ad5aac78e 100644 --- a/docs/llm/foundation/index.rst +++ b/docs/llm/foundation/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/llm/index.rst b/docs/llm/index.rst index cd39e6da7..65e202b25 100644 --- a/docs/llm/index.rst +++ b/docs/llm/index.rst @@ -23,6 +23,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. @@ -36,6 +38,7 @@ Summary of content llm_conversation_database_provider eval/index foundation/index + llm_base/index providers/index .. AUTO-GENERATED CONTENT END diff --git a/docs/llm/llm_base/index.rst b/docs/llm/llm_base/index.rst new file mode 100644 index 000000000..5b873f712 --- /dev/null +++ b/docs/llm/llm_base/index.rst @@ -0,0 +1,23 @@ +llm_base +======== + +**Automata** is a Python library for autonomous providers. + +Check out the :doc:`usage` section for further information, including +how to :ref:`installation` the project. + + + +.. AUTO-GENERATED CONTENT START +.. + + .. toctree:: + :maxdepth: 1 + + llm_empty_conversation_error + +.. AUTO-GENERATED CONTENT END +.. 
+ + + diff --git a/docs/llm/llm_base/llm_empty_conversation_error.rst b/docs/llm/llm_base/llm_empty_conversation_error.rst new file mode 100644 index 000000000..d8ed63206 --- /dev/null +++ b/docs/llm/llm_base/llm_empty_conversation_error.rst @@ -0,0 +1,7 @@
+LLMEmptyConversationError
+=========================
+
+.. code:: python
+
+   class LLMEmptyConversationError(Exception):
+       """Raised when the conversation is empty."""
+
+       def __init__(self, message: str = "The conversation is empty.") -> None:
+           super().__init__(message)
diff --git a/docs/llm/llm_chat_completion_provider.rst b/docs/llm/llm_chat_completion_provider.rst index afe684787..c5665bc5c 100644 --- a/docs/llm/llm_chat_completion_provider.rst +++ b/docs/llm/llm_chat_completion_provider.rst @@ -1,79 +1,91 @@ LLMChatCompletionProvider ========================= -``LLMChatCompletionProvider`` is an abstract base class used to -structure different types of Language Learning Model (LLM) chat -completion providers. The class contains four essential methods that -should be implemented by any subclass. These methods include adding new -chat messages and retrieving the next assistant’s completion from a chat -provider. Additionally, the chat provider can be reset, and it can -operate as a standalone output supplier for the LLM. - -Its main function is to form the fundamental structure for various chat -completion providers in the LLM by standardizing their core methods. - Overview -------- -The ``LLMChatCompletionProvider`` class provides a blueprint for LLM -chat completion providers. It comprises two primary operations – sending -and receiving messages from the chat provider and managing the chat -session. This is especially crucial in controlling the flow of data in -and out of the chat provider, paired with the functionality to control -and manipulate the chat buffer. +``LLMChatCompletionProvider`` is an abstract base class developed in the +``automata.llm.llm_base`` module and serves as a blueprint for building +different types of LLM chat completion providers.
Designed as a core +component of an AI assistant, it provides the structure for receiving, +interpreting, and generating responses to user messages. + +The key methods defined in the base class include +``get_next_assistant_completion()``, ``add_message()``, ``reset()``, and +``standalone_call()``. These methods provide a range of capabilities +from fetching the next assistant completion to managing the provider’s +buffer of chat messages. The ``standalone_call()`` method is especially +important as it allows interacting with the LLM chat provider +independently, which can be handy when the provider is treated as a +singular output source rather than a chat provider. Related Symbols --------------- -- ``LLMChatMessage``: This is a base class for different types of chat - messages that are used by LLM and can be provided to the - LLMChatCompletionProvider to add new messages to the chat buffer. -- ``LLMCompletionResult``: This provides the structure for different - types of completion results received from the - ``LLMChatCompletionProvider``. -- ``OpenAIChatCompletionProvider``: This is a subclass of - ``LLMChatCompletionProvider`` that uses the OpenAI API to provide - chat messages. This class has implemented the abstract methods of the - ``LLMChatCompletionProvider`` and can operate as functional - completion provider. 
+- ``automata.llm.providers.openai_llm.OpenAIConversation.get_latest_message`` +- ``automata.llm.providers.openai_llm.OpenAIConversation.get_messages_for_next_completion`` +- ``automata.llm.llm_base.LLMConversationDatabaseProvider.get_messages`` +- ``automata.llm.providers.openai_llm.OpenAIConversation.reset_conversation`` +- ``automata.llm.llm_base.LLMConversation.register_observer`` +- ``automata.llm.providers.openai_llm.OpenAIConversation.__len__`` +- ``automata.llm.llm_base.LLMConversation.notify_observers`` +- ``automata.llm.llm_base.LLMConversation.unregister_observer`` +- ``automata.core.utils.get_logging_config`` +- ``automata.singletons.github_client.GitHubClient.merge_pull_request``. + +As the methods in ``LLMChatCompletionProvider`` are abstract, they need +to be overridden in any class that inherits from +``LLMChatCompletionProvider``. As such, related symbols include methods +in classes that are likely to override these methods. Example ------- -As ``LLMChatCompletionProvider`` is an abstract base class, you cannot -instantiate it or use it as is. Instead, you use subclasses of -``LLMChatCompletionProvider`` that have implemented the abstract -methods. One such subclass is ``OpenAIChatCompletionProvider``. Below is -an example of how to use it: - .. code:: python - from automata.llm.providers.openai import OpenAIChatCompletionProvider - from automata.llm.foundation import LLMChatMessage + from typing import Optional + + from automata.llm.llm_base import LLMChatCompletionProvider + from automata.llm.llm_chat_message import LLMChatMessage + + class CustomChatCompletionProvider(LLMChatCompletionProvider): + def get_next_assistant_completion(self) -> LLMChatMessage: + # Implement custom logic to get the next assistant message + pass + + def add_message(self, message: LLMChatMessage, session_id: Optional[str]=None) -> None: + # Implement custom logic to add a new message to the buffer. + pass - provider = OpenAIChatCompletionProvider() + def reset(self) -> None: + # Implement custom logic to reset the chat provider's buffer.
+ pass - # Add a new message to the provider's buffer - provider.add_message(LLMChatMessage(content="Hello World", channel="general")) + def standalone_call(self, prompt: str, session_id: Optional[str]=None) -> str: + # If the provider's buffer is not empty, raise an exception; + # otherwise, implement custom logic to handle standalone calls. + pass - # Get the next assistant completion from the LLM. - next_message = provider.get_next_assistant_completion() - print(next_message.content) # Prints the content of the next assistant completion message +This demonstrates how a developer might implement a class that inherits +from ``LLMChatCompletionProvider``. Note, however, that each method contains +a ``pass`` statement, indicating that the methods need to be implemented in +accordance with specific completion provider requirements. Limitations ----------- -The LLMChatCompletionProvider only provides an abstract structure and -does not implement the methods which limits its direct usage. Subclasses -are required to implement the where necessary for interacting with -different LLM chat completion providers. - -It also assumes that a unique message can be added to the provider’s -buffer and that the provider can be queried for the next assistant -completion at any time. This may not align with the actual behavior of -all chat completion providers. - -Follow-up Questions ------------------- - -- Can this class be refactored further for more versatile usage? +As ``LLMChatCompletionProvider`` is an abstract base class, it does not +provide any functionality on its own and must be subclassed. These +subclasses must implement all of its abstract methods, or they too will +become abstract classes. Moreover, the actual behavior of the methods is +entirely dependent on their implementation in the subclasses, resulting +in potential variability and inconsistency between different LLM +providers.
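To make the abstract contract above concrete, here is a small, library-independent toy provider. It uses plain dicts as message stand-ins rather than the real automata classes, and implements the four methods plus the empty-buffer rule for ``standalone_call()``:

```python
# Toy completion provider illustrating the four-method contract:
# get_next_assistant_completion, add_message, reset, standalone_call.
# This mirrors the interface only; it is not the real library code.
from typing import List, Optional

class EchoCompletionProvider:
    def __init__(self) -> None:
        self._buffer: List[dict] = []

    def add_message(self, message: dict, session_id: Optional[str] = None) -> None:
        """Append a message to the provider's buffer."""
        self._buffer.append(message)

    def get_next_assistant_completion(self) -> dict:
        """Produce an assistant reply echoing the latest user message."""
        last_user = self._buffer[-1]["content"] if self._buffer else ""
        return {"role": "assistant", "content": f"echo: {last_user}"}

    def reset(self) -> None:
        """Clear the buffer of chat messages."""
        self._buffer = []

    def standalone_call(self, prompt: str, session_id: Optional[str] = None) -> str:
        """One-shot call; only valid when the buffer is empty."""
        if self._buffer:
            raise RuntimeError("Buffer must be empty for a standalone call.")
        self.add_message({"role": "user", "content": prompt})
        reply = self.get_next_assistant_completion()
        self.reset()
        return reply["content"]

provider = EchoCompletionProvider()
print(provider.standalone_call("hi"))  # echo: hi
```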
+ +Additionally, the ``standalone_call()`` method may result in an +exception if the chat provider’s buffer is not devoid of content. + +Follow-up Questions: +-------------------- + +- What are some recommended best practices for implementing the + abstract methods in ``LLMChatCompletionProvider`` subclasses? +- How does the ``standalone_call()`` work in synergy with the other + methods of the class? +- Can we build in some consistency-check mechanisms to ensure a + standard behavior across different LLM providers? diff --git a/docs/llm/llm_chat_message.rst b/docs/llm/llm_chat_message.rst index cb90c54cd..f012ee427 100644 --- a/docs/llm/llm_chat_message.rst +++ b/docs/llm/llm_chat_message.rst @@ -1,76 +1,7 @@ -LLMChatMessage -==============
+LLMChatMessage
+==============
+
+.. code:: python
+
+   class LLMChatMessage(BaseModel):
+       """Base class for different types of LLM chat messages."""
+
+       role: str
+       content: Optional[str] = None
-``LLMChatMessage`` is a base class representing different types of Lower -Level Model (LLM) chat messages. This class structures the chat messages -that are processed to and from an LLM. It is used widely throughout the -linked conversational module talks, and plays a critical role in -structuring and storing various chat interactions for retrieval later. -Overview --------- - -The ``LLMChatMessage`` class provides a way to structure conversations -in a conversational user interface with an LLM. Each instance of the -class represents one message in the chat. The ``LLMChatMessage`` class -encapsulates the role and content of a chat message and provides a -uniform interface in the form of the ``to_dict()`` method for converting -the message to a dictionary object. - -The ``LLMChatMessage`` class is included in the interaction with the -chat API, the chat message completion providers, the chat conversations, -and in test scenarios.
- -Related Symbols ---------------- - -- ``automata.llm.providers.openai.OpenAIChatMessage`` -- ``automata.llm.providers.openai.OpenAIConversation.get_latest_message`` -- ``automata.llm.foundation.LLMConversation.get_latest_message`` - -Examples --------- - -The following is an example demonstrating how to create an instance of -``LLMChatMessage`` and use it in conversation. - -.. code:: python - - from automata.llm.foundation import LLMChatMessage - - # Create a LLMChatMessage instance - message = LLMChatMessage(role="user", content="Hello, how are you?") - - # Convert the message to a dict - message_dict = message.to_dict() - print(message_dict) # Prints: {'role': 'user', 'content': 'Hello, how are you?'} - -The following is an example demonstrating how to save a conversation -interaction to a database. - -.. code:: python - - from automata.llm.foundation import LLMChatMessage - from automata.core.base.database.relational import SQLDatabase - - # Given a SQL database instance and a conversation interaction - db = SQLDatabase() - interaction = {"role": "user", "content": "Good morning!"} - - # Save the message to the database - db.save_message(LLMChatMessage(**interaction)) - -Limitations ------------ - -``LLMChatMessage`` is essentially a structure providing interface for a -chat message object. It does not check the validity of the chat message -or analyze its text. Additional limitations depend on the -implementations in the related symbols. - -##Follow-up Questions: - -- What are the valid values for the ``role`` attribute in - ``LLMChatMessage``? -- Is there a limit on the ``content`` length for a chat message? If so, - how is a message beyond this limit handled? 
+       def to_dict(self) -> Dict[str, Any]:
+           return {'role': self.role, 'content': self.content}
diff --git a/docs/llm/llm_completion_result.rst b/docs/llm/llm_completion_result.rst index 346a69d2f..3cc3c5457 100644 --- a/docs/llm/llm_completion_result.rst +++ b/docs/llm/llm_completion_result.rst @@ -1,63 +1,12 @@ -LLMCompletionResult -===================
+LLMCompletionResult
+===================
+
+.. code:: python
+
+   class LLMCompletionResult(BaseModel):
+       """Base class for different types of LLM completion results."""
+
+       role: str
+       content: Optional[str] = None
-``LLMCompletionResult`` is a base class designed to manage different -types of LLM completion results. With two principal methods: -``get_content()`` and ``get_role()``, this class aids in fetching the -content and role associated with a completion result. -Related Symbols ----------------
+       def get_role(self) -> str:
+           """Get the role of the completion result."""
+           return self.role
-- ``automata.tests.unit.test_automata_agent.mock_openai_response_with_completion_message`` -- ``automata.llm.foundation.LLMConversation.get_latest_message`` -- ``automata.llm.providers.openai.OpenAIChatCompletionResult`` -- ``automata.tests.unit.sample_modules.sample_module_write.CsSWU`` -- ``automata.config.base.LLMProvider`` -- ``automata.llm.providers.openai.OpenAIConversation.get_latest_message`` -- ``automata.llm.providers.openai.OpenAIChatMessage`` -- ``automata.tests.unit.test_symbol_search_tool.test_retrieve_source_code_by_symbol`` -- ``automata.llm.foundation.LLMChatCompletionProvider.get_next_assistant_completion`` -- ``automata.tests.unit.sample_modules.sample.EmptyClass`` - -Example ------- - -In situations where it’s required to extract the content or role from a -completion result, ``LLMCompletionResult`` is applicable. Below is an -example illustrating its functionality. - -.. code:: python - - from automata.llm.foundation import LLMCompletionResult - - # create an instance of LLMCompletionResult with defined role and content attributes - completion_result = LLMCompletionResult(content="content of the completion result", role="assistant") - - # fetch the content - content = completion_result.get_content() - print(content) # output should be "content of the completion result" - - # fetch the role - role = completion_result.get_role() - print(role) # output should be "assistant" - -Limitations ------------ - -This class serves as a base class and it may not provide any specific -functionality beyond providing an interface for subclasses. Hence, if a -feature is not supported in this class, check the subclasses to see if -they have the feature needed. - -Follow-up Questions: --------------------- - -- What are some practical use-cases of the ``LLMCompletionResult``? -- Are there specific types of completion results this class can’t - handle? If so, what alternative methods or classes should we use for - such cases? - -- Are there any constraints or prerequisites for the content or the - role of the completion result? -- How does the ``LLMCompletionResult`` integrate with other components - of Automata? What’s its role in the broader scheme?
+       def get_content(self) -> Any:
+           """Get the content of the completion result."""
+           return self.content
diff --git a/docs/llm/llm_conversation.rst b/docs/llm/llm_conversation.rst index ce40676cb..2d17a49d3 100644 --- a/docs/llm/llm_conversation.rst +++ b/docs/llm/llm_conversation.rst @@ -1,102 +1,92 @@ LLMConversation =============== -``LLMConversation`` is an abstract base class for different types of -Language-Learning Model (LLM) conversations in the Automata framework. -It provides a blueprint for managing multiple conversations with -different observers in a multithreaded application scenario.
+``LLMConversation`` acts as an abstract base class defining the +essential features and behaviour for different types of LLM (large +language model) conversation models. It provides the structure for +conversation implementations, including getting messages, registering +and notifying observers, and resetting conversations. + +``LLMConversation`` also includes an internal +``LLMEmptyConversationError`` exception class raised when the +conversation is empty. Overview -------- -``LLMConversation`` uses the Observer design pattern to manage updates -to the state of the conversation. It contains abstract methods that -provide the structure for handling different types of LLM chat messages -and can be expanded and customized for specific implementations. As an -abstract base class, ``LLMConversation`` cannot be instantiated directly -and must be subclassed to be utilized. +``LLMConversation`` serves as the foundational class for any LLM +conversations. It specifies the necessary interface but does not +implement these methods, expecting the child classes to provide specific +implementations. Key methods available include message retrieval +options, observer management operations, and procedures for obtaining +conversation-specific information like length and the latest message.
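The observer registration and notification flow mentioned above can be sketched with a minimal, library-independent toy; the class names and the plain-dict messages below are stand-ins for the real automata classes:

```python
# Minimal sketch of the observer flow (register_observer / notify_observers)
# that the conversation base class provides; illustrative only.
class ToyConversation:
    def __init__(self):
        self._observers = []
        self.messages = []

    def register_observer(self, observer) -> None:
        self._observers.append(observer)

    def notify_observers(self) -> None:
        # Inform every registered observer that the conversation changed.
        for observer in self._observers:
            observer.update(self)

    def add_message(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        self.notify_observers()

class RecordingObserver:
    def __init__(self):
        self.seen = []

    def update(self, subject) -> None:
        self.seen.append(subject.messages[-1]["content"])

conv = ToyConversation()
obs = RecordingObserver()
conv.register_observer(obs)
conv.add_message("user", "Hello!")
print(obs.seen)  # ['Hello!']
```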
Related Symbols --------------- -- ``automata.llm.providers.openai.OpenAIConversation.get_latest_message`` -- ``automata.llm.foundation.LLMChatMessage`` -- ``automata.tests.unit.test_conversation_database.test_put_message_increments_interaction_id`` -- ``automata.tests.unit.test_conversation_database.test_get_messages_returns_all_messages_for_session`` -- ``automata.llm.providers.openai.OpenAIConversation`` -- ``automata.memory_store.agent_conversation_database.AgentConversationDatabase`` -- ``automata.core.base.patterns.observer.Observer`` +- ``automata.llm.llm_chat_message.LLMChatMessage`` +- ``automata.llm.llm_base.LLMEmptyConversationError`` +- ``automata.llm.observers.Observer`` Example ------- -Below is an example of how a subclass of ``LLMConversation`` might be -designed: +As ``LLMConversation`` is an abstract base class, it cannot be +instantiated directly. Instead, a child class inheriting from +``LLMConversation`` should implement all the abstract methods. Here’s an +example: .. 
code:: python - from automata.llm.foundation import LLMConversation, LLMChatMessage - from automata.core.base.patterns.observer import Observer + from typing import Any, Sequence + + from automata.llm.llm_base import LLMConversation + from automata.llm.llm_chat_message import LLMChatMessage - class CustomConversation(LLMConversation): + class SimpleLLMConversation(LLMConversation): def __init__(self): super().__init__() - self.messages = [] + self.conversation = [] - def __len__(self): - return len(self.messages) + @property + def messages(self) -> Sequence[LLMChatMessage]: + return self.conversation - def get_latest_message(self) -> LLMChatMessage: - return self.messages[-1] + def __len__(self) -> int: + return len(self.conversation) + + def get_messages_for_next_completion(self) -> Any: + return list(self.conversation) - def get_messages_for_next_completion(self): - return self.messages + def get_latest_message(self) -> LLMChatMessage: + return self.conversation[-1] if self.conversation else None def reset_conversation(self) -> None: - self.messages = [] - - # Subscribing an observer to the custom conversation - class CustomObserver(Observer): - def update(self, subject: LLMConversation) -> None: - print(f"Observer notified. Latest message: {subject.get_latest_message().to_dict()}") - - conversation = CustomConversation() - observer = CustomObserver() - conversation.register_observer(observer) - - # Create and add a message to the conversation - message = LLMChatMessage(role="user", content="Hello!") - conversation.messages.append(message) - # Notify observers of the change - conversation.notify_observers() - -In this script: 1. A custom conversation class is built by subclassing -``LLMConversation`` and defining the required methods. 2. An observer -class is built by subclassing ``Observer`` and implementing the -``update`` method. 3. An instance of the custom conversation is created -and an observer is registered. 4. 
A new message is created and added to -the conversation, and the ``notify_observers`` method is called to -update all registered observers. + self.conversation = [] Limitations ----------- -``LLMConversation`` is an abstract class and cannot be used until all -its abstract methods are implemented in the subclass. The -responsibilities attached to the abstract methods should be -well-understood before implementing a subclass. +``LLMConversation`` assumes that any inheriting class will provide +concrete implementations for all abstract methods. It’s thus crucial to +ensure that all these methods are adequately defined in child classes. + +Some methods, like notifying observers, assume a traditional observer +pattern. If a different design is used for managing observers, these +methods may need to be overridden or adapted. + +Lastly, the actual interaction with the chat infrastructure (for +example, sending and receiving LLMChatMessages) is not specified within +this class and should be implemented contextually in the subclasses or +surrounding code. Follow-up Questions: -------------------- -- Are there any performance considerations to keep in mind while - implementing the abstract methods, especially when conversations get - too long? -- What is the underlying infrastructure to support notifications to a - possibly large number of observers? Is there a limit on the number of - observers that can be registered to an instance of - ``LLMConversation``? -- What additional functionality might be useful to include in the base - ``LLMConversation`` class that would be universal across all types of - chatbot conversations? Can this be extended to include multimedia - messages along with text? +- How does ``LLMConversation`` interact directly with the LLM if it + requires message information? +- Are there guidelines or standards that must be observed when + implementing the abstract methods? 
For instance, what should be + considered the “next” messages for the + ``get_messages_for_next_completion`` method? -- How to handle updates to the class due to changes in observer + methods? How should the class be structured to accommodate potential + changes in the notification mechanism? diff --git a/docs/llm/llm_conversation_database_provider.rst b/docs/llm/llm_conversation_database_provider.rst index 0dbb5cb03..2d7e1442a 100644 --- a/docs/llm/llm_conversation_database_provider.rst +++ b/docs/llm/llm_conversation_database_provider.rst @@ -1,74 +1,69 @@ LLMConversationDatabaseProvider =============================== -``LLMConversationDatabaseProvider`` is an abstract base class for -different types of database providers intended to be used in an automata -environment. It contains methods that allow for the retrieval and -storage of messages. - Overview -------- -The ``LLMConversationDatabaseProvider`` class is a crucial component in -automata’s conversation functionality. It includes two abstract methods, -``get_messages`` and ``save_message``, which must be implemented by any -concrete class inheriting from it to retrieve and store messages -respectively. Additionally, the ``update`` method, inherited from the -``Observer`` pattern, is implemented to update the database when the -``LLMConversation`` changes. +``LLMConversationDatabaseProvider`` is an abstract base class +derived from ``Observer``, ``SQLDatabase``, and ``ABC``. +Being designed as an interface, this class provides an outline for other +database providers to follow. It is used to manage a database of +conversations in the large language model (LLM) framework, which +provides interactive communication with users using ``LLMChatMessage`` +objects. + +The class itself includes multiple methods such as ``update``, +``save_message``, and ``get_messages``. The ``update`` method is a +concrete ``Observer`` method that updates the database whenever the +conversation changes. 
``save_message`` and ``get_messages`` are abstract +methods that should be implemented in the child classes for saving a +message to the database and retrieving all messages from the database +for a given session id, respectively. Related Symbols --------------- -- ``automata.llm.foundation.LLMConversation.get_latest_message`` -- ``automata.memory_store.agent_conversation_database.AgentConversationDatabase`` -- ``automata.llm.providers.openai.OpenAIConversation.get_latest_message`` - -Usage example ------------- - -The following is a simple example demonstrating a concept of how -``LLMConversationDatabaseProvider`` may be used. +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase.update_entry`` +- ``automata.singletons.github_client.RepositoryClient.stage_all_changes`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase._prepare_entry_for_insertion`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase._prepare_entries_for_insertion`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase.batch_update`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase._check_duplicate_entry`` +- ``automata.cli.env_operations.update_graph_type`` +- ``automata.symbol.graph.symbol_graph_base.GraphProcessor.process`` +- ``automata.core.utils.get_embedding_data_fpath`` +- ``automata.singletons.github_client.RepositoryClient.branch_exists`` + +Example +------- + +A standard usage example cannot be provided because +``LLMConversationDatabaseProvider`` is an abstract base class and cannot +be directly instantiated. Instead, a concrete class should inherit from +``LLMConversationDatabaseProvider`` and implement its abstract methods. +Here is a general pattern for this: .. 
code:: python - class ExampleDatabaseProvider(LLMConversationDatabaseProvider): - def __init__(self): - # some potential implementation for a specific type of database - pass - - def get_messages(self) -> List[LLMChatMessage]: - """Fetches all messages from the implemented database.""" - pass - - def save_message(self, message: LLMChatMessage) -> None: - """Saves a message to the implemented database.""" - pass + class MyDatabaseProvider(LLMConversationDatabaseProvider): + def save_message(self, session_id: str, message: LLMChatMessage) -> None: + # Implement the method as necessary for your class + ... -The above example replaces the abstract methods of -``LLMConversationDatabaseProvider`` with simple illustrations. In an -actual deployment scenario, a specific database technology (like SQLite, -PostgreSQL, etc.) would be implemented in the -``ExampleDatabaseProvider``. + def get_messages(self, session_id: str) -> List[LLMChatMessage]: + # Implement the method as necessary for your class + ... Limitations ----------- -The ``LLMConversationDatabaseProvider`` class does not include any -implementation details, as it is an abstract base class. The -effectiveness, efficiency, and abilities of any concrete class that -inherits ``LLMConversationDatabaseProvider`` would depend on its -implementation. +The primary limitation of this class is that it cannot be used +straightaway due to its abstract nature. It needs to be inherited and +its abstract methods need to be overridden for it to be useful. Also, the +class methods mainly handle ``LLMChatMessage`` objects and as such, it +might not be suitable for different types of message objects. 
-- In what scenarios is the ``update`` method called to reflect changes - in the LLM conversation? -- How is concurrency managed in the database operations? -- Are there any specific databases that work better or worse with the - system this class is part of? +- How can the class handle other types of message objects (not + only ``LLMChatMessage``)? +- Could the updates be made in a more efficient or optimized way? diff --git a/docs/llm/providers/index.rst b/docs/llm/providers/index.rst index 2ce7f3f12..2c0ecfdfa 100644 --- a/docs/llm/providers/index.rst +++ b/docs/llm/providers/index.rst @@ -25,6 +25,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/llm/providers/open_ai_chat_completion_result.rst b/docs/llm/providers/open_ai_chat_completion_result.rst index ac5c724ac..54a1304b8 100644 --- a/docs/llm/providers/open_ai_chat_completion_result.rst +++ b/docs/llm/providers/open_ai_chat_completion_result.rst @@ -1,94 +1,87 @@ -``OpenAIChatCompletionResult`` -============================== +OpenAIChatCompletionResult +========================== Overview -------- -``OpenAIChatCompletionResult`` is a class that represents a chat -response from the OpenAI API. This class takes in raw messages from the -chat API and structures them for further use. Apart from regular chat -message contents, it can also include function calls which the API may -return during an ongoing chat. +``OpenAIChatCompletionResult`` is a class that represents a completion +result retrieved from the OpenAI API. It is utilized within the +``automata.llm.providers.openai_llm`` namespace and is designed to +handle and structure the results returned from chat-based tasks using +OpenAI models. + +The class primarily encapsulates the role and content of the message +received from the completion, as well as any function call attached to +the message, if it’s present. 
It can then provide this information in a +structured format, facilitating easier access and manipulation. The key +methods include ``__str__``, providing a string representation of the +class, and ``get_function_call``, which extracts the function call from +the completion result if available. Related Symbols --------------- -- ``LLMChatCompletionProvider`` -- ``LLMChatMessage`` -- ``FunctionCall`` -- ``OpenAIChatMessage`` -- ``OpenAIChatCompletionProvider`` +- ``automata.llm.llm_completion_result.LLMCompletionResult`` +- ``automata.llm.function_call.FunctionCall`` +- Various methods in scripts such as + ``run_update_tool_eval.get_processed_paths``, + ``run_update_tool_eval.load_json_data``, and + ``run_update_tool_eval.filter_entries`` that handle JSON data + processing and path management. -Initialization --------------- +Example +------- -``OpenAIChatCompletionResult`` class can be initialized by providing the -raw data from OpenAI chat API. The ``__init__`` function processes the -raw data and assigns it to class variables. +The example assumes that you’ve already made a call to the OpenAI API +and received a response. .. code:: python - from automata.llm.providers.openai import OpenAIChatCompletionResult + from automata.llm.providers.openai_llm import OpenAIChatCompletionResult - # Example raw data from OpenAI chat API + # raw_data is the assumed response from the OpenAI API raw_data = { - "choices": [ + 'choices': [ { - "message": { - "role": "assistant", - "content": "Hello, how can I assist you today?", - "function_call": None + 'message': { + 'role': 'system', + 'content': 'Hello, world!', + 'function_call': None } } ] } + # Create an instance of OpenAIChatCompletionResult completion_result = OpenAIChatCompletionResult(raw_data) -Methods -------- - -``OpenAIChatCompletionResult`` class has a few methods for handling and -representing the data it encapsulates: - -- ``__str__``: Produces a string representation of the completion - result. 
-- ``from_args``: A class method for creating an instance of - ``OpenAIChatCompletionResult``. -- ``get_function_call``: Returns the ``FunctionCall`` object if - present. If no function call is present, it returns None. - -Usage Example ------------- + # Get string representation + print(str(completion_result)) + # Output: system:\ncontent=Hello, world!\nfunction_call=None -Following is an example of a using ``from_args`` method to create an -instance and printing it out using the ``__str__`` method: - -.. code:: python - - # Import necessary classes - from automata.llm.providers.openai import OpenAIChatCompletionResult - - # Create an instance using the `from_args` class method - completion_result = OpenAIChatCompletionResult.from_args("assistant", "Hello, how can I assist you today?", None) - - # Use the `__str__` method to print the instance - print(completion_result) + # Get function call (if any available) + function_call = completion_result.get_function_call() + # function_call is None for this payload Limitations ----------- -One possible limitation of the ``OpenAIChatCompletionResult`` is its -strict reliance on the OpenAI API’s ``choices`` output structure. Any -changes in the API’s response structure can potentially break the -functionality of this class. +One key limitation of this class is its dependence on the specific +structure of the OpenAI API’s response. If the API changes its response +format, the ``OpenAIChatCompletionResult`` class may break or return +misleading results. + +Furthermore, as of now, the class doesn’t perform any sort of data or +type validation for the inputs provided during object instantiation, +which may lead to runtime exceptions or errors. -Follow-Up Questions: +Follow-up Questions: -------------------- -- What happens if the OpenAI API changes the response structure? Do we - have a mechanism to handle these changes? 
-- Is there a validation step to check the integrity and format of the - raw data received from the OpenAI API before it’s processed? -- Is there a way to handle other roles besides “assistant”? Are other - roles anticipated in the future? +- How does the system handle situations where the OpenAI API response + does not match the expected format? +- Does the system incorporate any mechanism to validate the + ``raw_data`` input in its current state? Can this functionality be + added? +- In the event of API changes, how would the transition be managed, and + are there any adjustments required at the user level? diff --git a/docs/llm/providers/open_ai_chat_message.rst b/docs/llm/providers/open_ai_chat_message.rst index 15103471d..4558247e4 100644 --- a/docs/llm/providers/open_ai_chat_message.rst +++ b/docs/llm/providers/open_ai_chat_message.rst @@ -1,75 +1,59 @@ OpenAIChatMessage ================= +``OpenAIChatMessage`` is a class that acts as a representation for a +processed message that is sent TO or FROM the OpenAI LLM Chat API. + Overview -------- -``OpenAIChatMessage`` is a class that represents a processed chat -message TO or FROM the OpenAI LLM Chat API. It provides convenient -methods to parse and generate messages compatible with the OpenAI Chat -API. - -This class is a part of the ``automata.llm.providers.openai`` module and -extends the ``LLMChatMessage`` base class, adding unique fields and -methods suitable for communication with the OpenAI API. +``OpenAIChatMessage`` inherits from the superclass ``LLMChatMessage``. +It is initialized with the role, content, and optionally a function +call. It provides methods to represent the message as a string, convert +it to a dictionary and create a chat message from a completion result. 
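The role/content/function-call handling described in the overview can be illustrated with a toy stand-in. ``MiniChatMessage`` and its behaviour of omitting unset fields in ``to_dict`` are assumptions made for this sketch, not the actual ``OpenAIChatMessage`` implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class MiniChatMessage:
    """Toy stand-in for a chat message with role, content and an
    optional function call; field omission below is an assumption."""

    role: str
    content: Optional[str] = None
    function_call: Optional[Dict[str, Any]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Only include fields that are actually set, mirroring the
        # general shape of OpenAI chat payloads.
        payload: Dict[str, Any] = {"role": self.role}
        if self.content is not None:
            payload["content"] = self.content
        if self.function_call is not None:
            payload["function_call"] = self.function_call
        return payload


message = MiniChatMessage(role="assistant", content="Hi there")
payload = message.to_dict()
```

A message that carries only a role serializes to a one-key dictionary, which is why the optional fields are filtered rather than emitted as ``None``.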
Related Symbols --------------- -- ``automata.llm.providers.openai.OpenAIChatCompletionResult`` -- ``automata.llm.providers.openai.OpenAIConversation`` -- ``automata.llm.providers.openai.OpenAIChatCompletionProvider`` -- ``automata.llm.foundation.LLMChatMessage`` - -Example ------- +- ``automata.singletons.github_client.GitHubClient.remove_label`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase.entry_to_key`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase.add`` +- ``automata.cli.scripts.run_doc_embedding.parse_dotpaths`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase.batch_add`` +- ``automata.core.utils.is_sorted`` -Below is an example of creating a message, converting it to a -dictionary, and retrieving it from a completion result: +Usage Example +------------- .. code:: python - from automata.llm.providers.openai import FunctionCall, OpenAIChatCompletionResult, OpenAIChatMessage + from automata.llm.providers.openai_llm import OpenAIChatMessage, OpenAIChatCompletionResult - # The function call - function_call = FunctionCall.from_response_dict({ - "name": "call_termination", - "arguments": '{"result": "Success"}', - }) + # Create a completion result + completion_result = OpenAIChatCompletionResult.from_args(role="system", content="Hello, I'm an AI.", function_call=None) - # Create an OpenAI Chat Message instance - message = OpenAIChatMessage(role="assistant", function_call=function_call) + # Create a chat message from a completion result + message = OpenAIChatMessage.from_completion_result(completion_result) - # Convert message to dictionary - message_dict = message.to_dict() + # Convert the message to a string + print(str(message)) # Outputs: "OpenAIChatMessage(role=system, content=Hello, I'm an AI., function_call=None)" - # Create a mock OpenAI Chat Completion Result - completion_result = OpenAIChatCompletionResult.from_args( - role="assistant", - content=None, 
function_call=function_call - ) - - # Retrieve the OpenAI Chat Message from the completion result - retrieved_message = OpenAIChatMessage.from_completion_result(completion_result) + # Convert the message to a dictionary + print(message.to_dict()) # Outputs: {'role': 'system', 'content': "Hello, I'm an AI."} Limitations ----------- -This class assumes that the ``OpenAIChatCompletionResult`` already has -the required fields parsed in the expected format. Consequently, if the -OpenAI API changes its response format, the ``from_completion_result`` -method may not function as expected. - -Machines created from ``OpenAIChatMessage`` may not contain a -``function_call`` field if the processed message does not instruct a -function call. +The methods within ``OpenAIChatMessage`` are all directly related to the +OpenAI LLM Chat API and do not have wider applications outside of their +specific context. The class is designed to specifically interact with +the OpenAI LLM Chat API, so it cannot be used as a general-purpose chat +message manipulator. -Follow-up Questions: --------------------- +Follow-up Questions +------------------- -- How does ``OpenAIChatMessage`` handle unexpected - ``OpenAIChatCompletionResult`` structures? -- Are there safety measures in place to ensure ``OpenAIChatMessage`` - instances are created correctly when a ``function_call`` field is - missing from a message? +- Can this class be modified to support more general use, outside of + the OpenAI LLM Chat API? +- How can we extend the functionality to incorporate more features of + the Chat API, like message logs or instructions? diff --git a/docs/llm/providers/open_ai_conversation.rst b/docs/llm/providers/open_ai_conversation.rst index 17445f6c8..7fcdada23 100644 --- a/docs/llm/providers/open_ai_conversation.rst +++ b/docs/llm/providers/open_ai_conversation.rst @@ -1,76 +1,82 @@ OpenAIConversation ================== -``OpenAIConversation`` is a class that represents a conversation with -the OpenAI API. 
It manages the series of messages that are part of the -conversation flow. The class includes methods to add messages, get the -latest message, get all messages for the next completion, and reset the -conversation. The OpenAIConversation class is heavily used in -interactions within the agent classes. - Overview -------- -``OpenAIConversation`` provides a way to manage and manipulate the -conversation of an agent with the OpenAI API. Each message in the -conversation is an instance of OpenAIChatMessage. The primary purpose of -``OpenAIConversation`` is to keep track of the series of messages in the -conversation. Each new message is appended to the list of messages and -can be retrieved when required. An important aspect is that the -``OpenAIConversation`` only accepts messages of type -``OpenAIChatMessage``. +``OpenAIConversation`` is a class provided by Automata’s OpenAI large +language model (LLM) providers. It represents a conversation with the OpenAI +API. It holds a list of messages as an instance variable and provides +methods to interact with this list of messages such as adding a message, +getting messages for the next completion, getting the latest message, +and resetting the conversation. + +Main properties and methods of the ``OpenAIConversation`` include: + +- ``messages``: A list that contains all the messages in the current + conversation. +- ``add_message(message: LLMChatMessage, session_id: Optional[str]) -> None``: + This method adds a message to the conversation. +- ``get_messages_for_next_completion() -> List[Dict[str, Any]]``: This + method provides a list of all messages in the current conversation + prepared for the next completion. +- ``get_latest_message() -> LLMChatMessage``: This method returns the + latest message in the conversation. +- ``reset_conversation() -> None``: This method empties the list of + messages, thus resetting the conversation. 
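The interface listed above can be sketched with plain Python classes. ``SketchMessage`` and ``SketchConversation`` are invented stand-ins for illustration; the real classes enforce ``OpenAIChatMessage`` types and carry more state.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchMessage:
    """Simplified stand-in for LLMChatMessage (role and content only)."""

    role: str
    content: str

    def to_dict(self) -> Dict[str, Any]:
        return {"role": self.role, "content": self.content}


class SketchConversation:
    """Mirrors the OpenAIConversation interface listed above."""

    def __init__(self) -> None:
        self.messages: List[SketchMessage] = []

    def add_message(self, message: SketchMessage, session_id: Optional[str] = None) -> None:
        self.messages.append(message)

    def get_messages_for_next_completion(self) -> List[Dict[str, Any]]:
        # Chat-style endpoints expect plain dicts, so serialize here.
        return [m.to_dict() for m in self.messages]

    def get_latest_message(self) -> SketchMessage:
        return self.messages[-1]

    def reset_conversation(self) -> None:
        self.messages = []


conversation = SketchConversation()
conversation.add_message(SketchMessage("user", "Hi"))
conversation.add_message(SketchMessage("assistant", "Hello"))
```

Keeping serialization inside ``get_messages_for_next_completion`` means the conversation stores rich message objects internally while still producing the flat list a completion call needs.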
Related Symbols --------------- -- ``automata.llm.providers.openai.OpenAIChatMessage`` -- ``automata.llm.providers.openai.OpenAIChatCompletionProvider`` -- ``automata.agent.providers.OpenAIAutomataAgent`` -- ``automata.llm.foundation.LLMChatMessage`` +- ``automata.llm.providers.openai_llm.LLMChatMessage`` +- ``automata.llm.providers.openai_llm.OpenAIChatMessage`` +- ``automata.llm.providers.openai_llm.OpenAIIncorrectMessageTypeError`` Example ------- -Here is an example demonstrating how to create and manage messages in an -``OpenAIConversation``: +Below is a simple usage example of how to interact with the OpenAI API +using the ``OpenAIConversation`` class. .. code:: python - from automata.llm.providers.openai import OpenAIConversation, OpenAIChatMessage + from automata.llm.providers.openai_llm import OpenAIConversation, OpenAIChatMessage - # create conversation + # Initialize the OpenAIConversation object. conversation = OpenAIConversation() - # create a message and add it into the conversation - message = OpenAIChatMessage(role="assistant", content="Hello, I am an assistant.") - conversation.add_message(message) + # Create a message. + message = OpenAIChatMessage(role="user", content="Hello, OpenAI!") - # retrieve the latest message - latest_message = conversation.get_latest_message() - print(latest_message) # OpenAIChatMessage object + # Add the message to the conversation. + conversation.add_message(message, None) - # retrieve all messages for next completion - messages_for_completion = conversation.get_messages_for_next_completion() - print(messages_for_completion) # list of messages + # Fetch the latest message in the conversation. + latest_message = conversation.get_latest_message() + print(latest_message) - # reset the conversation + # Reset the conversation. 
conversation.reset_conversation() - # checking the length of conversation after reset - print(len(conversation)) # Output: 0 Limitations ----------- -One limitation of ``OpenAIConversation`` is that it only accepts -messages of the type ``OpenAIChatMessage``. This could make it less -flexible if a different message class needs to be used in certain -situations. +A significant limitation of the ``OpenAIConversation`` class is the lack +of support for asynchronous operations. All operations are performed +synchronously which can lead to blocking of the entire application if +the operations are time-consuming, like in a live chat implementation. + +Another limitation is that the conversation is stateful. Once a message +is added to the conversation, it cannot be removed. This makes it +difficult to manage long conversations. While there is a method to reset +the entire conversation (``reset_conversation``), there’s no way to +manipulate individual messages within the conversation. Follow-up Questions: -------------------- -- Is there a way to extend the OpenAIConversation to handle more types - of chat messages? -- How does the class interact with other parts, like agent classes or - completion providers, to contribute to the overall functionality of - the Automata library? +- Is there a way to support asynchronous operations with + ``OpenAIConversation``? +- Can there be methods incorporated to manage (add or remove) + individual messages within the conversation? +- How can the ``OpenAIConversation`` handle much larger conversations? 
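One common workaround for the synchronous-only limitation noted above is to offload the blocking call to a worker thread. This is a general asyncio pattern rather than a feature of ``OpenAIConversation``; ``blocking_completion`` is a stand-in for a real blocking API call.

```python
import asyncio
from typing import List


def blocking_completion(messages: List[str]) -> str:
    """Stand-in for a synchronous completion call made via the conversation."""
    return f"reply to: {messages[-1]}"


async def complete_async(messages: List[str]) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker
    # thread, keeping the event loop free for other chat sessions.
    return await asyncio.to_thread(blocking_completion, messages)


result = asyncio.run(complete_async(["Hello, OpenAI!"]))
```

This keeps the conversation object itself unchanged; only the call site moves the blocking work off the event loop.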
diff --git a/docs/llm/providers/open_ai_embedding_provider.rst b/docs/llm/providers/open_ai_embedding_provider.rst index e5efbfc16..533b04777 100644 --- a/docs/llm/providers/open_ai_embedding_provider.rst +++ b/docs/llm/providers/open_ai_embedding_provider.rst @@ -1,77 +1,77 @@ OpenAIEmbeddingProvider ======================= -``OpenAIEmbeddingProvider`` is a class in the Automata codebase that is -used to generate embeddings from the OpenAI API. The class works by -passing a given source text to the OpenAI API, which then returns an -embedding in the form of a numpy array. +``OpenAIEmbeddingProvider`` is a class that extracts embeddings from the +OpenAI API. It is subclassed from ``EmbeddingVectorProvider``. Overview -------- -``OpenAIEmbeddingProvider`` implements ``EmbeddingVectorProvider``, and -uses the OpenAI API to generate embeddings for given input text. This -class relies heavily on OpenAI’s API and therefore, a key feature of -this embedding provider is its flexibility as the capability of the -provider will extend with any future enhancements made to the core API. +The ``OpenAIEmbeddingProvider`` class provides methods to create +embeddings from a source text or a batch of texts using the OpenAI API. Its main +functionality is concentrated in two methods: +``build_embedding_vector()`` and ``batch_build_embedding_vector()``. The +first method generates an embedding for a single string of text, while +the latter performs the same operation for multiple strings contained +within a list. -In this class, the engine used for generating embeddings is specified at -the time of object initialization, and the default engine used is -“text-embedding-ada-002”. +The class needs the OpenAI API key to be set for it to work properly. By +default, it utilizes the ‘text-embedding-ada-002’ engine. However, it +can also operate with the engine designated in the constructor at object +creation. 
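To show what the two build methods return and how the resulting vectors are typically consumed, here is a self-contained sketch. ``ToyEmbeddingProvider`` mimics only the method names from the overview and swaps the OpenAI call for a deterministic local character-frequency embedding, so no API key is needed; cosine similarity is the standard way such vectors are compared.

```python
import math
from typing import List


class ToyEmbeddingProvider:
    """Shares the build/batch interface described above, but computes a
    deterministic character-frequency vector locally (illustration only)."""

    _ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def build_embedding_vector(self, source: str) -> List[float]:
        text = source.lower()
        return [float(text.count(ch)) for ch in self._ALPHABET]

    def batch_build_embedding_vector(self, sources: List[str]) -> List[List[float]]:
        return [self.build_embedding_vector(s) for s in sources]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


provider = ToyEmbeddingProvider()
vector = provider.build_embedding_vector("openai research")
batch = provider.batch_build_embedding_vector(["openai", "research"])
similarity = cosine_similarity(batch[0], batch[1])
```

The real provider returns much higher-dimensional vectors, but the consumption pattern (one vector per input, compared by cosine similarity) is the same.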
Related Symbols --------------- -- ``automata.embedding.base.EmbeddingVectorProvider`` -- ``automata.llm.foundation.LLMChatCompletionProvider`` -- ``automata.llm.foundation.LLMChatMessage`` -- ``automata.llm.foundation.LLMCompletionResult`` -- ``automata.llm.foundation.LLMConversation`` -- ``automata.singletons.dependency_factory.DependencyFactory`` -- ``automata.config.base.LLMProvider`` -- ``automata.tools.base.Tool`` +The related symbols for the ``OpenAIEmbeddingProvider`` class are +methods imported from the ``openai.embeddings_utils`` module. They are: +- ``get_embedding()`` - ``get_embeddings()`` Example ------- -Below is an example demonstrating how to use the -``OpenAIEmbeddingProvider``: +Here is an example demonstrating the usage of the +``OpenAIEmbeddingProvider`` class. This includes the full process of +creating an instance, building an embedding vector, and a batch of +vectors. .. code:: python - from automata.llm.providers.openai import OpenAIEmbeddingProvider + from automata.llm.providers.openai_llm import OpenAIEmbeddingProvider import numpy as np - # Create an instance of OpenAIEmbeddingProvider - embedding_provider = OpenAIEmbeddingProvider(engine="text-embedding-ada-002") + # Instantiating the provider using the default engine + provider = OpenAIEmbeddingProvider() - # Generate the embedding for a text - source_text = "This is an example text." - embedding = embedding_provider.build_embedding_vector(source_text) + # Building an embedding vector for a single source + source_text = "OpenAI is an artificial intelligence research lab." 
+ embedding_vector = provider.build_embedding_vector(source_text) + print(embedding_vector) # Outputs the resulting numpy array - # Make sure the embedding is a numpy array - assert isinstance(embedding, np.ndarray) + # Building embedding vectors for a batch of sources + sources_batch = ["OpenAI was founded in December 2015.", + "The lab is associated with Elon Musk."] + batch_embedding_vector = provider.batch_build_embedding_vector(sources_batch) + for vector in batch_embedding_vector: + print(vector) # Outputs numpy arrays Limitations ----------- -One of the main limitations of the ``OpenAIEmbeddingProvider`` is that -its performance and capabilities are directly linked to the OpenAI API. -This means that any limitations in the API, such as maximum input text -size or rate limits, will also apply to the ``OpenAIEmbeddingProvider``. - -For testing purposes, ``OpenAIEmbeddingProvider`` makes use of mocking -to simulate the behavior of actual objects. The mock objects are -instances of the ``Mock`` or ``MagicMock`` class in the -``unittest.mock`` module, which is a built-in module for constructing -mock objects in Python. +The ``OpenAIEmbeddingProvider`` class is reliant on the OpenAI API. As a +result, if the OpenAI API is down or inaccessible, it will also be +unable to function properly. Furthermore, the class requires an OpenAI +API key to operate, which might be a hurdle if you’re not an OpenAI +user. Also, the quality of embeddings depends upon the chosen engine. By +default it uses ‘text-embedding-ada-002’ engine but OpenAI provides +other engines too which might give different results as per their +training. Follow-up Questions: -------------------- -- How does ``OpenAIEmbeddingProvider`` handle potential rate limit - restrictions from the OpenAI API? -- What are the specific error handling strategies in place for API - failures? -- How can customization be introduced to enhance the use of different - ‘engine’ types for different requirements? 
+- What are the different engines supported by OpenAI for text + embedding? +- How to handle the situation if OpenAI API is down momentarily? +- Is there a way to use a different API key for different instances of + ``OpenAIEmbeddingProvider``? diff --git a/docs/llm/providers/open_ai_function.rst b/docs/llm/providers/open_ai_function.rst index 8947a987d..c64ba5b75 100644 --- a/docs/llm/providers/open_ai_function.rst +++ b/docs/llm/providers/open_ai_function.rst @@ -1,79 +1,70 @@ OpenAIFunction ============== -``OpenAIFunction`` represents a callable function in the OpenAI agent. -It encapsulates required information related to a function such as its -name, description, properties, and optional parameters. +``OpenAIFunction`` is a class that represents a function callable by the +OpenAI agent. It provides methods to convert this function definition +into a dictionary, and to present the function in a format similar to +the way OpenAI handles it internally. -Detailed Description --------------------- +Overview +-------- -The ``OpenAIFunction`` class encapsulates the necessary details needed -to define a function that can be used by the OpenAI agent. The -information includes the name, description, properties, and a list of -required properties for the function. The class also provides the -``to_dict`` method to get the information about the function in a -dictionary format. +The ``OpenAIFunction`` class helps define a function with name, +description, properties, and required fields. After defining this +function, you can use the ``to_dict`` method to transform this function +definition into a dictionary. Additionally, the ``prompt_format`` +property can be used to obtain the function definition in the format +used by OpenAI internally. Related Symbols --------------- -- ``automata.tests.unit.sample_modules.sample.sample_function``: An - example of a function that can be represented by ``OpenAIFunction``. 
-- ``automata.agent.providers.OpenAIAutomataAgent.functions``: A method - that returns a list of ``OpenAIFunction`` instances representing the - available functions for the agent. -- ``automata.llm.providers.openai.OpenAITool``: A class representing a - tool that can be used by the OpenAI agent which utilizes - ``OpenAIFunction``. +The ``OpenAIFunction`` class typically operates independently and does +not explicitly relate to other symbols. -Usage Example -------------- +Example +------- -The following is an example demonstrating how to create an instance of -``OpenAIFunction`` and get its data in dictionary format using -``to_dict`` method: +Here’s an example of defining a new function using ``OpenAIFunction``, +converting it to a dictionary and retrieving its prompt format. .. code:: python - from automata.llm.providers.openai import OpenAIFunction + from automata.llm.providers.openai_llm import OpenAIFunction - # Initialize OpenAIFunction object + # define a function function = OpenAIFunction( - name="Sample Function", - description="This is a sample function", - properties={"Parameter 1": {"description": "Description for parameter 1"}}, - required=["Parameter 1"] + name="get_current_weather", + description="Get the current weather in a given location", + properties={ + "location": { + "type": "string", + "description": "The city and state, e.g. San Francisco, CA", + }, + "unit": { + "type": "string", + "description": "Unit of measurement, either 'celsius' or 'fahrenheit'." + } + }, + required=["location"], ) - # Get function information in a dictionary format - function_info = function.to_dict() - print(function_info) + # convert it to a dictionary + function_dict = function.to_dict() -In the above example, we first import the ``OpenAIFunction`` class. We -then create an instance of ``OpenAIFunction`` named ``function``, -providing its necessary details such as the name, description, -properties, and the list of required properties. 
Finally, we get the -information about ``function`` in the form of a dictionary using the -``to_dict`` method, and print this information. + # retrieve its prompt format + function_prompt_format = function.prompt_format Limitations ----------- -The main limitation of ``OpenAIFunction`` is that it strictly assumes -the defined function resides in the OpenAI agent. Externally defined -functions cannot be passed directly, and need to be encapsulated in -``OpenAIFunction`` for the agent to use them. - -Next, note that the ``properties`` argument in -``OpenAIFunction.__init__()`` expects a dictionary where each key-value -pair defines a parameter. We can probably make this more specific to -provide better context about the parameters. +A limitation of ``OpenAIFunction`` is that it assumes OpenAI’s +internal format when returning the function definition from the +``prompt_format`` property. Follow-up Questions: -------------------- -- Can we include an example demonstrating how to define a function that - can be utilized by ``OpenAIFunction``? -- What are the specific attributes that should be included in the - ``properties`` argument when defining an ``OpenAIFunction`` instance? +- What are the possible types for the parameters in the ``properties`` + dictionary while defining a function using ``OpenAIFunction``? +- How can complex properties be represented using ``OpenAIFunction``? diff --git a/docs/llm/providers/open_ai_incorrect_message_type_error.rst b/docs/llm/providers/open_ai_incorrect_message_type_error.rst index 71a42a330..898d50f13 100644 --- a/docs/llm/providers/open_ai_incorrect_message_type_error.rst +++ b/docs/llm/providers/open_ai_incorrect_message_type_error.rst @@ -1,65 +1,6 @@ -OpenAIIncorrectMessageTypeError =============================== +class OpenAIIncorrectMessageTypeError(Exception): -``OpenAIIncorrectMessageTypeError`` is an error class that is raised -when the type of message provided is not of the expected -``OpenAIChatMessage`` type.
+:: -Overview --------- - -The class is used in various methods in OpenAI-based classes, where it -helps in maintaining the correct type of data being used for the -communication with the OpenAI API. - -Related Symbols ---------------- - -1. ``automata.tests.unit.test_automata_agent.mock_openai_response_with_completion_message`` -2. ``automata.tests.unit.test_automata_agent.test_run_with_completion_message`` -3. ``automata.tests.unit.test_automata_agent.test_run_with_no_completion`` -4. ``automata.llm.providers.openai.OpenAIConversation`` -5. ``automata.tests.unit.test_automata_agent.test_build_initial_messages`` -6. ``automata.llm.providers.openai.OpenAIConversation.add_message`` -7. ``automata.tests.unit.test_automata_agent.test_iter_step_without_api_call`` -8. ``automata.agent.providers.OpenAIAutomataAgent`` -9. ``automata.tests.unit.test_automata_agent_builder.test_builder_invalid_input_types`` -10. ``automata.llm.providers.openai.OpenAIConversation.__init__`` - -Example -------- - -The following is an example demonstrating a likely use case for -``OpenAIIncorrectMessageTypeError``. This example supposes a case where -a message to OpenAIConversation of incorrect type is passed and the -error is raised. - -.. code:: python - - from automata.llm.providers.openai import OpenAIConversation, OpenAIIncorrectMessageTypeError - - try: - conversation = OpenAIConversation() - message = "This is a sample message." # Should be of type OpenAIChatMessage - - conversation.add_message(message) # Adds message to the conversation - except OpenAIIncorrectMessageTypeError: - print("Incorrect message type provided.") - -Limitations ------------ - -The ``OpenAIIncorrectMessageTypeError`` class does not provide methods -to automatically correct the type of the message and thus places the -responsibility of ensuring correct message type on the user. 
- -Follow-up Questions ------------------- - -- Is there a specific reason for not including automatic type - correction within the ``OpenAIIncorrectMessageTypeError`` class? -- Could the design of the ``OpenAIIncorrectMessageTypeError`` class be - improved to allow for more user-friendly data type handling? -- Are there other similar type error classes within the OpenAI suite of - APIs and does ``OpenAIIncorrectMessageTypeError`` interact with them - in any way? + def __init__(self, message: Any) -> None: + super().__init__(f'Expected message to be of type OpenAIChatMessage, but got {type(message)}') diff --git a/docs/llm/providers/open_ai_tool.rst b/docs/llm/providers/open_ai_tool.rst index 46a73a5f4..31fcc98a9 100644 --- a/docs/llm/providers/open_ai_tool.rst +++ b/docs/llm/providers/open_ai_tool.rst @@ -1,77 +1,77 @@ OpenAITool ========== +``OpenAITool`` is a class in the OpenAI large language model (LLM) +provider layer that helps in using the OpenAI agent. By standardizing the +necessary components, such as the properties and required attributes needed +for OpenAI functions, it offers structured integration and usage of OpenAI. + Overview -------- -``OpenAITool`` is a class intended to represent a tool that can be -implemented by the OpenAI agent. This class mainly provides -functionalities for initializing OpenAI tools with specific functions, -names, descriptions, properties, and requirements. The initialization -process of ``OpenAITool`` involves invoking the ``OpenAIFunction`` -class. - -This class is primarily used by OpenAI’s toolkit builders, such as -``ContextOracleOpenAIToolkitBuilder``, ``PyWriterOpenAIToolkitBuilder``, -and ``SymbolSearchOpenAIToolkitBuilder``, to create lists of -``OpenAITool`` instances for OpenAI. +``OpenAITool`` helps streamline operations with the OpenAI agent. It +holds the properties required to use the OpenAI function.
The tool’s +properties are stored in a dictionary and the names of its +required properties in a list, with the function and its description +stored separately. Upon +initialization, the function, name, description, properties, and the +required list are passed to set up the tool. Related Symbols --------------- -- ``automata.embedding.base.EmbeddingVectorProvider`` -- ``automata.llm.foundation.LLMChatCompletionProvider`` -- ``automata.llm.foundation.LLMChatMessage`` -- ``automata.llm.foundation.LLMCompletionResult`` -- ``automata.llm.foundation.LLMConversation`` -- ``automata.tools.base.Tool`` -- ``automata.tests.unit.test_tool.TestTool`` +No related symbols are documented for ``OpenAITool`` in this context; +likely candidates are modules or classes that use ``OpenAITool`` or are +used by ``OpenAITool``, such as ``OpenAIFunction``. Example ------- -Below is an example of how to instantiate an ``OpenAITool`` using the -test tool as a function, which simply returns a string “TestTool -response” irrespective of the input provided. +The following is an example demonstrating how to use the ``OpenAITool`` +class. .. 
code:: python - from automata.llm.providers.openai import OpenAITool - from automata.tests.unit.test_tool import TestTool + from automata.llm.providers.openai_llm import OpenAITool - tool = TestTool( - name="TestTool", - description="A test tool for testing purposes", - function=lambda x: "TestTool response", - ) + def my_function(input_string): + # insert function logic here + pass + + properties = { + 'property_1': {'description': 'description', 'type': 'str'}, + 'property_2': {'description': 'description', 'type': 'int'} + } openai_tool = OpenAITool( - function=tool.run, - name=tool.name, - description=tool.description, - properties={'test_prop': {'description': 'A test property', 'type': 'string'}}, + function=my_function, + name="OpenAI Tool", + description="This is a generic function description.", + properties=properties, + required=['property_1', 'property_2'] ) -Here the ``run`` method of the ``TestTool`` instance ``tool`` is passed -as the ``function`` parameter to ``OpenAITool``. The ``properties`` is a -dictionary that includes additional data about the tool, such as a -description and type for each property. The ``name`` and ``description`` -are self-explanatory. +In this example, we have created a new instance of ``OpenAITool`` with a +dummy function ``my_function``, a name, description, a dictionary of +properties and a list of required properties. Limitations ----------- -The OpenAITool provides a basic framework to facilitate the creation and -usage of tools for the OpenAI agent. The actual functionality of the -tool would largely depend on the function passed during its -instantiation. Also, even though it provides a property variable for -additional data storage, it does not inherently provide methods to -handle or manipulate these properties. +The ``OpenAITool`` class relies on the ``OpenAIFunction`` for its +functioning. It ensures the required properties for the OpenAI function +are available and correctly formatted. 
However, it doesn’t provide +explicit error handling or checks for the function itself. As such, it +assumes that the function provided during instantiation is correct and +valid. Any errors within the function can lead to a breakdown in the +operations of the ``OpenAITool`` object. Follow-up Questions: -------------------- -- How are the properties of the OpenAITool used in the toolkit builders - and eventually by the OpenAI agent? -- Are there any specific requirements or constraints for the function - that is passed during the initialisation of an OpenAITool? +- How is error handling performed within ``OpenAITool`` beyond property + validation? +- Is there a way to update the properties of ``OpenAITool`` instances + after creation? +- Could we improve the ``OpenAITool`` class by adding type checking for + the input function during instantiation? diff --git a/docs/memory_store/index.rst b/docs/memory_store/index.rst index 30aaf46a9..dfa613dd2 100644 --- a/docs/memory_store/index.rst +++ b/docs/memory_store/index.rst @@ -19,6 +19,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. @@ -26,6 +28,7 @@ Summary of content :maxdepth: 1 agent_conversation_database + open_ai_automata_conversation_database symbol_code_embedding_handler symbol_doc_embedding_handler diff --git a/docs/memory_store/open_ai_automata_conversation_database.rst b/docs/memory_store/open_ai_automata_conversation_database.rst new file mode 100644 index 000000000..8edb5551c --- /dev/null +++ b/docs/memory_store/open_ai_automata_conversation_database.rst @@ -0,0 +1,68 @@ +OpenAIAutomataConversationDatabase +================================== + +``OpenAIAutomataConversationDatabase`` is a class used to handle +interactions of an Automata agent with a conversation database. It +facilitates operations such as saving messages, retrieving messages, and +maintaining sessions and interactions within a conversation. 
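The session-scoped save/retrieve behaviour described above can be sketched with the standard-library ``sqlite3`` module. This is only an illustrative sketch of the pattern — the class name, table layout, and method signatures below are assumptions, not the actual ``OpenAIAutomataConversationDatabase`` implementation:

```python
import sqlite3


class ConversationStore:
    """Illustrative session-keyed conversation store (not the automata class)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        # The conversation table is created at construction time,
        # mirroring the behaviour described in the documentation.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS interactions "
            "(session_id TEXT, interaction_id INTEGER, role TEXT, content TEXT)"
        )

    def save_message(self, session_id: str, role: str, content: str) -> None:
        # The per-session interaction count doubles as an ordering key.
        (count,) = self.conn.execute(
            "SELECT COUNT(*) FROM interactions WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        self.conn.execute(
            "INSERT INTO interactions VALUES (?, ?, ?, ?)",
            (session_id, count, role, content),
        )

    def get_messages(self, session_id: str) -> list:
        rows = self.conn.execute(
            "SELECT role, content FROM interactions "
            "WHERE session_id = ? ORDER BY interaction_id",
            (session_id,),
        ).fetchall()
        return [{"role": r, "content": c} for r, c in rows]


store = ConversationStore()
store.save_message("session1", "user", "Hello, bot")
store.save_message("session1", "assistant", "Hello!")
print(len(store.get_messages("session1")))  # 2
```

Keying every row by ``session_id`` and ordering by a per-session interaction counter is what lets several conversations share one table — which is also why, as noted below, misusing session IDs is the main failure mode of this design.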
+ +Overview +-------- + +``OpenAIAutomataConversationDatabase`` provides functionality to handle +conversation storage within a database for an Automata agent. It +inherits from ``LLMConversationDatabaseProvider``, and defines methods +to interact with the database. Primary functions provide the ability to: +check session validity, save messages per session, fetch messages based +on session id, and maintain an interaction count. The table for +conversation data is created at the time of class object creation. + +Related Symbols +--------------- + +- ``automata.memory_store.conversation_database_providers.LLMConversationDatabaseProvider`` +- ``OpenAIChatMessage`` +- ``chat.open_ai_chat_message.FunctionCall`` + +Example +------- + +The following is an example demonstrating how to create an instance of +``OpenAIAutomataConversationDatabase`` and use its functionality. + +.. code:: python + + from automata.memory_store.conversation_database_providers import OpenAIAutomataConversationDatabase + from chat.open_ai_chat_message import OpenAIChatMessage, FunctionCall + + # create conversation database + db = OpenAIAutomataConversationDatabase(db_path="path/to/db") + + # create a message + message = OpenAIChatMessage(role="user", content="Hello, bot", function_call=FunctionCall(function_name="Hello", kwargs={})) + + # save the message in session + db.save_message(session_id="session1", message=message) + + # get all the messages in the session + messages_in_session = db.get_messages(session_id="session1") + +Limitations +----------- + +In its current form, the ``OpenAIAutomataConversationDatabase`` relies +heavily on proper usage of session IDs, and as such, any mistakes with +session IDs can lead to errors. There are also unresolved ‘TODO’ notes in +the ``save_message`` and ``get_messages`` methods. + +Follow-up Questions: +-------------------- + +- Can we provide different implementations of the ``save_message`` and + ``get_messages`` methods in order to handle any form of session IDs?
+- Can potential scaling issues be avoided? For example, if a session + has a very large number of messages, it could impact the retrieval + speed or memory usage. +- How do we handle different types of messages that aren’t just + ``OpenAIChatMessage``? The ``save_message`` and ``get_messages`` + methods currently expect this type of message. diff --git a/docs/memory_store/symbol_code_embedding_handler.rst b/docs/memory_store/symbol_code_embedding_handler.rst index 70537009c..e534c7189 100644 --- a/docs/memory_store/symbol_code_embedding_handler.rst +++ b/docs/memory_store/symbol_code_embedding_handler.rst @@ -1,19 +1,83 @@ -- ``batch_size`` and the frequency of calls to ``flush`` are directly - related. ``batch_size`` specifies the number of items that should be - stored in memory before they are written (flushed) to the database. A - larger ``batch_size`` would result in fewer calls to ``flush``, but - would take up more memory. Therefore, a balance must be struck - depending on the system’s resource constraints. - -- The size of the embeddings processed by - ``SymbolCodeEmbeddingHandler`` can vary depending on the architecture - of the embedding model used. For example, typical configurations of - Word2Vec or GloVe could result in 100, 200, or 300-dimentional - embeddings, while BERT embeddings might be 768-dimensional or larger. - -- The handling of symbols with no source code would depend on the - implementation of ``SymbolCodeEmbeddingBuilder``. A common approach - might be to return a null or zero vector of the same size as other - embeddings. The ``SymbolCodeEmbeddingHandler`` would probably handle - such cases as it does most other embeddings, unless a special case - has been defined. +SymbolCodeEmbeddingHandler +========================== + +``SymbolCodeEmbeddingHandler`` is a class that manages a database of +source code embeddings for ``Symbol`` instances. 
It comes equipped with +methods to process, update, queue, and build embeddings in batches for +efficient handling. + +Overview -------- + +The ``SymbolCodeEmbeddingHandler`` class extends +``SymbolEmbeddingHandler``. This class is dedicated to handling the +embedding of source code symbols. After initializing with a provided +embedding database, an embedding builder, and a batch size, the class +allows for the processing and updating of these embeddings. Depending on +whether the symbol source code has changed or not, the class either +updates existing embeddings or queues them for building. The batches of +updated or newly built embeddings can then be flushed to the database. + +Related Symbols +--------------- + +No related symbols are documented in this context; judging from the +handler’s dependencies, likely candidates are the +``SymbolEmbeddingHandler`` base class, ``SymbolCodeEmbeddingBuilder``, +and ``VectorDatabaseProvider``. + +Usage Example +------------- + +Note: The usage example assumes that the relevant modules, classes, and +functions defined in the module where ``SymbolCodeEmbeddingHandler`` +resides are already imported. + +.. 
code:: python + + from automata.memory_store.symbol_code_embedding_handler import SymbolCodeEmbeddingHandler + from automata.memory_store.vector_database_provider import VectorDatabaseProvider + from automata.memory_store.symbol_code_embedding_builder import SymbolCodeEmbeddingBuilder + from automata.models.symbol import Symbol + + embedder_db_provider = VectorDatabaseProvider() + embedder_builder = SymbolCodeEmbeddingBuilder() + handler = SymbolCodeEmbeddingHandler(embedder_db_provider, embedder_builder) + + # assuming symbols is a list of Symbol instances + for symbol in symbols: + handler.process_embedding(symbol) + + # Once the embeddings have been processed, we can flush them to the database + handler.flush() + +The above script initializes a ``SymbolCodeEmbeddingHandler`` instance +and processes a list of ``Symbol`` instances for embedding. The embedding +database provider and builder are constructed without arguments here for +illustration; in practice, initialize them with parameters appropriate to +your implementation. + +Limitations +----------- + +``SymbolCodeEmbeddingHandler`` requires a database provider +(``VectorDatabaseProvider``) and an embedding builder +(``SymbolCodeEmbeddingBuilder``). It cannot operate without these, so +the absence or failure of these dependencies is a limiting factor for +``SymbolCodeEmbeddingHandler``. Additionally, the current batch +implementation might cause some delay when dealing with large datasets, +as the system needs to wait until a batch has been completely populated +before processing. + +Follow-up Questions: +-------------------- + +- What are the consequences if a ``Symbol`` instance does not have + source code attached? +- How are VectorDatabaseProvider and SymbolCodeEmbeddingBuilder used in + the context of SymbolCodeEmbeddingHandler? +- What are the typical sizes for a batch, and what are the implications + of setting it too high or too low? +- What is the specific use case or problem that this class is solving?
+ It might help clarify its role within the larger system. diff --git a/docs/navigation/index.rst b/docs/navigation/index.rst index 856d114d2..254e966e5 100644 --- a/docs/navigation/index.rst +++ b/docs/navigation/index.rst @@ -21,6 +21,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/navigation/py/index.rst b/docs/navigation/py/index.rst index 37f04f793..805633845 100644 --- a/docs/navigation/py/index.rst +++ b/docs/navigation/py/index.rst @@ -19,6 +19,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/retrievers/index.rst b/docs/retrievers/index.rst index 122fbde78..49768a4d2 100644 --- a/docs/retrievers/index.rst +++ b/docs/retrievers/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/retrievers/py/context/index.rst b/docs/retrievers/py/context/index.rst index 65d5c3d20..0139cb7e9 100644 --- a/docs/retrievers/py/context/index.rst +++ b/docs/retrievers/py/context/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/retrievers/py/index.rst b/docs/retrievers/py/index.rst index fef09ad88..fa4266535 100644 --- a/docs/retrievers/py/index.rst +++ b/docs/retrievers/py/index.rst @@ -19,6 +19,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/singletons/dependency_factory.rst b/docs/singletons/dependency_factory.rst index 874efc733..e5b081541 100644 --- a/docs/singletons/dependency_factory.rst +++ b/docs/singletons/dependency_factory.rst @@ -1,91 +1,97 @@ DependencyFactory ================= -``DependencyFactory`` is a source class found in -**automata.singleton.dependency_factory** that is utilized in the -creation of dependencies for input Tool construction. 
+``DependencyFactory`` is a Singleton class that is responsible for +creating and managing the dependencies required for tool +construction. It serves as a centralized point of access for sharing, +coordinating, and managing these dependencies, avoiding conflicts and +minimizing redundancy. + +Overview +-------- + +``DependencyFactory`` provides various methods to get, set, reset, and +directly create dependencies. When creating dependencies, it also +supports specification and override of keyword arguments during +initialization. Internally, it caches created instances, so repeated +requests for a dependency return the same object. + +It can be used to create dependencies like SymbolGraph, SymbolRank, +SymbolSearch, etc. It also supports retrieving pre-created instances of +dependencies, building the dependencies required for a given set of tools, +and even allows overriding the creation parameters for dependencies. + +Related Symbols +--------------- + +- ``automata.singletons.Singleton`` +- ``automata.utils.symbol_provider.SynchronizationContext, ISymbolProvider`` +- ``automata.base_config.SymbolRankConfig, PyContextHandlerConfig, EmbeddingDataCategory`` +- ``automata.structures.SymbolGraph, SymbolRank, SymbolSearch`` +- ``automata.embedding_handler.SymbolCodeEmbeddingHandler, SymbolDocEmbeddingHandler`` +- ``automata.py_context.PyContextHandler, PyContextRetriever`` +- ``automata.py_io.PyReader, PyCodeWriter`` +- ``automata.exceptions.UnknownToolError`` +- ``automata.toolkits.agent_tool.Toolkits, AgentToolkitNames, AgentToolFactory`` + +Example +------- + +This example demonstrates how to initialize a ``DependencyFactory`` and +create a SymbolGraph instance: -The main functionality of ``DependencyFactory`` is to ensure that the -dependencies required by any given set of tools are created and made -available for use. +.. 
code:: python + + from automata.singletons.dependency_factory import DependencyFactory + from automata.utils.interface import EmbeddingDataCategory + from automata.base_config import get_embedding_data_fpath -The ``DependencyFactory`` class implements singleton design pattern, -means there will only be one instance of this class and all the required -dependencies are created on this single instance. + factory = DependencyFactory() + symbol_graph = factory.get('symbol_graph') -Import Statements: ------------------- +In the above example, ``symbol_graph`` will hold the instance returned +by the ``create_symbol_graph`` method of the ``DependencyFactory``. The +instance creation is cached; all further calls to +``get('symbol_graph')`` will return the same instance. -When making use of the ``DependencyFactory`` class, below are the -dependencies to import, +Returning a custom ``SymbolGraph`` instance with overridden arguments: .. code:: python - import os - import networkx as nx - from functools import lru_cache - from typing import Any, Dict, List, Set, Tuple - from automata.config.base import ConfigCategory - from automata.agent.agent import AgentToolkitNames - from automata.agent.error import AgentGeneralError, UnknownToolError - from automata.core.base.patterns.singleton import Singleton - from automata.code_handling.py.reader import PyReader - from automata.code_handling.py.writer import PyWriter - from automata.embedding.base import EmbeddingSimilarityCalculator - from automata.experimental.search.rank import SymbolRank, SymbolRankConfig - from automata.experimental.search.symbol_search import SymbolSearch - from automata.llm.providers.openai import ( - OpenAIChatCompletionProvider, - OpenAIEmbeddingProvider, - ) - from automata.memory_store.symbol_code_embedding import SymbolCodeEmbeddingHandler - from automata.memory_store.symbol_doc_embedding import SymbolDocEmbeddingHandler - from automata.retrievers.py.context import ( - PyContextRetriever, -
PyContextRetrieverConfig, - ) - from automata.symbol.graph import SymbolGraph - from automata.symbol_embedding.base import JSONSymbolEmbeddingVectorDatabase - from automata.symbol_embedding.builders import ( - SymbolCodeEmbeddingBuilder, - SymbolDocEmbeddingBuilder, - ) - from automata.tools.factory import AgentToolFactory, logger - from automata.core.utils import get_config_fpath - -Usage Example ------------- - -Here is an example that showcases how ``DependencyFactory`` is used: + factory = DependencyFactory(symbol_graph_scip_fpath="/custom/path/to/scip") + symbol_graph = factory.get('symbol_graph') -.. code:: python +``symbol_graph`` in this case will be the instance created using the +overridden ``scip_filepath``. - from automata.singletons.dependency_factory import DependencyFactory +Resetting all dependencies: - # Create a DependencyFactory object setting the overrides - dep_factory = DependencyFactory(py_context_retriever_config=PyContextRetrieverConfig()) +.. code:: python - # To get the instance of a created dependency use 'get' method - symbol_ranker = dep_factory.get('symbol_rank') + factory.reset() - # After using the instances, do not forget to reset overrides - dep_factory.set_overrides() +After calling ``reset()``, all cached dependencies are cleared and +``factory.get('symbol_graph')`` will create a new SymbolGraph. Limitations ----------- -The DependencyFactory class doesn’t handle concurrent requests. -Therefore it is not suitable for a multi-threaded or a multi-processed -environment. - -To build more complex dependencies, the DependencyFactory class can -become a bit bloated and difficult to manage as the number of -dependencies increases. +- It is important to understand that the behavior of the ``get`` method + will differ based on when it is called, especially if overrides have + been set. +- If overrides are set after ``DependencyFactory`` has already created + dependencies, it will refuse them and raise a ``ValueError``.
It is suggested to set the overrides during + initialization or just after, prior to creating any dependencies. +- Depending upon the argument values provided, object creation might + fail. Make sure the arguments are in their expected formats and + contain the correct values. Follow-up Questions: -------------------- -- What are some of the solutions to handle concurrent requests for - building dependencies? -- How to manage DependencyFactory when number of dependencies - increases? +- How does DependencyFactory handle object initialization errors when + creating dependencies? +- What happens when an invalid argument is passed to the get method, is + there a default response mechanism? diff --git a/docs/singletons/index.rst b/docs/singletons/index.rst index 75f6d9e29..6d790bd53 100644 --- a/docs/singletons/index.rst +++ b/docs/singletons/index.rst @@ -22,6 +22,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/singletons/open_ai_automata_agent_toolkit_registry.rst b/docs/singletons/open_ai_automata_agent_toolkit_registry.rst index f63223970..803b1a81d 100644 --- a/docs/singletons/open_ai_automata_agent_toolkit_registry.rst +++ b/docs/singletons/open_ai_automata_agent_toolkit_registry.rst @@ -1,75 +1,93 @@ OpenAIAutomataAgentToolkitRegistry ================================== -The ``OpenAIAutomataAgentToolkitRegistry`` is a Singleton [1]_ class -that is responsible for managing and providing access to all the -registered OpenAI agent toolkit builders. This allows different parts of -the system to retrieve the correct toolkit builder when needed, without -needing to have direct knowledge of the specifics of each builder. +``OpenAIAutomataAgentToolkitRegistry`` is a singleton class that +registers and manages different types of tool builders within the +toolkit. 
It provides an interface to register a new toolkit builder, +fetch a list of all registered builder types, and initialize the +registry by importing modules from the builders’ package. Overview -------- -The ``OpenAIAutomataAgentToolkitRegistry`` class has three main -responsibilities: 1. It maintains a list of all registered toolkit -builders. This is done using the ``register_tool_manager`` static method -that accepts a class of type ``OpenAIAgentToolkitBuilder`` and adds it -to a list. 2. It provides a method, ``get_all_builders``, to retrieve -all the registered builders. 3. It provides an ``initialize`` method to -load all the registered builders when the system starts. +``OpenAIAutomataAgentToolkitRegistry`` uses a ``Singleton`` metaclass to +manage and maintain the lifecycle of tool builders, ensuring there is +only a single instance of the registry across the application. The class +uses two static data variables: + +- ``_all_builders``: A set that holds registered + ``OpenAIAgentToolkitBuilder`` subclasses. +- ``_is_initialized``: A boolean value indicating if the registry is + initialized. + +This class exposes three static methods to manage the registry: + +- ``register_tool_manager(cls)``: Adds a builder class to the + ``_all_builders`` set. +- ``get_all_builders()``: Returns a list of all registered builders. + Initializes the registry if it is not initialized yet. +- ``initialize()``: Imports modules from the builder package, + triggering the registration of the builders. Related Symbols --------------- -- ``automata.agent.providers.OpenAIAgentToolkitBuilder``: This is the - base class for all toolkit builders. Each specific toolkit builder - must subclass this and implement its methods. -- ``automata.tools.builders.PyReaderOpenAIToolkitBuilder``: This is an - example of a specific toolkit builder. It is responsible for building - ``PyReader`` tools for the OpenAI agent.
-- ``automata.tools.builders.PyWriterOpenAIToolkitBuilder``: This is - another example of a specific toolkit builder. It is responsible for - building ``PyWriter`` tools for the OpenAI agent. +This class references the following symbols in its implementation: + +- ``Type[OpenAIAgentToolkitBuilder]``: The expected type of the tool + builders that are registerable with + ``OpenAIAutomataAgentToolkitRegistry``. +- ``automata.experimental.tools.builders`` +- ``automata.tools.builders`` + +Example +------- -Usage Example -------------- +Below is an example of how to use the +``OpenAIAutomataAgentToolkitRegistry``. In this example, a custom +builder class ``CustomToolkitBuilder`` is registered and retrieved using +the ``OpenAIAutomataAgentToolkitRegistry``. .. code:: python - from automata.singletons.toolkit_registries import OpenAIAutomataAgentToolkitRegistry - from automata.tools.builders.py_reader import PyReaderOpenAIToolkitBuilder + from automata.singletons.toolkit_registry import OpenAIAutomataAgentToolkitRegistry - # registering a builder - OpenAIAutomataAgentToolkitRegistry.register_tool_manager(PyReaderOpenAIToolkitBuilder) + class CustomToolkitBuilder: + pass - # retrieving all builders - builders = OpenAIAutomataAgentToolkitRegistry.get_all_builders() + # Register a new Builder + OpenAIAutomataAgentToolkitRegistry.register_tool_manager(CustomToolkitBuilder) - for builder in builders: - print(builder) + # Get all registered Builders + all_builders = OpenAIAutomataAgentToolkitRegistry.get_all_builders() + + print(all_builders) # prints the list of registered builder classes Limitations ----------- -The ``OpenAIAutomataAgentToolkitRegistry`` class assumes that all -toolkit builders are subclasses of ``OpenAIAgentToolkitBuilder`` and -implement its interface. If a class does not implement this interface -correctly, ``OpenAIAutomataAgentToolkitRegistry`` may not work correctly -with that class.
+One limitation is that the tool builders must be set up +properly and the correct modules must be imported during the initialization +of the ``OpenAIAutomataAgentToolkitRegistry``. If builders are not +imported during initialization, they might not be registered +with the Toolkit Registry. + +Further, this class requires all builder classes to be compliant with +the ``OpenAIAgentToolkitBuilder`` type. Tool builders not meeting this +requirement might not be registered correctly. Follow-up Questions: -------------------- -- What if we need to support additional types of builders that do not - subclass ``OpenAIAgentToolkitBuilder``? - -Over time, we may need to support additional types of toolkits for new -agent models. Given this class’s current design, we would need to create -a new toolkit builder base class for each new type, and then modify -``OpenAIAutomataAgentToolkitRegistry`` to support instances of that new -class. - -.. [1] - A singleton is a design pattern that restricts a class to a single - instance. In other words, there can only ever be one instance of the - singleton class in the application. +- How can we ensure all required module builders are imported correctly + during initialization? +- Can we handle the registration of toolkit builders that are not + compliant with the ``OpenAIAgentToolkitBuilder`` type? How can we + validate the classes being registered with the Toolkit Registry? +- How efficient is the Toolkit Registry when registering and fetching a + large number of builder instances? Is there any room for performance + optimizations? +- It’s unclear what the primary use cases for retrieving all toolkit + builders are. What scenarios or functions rely on the capability to + get all registered builders? When should a builder be retrieved + individually vs. collectively?
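The Singleton behaviour described in the Overview can be sketched with a minimal metaclass. This is an illustrative stand-in only; the ``Singleton`` metaclass and ``ToolkitRegistry`` names below are hypothetical, not the actual ``automata`` implementation:

```python
class Singleton(type):
    """Metaclass that returns the same instance on every instantiation."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance once, then always hand back the cached one.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class ToolkitRegistry(metaclass=Singleton):
    """Hypothetical registry mirroring the set-based storage described above."""

    def __init__(self):
        self.all_builders = set()

    def register_tool_manager(self, builder_cls):
        self.all_builders.add(builder_cls)
```

Because both calls to ``ToolkitRegistry()`` return the same object, a builder registered through one reference is visible through every other reference, which is the property the registry relies on.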
diff --git a/docs/singletons/py_module_loader.rst b/docs/singletons/py_module_loader.rst index d0ef8072e..8260665e8 100644 --- a/docs/singletons/py_module_loader.rst +++ b/docs/singletons/py_module_loader.rst @@ -1,89 +1,72 @@ PyModuleLoader ============== -``PyModuleLoader`` is a Singleton class that provides a reliable and -efficient way to load, cache and maintain in memory Python modules -specifically in the form of ``RedBaron`` FST objects. The class aims to -map modules from their corresponding dotpaths as they are accessed. - -Overview --------- - -Conversion to ``RedBaron`` FST objects helps with the advantage of it -being a full syntax tree, which gives a more detailed representation of -the source code, preserving details like the whitespace and comments -that would be discarded by a simple Abstract Syntax Tree (AST). - -Throughout its methods, ``PyModuleLoader`` ensures initialization status -thereby maintaining the Singleton pattern. It also checks for module’s -dotpath presence to do selective loading of requested modules and offers -a variety of ways to fetch the module, depending on use-case. - -Initialization --------------- - -Initialization is performed by calling the ``initialize`` function, -passing in root_fpath and py_fpath, which default to ``get_root_fpath`` -and ``get_root_py_fpath`` respectively if they’re not provided. The -initialize method raises an Exception if paths have already been -initialized, preventing any overriding of root directories. - -Core Methods ------------- - -The ``_fetch_module`` fetches a specific module, ``_put_module`` puts a -module in the directory and the ``_fetch_existing_module_dotpath`` and -``_fetch_existing_module_fpath_by_dotpath`` return module file and dot -paths respectively. The ``_items`` method returns a dictionary listing -all modules. The ``__contains__`` checks if a module exists. 
+``PyModuleLoader`` is a class designed to load and cache in memory +Python modules, specified by their dot-paths, as they are accessed. It +operates using a Singleton pattern, meaning there can only be one +instance of this class in any given Python environment. This means any +changes made to ``PyModuleLoader`` within a Python session persist +throughout the entirety of the session. + +``PyModuleLoader`` also maintains a mapping of dot-paths to their +corresponding ‘Abstract Syntax Tree’ (AST) objects. The AST represents +Python code in a tree format, where each node corresponds to a Python +construct. This enables the object to fetch Python code in an easily +manipulatable and readable format. + +Related Symbols +--------------- + +- ``automata.singletons.singleton.Singleton`` +- ``typing.Optional`` +- ``typing.Dict`` +- ``typing.Tuple`` +- ``typing.Iterable`` +- ``ast.Module`` Example ------- -.. code:: python +Here is an example of how to use ``PyModuleLoader``. Note that this +class needs to be initialized before usage, and path information should +exist in the specific format that this class expects. - from automata.singletons.py_module_loader import PyModuleLoader - from automata.core.utils import get_root_fpath, get_root_py_fpath - - # Initialize the loader - PyModuleLoader.initialize(root_fpath=get_root_fpath(), py_fpath=get_root_py_fpath()) +.. 
code:: python - # Fetch a module - module = PyModuleLoader.fetch_module('automata.core.base') + from automata.singletons.py_module_loader import PyModuleLoader as PML + import os - # Inspect the module - print(module) + root_path = os.getcwd() # The root directory path + project_name = 'project' # The project name -Related Symbols ---------------- + # Initialize the PyModuleLoader with root path and project name + PML.initialize(root_path, project_name) -- ``automata.navigation.py.dot_path_map.DotPathMap`` -- ``automata.core.base.patterns.singleton.Singleton`` -- ``automata.navigation.py.dot_path_map.DotPathMap`` -- ``automata.navigation.py.dot_path_map.DotPathMap.contains_dotpath`` -- ``automata.core.utils.get_root_fpath`` -- ``automata.core.utils.get_root_py_fpath`` + # Check if a dotpath exists in the loader + print('automata' in PML) # Replace 'automata' with an actual dotpath -Dependencies ------------- + # Fetch the AST module for an existing dotpath + print(PML.fetch_ast_module('automata')) + # Replace 'automata' with an actual dotpath; if the module does not exist, this returns None -- ``automata.navigation.py.dot_path_map.DotPathMap.put_module`` -- ``automata.navigation.py.dot_path_map.DotPathMap.get_module_dotpath_by_fpath`` + # Reset the PyModuleLoader + PML.reset() Limitations ----------- -One limitation is the dependency on ``DotPathMap`` to manage directories -and files with assurance on initialization. There is also a need to -manually ensure initialization with ``_assert_initialized`` in every -method. +A specific limitation of ``PyModuleLoader`` is that path +information must be initialized before any method in the +class is used. Also, the Python module loader is designed to operate within a +specific directory structure, so if modules are structured differently, +adjustments will be needed. Follow-up Questions: -------------------- -- How can we handle module’s existence checks better to prevent - redundant file accesses? 
-- How can we enhance the Singleton design pattern application to not - manually ensure initialization in every context? -- Is there a way to optimize or remove the type-ignoring comments which - are present now to suppress warnings? +- How should different directory structures for Python + modules be handled? +- What is the best solution to avoid repeating the + ``_assert_initialized`` call in every single method? +- What is the strategy for removing ``type: ignore`` comments? Is there + a sufficient automated method to achieve this? diff --git a/docs/singletons/repository_client.rst index 275ea0044..15f1502b7 100644 --- a/docs/singletons/repository_client.rst +++ b/docs/singletons/repository_client.rst @@ -1,19 +1,98 @@ -- As the ``RepositoryClient`` defines an abstract base class, the - methods do not have an implementation and instead serves as a base - for other derived classes. If we are dealing with Git service APIs - that return data structures different from what the abstract methods - in ``RepositoryClient`` specify, it’s the responsibility of the - derived class to handle it properly. When implementing the abstract - methods, the derived class should process the data returned by the - API and format it to match the requirements. - -- It is possible to implement some default behavior in the - ``RepositoryClient``. However, the purpose of using an abstract base - class is to define a common interface for its subclasses. Adding - default behavior in the base class might not make sense in all usage - scenarios and can add unnecessary complexity. If there are common - operations that are shared across all the subclasses, it might be - beneficial to define a utility module or class where all these common - functionalities can be placed. Alternatively, we can have a base - class that implements these common methods and have other classes - inherit from this base class.
+RepositoryClient +================ + +``RepositoryClient`` is an abstract base class designed to manage +repositories. It provides an interface for performing various operations +on both local and remote Git repositories, such as cloning, creating +branches, checking out branches, staging changes, committing and pushing +changes, creating and merging pull requests, and checking for the +existence of branches. + +Overview +-------- + +``RepositoryClient`` serves as a blueprint for creating classes that +interact with Git repositories. Each method represents a fundamental Git +operation. The methods of this class are abstract, meaning a concrete +class that inherits from ``RepositoryClient`` must provide an +implementation for these methods. + +Related Symbols +--------------- + +There are no closely related symbols in the current context. However, +you should look to the implementation classes of this abstract base +class for related symbols. + +Examples +-------- + +Below is an outline of how an implementation class might look, +providing concrete implementations for the abstract methods of +``RepositoryClient``: + +.. 
code:: python + + class GitPythonRepoClient(RepositoryClient): + def clone_repository(self, local_path: str) -> Any: + """Implementation of cloning a repository""" + pass + + def create_branch(self, branch_name: str) -> Any: + """Implementation of creating a new branch""" + pass + + def checkout_branch(self, repo_local_path: str, branch_name: str) -> Any: + """Implementation of checking out a branch""" + pass + + def stage_all_changes(self, repo_local_path: str) -> Any: + """Implementation of staging all changes""" + pass + + def commit_and_push_changes(self, repo_local_path: str, branch_name: str, commit_message: str) -> Any: + """Implementation of committing and pushing changes""" + pass + + def create_pull_request(self, branch_name: str, title: str, body: str) -> Any: + """Implementation of creating a pull request""" + pass + + def merge_pull_request(self, pull_request_number: int, commit_message: str) -> PullRequestMergeStatus.PullRequestMergeStatus: + """Implementation of merging a pull request""" + pass + + def branch_exists(self, branch_name: str) -> bool: + """Implementation of checking if a branch exists""" + pass + +The actual implementation of these methods will depend on the specific +git library being used (such as GitPython, pygit2, etc.). + +Limitations +----------- + +``RepositoryClient`` itself does not perform any actions; it only +defines the interface that should be implemented. Therefore, the +limitations are mainly dependent on the specific implementations of this +abstract base class. These limitations might be related to error +handling, repository access or manipulation limitations, compatibility +with different Git versions, etc. + +Follow-up Questions: +-------------------- + +- Which classes implement this abstract base class and how do they + provide the functionality for managing repositories? +- How does this class integrate with other components?
Does it work + solely with Git repositories or can it handle other version control + systems? +- Are there any edge cases or special considerations in the + implementation of these methods? + +The information provided in the context about the ``RepositoryClient`` +is abstract, since it only defines an interface without any +implementation. Therefore, detailed implementation specifics, edge cases +and interacting classes might be missing from this documentation. For +specific details, the documentation of the implementing classes should +be consulted. diff --git a/docs/symbol/base/index.rst b/docs/symbol/base/index.rst index ed2252357..f6256b199 100644 --- a/docs/symbol/base/index.rst +++ b/docs/symbol/base/index.rst @@ -17,6 +17,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/symbol/graph/caller_callee_processor.rst b/docs/symbol/graph/caller_callee_processor.rst index 4d2dc1659..606e12828 100644 --- a/docs/symbol/graph/caller_callee_processor.rst +++ b/docs/symbol/graph/caller_callee_processor.rst @@ -1,37 +1,70 @@ -- Unfortunately, the type of document that ``CallerCalleeProcessor`` - works with isn’t specified in the provided information. However, - given the context, it’s likely that this class works with codebase - documents (such as Python files), containing ``Symbol`` objects - representing different elements of code (classes, methods, etc.). - -- If a ``Symbol`` node refers to something other than a method or - class, the behavior of ``CallerCalleeProcessor`` would depend on its - implementation. However, it’s likely that it simply wouldn’t create - an edge in such situations since the intention is to map - relationships between classes and methods. - -- The question of whether ``CallerCalleeProcessor`` handles recursive - calls isn’t directly addressed in the provided text. 
However, since - it’s mapping the relationships between caller and callee and since - recursive calls are a relationship between a method and itself, - there’s a high chance that it would handle them appropriately. - -- In terms of error handling during symbol parsing and reference - fetching, it is mentioned that these errors are logged, indicating - that there’s some kind of error handling system in place. However, - the specifics of this system aren’t detailed. - -- The question of whether there’s a way to optimize the ``process`` - method to make it less resource-intensive is a common concern with - resource-heavy operations. While there’s no specific mention in the - provided text, common methods to optimize such operations could - include limiting the number of nodes processed at once, using more - efficient data structures, caching results, and doing lazy loading - when possible. - -- In the case of a large graph with many symbols, the performance of - the ``process`` method would depend on many factors, including the - implementation of the method and the resources available (CPU, - memory, etc.). Techniques like partitioning the graph, parallel - processing, and memory-optimized data structures can be used to help - manage performance on large graphs. +CallerCalleeProcessor +===================== + +``CallerCalleeProcessor`` is a class that extends the functionality of +``GraphProcessor``. The core role of this class is to add edges to a +``MultiDiGraph`` based on the caller-callee relationships between +``Symbol`` nodes. One symbol is considered a caller of another if it +performs a call to the latter. + +Overview +-------- + +The ``CallerCalleeProcessor`` class requires a ``MultiDiGraph`` and a +``document`` during initialization. It uses these inputs to process and +generate edges based on caller-callee relationships. It catches any +exceptions during parsing or data retrieval, thus ensuring that +processing continues despite minor errors. 
+ +Related Symbols +--------------- + +- ``networkx.MultiDiGraph`` +- ``automata.symbol.graph.symbol_graph_navigator.SymbolGraphNavigator`` +- ``automata.symbol.graph.symbol_descriptor.SymbolDescriptor`` + +Example +------- + +Below is a simple example of how to get started with +``CallerCalleeProcessor``. + +.. code:: python + + import networkx as nx + from automata.symbol.graph.symbol_caller_callees import CallerCalleeProcessor + from automata.symbol.document import Document + + # Create an empty MultiDiGraph and a document + graph = nx.MultiDiGraph() + document = Document() + + # Initialize and use the CallerCalleeProcessor + processor = CallerCalleeProcessor(graph, document) + processor.process() + +This script will add edges to the input graph according to the +caller-callee relationships found in the document’s symbols. + +Limitations +----------- + +The ``CallerCalleeProcessor`` has a few limitations to be aware of: + +1. Constructing the ``CallerCalleeProcessor`` is an expensive operation. + Hence, it should be instantiated sparingly. +2. The ``process`` method is marked with a TODO to be split into smaller + methods. This indicates that the ``process`` method may perform more + operations than one might expect from a single function, and could + potentially be improved for readability, maintainability and testing. +3. Exceptions are caught and logged, but they are not rethrown or + handled further. This might lead to circumstances where execution + continues despite critical errors. + +Follow-up Questions: +-------------------- + +- How can we optimize the construction of CallerCalleeProcessor? +- What would a suitable strategy be for splitting the ``process`` + method into smaller functions? +- How could we handle exceptions in a more granular manner?
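The catch-and-log behaviour noted in the Limitations can be illustrated with a self-contained sketch. The ``parse_symbol`` stub and the edge-tuple layout below are assumptions for illustration, not the real ``automata`` API:

```python
import logging

logger = logging.getLogger(__name__)


def parse_symbol(name: str) -> str:
    """Hypothetical parser that rejects malformed symbol names."""
    if not name.isidentifier():
        raise ValueError(f"unparseable symbol: {name!r}")
    return name


def add_caller_callee_edges(edges, pairs):
    """Append (caller, callee, label) edges, logging and skipping failures."""
    for caller, callee in pairs:
        try:
            edges.append((parse_symbol(caller), parse_symbol(callee), "caller"))
        except ValueError:
            # Mirror the processor: log the error and continue processing.
            logger.exception("skipping pair (%s, %s)", caller, callee)


edges = []
add_caller_callee_edges(edges, [("main", "helper"), ("bad name", "x")])
```

Only the well-formed pair produces an edge; the malformed one is logged and skipped, so processing continues despite the error, which is exactly the trade-off the Limitations section flags.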
diff --git a/docs/symbol/graph/graph_builder.rst index 6d0283250..175c08709 100644 --- a/docs/symbol/graph/graph_builder.rst +++ b/docs/symbol/graph/graph_builder.rst @@ -1,19 +1,81 @@ -- For the first question, it really depends on the use cases and - requirements. The current options to build relationships, references, - and caller-callee relationships, cover a wide range of information in - the symbol graph. However, there may be specific contextual - information that could be useful to capture. An example could be - building edges for symbols, which are in the same file or module, or - belong to the same class hierarchy. - -- Regarding the memory issue, one possible solution could be using a - disk-based graph database, such as Neo4j or Amazon Neptune, instead - of an in-memory data structure. These databases are designed to - efficiently store and query large volumes of interconnected data. - However, this would require significant changes to the - ``GraphBuilder`` and its related classes. Alternatively, the size of - the graph could also be reduced by filtering out less important nodes - and edges, or by simplifying the data structure of the nodes. For - example, instead of storing all properties of a ``Symbol``, we could - only store its unique identifier and maintain a separate lookup table - for its properties. +GraphBuilder +============ + +``GraphBuilder`` is a class whose main purpose is to build a network +graph (``SymbolGraph``) from a distinct ``Index``. It provides a concise +way to create and manage the graph for symbol information included in +the documents. It includes functionalities to load data and generate graph +nodes and edges based on relationships, references, and calls between +``Symbol`` nodes. + +Overview +-------- + +The ``GraphBuilder`` class in the +``automata.symbol.graph.graph_builder`` module aims to create a symbol +information graph by iterating over the documents in a given index.
It +has various methods that help manage and create graph nodes and edges. +It sets up nodes for each symbol found in the documents, and edges are +formed based on the reference, relationship, or call between two +``Symbol`` nodes. The data used in the graph may also be retrieved or +stored using pickle files for efficient loading and saving. + +Related Symbols +--------------- + +- ``automata.symbol.graph.symbol_graph.SymbolGraph`` +- ``automata.singletons.github_client.GitHubClient`` +- ``automata.core.ast_handlers.fetch_bounding_box`` +- ``automata.symbol.graph.symbol_caller_callees.CallerCalleeProcessor`` +- ``automata.symbol.graph.symbol_relationships.RelationshipProcessor`` +- ``automata.symbol.graph.symbol_references.ReferenceProcessor`` +- ``automata.symbol.graph.symbol_graph_base.GraphProcessor`` +- ``automata.core.base.database.vector_database.ChromaVectorDatabase`` +- ``automata.symbol.graph.symbol_graph.SymbolGraph._build_default_rankable_subgraph`` + +Example +------- + +Given this class relies on a specific index structure, a direct simple +example cannot be provided. However, you should first create an +``Index`` that matches your needs and then use ``GraphBuilder`` to +create a ``SymbolGraph``. Here is a generic description using mock +objects: + +.. code:: python + + from automata.symbol.graph.graph_builder import GraphBuilder + from your_module import YourIndex + + # Suppose you have your own Index class + index = YourIndex('index_file_path') + + # Instantiate GraphBuilder + graph_builder = GraphBuilder(index=index, build_references=True, build_relationships=True, build_caller_relationships=False) + + # Now you can build your graph + graph = graph_builder.build_graph(from_pickle=False, save_graph_pickle=False) + +Keep in mind that this example assumes you have an ``Index`` object +``YourIndex`` to provide for ``GraphBuilder``. 
``from_pickle`` indicates +if the graph should be loaded from a pickle file, and +``save_graph_pickle`` denotes if the generated graph should be stored as +a pickle file. + +Limitations +----------- + +``GraphBuilder`` assumes the index provided is in a specific structure, +which means it might not work correctly with arbitrary index data +structures. Also, the ``build_graph`` method could potentially raise a +ValueError if no index data is found or is inaccessible. + +Follow-up Questions: +-------------------- + +- What is the exact structure of the Index that GraphBuilder expects? +- What should be the content of the index file for optimal results? +- Could there be optimization in the graph building process especially + for large data? +- Can the support for additional relationships be added to enhance the + graph’s descriptiveness? diff --git a/docs/symbol/graph/graph_processor.rst b/docs/symbol/graph/graph_processor.rst index 59c4d4be1..75b3e16cd 100644 --- a/docs/symbol/graph/graph_processor.rst +++ b/docs/symbol/graph/graph_processor.rst @@ -1,30 +1,9 @@ -``GraphProcessor`` interacts with classes like ``CallerCalleeProcessor`` -and ``ReferenceProcessor`` by being a parent class to them. -``CallerCalleeProcessor`` and ``ReferenceProcessor`` are specific -instances of a ``GraphProcessor`` that implement the ``process`` method -according to the specific processing they need to do on a -``MultiDiGraph``. For example, the ``CallerCalleeProcessor`` might -implement the ``process`` method to process edges representing -caller-callee relationships while the ``ReferenceProcessor`` may process -edges representing reference relationships. +class GraphProcessor(ABC): ‘Abstract base class for processing edges in +the ``MultiDiGraph``.’ -As for the question about the need for the ``process`` function in a -child class, since it’s declared as an abstract method in the -``GraphProcessor`` class, all child classes are obliged to implement -this method. 
Abstract methods enforce a specific contract or interface -for a class which the subclasses HAVE to follow. This enforces a design -where all graph processors WILL have a ``process`` method, thereby -giving certainty about the existence of a processing interface -irrespective of what specific type of ``GraphProcessor`` it is. However, -the exact nature of the processing done in the ``process`` method would -depend on the respective child class. If a use-case does not need a -process method, it probably should not inherit from ``GraphProcessor``. +:: -Follow-up Questions: -~~~~~~~~~~~~~~~~~~~~ - -- Can you provide examples on how ``CallerCalleeProcessor`` and - ``ReferenceProcessor`` implement the ``process`` method? -- Is there an operator overloading happening in the base or child class - to accommodate multiple types of ‘graph processing’? -- How do you decide on which ``GraphProcessor`` to use and when? + @abstractmethod + def process(self) -> None: + 'Adds new edges of the specified type to the graph.' + pass diff --git a/docs/symbol/graph/index.rst b/docs/symbol/graph/index.rst index 4c12c8e2e..a66e62455 100644 --- a/docs/symbol/graph/index.rst +++ b/docs/symbol/graph/index.rst @@ -23,6 +23,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/symbol/graph/reference_processor.rst b/docs/symbol/graph/reference_processor.rst index ab29f94e7..30cafa5ac 100644 --- a/docs/symbol/graph/reference_processor.rst +++ b/docs/symbol/graph/reference_processor.rst @@ -1,21 +1,69 @@ -- To handle parsing errors more gracefully within - ``ReferenceProcessor.process()``, a possible way would be to add a - default representation for unparseable symbols. This could be a - certain default symbol node which all unparseable symbols are - connected to. This way, we would still have a record of these - unparseable symbols in the graph, which could be useful for later - analysis or debugging. 
However, this approach should be treated - carefully as this might lead to misleading interpretations of the - graph (i.e. many references to this default node might be considered - as a prominent symbol in the graph). - -- If ``ReferenceProcessor`` was dealing with a standard ``DiGraph`` - instead of a ``MultiDiGraph``, multiple references between the same - pair of nodes would overwrite each other since ``DiGraph`` does not - support multi-edges. This would lose important information about the - number and context of these multiple references, and may affect the - accuracy of subsequent analyses based on the graph. Therefore, for - use cases where multiple independent references between the same pair - of nodes are possible (like a codebase where a function or variable - might be referenced multiple times), a ``MultiDiGraph`` would be the - more correct choice. +ReferenceProcessor +================== + +``ReferenceProcessor`` is a class that extends from the +``GraphProcessor``. It adds edges to a ``MultiDiGraph`` for the +references that exist between ``Symbol`` nodes in the given document. + +Overview +-------- + +``ReferenceProcessor`` takes in a ``MultiDiGraph`` and a document in its +constructor. The ``process`` method adds edges for symbol references in +the graph. It does this by examining each occurrence of a symbol in the +document, parsing the symbol, and adding an edge for the symbol +reference in the ``MultiDiGraph``. If the symbol role also includes a +definition, the ‘contains’ edge to the symbol is ensured to emanate from +the document, removing any incorrect ‘contains’ edges from the graph. + +The ``_process_symbol_roles`` static method takes a role (represented as +an integer), and returns a dictionary mapping each role name to a +boolean value indicating whether the role is present in the given
+ +Related Symbols +--------------- + +- ``networkx.MultiDiGraph`` +- ``Symbol`` +- ``automata.symbol.graph.symbol_references.GraphProcessor`` + +Example +------- + +Here is an example of how to use ``ReferenceProcessor``: + +.. code:: python + + from networkx import MultiDiGraph + from automata.symbol.graph.symbol_references import ReferenceProcessor + + # Assuming `document` is an object with `occurrences` attribute, + # where each occurrence includes the `symbol`, `range` and `symbol_roles`. + + graph = MultiDiGraph() + processor = ReferenceProcessor(graph, document) + processor.process() + +Please note that this example is a simplified demo. The actual usage of +this class would occur in a more complex scenario where a complete +document is processed to update a given multi-di-graph. + +Limitations +----------- + +One limitation of the ``ReferenceProcessor`` is when the parsing of the +symbol in an occurrence fails. It logs an error and moves on to the next +occurrence. In practice, depending on the cause of the exception, you +may lose valuable information or references in your graph when such +errors occur. + +Follow-up Questions: +-------------------- + +- How can we handle parsing errors in a more robust way? +- What is the potential impact on the multi-di-graph of skipping + occurrences where parsing the symbol failed? +- Is there a way to handle different ``Symbol`` types in a more generic + way? The current approach seems to assume that all symbols will have + the same kind of roles and attributes, which might not be the case. diff --git a/docs/symbol/graph/relationship_processor.rst b/docs/symbol/graph/relationship_processor.rst index 38ad04788..1f43a2c9e 100644 --- a/docs/symbol/graph/relationship_processor.rst +++ b/docs/symbol/graph/relationship_processor.rst @@ -1,24 +1,77 @@ -- ``relationship_labels.pop('symbol')`` removes the ‘symbol’ key from - the dictionary ``relationship_labels``, returning the value - associated with it. 
This is likely done because the ‘symbol’ key is - not used in the graph. - -- ``symbol_information`` is expected to be a list of ``Symbol`` - Protobuf objects. These objects represent software entities such as - classes or functions. - -- Some potential exceptions could be ``KeyError`` when trying to - ``pop()`` a non-existent key from the dictionary. There could also be - errors if the ``symbol_information`` list does not contain valid - ``Symbol`` Protobuf objects or if there are inconsistencies in the - structure of these objects. - -- The ``RelationshipProcessor`` class is mainly used in the context of - mapping a codebase, particularly to analyze and visualize - relationships between different code entities such as classes, - methods or functions. This helps to understand the dependencies and - interactions within the codebase. It forms part of a process that - includes other components like searching Git repositories, parsing - source code files, or generating symbol definitions from parsed code. - Its result can be used for several purposes like code completion, - code base navigation, and even code generation. +RelationshipProcessor +===================== + +``RelationshipProcessor`` is a class in the +``automata.symbol.graph.symbol_relationships`` module that helps in +adding edges to a ``MultiDiGraph`` in the context of ``Symbol`` nodes’ +relationships. + +Overview +-------- + +``RelationshipProcessor`` implements a processor mechanism for handling +relationships among ``Symbol`` nodes. The primary purpose of this class +is to process relationships in the form of inheritance among symbols and +record these relationships within a ``MultiDiGraph``. ``Symbol``\ s are +considered related if they share an inheritance relationship. + +In this context, a relationship can be seen as the +“inheritance” between classes or the “implementation” between an interface +and a class.
+ +Related Symbols +--------------- + +- ``nx.MultiDiGraph``: Multi directed graph from ``networkx`` library + used as data structure. +- ``parse_symbol``: A function (not visible from the context) to parse + symbol information. +- ``MessageToDict``: A function (not visible from the context) to + convert a message to a dictionary. + +Usage Example +------------- + +The following is an example demonstrating how to use +``RelationshipProcessor``. As the ``parse_symbol`` and ``MessageToDict`` +functions are not visible in the provided context, we’ll use mock +versions of these functions. + +.. code:: python + + import networkx as nx + from automata.symbol.graph.symbol_relationships import RelationshipProcessor + + # Assuming parse_symbol & MessageToDict are importable + from automata.utils import parse_symbol, MessageToDict + + # Note: `symbol_information` structure not detailed in the context + symbol_information = {...} + + graph = nx.MultiDiGraph() + processor = RelationshipProcessor(graph=graph, symbol_information=symbol_information) + processor.process() + +Please make sure to replace ``{...}`` with actual symbol information. + +Limitations +----------- + +The details regarding the format and structure of ``symbol_information`` +provided to the ``RelationshipProcessor`` are not provided in the +context. This structure is pivotal to the functionality of +``RelationshipProcessor`` as it processes the relationships based on +this data. + +Ensuring the proper functioning of the ``RelationshipProcessor`` +requires a precise understanding of the implementation of the +``parse_symbol`` and ``MessageToDict`` functions, which are not provided +in the current context. + +Follow-up Questions: +-------------------- + +- What is the expected format and structure of ``symbol_information`` + in ``RelationshipProcessor``? +- Could we get more information - such as function definitions and + imports - for ``parse_symbol`` and ``MessageToDict``?
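As a rough illustration of the edges the processor adds, the following self-contained sketch turns a symbol-to-parents mapping into labelled relationship edges. The dict-of-lists input and the edge-tuple layout are assumptions for illustration only, not the real ``symbol_information`` structure:

```python
def process_relationships(symbol_information):
    """Convert {symbol: [parent symbols]} inheritance data into labelled edges."""
    edges = []
    for symbol, parents in symbol_information.items():
        for parent in parents:
            # One labelled edge per inheritance/implementation relationship,
            # analogous to the "relationship" edges added to the MultiDiGraph.
            edges.append((symbol, parent, {"label": "relationship"}))
    return edges


info = {"GitPythonRepoClient": ["RepositoryClient"]}
edges = process_relationships(info)
```

Each emitted tuple corresponds to one directed, labelled edge; a ``MultiDiGraph`` would accept exactly this ``(u, v, attrs)`` shape via ``add_edges_from``.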
diff --git a/docs/symbol/graph/symbol_graph.rst b/docs/symbol/graph/symbol_graph.rst index f690cd5cb..392d311a6 100644 --- a/docs/symbol/graph/symbol_graph.rst +++ b/docs/symbol/graph/symbol_graph.rst @@ -1,37 +1,91 @@ -1. Extending ``SymbolGraph`` to support more types of relationships like - inheritance and usage is feasible and indeed very useful. This would - involve adding additional logic to detect and represent these - relationships in the underlying data structures. The complexity would - depend on various factors such as the specifics of the file format - and programming language(s) being analyzed. It’s important to note - however, that supporting such relationships may increase memory usage - and computation time, and therefore it is important to optimize the - representations and processing algorithms. - -2. Currently, ``automata`` doesn’t support the generation of the index - file. This is a valid suggestion and could potentially add - significantly to the usefulness of ``SymbolGraph``. Including this - functionality directly in ``automata`` could help users create and - manipulate ``SymbolGraph`` objects without needing to interact with - external tools. This could be implemented as a new feature request if - it aligns with the library’s overall goals and design philosophy. - -3. Performance is a key concern when working with large codebases. As - the number of symbols and relationships increases, so does the - complexity and memory usage of the graph. It’s hard to provide - specific numbers without implementation details and benchmarks, but - in principle, handling millions of symbols and relationships would - require efficient graph algorithms and data structures, and might - necessitate the use of techniques such as graph pruning or - partitioning. - - Furthermore, performance doesn’t depend only on the number of symbols - and relationships. The kinds of operations performed on the graph - (e.g. 
query speed, update speed), the types of relationships - considered, and the density of the graph also play a role. - - Overall, this reinforces the need for continued care in the design - and implementation of ``SymbolGraph`` to strike the right balance - between expressive power and scalability. Design decisions and - optimizations would need to be carefully considered to cater to the - varying needs of different users. +SymbolGraph +=========== + +The ``SymbolGraph`` class constructs a graph that consists of symbols +that represent files and their relationships. The dependencies between +the symbols are represented as either “contains”, “reference”, +“relationship”, “caller”, or “callee”. + +Each ``SymbolGraph`` instance can be initialized with ``index_path`` +(a string that represents the path where the index file is located), +``build_references``, ``build_relationships``, and +``build_caller_relationships`` boolean flags to specify the types of +relationships to build, a ``from_pickle`` option to specify whether the graph +should be loaded from a pickle file, and a ``save_graph_pickle`` option to +decide whether the generated graph should be saved as a pickle. + +``SymbolGraph`` also provides several methods to retrieve information +about the symbols, such as ``get_symbol_dependencies``, which returns a +set of symbols that a given symbol directly references or uses, and +``get_symbol_relationships``, which returns a set of symbols that have any +type of relationship with the given symbol. + +The ``SymbolGraph`` class provides methods like +``default_rankable_subgraph`` to create a subgraph that only contains +rankable symbols and their dependencies. ``filter_symbols`` is used to +remove symbol nodes from the graph that are not present in the provided +list. + +Using the ``from_graph`` classmethod, you can create a new instance of +``SymbolGraph`` from an existing networkx MultiDiGraph object. 
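The ``from_graph`` constructor pattern can be illustrated with a toy stand-in class. Note that this is a hedged sketch of the pattern only: ``MiniSymbolGraph`` and its methods are hypothetical, and the real ``SymbolGraph.from_graph`` signature and behaviour may differ.

```python
# Toy stand-in illustrating the alternate-constructor pattern that
# `from_graph` uses: wrap an already-built graph instead of parsing
# an index file. Not the actual SymbolGraph implementation.
import networkx as nx


class MiniSymbolGraph:
    def __init__(self, graph: nx.MultiDiGraph) -> None:
        self._graph = graph

    @classmethod
    def from_graph(cls, graph: nx.MultiDiGraph) -> "MiniSymbolGraph":
        # Skip index parsing entirely and adopt the caller's graph.
        return cls(graph)

    def get_symbol_dependencies(self, symbol: str) -> set:
        # Dependencies are modeled here as direct successors in the graph.
        return set(self._graph.successors(symbol))


g = nx.MultiDiGraph()
g.add_edge("module.A", "module.B")
sg = MiniSymbolGraph.from_graph(g)
deps = sg.get_symbol_dependencies("module.A")
```

The design benefit is that graph construction (expensive index parsing) is decoupled from graph consumption, so tests and tools can inject a prebuilt graph.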
+ +Usage Example +------------- + +This section presents a simple example of how to create and use a +``SymbolGraph`` instance. + +.. code:: python + + from automata.symbol.graph.symbol_graph import SymbolGraph + from automata.symbol.symbol_base import Symbol + + # Initialize a SymbolGraph + symbol_graph = SymbolGraph( + index_path="/path/to/index", + build_references=True, + build_relationships=True, + build_caller_relationships=True, + from_pickle=True, + save_graph_pickle=True + ) + + # Assume we have a Symbol instance + symbol = Symbol(...) + + # Get all symbol dependencies + dependencies = symbol_graph.get_symbol_dependencies(symbol) + + # Get all symbol relationships + relationships = symbol_graph.get_symbol_relationships(symbol) + + # Get potential callers of the given symbol + potential_callers = symbol_graph.get_potential_symbol_callers(symbol) + + # Get potential callees of the given symbol + potential_callees = symbol_graph.get_potential_symbol_callees(symbol) + + # Get the references to the given symbol + references = symbol_graph.get_references_to_symbol(symbol) + +Limitations +----------- + +``SymbolGraph`` depends heavily on the correct structure of the index +file provided at initialization. Improperly structured index files may +lead to incorrect relationships between symbols. The current implementation +uses a ``Set`` to store the symbols, which eliminates duplicates but does +not retain order. It might be worth considering a ``List`` to preserve order +in future implementations. + +Follow-up Questions: +-------------------- + +- How would the relationships in the graph change if List was used to + store the symbols instead of Set? +- What kind of error handling or validation could be implemented to + guard against improperly structured index files? Would it be helpful + to create an index file validator or reader? 
+- Would it be beneficial to extend the SymbolGraph class to handle + different types of graphs other than the current one built around + ranked symbols? diff --git a/docs/symbol/graph/symbol_graph_navigator.rst b/docs/symbol/graph/symbol_graph_navigator.rst index 0ba6f3035..e1b06fb4c 100644 --- a/docs/symbol/graph/symbol_graph_navigator.rst +++ b/docs/symbol/graph/symbol_graph_navigator.rst @@ -1,27 +1,73 @@ -- If the graph passed to ``SymbolGraphNavigator`` is not correctly - structured or doesn’t have all the necessary details, the various - methods in the class like ``get_potential_symbol_callees``, - ``get_potential_symbol_calless``, etc., might return incomplete or - incorrect results. It would be best to perform a validation check on - the graph before passing it to ``SymbolGraphNavigator``. - -- As for handling missing or incorrect labels on the graph edges, one - approach could be to raise informative errors when such labels are - not found. The error message could include details about what labels - were expected and suggestions about how to fix the graph. - Alternatively, the class could provide a method for adding or - updating labels on the graph edges. - -- Whether or not functionality should be added to modify the graph in - ``SymbolGraphNavigator`` depends on the use cases of this class. - Currently, the class seems to be designed mainly for navigating the - graph and querying relationships between the symbols. If there were - use cases where the graph needed to be modified after the initial - creation, then it would be worth considering the addition of graph - modification functionality. However, if such needs arise, it might be - more appropriate to handle them in a separate class, in order to keep - the responsibilities of ``SymbolGraphNavigator`` clearly defined and - its implementation more manageable. 
Another approach could be to have - a separate ``SymbolGraphUpdater`` or ``SymbolGraphEditor`` class - responsible for adding, deleting or modifying nodes or edges in the - graph. +SymbolGraphNavigator +==================== + +``SymbolGraphNavigator`` is a helper class that aids in navigation +within a symbol graph. It contains methods to fetch symbol dependencies, +relationships and references within the symbol graph. The class also +includes methods to retrieve potential callers and callees of a symbol. + +Overview +-------- + +The ``SymbolGraphNavigator`` class is initialized with a +``MultiDiGraph`` object from the networkx library, which represents the +symbol graph. The class includes several methods to get information +about symbols, their references and relationships within the symbol +graph, offering a flexible and convenient way to navigate and analyse +the graph data. + +Related Symbols +--------------- + +- ``networkx.MultiDiGraph``: The ``MultiDiGraph`` class from networkx + is used as the base structure to represent the symbol graph. +- ``nx.in_edges``, ``nx.out_edges``: methods from networkx used to + filter and process the nodes of the ``MultiDiGraph``. + +Example +------- + +Below is an example of how to use the ``SymbolGraphNavigator`` class. +Please note that this example assumes that you have a ``MultiDiGraph`` +object ready to use. + +.. code:: python + + import networkx as nx + from automata.symbol.graph.symbol_navigator import SymbolGraphNavigator + + # assuming a MultiDiGraph named symbol_graph is ready + symbol_graph_navigator = SymbolGraphNavigator(symbol_graph) + + # Fetching all the supported symbols in the symbol graph + supported_symbols = symbol_graph_navigator.get_sorted_supported_symbols() + +Limitations +----------- + +``SymbolGraphNavigator`` needs a well-defined ``MultiDiGraph`` as input +during instantiation; it does not contain any features to construct or +validate the graph. 
It doesn’t provide any mechanisms to avoid loops in +the graph structure either. + +The ``_get_symbol_containing_file`` method returns the parent file of a +symbol and will raise an assertion error if a symbol has anything other +than exactly one parent file. This limitation should be taken into +account when architecting the symbol graph. + +The ``_pre_compute_rankable_bounding_boxes`` method relies on the module +loader (``py_module_loader``) being initialized before invocation. +Failing this, it raises a ``ValueError``. This constrains the order of +operations when using the +``SymbolGraphNavigator``. + +Follow-up Questions: +-------------------- + +- How can we provide a mechanism to construct and validate + ``MultiDiGraph`` within ``SymbolGraphNavigator``? +- How can the class be expanded to support multiple parent files for a + single symbol? +- How can ``SymbolGraphNavigator`` ensure pre-initialization of the + module loader before invoking + ``_pre_compute_rankable_bounding_boxes``? diff --git a/docs/symbol/graph/symbol_graph_type.rst b/docs/symbol/graph/symbol_graph_type.rst index 423559251..f385e1d04 100644 --- a/docs/symbol/graph/symbol_graph_type.rst +++ b/docs/symbol/graph/symbol_graph_type.rst @@ -1,26 +1 @@ -- The exact behavioural difference between ``DYNAMIC`` and ``STATIC`` - graph types is not stated explicitly in the current context. However, - typically, a ``DYNAMIC`` graph is expected to reflect real-time - changes in the symbols and their relations, while a ``STATIC`` graph - remains constant over the execution period once built. - -- Applications or functions specifically might require one type over - the other depending on the usage context. If the application involves - real-time data monitoring or updating symbol relationships as the - program runs (like code documentation generation in an IDE or live - code analyzing), a ``DYNAMIC`` graph may be requested. 
On the other - hand, for applications that only require a one-time analysis or - structure extraction (like static code analyzing tools), a ``STATIC`` - graph would suffice. - -- The ability to switch between ``DYNAMIC`` and ``STATIC`` types for a - single ``SymbolGraph`` instance in runtime generally lies in the - functional specification and design of the ``SymbolGraph`` object - itself. However, there’s no mention of such a feature in the provided - context. The flexibility to switch between ``DYNAMIC`` and ``STATIC`` - types would need to be incorporated at design time and could involve - creating separate ``SymbolGraph`` objects or updating the graph’s - properties. It’s important to consider that switching from STATIC to - DYNAMIC could possibly demand additional resources for constant - updates, while switching from DYNAMIC to STATIC may imply that the - program is no longer interested in subsequent modifications.
+class SymbolGraphType(Enum):
+    DYNAMIC = 'dynamic'
+    STATIC = 'static'
diff --git a/docs/symbol/i_symbol_provider.rst b/docs/symbol/i_symbol_provider.rst index 3a0590934..e6ccb03d7 100644 --- a/docs/symbol/i_symbol_provider.rst +++ b/docs/symbol/i_symbol_provider.rst @@ -1,89 +1,81 @@ ISymbolProvider =============== -``ISymbolProvider`` is an abstract base class that provides an interface -for classes that work with a collection of symbols. It contains several -methods aimed at managing, updating, and retrieving symbols. - Overview -------- -``ISymbolProvider`` is an abstract base class in the Automata library. -Its main purpose is to enforce a standard interface for classes that -manage symbolic representations in the library. It includes methods to -filter, sort, and manipulate symbols, as well as methods to mark symbol -collection as synchronized. ``ISymbolProvider`` is instantiated -indirectly via a child class. +``ISymbolProvider`` is an abstract base class that represents an +interface for providing access to symbols. 
Here, symbols are defined as +objects that represent certain attributes or functionalities in the +system. The class contains methods for retrieving and filtering +supported symbols, as well as setting a “synchronized” flag which +indicates whether symbols are ready for retrieval. -Methods -------- +The class appears to be designed with a pattern for inheriting classes +to define their own ways of obtaining and filtering symbols. Once these +symbols are processed and synchronized, the +``get_sorted_supported_symbols`` method can be used to retrieve them. -The core methods in the ``ISymbolProvider`` class include: +Notably, the ``ISymbolProvider`` requires sorted symbols and provides +error handling in the ``get_sorted_supported_symbols`` method to make +sure that the symbols are sorted and synchronized before retrieval. -- ``__init__``: Initializes a new instance of an ``ISymbolProvider`` - subclass with the ``is_synchronized`` flag set to ``False``. +The following methods are included in ``ISymbolProvider``: +``_get_sorted_supported_symbols, filter_symbols, get_sorted_supported_symbols, set_synchronized``. -- ``filter_symbols``: An abstract method that needs to be implemented - by any subclass. It is designed to filter the set of symbols managed - by the class. +Related Symbols +--------------- -- ``get_sorted_supported_symbols``: This method retrieves a list of - sorted symbols. If the ``is_synchronized`` flag is ``False``, a - ``RuntimeError`` is raised. It checks that the symbols are properly - sorted. +At this time, none have been specified. -- ``set_synchronized``: This method sets the ``is_synchronized`` flag - to the provided value. This method is used to update the state of the - ``ISymbolProvider`` instance. +Usage Example +------------- -Related Symbols ---------------- +Due to the abstract nature of the ``ISymbolProvider`` class, we cannot +create an instance of it directly. 
Instead, we need to create a subclass +that implements the abstract methods, like so: -Some symbolic classes and methods that are related to -``ISymbolProvider`` include: +.. code:: python -- ``automata.tests.unit.test_symbol_graph.test_get_all_symbols`` -- ``automata.tests.unit.test_symbol_graph.test_build_real_graph`` -- ``automata.context_providers.symbol_synchronization.SymbolProviderRegistry`` -- ``automata.tests.unit.test_symbol_search_tool.test_retrieve_source_code_by_symbol`` + from automata.symbol.symbol_base import ISymbolProvider, Symbol -Example ------- + class MySymbolProvider(ISymbolProvider): -As ``ISymbolProvider`` is an abstract class, it cannot be directly -instantiated. A subclass implementing the ``filter_symbols`` function -must be created: + def _get_sorted_supported_symbols(self): + # For demo purposes, we simply return a list of Symbol objects. + # In a practical scenario, the implementation would fetch the right set of Symbol objects + return [Symbol("symbol_1"), Symbol("symbol_2"), Symbol("symbol_3")] -.. code:: python + def filter_symbols(self, sorted_supported_symbols): + # For demo purposes, we do not filter the symbols. + # In a practical scenario, the implementation may return a subset of the symbols based on certain criteria + return sorted_supported_symbols - class SymbolProviderExample(ISymbolProvider): - def filter_symbols(self, sorted_supported_symbols: List[Symbol]) -> None: - self.sorted_supported_symbols = sorted_supported_symbols + provider = MySymbolProvider() + provider.set_synchronized(True) + + symbols = provider.get_sorted_supported_symbols() + print(symbols) # Outputs: [Symbol("symbol_1"), Symbol("symbol_2"), Symbol("symbol_3")] Limitations ----------- -One major limitation of ``ISymbolProvider`` is that it is an abstract -class. This means it cannot be directly instantiated. Instead, -developers must subclass ``ISymbolProvider`` and provide an -implementation for the ``filter_symbols`` method. 
- -Another potential limitation is the synchronization requirement, where -the ``is_synchronized`` flag needs to be set prior to attempting to -retrieve symbols. This could potentially lead to runtime exceptions -depending on the order of operations. +``ISymbolProvider`` doesn’t impose any kind of constraints on what the +supported symbols can be or how they are provided. The nature of the +symbols and their sources are fully dependent on the specific +implementation of the subclass. As such, ``ISymbolProvider`` by itself +does not provide a usable implementation and does not have any +meaningful limitations. Any limitations would be inherent to the +specific subclass implementation. Follow-up Questions: -------------------- -1. How are subclasses of ``ISymbolProvider`` intended to implement the - ``filter_symbols`` method? What criteria should they use to filter - symbols? -2. Are there any performance implications associated with the checks - performed in the ``get_sorted_supported_symbols`` method? -3. What happens if the sorted_symbols list is not correctly sorted? How - does this impact the performance and reliability of the symbol - provider? -4. Can there be multiple instances of a child class of - ``ISymbolProvider`` working with different sets of symbols? If so, - how is the synchronization managed across different instances? +1. What are the criteria for a Symbol to be considered ‘supported’? +2. What is the significance of the ``is_synchronized`` flag? +3. Are there any threading concerns or race conditions if multiple + threads may be using an instance of an ``ISymbolProvider`` subclass? +4. What sort of objects are the ``Symbol`` class used here expected to + represent? +5. How are symbols expected to be sorted in + ``_get_sorted_supported_symbols``? 
diff --git a/docs/symbol/index.rst b/docs/symbol/index.rst index 5541cac74..af5540a0f 100644 --- a/docs/symbol/index.rst +++ b/docs/symbol/index.rst @@ -26,6 +26,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/symbol/symbol.rst b/docs/symbol/symbol.rst index 9fe5b46e3..0801a8883 100644 --- a/docs/symbol/symbol.rst +++ b/docs/symbol/symbol.rst @@ -1,90 +1,82 @@ Symbol ====== -The ``Symbol`` class in Automata is used to represent a reference to a -Python object in a standardized format. This could be a class, method, -or a local variable. The ``Symbol`` is specified by a Uniform Resource -Identifier (URI) with a defined syntax. +``Symbol`` is a class that represents and encapsulates the logic for +symbols in Python. A Symbol can specify a Python class, a method, or +even a local variable. Each symbol is further represented in a +standardized way through a unique URI (Uniform Resource Identifier) +string. + +The ``Symbol`` class includes various attributes to represent a symbol: +uri, scheme, package, and descriptors. With these attributes, ``Symbol`` +captures the critical information about Python symbols and provides an +efficient way to work with or manipulate symbols in Python programs. +Further, it includes various properties and class methods for efficient +interaction and usage. Overview -------- -The ``Symbol`` class primarily works with the concept of a URI. A URI -for a Symbol is composed of a ``scheme``, ``package``, and -``descriptor``. The ``scheme`` consists of any UTF-8 characters, and -spaces within this portion of the URI need to be escaped using a double -space. The ``package`` specifies the ``manager``, ``package-name``, and -``version``. The ``descriptors`` define the ``namespace``, ``type``, -``term``, ``method``, ``type-parameter``, ``parameter``, ``meta``, or -``macro``. 
- -Useful methods offered by the ``Symbol`` class include: - -- ``__eq__()``: Compares the current symbol to another to determine - equivalence. -- ``__hash__()``: Calculates the hash value of a symbol. -- ``__repr__()``: Returns the string representation of the Symbol - instance. -- ``dotpath()``: Returns the dotpath of the symbol. -- ``from_string()``: Creates a ``Symbol`` instance from a string - representation. -- ``is_local()``, ``is_meta()``, ``is_parameter()``, ``is_protobuf()``: - These methods help determine the type of symbol based on the - descriptor attributes. -- ``module_name()``: Returns the module name of the symbol. -- ``parent()``: Returns the parent symbol of the current symbol. -- ``symbol_kind_by_suffix()``, ``symbol_raw_kind_by_suffix()``: The two - methods convert the suffix of the URI into PyKind and DescriptorProto - respectively, which help determine the type of symbol. - -Examples --------- +The ``Symbol`` class is designed to provide an easy and efficient way to +work with symbols in Python. It offers properties for extracting +information about symbols, such as the parent of a symbol, the kind of +Python element the symbol represents (py_kind), and whether the symbol +represents a local variable, meta information, parameter, etc. + +The ``Symbol`` URI structure conforms to a specific syntax, providing +structure and standardization for symbol representation. Utility +functions, like ``from_string``, are available to create ``Symbol`` +instances from a string representation of a Symbol. + +Related Symbols +--------------- + +Currently, there are no related symbols. -Here is an example of how you can use the ``Symbol`` class: +Usage Example +------------- .. 
code:: python - from automata.experimental.search.symbol_parser import parse_symbol + from automata.symbol.symbol_base import Symbol + from automata.symbol.symbol_parser import parse_symbol symbol_class = parse_symbol( - "scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.agent.agent_enums`/ActionIndicator#" + "scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.agent.agent_enums`/ActionIndicator#" ) + # Returns an instance of Symbol symbol_method = parse_symbol( - "scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.tools.base`/ToolNotFoundError#__init__()." + "scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.tools.base`/ToolNotFoundError#__init__()." ) + # Returns an instance of Symbol -Related Symbols ---------------- +Symbol objects can be compared for equality, depending on their URI. -The following are the related symbols: +.. code:: python -- ``automata.tests.unit.test_database_vector.test_lookup_symbol`` -- ``automata.tests.unit.test_symbol_parser.test_parse_symbol`` -- ``automata.symbol_embedding.base.SymbolEmbedding.symbol`` -- ``automata.tests.unit.test_database_vector.test_delete_symbol`` -- ``automata.symbol_embedding.base.SymbolCodeEmbedding`` -- ``automata.tests.unit.test_symbol_parser.test_is_local_symbol`` + symbol_class == symbol_method + # Returns False -Limitations ------------ +In addition, Symbol instances can be hashed, primarily based on their +URI. -Given that the ``Symbol`` class relies on formatting a URI with a -specific syntax, it is important to follow the symbol syntax strictly, -especially when dealing with special characters. +.. code:: python -Dependencies ------------- + hash(symbol_class) + # Returns -729559640 (This is just an example. The actual output may vary) + +Limitations +----------- -- ``automata.symbol.parser.parse_symbol``: This parses a ``Symbol`` - given a URI. 
+The ``Symbol`` class relies heavily on the input structure to be in the +correct format as described in the URI syntax. Thus, it can raise +exceptions or behave unexpectedly if given incorrectly-formatted input. Follow-up Questions: -------------------- -- What happens if the supplied URI for the ``Symbol`` doesn’t match the - specified format? -- What if the ``scheme`` or ``package`` supplied in the URI doesn’t - exist? -- Is there any way to validate if the ``Symbol`` created maps to a - valid Python object? +- Is there a dynamic way to create or manage Symbols which are not + conforming to the described URI format? +- Can there be improvements done to handle more complex URIs or Symbols + parsing? diff --git a/docs/symbol/symbol_base/index.rst b/docs/symbol/symbol_base/index.rst index 5d56ae149..e5620a170 100644 --- a/docs/symbol/symbol_base/index.rst +++ b/docs/symbol/symbol_base/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/symbol/symbol_base/py_kind.rst b/docs/symbol/symbol_base/py_kind.rst index 633494db4..25b100dd4 100644 --- a/docs/symbol/symbol_base/py_kind.rst +++ b/docs/symbol/symbol_base/py_kind.rst @@ -1,20 +1,3 @@ -- No, ``SymbolDescriptor.PyKind`` Enum does not include all possible - classifications of Python entities. For example, it does not include - categories such as Python decorators, generators, coroutines, etc. - However, they include the most commonly used Python entities. For - specific use cases, you might need to extend the Enum. - -- The ‘meta’ category in ``SymbolDescriptor.PyKind`` is used for Python - entities that are related to metaprogramming. Metaprogramming refers - to the potential ability of a program to have knowledge of or - manipulate itself. In Python, it is achieved via metaclasses, - decorators, etc. - -- The ‘type_parameter’ category in ``SymbolDescriptor.PyKind`` is - related to Python’s type hinting system introduced in Python 3.5. 
- Type hinting is a formal solution to statically indicate the type of - a value within your Python code. This is used with the typing module, - which provides objects that represent complex types like Union, - Optional, etc. The ‘type_parameter’ represents type variables in such - complex types. For example, in ``List[T]``, ``T`` is a type - parameter.
+class PyKind(Enum):
+    Local = 'local'
+    Module = 'module'
+    Class = 'class'
+    Method = 'method'
+    Value = 'value'
+    Meta = 'meta'
+    Macro = 'macro'
+    Parameter = 'parameter'
+    TypeParameter = 'type_parameter'
diff --git a/docs/symbol/symbol_descriptor.rst b/docs/symbol/symbol_descriptor.rst index 1973a0fa0..8b13a149c 100644 --- a/docs/symbol/symbol_descriptor.rst +++ b/docs/symbol/symbol_descriptor.rst @@ -1,19 +1,68 @@ -- The ``unparse()`` method of the ``SymbolDescriptor`` class encodes - unusual or special characters in the symbol string as Unicode - escapes. This is done to ensure proper conversion of symbols into URI - strings that can be transmitted and received without errors. However, - it’s important to handle these characters carefully to avoid possible - misinterpretation. -- Future additions to the ``DescriptorProto``, which contains the - suffixes for the ``SymbolDescriptor``, would largely depend on the - potential use cases that may arise. The current set of suffixes (such - as ``METHOD``, ``TYPE``, etc.) covers a broad range of symbols types. - However, there is a possibility that more specific or different types - of symbols might be required in future iterations or versions of the - software. -- Using unrecognized descriptor suffixes can lead to parsing errors - when trying to unparse the symbol, and it may raise a ``ValueError``. - This could cause scripts or programs that use the - ``SymbolDescriptor`` to stop abruptly or behave unexpectedly. It is - important to use only recognized and properly defined symbol - descriptors to prevent such issues. 
+SymbolDescriptor +================ + +``SymbolDescriptor`` is a class that represents the description +component of a Symbol URI. It includes functionalities to parse and +unparse symbol identifiers in Python and convert between different +symbol descriptors. + +Overview +-------- + +``SymbolDescriptor`` is used for naming and addressing symbols. It +includes a list of Python kinds of symbols in the nested ``PyKind`` +Enum. Each instance of ``SymbolDescriptor`` stores a symbol name, a +descriptor suffix, and an optional disambiguator. + +The class provides methods to unparse the descriptor back into a URI +string, get the escaped name of a symbol, and convert a SCIP suffix to a +Python kind. + +Related Symbols +--------------- + +- ``DescriptorProto`` +- ``Enum`` + +Example +------- + +The following is an example demonstrating how to create and manipulate +an instance of ``SymbolDescriptor``. + +.. code:: python + + from automata.symbol.symbol_base import SymbolDescriptor + # Assumed import path for the SCIP Descriptor protobuf + from automata.symbol.scip_pb2 import Descriptor as DescriptorProto + + descriptor = SymbolDescriptor('sample', DescriptorProto.Type, 'disambiguator') + + # Print the descriptor + print(repr(descriptor)) + + # Unparse the descriptor to a URI string + print(descriptor.unparse()) + +Limitations +----------- + +The ``SymbolDescriptor`` class uses strict rules to parse and unparse +symbols. It may not cater to all Python code if unconventional naming +schemes are used. + +A caveat to note is that the ``unparse`` method raises a ``ValueError`` +exception if an invalid descriptor suffix is provided. Care must be +taken to ensure the symbol’s descriptor suffix is one of the defined +DescriptorProto values. + +Follow-up Questions: +-------------------- + +- Are there situations where escaping the name of a symbol can cause + issues? +- Can ``SymbolDescriptor`` handle all Python names, or are there some + limitations to consider? +- Is it possible to extend the ``PyKind`` Enum to cater to additional + symbol types? 
+- How does the class handle ambiguous symbols, ones that can fit into + multiple ``PyKind`` categories? diff --git a/docs/symbol/symbol_package.rst b/docs/symbol/symbol_package.rst index 6c98414a0..2dae74e2f 100644 --- a/docs/symbol/symbol_package.rst +++ b/docs/symbol/symbol_package.rst @@ -1,82 +1,11 @@ -SymbolPackage -=============
+@dataclass
+class SymbolPackage():
+    'A class to represent the package component of a Symbol URI.'
+    manager: str
+    name: str
+    version: str
-Overview -------- +:: -``SymbolPackage`` is a class representing the package component of a -Symbol URI in Python. A Symbol URI is a standardized string -representation for a python class, method, or local variable. With -``SymbolPackage``, you can easily manage the packages associated with -your Symbols.
+    def __repr__(self) -> str:
+        return f'Package({self.unparse()})'
-Import Statement ---------------- - -.. code:: python - - from automata.symbol.base import SymbolPackage - -Related Symbols --------------- - -- ``automata.symbol.scip_pb2.Descriptor as DescriptorProto`` -- ``automata.symbol.parser.parse_symbol`` -- ``automata.core.utils.is_sorted`` -- ``automata.symbol.base.Symbol`` -- ``automata.tests.unit.test_symbol_parser.test_parse_symbol`` -- ``automata.tests.unit.sample_modules.sample.EmptyClass`` -- ``automata.tests.unit.test_symbol_search_tool.test_retrieve_source_code_by_symbol`` -- ``automata.tests.unit.test_symbol_search.test_retrieve_source_code_by_symbol`` -- ``automata.symbol_embedding.base.SymbolCodeEmbedding`` -- ``automata.tests.unit.test_symbol_search.test_exact_search`` -- ``automata.symbol.base.Symbol.__repr__`` -- ``automata.tests.unit.sample_modules.sample.OuterClass`` -- ``automata.context_providers.symbol_synchronization.SymbolProviderSynchronizationContext`` - -Example ------- - -The following is an example demonstrating how to generate -``SymbolPackage``. - -.. 
code:: python - - from automata.symbol.parser import parse_symbol - - symbol_class = parse_symbol( - "scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.agent.agent_enums`/ActionIndicator#" - ) - - print(f"Package: {symbol_class.package}") - -Limitations ------------ - -Although ``SymbolPackage`` representation is string-friendly to work -with, it fails to capture the package structure or the hierarchical -relationship between packages and sub-packages, which could be crucial -in complex systems. - -Methods Documentation ---------------------- - -``unparse(self) -> str:`` -~~~~~~~~~~~~~~~~~~~~~~~~~ - -This method converts the SymbolPackage object back to its original URI -string form. - -``__repr__(self) -> str:`` -~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This method generates a string representation of class blueprint. - -Follow-up Questions: --------------------- - -- Can the ``SymbolPackage`` class representation be updated to capture - hierarchical relationships in packaging structures? -- How does ``SymbolPackage`` handle versioning in its string - representation? Especially in situations where multiple versions of - the same package exist in the system. + def unparse(self) -> str: + 'Converts back into URI string' + return f'{self.manager} {self.name} {self.version}' diff --git a/docs/symbol/symbol_reference.rst b/docs/symbol/symbol_reference.rst index 1ff9d26f3..31d51d808 100644 --- a/docs/symbol/symbol_reference.rst +++ b/docs/symbol/symbol_reference.rst @@ -1,74 +1,71 @@ SymbolReference =============== -``SymbolReference`` is a class in the ``automata.symbol.base`` module -that represents a reference to a symbol in a file. It is particularly -useful in complex code structures where the same symbol can be used in -different parts of the code, and these references need to be identified -or compared. +``SymbolReference`` is a data class that represents a reference to a +particular symbol within a text file. 
Each instance of +``SymbolReference`` includes information pertaining to the symbol, the +line number and column number where the symbol is located, as well as +miscellaneous roles associated with the symbol. Overview -------- -The ``SymbolReference`` class has two magic methods ``__eq__`` and -``__hash__`` which are used to evaluate equality and generate an -immutable hash, respectively. The class is used to compare instances of -``SymbolReference`` and check the equality of the ``uri``, -``line_number`` and ``column_number`` of the symbol reference. They are -also important for the usage of ``SymbolReference`` instances in sets or -dictionaries, where hash values are required. +The ``SymbolReference`` class provides a succinct way to manage and +trace usages of individual symbols across a set of text files. It stores +a ``Symbol`` object, which encapsulates the information related to the +symbol, and the precise location details (line and column numbers). The +roles dictionary (``Dict[str, Any]``) highlights additional attributes +or roles that the symbol may have. -Methods -------- - -The class ``SymbolReference`` contains the following methods: - -- ``__eq__(self, other) -> bool`` : It checks the equality of two - instances of ``SymbolReference`` by comparing the ``uri``, - ``line_number`` and ``column_number`` of the ``SymbolReference`` - instances. - -- ``__hash__(self) -> int`` : This method creates a hash value for the - instance of ``SymbolReference`` using the ``uri``, ``line_number`` - and ``column_number``. - -It should be noted that the ``__hash__`` method could cause collisions -if the same symbol is referenced in different files at the same -location. +The class defines the ``__hash__`` and ``__eq__`` dunder methods, to +allow for comparison of two ``SymbolReference`` instances and calculate +a unique hash value for each instance. 
This is useful when inserting
+these instances into data structures that rely on hashing, like a Python
+set or a dictionary.

 Related Symbols
 ---------------

-- ``automata.symbol.base.Symbol``
-- ``automata.symbol.graph.SymbolGraph``
-- ``automata.symbol.parser.parse_symbol``
-- ``automata.tests.unit.test_symbol_search.test_symbol_references``
-- ``automata.tests.unit.test_symbol_search_tool.test_symbol_references``
+There are no related symbols provided in the context.

-Examples
---------
-
-The following is an example demonstrating how to create an instance of
-``SymbolReference`` and how to use the ``__eq__`` method.
+Example
+-------

 .. code:: python

-   from automata.symbol.base import Symbol, SymbolReference
-   from automata.symbol.parser import parse_symbol
+   from automata.symbol.symbol_base import Symbol, SymbolReference
+
+   # Creating a symbol instance; the constructor arguments shown here
+   # are illustrative
+   symbol = Symbol(uri="file://path/to/file.py", name="MyClass", kind="class")
+
+   # Creating a symbol reference instance
+   symbol_reference = SymbolReference(
+       symbol=symbol,
+       line_number=30,
+       column_number=25,
+       roles={"is_method": False}
+   )

-   symbol_uri = "scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.agent.agent_enums`/ActionIndicator#"
-   symbol = parse_symbol(symbol_uri)
+   # Comparing two symbol references that point to the same location
+   other_reference = SymbolReference(
+       symbol=symbol,
+       line_number=30,
+       column_number=25,
+       roles={"is_method": False}
+   )
+   if symbol_reference == other_reference:
+       print("Both symbol references point to the same location.")

-   symbol_ref_1 = SymbolReference(symbol=symbol, line_number=10, column_number=20)
-   symbol_ref_2 = SymbolReference(symbol=symbol, line_number=10, column_number=20)
+Limitations
+-----------

-   print(symbol_ref_1 == symbol_ref_2) # Will output: True
+It should be noted that the ``SymbolReference`` class does not perform
+any validity checks on the line or column numbers. Also, it does not
+check if the given symbol actually exists in the provided location.
If +the file or the symbol do not exist, or the line or column numbers are +invalid, the ``SymbolReference`` will still be created, but it may not +be accurate or useful. -Follow-Up Questions: +Follow-up Questions: -------------------- -- How are instances of ``SymbolReference`` generated in the system? -- What are the likely scenarios where symbol collisions can occur and - how are these handled? -- Potential limitations or drawbacks of the ``__hash__`` implementation - weren’t specified, can these be determined and documented? +- What kind of validation could be included to strengthen the + ``SymbolReference`` class? +- What implications could exist for using ``SymbolReference`` instances + in hash-based data structures? +- How can the dict-type attribute ``roles`` be utilized in different + applications? diff --git a/docs/symbol_embedding/chroma_symbol_embedding_vector_database.rst b/docs/symbol_embedding/chroma_symbol_embedding_vector_database.rst index 79be0871c..3ee5ce864 100644 --- a/docs/symbol_embedding/chroma_symbol_embedding_vector_database.rst +++ b/docs/symbol_embedding/chroma_symbol_embedding_vector_database.rst @@ -1,73 +1,75 @@ ChromaSymbolEmbeddingVectorDatabase =================================== -``ChromaSymbolEmbeddingVectorDatabase`` is a concrete implementation of -a vector database that saves into a Chroma database. It extends the -functionality of ``ChromaVectorDatabase``, allowing storage, retrieval, -and manipulation of ``SymbolEmbedding`` instances. +``ChromaSymbolEmbeddingVectorDatabase`` is a vector database class +responsible for managing the storage, retrieval, and update of vectors +associated with different symbols in the chroma vector database. 
Overview
--------

-``ChromaSymbolEmbeddingVectorDatabase`` provides a variety of methods to
-manage entries in the chroma database including adding single or batches
-of entries (``add()`` and ``batch_add()``), retrieving entries by their
-keys (``get()``, ``batch_get()``) or all entries in a sorted order
-(``get_ordered_entries()``, ``get_ordered_keys()``), and updating single
-or multiple entries (``update_entry()``, ``batch_update()``). In
-addition, it also offers functionality to generate a hashable key from a
-``SymbolEmbedding`` instance with ``entry_to_key()`` method.
+``ChromaSymbolEmbeddingVectorDatabase`` inherits from the classes
+``ChromaVectorDatabase`` and ``IEmbeddingLookupProvider`` to provide
+functionality for adding, fetching, and updating embedding vectors in
+sorted order. This class acts as a handler to interact with a symbol
+embedding collection in a Chroma database.
+
+The class provides utility methods to generate a hashable key from a
+symbol, raise a KeyError if a duplicate entry exists, and prepare
+entries for insertion into the database. It also has methods for adding
+single or multiple entries and updating existing ones.

 Related Symbols
 ---------------

-- ``automata.symbol_embedding.base.SymbolEmbedding``
-- ``automata.core.base.database.vector.ChromaVectorDatabase``
-- ``chromadb.api.types.GetResult``
+Due to the highly specialized nature of this class, it doesn’t have a
+wide range of directly related symbols. It does, however, use the
+following symbol:
+
+- ``V``: a generic type parameter representing the type of the vector
+  embedded in the database entry.

 Example
 -------

-This is a simplified usage example of
-``ChromaSymbolEmbeddingVectorDatabase``:
+The example below is mainly illustrative, because this class is
+expected to be used in a larger system with a Chroma database
+installed.

..
code:: python - from automata.symbol_embedding.base import SymbolEmbedding - from automata.symbol_embedding.vector_databases import ChromaSymbolEmbeddingVectorDatabase - - factory = SymbolEmbedding + # Assuming we have a collection name, factory function to create vector and directory to store collection_name = "test_collection" + factory = lambda: None # Placeholder for an actual factory function + directory = "/path/to/directory" + + chroma_db = ChromaSymbolEmbeddingVectorDatabase(collection_name, factory, directory) - # Instantiate ChromaSymbolEmbeddingVectorDatabase - database = ChromaSymbolEmbeddingVectorDatabase(collection_name, factory) + # Add entry to collection + entry = factory() # Get the entry using the factory function + chroma_db.add(entry) - # Add an entry - entry = factory(symbol=Symbol, document="some document", vector=np.array([1, 2, 3])) - database.add(entry) + # Get an entry from the collection + key = "somekey" # Placeholder for an actual key + entry = chroma_db.get(key) - # Retrieve entry - retrieved = database.get(database.entry_to_key(entry)) + # Update entry in collection + chroma_db.update_entry(entry) - # Update entry - entry.vector = np.array([4, 5, 6]) - database.update_entry(entry) +Limitations +----------- - # Delete entry - database.discard(database.entry_to_key(entry)) +The class ``ChromaSymbolEmbeddingVectorDatabase`` is heavily dependent +on the installed Chroma database and a suitable configured collection. -Note ----- +Also, it requires a factory function responsible for creating instances +of the generic type ``V``. This may be a potential limitation factor, as +it requires the client code to supply a factory function conforming to +its requirements. -This class does not check if the chroma database instance used is -connected to a database. It’s the user’s responsibility to manage the -chroma database connection. +It doesn’t provide functionality to delete entries from the collection. 
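The add/get/update flow and the duplicate-key check described above can be sketched without a running Chroma instance. The class and method names below are illustrative in-memory stand-ins, not the actual ``ChromaSymbolEmbeddingVectorDatabase`` API:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SymbolEmbeddingEntry:
    """Illustrative stand-in for a symbol embedding record."""
    symbol_dotpath: str
    document: str
    vector: List[float]


class InMemorySymbolVectorDatabase:
    """Toy database mirroring the add/get/update flow described above."""

    def __init__(self) -> None:
        self._entries: Dict[str, SymbolEmbeddingEntry] = {}

    def entry_to_key(self, entry: SymbolEmbeddingEntry) -> str:
        # Generate a hashable key from the symbol, as the docs describe.
        return entry.symbol_dotpath

    def add(self, entry: SymbolEmbeddingEntry) -> None:
        # Raise a KeyError if a duplicate entry already exists.
        key = self.entry_to_key(entry)
        if key in self._entries:
            raise KeyError(f"Duplicate entry for key: {key}")
        self._entries[key] = entry

    def get(self, key: str) -> SymbolEmbeddingEntry:
        return self._entries[key]

    def update_entry(self, entry: SymbolEmbeddingEntry) -> None:
        # Overwrite the stored entry under its derived key.
        self._entries[self.entry_to_key(entry)] = entry
```

In the real class, these operations are backed by a Chroma collection rather than a dictionary, but the key-derivation and duplicate-check pattern is the same.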
-Follow-up Questions:
---------------------
+Follow-up Questions
+-------------------

-- How is this class handling connection errors to the Chroma Database?
-- Is there a way to manage the database connection from within this
-  class?
-- Are there any limitations regarding the size of the symbol vectors
-  that can be stored in the database?
+- How can a custom key generation function be provided to generate
+  unique keys for entries?
+- How can we execute deletion of entries from the database? Is it a
+  necessary feature to have? If so, why is it not included in the
+  class?
diff --git a/docs/symbol_embedding/i_embedding_lookup_provider.rst b/docs/symbol_embedding/i_embedding_lookup_provider.rst
index e96066e59..92d3b6882 100644
--- a/docs/symbol_embedding/i_embedding_lookup_provider.rst
+++ b/docs/symbol_embedding/i_embedding_lookup_provider.rst
@@ -1,28 +1,8 @@
-``IEmbeddingLookupProvider`` is a hypothetical interface, potentially
-created in the context of a natural language processing or machine
-learning application, such as in a chatbot or recommendation engine.
-This interface may be meant to provide consistent, reusable
-functionality for classes that need to convert their embeddings into a
-hashable key. Here’s an illustrative usage:
+class IEmbeddingLookupProvider(abc.ABC): 'An abstract base class
+defining an interface for embedding lookup providers.'

-.. code:: python
+::

-   class MyEmbeddingLookupProvider(IEmbeddingLookupProvider):
-      def embedding_to_key(self, entry: SymbolEmbedding) -> str:
-          # Implementation based on requirements
-          pass
-
-This class could be used by different components or services that need
-to turn embeddings into a standardized key for further processing, for
-example for retrieving previously stored embeddings or for comparison
-against other embeddings. Different implementations of this interface
-would handle the specifics of how the embedding would be converted into
-a key based on the necessary requirements.
- -It’s important to note that without the concrete context, assumptions -have been made about the intended use and functionality of this -interface. The purpose, functionality, and usage could vary based on the -actual context where this interface is designed. Useful follow-up -information would be the actual code where ``IEmbeddingLookupProvider`` -is defined or used. Also, information about the project or system -architecture could aid in providing a more accurate description. + def embedding_to_key(self, entry: SymbolEmbedding) -> str: + 'Concrete implementation to generate a simple hashable key from a Symbol.' + return entry.symbol.dotpath diff --git a/docs/symbol_embedding/index.rst b/docs/symbol_embedding/index.rst index fead16a19..f1fbad3ae 100644 --- a/docs/symbol_embedding/index.rst +++ b/docs/symbol_embedding/index.rst @@ -25,6 +25,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/symbol_embedding/json_symbol_embedding_vector_database.rst b/docs/symbol_embedding/json_symbol_embedding_vector_database.rst index ead4492e7..25d876667 100644 --- a/docs/symbol_embedding/json_symbol_embedding_vector_database.rst +++ b/docs/symbol_embedding/json_symbol_embedding_vector_database.rst @@ -1,18 +1,69 @@ -- The JSON file should be structured in a conventional format that - includes a list of symbol embeddings and their associated hash keys. - Any extra or irrelevant data may cause errors. -- As ``JSONSymbolEmbeddingVectorDatabase`` is a simple JSON database - implementation, it doesn’t have stringent size limit requirements but - efficiency can be a concern with very large datasets. For handling - larger datasets, other forms of databases such as SQL or NoSQL might - be more suitable. -- The current implementation of ``JSONSymbolEmbeddingVectorDatabase`` - does not include specific provisions for handling concurrency or - sharing across different processes. 
Any concurrent access needs to be - handled at the application level. -- The ``JSONSymbolEmbeddingVectorDatabase`` does not explicitly support - transactional operations like commit and rollback. Any database - failures might need to be manually handled or recovered by - reinitializing from the source data. For more advanced operations - like transactions, consider using more sophisticated databases like - SQLite, PostgreSQL, etc. that inherently support transaction safety. +JSONSymbolEmbeddingVectorDatabase +================================= + +``JSONSymbolEmbeddingVectorDatabase`` is a concrete class that provides +a way to handle a vector database that is saved into a JSON file. It +incorporates methods to return ordered keys, retrieve all ordered +embeddings from the dataset, and generate a hashable key from a Symbol. + +Overview +-------- + +This class behaves as a container for a vector database saved in a JSON +file, which can significantly simplify working with symbolic embedding +vectors. The idea is to allow users to easily save, retrieve, and +utilize embedding vectors for their needs. The class inherits mainly +from ``JSONVectorDatabase``, and is designed to work effectively with +``SymbolEmbedding``. + +Related Symbols +--------------- + +Note: There were no related symbols provided in the context for +``JSONSymbolEmbeddingVectorDatabase``. + +Example +------- + +Below is an example of how to instantiate and use a +``JSONSymbolEmbeddingVectorDatabase``: + +.. 
code:: python + + from automata.symbol_embedding.vector_databases import JSONSymbolEmbeddingVectorDatabase + from automata.symbol_embedding import SymbolEmbedding + + # Instantiation with a file path + embed_database = JSONSymbolEmbeddingVectorDatabase("./embeddings.json") + + # Generate a list of ordered keys + ordered_keys = embed_database.get_ordered_keys() + + # Fetch all ordered embeddings + ordered_embeddings = embed_database.get_all_ordered_embeddings() + + # Generate a key from the first entry in the database + key = embed_database.entry_to_key(ordered_embeddings[0]) + + print("Ordered Keys:", ordered_keys) + print("Key of first entry:", key) + +Limitations +----------- + +The ``JSONSymbolEmbeddingVectorDatabase`` class only supports JSON files +for saving and retrieving symbolic embeddings. So far, there are no +methods designed to use other types of data formats such as CSV, Excel, +etc. Another limitation is that the ordering methods rely on a method +for converting an entry to a key, and any changes in this method will +affect the output of these order-based methods. + +Follow-up Questions: +-------------------- + +- How can we extend the ``JSONSymbolEmbeddingVectorDatabase`` to work + with other kinds of file formats such as CSV or Excel? +- Would it be useful to allow the users to specify their own method for + creating a key from an entry? +- Are there any other sorting methods or criteria we might want to + support? diff --git a/docs/symbol_embedding/symbol_code_embedding.rst b/docs/symbol_embedding/symbol_code_embedding.rst index 34b7b6bfb..d0105c73f 100644 --- a/docs/symbol_embedding/symbol_code_embedding.rst +++ b/docs/symbol_embedding/symbol_code_embedding.rst @@ -1,18 +1,22 @@ -1. What metadata is expected for ``SymbolCodeEmbedding``? 
-
-   -  The metadata for ``SymbolCodeEmbedding`` is expected to be the
-      information about the data being processed, such as the name of
-      the document or code that the symbol was extracted from, and other
-      related information that might be useful in understanding the
-      context of the symbol or the embedding in a larger project or
-      system.
-
-2. Is the current method implementation as expected?
-
-   -  As the details about the method implementation are not completely
-      mentioned it’s difficult to comment on this. However, in general,
-      the method implementation would largely depend on how
-      ``SymbolCodeEmbedding`` is being used in the project. If the
-      method is not returning the expected results or expected behavior
-      then it might need to be re-looked at or debugged for
-      improvements.
+class SymbolCodeEmbedding(SymbolEmbedding): 'A concrete class for symbol
+code embeddings'
+
+::
+
+   def __init__(self, key: Symbol, document: str, vector: np.ndarray):
+       super().__init__(key, document, vector)
+
+   def __str__(self) -> str:
+       return f'''SymbolCodeEmbedding(
+symbol={self.symbol},
+embedding_source={self.document}
+vector_length={len(self.vector)}
+)'''
+
+::
+
+   @property
+   def metadata(self) -> Dict[str, str]:
+       return {}
diff --git a/docs/symbol_embedding/symbol_code_embedding_builder.rst b/docs/symbol_embedding/symbol_code_embedding_builder.rst
index fa4c13264..a94cc00ce 100644
--- a/docs/symbol_embedding/symbol_code_embedding_builder.rst
+++ b/docs/symbol_embedding/symbol_code_embedding_builder.rst
@@ -1,93 +1,73 @@
 SymbolCodeEmbeddingBuilder
 ==========================

-``SymbolCodeEmbeddingBuilder`` is a builder class that constructs
-``Symbol`` source code embeddings. An embedding is essentially a
-mathematical representation of the symbol’s source code and is used to
-measure the similarity between different symbols.
The -``SymbolCodeEmbeddingBuilder`` specifically creates the -``SymbolCodeEmbedding`` from the source code and the ``Symbol``, both of -which are provided as input arguments. - -``SymbolCodeEmbeddingBuilder`` plays a critical role in understanding -and processing python codes in a way that allows more sophisticated -operations, like similarity measurement and recommending pieces of codes -based on existing ones. This is achieved by transforming the code from -its primitive form to numerical representations (vectors) that can be -differentiated and compared. +``SymbolCodeEmbeddingBuilder`` is a class in the +automata.symbol_embedding.symbol_embedding_builders module. It’s +primarily used for generating source code embeddings for a given symbol. Overview -------- -The ``SymbolCodeEmbeddingBuilder`` uses an ``EmbeddingVectorProvider`` -to build an embedding vector from the source code. The embedding vector -captures the syntactical and perhaps some semantic essence of the code, -and in effect, creates a numerical representation for it. The -``SymbolCodeEmbeddingBuilder`` then leverages the source code, the -symbol, and the embedding vector to build a ``SymbolCodeEmbedding``. +``SymbolCodeEmbeddingBuilder`` contains two methods: ``build`` and +``batch_build``. The ``build`` method generates the embedding for a +symbol’s source code, and the ``batch_build`` method generates the +embeddings for a list of symbols’ source code. + +The class inherits from the ``EmbeddingBuilder`` and is instrumental in +building the ``SymbolCodeEmbedding``, which consists of the symbol, its +source code, and the corresponding embedding vector. Related Symbols --------------- -- ``automata.embedding.base.EmbeddingBuilder``: An abstract class to - build embeddings, from which ``SymbolCodeEmbeddingBuilder`` inherits. -- ``automata.embedding.base.EmbeddingVectorProvider``: An abstract - class that provides a standard API for creating embedding vectors. 
-- ``automata.symbol_embedding.base.SymbolCodeEmbedding``: A class for
-  symbol code embeddings.
-- ``automata.symbol.base.Symbol``: A class which contains the
-  associated logic for a Symbol.
+- ``symbol_representation.symbol.Symbol``
+- ``symbol_representation.symbol_code_embedding.SymbolCodeEmbedding``
+- ``embedding_provider.EmbeddingProvider``
+- ``symbol_embedding.EmbeddingBuilder``

 Example
 -------

-This is an example demonstrating how to create an instance of
-``SymbolCodeEmbedding`` using ``SymbolCodeEmbeddingBuilder``.
+The following example demonstrates building a ``SymbolCodeEmbedding``:

 .. code:: python

-   # Required imports
-   from automata.symbol_embedding.builders import SymbolCodeEmbeddingBuilder
-   from automata.symbol.base import Symbol
-   from automata.embedding.base import EmbeddingVectorProvider
-
-   # Instantiate embedding vector provider
-   embedding_provider = EmbeddingVectorProvider() # Replace with specific instance of embedding vector provider.
+   from automata.symbol_embedding.symbol_embedding_builders import SymbolCodeEmbeddingBuilder
+   # The two imports below use illustrative module paths
+   from symbol_representation.symbol import Symbol
+   from symbol_representation.symbol_code_embedding import SymbolCodeEmbedding

-   # Instantiate SymbolCodeEmbeddingBuilder
-   embedding_builder = SymbolCodeEmbeddingBuilder(embedding_provider)
+   symbol_code = "def hello_world(): print('Hello, world!')"
+   symbol = Symbol('hello_world', symbol_code)
+   embedding_builder = SymbolCodeEmbeddingBuilder()

-   # Define the source code and symbol
-   source_code = "def hello_world():\n print('Hello, world!')"
-   symbol = Symbol.from_string("scip-python python HelloWorld 1a2b3c HelloWorld#")
+   # Building SymbolCodeEmbedding for a single symbol
+   symbol_code_embedding = embedding_builder.build(symbol_code, symbol)
+   print(symbol_code_embedding)

-   # Build the SymbolCodeEmbedding
-   code_embedding = embedding_builder.build(source_code, symbol)
+   # Building SymbolCodeEmbedding for a batch of symbols
+   symbol_codes = [symbol_code, symbol_code]
+   symbols = [symbol, symbol]
+   symbol_code_embeddings = embedding_builder.batch_build(symbol_codes, symbols)
+   print(symbol_code_embeddings)

 Limitations
 -----------

-One limitation with the ``SymbolCodeEmbeddingBuilder`` is that the
-quality of the ``SymbolCodeEmbedding`` that it builds is highly
-dependent on the ``EmbeddingVectorProvider`` used. Different providers
-may create different quality embeddings.
+``SymbolCodeEmbeddingBuilder`` does not handle symbols that are not
+identifiable in the source code, like variables or symbols from imported
+modules. Ensuring that the source code passed to the ``build`` or
+``batch_build`` methods contains the defined symbol is crucial for
+proper functionality.

-Another limitation is that word, line, symbol, variable or class usages
-that span across different files or projects may not be embedded or
-tracked correctly.
+In addition, the actual implementations of ``EmbeddingBuilder`` and
+``EmbeddingProvider`` are not shown in the context, so assumptions have
+been made in the example provided. Depending on the specific
+implementations of those classes, additional setup may be required.

 Follow-up Questions:
 --------------------

-- What makes a good ``EmbeddingVectorProvider``?
-- What are the trade-offs of relying on ``SymbolCodeEmbedding`` vs
-  simpler forms of text representations such as Bag of Words or TF-IDF?
-- How does the builder handle different scopes in python source code
-  (i.e. local, global, nonlocal, class scopes)?
-
-Note:
------
-
-This example assumes there’s an implementation of
-EmbeddingVectorProvider available. In actuality, you might need to
-implement a specific Embedding Provider or use a third-party library.
+- Are there specific requirements or best practice guidelines for
+  source code passed to this builder class?
+- How are symbols that are not defined in the provided source code
+  handled?
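The ``build``/``batch_build`` pattern described above can be illustrated with a toy, deterministic "embedding" function standing in for a real embedding provider. All names here are hypothetical sketches, not the actual automata API:

```python
from typing import List, Sequence, Tuple


def toy_embedding(text: str, dim: int = 8) -> List[float]:
    """Deterministic stand-in for a real embedding provider: buckets
    character codes into a fixed-length frequency vector."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


class ToySymbolCodeEmbeddingBuilder:
    """Mirrors the build/batch_build pattern described above."""

    def build(self, source_code: str, symbol_name: str) -> Tuple[str, str, List[float]]:
        # A code embedding bundles the symbol, its source, and the vector.
        return (symbol_name, source_code, toy_embedding(source_code))

    def batch_build(
        self, source_codes: Sequence[str], symbol_names: Sequence[str]
    ) -> List[Tuple[str, str, List[float]]]:
        # Build one embedding per (source, symbol) pair.
        return [self.build(code, name) for code, name in zip(source_codes, symbol_names)]
```

A real builder would call out to an embedding provider (e.g. an OpenAI embedding endpoint) instead of the frequency-bucket function, but the single-vs-batch shape of the API is the same.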
diff --git a/docs/symbol_embedding/symbol_doc_embedding.rst b/docs/symbol_embedding/symbol_doc_embedding.rst
index 2cddcdceb..b790dc85b 100644
--- a/docs/symbol_embedding/symbol_doc_embedding.rst
+++ b/docs/symbol_embedding/symbol_doc_embedding.rst
@@ -1,32 +1,34 @@
 SymbolDocEmbedding
 ==================

-``SymbolDocEmbedding`` is a class in Automata for embedding documents
-related to symbols. Each instance of ``SymbolDocEmbedding`` represents a
-specific symbol document embedding, with a given symbol, document,
-vector, and optional source code, summary, and context.
+``SymbolDocEmbedding`` is a concrete class designed for symbol document
+embeddings. This class builds upon the ``SymbolEmbedding`` base class to
+provide functionality specifically geared towards handling document
+embeddings. A central part of its usage is linking associated source
+code, summary, and context to the embedded object.

 Overview
 --------

-``SymbolDocEmbedding`` helps with connecting metadata about symbols, for
-example, linking documentation or source code to the symbol. This
-process aids in maintaining semantic associations between pieces of
-code, enhancing document retrieval and category analysis functions in
-the Automata system.
+``SymbolDocEmbedding`` takes six parameters during initialization:
+``key``, ``document``, ``vector``, ``source_code``, ``summary``, and
+``context``, with ``source_code``, ``summary``, and ``context`` being
+optional. The ``key`` is the Symbol for the document embedding, and the
+``document`` is a string representation of the text data to be embedded.
+``vector`` is the NumPy ndarray holding the embedding of that text.
+``source_code``, ``summary``, and ``context`` provide additional context
+for the symbol document.
+ +The ``SymbolDocEmbedding`` class primarily provides a str method to +print a string representation that includes the key symbol, the source +document, length of the vector, source code if available, summary and +context. It also gives a metadata property that returns a dictionary of +the symbol object’s source code, summary, and context. Related Symbols --------------- -- ``automata.tests.unit.sample_modules.sample.OuterClass.InnerClass`` -- ``automata.tests.unit.sample_modules.sample.OuterClass`` -- ``automata.tests.unit.sample_modules.sample.OuterClass.InnerClass.inner_method`` -- ``automata.symbol_embedding.builders.SymbolDocEmbeddingBuilder`` -- ``automata.tests.unit.test_py_reader.test_get_docstring_nested_class_method`` -- ``automata.memory_store.symbol_doc_embedding.SymbolDocEmbeddingHandler.get_embedding`` -- ``automata.tests.unit.test_py_reader.test_get_docstring_nested_class`` -- ``automata.memory_store.symbol_doc_embedding.SymbolDocEmbeddingHandler`` -- ``automata.tests.unit.test_py_reader.test_get_docstring_no_docstring_class`` +- ``automata.symbol_embedding.symbol_embedding_base.SymbolEmbedding`` Example ------- @@ -36,44 +38,33 @@ The following is an example demonstrating how to create an instance of .. code:: python - from automata.symbol_embedding.base import SymbolDocEmbedding - from automata.symbol.base import Symbol + from automata.symbol_embedding.symbol_embedding_base import SymbolDocEmbedding import numpy as np - symbol = Symbol.from_string('scip-python python automata') - document = 'Sample document' - vector = np.array([1,2,3]) - source_code = 'def sample(): pass' - summary = 'Sample function' + key = 'example_symbol' + document = 'This is an example document for embedding.' + vector = np.array([0.1, 0.2, 0.3, 0.4, 0.5]) + source_code = 'print("Hello World!")' + summary = 'An example source code printing Hello World.' + context = 'Used for illustrating how to use SymbolDocEmbedding.' 
- embedding = SymbolDocEmbedding(symbol, document, vector, source_code, summary) + embedding = SymbolDocEmbedding(key, document, vector, source_code, summary, context) + print(str(embedding)) Limitations ----------- -``SymbolDocEmbedding`` class requires connection to a running instance -of the Automata system as it connects to its database to retrieve and -process embedding vector and metadata. It may not offer versatility to -work with other database or storage methods. - -Moreover, it is reliant on the numpy library for vector storage, and may -not adapt to alternative vector representations out of the box. - -Dependencies -~~~~~~~~~~~~ - -This class relies on the -``automata.symbol_embedding.base.SymbolEmbedding`` and -``automata.symbol.base.Symbol`` classes. +``SymbolDocEmbedding`` requires that the input ``document`` and input +``vector`` have compatible dimensions. If these values are not aligned, +the embedding process may fail. ``source_code``, ``summary``, and +``context`` aim to enhance the utility of the embedding by introducing +more context, their absence does not impact the creation of an embedding +but reduces the amount of information in the embedding. Follow-up Questions: -------------------- -- What functionality does ``SymbolDocEmbedding`` offer for error - checking or handling missing metadata elements? -- How would the ``SymbolDocEmbedding`` handle embeddings for symbols - sourced from external Python libraries outside Automata’s codebase? -- What considerations should be made if we want to use a different - library other than numpy for vector representation and manipulation? -- How would the ``SymbolDocEmbedding`` work in an environment without a - database or when disconnected from the Automata system? +- How does the class handle embeddings when the size of the input + document and vector are not compatible? +- What are the default behaviors of the class when optional parameters + are not provided? 
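The optional-parameter behaviour discussed above can be sketched with a small dataclass; this is a hypothetical analogue for illustration, not the actual ``SymbolDocEmbedding`` implementation:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DocEmbeddingSketch:
    """Illustrative shape of a document embedding with optional fields."""
    key: str
    document: str
    vector: List[float]
    source_code: Optional[str] = None
    summary: Optional[str] = None
    context: Optional[str] = None

    @property
    def metadata(self) -> Dict[str, Optional[str]]:
        # Mirrors a metadata property exposing the optional context fields.
        return {
            "source_code": self.source_code,
            "summary": self.summary,
            "context": self.context,
        }
```

Omitting the optional fields still yields a valid object; the metadata simply carries ``None`` for the missing entries, which matches the doc's observation that their absence reduces information without preventing creation.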
diff --git a/docs/symbol_embedding/symbol_embedding.rst b/docs/symbol_embedding/symbol_embedding.rst index c6f9dd8d9..2ee39f32c 100644 --- a/docs/symbol_embedding/symbol_embedding.rst +++ b/docs/symbol_embedding/symbol_embedding.rst @@ -1,88 +1,73 @@ SymbolEmbedding =============== -``SymbolEmbedding`` is an abstract base class designed for the handling -of symbol code embeddings within the Automata framework. In machine -learning and natural language processing, embeddings represent data such -as words, sentences, or symbols as vectors in high-dimensional space. -These vector representations capture the inherent relationships and -features of the original data in a format that can be efficiently -processed by machine learning algorithms. The ``SymbolEmbedding`` class -abstracts the embedding process for code symbols, representing them as -vectors that can be further used for tasks such as code analysis, -search, or semantic reasoning. +``SymbolEmbedding`` is an abstract class for creating and managing +symbol code embeddings. This class is used to embed symbols into a high +dimensional space and provides helper functions to manage these embedded +representations in efficient ways. It extends the ``Embedding`` class +and specifies certain features required for handling symbol +representations. + +Key attributes of this class include ``key``, ``document``, and +``vector``. The ``key`` attribute represents the unique identifier for +the symbol. The ``document`` attribute refers to the document where the +symbol was found. The ``vector`` attribute represents the vectorized +form of the symbol. Overview -------- -The ``SymbolEmbedding`` class defines a standard interface for symbol -embeddings by providing an initiation method and an abstract string -representation method. It provides property and setter methods for the -symbol key, allowing for flexible usage and the potential for future -extensions. 
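The attribute layout just described, combined with the fact that the class is abstract, can be sketched as a toy analogue; the class names here are illustrative, not the actual automata types:

```python
import abc
from typing import Dict, List


class EmbeddingSketch(abc.ABC):
    """Toy analogue of an abstract symbol embedding base class."""

    def __init__(self, key: str, document: str, vector: List[float]) -> None:
        self.key = key            # unique identifier for the symbol
        self.document = document  # document where the symbol was found
        self.vector = vector      # vectorized form of the symbol

    @property
    @abc.abstractmethod
    def metadata(self) -> Dict[str, str]:
        """Subclasses must describe their own metadata."""


class CodeEmbeddingSketch(EmbeddingSketch):
    """Minimal concrete subclass with an empty metadata mapping."""

    @property
    def metadata(self) -> Dict[str, str]:
        return {}
```

Attempting to instantiate the abstract base directly raises ``TypeError``; only a concrete subclass that implements ``metadata`` can be constructed.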
This class needs to be inherited and the abstract methods -need to be implemented to make a concrete class for specific types of -symbol embeddings. +``SymbolEmbedding`` allows the creation of an embedding of a symbol, +storing useful information like where the symbol was found and its +vector representation. It also contains properties for easy access to +core attributes such as ``symbol`` and ``metadata``. + +In addition, ``SymbolEmbedding`` can be tailored and created directly +from given arguments using the ``from_args`` class method. Related Symbols --------------- -- ``SymbolEmbedding`` is the base class for ``SymbolCodeEmbedding`` and - ``SymbolDocEmbedding``, which are concrete implementations of symbol - embeddings for code symbols and document symbols respectively. - -- ``SymbolCodeEmbeddingHandler`` is a class that handles the embedding - of code symbols, which uses ``SymbolCodeEmbedding``. - -- ``SymbolDocEmbeddingHandler`` is a class to handle the embedding of - document symbols, which uses ``SymbolDocEmbedding``. +- ``automata.symbol_embedding.symbol_embedding_base.Embedding`` +- ``numpy.ndarray`` +- ``typing.Dict`` +- ``abc.abstractmethod`` -Usage Example -------------- +Example +------- -Here’s an example of how a subclass ``SymbolCodeEmbedding`` inherits -from ``SymbolEmbedding``. Note that as ``SymbolEmbedding`` is an -abstract class, it can’t be instantiated directly. +The following example demonstrates how to create an instance of +``SymbolEmbedding`` using valid argument values. .. 
code:: python - from automata.symbol_embedding.base import SymbolEmbedding, Symbol + from automata.symbol_embedding.symbol_embedding_base import SymbolEmbedding import numpy as np - class SymbolCodeEmbedding(SymbolEmbedding): - def __init__(self, symbol: Symbol, source_code: str, vector: np.ndarray): - super().__init__(symbol, source_code, vector) - - def __str__(self) -> str: - return f"SymbolCodeEmbedding for Symbol: {self.symbol}, with vector: {self.vector}" - -Create an instance of ``SymbolCodeEmbedding``: - -.. code:: python - - from automata.symbol.base import Symbol - symbol = Symbol.from_string("Sample symbol string") - vector = np.array([1, 0, 0, 0]) - embedding_instance = SymbolCodeEmbedding(symbol, "source code", vector) - -Print Embedding: + symbol_key = 'exampleSymbol' + document = 'exampleDocument.txt' + vector = np.array([0.1, 0.2, 0.3, 0.4, 0.5]) -.. code:: python - - print(embedding_instance) + symbol_embedding = SymbolEmbedding(symbol_key, document, vector) Limitations ----------- -The class in itself does not perform any computations for symbol -embedding, but it sets an interface for what methods an embedding class -should implement. Therefore, the actual effectiveness of the embedding -is dependent on the concrete implementation of methods in the subclasses -like ``SymbolCodeEmbedding`` and ``SymbolDocEmbedding``. +One of the main limitations of ``SymbolEmbedding`` is that it relies +heavily on the definition of the ``metadata`` property. Since +``metadata`` is an abstract method, any sub-class of ``SymbolEmbedding`` +must provide its own implementation of this method. + +Another limitation is that the structure of a symbol’s vector +representation is not enforced. This relies on the user to ensure they +are creating consistent and meaningful vector representations. Follow-up Questions: -------------------- -- What specific implementations are possible or planned for this - abstract class in the automata project itself? 
-- Are there any planned methods or enhancements for these embeddings, - such as embedding update or real-time learning of embeddings? +- What is the ideal dimensionality or structure of a symbol’s vector + representation? +- How is the metadata for a specific symbol defined and used in the + representation? +- If a large number of symbols are embedded, how would memory and + computation constraints be managed? diff --git a/docs/symbol_embedding/symbol_embedding_handler.rst b/docs/symbol_embedding/symbol_embedding_handler.rst index c34fd161b..09281edf9 100644 --- a/docs/symbol_embedding/symbol_embedding_handler.rst +++ b/docs/symbol_embedding/symbol_embedding_handler.rst @@ -1,27 +1,81 @@ -1. The ``batch_size`` parameter impacts the efficiency of embedding - operations. Larger batch sizes mean that more embeddings are - processed at once, which can provide a significant performance boost - due to the reduction in individual operations. However, larger batch - sizes also consume more memory, which could lead to issues if the - available memory is limited. - -2. As far as threading issues are concerned, it will depend on the - specific implementation. ``SymbolEmbeddingHandler`` itself does not - take care of thread safety and concurrency problems. Therefore, if - multiple threads make changes to the same embeddings simultaneously, - it’s possible that there could be race conditions and data - inconsistency. Therefore, any multi-threaded use of this handler - should come with appropriate synchronization mechanisms to protect - against concurrency issues. - -3. The handler should be integrated into larger systems through classes - that handle symbols, such as ``SymbolDocEmbeddingHandler``, - ``SymbolCodeEmbeddingHandler``, etc. These classes will call the - ``process_embedding`` method when new symbols are added, updated, or - removed and the associated embeddings need to be changed.
When any - batch operation is finished, they should call ``flush`` to make sure - all changes have been saved to the underlying database. Beyond this, - the integration will depend on the specifics of the larger system. - For instance, the system may have scheduler or a main loop where - embedding operations are scheduled or triggered, or it may have - callbacks or observers that notify the handler of changes to symbols. +SymbolEmbeddingHandler +====================== + +``SymbolEmbeddingHandler`` is an abstract base class designed to +manipulate and handle embeddings for symbols. It’s equipped with the +ability to access these embeddings from a vector database, manage batch +operations on them and provides an interface for implementing further +detailed processing on the embeddings. + +Overview +-------- + +When creating an instance of ``SymbolEmbeddingHandler``, you need to +provide an embedding database, an embedding builder and a batch size. +The batch size must be less than 2048. After initialization, the handler +class retrieves all the embeddings and stores them. It also prepares +empty lists for embeddings to be added and discarded. + +There are a number of methods available for performing operations on +embeddings: 1. ``process_embedding``: This abstract method, to be +overridden in concrete child classes, performs the desired processing on +a single symbol’s embedding. 2. ``get_embeddings``: This method +retrieves the embeddings associated with a given list of symbols. 3. +``get_all_ordered_embeddings``: This method retrieves all of the symbol +embeddings from the database. 4. ``filter_symbols``: This method prunes +the supported symbols to only those present in a provided list. 5. +``_get_sorted_supported_symbols``: This is an internal method to +retrieve the currently supported symbols. 6. ``flush``: This method +updates the database with any remaining changes. 
+ +Related Symbols +--------------- + +- ``VectorDatabaseProvider`` +- ``EmbeddingBuilder`` +- ``Symbol`` +- ``SymbolEmbedding`` + +Example +------- + +Please keep in mind that ``SymbolEmbeddingHandler`` is an abstract +class. To use it, it must be subclassed with all the necessary abstract +methods being defined. Here is a simple example on how a subclass might +look: + +.. code:: python + + from automata.symbol_embedding.symbol_embedding_handler import SymbolEmbeddingHandler + from automata.symbol_embedding.database_providers import ExampleVectorDatabase + from some_module import ExampleEmbeddingBuilder, Symbol, SymbolEmbedding + + class ExampleSymbolEmbeddingHandler(SymbolEmbeddingHandler): + def process_embedding(self, symbol): + embedding = self.embedding_db.get(symbol.dotpath) + # Define processing steps.. + pass + + database_provider = ExampleVectorDatabase() + embedding_builder = ExampleEmbeddingBuilder() + handler = ExampleSymbolEmbeddingHandler(database_provider, embedding_builder, batch_size=1024) + + symbolA = Symbol('Some.Symbol.Path.A') + symbolB = Symbol('Another.Symbol.Path.B') + + # Process embeddings + handler.process_embedding(symbolA) + handler.process_embedding(symbolB) + +Limitations +----------- + +1. Batch size is limited to less than 2048. +2. The exact behavior of ``process_embedding`` is not defined within + this class and must be implemented in each subclass. + +Follow-up Questions: +-------------------- + +- How can we increase the batch size above 2048? +- How to ensure thread safety when using flush method? diff --git a/docs/tasks/automata_agent_task_database.rst b/docs/tasks/automata_agent_task_database.rst index 0b62b4c0b..931fbcb80 100644 --- a/docs/tasks/automata_agent_task_database.rst +++ b/docs/tasks/automata_agent_task_database.rst @@ -1,111 +1,78 @@ AutomataAgentTaskDatabase ========================= -``AutomataAgentTaskDatabase`` is a class that provides the ability to -manage tasks in a local storage database. 
- Overview -------- -The ``AutomataAgentTaskDatabase`` class, inherited from ``SQLDatabase``, -serves as a local storage for all ``AutomataTask`` objects. It features -functionality to check if a particular task exists in the database, -retrieve tasks based on a given query, insert new tasks, and update -existing tasks. +``AutomataAgentTaskDatabase`` is an SQLDatabase subclass that offers a +persistent local store specifically designed for ``AutomataTask`` +objects. It is designed to maintain all task-related information such as +the ``json`` representation of the task, the task’s ``instructions``, +and its ``status``. This database class helps in storing, updating, and +querying Automata tasks and ascertaining their existence in the +database. Related Symbols --------------- -- ``automata.tests.unit.test_task_database.db`` -- ``automata.tests.conftest.task`` -- ``automata.tasks.agent_database.AutomataTaskRegistry.__init__`` -- ``automata.tests.unit.test_automata_agent_builder.test_automata_agent_init`` -- ``automata.tasks.agent_database.AutomataTaskRegistry`` -- ``automata.tests.unit.test_task.test_deterministic_task_id`` -- ``automata.memory_store.agent_conversation_database.AgentConversationDatabase`` -- ``automata.tests.unit.test_task_database.task`` -- ``automata.tasks.tasks.AutomataTask`` -- ``automata.tests.unit.test_conversation_database.db`` - -Method Details --------------- - -\__init\_\_(db_path: str = TASK_DB_PATH) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -- Connects to the SQL database at the provided database path. -- Creates a new table, if not existing, with a defined schema in the - connected database. - -contains(task: AutomataTask) -> bool -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -- Checks if the task exists in the database. -- Returns ``True`` if the task exists, otherwise ``False``. 
+- ``automata.tasks.task.AutomataTask`` +- ``config.database.SQLDatabase`` -get_tasks_by_query(query: str, params: Tuple = ()) -> List[AutomataTask] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Usage Example +------------- -- Retrieves list of tasks by applying the specified SQL query. -- Returns a list of ``AutomataTask`` objects. +The following is an example demonstrating how to insert, update and +retrieve an AutomataTask from AutomataAgentTaskDatabase: -insert_task(task: AutomataTask) -> None -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -- Inserts a new task into the database. - -update_task(task: AutomataTask) -> None -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -- Updates an existing task in the database. +.. code:: python -Example -------- + from automata.tasks.task import AutomataTask + from automata.tasks.task_database import AutomataAgentTaskDatabase + from automata.tasks.task_status import TaskStatus -This Python library does not provide direct examples of -``AutomataAgentTaskDatabase`` usage. However, the usage of this class is -indirectly shown through its use in test fixtures and various other -parts of the code. Here is a constructed example demonstrating a basic -usage: + # Creating a task instance + task = AutomataTask(session_id="1", instructions="Test Instructions", status=TaskStatus.INCOMPLETE) -.. code:: python + # Creating a database instance and inserting the task + db_path = "tasks.db" + task_db = AutomataAgentTaskDatabase(db_path) + task_db.insert_task(task) - from automata.tasks.agent_database import AutomataAgentTaskDatabase - from automata.tasks.tasks import AutomataTask + # Make some modifications and update the task + task.status = TaskStatus.DONE + task_db.update_task(task) - # Instantiate AutomataTask - task = AutomataTask( - repo_manager=None, - config_to_load="TEST", - generate_deterministic_id=False, - instructions="This is a test." 
- ) + # Get tasks by a specific query + task_list = task_db.get_tasks_by_query("WHERE status = ?", (TaskStatus.DONE.value,)) - # Initialize Task Database - db = AutomataAgentTaskDatabase() + # Check if task exists in database + existence = task_db.contains(task) - # Insert task - db.insert_task(task) + print(existence) # Returns: True - # Verify insertion - assert db.contains(task) == True +This code segment consists of creating an instance of ``AutomataTask``, +creating an instance of ``AutomataAgentTaskDatabase``, and then +inserting the task into the database through ``insert_task()``. The task +status is then updated and reflected in the database using +``update_task()``. Tasks with a specific status are fetched using +``get_tasks_by_query()``. Finally, the presence of a task in the +database is confirmed using ``contains()``. Limitations ----------- -The primary limitation of ``AutomataAgentTaskDatabase`` is that it is -reliant on the ``AutomataTask`` object structure. Any changes in the -``AutomataTask`` definition may require changes in this class. - -Also, operations like getting tasks by query may fail in complex -scenarios due to data serialization. A more robust error checking -mechanism might be required. +The primary limitation of the ``AutomataAgentTaskDatabase`` is its +dependency on specific structured data. The ``AutomataTask`` objects +need to have a specific predefined structure, and data outside this +format cannot be correctly processed. Additionally, encoding and +decoding of tasks to and from ``json`` format relies on ``jsonpickle``, +which might produce ambiguity or data loss for overly complex or +unconventional data structures. Follow-up Questions: -------------------- -- It would be beneficial to provide a mechanism to delete tasks from - the database, is there a plan for this feature? -- Handling the exception in ``get_tasks_by_query`` function currently - only logs the error. Would it make sense to propagate the error to - the caller? 
+- How can the AutomataAgentTaskDatabase be made more generic to + accommodate various data structures apart from ``AutomataTask``? +- How can the error handling mechanism be improved for instances when + decoding of tasks fails? diff --git a/docs/tasks/automata_task.rst b/docs/tasks/automata_task.rst index cdbc968f9..a6618d59d 100644 --- a/docs/tasks/automata_task.rst +++ b/docs/tasks/automata_task.rst @@ -1,68 +1,62 @@ AutomataTask ============ -``AutomataTask`` class is designed to be executed by the TaskExecutor. -The tasks initiated by this class are set up with instructions and a -path to a root python folder. +``AutomataTask`` is a class extended from ``Task`` that is executed by +``TaskExecutor``. This class represents a single auto-executable task +with a specific set of instructions. A task in this context is an +operation or series of operations packaged together for the +``TaskExecutor`` to execute and monitor. It also includes functionality +to initialize logging for the task and accumulate logs for the task’s +execution. Overview -------- -``AutomataTask`` manages tasks to be executed by initializing its -properties and validating its instructions. It also manages the logging -for the task by creating a log file in the task directory and fetches -log content. The class utilizes parent class ``Task`` from -``automata.tasks.base`` to handle the underlying procedures. +An ``AutomataTask`` gets initialized with a set of arguments and keyword +arguments. Two keyword arguments are critical for task initialization: +``instructions``, the instructions for the task, and +``path_to_root_py``, the path to the root python folder. The provided +instructions cannot be empty; otherwise, a ``TaskInstructionsError`` is +raised. Methods to initialize and retrieve logs are used to configure +logging for the task’s execution.
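The instruction check described above can be illustrated with a small, self-contained sketch. The class and exception names mirror the documentation, but this is a stand-alone illustration, not the automata implementation.

```python
# Hypothetical sketch of the instruction validation described above.
class TaskInstructionsError(Exception):
    """Raised when a task is created with missing or empty instructions."""


class TaskSketch:
    def __init__(self, instructions: str, path_to_root_py: str = ".") -> None:
        # Reject empty or whitespace-only instructions up front.
        if not instructions or not instructions.strip():
            raise TaskInstructionsError("Task instructions cannot be empty.")
        self.instructions = instructions
        self.path_to_root_py = path_to_root_py


task = TaskSketch(instructions="Summarize the module docstrings")
print(task.instructions)

try:
    TaskSketch(instructions="   ")
except TaskInstructionsError as exc:
    print(f"rejected: {exc}")  # rejected: Task instructions cannot be empty.
```

Validating at construction time, as sketched here, means a malformed task can never reach the executor.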
Related Symbols --------------- -- ``automata.tasks.environment.AutomataTaskEnvironment`` -- ``automata.tasks.agent_database.AutomataTaskRegistry.get_all_tasks`` -- ``automata.tasks.agent_database.AutomataAgentTaskDatabase.insert_task`` -- ``automata.tasks.executor.IAutomataTaskExecution`` +- ``automata.tasks.Task`` +- ``automata.tasks.TaskExecutor`` +- ``automata.errors.TaskInstructionsError`` Example ------- -Examples on how to create instances of ``AutomataTask``: +The following example demonstrates how to create an instance of +``AutomataTask``. .. code:: python - from automata.tasks.tasks import AutomataTask + from automata.tasks.automata_task import AutomataTask - task = AutomataTask("task1", instructions="instruction1") + instructions = "These are the task instructions" + path_to_root_py = "/path/to/root/python/directory" -.. code:: python - - from automata.tasks.tasks import AutomataTask - from tests.mocks import MockRepositoryClient - from config.config_enums import AgentConfigName - - repo_manager = MockRepositoryClient() - - task = AutomataTask( - repo_manager, - config_to_load=AgentConfigName.TEST.value, - generate_deterministic_id=False, - instructions="This is a test.", - ) + task = AutomataTask(instructions=instructions, path_to_root_py=path_to_root_py) Limitations ----------- -``task_id`` generated here can take either a deterministic form based on -the hash of hashable keyword arguments or a random ``uuid`` depending on -the ``generate_deterministic_id`` flag. There’s no way to provide a -custom method of generating task_id. - -``AutomataTask`` assumes the python folder is in the root folder, which -can limit the extraction of python files if the directory structure -changes. - -Follow-up Questions -------------------- - -- Can a custom task_id generation method be facilitated? -- Can the assumption of the python folder being in the root folder be - eliminated, making it more robust? 
+The primary limitation of ``AutomataTask`` is that it heavily relies on +the arguments and keyword arguments provided during its instantiation. +If the keyword argument ``instructions`` is not provided or is empty, a +``TaskInstructionsError`` will be raised. Additionally, ``AutomataTask`` +assumes the existence of a specific directory structure for logging. + +Follow-up Questions: +-------------------- + +- Can we include a validation method for arguments and keyword + arguments necessary for the AutomataTask? +- What happens when the path provided in ``path_to_root_py`` does not + exist or is not accessible? +- Is there a way to customise the logging path or log file from outside + the class? Is it possible to decide not to log the task at all? diff --git a/docs/tasks/automata_task_environment.rst b/docs/tasks/automata_task_environment.rst index df715cdad..325e0bc10 100644 --- a/docs/tasks/automata_task_environment.rst +++ b/docs/tasks/automata_task_environment.rst @@ -1,33 +1,70 @@ -- Extending ``AutomataTaskEnvironment`` to support other modes beyond - GitHub might involve creating additional environment types (e.g., - bitbucket, gitlab, local file system, ftp, http, etc.) then adding - appropriate handling for each type in the ``AutomataTaskEnvironment`` - class methods. This would involve implementing the necessary logic - for each operation (setup, validate, commit, etc.) for each new - environment type. - -- ``validate`` could ensure that the task structure, dependencies, and - metadata are correctly formed, ``reset`` could revert the task to its - initial state (probably by re-cloning the repository), and - ``teardown`` could remove any local resources associated with the - task. Whether these operations are needed depends on the workflow and - resources being used. They could be useful to ensure consistency - across tasks and users, manage resources carefully, and provide a - standard API for agents to interact with tasks. 
- -- The limitation to ``AutomataTask`` might be due to specific - requirements or behaviors expected from tasks in the - ``AutomataTaskEnvironment`` that are only provided by the - ``AutomataTask`` implementation. If a different type of task were - expected to be used with the environment, it would likely need to - satisfy the same interface, or the ``AutomataTaskEnvironment`` would - need to be accommodated to support various types of tasks. - -- The exact handling of failures in - ``AutomataTaskEnvironment.commit_task`` would depend on the - higher-level logic in the application. It could involve retries, - falling back to alternative operations, alerting the user, recording - error information for debugging, etc. The use of - ``AgentTaskException`` would be to signal to the higher-level logic - that a failure occurred, and additionally provide context-specific - information to help handle the failure appropriately. +AutomataTaskEnvironment +======================= + +``AutomataTaskEnvironment`` is a concrete implementation of the +abstract ``TaskEnvironment``, specifically designed for Automata +providers. The class provides the means for setting up, validating, +resetting, and tearing down environments for tasks during their +execution within an Automata environment. + +Instances of ``AutomataTaskEnvironment`` are associated with a Github +Manager and an ``EnvironmentMode``, and provide features for setting up +the environment (modeled on a Github repository in the ``GITHUB`` mode), +as well as committing executed tasks to the remote repository. + +The ``AutomataTaskEnvironment`` works best with instances of +``AutomataTask`` and raises exceptions when operations involve +invalid/incorrect task instances.
+ +Related Symbols +--------------- + +- ``TaskEnvironment`` +- ``GitHubClient`` +- ``EnvironmentMode`` +- ``Task`` +- ``AutomataTask`` +- ``TaskStateError`` +- ``TaskGeneralError`` + +Usage Example +------------- + +In the example below, we create an instance of +``AutomataTaskEnvironment`` using a ``GITHUB`` ``EnvironmentMode`` and +then set up an ``AutomataTask``. + +.. code:: python + + from automata.tasks.task_environment import AutomataTaskEnvironment + from automata.github.client import GitHubClient + from automata.tasks.task_enum import EnvironmentMode + from automata.tasks.automata_task import AutomataTask + + github_manager = GitHubClient(token="", owner="", repository="") + environment = AutomataTaskEnvironment(github_manager, EnvironmentMode.GITHUB) + + # Assume a valid AutomataTask instance created earlier + task = AutomataTask(session_id="654321") + environment.setup(task) + +Limitations +----------- + +``AutomataTaskEnvironment`` requires valid ``AutomataTask`` instances +and a proper ``GitHubClient`` initialised with the correct Github token, +owner name and repository name for it to function correctly. It also +expects the Github repository to be available and accessible. Commits on +Github could fail if the branch already exists or if there are checkout +or commit failures. + +Follow-up Questions: +-------------------- + +- How can the ``AutomataTaskEnvironment`` handle recovery/resumption of + tasks interrupted during execution? +- Could the class be extended to support other ``EnvironmentMode`` + apart from Github? If yes, how would this affect the existing methods + and how could they be made more generic? +- What would be the expected behavior of ``AutomataTaskEnvironment`` + when there are network failures during Github operations? 
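The lifecycle described in this document (set up, validate, reset, tear down) can be sketched as follows. Everything here is illustrative: the real class drives a ``GitHubClient``, while this sketch only records what ``setup`` would do for each ``EnvironmentMode``.

```python
# Hypothetical sketch of the environment lifecycle described above.
from abc import ABC, abstractmethod
from enum import Enum
from typing import List


class EnvironmentMode(Enum):
    GITHUB = "github"
    LOCAL_COPY = "local_copy"


class TaskEnvironmentSketch(ABC):
    @abstractmethod
    def setup(self, task_id: str) -> None: ...

    @abstractmethod
    def teardown(self) -> None: ...


class AutomataTaskEnvironmentSketch(TaskEnvironmentSketch):
    def __init__(self, mode: EnvironmentMode) -> None:
        self.mode = mode
        self.workspaces: List[str] = []

    def setup(self, task_id: str) -> None:
        if self.mode is EnvironmentMode.GITHUB:
            # The real implementation would clone the repository here.
            self.workspaces.append(f"clone:{task_id}")
        else:
            self.workspaces.append(f"copy:{task_id}")

    def teardown(self) -> None:
        # Drop any local resources associated with the task.
        self.workspaces.clear()


env = AutomataTaskEnvironmentSketch(EnvironmentMode.GITHUB)
env.setup("654321")
print(env.workspaces)  # ['clone:654321']
```

Keeping the mode-specific branching inside ``setup`` is one way the class could be extended to further ``EnvironmentMode`` values without changing its public interface.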
diff --git a/docs/tasks/automata_task_executor.rst b/docs/tasks/automata_task_executor.rst index bd92883e2..ee24986f5 100644 --- a/docs/tasks/automata_task_executor.rst +++ b/docs/tasks/automata_task_executor.rst @@ -1,82 +1,82 @@ AutomataTaskExecutor ==================== -``AutomataTaskExecutor`` is a class that adopts ``ITaskExecution`` -behavior for executing an ``AutomataTask``. It executes the task -following the behavior specified in the execution instance provided -during the initialization of ``AutomataTaskExecutor``. The task -execution can go through multiple stages with different ``TaskStatus`` -such as ``PENDING``, ``RUNNING``, ``SUCCESS``, etc. If a task fails and -does not exceed the maximum retries, it will be retried after a period -of ‘exponential backoff’ time. +``AutomataTaskExecutor`` is the class in charge of managing the +execution of tasks. It takes execution behavior specified through an +``ITaskExecution`` interface and performs the task, handling retries on +failure, status tracking, and error logging. + +It uses the exponential backoff algorithm to space out retries, doubling +the wait time with each failed attempt. This algorithm has proven vital +in networking-related tasks, as it gives a system time to recover, +reducing the chances of a system being overwhelmed. Overview -------- -AutomataTaskExecutor is intended to execute an ``AutomataTask`` -following the ``ITaskExecution`` interface standards. The execution of -the task is carried out by the ``execute`` method of the task execution -instance, which raises exceptions if the task is not in the ``PENDING`` -status or if the task fails exceeding the maximum retries. +``AutomataTaskExecutor`` manages task execution, ensuring that tasks +that fail are retried until a maximum number of attempts is reached. The +class provides an ``execute`` method that handles the main task +execution, thoroughly logging each step of the process.
The method checks the task’s status, executes it, logs status +updates, and retries if necessary based on the task’s own max retries +setting. + +The ``AutomataTaskExecutor`` class communicates task status updates +throughout the execution process, changing the task status from +``PENDING`` to ``RUNNING``, then to ``SUCCESS`` or ``RETRYING`` as +appropriate. If the task fails and retries are exhausted, an exception +is propagated upwards. Related Symbols --------------- -- ``ITaskExecution`` -- ``TaskStatus`` -- ``AutomataTask`` -- ``AutomataTaskEnvironment`` -- ``AutomataTaskRegistry`` - -Example ------- +- ``automata.tasks.task.ITaskExecution`` +- ``automata.tasks.task.AutomataTask`` +- ``automata.tasks.task_status.TaskStatus`` +- ``automata.tasks.task_state_error.TaskStateError`` -The following is an example demonstrating how to use -``AutomataTaskExecutor`` to execute an ``AutomataTask``. +Usage Example +------------- .. code:: python - from automata.tasks.executor import AutomataTaskExecutor - from automata.tasks.tasks import AutomataTask - from automata.tests.unit.test_task_executor import TestExecuteBehavior + from typing import Any + + from automata.tasks.task_executor import AutomataTaskExecutor + from automata.tasks.ITaskExecution import ITaskExecution + from automata.tasks.task import AutomataTask + from automata.tasks.task_status import TaskStatus - # Create an AutomataTask instance - my_task = AutomataTask( - title="Task 1", - instructions="Perform task 1", - max_retries=5 - ) + # Define your ITaskExecution behavior + class CustomTaskExecution(ITaskExecution): + def execute(self, task: AutomataTask) -> Any: + # Place your execution logic here.
+ return "Custom Task Execution Result" - # Create a TestExecuteBehavior instance - test_execution_behavior = TestExecuteBehavior() + # Create your AutomataTask + task = AutomataTask(id='TASK-1', session_id='SESS-1', status=TaskStatus.PENDING) - # Create an AutomataTaskExecutor instance - task_executor = AutomataTaskExecutor(test_execution_behavior) + # Pass the task object and execution behavior to AutomataTaskExecutor + task_executor = AutomataTaskExecutor(execution=CustomTaskExecution()) + result = task_executor.execute(task) - # Execute the task - task_executor.execute(my_task) - -This will execute the ``AutomataTask`` with the behavior specified in -``TestExecuteBehavior``. + print(result) # Outputs: Custom Task Execution Result Limitations ----------- -``AutomataTaskExecutor`` relies heavily on the provided -``ITaskExecution`` behavior passed during its instantiation. If the -execution behavior doesn’t correctly implement the ``execute`` method, -the task execution might not work as intended. - -``AutomataTaskExecutor`` also requires the task to be in a ``PENDING`` -status to be successfully executed. Therefore, tasks that aren’t in the -``PENDING`` status would require explicit modification before the -execution. +The ``AutomataTaskExecutor`` will only run tasks with the ``PENDING`` +status. When a task status is not ``PENDING``, the execution raises a +``TaskStateError``. It also does not handle side effects of failure +related to external systems used in the task execution code provided +through the ``ITaskExecution`` interface. Follow-up Questions: -------------------- -- How do we handle exceptions raised by other parts of the - ``ITaskExecution`` process? -- Are there specific rules for exponential backoff time? -- Can tasks in other states (like ``SUCCESS``, ``FAILED``, etc.) be - re-executed? +- Can the ``AutomataTaskExecutor`` be improved to handle the status of + tasks in other stages and states? 
+- How might we handle side effects of task failures within the + ``AutomataTaskExecutor`` system? +- How can we customize the behavior of the exponential backoff + algorithm based on specific task characteristics or conditions? diff --git a/docs/tasks/automata_task_registry.rst b/docs/tasks/automata_task_registry.rst index 2d3c46a64..7c1413ad6 100644 --- a/docs/tasks/automata_task_registry.rst +++ b/docs/tasks/automata_task_registry.rst @@ -1,29 +1,59 @@ -1. The ``AutomataTaskRegistry`` is designed to manage tasks, not create - or delete them. The tasks should be created and potentially deleted - outside of the registry, and then registered and managed within it. - However, adding a deletion functionality could be useful if a task is - no longer needed. Creation could be kept separate to maintain - separation of responsibilities, but you also could combine the - functionality if this better suits your use-case. - -2. In general, having non-unique ``session_id``\ s is not a good - practice as it can cause confusion and potential errors down the - line. However, there might be very specific use cases where this - could be beneficial. For example, there might be tasks which are - identical and need to be performed periodically and you would like to - easily group them together. In this case having the same - ``session_id`` could be a way to achieve this. Nonetheless, it would - be more common to have a separate identifier for the group or type of - task, and keep the ``session_id`` unique. - -3. The error handling could definitely be refined as the system gets - more complex. Rather than just throwing a generic exception, we could - have custom exceptions that specify the exact reason for the failure, - like ``TaskAlreadyRegistered``, ``TaskNotInCreatedState``, - ``NonUniqueSessionID``, etc. Depending on the exact use-case you - might want to fail loudly or try to recover. 
In general, it might be - a good idea to fail loudly during development to notice and fix - errors early. In a production system, on the other hand, a more - sophisticated error handling system that attempts recovery while - alerting the system administrators about the issue might be - preferable. +AutomataTaskRegistry +==================== + +``AutomataTaskRegistry`` is a class that manages the storage and +retrieval of tasks. Each task is stored with a ``session_id`` that +uniquely identifies it, and task statuses are updated in the registry. + +Overview +-------- + +``AutomataTaskRegistry`` uses the ``AutomataAgentTaskDatabase`` to +access stored tasks. It offers methods to register tasks, update tasks, +fetch tasks by their session id and get all tasks in the registry. Each +task must be in the ``CREATED`` status to be registered, and an +exception is raised if a task is attempted to be registered again or if +it doesn’t exist in the registry. + +Related Symbols +--------------- + +- ``automata.tasks.task_registry.AutomataTask`` +- ``automata.tasks.task_registry.AutomataAgentTaskDatabase`` +- ``automata.tasks.task_registry.TaskStatus`` + +Example +------- + +The following is an example demonstrating how to register a new task to +the ``AutomataTaskRegistry``: + +.. code:: python + + from automata.tasks.task_registry import AutomataTaskRegistry, AutomataTask, AutomataAgentTaskDatabase, TaskStatus + + db = AutomataAgentTaskDatabase() + registry = AutomataTaskRegistry(db) + + task = AutomataTask("session_id_1", "task_name", TaskStatus.CREATED) + registry.register(task) + +Limitations +----------- + +The ``AutomataTaskRegistry`` class assumes each ``AutomataTask`` has a +unique ``session_id``. Fetching a task by ``session_id`` will raise an +exception if multiple tasks with the same ``session_id`` are found. +Also, it can only handle ``AutomataTask`` and not any other type of +tasks.
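The registration rules described above (tasks must be in ``CREATED`` status, and ``session_id`` values must be unique) can be sketched as follows. All names here are illustrative stand-ins; the real registry delegates storage to the task database.

```python
# Hypothetical sketch of the registration rules described above.
from enum import Enum
from typing import Dict


class TaskStatus(Enum):
    CREATED = "created"
    REGISTERED = "registered"


class Task:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.status = TaskStatus.CREATED


class TaskRegistrySketch:
    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def register(self, task: Task) -> None:
        # Only freshly created tasks may enter the registry.
        if task.status is not TaskStatus.CREATED:
            raise ValueError("Only tasks in CREATED status may be registered.")
        # session_id acts as the unique key.
        if task.session_id in self._tasks:
            raise ValueError(f"Task {task.session_id} is already registered.")
        self._tasks[task.session_id] = task
        task.status = TaskStatus.REGISTERED

    def fetch(self, session_id: str) -> Task:
        return self._tasks[session_id]


registry = TaskRegistrySketch()
task = Task("session_id_1")
registry.register(task)
print(registry.fetch("session_id_1").status)  # TaskStatus.REGISTERED
```

Because the sketch keys its store by ``session_id``, the duplicate-``session_id`` ambiguity noted in the limitations cannot arise here; the real registry instead has to detect it at fetch time.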
+ +Follow-up Questions: +-------------------- + +- Robust error handling and clear error messages if a task doesn’t + exist in the database, or if multiple tasks with the same + ``session_id`` are found. +- Could ``AutomataTaskRegistry`` be extended to support other types of + tasks, not just ``AutomataTask``? +- How are tasks removed or dequeued from the registry once they’ve been + completed? diff --git a/docs/tasks/environment_mode.rst b/docs/tasks/environment_mode.rst index 562001575..188ebfd23 100644 --- a/docs/tasks/environment_mode.rst +++ b/docs/tasks/environment_mode.rst @@ -1,34 +1 @@ -The ``EnvironmentMode`` class in Automata is quite limited as it -currently only supports two modes - ``GITHUB`` and ``LOCAL_COPY``. -However, it is possible to extend this class by adding more modes to -accommodate different environments. - -If an unsupported ``EnvironmentMode`` is supplied to -``AutomataTaskEnvironment``, it would result in an error. Currently, the -``AutomataTaskEnvironment`` does not have a fallback mechanism for -automatically selecting another mode if the one provided is unsupported. -The environment mode needs to be explicitly set during instantiation. - -Error handling and recovery are handled differently depending on the -``EnvironmentMode``. If the mode is ``GITHUB``, any errors would be -reflected on the remote repository, and the task might fail, requiring a -manual recovery. If the mode is ``LOCAL_COPY``, the task would fail -locally, and recovery would typically involve fixing the issues in the -local copy and rerunning the task. - -However, the behavior during errors depends largely on the -implementation of the specific task performed in the given environment. -This class is more or less a configuration class and does not include -any error handling or recovery mechanisms. 
-
-As per the ``EnvironmentMode`` extension, the ability to add more
-environment types would largely depend on whether the
-``AutomataTaskEnvironment`` class supports those environments. If it
-doesn’t, you would need to modify or extend that class to support the
-new environment modes.
-
-To provide more robust error handling, it is recommended to wrap calls
-to tasks in try/except blocks and to implement appropriate error
-handling in the task definition itself. It might also be beneficial to
-implement logging or send error notifications when the task fails so
-that recovery actions can be triggered.
+.. code:: python
+
+   class EnvironmentMode(Enum):
+       GITHUB = 'github'
+       LOCAL_COPY = 'local_copy'
diff --git a/docs/tasks/i_automata_task_execution.rst b/docs/tasks/i_automata_task_execution.rst
index 54d0c25cc..3c421a212 100644
--- a/docs/tasks/i_automata_task_execution.rst
+++ b/docs/tasks/i_automata_task_execution.rst
@@ -1,19 +1,76 @@
-1. If the ``_build_agent`` method of ``IAutomataTaskExecution`` is used
-   with other configurations, it would depend on how the new
-   configuration gets along with the other parts of the system. The
-   ``_build_agent`` method uses a specific configuration to create an
-   ``OpenAIAutomataAgent``, which is designed to work with that specific
-   configuration. Using a different configuration may fail if it’s
-   incompatible with the ``OpenAIAutomataAgent`` or the task at hand.
-
-2. Yes, the retry functionality could potentially be improved to adapt
-   depending on the nature of the task or the kind of error detected.
-   For example, some errors might be identified as temporary or
-   network-related, which could be rectified by simply retrying after a
-   brief delay. On the other hand, some errors might be due to an issue
-   with the implementation of the task, and in these cases, retrying
-   would just cause the same error again.
In these scenarios, the system
-   could be improved to recognize the type of error and make an informed
-   decision about whether to retry or not. However, implementing such an
-   adaptive retry mechanism would likely be complex and might require
-   significant changes to the existing architecture.
+IAutomataTaskExecution
+======================
+
+Overview
+--------
+
+``IAutomataTaskExecution`` is a class intended for executing general
+tasks. It serves as the driving mechanism for task execution,
+constructing an ``OpenAIAutomataAgent`` from a provided task and
+managing the agent’s lifecycle. This includes starting the execution of
+the agent, handling task failures and retries, and logging the status
+of the task.
+
+The ``execute(task: Task)`` method is used to execute an instance of the
+``Task`` class. The method performs a set of operations in an orderly
+manner. The task’s status is switched to ``RUNNING``, after which an
+``OpenAIAutomataAgent`` is constructed for the task and executed. If the
+execution finishes successfully, the task’s result is obtained, and its
+status is updated to ``SUCCESS``. If an error occurs during execution,
+the task’s error field is updated, its status is updated to ``FAILED``,
+and its retry number is incremented.
+
+The ``OpenAIAutomataAgent`` is created using the
+``_build_agent(task: AutomataTask)`` method. This agent is generated
+from the ``OpenAIAutomataAgentConfigBuilder`` using the task’s
+arguments.
+
+Related Symbols
+---------------
+
+- ``automata.tasks.Task``
+- ``automata.agent.OpenAIAutomataAgent``
+- ``automata.agent.config.OpenAIAutomataAgentConfigBuilder``
+- ``automata.tasks.task_enums.TaskStatus``
+
+Example
+-------
+
+This example demonstrates how to create an instance of
+``IAutomataTaskExecution`` and execute a task.
+
..
code:: python + + from automata.tasks.task_executor import IAutomataTaskExecution + from automata.tasks import AutomataTask + + # create a task instance + task = AutomataTask( + session_id="testing_session", + kwargs={ + "instructions": "Translate the text from English to French", + "text": "Hello world" + } + ) + + # Create an instance of IAutomataTaskExecution and execute the task + task_executor = IAutomataTaskExecution() + agent = task_executor.execute(task) + +Limitations +----------- + +One of the known limitations of the ``IAutomataTaskExecution`` class is +that it continues to attempt execution after errors. This could lead to +undesired consequences in case of a persistent problem causing the task +to fail repeatedly. Additionally, this class only accepts task instances +of ``AutomataTask`` type. The execution fails if the task instance does +not belong to this type. + +Follow-up Questions: +-------------------- + +- How is the number of retries managed? Is there a set limit to the + number of times a failed task is reattempted? +- Is there a mechanism for intercepting and mitigating persistent + errors during task execution? diff --git a/docs/tasks/i_task_execution.rst b/docs/tasks/i_task_execution.rst index caa7f18a7..75123bdd4 100644 --- a/docs/tasks/i_task_execution.rst +++ b/docs/tasks/i_task_execution.rst @@ -1,66 +1,7 @@ -ITaskExecution -============== +class ITaskExecution(ABC): ‘Interface for task execution behaviors.’ -``ITaskExecution`` is an interface specifying the behavior for task -execution in the ``automata.tasks`` module. It provides an abstract -method ``execute`` which defines how a task object should be executed. +:: -Overview --------- - -As an abstract base class (ABC), ``ITaskExecution`` does not include any -concrete implementations. Instead, it presents a method signature for -``execute`` to guide and enforce interface in inheriting classes. 
This -class ensures that all task execution behaviors adhere to a standard -protocol facilitating code reuse, modularity, and comprehensibility. - -Related Symbols ---------------- - -- ``automata.tests.unit.test_task_executor.TestExecuteBehavior`` -- ``automata.tasks.executor.AutomataTaskExecutor`` -- ``automata.tasks.executor.IAutomataTaskExecution`` -- ``automata.tasks.base.Task`` -- ``automata.tasks.base.TaskStatus`` - -Example -------- - -Inheriting classes must implement the ``execute`` method. Below is an -example of the ``TestExecuteBehavior`` class that provides a concrete -implementation of the ``execute`` method. - -.. code:: python - - from automata.tasks.base import ITaskExecution, Task - - class TestExecuteBehavior(ITaskExecution): - """ - Class for executing test tasks. - """ - - def execute(self, task: Task): - # execution logic goes here - task.result = "Test result" - -In this example, the ``execute`` method modifies the ``result`` -attribute of the ``task`` argument. Typical cases would differ depending -on the complexity and requirements of the task to be executed. - -Limitation ----------- - -The primary limitation of ``ITaskExecution`` is that it only stipulates -how to handle task execution but does not provide any concrete -implementation. Thus, it relies on the classes that implement the -interface to provide the actual task execution behavior. This includes -the error handling and reporting strategy during the execution of tasks. - -Follow-up Questions -------------------- - -- What kind of tasks are the ``ITaskExecution`` and its children - classes meant to handle? -- How can error handling be standardized across all classes that - implement this interface? Is there a need for a standard strategy or - should error handling be left to the concrete implementation? 
+ @abstractmethod + def execute(self, task: Task) -> Any: + pass diff --git a/docs/tasks/index.rst b/docs/tasks/index.rst index 69a22e35d..08d55d259 100644 --- a/docs/tasks/index.rst +++ b/docs/tasks/index.rst @@ -26,6 +26,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/tasks/task.rst b/docs/tasks/task.rst index b71d3394b..4a16c825e 100644 --- a/docs/tasks/task.rst +++ b/docs/tasks/task.rst @@ -1,73 +1,74 @@ Task ==== -``Task`` is a primary object used by ``TaskExecutor`` in the Automata -library. The ``Task`` object is responsible for storing the task id, -priority, max retries, and delivering task-oriented action to the -respective task function when the task is executed. In addition to -these, it also contains a method to generate a deterministic task id -which is based on the hash of the hashable keyword arguments. +``Task`` is a generic object used by ``TaskExecutor``. It is responsible +for storing relevant task details such as the task id, priority level, +and maximum retries. The ``Task`` class also provides parameters to +receive arguments and keyword arguments which are then passed to the +task function when the task is executed. Additionally, it includes a +method to generate a deterministic task id based on a hash of the +hashable keyword arguments. Overview -------- -The ``Task`` plays a pivotal role in task execution by storing the -necessary details about the task. By default, it assigns unique -identifiers generated by the universally unique identifier (uuid) -module. However, you have the option to generate a deterministic id -based on the hash of the hashable keyword arguments. +The ``Task`` class is initiated with optional keyword arguments for task +priority and maximum retries, defaulting to 0 and 3 respectively. +Optional ``generate_deterministic_id`` keyword argument can also be +provided to generate deterministic task id based on the hash of hashable +kwargs. 
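One way a deterministic task id could be derived from the hashable keyword arguments is sketched below. The function name and hashing scheme are assumptions for illustration, not the actual ``Task`` implementation:

```python
import hashlib
import uuid


def deterministic_task_id(**kwargs) -> str:
    """Derive a stable task id from the hashable keyword arguments.

    Illustrative sketch only: unhashable values are skipped, the
    remaining pairs are sorted so argument order does not matter, and
    the digest is rendered as a UUID string.
    """
    hashable = sorted(
        (key, value)
        for key, value in kwargs.items()
        if isinstance(value, (str, int, float, bool, tuple, frozenset))
    )
    digest = hashlib.sha256(repr(hashable).encode()).digest()
    # A SHA-256 digest is 32 bytes; a UUID needs exactly 16.
    return str(uuid.UUID(bytes=digest[:16]))
```

Because the pairs are sorted by key, ``deterministic_task_id(priority=2, name="t")`` and ``deterministic_task_id(name="t", priority=2)`` yield the same id.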
-Note that the task status will be noted by ``TaskExecutor`` as it
-proceeds to execute different stages. This task status is retrievable
-using the status property of the class. Furthermore, you can set the
-status of the task using the status setter property.
+Task status is handled through properties, allowing for the task’s
+status to be updated as it moves through different stages of execution.
+
+The ``Task`` object also includes logging support, emitting a
+notification whenever the task status changes.

Related Symbols
---------------

-- ``automata.tests.unit.test_task_database.task``
-- ``automata.tests.unit.test_task_environment.test_commit_task``
-- ``automata.tasks.tasks.AutomataTask``
-- ``automata.tests.unit.test_task_database.test_get_tasks_by_query``
-- ``automata.tasks.base.ITaskExecution.execute``
-- ``automata.tests.unit.test_task_database.test_update_task``
-- ``automata.tests.unit.test_task_database.test_database_lifecycle``
-- ``automata.tests.unit.test_task_database.test_insert_task``
-- ``automata.tasks.base.ITaskExecution``
-- ``automata.tests.conftest.registry``
+- ``TaskExecutor``
+- ``TaskStatus``

Usage Example
-------------

+**Initialization**
+
.. code:: python

-   from automata.tasks.base import Task
+   from tasks.task_base import Task
+
+   task = Task(priority=2, max_retries=5, generate_deterministic_id=True)
+
+**Setting Status**
+
+.. code:: python

-   task = Task(priority=1, max_retries=5)
-   print(f"ID of the created task: {task.task_id}")
+   from tasks.task_base import Task, TaskStatus
+
+   # Initialize a task
+   task = Task(priority=2, max_retries=5)
+
+   # Set status of the task
+   task.status = TaskStatus.RUNNING

Limitations
-----------

-The current design does not allow to retrieve the task id, priority and
-max retries once the task object is initialized. If you need to retrieve
-these properties, you will need to capture these values during task
-initialization.
-
-If a task status is set to ``RETRYING``, and if the maximum retries set
-is exceeded, the task status will be marked ``FAILED``. In case, the
-application logic requires further retries despite the retry limit, you
-will need to create a new Task instance with the required parameters and
-execute it as a fresh task.
-
-Follow-up Questions
--------------------
-
-- Can we include mechanism that would enable us to retrieve the initial
-  properties of Task object, such as task id, priority and max_retries
-  once initialized?
-- Should we allow indefinite retries despite exceeding the maximum
-  retry limit? If yes, what would be the best approach to implement
-  this feature?
-- Should there be an option to reset the status of the task back to its
-  original state in case of failure in execution?
+A task’s status cannot be set to ``RETRYING`` once the maximum number
+of retries has been reached; in that case, the status defaults to
+``FAILED``.
+
+Another limitation is the potential for collision if deterministic
+session_ids are generated from identical sets of keyword arguments.
+This could overwrite a previous task that shares the derived task id.
+
+Follow-up Questions:
+--------------------
+
+- Is the ``retry_count`` field incremented when a task’s status is set
+  to ``RETRYING``? What happens to this count when a task is successful
+  or fails?
+- How are tasks with identical deterministic task ids handled? In the
+  presence of such a scenario, will it lead to data loss by overwriting
+  the existing task details with the new task details?
diff --git a/docs/tasks/task_environment.rst b/docs/tasks/task_environment.rst
index 870bc6bb3..d845974ce 100644
--- a/docs/tasks/task_environment.rst
+++ b/docs/tasks/task_environment.rst
@@ -1,85 +1,75 @@
 TaskEnvironment
 ===============
+``TaskEnvironment`` is an abstract base class (ABC) designed to
+represent a task environment.
This class defines four methods for +managing the task environment: setup, teardown, validate, and reset. +These methods must be overridden by any concrete class inheriting from +``TaskEnvironment``. + Overview -------- -The TaskEnvironment is an abstract base class which provides an -interface for defining a context in which tasks are executed. It has -four abstract methods - ``reset``, ``setup``, ``teardown`` and -``validate``. As an abstract base class, it must be subclassed and its -methods implemented. +The ``TaskEnvironment`` class sets the basic structure to implement a +task environment within the application. The abstract methods it +contains are expected to include the business logic for setting up an +environment, tearing it down, validating it and resetting it to its +initial state. These methods need to be implemented in subclasses to +work as intended. Related Symbols --------------- -- ``automata.tests.conftest.environment`` -- ``automata.tests.unit.test_task_environment.test_commit_task`` -- ``automata.tasks.environment.AutomataTaskEnvironment`` -- ``automata.tests.conftest.task`` -- ``automata.tests.unit.test_task_database.db`` -- ``automata.tasks.environment.AutomataTaskEnvironment.teardown`` -- ``automata.tests.unit.test_task_executor.test_execute_automata_task_success`` -- ``automata.tests.unit.test_task_executor.test_execute_automata_task_fail`` -- ``automata.tasks.base.Task`` +As ``TaskEnvironment`` is an abstract class, the related symbols would +typically be the classes that inherit from it and implement its abstract +methods. As it is not contextually provided here, we can’t name them +specifically. -Example -------- +Usage Example +------------- -The following is a class that extends from TaskEnvironment and -implements its abstract methods: +The example below shows a basic usage of a subclass of +``TaskEnvironment``. Please note that ``TaskEnvironment`` is an abstract +base class and cannot be instantiated on its own. .. 
code:: python - from automata.tasks.base import TaskEnvironment - - class MyEnvironment(TaskEnvironment): - - def reset(self): - """Reset the environment to its initial state.""" - pass - - def setup(self, task): - """Set up the environment.""" - pass - - def teardown(self): - """Tear down the environment.""" - pass - - def validate(self): - """Validate the environment.""" - pass - -After creating the subclass, you can use it to create an object and call -its methods: - -.. code:: python - - env = MyEnvironment() - env.setup(task) - # Do some operations... - env.teardown() - -Note: In the real implementation, you would likely put some real logic -into the methods ``reset``, ``setup``, ``teardown``, and ``validate``. + from automata.tasks.task_base import TaskEnvironment, Task + + class MyTaskEnvironment(TaskEnvironment): + def setup(self, task: Task): + print("Setting up the task environment.") + + def teardown(self): + print("Tearing down the task environment.") + + def validate(self): + print("Validating the task environment.") + + def reset(self): + print("Resetting the task environment back to initial state.") + + # usage + my_env = MyTaskEnvironment() + my_env.setup(my_task) # assuming my_task is an instance of Task + my_env.validate() + my_env.teardown() + my_env.reset() Limitations ----------- -The TaskEnvironment class is only useful as a superclass for other -classes. It does not offer any functionality on its own because it only -defines an interface without any concrete implementation. The -limitations of a TaskEnvironment will therefore depend on the specific -subclass that implements its methods. +Its main limitations are that it is highly abstract, meaning it doesn’t +provide any concrete functionality on its own. It relies on subclasses +to provide specific implementations of its methods. Therefore, using it +directly would lead to errors because its methods are yet to be +implemented. 
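Since every subclass must pair ``setup`` with ``teardown``, a common pattern is to drive the lifecycle through a context manager so that teardown runs even when the task body raises. The sketch below is illustrative; ``managed`` and ``RecordingEnv`` are not part of the ``automata`` API:

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager


class TaskEnvironment(ABC):
    """Minimal stand-in mirroring the four abstract lifecycle methods."""

    @abstractmethod
    def setup(self, task): ...

    @abstractmethod
    def teardown(self): ...

    @abstractmethod
    def validate(self): ...

    @abstractmethod
    def reset(self): ...


@contextmanager
def managed(env: TaskEnvironment, task):
    """Set up and validate the environment, guaranteeing teardown."""
    env.setup(task)
    env.validate()
    try:
        yield env
    finally:
        # Runs whether the task body succeeds or raises.
        env.teardown()


class RecordingEnv(TaskEnvironment):
    """Concrete subclass that records the order of lifecycle calls."""

    def __init__(self):
        self.calls = []

    def setup(self, task):
        self.calls.append("setup")

    def teardown(self):
        self.calls.append("teardown")

    def validate(self):
        self.calls.append("validate")

    def reset(self):
        self.calls.append("reset")
```

Using ``with managed(env, task): ...`` then guarantees the ``setup``, ``validate``, ``teardown`` ordering regardless of errors inside the block.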
-Follow-up Questions -------------------- +Follow-up Questions: +-------------------- -- Are there any expectations or requirements for the implementations of - the ``setup`` and ``teardown`` methods? -- Are there any specific conditions which would make the ``validate`` - method return False? -- There seems to be test setup data included in the context, though - it’s not clear how or if it should be included in the final - documentation. +- What are the specific criteria that should be validated in the + validate method? +- What does resetting the environment entail in this context? +- Are there any restrictions, rules, or requirements for setting up or + tearing down the environment that subclasses should adhere to? diff --git a/docs/tasks/task_general_error.rst b/docs/tasks/task_general_error.rst index 0cce12d39..623f0c560 100644 --- a/docs/tasks/task_general_error.rst +++ b/docs/tasks/task_general_error.rst @@ -1,17 +1,2 @@ -- ``TaskGeneralError`` is usually triggered under circumstances where - an error doesn’t fit into other specific categories of exceptions - thrown during task execution. A few such scenarios could include - general system failure, invalid parameters passed to a task, database - errors during task performance, or unforeseen errors due to unique - edge cases. - -- Error handling in the broader Automata system would usually rely on - exception handling, logging, and possibly retry mechanisms for - certain operation types. It would largely depend on the design and - architectural decisions made within the Automata system. As for - general errors represented by ``TaskGeneralError``, retries might not - always be useful, since a repeat of the action might just produce the - same error yet again. Instead, these may typically require immediate - manual intervention or a structural fix, depending on what caused the - error. However, this could vary and automated retries could be - implemented if deemed appropriate. 
+.. code:: python
+
+   class TaskGeneralError(AutomataError):
+       'An exception raised when a general error occurs during task execution.'
+       pass
diff --git a/docs/tasks/task_instructions_error.rst b/docs/tasks/task_instructions_error.rst
index 251b7de5e..57466dd59 100644
--- a/docs/tasks/task_instructions_error.rst
+++ b/docs/tasks/task_instructions_error.rst
@@ -1,14 +1,2 @@
-- ``TaskInstructionsError`` might be raised in situations where the
-  task instructions are missing crucial information, contain incorrect
-  or unexpected data types, or are otherwise malformed in a way that
-  precludes successful task execution. The specifics would depend on
-  the requirements and expectations of the components that are
-  processing the instructions.
-
-- If there is a standard format for task instructions, details about
-  its requirements would be contingent on the design of the
-  ``Automata`` ecosystem. ``TaskInstructionsError`` would likely be
-  raised if the instructions violate this format. Since
-  ``TaskInstructionsError`` is intended to catch issues with
-  instructions, it is reasonable to assume it would be used to enforce
-  conformity to a standard format.
+.. code:: python
+
+   class TaskInstructionsError(AutomataError):
+       'An exception raised when there is an error with the task instructions.'
+       pass
diff --git a/docs/tasks/task_state_error.rst b/docs/tasks/task_state_error.rst
index 5b8c6853d..e8570afc9 100644
--- a/docs/tasks/task_state_error.rst
+++ b/docs/tasks/task_state_error.rst
@@ -1,15 +1,2 @@
-- In the Automata framework, task states are updated by the
-  AutomataTaskEnvironment class and the updates mainly depend on the
-  lifecycle of the task. When a task is created, it’s in a “CREATED”
-  state. When it’s currently running, it’s in a “RUNNING” state. After
-  it’s finished, it’s in a “COMPLETED” state and so on.
-
-- Actions in response to a ``TaskStateError`` can vary depending on the
-  operation that caused the error and the specific task at hand.
-  Generally, it can involve retrying the operation (especially if it
-  involves network requests), checking for and fixing any mistakes in
-  the code that’s managing the task, contacting a subsystem that may be
-  managing the state incorrectly, or changing program flow to not
-  attempt the operation until the task is in the correct state. In some
-  cases, it might be necessary to log the error for future
-  troubleshooting, or to notify the user or system administrator.
+.. code:: python
+
+   class TaskStateError(AutomataError):
+       'An exception raised when the task is not in the correct state for the operation.'
+       pass
diff --git a/docs/tasks/task_status.rst b/docs/tasks/task_status.rst
index e62683758..ad222b00e 100644
--- a/docs/tasks/task_status.rst
+++ b/docs/tasks/task_status.rst
@@ -1,13 +1,3 @@
-- No, in general, once an Enum is defined, it can’t be updated or
-  modified. This is by design as Enums are meant to provide a fixed set
-  of values. However, if different or additional statuses are needed, a
-  new Enum class or subclasses can be created to accommodate these.
-
-- The transition between different statuses is managed by the task
-  executor. If an error occurs during the execution of a task, it is
-  caught and handled by the executor, and the status of the task is
-  updated to ``TaskStatus.FAILED``. If the error is recoverable and the
-  task has not reached its maximum number of retries, the status may be
-  updated to ``TaskStatus.RETRYING`` and the task will be attempted
-  again. The specific mechanisms of error handling can vary depending
-  on the implementation of the task and the executor.
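The executor's error transition described above (``RETRYING`` for a recoverable error below the retry limit, ``FAILED`` otherwise) can be sketched as follows. This is an illustrative model, not the actual executor, and only a subset of statuses is shown:

```python
from enum import Enum


class TaskStatus(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    RETRYING = "retrying"


def transition_on_error(retry_count: int, max_retries: int, recoverable: bool):
    """Return the (new_status, new_retry_count) after a task error.

    A recoverable error below the retry limit moves the task to
    RETRYING and increments the retry count; anything else fails it.
    """
    if recoverable and retry_count < max_retries:
        return TaskStatus.RETRYING, retry_count + 1
    return TaskStatus.FAILED, retry_count
```

A real executor would additionally record the error on the task and persist the updated status, as described in the surrounding text.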
+.. code:: python
+
+   class TaskStatus(Enum):
+       CREATED = 'created'
+       REGISTERED = 'registered'
+       PENDING = 'pending'
+       RUNNING = 'running'
+       SUCCESS = 'success'
+       COMMITTED = 'committed'
+       FAILED = 'failed'
+       RETRYING = 'retrying'
diff --git a/docs/tests/index.rst b/docs/tests/index.rst
index c2ce94e3e..9363433ba 100644
--- a/docs/tests/index.rst
+++ b/docs/tests/index.rst
@@ -18,6 +18,8 @@ how to :ref:`installation` the project.

 Summary of content

+
+
 .. AUTO-GENERATED CONTENT START
 ..
diff --git a/docs/tests/unit/index.rst b/docs/tests/unit/index.rst
index 7e95a694e..d39d856df 100644
--- a/docs/tests/unit/index.rst
+++ b/docs/tests/unit/index.rst
@@ -21,6 +21,8 @@ how to :ref:`installation` the project.

 Summary of content

+
+
 .. AUTO-GENERATED CONTENT START
 ..
@@ -31,7 +33,6 @@ Summary of content

    test_execute_behavior
    test_tool
    test_url
-   sample_modules/index

 .. AUTO-GENERATED CONTENT END
 ..
diff --git a/docs/tools/agent_tool_factory.rst b/docs/tools/agent_tool_factory.rst
index 8a63cb31c..a3ca91a40 100644
--- a/docs/tools/agent_tool_factory.rst
+++ b/docs/tools/agent_tool_factory.rst
@@ -4,70 +4,63 @@ AgentToolFactory

 Overview
 --------

-The ``AgentToolFactory`` class is responsible for creating tools from a
-given agent tool name. It leverages the system of agent tool names and
-the registry of OpenAI Automata Agent toolkit builders to create and
-manage tools. The primary methods of the ``AgentToolFactory`` are
-``build_tools`` and ``create_tools_from_builder``, which are used for
-generating tools and creating tools from builders respectively.
-
-Methods
--------
-
-The ``build_tools`` method creates a collection of tools given a list of
-``toolkit_list`` tool names. It loops through the list of available tool
-names, for each of them checks whether the agent tool manager can handle
-it, and then applies the ``create_tools_from_builder`` function to
-create the corresponding tool.
-
-The ``create_tools_from_builder`` function creates tools from a given
-``agent_tool`` tool name.
This tool name is passed into the builder’s -``can_handle`` function to confirm if the builder can create the -requested tool. Once verified, the tool is created using the builder’s -corresponding build function. +``AgentToolFactory`` is a class designed to utilize the toolkit registry +to create tools and builders according to provided agent tool names. +This tool factory maintains a mapping of toolkit types to their +arguments, allowing for quick and reliable tool creation. Tools are +fundamental components in handling various tasks like Python code +reading, writing, symbolic search, etc. + +The main methods of ``AgentToolFactory`` include +``create_tools_from_builder`` which uses the toolkit registry to create +tools from a given agent tool name, and ``build_tools``, a method that +accepts a list of tool names and generates associated tools accordingly. +For all unknown or unhandled tool names, the factory raises +``UnknownToolError``. Related Symbols --------------- -- ``automata.tests.unit.test_tool.test_tool`` -- ``automata.tests.unit.test_tool.test_tool_instantiation`` -- ``automata.singletons.toolkit_registries.OpenAIAutomataAgentToolkitRegistry.register_tool_manager`` -- ``automata.tests.unit.test_py_writer_tool.python_writer_tool_builder`` +- ``automata.singletons.toolkit_registry.OpenAIAutomataAgentToolkitRegistry`` +- ``automata.tools.builders.py_writer_builder.PyCodeWriterToolkitBuilder.build`` +- ``automata.tools.builders.py_writer_builder.PyCodeWriterOpenAIToolkitBuilder.build_for_open_ai`` +- ``automata.tools.builders.py_reader_builder.PyReaderOpenAIToolkitBuilder.build_for_open_ai`` +- ``automata.core.base.database.relational_database.SQLDatabase.close`` +- ``automata.symbol_embedding.vector_databases.ChromaSymbolEmbeddingVectorDatabase.__init__`` +- ``automata.tools.builders.TaskEnvironment.AutomataTaskEnvironment`` Usage Example ------------- +Here’s a simple example that demonstrates how to use +``AgentToolFactory``: + .. 
code:: python - from automata.tools.factory import AgentToolFactory - from automata.agent.agent import AgentToolkitNames + from automata.tools.agent_tool_factory import AgentToolFactory + from automata.tool_agent_manager import AgentToolkitNames + toolkits = ['symbol_search', 'py_reader'] + tools = AgentToolFactory.build_tools(toolkits) - toolkit_list = ["tool_name1", "tool_name2"] - tools = AgentToolFactory.build_tools(toolkit_list) +In the above example, ``AgentToolFactory`` is imported and utilized to +build tools with the names ‘symbol_search’ and ‘py_reader’. Limitations ----------- -The ``AgentToolFactory`` is limited by the builders registered in the -``OpenAIAutomataAgentToolkitRegistry``. If a builder for a desired tool -isn’t registered yet or doesn’t exist, the ``AgentToolFactory`` won’t be -able to create that tool. - -Dependencies ------------- - -Some key dependencies include: - -- OpenAIAutomataAgentToolkitRegistry for querying the builder’s - registry. -- AgentToolkitNames for managing and validating tool names. -- OpenAIAgentToolkitBuilder for building the tools for the OpenAI - agent. +While the ``AgentToolFactory`` provides significant flexibility in tool +creation, it does rely on the builder registry and canned toolkit types +defined in ``AgentToolkitNames``. This design makes it less adaptable to +toolkits not already defined in the software architecture. Furthermore, +the factory will raise an error if it is asked to build a toolkit that +is not registered or unknown, which could limit its extensibility with +other third-party tools or custom toolkits. Follow-up Questions: -------------------- -- How can we extend the ``AgentToolFactory`` to handle a wider range of - tools or to handle custom tools? -- Could there be a more efficient way to create enums from the tool - names instead of handling them as plan strings? 
+- Might there be a more flexible way to register new toolkit types
+  without modifying internal code or the ``AgentToolkitNames``
+  enumerator?
+- How can one implement an extension mechanism to allow support for
+  other third-party toolkits outside OpenAI’s prebuilt toolkits?
diff --git a/docs/tools/base/index.rst b/docs/tools/base/index.rst
index cfe60cad4..eb2350c56 100644
--- a/docs/tools/base/index.rst
+++ b/docs/tools/base/index.rst
@@ -17,6 +17,8 @@ how to :ref:`installation` the project.

 Summary of content

+
+
 .. AUTO-GENERATED CONTENT START
 ..
diff --git a/docs/tools/builders/index.rst b/docs/tools/builders/index.rst
index dae688d92..e6aec863a 100644
--- a/docs/tools/builders/index.rst
+++ b/docs/tools/builders/index.rst
@@ -25,6 +25,8 @@ how to :ref:`installation` the project.

 Summary of content

+
+
 .. AUTO-GENERATED CONTENT START
 ..
@@ -33,8 +35,10 @@ Summary of content

    context_oracle_open_ai_toolkit_builder
    context_oracle_toolkit_builder
+   py_code_writer_open_ai_toolkit_builder
    py_code_writer_toolkit_builder
    py_reader_open_ai_toolkit
+   py_reader_open_ai_toolkit_builder
    py_reader_toolkit_builder
    py_writer_open_ai_toolkit_builder
    py_writer_toolkit_builder
diff --git a/docs/tools/builders/py_code_writer_open_ai_toolkit_builder.rst b/docs/tools/builders/py_code_writer_open_ai_toolkit_builder.rst
new file mode 100644
index 000000000..3889a5ecf
--- /dev/null
+++ b/docs/tools/builders/py_code_writer_open_ai_toolkit_builder.rst
@@ -0,0 +1,64 @@
+PyCodeWriterOpenAIToolkitBuilder
+================================
+
+``PyCodeWriterOpenAIToolkitBuilder`` is a class that builds and manages
+tools for OpenAI’s Python code writer. It inherits from both
+PyCodeWriterToolkitBuilder and OpenAIAgentToolkitBuilder. The class is
+used to expose Python code-writing functionality as tools that an
+OpenAI agent can invoke for specific programming tasks.
+
+Overview
+--------
+
+``PyCodeWriterOpenAIToolkitBuilder`` primarily leverages the
+``build_for_open_ai`` method to generate a list of tools specifically
+designed for OpenAI usage. These tools consist of ``OpenAITool`` instances,
+which include the tool’s function, name, description, and details of
+the properties and requirements needed for proper usage.
+
+This class is useful when there is a need to modify or generate code
+through OpenAI tools. It’s registered with a unique tool name and a
+provider to ensure proper categorization and usage within the software
+development process.
+
+Related Symbols
+---------------
+
+No related symbols for the ``PyCodeWriterOpenAIToolkitBuilder`` class
+are documented at this time.
+
+Example
+-------
+
+Below is a basic example of using ``PyCodeWriterOpenAIToolkitBuilder``:
+
+.. code:: python
+
+    from automata.tools.builders.py_writer_builder import PyCodeWriterOpenAIToolkitBuilder
+
+    builder = PyCodeWriterOpenAIToolkitBuilder()
+    tools = builder.build_for_open_ai()
+
+Limitations
+-----------
+
+The properties and required parameters (``module_dotpath`` and ``code``)
+appear to be hardcoded into the ``build_for_open_ai`` method. This might
+restrict the class’s flexibility to adapt to different or expanded tool
+property requirements. If this is the case, it may be worth considering
+making these configurable.
+
+Further, it’s unclear how ``PyCodeWriterOpenAIToolkitBuilder`` works
+within the larger context of the application, given the lack of
+tests and related symbols documented here.
+
+Follow-up Questions:
+--------------------
+
+- How can we make the properties and required parameters configurable?
+- What is the wider context within which
+  ``PyCodeWriterOpenAIToolkitBuilder`` operates, and how does it
+  interact with other components of the system?
+- Does the ``PyCodeWriterOpenAIToolkitBuilder.build_for_open_ai``
+  method completely replace the functionality of the parent class’s
+  ``build`` method, or complement it in some way?
diff --git a/docs/tools/builders/py_code_writer_toolkit_builder.rst b/docs/tools/builders/py_code_writer_toolkit_builder.rst
index 6390fc6f8..011ae823e 100644
--- a/docs/tools/builders/py_code_writer_toolkit_builder.rst
+++ b/docs/tools/builders/py_code_writer_toolkit_builder.rst
@@ -1,24 +1,71 @@
-1. Code validation can be introduced in ``PyCodeWriterToolkitBuilder``
-   using the ``ast.parse()`` function provided by Python’s built-in
-   Abstract Syntax Trees (AST) module. This can be done before
-   attempting to create or update a module. If ``ast.parse()`` raises a
-   ``SyntaxError``, it signifies that the input python code is invalid.
-
-2. A configuration option, perhaps as a boolean variable, can be
-   introduced to the ``PyCodeWriterToolkitBuilder``. This option would
-   determine whether or not to throw error when a module does not exist.
-   The builder checks if the module exists, and if not, uses this
-   configuration to decide if they should be created. If the config
-   option is set to not create new modules, an error should be raised
-   when a non-existent module is encountered.
-
-3. Extending ``PyCodeWriterToolkitBuilder`` to handle other programming
-   languages or generic text files would require building similar
-   toolkit builders for those languages or files, as they would likely
-   have unique syntax and specifications. However, this could greatly
-   improve the versatility of the toolkits and broaden their areas of
-   application. Note that building these systems could be complex, given
-   the differences in syntax and semantics across programming languages.
-   Design considerations would need to be made to ensure the tools
-   retain a common interface while supporting different languages’
-   specifics.
+PyCodeWriterToolkitBuilder +========================== + +Overview +-------- + +``PyCodeWriterToolkitBuilder`` is a class developed for interacting with +the PythonWriter API. The class provides functionality to modify python +code with the help of built-in methods that can create or update +existing python modules. Class’s initialization requires an instance of +``PyCodeWriter`` and a boolean variable ``do_write`` deciding whether to +write these changes to disk. + +Important methods contained in this class include ``build``, +``_update_existing_module``, and ``_create_new_module``. The ``build`` +method generates a toolkit that includes two functionalities: updating +existing python code and creating a new python module. If a required +object doesn’t exist in the module being modified, it is created +automatically. If it already exists, the existing code is modified. To +create a new module, the complete code is provided as a parameter. + +Related Symbols +--------------- + +- ``automata.tools.builders.AgentToolkitBuilder`` +- ``automata.tools.builder.PyCodeWriter`` + +Example +------- + +The following examples demonstrate how to use +``PyCodeWriterToolkitBuilder`` for modifying an existing python module +and creating a new python module. + +.. 
code:: python

+    from automata.tools.builders.py_writer_builder import PyCodeWriterToolkitBuilder
+    from automata.tools.writer import PyCodeWriter
+
+    py_writer = PyCodeWriter()
+    py_writer_builder_toolkit = PyCodeWriterToolkitBuilder(py_writer)
+
+    update_module_tool = py_writer_builder_toolkit.build()[0]
+    result = update_module_tool.function('my_folder.my_file.MyClass', 'def my_method():\n "My Method"\n print("hello world")\n')
+
+    create_module_tool = py_writer_builder_toolkit.build()[1]
+    result = create_module_tool.function('my_folder.my_new_file', 'import math\ndef my_method():\n "My Method"\n print(math.sqrt(4))\n')
+
+Limitations
+-----------
+
+The primary limitation is that ``PyCodeWriterToolkitBuilder`` can only
+modify the Python code of an existing module or create a new module from
+complete code provided as a parameter. This toolkit has no context outside
+of the passed arguments. Any additional statements, especially any import
+statements that the code block may depend upon, should be included
+within the code block itself.
+
+Also, error handling within the toolkit can return generic exceptions,
+which might not clearly identify the exact issue and can limit the ease
+of debugging.
+
+Follow-up Questions:
+--------------------
+
+- Could we provide a way to have better error handling or return more
+  specific exceptions? Would that help usability in large projects?
+- Can we provide the ability to read and write changes on the fly, as
+  user requirements dictate?
+- Is there a possibility to add a feature that can directly modify code
+  within the actual project’s structure itself?
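Because the toolkit writes whatever source string it receives, callers may want to validate input before invoking the update or create tools. A minimal pre-check using Python's built-in ``ast`` module could look like the following; this is a sketch for callers, not part of ``PyCodeWriterToolkitBuilder`` itself:

```python
import ast


def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as syntactically valid Python source."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


# Guard a write with the pre-check before handing code to a writer tool.
snippet = 'def my_method():\n    "My Method"\n    print("hello world")\n'
safe_to_write = is_valid_python(snippet)
```

A ``SyntaxError`` from ``ast.parse`` only catches parse-time problems; runtime errors (missing imports, bad references) would still pass this check.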
diff --git a/docs/tools/builders/py_reader_open_ai_toolkit_builder.rst b/docs/tools/builders/py_reader_open_ai_toolkit_builder.rst
new file mode 100644
index 000000000..d20370e0a
--- /dev/null
+++ b/docs/tools/builders/py_reader_open_ai_toolkit_builder.rst
@@ -0,0 +1,79 @@
+PyReaderOpenAIToolkitBuilder
+============================
+
+Overview
+--------
+
+``PyReaderOpenAIToolkitBuilder`` is a class that serves as a builder for
+tools that aim to interact with Python-based resources. It extends the
+``PyReaderToolkitBuilder`` and ``OpenAIAgentToolkitBuilder`` classes,
+providing the ability to build tools specifically for the OpenAI interface.
+The class produces a list of ``OpenAITool`` instances through its
+``build_for_open_ai`` method. These tools carry information such as the
+function they execute, their name, description, and the properties they
+require for proper execution.
+
+Each tool is initialized through the ``OpenAITool`` constructor and is
+stored in the ``openai_tools`` list, which is finally returned by the
+``build_for_open_ai`` method. The tools’ properties are determined
+through a fixed dictionary containing keys like ``module_path`` and
+``node_path``, providing context for the code retrieval process.
+
+Related Symbols
+---------------
+
+- ``automata.tools.builders.py_reader_builder.PyReaderToolkitBuilder``
+- ``automata.tools.builders.openai_agent_toolkit_builder.OpenAIAgentToolkitBuilder``
+- ``automata.tools.tool.AgentToolkitNames``
+- ``automata.tools.provider.LLMProvider``
+- ``automata.tools.openai_tool.OpenAITool``
+
+Usage Example
+-------------
+
+Below is an example of how to use ``PyReaderOpenAIToolkitBuilder`` to
+build an OpenAI tool:
+
+.. code:: python
+
+    from automata.tools.builders.py_reader_builder import PyReaderOpenAIToolkitBuilder
+
+    # Initialize the builder.
+    builder = PyReaderOpenAIToolkitBuilder()
+
+    # Build the OpenAI tools.
+ openai_tools = builder.build_for_open_ai() + + # Let's print the details of the first tool for demonstrative purposes. + first_tool = openai_tools[0] + print(f"Tool name: {first_tool.name}") + print(f"Tool description: {first_tool.description}") + print(f"Tool required properties: {first_tool.required}") + # Output can be specific to the build set-up and tools built. + +Limitations +----------- + +The primary limitation of ``PyReaderOpenAIToolkitBuilder`` is that the +parameters assigned while building ``OpenAITool`` instances are fixed - +``module_path`` is a required field, and ``node_path`` is an optional +field. This means ``PyReaderOpenAIToolkitBuilder`` may not be flexible +enough for all use cases where different types of parameters might be +needed. + +Moreover, the ``build_for_open_ai`` method extensively relies on the +``build`` method of the parent ``PyReaderToolkitBuilder``. Any changes +or issues in the parent class or method would directly impact +``PyReaderOpenAIToolkitBuilder``. + +Follow-up Questions: +-------------------- + +- Is there a way to make ``PyReaderOpenAIToolkitBuilder`` more flexible + in terms of the parameters it assigns while building ``OpenAITool`` + objects? +- What is the type and nature of ``tool.function`` referenced inside of + the ``build_for_open_ai`` method? +- How can we potentially handle different configurations or extensions + of this tool builder class to accommodate potential future needs or + changes in the API? diff --git a/docs/tools/builders/py_reader_toolkit_builder.rst b/docs/tools/builders/py_reader_toolkit_builder.rst index 14f2eec95..24e3e070b 100644 --- a/docs/tools/builders/py_reader_toolkit_builder.rst +++ b/docs/tools/builders/py_reader_toolkit_builder.rst @@ -1,23 +1,84 @@ -1. As of now, specific plans to extend the functionality of - ``PyReaderToolkitBuilder`` to facilitate easier error handling - haven’t been disclosed. 
However, development in the OpenAI codebase - is ongoing, and enhancements may include better error handling. - Please keep an eye on the OpenAI updates. - -2. Even though it hasn’t been stated explicitly, the architecture of the - tools builders classes in the OpenAI codebase encourages composition. - ``PyReaderToolkitBuilder`` itself is part of a composition with - ``PyReader``. So one can imagine integrating with other code parsing - tools builders where common interfaces align. It’s up to your - implementation and design as to how this could be achieved. - -3. ``PyReaderToolkitBuilder`` should work will with any tool that uses - the ``PythonIndexer`` API. Regarding compatibility requirements, - given that ``PyReaderToolkitBuilder`` and related tools work heavily - with Python indexing, retrieving and parsing, any module, or code you - intend to work with should be compatible with the Python standards. - Potential issues could arise from the inability to resolve relative - imports or if code is written in a way that is not amenable to - docstring extraction or code parsing. As ‘PyReaderToolkitBuilder’ can - only read, it may not inherently modify or correct these parts of the - code. +PyReaderToolkitBuilder +====================== + +Overview +-------- + +``PyReaderToolkitBuilder`` is a class designed to provide an interface +with Python’s Indexer API, allowing direct retrieval of Python code. By +instantiating the ``PyReaderToolkitBuilder`` with a ``PyReader`` object, +users can construct tools that retrieve a Python package’s code, +including modules, standalone functions, classes or methods. Two types +of tools can be built: one for retrieving raw code and the other for +retrieving docstrings only. + +Two private methods ``_run_indexer_retrieve_code`` and +``_run_indexer_retrieve_docstring`` in the class are the underlying +functions behind these tools. 
They attempt to fetch the source code or +docstring correspondingly, and return a failure message in case an +exception occurs. + +Related Symbols +--------------- + +The related symbols associated with the ``PyReaderToolkitBuilder`` class +are: + +- ``automata.tools.builders.agent_toolkit_builder.AgentToolkitBuilder`` +- ``automata.common.types.PyReader`` +- ``automata.common.types.Tool`` +- ``typing.List``. +- ``typing.Optional`` + +Usage Example +------------- + +Before constructing the tools, you will first need a ``PyReader`` +object. Suppose ``mock_py_reader`` is your ``PyReader`` object. + +.. code:: python + + from automata.common.types.tool import Tool + from automata.tools.builders.py_reader_builder import PyReaderToolkitBuilder + + # Instantiate PyReaderToolkitBuilder + py_reader_builder = PyReaderToolkitBuilder(mock_py_reader) + + # Build Tools + tools = py_reader_builder.build() + + for tool in tools: + if tool.name == 'retrieve-code': + # Retrieve source code + result = tool.function('module_directory.target_module', 'TargetClass.target_function') + print("Source Code: ", result) + elif tool.name == 'retrieve-docstring': + # Retrieve docstring + result = tool.function('module_directory.target_module', 'TargetClass.target_function') + print("Docstring: ", result) + +In the above example, ``module_directory.target_module`` is the path to +the Python file and ``TargetClass.target_function`` is the function +defined in the ``TargetClass`` in the module we wish to retrieve. +Replace these with the actual values as per your requirements. + +Limitations +----------- + +``PyReaderToolkitBuilder`` depends on its ``PyReader`` attribute, an +instance of the ``PyReader`` class. It is possible to encounter +exceptions during the retrieval of Python code due to reasons such as +incorrect paths or missing files. These exceptions are captured and +returned as error messages. 
+ +In the current implementation, it is assumed that the desired Python +files are local and accessible. Therefore, fetching code from remote or +protected directories may not be directly supported. + +Follow-up Questions: +-------------------- + +- How can we make this class handle remote or restricted files or + directories? +- Can we add more tools into the ``build`` method to retrieve other + components of Python code such as class variables or decorators? diff --git a/docs/tools/i_tool_execution.rst b/docs/tools/i_tool_execution.rst new file mode 100644 index 000000000..883be008a --- /dev/null +++ b/docs/tools/i_tool_execution.rst @@ -0,0 +1,7 @@ +class IToolExecution(ABC): ‘Interface for executing tools.’ + +:: + + @abstractmethod + def execute(self, function_call: 'FunctionCall') -> str: + pass diff --git a/docs/tools/index.rst b/docs/tools/index.rst index b9e649aa4..c16d242b0 100644 --- a/docs/tools/index.rst +++ b/docs/tools/index.rst @@ -20,6 +20,8 @@ how to :ref:`installation` the project. Summary of content + + .. AUTO-GENERATED CONTENT START .. @@ -27,8 +29,11 @@ Summary of content :maxdepth: 1 agent_tool_factory + i_tool_execution tool + tool_execution tool_executor + unknown_tool_error base/index builders/index tool_base/index diff --git a/docs/tools/tool.rst b/docs/tools/tool.rst index 842b071c1..1117c7682 100644 --- a/docs/tools/tool.rst +++ b/docs/tools/tool.rst @@ -1,71 +1,15 @@ -Tool -==== +class Tool(BaseModel): ‘``Tool`` exposes a function or coroutine +directly.’ -``Tool`` directly exposes a function or coroutine. It takes inputs in -the form of dictionary in a run method. The ``Tool`` class is part of -the automata.tools.base module. 
+:: -Overview --------- + class Config(): + extra = Extra.forbid + arbitrary_types_allowed = True + function: Callable[(..., str)] + name: str = '' + description: str = '' + coroutine: Optional[Callable[(..., Awaitable[str])]] = None -In the larger context of the Automata software architecture, ``Tool`` is -an abstraction that represents a tool or functionality. It encapsulates -a function or routine, exposing it through a ``run`` method that accepts -inputs in form of a dictionary. - -The principle use-case is to encapsulate tasks that involve fetching, -processing, or generating data summarized into a single callable method -``run``. - -Related Symbols ---------------- - -- ``automata.tests.unit.test_tool.test_tool`` -- ``automata.tests.unit.test_tool.TestTool`` -- ``automata.agent.providers.OpenAIAgentToolkitBuilder.can_handle`` -- ``automata.tests.unit.test_tool.test_tool_run`` -- ``automata.llm.providers.openai.OpenAITool`` -- ``automata.tests.unit.test_tool.TestTool.run`` -- ``automata.agent.agent.AgentToolkitBuilder.build`` -- ``automata.tests.unit.test_tool.test_tool_instantiation`` -- ``automata.tools.builders.symbol_search.SymbolSearchToolkitBuilder.build`` -- ``automata.tests.unit.test_symbol_search_tool.test_build`` - -Example -------- - -The following is an example use-case of creating a ``Tool`` instance, -and running it with an input. - -.. code:: python - - from automata.tools.base import Tool - - test_tool = Tool( - name="TestTool", - description="A test tool for testing purposes", - function=lambda x: "TestTool response") - - # Running the tool - tool_input = {"test": "test"} - response = test_tool.run(tool_input) - # Outputs: "TestTool response" - -Limitations ------------ - -The ``Tool`` class is designed to execute individually encapsulated -tasks, and is not suitable for managing tasks that involve significant -inter-dependence or require coordination between multiple tasks. 
- -While ``Tool`` instances can be used simultaneously they lack built-in -mechanisms for sharing information between one another, which might -limit application in more complex, real-world scenarios. - -Follow-up Questions -------------------- - -- How can ``Tool`` instances communicate with each other when running - in parallel? -- Is there a way to make the ``Tool`` capable of handling inter-task - dependencies? + def run(self, tool_input: Dict[(str, str)]) -> str: + return self.function(**tool_input) diff --git a/docs/tools/tool_base/config.rst b/docs/tools/tool_base/config.rst index b6d2814dc..2c7262813 100644 --- a/docs/tools/tool_base/config.rst +++ b/docs/tools/tool_base/config.rst @@ -1,21 +1 @@ -1. Yes, the Tool.Config can be potentially extended to suit specific use - cases. It serves as a base configuration class and, if needed, you - may further extend or customize it according to the needs of your - specific tool. However, it would be prudent to ensure that the - enhancements maintain the overall goal of the class, which is to - provide a clean and maintainable configuration. - -2. The implementation uses methods and directives offered by the - underlying Pydantic library to design a model with these constraints. - The Pydantic BaseModel class offers ways to set configuration - attributes that control these characteristics. - -3. While it’s true that there may be scenarios where allowing extra - attributes would beneficial, it does pose a risk of a messy and - uncontrolled configuration structure. Therefore, the Tool.Config - class prioritizes ease of maintainability and simplicity by - forbidding extra attributes. However, if a scenario demands extra - attributes, a new derived class can be created from the base - Tool.Config, where these requirements can be managed specifically. - It’s important to be mindful of ensuring that this does not introduce - unnecessary complexity. 
+class Config(): extra = Extra.forbid arbitrary_types_allowed = True diff --git a/docs/tools/tool_base/index.rst b/docs/tools/tool_base/index.rst index bdd90f733..6802bf8c4 100644 --- a/docs/tools/tool_base/index.rst +++ b/docs/tools/tool_base/index.rst @@ -8,6 +8,8 @@ how to :ref:`installation` the project. + + .. AUTO-GENERATED CONTENT START .. diff --git a/docs/tools/tool_execution.rst b/docs/tools/tool_execution.rst new file mode 100644 index 000000000..5991abbb5 --- /dev/null +++ b/docs/tools/tool_execution.rst @@ -0,0 +1,12 @@ +class ToolExecution(IToolExecution): ‘Class for executing tools.’ + +:: + + def __init__(self, tools: Sequence[Tool]) -> None: + self.tools = {tool.name: tool for tool in tools} + + def execute(self, function_call: 'FunctionCall') -> str: + if (tool := self.tools.get(function_call.name)): + return tool.run(function_call.arguments) + else: + raise Exception(f'No tool found for function call: {function_call.name}') diff --git a/docs/tools/tool_executor.rst b/docs/tools/tool_executor.rst index 3dd527f12..04337942e 100644 --- a/docs/tools/tool_executor.rst +++ b/docs/tools/tool_executor.rst @@ -1,17 +1,10 @@ -- The ``ToolExecutor`` is explicitly designed to work with objects that - implements the ``IToolExecution`` interface. If other types of - execution behavior interfaces are to be used, they would need to have - a similar ``execute`` method that takes a ``FunctionCall`` object as - an argument and returns a string. -- As of the time of writing, it doesn’t look like there’s any explicit - error handling or validation to ensure the ``IToolExecution`` - instance is valid. If the passed object does not correctly implement - the ``IToolExecution`` interface, a Python ``TypeError`` will likely - be raised when attempting to call its ``execute`` method. 
-- In terms of limitations on the ``FunctionCall`` class, the most - crucial point is that the ``name`` property should correspond to a - valid tool in the agent’s repertoire, and the arguments should match - the expected inputs for that tool. How these inputs are used will - depend on the specifications of the tool being called. Therefore, - it’s important to be familiar with the requirements of each specific - tool. +class ToolExecutor(): ‘Class for using IToolExecution behavior to +execute a tool.’ + +:: + + def __init__(self, execution: IToolExecution) -> None: + self.execution = execution + + def execute(self, function_call: 'FunctionCall') -> str: + return self.execution.execute(function_call) diff --git a/docs/tools/unknown_tool_error.rst b/docs/tools/unknown_tool_error.rst new file mode 100644 index 000000000..78296f120 --- /dev/null +++ b/docs/tools/unknown_tool_error.rst @@ -0,0 +1,7 @@ +class UnknownToolError(Exception): ‘An exception for when an unknown +tools type is provided.’ ERROR_STRING = ‘Unknown tools type: %s’ + +:: + + def __init__(self, tool_kit: str) -> None: + super().__init__((self.ERROR_STRING % tool_kit))
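Taken together, the classes introduced in this diff form a small execution pipeline: ``Tool`` wraps a callable behind ``run``, ``ToolExecution`` resolves a ``FunctionCall`` to a registered tool by name, and ``ToolExecutor`` delegates to an injected ``IToolExecution`` behavior. The following self-contained sketch mirrors that flow; the ``FunctionCall`` shape here is assumed, since its real definition is not part of this diff:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class Tool:
    """Simplified stand-in for automata's Tool: exposes a callable via run."""
    name: str
    function: Callable[..., str]

    def run(self, tool_input: Dict[str, str]) -> str:
        return self.function(**tool_input)


@dataclass
class FunctionCall:
    """Assumed shape of FunctionCall; the real class is not shown in this diff."""
    name: str
    arguments: Dict[str, str] = field(default_factory=dict)


class ToolExecution:
    """Resolves a FunctionCall to a registered tool by name and runs it."""

    def __init__(self, tools: Sequence[Tool]) -> None:
        self.tools = {tool.name: tool for tool in tools}

    def execute(self, function_call: FunctionCall) -> str:
        if (tool := self.tools.get(function_call.name)):
            return tool.run(function_call.arguments)
        raise Exception(f"No tool found for function call: {function_call.name}")


class ToolExecutor:
    """Delegates execution to an injected ToolExecution behavior."""

    def __init__(self, execution: ToolExecution) -> None:
        self.execution = execution

    def execute(self, function_call: FunctionCall) -> str:
        return self.execution.execute(function_call)


# Wire the pipeline together and dispatch a call by tool name.
echo_tool = Tool(name="echo", function=lambda text: f"echo: {text}")
executor = ToolExecutor(ToolExecution([echo_tool]))
result = executor.execute(FunctionCall(name="echo", arguments={"text": "hi"}))
```

Injecting the ``ToolExecution`` behavior into ``ToolExecutor`` keeps dispatch strategy swappable, which is the point of the ``IToolExecution`` interface; an unknown tool name raises, matching ``UnknownToolError``'s role at the factory level.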