From 7ae42dc100cc6ad18e38ccb735d5cd62bda37c26 Mon Sep 17 00:00:00 2001 From: Nick Harder <56074305+nick-harder@users.noreply.github.com> Date: Mon, 4 Dec 2023 10:02:42 +0100 Subject: [PATCH 1/4] Created using Colaboratory --- 04_Reinforcement_learning_example.ipynb | 2170 +++++++++++++++++++++++ 1 file changed, 2170 insertions(+) create mode 100644 04_Reinforcement_learning_example.ipynb diff --git a/04_Reinforcement_learning_example.ipynb b/04_Reinforcement_learning_example.ipynb new file mode 100644 index 00000000..bd5cc468 --- /dev/null +++ b/04_Reinforcement_learning_example.ipynb @@ -0,0 +1,2170 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true, + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Tutorial: Reinforcement Learning in ASSUME\n", + "\n", + "This tutorial introduces users to ASSUME and the ways it uses reinforcement learning (RL). The main objective of this tutorial is to ensure participants grasp the steps required to equip a new unit with RL strategies or modify the action dimensions.\n", + "Our emphasis lies on the bidding strategy class, with less emphasis on the algorithm and role. The latter are usable as a plug-and-play solution in the framework. The following coding tasks will highlight the key aspects to be adjusted, as already outlined in the learning_strategies.py file.\n", + "\n", + "The outline of this tutorial is as follows. We will start with a basic summary of the implementation of reinforcement learning (RL) in ASSUME and its architecture (1. ASSUME & Learning Basics).
If you need a refresher on RL in general, please visit our readthedocs (https://assume.readthedocs.io/en/latest/). Afterwards, we install ASSUME in this Google Colab (2. Get ASSUME running) and then we dive into the learning_strategies.py file and explain how we need to adjust conventional bidding strategies to incorporate RL (3. Make ASSUME learn).\n", + "\n", + "#### As a whole, this tutorial covers the following coding tasks:\n", + "3.1 How to define a step function in the ASSUME framework?\n", + "\n", + "3.2 How do we get observations from the simulation framework?\n", + "\n", + "3.3 How do we define actions based on the output of the actor neural network while accounting for the necessary exploration?\n", + "\n", + "3.4 How do we define the reward?" + ], + "metadata": { + "id": "4JeBorbE6FYr" + } + }, + { + "cell_type": "markdown", + "source": [ + "# 1 ASSUME & LEARNING BASICS\n", + "\n", + "ASSUME in general is intended for researchers, planners, utilities and everyone seeking to understand the dynamics of energy markets. It provides an easy-to-use toolbox as free software that can be tailored to the specific use case of the user.\n", + "\n", + "The following figure depicts the architecture of the framework. It can be roughly divided into two parts. On the left side of the world class the markets are located and on the right side the market participants, which are here named units. Both worlds are connected via the orders that market participants place on the markets. The learning capability is sketched out with the yellow classes on the right side, namely the units side.\n", + "\n", + "\n", + "\n", + "![architecture.svg]()" + ], + "metadata": { + "id": "bj2C4ElILNNv" + } + }, + { + "cell_type": "markdown", + "source": [ + "Let's focus on the bright yellow part of the architecture, namely the learning algorithm, the actor and the critic. We start with some **reinforcement learning background**. 
In the current implementation of ASSUME, we model the electricity market as a partially observable Markov game, which is an extension of Markov decision processes (MDPs) for multi-agent setups.\n", + "\n", + "**Multi-agent DRL** is understood as the simultaneous learning of multiple agents interacting in the same environment. The Markov game for $N$ agents consists of a set of states $S$, a set of actions $A_1, ..., A_N$, a set of observations $O_1, ..., O_N$, and a state transition function $P: S \\times A_1 \\times ... \\times A_N \\rightarrow \\mathcal{P}(S)$ dependent on the state and actions of all agents. After taking action $a_i \\in A_i$ in state $s_i \\in S$ according to a policy $\\pi_i:O_i\\rightarrow A_i$, every agent $i$ is transitioned into the new state $s'_i \\in S$. Each agent receives a reward $r_i$ according to the individual reward function $R_i$ and a private observation correlated with the state $o_i:S \\rightarrow O_i$. As in an MDP, each agent $i$ learns an optimal policy $\\pi_i^*(s)$ that maximizes its expected reward.\n", + "\n", + "To enable multi-agent learning, some adjustments are needed within the learning algorithm to get from the TD3 to an MATD3 algorithm. Other authors used similar tweaks to improve the MADDPG algorithm and derive the MATD3 algorithm. We'll start explaining the learning by focusing on a single agent and then extend it to multi-agent learning.\n", + "\n", + "### 1.1 Single-Agent Learning\n", + "\n", + "We use the actor-critic approach to train the learning agent. The actor-critic approach is a popular RL algorithm that uses two neural networks: an actor network and a critic network. The actor network is responsible for selecting actions, while the critic network evaluates the quality of the actions taken by the actor.\n", + "\n", + "The actor and critic networks are trained simultaneously using the actor-critic algorithm, which updates the weights of both networks at each time step. 
The actor-critic algorithm is a form of policy iteration, where the policy is updated based on the estimated value function, and the value function is updated based on the current policy.\n", + "\n", + "##### **Actor**\n", + "The actor network is trained using the policy gradient method, which updates the weights of the actor network in the direction of the gradient of the expected reward with respect to the network parameters:\n", + "\n", + "$\\nabla_{\\theta} J(\\theta) = E[\\nabla_{\\theta} \\log \\pi_{\\theta}(a_t|s_t) * Q^{\\pi}(s_t, a_t)]$\n", + "\n", + "where $J(\\theta)$ is the expected reward, $\\theta$ are the weights of the actor network, $\\pi_{\\theta}(a_t|s_t)$ is the probability of selecting action $a_t$ given state $s_t$, and $Q^{\\pi}(s_t, a_t)$ is the expected reward of taking action $a_t$ in state $s_t$ under policy $\\pi$.\n", + "\n", + "##### **Critic**\n", + "The critic network is trained using the temporal difference (TD) learning method, which updates the weights of the critic network based on the difference between the estimated value of the current state and the estimated value of the next state:\n", + "\n", + "$\\delta_t = r_t + \\gamma * V(s_{t+1}) - V(s_t)$\n", + "\n", + "where $\\delta_t$ is the TD error, $r_t$ is the reward obtained at time step $t$, $\\gamma$ is the discount factor, $V(s_t)$ is the estimated value of state $s_t$, and $V(s_{t+1})$ is the estimated value of the next state $s_{t+1}$.\n", + "\n", + "The weights of the critic network are updated by gradient descent on the mean squared TD error:\n", + "\n", + "$L = E[(\\delta_t)^2]$\n", + "\n", + "where $L$ is the loss function.\n", + "\n" + ], + "metadata": { + "id": "dDn1blWvPM7Z" + } + }, + { + "cell_type": "markdown", + "source": [ + "### 1.2 Multi-Agent Learning\n", + "\n", + "While in a single-agent setup, the state transition and respective reward depend only on the actions of a single agent, the state transitions and rewards depend on the actions of all 
learning agents in a multi-agent setup. This makes the environment non-stationary for a single agent, which violates the Markov property. Hence, the convergence guarantees of single-agent RL algorithms are no longer valid. Therefore, we utilize the framework of centralized training and decentralized execution and expand upon the MADDPG algorithm. The main idea of this approach is to use a centralized critic during the training phase, which has access to the entire state $\\textbf{S}$, and all actions $a_1, ..., a_N$, thus resolving the issue of non-stationarity, as changes in state transitions and rewards can be explained by the actions of other agents. Meanwhile, during both training and execution, the actor has access only to its local observations $o_i$ derived from the entire state $\\textbf{S}$.\n", + "\n", + "For each agent $i$, we train two centralized critics $Q_{i,\\theta_{1,2}}(S, a_1, ..., a_N)$ together with two target critic networks. Similar to TD3, the smaller value of the two critics and target action noise $\\tilde{a}_{i,k}$ is used to calculate the target $y_{i,k}$:\n", + "\n", + "$y_{i,k} = r_{i,k} + \\gamma * \\min_{j=1,2} Q_{i,\\theta'_j}(S'_k, a_{1,k}, ..., a_{N,k}, \\pi'(o_{i,k}))$\n", + "\n", + "where $r_{i,k}$ is the reward obtained by agent $i$ at time step $k$, $\\gamma$ is the discount factor, $S'_k$ is the next state of the environment, and $\\pi'(o_{i,k})$ is the target policy of agent $i$.\n", + "\n", + "The critics are trained using the mean squared Bellman error (MSBE) loss:\n", + "\n", + "$L(Q_{i,\\theta_j}) = E[(y_{i,k} - Q_{i,\\theta_j}(S_k, a_{1,k}, ..., a_{N,k}))^2]$\n", + "\n", + "The actor policy of each agent is updated using the deterministic policy gradient (DPG) algorithm:\n", + "\n", + "$\\nabla_a Q_{i,\\theta_j}(S_k, a_{1,k}, ..., a_{N,k}, \\pi(o_{i,k}))|_{a_{i,k}=\\pi(o_{i,k})} * \\nabla_\\theta \\pi(o_{i,k})$\n", + "\n", + "The actor is updated similarly using only one critic network $Q_{\\theta_1}$. These changes to the original DDPG algorithm allow increased stability and convergence of the TD3 algorithm. 
This is especially relevant when approaching a multi-agent RL setup, as discussed in the following section." + ], + "metadata": { + "id": "OMvIl2xLVi1l" + } + }, + { + "cell_type": "markdown", + "source": [ + "# 2 GET ASSUME RUNNING\n", + "Here we just install the ASSUME core package via pip. In general the instructions for an installation can be found here: https://assume.readthedocs.io/en/latest/installation.html. All the required steps are executed here and since we are working in colab the generation of a venv is not necessary. \n" + ], + "metadata": { + "id": "OeeZDtIFmmhn" + } + }, + { + "cell_type": "code", + "source": [ + "!pip install assume-framework" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "m0DaRwFA7VgW", + "outputId": "5655adad-5b7a-4fe3-9067-6b502a06136b" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting assume-framework\n", + " Downloading assume_framework-0.2.0-py3-none-any.whl (112 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m112.9/112.9 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting mango-agents-assume<2.0.0,>=1.1.1-1 (from assume-framework)\n", + " Downloading mango_agents_assume-1.1.1.post3-py3-none-any.whl (59 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.1/59.1 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting mypy<2.0.0,>=1.1.1 (from assume-framework)\n", + " Downloading mypy-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.2/12.2 MB\u001b[0m \u001b[31m80.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: nest-asyncio<2.0.0,>=1.5.6 in /usr/local/lib/python3.10/dist-packages 
(from assume-framework) (1.5.8)\n", + "Collecting paho-mqtt<2.0.0,>=1.5.1 (from assume-framework)\n", + " Downloading paho-mqtt-1.6.1.tar.gz (99 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m99.4/99.4 kB\u001b[0m \u001b[31m10.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + "Collecting pandas<3.0.0,>=2.0.0 (from assume-framework)\n", + " Downloading pandas-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.3/12.3 MB\u001b[0m \u001b[31m85.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting psycopg2-binary<3.0.0,>=2.9.5 (from assume-framework)\n", + " Downloading psycopg2_binary-2.9.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m84.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting pyomo<7.0.0,>=6.6.1 (from assume-framework)\n", + " Downloading Pyomo-6.6.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.7/12.7 MB\u001b[0m \u001b[31m9.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (2.8.2)\n", + "Requirement already satisfied: pyyaml<7.0,>=6.0 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (6.0.1)\n", + "Requirement already satisfied: sqlalchemy<3.0.0,>=2.0.9 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (2.0.22)\n", + "Requirement already satisfied: tqdm<5.0.0,>=4.64.1 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (4.66.1)\n", + 
"Collecting dill<0.4.0,>=0.3.6 (from mango-agents-assume<2.0.0,>=1.1.1-1->assume-framework)\n", + " Downloading dill-0.3.7-py3-none-any.whl (115 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting msgspec>=0.14.2 (from mango-agents-assume<2.0.0,>=1.1.1-1->assume-framework)\n", + " Downloading msgspec-0.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (202 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m202.2/202.2 kB\u001b[0m \u001b[31m22.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: protobuf<4.0.0,>=3.20.3 in /usr/local/lib/python3.10/dist-packages (from mango-agents-assume<2.0.0,>=1.1.1-1->assume-framework) (3.20.3)\n", + "Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.10/dist-packages (from mypy<2.0.0,>=1.1.1->assume-framework) (4.5.0)\n", + "Collecting mypy-extensions>=1.0.0 (from mypy<2.0.0,>=1.1.1->assume-framework)\n", + " Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n", + "Requirement already satisfied: tomli>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from mypy<2.0.0,>=1.1.1->assume-framework) (2.0.1)\n", + "Requirement already satisfied: numpy>=1.22.4 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0.0,>=2.0.0->assume-framework) (1.23.5)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0.0,>=2.0.0->assume-framework) (2023.3.post1)\n", + "Collecting tzdata>=2022.1 (from pandas<3.0.0,>=2.0.0->assume-framework)\n", + " Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.8/341.8 kB\u001b[0m \u001b[31m33.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting ply 
(from pyomo<7.0.0,>=6.6.1->assume-framework)\n", + " Downloading ply-3.11-py2.py3-none-any.whl (49 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.6/49.6 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil<3.0.0,>=2.8.2->assume-framework) (1.16.0)\n", + "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy<3.0.0,>=2.0.9->assume-framework) (3.0.0)\n", + "Building wheels for collected packages: paho-mqtt\n", + " Building wheel for paho-mqtt (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for paho-mqtt: filename=paho_mqtt-1.6.1-py3-none-any.whl size=62118 sha256=46bea794d75243f95bc3a98068cd0a951731cd65c87a297d3299fef8781a9990\n", + " Stored in directory: /root/.cache/pip/wheels/8b/bb/0c/79444d1dee20324d442856979b5b519b48828b0bd3d05df84a\n", + "Successfully built paho-mqtt\n", + "Installing collected packages: ply, paho-mqtt, tzdata, pyomo, psycopg2-binary, mypy-extensions, msgspec, dill, pandas, mypy, mango-agents-assume, assume-framework\n", + " Attempting uninstall: pandas\n", + " Found existing installation: pandas 1.5.3\n", + " Uninstalling pandas-1.5.3:\n", + " Successfully uninstalled pandas-1.5.3\n", + "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "lida 0.0.10 requires fastapi, which is not installed.\n", + "lida 0.0.10 requires kaleido, which is not installed.\n", + "lida 0.0.10 requires python-multipart, which is not installed.\n", + "lida 0.0.10 requires uvicorn, which is not installed.\n", + "google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.1.1 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mSuccessfully installed assume-framework-0.2.0 dill-0.3.7 mango-agents-assume-1.1.1.post3 msgspec-0.18.4 mypy-1.6.1 mypy-extensions-1.0.0 paho-mqtt-1.6.1 pandas-2.1.1 ply-3.11 psycopg2-binary-2.9.9 pyomo-6.6.2 tzdata-2023.3\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "And just like that, ASSUME is installed. Now we can run it. Please note, though, that we cannot use the functionalities tied to Docker and, hence, cannot access the predefined dashboards in Colab. For this, please install Docker and ASSUME on your personal machine.\n", + "\n", + "Furthermore, we would like to access the predefined scenarios in ASSUME, which are stored in the Git repository. Hence, we clone the repository." 
+ ], + "metadata": { + "id": "IIw_QIE3pY34" + } + }, + { + "cell_type": "code", + "source": [ + "!git clone https://github.com/assume-framework/assume.git" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_5hB0uDisSsg", + "outputId": "1241881f-e090-4f26-9b02-560adfcb3a3e" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Cloning into 'assume'...\n", + "remote: Enumerating objects: 6035, done.\u001b[K\n", + "remote: Counting objects: 100% (2933/2933), done.\u001b[K\n", + "remote: Compressing objects: 100% (912/912), done.\u001b[K\n", + "remote: Total 6035 (delta 2377), reused 2236 (delta 2020), pack-reused 3102\u001b[K\n", + "Receiving objects: 100% (6035/6035), 11.58 MiB | 9.87 MiB/s, done.\n", + "Resolving deltas: 100% (4280/4280), done.\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Let the magic happen.** Now you can run your first ever simulation in ASSUME. The following code navigates to the cloned assume folder and runs the example simulation example_01b using the local database here in Colab." + ], + "metadata": { + "id": "Fg7DyNjLuvSb" + } + }, + { + "cell_type": "code", + "source": [ + "!cd assume && assume -s example_01b -db \"sqlite:///./examples/local_db/assume_db_example_01b.db\"" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3eVM60Qx8SC0", + "outputId": "20434515-6e65-4d34-d44d-8c4529a46ece" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "example_01b_base 2019-01-02 23:00:00: 6% 172801.0/2678400 [00:06<03:02, 13749.57it/s]" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# 3 MAKE ASSUME LEARN\n", + "\n", + "Now it is time to get your hands dirty and actually dive into coding in ASSUME. 
The main objective of this session is to ensure participants grasp the steps required to equip a new unit with RL strategies or modify the action dimensions. Our emphasis lies on the bidding strategy class, with less emphasis on the algorithm and role. Coding tasks will highlight the key aspects to be adjusted, as already outlined in the learning_strategies.py file. Subsequent\n", + "sections will present the tasks and provide the correct answers for the coding exercises.\n", + "\n", + "We start by initializing the class of our Learning Strategy. This is very closely related to the general structure of a bidding strategy.\n", + "\n", + "\n", + "**But first some imports:**" + ], + "metadata": { + "id": "zMyZhaNM7NRP" + } + }, + { + "cell_type": "code", + "source": [ + "# install jdc for some inline magic\n", + "# that allows us to define methods of a class across different cells\n", + "\n", + "!pip install jdc" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "qoWI_agIJOE4", + "outputId": "9b40e670-bfef-4560-d6e8-61a1b29d1975" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting jdc\n", + " Downloading jdc-0.0.9-py2.py3-none-any.whl (2.1 kB)\n", + "Installing collected packages: jdc\n", + "Successfully installed jdc-0.0.9\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "from datetime import datetime, timedelta\n", + "from pathlib import Path\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "import torch as th\n", + "import jdc\n", + "import yaml\n", + "import logging\n", + "import os\n", + "\n", + "from assume import World, load_custom_units, load_scenario_folder, run_learning\n", + "from assume.common.base import LearningStrategy, SupportsMinMax\n", + "from assume.common.market_objects import MarketConfig, Orderbook, Product\n", + "from assume.reinforcement_learning.learning_utils import Actor, NormalActionNoise" + ], + 
"metadata": { + "id": "xUsbeZdPJ_2Q" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "class RLStrategy(LearningStrategy):\n", + " \"\"\"\n", + " Reinforcement Learning Strategy\n", + "\n", + " :param foresight: Number of time steps to look ahead. Default 24.\n", + " :type foresight: int\n", + " :param max_bid_price: Maximum bid price\n", + " :type max_bid_price: float\n", + " :param max_demand: Maximum demand\n", + " :type max_demand: float\n", + " :param device: Device to run on\n", + " :type device: str\n", + " :param float_type: Float type to use\n", + " :type float_type: str\n", + " :param learning_mode: Whether to use learning mode\n", + " :type learning_mode: bool\n", + " :param actor: Actor network\n", + " :type actor: torch.nn.Module\n", + " \"\"\"\n", + "\n", + " def __init__(self, *args, **kwargs):\n", + " super().__init__(*args, **kwargs)\n", + "\n", + " self.unit_id = kwargs[\"unit_id\"]\n", + "\n", + " # defines bounds of actions space\n", + " self.max_bid_price = kwargs.get(\"max_bid_price\", 100)\n", + " self.max_demand = kwargs.get(\"max_demand\", 10e3)\n", + "\n", + " # tells us whether we are training the agents or just executing per-learnind stategies\n", + " self.learning_mode = kwargs.get(\"learning_mode\", False)\n", + "\n", + " # sets the devide of the actor network\n", + " device = kwargs.get(\"device\", \"cpu\")\n", + " self.device = th.device(device if th.cuda.is_available() else \"cpu\")\n", + " if not self.learning_mode:\n", + " self.device = th.device(\"cpu\")\n", + "\n", + " # future: add option to choose between float16 and float32\n", + " # float_type = kwargs.get(\"float_type\", \"float32\")\n", + " self.float_type = th.float\n", + "\n", + " # for definition of observation space\n", + " self.foresight = kwargs.get(\"foresight\", 24)\n", + "\n", + " if self.learning_mode:\n", + " self.learning_role = None\n", + " self.collect_initial_experience_mode = kwargs.get(\n", + " 
\"episodes_collecting_initial_experience\", True\n", + " )\n", + "\n", + " self.action_noise = NormalActionNoise(\n", + " mu=0.0,\n", + " sigma=kwargs.get(\"noise_sigma\", 0.1),\n", + " action_dimension=self.act_dim,\n", + " scale=kwargs.get(\"noise_scale\", 1.0),\n", + " dt=kwargs.get(\"noise_dt\", 1.0),\n", + " )\n", + "\n", + " elif Path(kwargs[\"trained_actors_path\"]).is_dir():\n", + " self.load_actor_params(load_path=kwargs[\"trained_actors_path\"])\n", + "\n", + " def testfunction():\n", + "\n", + " return None" + ], + "metadata": { + "id": "UXYSesx4Ifp5" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### 3.0 The \"Step Function\"\n", + "\n", + "The key function in an RL problem is the step that is taken in the so-called environment. It consists of the following parts:\n", + "\n", + "1. Get an observation\n", + "2. Choose an action\n", + "3. Get a reward\n", + "4. Update your policy\n", + "\n", + "In ASSUME we do not have such a straightforward step function. Steps 1 & 2 are combined in the calculate_bids() function, which is called as soon as an offer on the market is placed. Step 3, however, can only happen after we get the market feedback from the simulation run and, hence, is in the calculate_reward() function. Step 4 is handled solely by the learning_role, which schedules the policy updates and manages the buffer. Hence, it is not included in this notebook, since we only focus on transforming the bidding strategy into a learning one.\n", + "\n", + "**Steps 1-3 will be implemented in the following sections 3.1 to 3.3. 
If there is a coding task for you, it will be marked accordingly.**" + ], + "metadata": { + "id": "8UM1QPZrIdqK" + } + }, + { + "cell_type": "code", + "source": [ + "%%add_to RLStrategy\n", + "# cell magic from jdc (must be the first line of the cell): adds the function below to RLStrategy\n", + "def calculate_bids(\n", + " self,\n", + " unit: SupportsMinMax,\n", + " market_config: MarketConfig,\n", + " product_tuples: list[Product],\n", + " **kwargs,\n", + ") -> Orderbook:\n", + " \"\"\"\n", + " Calculate bids for a unit -> STEP 1 & 2\n", + "\n", + " :param unit: Unit to calculate bids for\n", + " :type unit: SupportsMinMax\n", + " :param market_config: Market configuration\n", + " :type market_config: MarketConfig\n", + " :param product_tuples: Product tuples\n", + " :type product_tuples: list[Product]\n", + " :return: Bids containing start time, end time, price and volume\n", + " :rtype: Orderbook\n", + " \"\"\"\n", + "\n", + " bid_quantity_inflex, bid_price_inflex = 0, 0\n", + " bid_quantity_flex, bid_price_flex = 0, 0\n", + "\n", + " start = product_tuples[0][0]\n", + " end = product_tuples[0][1]\n", + " # get technical bounds for the unit output from the unit\n", + " min_power, max_power = unit.calculate_min_max_power(start, end)\n", + " min_power = min_power[start]\n", + " max_power = max_power[start]\n", + "\n", + " # =============================================================================\n", + " # 1. Get the Observations, which are the basis of the action decision\n", + " # =============================================================================\n", + " next_observation = self.create_observation(\n", + " unit=unit,\n", + " start=start,\n", + " end=end,\n", + " )\n", + "\n", + " # =============================================================================\n", + " # 2. 
Get the Actions, based on the observations\n", + " # =============================================================================\n", + " actions, noise = self.get_actions(next_observation)\n", + "\n", + " bids = actions\n", + "\n", + "\n", + " return bids" + ], + "metadata": { + "id": "iApbQsg5x_u2" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "%%add_to RLStrategy\n", + "# cell magic from jdc (must be the first line of the cell): adds the function below to RLStrategy\n", + "def calculate_reward(\n", + " self,\n", + " unit,\n", + " marketconfig: MarketConfig,\n", + " orderbook: Orderbook,\n", + "):\n", + " \"\"\"\n", + " Calculate reward\n", + "\n", + " :param unit: Unit to calculate reward for\n", + " :type unit: SupportsMinMax\n", + " :param marketconfig: Market configuration\n", + " :type marketconfig: MarketConfig\n", + " :param orderbook: Orderbook\n", + " :type orderbook: Orderbook\n", + " \"\"\"\n", + "\n", + " return None" + ], + "metadata": { + "id": "_4cJ8Y8uvMgV" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## 3.1 Get an observation\n", + "\n", + "The decision about the observations received by each agent plays a crucial role when designing a multi-agent RL setup. The following describes the task of learning agents representing profit-maximizing electricity market participants who either sell a generating unit's output or optimize a storage unit's operation. They are represented through their plants' techno-economic parameters, such as minimal operational capacity $P^{min}$, start-up $c^{su}$, and shut-down $c^{sd}$ costs. All of this information is known to the unit itself and is therefore also accessible in the bidding strategy.\n", + "\n", + "During the training phase, the centralized critic receives observations from all agents, resulting in an input size that grows linearly with the number of agents. 
This can lead to unstable training behavior of the critic networks, which limits the maximal number of agents in the simulation. This effect is known as the curse of dimensionality, which likely contributed to the small number of learning agents in existing approaches. To address the curse of dimensionality, we use a single observation that is the same for all agents and add a small number of unique observations for each agent to improve their performance. This modification allows the use of only one observation in the centralized critic, decoupled from the number of learning agents, significantly reducing the observation size and enabling simultaneous training of hundreds of learning agents with stable training behavior. The only limiting factor is the available working memory.\n", + "\n", + "At time-step $t$, agent $i$ receives the observation $o_{i,t}$ consisting of vectors $[L_{\\mathrm{h},t}, L_{\\mathrm{f},t}, M_{\\mathrm{h},t}, M_{\\mathrm{f},t}, mc_{i,t}]$. Here $L_{\\mathrm{h},t}, L_{\\mathrm{f},t}$ and $M_{\\mathrm{h},t}, M_{\\mathrm{f},t}$ are the past and the forecast residual loads and market prices, respectively. This information stems from the world, where an overall forecasting role generates it. The price forecast is calculated ahead of the simulation run using a simple merit order model based on the residual load forecast and the marginal cost of power plants. This part of the observation is the same for all agents. In addition, each agent receives its current marginal cost $mc_{i,t}$. Information about the marginal cost is shared with the centralized critic during the training phase. Still, it is not shared with other agents during the execution phase. 
All the inputs are normalized to improve the performance of the training process.\n" + ], + "metadata": { + "id": "Jgjx14997Y9s" + } + }, + { + "cell_type": "markdown", + "source": [ + "#### **Task 3.1**\n", + "**Goal**: With the help of the *unit*, the *start time* and the *end time*, we want to create the observations for the unit.\n", + "\n", + "There are 4 different observations:\n", + "- residual load forecast\n", + "- price forecast\n", + "- total capacity of the unit\n", + "- marginal costs of the unit\n", + "\n", + "For all observations we need scaling factors. Why do you think it is important to scale the input? How would you define the scaling factors?" + ], + "metadata": { + "id": "PngYyvs72UxB" + } + }, + { + "cell_type": "code", + "source": [ + "%%add_to RLStrategy\n", + "# cell magic from jdc (must be the first line of the cell): adds the function below to RLStrategy\n", + "def create_observation(\n", + " self,\n", + " unit: SupportsMinMax,\n", + " start: datetime,\n", + " end: datetime,\n", + "):\n", + " \"\"\"\n", + " Create observation\n", + "\n", + " :param unit: Unit to create observation for\n", + " :type unit: SupportsMinMax\n", + " :param start: Start time\n", + " :type start: datetime\n", + " :param end: End time\n", + " :type end: datetime\n", + " :return: Observation\n", + " :rtype: torch.Tensor\"\"\"\n", + " end_excl = end - unit.index.freq\n", + "\n", + " # get the forecast length depending on the time unit considered in the modelled unit\n", + " forecast_len = pd.Timedelta((self.foresight - 1) * unit.index.freq)\n", + "\n", + " # =============================================================================\n", + " # 1.1 Get the Observations, which are the basis of the action decision\n", + " # =============================================================================\n", + " scaling_factor_res_load = #TODO\n", + "\n", + " # price forecast\n", + " scaling_factor_price = #TODO\n", + "\n", + " # total capacity\n", + " scaling_factor_total_capacity = #TODO\n", + "\n", 
+ " # marginal cost\n", + " # Obs[2*foresight+1:2*foresight+2]\n", + " scaling_factor_marginal_cost = #TODO\n", + "\n", + " # check if we are at the end of the simulation horizon, since the forecast then has to be padded\n", + " # for the residual load and price forecasts, which are both scaled\n", + " if end_excl + forecast_len > unit.forecaster[\"residual_load_EOM\"].index[-1]:\n", + " scaled_res_load_forecast = (\n", + " unit.forecaster[\"residual_load_EOM\"].loc[start:].values\n", + " / scaling_factor_res_load\n", + " )\n", + " # pad with values from the start of the forecast, scaled like the rest\n", + " scaled_res_load_forecast = np.concatenate(\n", + " [\n", + " scaled_res_load_forecast,\n", + " unit.forecaster[\"residual_load_EOM\"].iloc[\n", + " : self.foresight - len(scaled_res_load_forecast)\n", + " ].values\n", + " / scaling_factor_res_load,\n", + " ]\n", + " )\n", + "\n", + " else:\n", + " scaled_res_load_forecast = (\n", + " unit.forecaster[\"residual_load_EOM\"]\n", + " .loc[start : end_excl + forecast_len]\n", + " .values\n", + " / scaling_factor_res_load\n", + " )\n", + "\n", + " if end_excl + forecast_len > unit.forecaster[\"price_EOM\"].index[-1]:\n", + " scaled_price_forecast = (\n", + " unit.forecaster[\"price_EOM\"].loc[start:].values / scaling_factor_price\n", + " )\n", + " # pad with values from the start of the forecast, scaled like the rest\n", + " scaled_price_forecast = np.concatenate(\n", + " [\n", + " scaled_price_forecast,\n", + " unit.forecaster[\"price_EOM\"].iloc[\n", + " : self.foresight - len(scaled_price_forecast)\n", + " ].values\n", + " / scaling_factor_price,\n", + " ]\n", + " )\n", + "\n", + " else:\n", + " scaled_price_forecast = (\n", + " unit.forecaster[\"price_EOM\"].loc[start : end_excl + forecast_len].values\n", + " / scaling_factor_price\n", + " )\n", + "\n", + " # get last accepted bid volume and the current marginal costs of the unit\n", + " current_volume = unit.get_output_before(start)\n", + " current_costs = unit.calc_marginal_cost_with_partial_eff(current_volume, start)\n", + "\n", + " # scale unit outputs\n", + " scaled_total_capacity = current_volume / scaling_factor_total_capacity\n", + " scaled_marginal_cost = current_costs / scaling_factor_marginal_cost\n", + 
"\n", + " # concat all observations into one array\n", + " observation = np.concatenate(\n", + " [\n", + " scaled_res_load_forecast,\n", + " scaled_price_forecast,\n", + " np.array([scaled_total_capacity, scaled_marginal_cost]),\n", + " ]\n", + " )\n", + "\n", + " # transfer array to the configured device (e.g. GPU) for NN processing\n", + " observation = (\n", + " th.tensor(observation, dtype=self.float_type)\n", + " .to(self.device, non_blocking=True)\n", + " .view(-1)\n", + " )\n", + "\n", + " return observation.detach().clone()" + ], + "metadata": { + "id": "0ww-L9fABnw3" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Solution 3.1**\n", + "\n", + "First, why do we scale?\n", + "\n", + "Scaling observations is a crucial preprocessing step in machine learning, including reinforcement learning. It involves transforming the features so that they all fall within a similar numerical range. This is important for several reasons. Firstly, it aids in numerical stability during training. Large input values can lead to numerical precision issues, potentially causing the algorithm to perform poorly or even fail to converge. By scaling the features, we mitigate this risk, ensuring a more stable and reliable learning process.\n", + "\n", + "Additionally, scaling promotes uniformity in the learning process. Many optimization algorithms, like gradient descent, adjust model parameters based on the magnitude of gradients. When features have vastly different scales, some may dominate the learning process, while others receive less attention. This imbalance can hinder convergence and result in a suboptimal model. Scaling addresses this issue, allowing the algorithm to treat all features equally and progress more efficiently towards an optimal solution. This not only expedites the learning process but also enhances the model's ability to generalize to new, unseen data. 
In essence, scaling observations is a fundamental practice that enhances the performance and robustness of machine learning models across a wide array of applications.\n", + "\n", + "Accordingly, the scaling should ensure a similar range for all input parameters. You can achieve that by choosing the following scaling factors. If you add new observations, choose your scaling factors wisely." + ], + "metadata": { + "id": "kDYKZGERKJ6V" + } + }, + { + "cell_type": "code", + "source": [ + "\"\"\"\n", + "#scaling factors for all observations\n", + "#residual load forecast\n", + "scaling_factor_res_load = self.max_demand\n", + "\n", + "# price forecast\n", + "scaling_factor_price = self.max_bid_price\n", + "\n", + "# total capacity\n", + "scaling_factor_total_capacity = unit.max_power\n", + "\n", + "# marginal cost\n", + "scaling_factor_marginal_cost = self.max_bid_price\n", + "\"\"\"" + ], + "metadata": { + "id": "PYoI3ncSKJSX", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "outputId": "4b4341d7-5a21-49c4-ee25-b8c55f693cd1" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'\\n#scaling factors for all observations\\n#residual load forecast\\nscaling_factor_res_load = self.max_demand\\n\\n# price forecast\\nscaling_factor_price = self.max_bid_price\\n\\n# total capacity\\nscaling_factor_total_capacity = unit.max_power\\n\\n# marginal cost\\nscaling_factor_marginal_cost = self.max_bid_price\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 13 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## 3.2 Choose an action\n", + "\n", + "To differentiate between the inflexible and flexible parts of a plant's generation capacity, we split the bids into two parts. 
The first bid part allows agents to bid a very low or even negative price for the inflexible capacity; this reflects the agent's motivation to stay infra-marginal during periods of very low net load (e.g., in periods of high solar and wind power generation) to avoid the cost of a shut-down and subsequent start-up of the plant. The flexible part of the capacity can be offered at a higher price to provide chances for higher profits. The actions of agent $i$ at time-step $t$ are defined as $a_{i,t} = [ep^\\mathrm{inflex}_{i,t}, ep^\\mathrm{flex}_{i,t}] \\in [ep^{min},ep^{max}]$, where $ep^\\mathrm{inflex}_{i,t}$ and $ep^\\mathrm{flex}_{i,t}$ are bid prices for the inflexible and flexible capacities, and $ep^{min},ep^{max}$ are minimal and maximal bid prices, respectively.\n", + "\n", + "How do we learn to make good decisions? Essentially by trial and error, also known as **exploration**. Exploration is a fundamental concept in reinforcement learning, representing the strategy by which an agent interacts with its environment to gather information about the consequences of its actions. This is crucial because without exploration, the agent might settle for suboptimal policies based on its initial knowledge, limiting its ability to discover more rewarding states or actions.\n", + "\n", + "In the initial stages of training, often called initial exploration, it's imperative to employ almost random actions. This means having the agent take actions purely by chance. This seemingly counterintuitive approach serves a critical purpose. Initially, the agent lacks any meaningful information about the environment, making it impossible to make informed decisions. By taking random actions, it can quickly gather a broad range of experiences, allowing it to grasp the fundamental structure of the environment. These random actions serve as a kind of \"baseline exploration,\" providing a starting point from which the agent can refine its policy through learning. 
With our domain knowledge we can even guide the initial exploration process to enhance learning capabilities.\n", + "\n", + "\n", + "Following up on these concepts the following tasks will:\n", + "1. obtain the action values from the neural net in the bidding strategy and\n", + "2. then transform these values into the actual bids of an order. \n", + "\n" + ], + "metadata": { + "id": "rW_1op6fCTV-" + } + }, + { + "cell_type": "markdown", + "source": [ + "#### **Task 3.2.1**\n", + "**Goal**: With the observations and noise we generate actions\n", + "\n", + "In the following task we define the actions for the initial exploration mode. As described before, we can guide it by not letting it choose purely random actions but by defining a base bid to which we add a good amount of noise. In this way the initial strategy starts from a solution that we know works somewhat well. Define the respective base bid in the following code. Remember that we are defining bids for a conventional power plant bidding in an energy-only market with a uniform pricing auction. 
" + ], + "metadata": { + "id": "Cho84Pqs2N2G" + } + }, + { + "cell_type": "code", + "source": [ + "%%add_to RLStrategy\n", + "# cell magic to enable class definitions across Colab cells (it must be the first line of the cell)\n", + "def get_actions(self, next_observation):\n", + " \"\"\"\n", + " Get actions\n", + "\n", + " :param next_observation: Next observation\n", + " :type next_observation: torch.Tensor\n", + " :return: Actions\n", + " :rtype: torch.Tensor\n", + " \"\"\"\n", + "\n", + " # distinguish whether we are in learning mode or not, to handle exploration realised with noise\n", + " if self.learning_mode:\n", + " # if we are in learning mode the first x episodes we want to explore the entire action space\n", + " # to get a good initial experience, in the area around the costs of the agent\n", + " if self.collect_initial_experience_mode:\n", + " # define current action as solely noise\n", + " noise = (\n", + " th.normal(\n", + " mean=0.0, std=0.2, size=(1, self.act_dim), dtype=self.float_type\n", + " )\n", + " .to(self.device)\n", + " .squeeze()\n", + " )\n", + "\n", + " # =============================================================================\n", + " # 2.1 Get Actions and handle exploration\n", + " # =============================================================================\n", + " #==> YOUR CODE HERE\n", + " base_bid = #TODO\n", + "\n", + " # add noise to the last dimension of the observation\n", + " # needs to be adjusted if the observation space is changed, because this only makes sense\n", + " # if the last dimension of the observation space is the marginal cost\n", + " curr_action = noise + base_bid.clone().detach()\n", + "\n", + " else:\n", + " # if we are not in the initial exploration phase we choose the action with the actor neural net\n", + " # and add noise to the action\n", + " curr_action = self.actor(next_observation).detach()\n", + " noise = th.tensor(\n", + " self.action_noise.noise(), device=self.device, dtype=self.float_type\n", + " )\n", + " curr_action += noise\n", + " else:\n", 
+ " # if we are not in learning mode we just use the actor neural net to get the action without adding noise\n", + "\n", + " curr_action = self.actor(next_observation).detach()\n", + " noise = tuple(0 for _ in range(self.act_dim))\n", + "\n", + " curr_action = curr_action.clamp(-1, 1)\n", + "\n", + " return curr_action, noise\n" + ], + "metadata": { + "id": "8ehlm5Z9CbRw" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Solution 3.2.1**\n", + "\n", + "So how do we define the base bid?\n", + "\n", + "Assuming the described auction is an efficient market with full information and competition, we know that bidding the marginal costs of the power plant is the economically best bid. With the RL strategy we can recreate the abuse of market power and incomplete information, which enables us to model different market settings. Yet, starting off with the theoretically stylized optimal solution guides our RL agents properly. As the (scaled) marginal costs of the power plant are the last entry of the observations, we can define the base bid in the following way. 
" + ], + "metadata": { + "id": "OTaqkwV3xcf6" + } + }, + { + "cell_type": "code", + "source": [ + "\"\"\"\n", + "#base_bid = marginal costs\n", + "base_bid = next_observation[-1] # = marginal_costs\n", + "\"\"\"" + ], + "metadata": { + "id": "rfXJBGOKxbk7", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "06f76c52-e215-4998-8f61-f7492b880e4d" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'\\n#base_bid = marginal costs\\nbase_bid = next_observation[-1] # = marginal_costs\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 15 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Task 3.2.2**\n", + "**Goal**: Define the actual bids with the outputs of the actors\n", + "\n", + "The actions returned by *get_actions()* are clamped to the range [-1, 1]. These values need to be translated into the actual bids $a_{i,t} = [ep^\\mathrm{inflex}_{i,t}, ep^\\mathrm{flex}_{i,t}] \\in [ep^{min},ep^{max}]$. This can be done in a way that further helps the RL agent to learn, if we put some thought into it.\n", + "\n", + "For this we go back into the calculate_bids() function and, instead of just defining bids=actions, which was only a placeholder, we actually turn them into bids. Think about a smart way to transform them and fill the gaps in the following code. Remember:\n", + "\n", + " - *bid_quantity_inflex* represents the inflexible part of the bid, i.e. the minimum run capacity of the unit.\n", + " - *bid_quantity_flex* represents the flexible part of the bid, i.e. the flexible capacity of the unit." 
+ ], + "metadata": { + "id": "B5Hgh88Vz0wD" + } + }, + { + "cell_type": "code", + "source": [ + "%%add_to RLStrategy\n", + "# cell magic to enable class definitions across Colab cells (it must be the first line of the cell)\n", + "def calculate_bids(\n", + " self,\n", + " unit: SupportsMinMax,\n", + " market_config: MarketConfig,\n", + " product_tuples: list[Product],\n", + " **kwargs,\n", + ") -> Orderbook:\n", + " \"\"\"\n", + " Calculate bids for a unit\n", + "\n", + " :param unit: Unit to calculate bids for\n", + " :type unit: SupportsMinMax\n", + " :param market_config: Market configuration\n", + " :type market_config: MarketConfig\n", + " :param product_tuples: Product tuples\n", + " :type product_tuples: list[Product]\n", + " :return: Bids containing start time, end time, price and volume\n", + " :rtype: Orderbook\n", + " \"\"\"\n", + "\n", + " bid_quantity_inflex, bid_price_inflex = 0, 0\n", + " bid_quantity_flex, bid_price_flex = 0, 0\n", + "\n", + " start = product_tuples[0][0]\n", + " end = product_tuples[0][1]\n", + " # get technical bounds for the unit output from the unit\n", + " min_power, max_power = unit.calculate_min_max_power(start, end)\n", + " min_power = min_power[start]\n", + " max_power = max_power[start]\n", + "\n", + " # =============================================================================\n", + " # 1. Get the Observations, which are the basis of the action decision\n", + " # =============================================================================\n", + " next_observation = self.create_observation(\n", + " unit=unit,\n", + " start=start,\n", + " end=end,\n", + " )\n", + "\n", + " # =============================================================================\n", + " # 2. 
Get the Actions, based on the observations\n", + " # =============================================================================\n", + " actions, noise = self.get_actions(next_observation)\n", + "\n", + " bids = actions\n", + "\n", + " # =============================================================================\n", + " # 3. Transform Actions into bids\n", + " # =============================================================================\n", + " #==> YOUR CODE HERE\n", + " # actions are clamped to the range [-1,1], we need to transform them into actual bids\n", + " # we can use our domain knowledge to guide the bid formulation\n", + " bid_prices = actions * self.max_bid_price\n", + "\n", + " # 3.1 formulate the bids for Pmin\n", + " # Pmin, the minimum run capacity, is the inflexible part of the bid, which should always be accepted\n", + " bid_quantity_inflex = min_power\n", + " bid_price_inflex = #TODO\n", + "\n", + " # 3.2 formulate the bids for Pmax - Pmin\n", + " # Pmax - Pmin is the flexible part of the bid, which can be offered at a higher price\n", + " bid_quantity_flex = max_power - bid_quantity_inflex\n", + " bid_price_flex = #TODO\n", + "\n", + " # actually formulate bids in orderbook format\n", + " bids = [\n", + " {\n", + " \"start_time\": start,\n", + " \"end_time\": end,\n", + " \"only_hours\": None,\n", + " \"price\": bid_price_inflex,\n", + " \"volume\": bid_quantity_inflex,\n", + " },\n", + " {\n", + " \"start_time\": start,\n", + " \"end_time\": end,\n", + " \"only_hours\": None,\n", + " \"price\": bid_price_flex,\n", + " \"volume\": bid_quantity_flex,\n", + " },\n", + " ]\n", + "\n", + " # store results in unit outputs which are written to database by unit operator\n", + " unit.outputs[\"rl_observations\"][start] = next_observation\n", + " unit.outputs[\"rl_actions\"][start] = actions\n", + " unit.outputs[\"rl_exploration_noise\"][start] = noise\n", + "\n", + " return bids" + ], + "metadata": { + "id": "Y81HzlkjNHJ0" + }, + 
"execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Solution 3.2.2**\n", + "\n", + "So how do we define the actual bid from the action?\n", + "\n", + "We have one bid price for the minimum power (inflex) and one for the rest of the power (flex). As the power plant needs to run at least at its minimum power in order to offer any generation at all, it makes sense to offer this generation at a lower price than the rest of the power. Hence, we can allocate the actions to the bid prices in the following way. In addition, the actions need to be rescaled, of course.\n" + ], + "metadata": { + "id": "3n-kJeOFCfRB" + } + }, + { + "cell_type": "code", + "source": [ + "\"\"\"\n", + "#calculate actual bids\n", + "#rescale actions to actual prices\n", + "bid_prices = actions * self.max_bid_price\n", + "\n", + "#calculate inflexible part of the bid\n", + "bid_quantity_inflex = min_power\n", + "bid_price_inflex = min(bid_prices)\n", + "\n", + "#calculate flexible part of the bid\n", + "bid_quantity_flex = max_power - bid_quantity_inflex\n", + "bid_price_flex = max(bid_prices)\n", + "\"\"\"" + ], + "metadata": { + "id": "wB7X-pFkCje3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "outputId": "ff905a9d-e3f2-4487-9e8a-9dbf4e855ab7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'\\n#calculate actual bids\\n#rescale actions to actual prices\\nbid_prices = actions * self.max_bid_price\\n\\n#calculate inflexible part of the bid\\nbid_quantity_inflex = min_power\\nbid_price_inflex = min(bid_prices)\\n\\n#calculate flexible part of the bid\\nbid_quantity_flex = max_power - bid_quantity_inflex\\nbid_price_flex = max(bid_prices)\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 17 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## 3.3 Get a reward\n", + "This step 
is done in the *calculate_reward()* function, which is called after the market is cleared and we get the market feedback, so we can calculate the profit. In RL, the design of a reward function is as important as the choice of the correct algorithm. During the initial phase of the work, a purely economic reward in the form of the agent's profit was used. Typically, electricity market models consider only a single restart cost. Still, in the case of using RL, the split into shut-down and start-up costs allows the agents to better differentiate between these two events and learn a better policy.\n", + "\n", + "\n", + "\\begin{equation}\n", + "\\pi_{i,t} =\n", + "\t\\begin{cases}\n", + "\t P^\\text{conf}_{i,t} (M_t - mc_{i,t}) dt - c^{su}_i & \\text{if $P^\\text{conf}_{i,t}$ $\\geq P^{min}_i$} \\\\\n", + " & \\text{and $P_{i,t-1}$ $= 0$} \\\\\n", + "\t P^\\text{conf}_{i,t} (M_t - mc_{i,t}) dt & \\text{if $P^\\text{conf}_{i,t}$ $\\geq P^{min}_i$} \\\\\n", + " & \\text{and $P_{i,t-1}$ $\\neq 0$} \\\\\n", + "\t - c^{sd}_i & \\text{if $P^\\text{conf}_{i,t}$ $< P^{min}_i$} \\\\\n", + " & \\text{and $P_{i,t-1}$ $\\neq 0$} \\\\\n", + " 0 & \\text{otherwise} \\\\\n", + "\t \\end{cases}\n", + "\\end{equation}\n", + "\n", + "\n", + "In this equation, $P^\\text{conf}$ is the confirmed capacity on the market, $P^{min}$ is the minimal stable capacity, $M$ is the market clearing price, $mc$ is the marginal generation cost, $dt$ is the market time resolution, and $c^{su}$ and $c^{sd}$ are the start-up and shut-down costs, respectively.\n", + "\n", + "The profit-driven reward function was sufficient for a few agents, but the learning performance decreased significantly with more agents. Therefore, we add an additional regret term $cm$." 
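+ , + "\n", + "\n", + "As a rough illustration of the profit definition above, consider the following worked case. All numbers are assumptions chosen for the example and are not taken from the scenario data: a unit that was offline in the previous step ($P_{i,t-1} = 0$) is accepted with $P^\\text{conf}_{i,t} = 600$ MW at $P^{min}_i = 300$ MW, with $M_t = 50$ EUR/MWh, $mc_{i,t} = 30$ EUR/MWh, $dt = 1$ h and $c^{su}_i = 5000$ EUR. The first case of the equation then applies:\n", + "\n", + "\\begin{equation}\n", + "\\pi_{i,t} = 600 \\cdot (50 - 30) \\cdot 1 - 5000 = 7000 \\text{ EUR}\n", + "\\end{equation}\n", + "\n", + "Had the unit already been running in the previous step, the second case would apply instead and the profit would be $12000$ EUR."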
+ ], + "metadata": { + "id": "hr15xKuGCkbn" + } + }, + { + "cell_type": "markdown", + "source": [ + "#### **Task 3.3**\n", + "**Goal**: Define the reward guiding the learning process of the agent.\n", + "\n", + "As the reward plays such a crucial role in learning, think of ways to integrate further signals beyond the monetary profit. One example could be integrating a regret term, namely the opportunity costs. Your task is to define the reward using the opportunity costs and to scale it." + ], + "metadata": { + "id": "aGyaOUgo3Y8Q" + } + }, + { + "cell_type": "code", + "source": [ + "%%add_to RLStrategy\n", + "# cell magic to enable class definitions across Colab cells (it must be the first line of the cell)\n", + "def calculate_reward(\n", + " self,\n", + " unit,\n", + " marketconfig: MarketConfig,\n", + " orderbook: Orderbook,\n", + " ):\n", + " \"\"\"\n", + " Calculate reward\n", + "\n", + " :param unit: Unit to calculate reward for\n", + " :type unit: SupportsMinMax\n", + " :param marketconfig: Market configuration\n", + " :type marketconfig: MarketConfig\n", + " :param orderbook: Orderbook\n", + " :type orderbook: Orderbook\n", + " \"\"\"\n", + "\n", + " # =============================================================================\n", + " # 3. 
Calculate Reward\n", + " # =============================================================================\n", + " # function is called after the market is cleared and we get the market feedback,\n", + " # so we can calculate the profit\n", + "\n", + " product_type = marketconfig.product_type\n", + "\n", + " profit = 0\n", + " reward = 0\n", + " opportunity_cost = 0\n", + "\n", + " # iterate over all orders in the orderbook, to calculate order specific profit\n", + " for order in orderbook:\n", + " start = order[\"start_time\"]\n", + " end = order[\"end_time\"]\n", + " end_excl = end - unit.index.freq\n", + "\n", + " # depending on the way the unit calculates marginal costs, we take the costs accordingly\n", + " if unit.marginal_cost is not None:\n", + " marginal_cost = (\n", + " unit.marginal_cost[start]\n", + " if len(unit.marginal_cost) > 1\n", + " else unit.marginal_cost\n", + " )\n", + " else:\n", + " marginal_cost = unit.calc_marginal_cost_with_partial_eff(\n", + " power_output=unit.outputs[product_type].loc[start:end_excl],\n", + " timestep=start,\n", + " )\n", + "\n", + " duration = (end - start) / timedelta(hours=1)\n", + "\n", + " # calculate profit as income - running_cost from this event\n", + " price_difference = order[\"accepted_price\"] - marginal_cost\n", + " order_profit = price_difference * order[\"accepted_volume\"] * duration\n", + "\n", + " # calculate opportunity cost\n", + " # as the loss of income we have because we are not running at full power\n", + " order_opportunity_cost = (\n", + " price_difference\n", + " * (\n", + " unit.max_power - unit.outputs[product_type].loc[start:end_excl]\n", + " ).sum()\n", + " * duration\n", + " )\n", + "\n", + " # if our opportunity costs are negative, we did not miss an opportunity to earn money and we set them to 0\n", + " order_opportunity_cost = max(order_opportunity_cost, 0)\n", + "\n", + " # collect profit and opportunity cost for all orders\n", + " opportunity_cost += order_opportunity_cost\n", + " profit += 
order_profit\n", + "\n", + " # consideration of start-up costs, which are evenly divided between the\n", + " # upward and downward regulation events\n", + " if (\n", + " unit.outputs[product_type].loc[start] != 0\n", + " and unit.outputs[product_type].loc[start - unit.index.freq] == 0\n", + " ):\n", + " profit = profit - unit.hot_start_cost / 2\n", + " elif (\n", + " unit.outputs[product_type].loc[start] == 0\n", + " and unit.outputs[product_type].loc[start - unit.index.freq] != 0\n", + " ):\n", + " profit = profit - unit.hot_start_cost / 2\n", + "\n", + " # =============================================================================\n", + " # =============================================================================\n", + " # ==> YOUR CODE HERE\n", + " # The straightforward implementation would be reward = profit, yet we would like to give the agent more guidance\n", + " # in the learning process, so we add a regret term to the reward, which is the opportunity cost\n", + " # define the reward and scale it\n", + "\n", + " scaling = #TODO\n", + " regret_scale = #TODO\n", + " reward = #TODO\n", + "\n", + " # store results in unit outputs which are written to database by unit operator\n", + " unit.outputs[\"profit\"].loc[start:end_excl] += profit\n", + " unit.outputs[\"reward\"].loc[start:end_excl] = reward\n", + " unit.outputs[\"regret\"].loc[start:end_excl] = opportunity_cost\n" + ], + "metadata": { + "id": "U9HX41mODuBU" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Solution 3.3**\n", + "\n", + "So how do we define the actual reward?\n", + "\n", + "We use the opportunity costs for further guidance, which quantify the expected contribution margin, as defined by the following equation, with $P^{max}$ as the maximal available capacity.\n", + "\n", + "\\begin{equation}\n", + " cm_{i,t} = \\max[(P^{max}_i - P^\\text{conf}_{i,t}) (M_t - mc_{i,t}) dt, 0]\n", + "\\end{equation}\n", + "\n", + "The regret term 
gives a negative signal to the agent when there is opportunity cost due to the unsold capacity, thus correcting the agent's actions. This term also introduces an increased influence of the competition between agents in learning. By minimizing the regret, the agents drive the bid prices closer to the marginal generation cost, which drives the market price down.\n", + "\n", + "The reward of agent $i$ at time-step $t$ is defined by the equation below, in which the regret is subtracted from the profit.\n", + "\n", + "\\begin{equation}\n", + " R_{i,t} = \\pi_{i,t} - \\beta cm_{i,t}\n", + "\\end{equation}\n", + "\n", + "Here, $\\beta$ is the regret scaling factor to adjust the ratio between profit-maximizing and regret-minimizing learning.\n", + "\n", + "The described reward function has proven to perform well even with many agents and to accelerate learning convergence. This is because minimizing the regret term drives the overall system to equilibrium. At a point close to the equilibrium point, the average reward of all agents would converge to a constant value since further policy changes would not lead to an additional reduction in regrets or an increase in profits. Therefore, the average reward value can also be a good indicator of learning performance and convergence." 
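+ , + "\n", + "\n", + "A small worked example may help here. The numbers are assumptions chosen purely for illustration, and the regret is subtracted so that it acts as the negative signal described above, which matches the sign used in the solution code below. Take a unit that is already running, with $P^{max}_i = 1000$ MW, $P^\\text{conf}_{i,t} = 600$ MW, $M_t = 50$ EUR/MWh, $mc_{i,t} = 30$ EUR/MWh, $dt = 1$ h and $\\beta = 0.2$, so that $\\pi_{i,t} = 600 \\cdot (50 - 30) \\cdot 1 = 12000$ EUR:\n", + "\n", + "\\begin{equation}\n", + " cm_{i,t} = \\max[(1000 - 600) \\cdot (50 - 30) \\cdot 1, 0] = 8000 \\text{ EUR}\n", + "\\end{equation}\n", + "\n", + "\\begin{equation}\n", + " R_{i,t} = 12000 - 0.2 \\cdot 8000 = 10400 \\text{ EUR}\n", + "\\end{equation}\n", + "\n", + "With the scaling factor used in the solution code below, $0.1 / P^{max}_i = 10^{-4}$, the reward passed to the agent in this example would be $1.04$."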
+ ], + "metadata": { + "id": "gWF7D4QA2-kz" + } + }, + { + "cell_type": "code", + "source": [ + "\"\"\"\n", + "scaling = 0.1 / unit.max_power\n", + "regret_scale = 0.2\n", + "reward = float(profit - regret_scale * opportunity_cost) * scaling\n", + "\"\"\"" + ], + "metadata": { + "id": "e1XdVXPSCo_k", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "outputId": "585d94a5-7475-4e96-d0a1-5e82b711c6a5" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'\\nscaling = 0.1 / unit.max_power\\nregret_scale = 0.2\\nreward = float(profit - regret_scale * opportunity_cost) * scaling\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 19 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## 3.4 Start the simulation\n", + "\n", + "We are almost done with all the changes needed to make ASSUME learn here in Google Colab. If you would rather load our pretrained strategies, we need a function for loading parameters, which can be found below. 
\n", + "\n" + ], + "metadata": { + "id": "L3flH5iY4x7Z" + } + }, + { + "cell_type": "code", + "source": [ + "%%add_to RLStrategy\n", + "# cell magic to enable class definitions across Colab cells (it must be the first line of the cell)\n", + "def load_actor_params(self, load_path):\n", + " \"\"\"\n", + " Load actor parameters\n", + "\n", + " :param load_path: Path to the folder containing the saved actor parameters\n", + " :type load_path: str\n", + " \"\"\"\n", + " directory = f\"{load_path}/actors/actor_{self.unit_id}.pt\"\n", + "\n", + " params = th.load(directory, map_location=self.device)\n", + "\n", + " self.actor = Actor(self.obs_dim, self.act_dim, self.float_type)\n", + " self.actor.load_state_dict(params[\"actor\"])\n", + "\n", + " if self.learning_mode:\n", + " self.actor_target = Actor(self.obs_dim, self.act_dim, self.float_type)\n", + " self.actor_target.load_state_dict(params[\"actor_target\"])\n", + " self.actor_target.eval()\n", + " self.actor.optimizer.load_state_dict(params[\"actor_optimizer\"])" + ], + "metadata": { + "id": "ZwVtpK3B5gR6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "To control the learning process, the config file determines the parameters of the learning algorithm. As we want to tinker with these values in the notebook, we will overwrite the learning config in the next cell and then load it into our world. 
" + ], + "metadata": { + "id": "cTlqMouufKyo" + } + }, + { + "cell_type": "code", + "source": [ + "learning_config = {'observation_dimension': 50,\n", + " 'action_dimension': 2,\n", + " 'continue_learning': False,\n", + " 'load_model_path': 'None',\n", + " 'max_bid_price': 100,\n", + " 'algorithm': 'matd3',\n", + " 'learning_rate': 0.001,\n", + " 'training_episodes': 100,\n", + " 'episodes_collecting_initial_experience': 5,\n", + " 'train_freq': 24,\n", + " 'gradient_steps': -1,\n", + " 'batch_size': 256,\n", + " 'gamma': 0.99,\n", + " 'device': 'cpu',\n", + " 'noise_sigma': 0.1,\n", + " 'noise_scale': 1,\n", + " 'noise_dt': 1,\n", + " 'validation_episodes_interval': 5}" + ], + "metadata": { + "id": "moZ_UD7FfkOh" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Read the YAML file\n", + "with open('assume/examples/inputs/example_02a/config.yaml', 'r') as file:\n", + " data = yaml.safe_load(file)\n", + "\n", + "#store our modifications to the config file\n", + "data['base']['learning_mode']= True\n", + "data['base']['learning_config']=learning_config\n", + "\n", + "# Write the modified data back to the file\n", + "with open('assume/examples/inputs/example_02a/config.yaml', 'w') as file:\n", + " yaml.safe_dump(data, file)" + ], + "metadata": { + "id": "iPz8v4N5hpfr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "In order to let the simulation run with the integrated learning we need to touch up the main file that runs it in the following way." 
+ ], + "metadata": { + "id": "ZlRnTgCy5d9W" + } + }, + { + "cell_type": "code", + "source": [ + "log = logging.getLogger(__name__)\n", + "\n", + "csv_path = \"./outputs\"\n", + "os.makedirs(\"./local_db\", exist_ok=True)\n", + "\n", + "if __name__ == \"__main__\":\n", + " \"\"\"\n", + " Available examples:\n", + " - local_db: without database and grafana\n", + " - timescale: with database and grafana (note: you need docker installed)\n", + " \"\"\"\n", + " data_format = \"local_db\" # \"local_db\" or \"timescale\"\n", + "\n", + " if data_format == \"local_db\":\n", + " db_uri = \"sqlite:///./local_db/assume_db.db\"\n", + " elif data_format == \"timescale\":\n", + " db_uri = \"postgresql://assume:assume@localhost:5432/assume\"\n", + "\n", + " input_path = \"assume/examples/inputs\"\n", + " scenario = \"example_02a\"\n", + " study_case = \"base\"\n", + "\n", + " # create world\n", + " world = World(database_uri=db_uri, export_csv_path=csv_path)\n", + "\n", + " # we import our defined bidding strategy class including the learning into the world bidding strategies\n", + " # in the example files we provided, the name of the learning bidding strategies in the input csv is \"pp_learning\"\n", + " # hence we define this strategy to be one of the learning class\n", + " world.bidding_strategies[\"pp_learning\"] = RLStrategy\n", + "\n", + " # then we load the scenario specified above from the respective input files\n", + " load_scenario_folder(\n", + " world,\n", + " inputs_path=input_path,\n", + " scenario=scenario,\n", + " study_case=study_case,\n", + " )\n", + "\n", + " # run learning if learning mode is enabled\n", + " # needed as we simulate the modelling horizon multiple times to train the reinforcement learning agents\n", + "\n", + " if world.learning_config.get(\"learning_mode\", False):\n", + "\n", + " run_learning(\n", + " world,\n", + " inputs_path=input_path,\n", + " 
scenario=scenario,\n", + " study_case=study_case,\n", + " )\n", + "\n", + " # after the learning is done we make a normal run of the simulation, which serves as a test run\n", + " world.run()\n" + ], + "metadata": { + "id": "ZlWxXxZr54WV", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "e30f4279-7a4e-4efc-9cfb-61416e4fe2f1" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "INFO:assume.world:connected to db\n", + "INFO:assume.common.scenario_loader:Starting Scenario example_02a/base from assume/examples/inputs\n", + "INFO:assume.common.scenario_loader:Loading input data\n", + "INFO:assume.common.scenario_loader:storage_units not found. Returning None\n", + "INFO:assume.common.scenario_loader:Adding forecast\n", + "INFO:assume.common.scenario_loader:forecasts_df not found. Returning None\n", + "INFO:assume.common.scenario_loader:Downsampling demand_df successful.\n", + "INFO:assume.common.scenario_loader:cross_border_flows not found. Returning None\n", + "INFO:assume.common.scenario_loader:availability_df not found. Returning None\n", + "INFO:assume.common.scenario_loader:electricity_prices not found. Returning None\n", + "INFO:assume.common.scenario_loader:price_forecasts not found. Returning None\n", + "INFO:assume.common.scenario_loader:temperature not found. 
Returning None\n", + "INFO:assume.common.scenario_loader:Adding markets\n", + "INFO:assume.common.scenario_loader:Adding unit operators\n", + "INFO:assume.common.scenario_loader:Adding power_plant units\n", + "INFO:assume.common.scenario_loader:Adding demand units\n", + "Training Episodes: 0%| | 0/100 [00:00\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"learning_mode\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 44\u001b[0;31m run_learning(\n\u001b[0m\u001b[1;32m 45\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 46\u001b[0m \u001b[0minputs_path\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minput_path\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/scenario_loader.py\u001b[0m in \u001b[0;36mrun_learning\u001b[0;34m(world, inputs_path, scenario, study_case)\u001b[0m\n\u001b[1;32m 595\u001b[0m \u001b[0;31m# TODO normally, loading twice should not create issues, somehow a scheduling issue is raised currently\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 596\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mepisode\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 597\u001b[0;31m load_scenario_folder(\n\u001b[0m\u001b[1;32m 598\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 599\u001b[0m 
\u001b[0minputs_path\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/scenario_loader.py\u001b[0m in \u001b[0;36mload_scenario_folder\u001b[0;34m(world, inputs_path, scenario, study_case, perform_learning, perform_evaluation, episode, eval_episode, trained_actors_path)\u001b[0m\n\u001b[1;32m 547\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0mtype\u001b[0m \u001b[0mstudy_case\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 548\u001b[0m \"\"\"\n\u001b[0;32m--> 549\u001b[0;31m world.loop.run_until_complete(\n\u001b[0m\u001b[1;32m 550\u001b[0m load_scenario_folder_async(\n\u001b[1;32m 551\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mworld\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36mrun_until_complete\u001b[0;34m(self, future)\u001b[0m\n\u001b[1;32m 91\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_log_destroy_pending\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdone\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 93\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run_once\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 94\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_stopping\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 95\u001b[0m 
\u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36m_run_once\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 127\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mready\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpopleft\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cancelled\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 129\u001b[0;31m \u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 130\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.10/asyncio/events.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 80\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_context\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_callback\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_args\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 
81\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mSystemExit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[0;32mraise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/core.py\u001b[0m in \u001b[0;36mraise_exceptions\u001b[0;34m(self, fut)\u001b[0m\n\u001b[1;32m 454\u001b[0m \u001b[0;34mf\"Agent {self.aid}: Caught the following exception in _check_inbox: {fut.exception()}\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 455\u001b[0m )\n\u001b[0;32m--> 456\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mfut\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexception\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 457\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 458\u001b[0m \u001b[0;32masync\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_check_inbox\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/world.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 411\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 412\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 413\u001b[0;31m return self.loop.run_until_complete(\n\u001b[0m\u001b[1;32m 414\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0masync_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstart_ts\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstart_ts\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mend_ts\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mend_ts\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 415\u001b[0m )\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36mrun_until_complete\u001b[0;34m(self, future)\u001b[0m\n\u001b[1;32m 91\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_log_destroy_pending\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdone\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 93\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run_once\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 94\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_stopping\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 95\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36m_run_once\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 127\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mready\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpopleft\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cancelled\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 129\u001b[0;31m 
\u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 130\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.10/asyncio/events.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 80\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_context\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_callback\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_args\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 81\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mSystemExit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[0;32mraise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.10/asyncio/tasks.py\u001b[0m in \u001b[0;36m__wakeup\u001b[0;34m(self, future)\u001b[0m\n\u001b[1;32m 313\u001b[0m \u001b[0;31m# instead of `__next__()`, which is slower for futures\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 314\u001b[0m \u001b[0;31m# that return non-generator iterators from their 
`__iter__`.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 315\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__step\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 316\u001b[0m \u001b[0mself\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;31m# Needed to break cycles when an exception occurs.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 317\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36mstep\u001b[0;34m(task, exc)\u001b[0m\n\u001b[1;32m 203\u001b[0m \u001b[0mcurr_task\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcurr_tasks\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtask\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_loop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 204\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 205\u001b[0;31m \u001b[0mstep_orig\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtask\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 206\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 207\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcurr_task\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.10/asyncio/tasks.py\u001b[0m in \u001b[0;36m__step\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 230\u001b[0m \u001b[0;31m# We use the `send` method directly, because 
coroutines\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 231\u001b[0m \u001b[0;31m# don't have `__iter__` and `__next__` methods.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 232\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcoro\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 233\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 234\u001b[0m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcoro\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mthrow\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/core.py\u001b[0m in \u001b[0;36m_check_inbox\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 470\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"priority\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpriority\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 471\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 472\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 473\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 474\u001b[0m \u001b[0;31m# signal to the Queue that the message is handled\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/role.py\u001b[0m in 
\u001b[0;36mhandle_message\u001b[0;34m(self, content, meta)\u001b[0m\n\u001b[1;32m 460\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 461\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mDict\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mAny\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 462\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_role_context\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 463\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 464\u001b[0m \u001b[0;32masync\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mshutdown\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/role.py\u001b[0m in \u001b[0;36mhandle_message\u001b[0;34m(self, content, meta)\u001b[0m\n\u001b[1;32m 352\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0mparam\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 353\u001b[0m \"\"\"\n\u001b[0;32m--> 354\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_role_handler\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 355\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 356\u001b[0m async def send_message(\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/role.py\u001b[0m in \u001b[0;36mhandle_message\u001b[0;34m(self, content, meta)\u001b[0m\n\u001b[1;32m 214\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mrole\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmessage_condition\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_message_subs\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 215\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_is_role_active\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrole\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mmessage_condition\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 216\u001b[0;31m \u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 217\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 218\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_notify_send_message_subs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreceiver_addr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreceiver_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/units_operator.py\u001b[0m in \u001b[0;36mhandle_market_feedback\u001b[0;34m(self, content, 
meta)\u001b[0m\n\u001b[1;32m 178\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mset_unit_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0morderbook\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmarketconfig\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 179\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwrite_learning_params\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0morderbook\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmarketconfig\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 180\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwrite_actual_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 181\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 182\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mset_unit_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morderbook\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOrderbook\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmarketconfig\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mMarketConfig\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/units_operator.py\u001b[0m in \u001b[0;36mwrite_actual_dispatch\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 220\u001b[0m \u001b[0munit_dispatch_dfs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 221\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0munit_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0munit\u001b[0m \u001b[0;32min\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munits\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 222\u001b[0;31m \u001b[0mcurrent_dispatch\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0munit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute_current_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnow\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 223\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnow\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 224\u001b[0m \u001b[0mcurrent_dispatch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"power\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/units/powerplant.py\u001b[0m in \u001b[0;36mexecute_current_dispatch\u001b[0;34m(self, start, end)\u001b[0m\n\u001b[1;32m 178\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"energy\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 179\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 180\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"energy\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mend\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 181\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 
182\u001b[0m def calc_simple_marginal_cost(\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1151\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1152\u001b[0m \u001b[0mmaybe_callable\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_if_callable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1153\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmaybe_callable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1154\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1155\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_is_scalar_access\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1370\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1371\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mslice\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1372\u001b[0;31m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1373\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_slice_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1374\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_bool_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_validate_key\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1186\u001b[0m \u001b[0;31m# Key Checks\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1187\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1188\u001b[0;31m \u001b[0;34m@\u001b[0m\u001b[0mdoc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_LocationIndexer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1189\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_validate_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mAxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1190\u001b[0m \u001b[0;31m# valid for a collection of labels (we check their presence later)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + 
"\u001b[0;31mKeyboardInterrupt\u001b[0m: " + ] + } + ] + } + ] +} \ No newline at end of file From 6fd2b0db953ec52ac1a0c911f756fb948d8661aa Mon Sep 17 00:00:00 2001 From: Nick Harder Date: Mon, 4 Dec 2023 10:57:23 +0100 Subject: [PATCH 2/4] -add rl example as notebook --- 04_Reinforcement_learning_example.ipynb | 2170 ----------------- README.md | 2 +- docs/source/conf.py | 1 + .../04_Reinforcement_learning_example.nblink | 1 + ...nforcement_learning_example.nblink.license | 0 docs/source/examples_basic.rst | 3 +- .../notebooks/01_minimal_manual_example.ipynb | 898 +++---- ..._minimal_manual_example.ipynb copy.license | 3 + .../04_Reinforcement_learning_example.ipynb | 1345 ++++++++++ ...inforcement_learning_example.ipynb.license | 3 + 10 files changed, 1811 insertions(+), 2615 deletions(-) delete mode 100644 04_Reinforcement_learning_example.ipynb create mode 100644 docs/source/examples/04_Reinforcement_learning_example.nblink rename examples/notebooks/01_minimal_manual_example.ipynb.license => docs/source/examples/04_Reinforcement_learning_example.nblink.license (100%) create mode 100644 examples/notebooks/01_minimal_manual_example.ipynb copy.license create mode 100644 examples/notebooks/04_Reinforcement_learning_example.ipynb create mode 100644 examples/notebooks/04_Reinforcement_learning_example.ipynb.license diff --git a/04_Reinforcement_learning_example.ipynb b/04_Reinforcement_learning_example.ipynb deleted file mode 100644 index bd5cc468..00000000 --- a/04_Reinforcement_learning_example.ipynb +++ /dev/null @@ -1,2170 +0,0 @@ -{ - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "toc_visible": true, - "include_colab_link": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "view-in-github", - "colab_type": "text" - }, - "source": [ - "\"Open" - ] - }, - { - "cell_type": 
"markdown", - "source": [ - "# Tutorial: Reinforcement Learning in ASSUME\n", - "\n", - "This tutorial will introduce users into ASSUME and its ways of using reinforcement leanring (RL). The main objective of this tutorial is to ensure participants grasp the steps required to equip a new unit with RL strategies or modify the action dimensions.\n", - "Our emphasis lies in the bidding strategy class, with less emphasis on the algorithm and role. The latter are usable as a plug and play solution in the framework. The following coding tasks will highlight the key aspects to be adjusted, as already outlined in the learning_strategies.py file.\n", - "\n", - "The outline of this tutorial is as follows. We will start with a basic summary of the implementation of reinforcement learning (RL) in ASSUME and its architectrue (1. ASSUME & Learning Basics) . If you need a refresher on RL in general, please visit our readthedocs (https://assume.readthedocs.io/en/latest/). Afterwards, we install ASSUME in this Google Colab (2. Get ASSUME running) and then we dive into the learning_strategies.py file and explain how we need to adjust conventional bidding strategies to incorporate RL (3. Make ASSUME learn).\n", - "\n", - "#### As a whole, this tutorial covers the following codding tasks:\n", - "3.1 How to define a step function in the assume framework?\n", - "\n", - "3.2 How do we get observations from the simulation framework?\n", - "\n", - "3.3 How do we define actions based on the output of the actor neuronal net considering necesarry exploration?\n", - "\n", - "3.4 How do we define the reward?" - ], - "metadata": { - "id": "4JeBorbE6FYr" - } - }, - { - "cell_type": "markdown", - "source": [ - "# 1 ASSUME & LEARNING BASICS\n", - "\n", - "ASSUME in general is intended for researchers, planners, utilities and everyone searching to understand market dynamics of energy markets. 
It provides an easy-to-use toolbox as free software that can be tailored to the specific use case of the user.\n", - "\n", - "The following figure depicts the architecture of the framework. It can be roughly divided into two parts. The markets are located on the left side of the world class, and the market participants, which are called units here, are on the right side. Both sides are connected via the orders that market participants place on the markets. The learning capability is sketched out with the yellow classes on the right side, namely the units side.\n", - "\n", - "\n", - "\n", - "![architecture.svg]()" - ], - "metadata": { - "id": "bj2C4ElILNNv" - } - }, - { - "cell_type": "markdown", - "source": [ - "Let's focus on the bright yellow part of the architecture, namely the learning algorithm, the actor and the critic. We start with some **reinforcement learning background**. In the current implementation of ASSUME, we model the electricity market as a partially observable Markov game, which is an extension of Markov decision processes (MDPs) for multi-agent setups.\n", - "\n", - "**Multi-agent DRL** is understood as the simultaneous learning of multiple agents interacting in the same environment. The Markov game for $N$ agents consists of a set of states $S$, a set of actions $A_1, ..., A_N$, a set of observations $O_1, ..., O_N$, and a state transition function $P: S \\times A_1 \\times ... \\times A_N \\rightarrow \\mathcal{P}(S)$ dependent on the state and actions of all agents. After taking action $a_i \\in A_i$ in state $s_i \\in S$ according to a policy $\\pi_i:O_i\\rightarrow A_i$, every agent $i$ is transitioned into the new state $s'_i \\in S$. Each agent receives a reward $r_i$ according to the individual reward function $R_i$ and a private observation correlated with the state $o_i:S \\rightarrow O_i$.
As in an MDP, each agent $i$ learns an optimal policy $\\pi_i^*(s)$ that maximizes its expected reward.\n", - "\n", - "To enable multi-agent learning, some adjustments are needed within the learning algorithm to get from the TD3 to an MATD3 algorithm. Other authors used similar tweaks to improve the MADDPG algorithm and derive the MATD3 algorithm. We'll start explaining the learning by focusing on a single agent and then extend it to multi-agent learning.\n", - "\n", - "### 1.1 Single-Agent Learning\n", - "\n", - "We use the actor-critic approach to train the learning agent. The actor-critic approach is a popular RL algorithm that uses two neural networks: an actor network and a critic network. The actor network is responsible for selecting actions, while the critic network evaluates the quality of the actions taken by the actor.\n", - "\n", - "The actor and critic networks are trained simultaneously using the actor-critic algorithm, which updates the weights of both networks at each time step.
The actor-critic algorithm is a form of policy iteration, where the policy is updated based on the estimated value function, and the value function is updated in turn.\n", - "\n", - "##### **Actor**\n", - "The actor network is trained using the policy gradient method, which updates the weights of the actor network in the direction of the gradient of the expected reward with respect to the network parameters:\n", - "\n", - "$\\nabla_{\\theta} J(\\theta) = E[\\nabla_{\\theta} \\log \\pi_{\\theta}(a_t|s_t) \\cdot Q^{\\pi}(s_t, a_t)]$\n", - "\n", - "where $J(\\theta)$ is the expected reward, $\\theta$ are the weights of the actor network, $\\pi_{\\theta}(a_t|s_t)$ is the probability of selecting action $a_t$ given state $s_t$, and $Q^{\\pi}(s_t, a_t)$ is the expected reward of taking action $a_t$ in state $s_t$ under policy $\\pi$.\n", - "\n", - "##### **Critic**\n", - "The critic network is trained using the temporal difference (TD) learning method, which updates the weights of the critic network based on the difference between the estimated value of the current state and the estimated value of the next state:\n", - "\n", - "$\\delta_t = r_t + \\gamma \\cdot V(s_{t+1}) - V(s_t)$\n", - "\n", - "where $\\delta_t$ is the TD error, $r_t$ is the reward obtained at time step $t$, $\\gamma$ is the discount factor, $V(s_t)$ is the estimated value of state $s_t$, and $V(s_{t+1})$ is the estimated value of the next state $s_{t+1}$.\n", - "\n", - "The weights of the critic network are updated by gradient descent on the mean squared TD error:\n", - "\n", - "$L = E[(\\delta_t)^2]$\n", - "\n", - "where $L$ is the loss function.\n", - "\n" - ], - "metadata": { - "id": "dDn1blWvPM7Z" - } - }, - { - "cell_type": "markdown", - "source": [ - "### 1.2 Multi-Agent Learning\n", - "\n", - "While in a single-agent setup, the state transition and respective reward depend only on the actions of a single agent, the state transitions and rewards depend on the actions of all
learning agents in a multi-agent setup. This makes the environment non-stationary for a single agent, which violates the Markov property. Hence, the convergence guarantees of single-agent RL algorithms are no longer valid. Therefore, we utilize the framework of centralized training and decentralized execution and expand upon the MADDPG algorithm. The main idea of this approach is to use a centralized critic during the training phase, which has access to the entire state $\\textbf{S}$, and all actions $a_1, ..., a_N$, thus resolving the issue of non-stationarity, as changes in state transitions and rewards can be explained by the actions of other agents. Meanwhile, during both training and execution, the actor has access only to its local observations $o_i$ derived from the entire state $\\textbf{S}$.\n", - "\n", - "For each agent $i$, we train two centralized critics $Q_{i,\\theta_{1,2}}(S, a_1, ..., a_N)$ together with two target critic networks. Similar to TD3, the smaller value of the two critics and target action noise $\\tilde{a}_{i,k}$ is used to calculate the target $y_{i,k}$:\n", - "\n", - "$y_{i,k} = r_{i,k} + \\gamma \\cdot \\min_{j=1,2} Q_{i,\\theta'_j}(S'_k, a_{1,k}, ..., a_{N,k}, \\pi'(o_{i,k}))$\n", - "\n", - "where $r_{i,k}$ is the reward obtained by agent $i$ at time step $k$, $\\gamma$ is the discount factor, $S'_k$ is the next state of the environment, and $\\pi'(o_{i,k})$ is the target policy of agent $i$.\n", - "\n", - "The critics are trained using the mean squared Bellman error (MSBE) loss:\n", - "\n", - "$L(Q_{i,\\theta_j}) = E[(y_{i,k} - Q_{i,\\theta_j}(S_k, a_{1,k}, ..., a_{N,k}))^2]$\n", - "\n", - "The actor policy of each agent is updated using the deterministic policy gradient (DPG) algorithm:\n", - "\n", - "$\\nabla_a Q_{i,\\theta_j}(S_k, a_{1,k}, ..., a_{N,k}, \\pi(o_{i,k}))|_{a_{i,k}=\\pi(o_{i,k})} \\cdot \\nabla_\\theta \\pi(o_{i,k})$\n", - "\n", - "The actor is updated similarly using only one critic network $Q_{\\theta_1}$. These changes to the original DDPG algorithm allow increased stability and convergence of the TD3 algorithm.
This is especially relevant when approaching a multi-agent RL setup, as discussed in the following section." - ], - "metadata": { - "id": "OMvIl2xLVi1l" - } - }, - { - "cell_type": "markdown", - "source": [ - "# 2 GET ASSUME RUNNING\n", - "Here we just install the ASSUME core package via pip. In general the instructions for an installation can be found here: https://assume.readthedocs.io/en/latest/installation.html. All the required steps are executed here and since we are working in colab the generation of a venv is not necessary. \n" - ], - "metadata": { - "id": "OeeZDtIFmmhn" - } - }, - { - "cell_type": "code", - "source": [ - "!pip install assume-framework" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "m0DaRwFA7VgW", - "outputId": "5655adad-5b7a-4fe3-9067-6b502a06136b" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Collecting assume-framework\n", - " Downloading assume_framework-0.2.0-py3-none-any.whl (112 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m112.9/112.9 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting mango-agents-assume<2.0.0,>=1.1.1-1 (from assume-framework)\n", - " Downloading mango_agents_assume-1.1.1.post3-py3-none-any.whl (59 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.1/59.1 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting mypy<2.0.0,>=1.1.1 (from assume-framework)\n", - " Downloading mypy-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.2/12.2 MB\u001b[0m \u001b[31m80.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: nest-asyncio<2.0.0,>=1.5.6 in /usr/local/lib/python3.10/dist-packages 
(from assume-framework) (1.5.8)\n", - "Collecting paho-mqtt<2.0.0,>=1.5.1 (from assume-framework)\n", - " Downloading paho-mqtt-1.6.1.tar.gz (99 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m99.4/99.4 kB\u001b[0m \u001b[31m10.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Collecting pandas<3.0.0,>=2.0.0 (from assume-framework)\n", - " Downloading pandas-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.3/12.3 MB\u001b[0m \u001b[31m85.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting psycopg2-binary<3.0.0,>=2.9.5 (from assume-framework)\n", - " Downloading psycopg2_binary-2.9.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m84.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting pyomo<7.0.0,>=6.6.1 (from assume-framework)\n", - " Downloading Pyomo-6.6.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.7/12.7 MB\u001b[0m \u001b[31m9.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (2.8.2)\n", - "Requirement already satisfied: pyyaml<7.0,>=6.0 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (6.0.1)\n", - "Requirement already satisfied: sqlalchemy<3.0.0,>=2.0.9 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (2.0.22)\n", - "Requirement already satisfied: tqdm<5.0.0,>=4.64.1 in /usr/local/lib/python3.10/dist-packages (from assume-framework) (4.66.1)\n", - 
"Collecting dill<0.4.0,>=0.3.6 (from mango-agents-assume<2.0.0,>=1.1.1-1->assume-framework)\n", - " Downloading dill-0.3.7-py3-none-any.whl (115 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting msgspec>=0.14.2 (from mango-agents-assume<2.0.0,>=1.1.1-1->assume-framework)\n", - " Downloading msgspec-0.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (202 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m202.2/202.2 kB\u001b[0m \u001b[31m22.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: protobuf<4.0.0,>=3.20.3 in /usr/local/lib/python3.10/dist-packages (from mango-agents-assume<2.0.0,>=1.1.1-1->assume-framework) (3.20.3)\n", - "Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.10/dist-packages (from mypy<2.0.0,>=1.1.1->assume-framework) (4.5.0)\n", - "Collecting mypy-extensions>=1.0.0 (from mypy<2.0.0,>=1.1.1->assume-framework)\n", - " Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n", - "Requirement already satisfied: tomli>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from mypy<2.0.0,>=1.1.1->assume-framework) (2.0.1)\n", - "Requirement already satisfied: numpy>=1.22.4 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0.0,>=2.0.0->assume-framework) (1.23.5)\n", - "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0.0,>=2.0.0->assume-framework) (2023.3.post1)\n", - "Collecting tzdata>=2022.1 (from pandas<3.0.0,>=2.0.0->assume-framework)\n", - " Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.8/341.8 kB\u001b[0m \u001b[31m33.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hCollecting ply 
(from pyomo<7.0.0,>=6.6.1->assume-framework)\n", - " Downloading ply-3.11-py2.py3-none-any.whl (49 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.6/49.6 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil<3.0.0,>=2.8.2->assume-framework) (1.16.0)\n", - "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy<3.0.0,>=2.0.9->assume-framework) (3.0.0)\n", - "Building wheels for collected packages: paho-mqtt\n", - " Building wheel for paho-mqtt (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Created wheel for paho-mqtt: filename=paho_mqtt-1.6.1-py3-none-any.whl size=62118 sha256=46bea794d75243f95bc3a98068cd0a951731cd65c87a297d3299fef8781a9990\n", - " Stored in directory: /root/.cache/pip/wheels/8b/bb/0c/79444d1dee20324d442856979b5b519b48828b0bd3d05df84a\n", - "Successfully built paho-mqtt\n", - "Installing collected packages: ply, paho-mqtt, tzdata, pyomo, psycopg2-binary, mypy-extensions, msgspec, dill, pandas, mypy, mango-agents-assume, assume-framework\n", - " Attempting uninstall: pandas\n", - " Found existing installation: pandas 1.5.3\n", - " Uninstalling pandas-1.5.3:\n", - " Successfully uninstalled pandas-1.5.3\n", - "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", - "lida 0.0.10 requires fastapi, which is not installed.\n", - "lida 0.0.10 requires kaleido, which is not installed.\n", - "lida 0.0.10 requires python-multipart, which is not installed.\n", - "lida 0.0.10 requires uvicorn, which is not installed.\n", - "google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.1.1 which is incompatible.\u001b[0m\u001b[31m\n", - "\u001b[0mSuccessfully installed assume-framework-0.2.0 dill-0.3.7 mango-agents-assume-1.1.1.post3 msgspec-0.18.4 mypy-1.6.1 mypy-extensions-1.0.0 paho-mqtt-1.6.1 pandas-2.1.1 ply-3.11 psycopg2-binary-2.9.9 pyomo-6.6.2 tzdata-2023.3\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "And just like that, ASSUME is installed and ready to run. Please note, though, that we cannot use the functionality tied to Docker in Colab and, hence, cannot access the predefined dashboards. To use them, please install Docker and ASSUME on your personal machine.\n", - "\n", - "Furthermore, we would like to access the predefined scenarios in ASSUME, which are stored in the Git repository. Hence, we clone the repository."
- ], - "metadata": { - "id": "IIw_QIE3pY34" - } - }, - { - "cell_type": "code", - "source": [ - "!git clone https://github.com/assume-framework/assume.git" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "_5hB0uDisSsg", - "outputId": "1241881f-e090-4f26-9b02-560adfcb3a3e" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Cloning into 'assume'...\n", - "remote: Enumerating objects: 6035, done.\u001b[K\n", - "remote: Counting objects: 100% (2933/2933), done.\u001b[K\n", - "remote: Compressing objects: 100% (912/912), done.\u001b[K\n", - "remote: Total 6035 (delta 2377), reused 2236 (delta 2020), pack-reused 3102\u001b[K\n", - "Receiving objects: 100% (6035/6035), 11.58 MiB | 9.87 MiB/s, done.\n", - "Resolving deltas: 100% (4280/4280), done.\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "**Let the magic happen.** Now you can run your first-ever simulation in ASSUME. The following code navigates to the cloned assume folder and starts the simulation example example_01b using the local database here in Colab." - ], - "metadata": { - "id": "Fg7DyNjLuvSb" - } - }, - { - "cell_type": "code", - "source": [ - "!cd assume && assume -s example_01b -db \"sqlite:///./examples/local_db/assume_db_example_01b.db\"" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3eVM60Qx8SC0", - "outputId": "20434515-6e65-4d34-d44d-8c4529a46ece" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "example_01b_base 2019-01-02 23:00:00: 6% 172801.0/2678400 [00:06<03:02, 13749.57it/s]" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "# 3 MAKE ASSUME LEARN\n", - "\n", - "Now it is time to get your hands dirty and actually dive into coding in ASSUME.
The main objective of this session is to ensure participants grasp the steps required to equip a new unit with RL strategies or modify the action dimensions. Our emphasis lies in the bidding strategy class, with less emphasis on the algorithm and role. Coding tasks will highlight the key aspects to be adjusted, as already outlined in the learning_strategies.py file. Subsequent\n", - "sections will present the tasks and provide the correct answers for the coding exercises.\n", - "\n", - "We start by initializing the class of our Learning Strategy. This is very closely related to the general structure of a bidding strategy.\n", - "\n", - "\n", - "**But first some imports:**" - ], - "metadata": { - "id": "zMyZhaNM7NRP" - } - }, - { - "cell_type": "code", - "source": [ - "# install jdc for some inline magic\n", - "# that allows us to define methods of a class across different cells\n", - "\n", - "!pip install jdc" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "qoWI_agIJOE4", - "outputId": "9b40e670-bfef-4560-d6e8-61a1b29d1975" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Collecting jdc\n", - " Downloading jdc-0.0.9-py2.py3-none-any.whl (2.1 kB)\n", - "Installing collected packages: jdc\n", - "Successfully installed jdc-0.0.9\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "from datetime import datetime, timedelta\n", - "from pathlib import Path\n", - "\n", - "import numpy as np\n", - "import pandas as pd\n", - "import torch as th\n", - "import jdc\n", - "import yaml\n", - "import logging\n", - "import os\n", - "\n", - "from assume import World, load_custom_units, load_scenario_folder, run_learning\n", - "from assume.common.base import LearningStrategy, SupportsMinMax\n", - "from assume.common.market_objects import MarketConfig, Orderbook, Product\n", - "from assume.reinforcement_learning.learning_utils import Actor, NormalActionNoise" - ], - 
"metadata": { - "id": "xUsbeZdPJ_2Q" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "class RLStrategy(LearningStrategy):\n", - " \"\"\"\n", - " Reinforcement Learning Strategy\n", - "\n", - " :param foresight: Number of time steps to look ahead. Default 24.\n", - " :type foresight: int\n", - " :param max_bid_price: Maximum bid price\n", - " :type max_bid_price: float\n", - " :param max_demand: Maximum demand\n", - " :type max_demand: float\n", - " :param device: Device to run on\n", - " :type device: str\n", - " :param float_type: Float type to use\n", - " :type float_type: str\n", - " :param learning_mode: Whether to use learning mode\n", - " :type learning_mode: bool\n", - " :param actor: Actor network\n", - " :type actor: torch.nn.Module\n", - " \"\"\"\n", - "\n", - " def __init__(self, *args, **kwargs):\n", - " super().__init__(*args, **kwargs)\n", - "\n", - " self.unit_id = kwargs[\"unit_id\"]\n", - "\n", - " # defines bounds of actions space\n", - " self.max_bid_price = kwargs.get(\"max_bid_price\", 100)\n", - " self.max_demand = kwargs.get(\"max_demand\", 10e3)\n", - "\n", - " # tells us whether we are training the agents or just executing per-learnind stategies\n", - " self.learning_mode = kwargs.get(\"learning_mode\", False)\n", - "\n", - " # sets the devide of the actor network\n", - " device = kwargs.get(\"device\", \"cpu\")\n", - " self.device = th.device(device if th.cuda.is_available() else \"cpu\")\n", - " if not self.learning_mode:\n", - " self.device = th.device(\"cpu\")\n", - "\n", - " # future: add option to choose between float16 and float32\n", - " # float_type = kwargs.get(\"float_type\", \"float32\")\n", - " self.float_type = th.float\n", - "\n", - " # for definition of observation space\n", - " self.foresight = kwargs.get(\"foresight\", 24)\n", - "\n", - " if self.learning_mode:\n", - " self.learning_role = None\n", - " self.collect_initial_experience_mode = kwargs.get(\n", - " 
\"episodes_collecting_initial_experience\", True\n", - " )\n", - "\n", - " self.action_noise = NormalActionNoise(\n", - " mu=0.0,\n", - " sigma=kwargs.get(\"noise_sigma\", 0.1),\n", - " action_dimension=self.act_dim,\n", - " scale=kwargs.get(\"noise_scale\", 1.0),\n", - " dt=kwargs.get(\"noise_dt\", 1.0),\n", - " )\n", - "\n", - " elif Path(load_path=kwargs[\"trained_actors_path\"]).is_dir():\n", - " self.load_actor_params(load_path=kwargs[\"trained_actors_path\"])\n", - "\n", - " def testfunction():\n", - "\n", - " return None" - ], - "metadata": { - "id": "UXYSesx4Ifp5" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### 3.0 The \"Step Function\"\n", - "\n", - "The key function in an RL problem is the step that is taken in the so called environment. It consist the following parts:\n", - "\n", - "1. Get an observation\n", - "2. Choose an action\n", - "3. Get a reward\n", - "4. Update your policy\n", - "\n", - "In ASSUME we do not have such a straight forward step function. The steps 1 & 2 are combined in the calculate_bids() function which is called as soon as an offer on the market is placed. The step 3, however, can only happen after we get the market feedback from the simulation run and, hence, is in the calculate_reward() function. Step 4 is solely handeled by the learning_role as it shedules the policy update manages the buffer and what not. Hence, it is actually not included in this notebook, since we only focus on transforming the bidding strategy into a learning one.\n", - "\n", - "**Step 1-3 will be implemented in the following sections 3.1 to 3.3. 
If there is a coding task for you it will be marked accordingly.**" - ], - "metadata": { - "id": "8UM1QPZrIdqK" - } - }, - { - "cell_type": "code", - "source": [ - "#magic to enable class definitions across colab cells\n", - "%%add_to RLStrategy\n", - "def calculate_bids(\n", - " self,\n", - " unit: SupportsMinMax,\n", - " market_config: MarketConfig,\n", - " product_tuples: list[Product],\n", - " **kwargs,\n", - ") -> Orderbook:\n", - " \"\"\"\n", - " Calculate bids for a unit -> STEP 1 & 2\n", - "\n", - " :param unit: Unit to calculate bids for\n", - " :type unit: SupportsMinMax\n", - " :param market_config: Market configuration\n", - " :type market_config: MarketConfig\n", - " :param product_tuples: Product tuples\n", - " :type product_tuples: list[Product]\n", - " :return: Bids containing start time, end time, price and volume\n", - " :rtype: Orderbook\n", - " \"\"\"\n", - "\n", - " bid_quantity_inflex, bid_price_inflex = 0, 0\n", - " bid_quantity_flex, bid_price_flex = 0, 0\n", - "\n", - " start = product_tuples[0][0]\n", - " end = product_tuples[0][1]\n", - " # get technical bounds for the unit output from the unit\n", - " min_power, max_power = unit.calculate_min_max_power(start, end)\n", - " min_power = min_power[start]\n", - " max_power = max_power[start]\n", - "\n", - " # =============================================================================\n", - " # 1. Get the Observations, which are the basis of the action decision\n", - " # =============================================================================\n", - " next_observation = self.create_observation(\n", - " unit=unit,\n", - " start=start,\n", - " end=end,\n", - " )\n", - "\n", - " # =============================================================================\n", - " # 2. 
Get the Actions, based on the observations\n", - " # =============================================================================\n", - " actions, noise = self.get_actions(next_observation)\n", - "\n", - " bids = actions\n", - "\n", - "\n", - " return bids" - ], - "metadata": { - "id": "iApbQsg5x_u2" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "#magic to enable class definitions across colab cells\n", - "%%add_to RLStrategy\n", - "def calculate_reward(\n", - " self,\n", - " unit,\n", - " marketconfig: MarketConfig,\n", - " orderbook: Orderbook,\n", - "):\n", - " \"\"\"\n", - " Calculate reward\n", - "\n", - " :param unit: Unit to calculate reward for\n", - " :type unit: SupportsMinMax\n", - " :param marketconfig: Market configuration\n", - " :type marketconfig: MarketConfig\n", - " :param orderbook: Orderbook\n", - " :type orderbook: Orderbook\n", - " \"\"\"\n", - "\n", - " return None" - ], - "metadata": { - "id": "_4cJ8Y8uvMgV" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## 3.1 Get an observation\n", - "\n", - "The decision about the observations received by each agent plays a crucial role when designing a multi-agent RL setup. The following describes the task of learning agents representing profit-maximizing electricity market participants who either sell a generating unit's output or optimize a storage unit's operation. They are represented through their plants' techno-economic parameters, such as minimal operational capacity $P^{min}$, start-up $c^{su}$, and shut-down $c^{sd}$ costs. This information is all known by the unit itself and, hence, also accessible in the bidding strategy.\n", - "\n", - "During the training phase, the centralized critic receives observations from all agents, resulting in an input size that grows linearly with the number of agents.
This can lead to unstable training behavior of the critic networks, which limits the maximal number of agents in the simulation. This effect is known as the curse of dimensionality, which likely contributed to the small number of learning agents in existing approaches. To address the curse of dimensionality, we use a single observation that is the same for all agents and add a small set of unique observations for each agent to improve their performance. This modification allows the use of only one observation in the centralized critic, decoupled from the number of learning agents, significantly reducing the observation size and enabling simultaneous training of hundreds of learning agents with stable training behavior. The only limiting factor is the available working memory.\n", - "\n", - "At time-step $t$, agent $i$ receives the observation $o_{i,t}$ consisting of vectors $[L_{\\mathrm{h},t}, L_{\\mathrm{f},t}, M_{\\mathrm{h},t}, M_{\\mathrm{f},t}, mc_{i,t}]$. Here $L_{\\mathrm{h},t}, L_{\\mathrm{f},t}$ and $M_{\\mathrm{h},t}, M_{\\mathrm{f},t}$ are the past and the forecast residual loads and market prices, respectively. This information stems from the world, where an overall forecasting role generates it. The price forecast is calculated ahead of the simulation run using a simple merit order model based on the residual load forecast and the marginal cost of power plants. This part of the observation is the same for all agents. In addition, each agent receives its current marginal cost $mc_{i,t}$. Information about the marginal cost is shared with a centralized critic during the training phase. Still, it is not shared with other agents during the execution phase.
All the inputs are normalized to improve the performance of the training process.\n" - ], - "metadata": { - "id": "Jgjx14997Y9s" - } - }, - { - "cell_type": "markdown", - "source": [ - "#### **Task 3.1**\n", - "**Goal**: With the help of the *unit*, the *start* time and the *end* time, we want to create the observations for the unit.\n", - "\n", - "There are 4 different observations:\n", - "- residual load forecast\n", - "- price forecast\n", - "- total capacity of the unit\n", - "- marginal costs of the unit\n", - "\n", - "For all observations we need scaling factors. Why do you think it is important to scale the input? How would you define the scaling factors?" - ], - "metadata": { - "id": "PngYyvs72UxB" - } - }, - { - "cell_type": "code", - "source": [ - "#magic to enable class definitions across colab cells\n", - "%%add_to RLStrategy\n", - "def create_observation(\n", - " self,\n", - " unit: SupportsMinMax,\n", - " start: datetime,\n", - " end: datetime,\n", - "):\n", - " \"\"\"\n", - " Create observation\n", - "\n", - " :param unit: Unit to create observation for\n", - " :type unit: SupportsMinMax\n", - " :param start: Start time\n", - " :type start: datetime\n", - " :param end: End time\n", - " :type end: datetime\n", - " :return: Observation\n", - " :rtype: torch.Tensor\"\"\"\n", - " end_excl = end - unit.index.freq\n", - "\n", - " # get the forecast length depending on the time unit considered in the modelled unit\n", - " forecast_len = pd.Timedelta((self.foresight - 1) * unit.index.freq)\n", - "\n", - " # =============================================================================\n", - " # 1.1 Get the Observations, which are the basis of the action decision\n", - " # =============================================================================\n", - " scaling_factor_res_load = #TODO\n", - "\n", - " # price forecast\n", - " scaling_factor_price = #TODO\n", - "\n", - " # total capacity\n", - " scaling_factor_total_capacity = #TODO\n", - "\n", 
- " # marginal cost\n", - " # Obs[2*foresight+1:2*foresight+2]\n", - " scaling_factor_marginal_cost = #TODO\n", - "\n", - " # checks if we are at end of simulation horizon, since we need to change the forecast then\n", - " # for residual load and price forecast and scale them\n", - " if end_excl + forecast_len > unit.forecaster[\"residual_load_EOM\"].index[-1]:\n", - " scaled_res_load_forecast = (\n", - " unit.forecaster[\"residual_load_EOM\"].loc[start:].values\n", - " / scaling_factor_res_load\n", - " )\n", - " scaled_res_load_forecast = np.concatenate(\n", - " [\n", - " scaled_res_load_forecast,\n", - " unit.forecaster[\"residual_load_EOM\"].iloc[\n", - " : self.foresight - len(scaled_res_load_forecast)\n", - " ],\n", - " ]\n", - " )\n", - "\n", - " else:\n", - " scaled_res_load_forecast = (\n", - " unit.forecaster[\"residual_load_EOM\"]\n", - " .loc[start : end_excl + forecast_len]\n", - " .values\n", - " / scaling_factor_res_load\n", - " )\n", - "\n", - " if end_excl + forecast_len > unit.forecaster[\"price_EOM\"].index[-1]:\n", - " scaled_price_forecast = (\n", - " unit.forecaster[\"price_EOM\"].loc[start:].values / scaling_factor_price\n", - " )\n", - " scaled_price_forecast = np.concatenate(\n", - " [\n", - " scaled_price_forecast,\n", - " unit.forecaster[\"price_EOM\"].iloc[\n", - " : self.foresight - len(scaled_price_forecast)\n", - " ],\n", - " ]\n", - " )\n", - "\n", - " else:\n", - " scaled_price_forecast = (\n", - " unit.forecaster[\"price_EOM\"].loc[start : end_excl + forecast_len].values\n", - " / scaling_factor_price\n", - " )\n", - "\n", - " # get last accepted bid volume and the current marginal costs of the unit\n", - " current_volume = unit.get_output_before(start)\n", - " current_costs = unit.calc_marginal_cost_with_partial_eff(current_volume, start)\n", - "\n", - " # scale unit outputs\n", - " scaled_total_capacity = current_volume / scaling_factor_total_capacity\n", - " scaled_marginal_cost = current_costs / scaling_factor_marginal_cost\n", - 
"\n", - " # concat all observations into one array\n", - " observation = np.concatenate(\n", - " [\n", - " scaled_res_load_forecast,\n", - " scaled_price_forecast,\n", - " np.array([scaled_total_capacity, scaled_marginal_cost]),\n", - " ]\n", - " )\n", - "\n", - " # transfer array to GPU for NN processing\n", - " observation = (\n", - " th.tensor(observation, dtype=self.float_type)\n", - " .to(self.device, non_blocking=True)\n", - " .view(-1)\n", - " )\n", - "\n", - " return observation.detach().clone()" - ], - "metadata": { - "id": "0ww-L9fABnw3" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "#### **Solution 3.1**\n", - "\n", - "First, why do we scale?\n", - "\n", - "Scaling observations is a crucial preprocessing step in machine learning, including reinforcement learning. It involves transforming the features so that they all fall within a similar numerical range. This is important for several reasons. Firstly, it aids in numerical stability during training. Large input values can lead to numerical precision issues, potentially causing the algorithm to perform poorly or even fail to converge. By scaling the features, we mitigate this risk, ensuring a more stable and reliable learning process.\n", - "\n", - "Additionally, scaling promotes uniformity in the learning process. Many optimization algorithms, like gradient descent, adjust model parameters based on the magnitude of gradients. When features have vastly different scales, some may dominate the learning process, while others receive less attention. This imbalance can hinder convergence and result in a suboptimal model. Scaling addresses this issue, allowing the algorithm to treat all features equally and progress more efficiently towards an optimal solution. This not only expedites the learning process but also enhances the model's ability to generalize to new, unseen data. 
In essence, scaling observations is a fundamental practice that enhances the performance and robustness of machine learning models across a wide array of applications.\n", - "\n", - "Accordingly, the scaling should ensure a similar range for all input parameters. You can achieve that by choosing the following scaling factors. If you add new observations, choose your scaling factors wisely." - ], - "metadata": { - "id": "kDYKZGERKJ6V" - } - }, - { - "cell_type": "code", - "source": [ - "\"\"\"\n", - "#scaling factors for all observations\n", - "#residual load forecast\n", - "scaling_factor_res_load = self.max_demand\n", - "\n", - "# price forecast\n", - "scaling_factor_price = self.max_bid_price\n", - "\n", - "# total capacity\n", - "scaling_factor_total_capacity = unit.max_power\n", - "\n", - "# marginal cost\n", - "scaling_factor_marginal_cost = self.max_bid_price\n", - "\"\"\"" - ], - "metadata": { - "id": "PYoI3ncSKJSX", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 70 - }, - "outputId": "4b4341d7-5a21-49c4-ee25-b8c55f693cd1" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'\\n#scaling factors for all observations\\n#residual load forecast\\nscaling_factor_res_load = self.max_demand\\n\\n# price forecast\\nscaling_factor_price = self.max_bid_price\\n\\n# total capacity\\nscaling_factor_total_capacity = unit.max_power\\n\\n# marginal cost\\nscaling_factor_marginal_cost = self.max_bid_price\\n'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 13 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## 3.2 Choose an action\n", - "\n", - "To differentiate between the inflexible and flexible parts of a plant's generation capacity, we split the bids into two parts. 
The first bid part allows agents to bid a very low or even negative price for the inflexible capacity; this reflects the agent's motivation to stay infra-marginal during periods of very low net load (e.g., in periods of high solar and wind power generation) to avoid the cost of a shut-down and subsequent start-up of the plant. The flexible part of the capacity can be offered at a higher price to provide chances for higher profits. The actions of agent $i$ at time-step $t$ are defined as $a_{i,t} = [ep^\\mathrm{inflex}_{i,t}, ep^\\mathrm{flex}_{i,t}] \\in [ep^{min},ep^{max}]$, where $ep^\\mathrm{inflex}_{i,t}$ and $ep^\\mathrm{flex}_{i,t}$ are bid prices for the inflexible and flexible capacities, and $ep^{min},ep^{max}$ are minimal and maximal bid prices, respectively.\n", - "\n", - "How do we learn to make good decisions? Basically by trial and error, also known as **exploration**. Exploration is a fundamental concept in reinforcement learning, representing the strategy by which an agent interacts with its environment to gather information about the consequences of its actions. This is crucial because without exploration, the agent might settle for suboptimal policies based on its initial knowledge, limiting its ability to discover more rewarding states or actions.\n", - "\n", - "In the initial stages of training, often called initial exploration, it's imperative to employ almost random actions. This means having the agent take actions purely by chance. This seemingly counterintuitive approach serves a critical purpose. Initially, the agent lacks any meaningful information about the environment, making it impossible to make informed decisions. By taking random actions, it can quickly gather a broad range of experiences, allowing it to grasp the fundamental structure of the environment. These random actions serve as a kind of \"baseline exploration,\" providing a starting point from which the agent can refine its policy through learning. 
With our domain knowledge we can even guide the initial exploration process to enhance learning capabilities.\n", - "\n", - "\n", - "Following up on these concepts the following tasks will:\n", - "1. obtain the action values from the neural net in the bidding strategy and\n", - "2. then transform these values into the actual bids of an order. \n", - "\n" - ], - "metadata": { - "id": "rW_1op6fCTV-" - } - }, - { - "cell_type": "markdown", - "source": [ - "#### **Task 3.2.1**\n", - "**Goal**: With the observations and noise we generate actions\n", - "\n", - "In the following task we define the actions for the initial exploration mode. As described before we can guide it by not letting it choose random actions but defining a base bid on which we add a good amount of noise. In this way the initial strategy starts from a solution that we know works somewhat well. Define the respective base bid in the following code. Remember we are defining bids for a conventional power plant bidding in an Energy-Only-Market with a uniform pricing auction. 
" - ], - "metadata": { - "id": "Cho84Pqs2N2G" - } - }, - { - "cell_type": "code", - "source": [ - "%%add_to RLStrategy\n", - "#cell magic (must be the first line) to enable class definitions across colab cells\n", - "def get_actions(self, next_observation):\n", - " \"\"\"\n", - " Get actions\n", - "\n", - " :param next_observation: Next observation\n", - " :type next_observation: torch.Tensor\n", - " :return: Actions\n", - " :rtype: torch.Tensor\n", - " \"\"\"\n", - "\n", - " # distinction whether we are in learning mode or not to handle exploration realised with noise\n", - " if self.learning_mode:\n", - " # if we are in learning mode the first x episodes we want to explore the entire action space\n", - " # to get a good initial experience, in the area around the costs of the agent\n", - " if self.collect_initial_experience_mode:\n", - " # define current action as solely noise\n", - " noise = (\n", - " th.normal(\n", - " mean=0.0, std=0.2, size=(1, self.act_dim), dtype=self.float_type\n", - " )\n", - " .to(self.device)\n", - " .squeeze()\n", - " )\n", - "\n", - " # =============================================================================\n", - " # 2.1 Get Actions and handle exploration\n", - " # =============================================================================\n", - " #==> YOUR CODE HERE\n", - " base_bid = #TODO\n", - "\n", - " # add noise to the last dimension of the observation\n", - " # needs to be adjusted if observation space is changed, because only makes sense\n", - " # if the last dimension of the observation space are the marginal cost\n", - " curr_action = noise + base_bid.clone().detach()\n", - "\n", - " else:\n", - " # if we are not in the initial exploration phase we choose the action with the actor neural net\n", - " # and add noise to the action\n", - " curr_action = self.actor(next_observation).detach()\n", - " noise = th.tensor(\n", - " self.action_noise.noise(), device=self.device, dtype=self.float_type\n", - " )\n", - " curr_action += noise\n", - " else:\n", 
- " # if we are not in learning mode we just use the actor neural net to get the action without adding noise\n", - "\n", - " curr_action = self.actor(next_observation).detach()\n", - " noise = tuple(0 for _ in range(self.act_dim))\n", - "\n", - " curr_action = curr_action.clamp(-1, 1)\n", - "\n", - " return curr_action, noise\n" - ], - "metadata": { - "id": "8ehlm5Z9CbRw" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "#### **Solution 3.2.1**\n", - "\n", - "So how do we define the base bid?\n", - "\n", - "Assuming the described auction is an efficient market with full information and competition, we know that bidding the marginal costs of the power plant is the economically best bid. With the RL strategy we can recreate the abuse of market power and incomplete information, which enables us to model different market settings. Yet, starting off with the theoretically stylized optimal solution guides our RL agents properly. As the marginal costs of the power plant are part of the observations we can define the base bid in the following way. 
" - ], - "metadata": { - "id": "OTaqkwV3xcf6" - } - }, - { - "cell_type": "code", - "source": [ - "\"\"\"\n", - "#base_bid = marginal costs\n", - "base_bid = next_observation[-1] # = marginal_costs\n", - "\"\"\"" - ], - "metadata": { - "id": "rfXJBGOKxbk7", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 - }, - "outputId": "06f76c52-e215-4998-8f61-f7492b880e4d" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'\\n#base_bid = marginal costs\\nbase_bid = next_observation[-1] # = marginal_costs\\n'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 15 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "#### **Task 3.2.2**\n", - "**Goal**: Define the actual bids with the outputs of the actors\n", - "\n", - "Because get_actions() clamps the actor output, the actions lie in the range [-1, 1]. These values need to be translated into the actual bids $a_{i,t} = [ep^\\mathrm{inflex}_{i,t}, ep^\\mathrm{flex}_{i,t}] \\in [ep^{min},ep^{max}]$. This can be done in a way that further helps the RL agent to learn, if we put some thought into it.\n", - "\n", - "For this we go back into the calculate_bids() function and instead of just defining bids=actions, which was just a placeholder, we actually make them into bids. Think about a smart way to transform them and fill the gaps in the following code. Remember:\n", - "\n", - " - *bid_quantity_inflex* represents the inflexible part of the bid. This is the minimum run capacity of the unit.\n", - " - *bid_quantity_flex* represents the flexible part of the bid. This is the flexible capacity of the unit." 
- ], - "metadata": { - "id": "B5Hgh88Vz0wD" - } - }, - { - "cell_type": "code", - "source": [ - "%%add_to RLStrategy\n", - "#cell magic (must be the first line) to enable class definitions across colab cells\n", - "def calculate_bids(\n", - " self,\n", - " unit: SupportsMinMax,\n", - " market_config: MarketConfig,\n", - " product_tuples: list[Product],\n", - " **kwargs,\n", - ") -> Orderbook:\n", - " \"\"\"\n", - " Calculate bids for a unit\n", - "\n", - " :param unit: Unit to calculate bids for\n", - " :type unit: SupportsMinMax\n", - " :param market_config: Market configuration\n", - " :type market_config: MarketConfig\n", - " :param product_tuples: Product tuples\n", - " :type product_tuples: list[Product]\n", - " :return: Bids containing start time, end time, price and volume\n", - " :rtype: Orderbook\n", - " \"\"\"\n", - "\n", - " bid_quantity_inflex, bid_price_inflex = 0, 0\n", - " bid_quantity_flex, bid_price_flex = 0, 0\n", - "\n", - " start = product_tuples[0][0]\n", - " end = product_tuples[0][1]\n", - " # get technical bounds for the unit output from the unit\n", - " min_power, max_power = unit.calculate_min_max_power(start, end)\n", - " min_power = min_power[start]\n", - " max_power = max_power[start]\n", - "\n", - " # =============================================================================\n", - " # 1. Get the Observations, which are the basis of the action decision\n", - " # =============================================================================\n", - " next_observation = self.create_observation(\n", - " unit=unit,\n", - " start=start,\n", - " end=end,\n", - " )\n", - "\n", - " # =============================================================================\n", - " # 2. 
Get the Actions, based on the observations\n", - " # =============================================================================\n", - " actions, noise = self.get_actions(next_observation)\n", - "\n", - " bids = actions\n", - "\n", - " # =============================================================================\n", - " # 3.2 Transform Actions into bids\n", - " # =============================================================================\n", - " #==> YOUR CODE HERE\n", - " # actions are clamped to the range [-1, 1], we need to transform them into actual bids\n", - " # we can use our domain knowledge to guide the bid formulation\n", - " bid_prices = actions * self.max_bid_price\n", - "\n", - " # 3.1 formulate the bids for Pmin\n", - " # Pmin, the minimum run capacity is the inflexible part of the bid, which should always be accepted\n", - " bid_quantity_inflex = min_power\n", - " bid_price_inflex = #TODO\n", - "\n", - " # 3.2 formulate the bids for Pmax - Pmin\n", - " # Pmax - Pmin is the flexible part of the bid, which can be offered at a higher price\n", - " bid_quantity_flex = max_power - bid_quantity_inflex\n", - " bid_price_flex = #TODO\n", - "\n", - " # actually formulate bids in orderbook format\n", - " bids = [\n", - " {\n", - " \"start_time\": start,\n", - " \"end_time\": end,\n", - " \"only_hours\": None,\n", - " \"price\": bid_price_inflex,\n", - " \"volume\": bid_quantity_inflex,\n", - " },\n", - " {\n", - " \"start_time\": start,\n", - " \"end_time\": end,\n", - " \"only_hours\": None,\n", - " \"price\": bid_price_flex,\n", - " \"volume\": bid_quantity_flex,\n", - " },\n", - " ]\n", - "\n", - " # store results in unit outputs which are written to database by unit operator\n", - " unit.outputs[\"rl_observations\"][start] = next_observation\n", - " unit.outputs[\"rl_actions\"][start] = actions\n", - " unit.outputs[\"rl_exploration_noise\"][start] = noise\n", - "\n", - " return bids" - ], - "metadata": { - "id": "Y81HzlkjNHJ0" - }, - 
"execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "#### **Solution 3.2.2**\n", - "\n", - "So how do we define the actual bid from the action?\n", - "\n", - "We have one bid price for the minimum power (inflex) and one for the rest of the power (flex). As the power plant needs to run at least at its minimum power in order to offer any generation at all, it makes sense to offer this generation at a lower price than the rest of the power. Hence, we can allocate the actions to the bid prices in the following way. In addition, the actions need to be rescaled to actual prices.\n" - ], - "metadata": { - "id": "3n-kJeOFCfRB" - } - }, - { - "cell_type": "code", - "source": [ - "\"\"\"\n", - "#calculate actual bids\n", - "#rescale actions to actual prices\n", - "bid_prices = actions * self.max_bid_price\n", - "\n", - "#calculate inflexible part of the bid\n", - "bid_quantity_inflex = min_power\n", - "bid_price_inflex = min(bid_prices)\n", - "\n", - "#calculate flexible part of the bid\n", - "bid_quantity_flex = max_power - bid_quantity_inflex\n", - "bid_price_flex = max(bid_prices)\n", - "\"\"\"" - ], - "metadata": { - "id": "wB7X-pFkCje3", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 70 - }, - "outputId": "ff905a9d-e3f2-4487-9e8a-9dbf4e855ab7" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'\\n#calculate actual bids\\n#rescale actions to actual prices\\nbid_prices = actions * self.max_bid_price\\n\\n#calculate inflexible part of the bid\\nbid_quantity_inflex = min_power\\nbid_price_inflex = min(bid_prices)\\n\\n#calculate flexible part of the bid\\nbid_quantity_flex = max_power - bid_quantity_inflex\\nbid_price_flex = max(bid_prices)\\n'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 17 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## 3.3 Get a reward\n", - "This step 
is done in the *calculate_reward()* function, which is called after the market is cleared and we get the market feedback, so we can calculate the profit. In RL, the design of a reward function is as important as the choice of the correct algorithm. During the initial phase of the work, a purely economic reward in the form of the agent's profit was used. Typically, electricity market models consider only a single restart cost. Still, when using RL, the split into shut-down and start-up costs allows the agents to better differentiate between these two events and learn a better policy.\n", - "\n", - "\n", - "\\begin{equation}\n", - "\\pi_{i,t} =\n", - "\t\\begin{cases}\n", - "\t P^\\text{conf}_{i,t} (M_t - mc_{i,t}) dt - c^{su}_i & \\text{if $P^\\text{conf}_{i,t}$ $\\geq P^{min}_i$} \\\\\n", - " & \\text{and $P_{i,t-1}$ $= 0$} \\\\\n", - "\t P^\\text{conf}_{i,t} (M_t - mc_{i,t}) dt & \\text{if $P^\\text{conf}_{i,t}$ $\\geq P^{min}_i$} \\\\\n", - " & \\text{and $P_{i,t-1}$ $\\neq 0$} \\\\\n", - "\t - c^{sd}_i & \\text{if $P^\\text{conf}_{i,t}$ $\\leq P^{min}_i$} \\\\\n", - " & \\text{and $P_{i,t-1}$ $\\neq 0$} \\\\\n", - " 0 & \\text{otherwise} \\\\\n", - "\t \\end{cases}\n", - "\\end{equation}\n", - "\n", - "\n", - "In this equation, $P^\\text{conf}$ is the confirmed capacity on the market, $P^{min}$ --- minimal stable capacity, $M$ --- market clearing price, $mc$ --- marginal generation cost, $dt$ --- market time resolution, $c^{su}, c^{sd}$ --- start-up and shut-down costs, respectively.\n", - "\n", - "The profit-driven reward function was sufficient for a few agents, but the learning performance decreased significantly with more agents. Therefore, we add an additional regret term $cm$." 
- ], - "metadata": { - "id": "hr15xKuGCkbn" - } - }, - { - "cell_type": "markdown", - "source": [ - "#### **Task 3.3**\n", - "**Goal**: Define the reward guiding the learning process of the agent.\n", - "\n", - "As the reward plays such a crucial role in the learning, think of ways to integrate further signals that go beyond the monetary profit. One example could be integrating a regret term, namely the opportunity costs. Your task is to define the reward using the opportunity costs and to scale it." - ], - "metadata": { - "id": "aGyaOUgo3Y8Q" - } - }, - { - "cell_type": "code", - "source": [ - "%%add_to RLStrategy\n", - "#cell magic (must be the first line) to enable class definitions across colab cells\n", - "def calculate_reward(\n", - " self,\n", - " unit,\n", - " marketconfig: MarketConfig,\n", - " orderbook: Orderbook,\n", - " ):\n", - " \"\"\"\n", - " Calculate reward\n", - "\n", - " :param unit: Unit to calculate reward for\n", - " :type unit: SupportsMinMax\n", - " :param marketconfig: Market configuration\n", - " :type marketconfig: MarketConfig\n", - " :param orderbook: Orderbook\n", - " :type orderbook: Orderbook\n", - " \"\"\"\n", - "\n", - " # =============================================================================\n", - " # 3. 
Calculate Reward\n", - " # =============================================================================\n", - " # function is called after the market is cleared and we get the market feedback,\n", - " # so we can calculate the profit\n", - "\n", - " product_type = marketconfig.product_type\n", - "\n", - " profit = 0\n", - " reward = 0\n", - " opportunity_cost = 0\n", - "\n", - " # iterate over all orders in the orderbook, to calculate order specific profit\n", - " for order in orderbook:\n", - " start = order[\"start_time\"]\n", - " end = order[\"end_time\"]\n", - " end_excl = end - unit.index.freq\n", - "\n", - " # depending on the way the unit calculates marginal costs we take costs\n", - " if unit.marginal_cost is not None:\n", - " marginal_cost = (\n", - " unit.marginal_cost[start]\n", - " if len(unit.marginal_cost) > 1\n", - " else unit.marginal_cost\n", - " )\n", - " else:\n", - " marginal_cost = unit.calc_marginal_cost_with_partial_eff(\n", - " power_output=unit.outputs[product_type].loc[start:end_excl],\n", - " timestep=start,\n", - " )\n", - "\n", - " duration = (end - start) / timedelta(hours=1)\n", - "\n", - " # calculate profit as income - running_cost from this event\n", - " price_difference = order[\"accepted_price\"] - marginal_cost\n", - " order_profit = price_difference * order[\"accepted_volume\"] * duration\n", - "\n", - " # calculate opportunity cost\n", - " # as the loss of income we have because we are not running at full power\n", - " order_opportunity_cost = (\n", - " price_difference\n", - " * (\n", - " unit.max_power - unit.outputs[product_type].loc[start:end_excl]\n", - " ).sum()\n", - " * duration\n", - " )\n", - "\n", - " # if our opportunity costs are negative, we did not miss an opportunity to earn money and we set them to 0\n", - " order_opportunity_cost = max(order_opportunity_cost, 0)\n", - "\n", - " # collect profit and opportunity cost for all orders\n", - " opportunity_cost += order_opportunity_cost\n", - " profit += 
order_profit\n", - "\n", - " # consideration of start-up costs, which are evenly divided between the\n", - " # upward and downward regulation events\n", - " if (\n", - " unit.outputs[product_type].loc[start] != 0\n", - " and unit.outputs[product_type].loc[start - unit.index.freq] == 0\n", - " ):\n", - " profit = profit - unit.hot_start_cost / 2\n", - " elif (\n", - " unit.outputs[product_type].loc[start] == 0\n", - " and unit.outputs[product_type].loc[start - unit.index.freq] != 0\n", - " ):\n", - " profit = profit - unit.hot_start_cost / 2\n", - "\n", - " # =============================================================================\n", - " # =============================================================================\n", - " # ==> YOUR CODE HERE\n", - " # The straightforward implementation would be reward = profit, yet we would like to give the agent more guidance\n", - " # in the learning process, so we add a regret term to the reward, which is the opportunity cost\n", - " # define the reward and scale it\n", - "\n", - " scaling = #TODO\n", - " regret_scale = #TODO\n", - " reward = #TODO\n", - "\n", - " # store results in unit outputs which are written to database by unit operator\n", - " unit.outputs[\"profit\"].loc[start:end_excl] += profit\n", - " unit.outputs[\"reward\"].loc[start:end_excl] = reward\n", - " unit.outputs[\"regret\"].loc[start:end_excl] = opportunity_cost\n" - ], - "metadata": { - "id": "U9HX41mODuBU" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "#### **Solution 3.3**\n", - "\n", - "So how do we define the actual reward?\n", - "\n", - "We use the opportunity costs for further guidance, which quantify the expected contribution margin, as defined by the following equation, with $P^{max}$ as the maximal available capacity.\n", - "\n", - "\\begin{equation}\n", - " cm_{i,t} = \\max[(P^{max}_i - P^\\text{conf}_{i,t}) (M_t - mc_{i,t}) dt, 0]\n", - "\\end{equation}\n", - "\n", - "The regret term 
gives a negative signal to the agent when there is opportunity cost due to the unsold capacity, thus correcting the agent's actions. This term also introduces an increased influence of the competition between agents in learning. By minimizing the regret, the agents drive the bid prices closer to the marginal generation cost, which drives the market price down.\n", - "\n", - "The reward of agent $i$ at time-step $t$ is defined by the equation below.\n", - "\n", - "\\begin{equation}\n", - " R_{i,t} = \\pi_{i,t} - \\beta cm_{i,t}\n", - "\\end{equation}\n", - "\n", - "Here, $\\beta$ is the regret scaling factor to adjust the ratio between profit-maximizing and regret-minimizing learning.\n", - "\n", - "The described reward function has proven to perform well even with many agents and to accelerate learning convergence. This is because minimizing the regret term drives the overall system to equilibrium. At a point close to the equilibrium point, the average reward of all agents would converge to a constant value since further policy changes would not lead to an additional reduction in regrets or an increase in profits. Therefore, the average reward value can also be a good indicator of learning performance and convergence." 
- ], - "metadata": { - "id": "gWF7D4QA2-kz" - } - }, - { - "cell_type": "code", - "source": [ - "\"\"\"\n", - "scaling = 0.1 / unit.max_power\n", - "regret_scale = 0.2\n", - "reward = float(profit - regret_scale * opportunity_cost) * scaling\n", - "\"\"\"" - ], - "metadata": { - "id": "e1XdVXPSCo_k", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 52 - }, - "outputId": "585d94a5-7475-4e96-d0a1-5e82b711c6a5" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'\\nscaling = 0.1 / unit.max_power\\nregret_scale = 0.2\\nreward = float(profit - regret_scale * opportunity_cost) * scaling\\n'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 19 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## 3.4 Start the simulation\n", - "\n", - "We are almost done with all the changes needed to make ASSUME learn here in Google Colab. If you would rather load our pretrained strategies, you need a function for loading parameters, which can be found below. 
\n", - "\n" - ], - "metadata": { - "id": "L3flH5iY4x7Z" - } - }, - { - "cell_type": "code", - "source": [ - "%%add_to RLStrategy\n", - "#cell magic (must be the first line) to enable class definitions across colab cells\n", - "def load_actor_params(self, load_path):\n", - " \"\"\"\n", - " Load actor parameters\n", - "\n", - " :param load_path: Path to the folder that contains the saved actors\n", - " :type load_path: str\n", - " \"\"\"\n", - " directory = f\"{load_path}/actors/actor_{self.unit_id}.pt\"\n", - "\n", - " params = th.load(directory, map_location=self.device)\n", - "\n", - " self.actor = Actor(self.obs_dim, self.act_dim, self.float_type)\n", - " self.actor.load_state_dict(params[\"actor\"])\n", - "\n", - " if self.learning_mode:\n", - " self.actor_target = Actor(self.obs_dim, self.act_dim, self.float_type)\n", - " self.actor_target.load_state_dict(params[\"actor_target\"])\n", - " self.actor_target.eval()\n", - " self.actor.optimizer.load_state_dict(params[\"actor_optimizer\"])" - ], - "metadata": { - "id": "ZwVtpK3B5gR6" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "To control the learning process, the config file determines the parameters of the learning algorithm. As we want to tamper with these values in the notebook we will overwrite the learning config in the next cell and then load it into our world. 
" - ], - "metadata": { - "id": "cTlqMouufKyo" - } - }, - { - "cell_type": "code", - "source": [ - "learning_config = {'observation_dimension': 50,\n", - " 'action_dimension': 2,\n", - " 'continue_learning': False,\n", - " 'load_model_path': 'None',\n", - " 'max_bid_price': 100,\n", - " 'algorithm': 'matd3',\n", - " 'learning_rate': 0.001,\n", - " 'training_episodes': 100,\n", - " 'episodes_collecting_initial_experience': 5,\n", - " 'train_freq': 24,\n", - " 'gradient_steps': -1,\n", - " 'batch_size': 256,\n", - " 'gamma': 0.99,\n", - " 'device': 'cpu',\n", - " 'noise_sigma': 0.1,\n", - " 'noise_scale': 1,\n", - " 'noise_dt': 1,\n", - " 'validation_episodes_interval': 5}" - ], - "metadata": { - "id": "moZ_UD7FfkOh" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "# Read the YAML file\n", - "with open('assume/examples/inputs/example_02a/config.yaml', 'r') as file:\n", - " data = yaml.safe_load(file)\n", - "\n", - "#store our modifications to the config file\n", - "data['base']['learning_mode']= True\n", - "data['base']['learning_config']=learning_config\n", - "\n", - "# Write the modified data back to the file\n", - "with open('assume/examples/inputs/example_02a/config.yaml', 'w') as file:\n", - " yaml.safe_dump(data, file)" - ], - "metadata": { - "id": "iPz8v4N5hpfr" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "In order to let the simulation run with the integrated learning we need to touch up the main file that runs it in the following way." 
- ], - "metadata": { - "id": "ZlRnTgCy5d9W" - } - }, - { - "cell_type": "code", - "source": [ - "log = logging.getLogger(__name__)\n", - "\n", - "csv_path = \"./outputs\"\n", - "os.makedirs(\"./local_db\", exist_ok=True)\n", - "\n", - "if __name__ == \"__main__\":\n", - " \"\"\"\n", - " Available examples:\n", - " - local_db: without database and grafana\n", - " - timescale: with database and grafana (note: you need docker installed)\n", - " \"\"\"\n", - " data_format = \"local_db\" # \"local_db\" or \"timescale\"\n", - "\n", - " if data_format == \"local_db\":\n", - " db_uri = \"sqlite:///./local_db/assume_db.db\"\n", - " elif data_format == \"timescale\":\n", - " db_uri = \"postgresql://assume:assume@localhost:5432/assume\"\n", - "\n", - " input_path = \"assume/examples/inputs\"\n", - " scenario = \"example_02a\"\n", - " study_case = \"base\"\n", - "\n", - " # create world\n", - " world = World(database_uri=db_uri, export_csv_path=csv_path)\n", - "\n", - " # we import our defined bidding strategy class including the learning into the world bidding strategies\n", - " # in the example files we provided the name of the learning bidding strategies in the input csv is \"pp_learning\"\n", - " # hence we define this strategy to be one of the learning class\n", - " world.bidding_strategies[\"pp_learning\"] = RLStrategy\n", - "\n", - " # then we load the scenario specified above from the respective input files\n", - " load_scenario_folder(\n", - " world,\n", - " inputs_path=input_path,\n", - " scenario=scenario,\n", - " study_case=study_case,\n", - " )\n", - "\n", - " # run learning if learning mode is enabled\n", - " # needed as we simulate the modelling horizon multiple times to train reinforcement learning\n", - "\n", - " if world.learning_config.get(\"learning_mode\", False):\n", - "\n", - " run_learning(\n", - " world,\n", - " inputs_path=input_path,\n", - " 
scenario=scenario,\n", - " study_case=study_case,\n", - " )\n", - "\n", - " # after the learning is done we make a normal run of the simulation, which equals a test run\n", - " world.run()\n" - ], - "metadata": { - "id": "ZlWxXxZr54WV", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 1000 - }, - "outputId": "e30f4279-7a4e-4efc-9cfb-61416e4fe2f1" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stderr", - "text": [ - "INFO:assume.world:connected to db\n", - "INFO:assume.common.scenario_loader:Starting Scenario example_02a/base from assume/examples/inputs\n", - "INFO:assume.common.scenario_loader:Loading input data\n", - "INFO:assume.common.scenario_loader:storage_units not found. Returning None\n", - "INFO:assume.common.scenario_loader:Adding forecast\n", - "INFO:assume.common.scenario_loader:forecasts_df not found. Returning None\n", - "INFO:assume.common.scenario_loader:Downsampling demand_df successful.\n", - "INFO:assume.common.scenario_loader:cross_border_flows not found. Returning None\n", - "INFO:assume.common.scenario_loader:availability_df not found. Returning None\n", - "INFO:assume.common.scenario_loader:electricity_prices not found. Returning None\n", - "INFO:assume.common.scenario_loader:price_forecasts not found. Returning None\n", - "INFO:assume.common.scenario_loader:temperature not found. 
Returning None\n", - "INFO:assume.common.scenario_loader:Adding markets\n", - "INFO:assume.common.scenario_loader:Adding unit operators\n", - "INFO:assume.common.scenario_loader:Adding power_plant units\n", - "INFO:assume.common.scenario_loader:Adding demand units\n", - "Training Episodes: 0%| | 0/100 [00:00\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"learning_mode\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 44\u001b[0;31m run_learning(\n\u001b[0m\u001b[1;32m 45\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 46\u001b[0m \u001b[0minputs_path\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minput_path\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/scenario_loader.py\u001b[0m in \u001b[0;36mrun_learning\u001b[0;34m(world, inputs_path, scenario, study_case)\u001b[0m\n\u001b[1;32m 595\u001b[0m \u001b[0;31m# TODO normally, loading twice should not create issues, somehow a scheduling issue is raised currently\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 596\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mepisode\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 597\u001b[0;31m load_scenario_folder(\n\u001b[0m\u001b[1;32m 598\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 599\u001b[0m 
\u001b[0minputs_path\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/scenario_loader.py\u001b[0m in \u001b[0;36mload_scenario_folder\u001b[0;34m(world, inputs_path, scenario, study_case, perform_learning, perform_evaluation, episode, eval_episode, trained_actors_path)\u001b[0m\n\u001b[1;32m 547\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0mtype\u001b[0m \u001b[0mstudy_case\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 548\u001b[0m \"\"\"\n\u001b[0;32m--> 549\u001b[0;31m world.loop.run_until_complete(\n\u001b[0m\u001b[1;32m 550\u001b[0m load_scenario_folder_async(\n\u001b[1;32m 551\u001b[0m \u001b[0mworld\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mworld\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36mrun_until_complete\u001b[0;34m(self, future)\u001b[0m\n\u001b[1;32m 91\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_log_destroy_pending\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdone\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 93\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run_once\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 94\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_stopping\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 95\u001b[0m 
\u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36m_run_once\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 127\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mready\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpopleft\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cancelled\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 129\u001b[0;31m \u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 130\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/lib/python3.10/asyncio/events.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 80\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_context\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_callback\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_args\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 
81\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mSystemExit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[0;32mraise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/core.py\u001b[0m in \u001b[0;36mraise_exceptions\u001b[0;34m(self, fut)\u001b[0m\n\u001b[1;32m 454\u001b[0m \u001b[0;34mf\"Agent {self.aid}: Caught the following exception in _check_inbox: {fut.exception()}\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 455\u001b[0m )\n\u001b[0;32m--> 456\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mfut\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexception\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 457\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 458\u001b[0m \u001b[0;32masync\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_check_inbox\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/world.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 411\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 412\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 413\u001b[0;31m return self.loop.run_until_complete(\n\u001b[0m\u001b[1;32m 414\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0masync_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstart_ts\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstart_ts\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mend_ts\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mend_ts\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 415\u001b[0m )\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36mrun_until_complete\u001b[0;34m(self, future)\u001b[0m\n\u001b[1;32m 91\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_log_destroy_pending\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdone\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 93\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run_once\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 94\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_stopping\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 95\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36m_run_once\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 127\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mready\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpopleft\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cancelled\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 129\u001b[0;31m 
\u001b[0mhandle\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 130\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/lib/python3.10/asyncio/events.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 80\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_context\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_callback\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_args\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 81\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mSystemExit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[0;32mraise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/lib/python3.10/asyncio/tasks.py\u001b[0m in \u001b[0;36m__wakeup\u001b[0;34m(self, future)\u001b[0m\n\u001b[1;32m 313\u001b[0m \u001b[0;31m# instead of `__next__()`, which is slower for futures\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 314\u001b[0m \u001b[0;31m# that return non-generator iterators from their 
`__iter__`.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 315\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__step\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 316\u001b[0m \u001b[0mself\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;31m# Needed to break cycles when an exception occurs.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 317\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/nest_asyncio.py\u001b[0m in \u001b[0;36mstep\u001b[0;34m(task, exc)\u001b[0m\n\u001b[1;32m 203\u001b[0m \u001b[0mcurr_task\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcurr_tasks\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtask\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_loop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 204\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 205\u001b[0;31m \u001b[0mstep_orig\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtask\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 206\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 207\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcurr_task\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/lib/python3.10/asyncio/tasks.py\u001b[0m in \u001b[0;36m__step\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 230\u001b[0m \u001b[0;31m# We use the `send` method directly, because 
coroutines\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 231\u001b[0m \u001b[0;31m# don't have `__iter__` and `__next__` methods.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 232\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcoro\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 233\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 234\u001b[0m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcoro\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mthrow\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/core.py\u001b[0m in \u001b[0;36m_check_inbox\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 470\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"priority\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpriority\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 471\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 472\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 473\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 474\u001b[0m \u001b[0;31m# signal to the Queue that the message is handled\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/role.py\u001b[0m in 
\u001b[0;36mhandle_message\u001b[0;34m(self, content, meta)\u001b[0m\n\u001b[1;32m 460\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 461\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mDict\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mAny\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 462\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_role_context\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 463\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 464\u001b[0m \u001b[0;32masync\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mshutdown\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/role.py\u001b[0m in \u001b[0;36mhandle_message\u001b[0;34m(self, content, meta)\u001b[0m\n\u001b[1;32m 352\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0mparam\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 353\u001b[0m \"\"\"\n\u001b[0;32m--> 354\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_role_handler\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandle_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 355\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 356\u001b[0m async def send_message(\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/mango/agent/role.py\u001b[0m in \u001b[0;36mhandle_message\u001b[0;34m(self, content, meta)\u001b[0m\n\u001b[1;32m 214\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mrole\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmessage_condition\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_message_subs\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 215\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_is_role_active\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrole\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mmessage_condition\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 216\u001b[0;31m \u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmeta\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 217\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 218\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_notify_send_message_subs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontent\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreceiver_addr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreceiver_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/units_operator.py\u001b[0m in \u001b[0;36mhandle_market_feedback\u001b[0;34m(self, content, 
meta)\u001b[0m\n\u001b[1;32m 178\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mset_unit_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0morderbook\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmarketconfig\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 179\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwrite_learning_params\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0morderbook\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmarketconfig\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 180\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwrite_actual_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 181\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 182\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mset_unit_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morderbook\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mOrderbook\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmarketconfig\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mMarketConfig\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/common/units_operator.py\u001b[0m in \u001b[0;36mwrite_actual_dispatch\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 220\u001b[0m \u001b[0munit_dispatch_dfs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 221\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0munit_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0munit\u001b[0m \u001b[0;32min\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munits\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 222\u001b[0;31m \u001b[0mcurrent_dispatch\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0munit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute_current_dispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnow\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 223\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnow\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 224\u001b[0m \u001b[0mcurrent_dispatch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"power\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/assume/units/powerplant.py\u001b[0m in \u001b[0;36mexecute_current_dispatch\u001b[0;34m(self, start, end)\u001b[0m\n\u001b[1;32m 178\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"energy\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 179\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 180\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"energy\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mend\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 181\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 
182\u001b[0m def calc_simple_marginal_cost(\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1151\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1152\u001b[0m \u001b[0mmaybe_callable\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply_if_callable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1153\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmaybe_callable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1154\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1155\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_is_scalar_access\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1370\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1371\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mslice\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1372\u001b[0;31m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1373\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_slice_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1374\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_bool_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_validate_key\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1186\u001b[0m \u001b[0;31m# Key Checks\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1187\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1188\u001b[0;31m \u001b[0;34m@\u001b[0m\u001b[0mdoc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_LocationIndexer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1189\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_validate_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mAxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1190\u001b[0m \u001b[0;31m# valid for a collection of labels (we check their presence later)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - 
"\u001b[0;31mKeyboardInterrupt\u001b[0m: " - ] - } - ] - } - ] -} \ No newline at end of file diff --git a/README.md b/README.md index a6573e4e..3d1c1984 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ SPDX-License-Identifier: AGPL-3.0-or-later [![](https://img.shields.io/pypi/status/assume-framework.svg)](https://pypi.org/pypi/assume-framework/) [![](https://img.shields.io/readthedocs/assume)](https://assume.readthedocs.io/) -[![Open Tutorials In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LISiM1QvDIMXU68pJH-NqrMw5w7Awb24?usp=sharing) +[![Try examples in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/assume-framework/assume/tree/main/examples/notebooks) **ASSUME** is an open-source toolbox for agent-based simulations of European electricity markets, with a primary focus on the German market setup. Developed as an open-source model, its primary objectives are to ensure usability and customizability for a wide range of users and use cases in the energy system modeling community. diff --git a/docs/source/conf.py b/docs/source/conf.py index 2f4da38f..7a68c651 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -75,6 +75,7 @@ .. note:: You can `download `_ this example as a Jupyter notebook + or open it directly in `Google Colab `_. 
""" nbsphinx_allow_errors = True diff --git a/docs/source/examples/04_Reinforcement_learning_example.nblink b/docs/source/examples/04_Reinforcement_learning_example.nblink new file mode 100644 index 00000000..0ade1c04 --- /dev/null +++ b/docs/source/examples/04_Reinforcement_learning_example.nblink @@ -0,0 +1 @@ +{"path": "../../../examples/notebooks/04_Reinforcement_learning_example.ipynb"} diff --git a/examples/notebooks/01_minimal_manual_example.ipynb.license b/docs/source/examples/04_Reinforcement_learning_example.nblink.license similarity index 100% rename from examples/notebooks/01_minimal_manual_example.ipynb.license rename to docs/source/examples/04_Reinforcement_learning_example.nblink.license diff --git a/docs/source/examples_basic.rst b/docs/source/examples_basic.rst index 7c494fbf..3b9d914e 100644 --- a/docs/source/examples_basic.rst +++ b/docs/source/examples_basic.rst @@ -7,10 +7,11 @@ Basic Usage ############ -Here you can find several examples for basic usage of ASSUME framework to get you started: +Here you can find several examples for usage of the ASSUME framework to get you started: .. toctree:: :maxdepth: 1 examples/01_minimal_manual_example.ipynb + examples/04_Reinforcement_learning_example.ipynb diff --git a/examples/notebooks/01_minimal_manual_example.ipynb b/examples/notebooks/01_minimal_manual_example.ipynb index ce1e6b26..a074ccf7 100644 --- a/examples/notebooks/01_minimal_manual_example.ipynb +++ b/examples/notebooks/01_minimal_manual_example.ipynb @@ -1,446 +1,458 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Minimal manual example\n", - "In this notebook, we will walk through a minimal example of how to use the ASSUME framework. We will first initialize the world instance, next we will create a single market and its operator, afterwards we will add a generation agent and a demand agent, and finally start the simulation." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setting Up the Simulation Environment\n", - "\n", - "First, let's set up the necessary environment and import the required libraries." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "from datetime import datetime, timedelta\n", - "\n", - "import pandas as pd\n", - "from dateutil import rrule as rr\n", - "\n", - "from assume import World\n", - "from assume.common.forecasts import NaiveForecast\n", - "from assume.common.market_objects import MarketConfig, MarketProduct\n", - "\n", - "log = logging.getLogger(__name__)\n", - "\n", - "os.makedirs(\"./local_db\", exist_ok=True)\n", - "\n", - "db_uri = \"sqlite:///./local_db/assume_db_min_example.db\"\n", - "\n", - "world = World(database_uri=db_uri)\n", - "\n", - "start = datetime(2023, 10, 4)\n", - "end = datetime(2023, 12, 5)\n", - "index = pd.date_range(\n", - " start=start,\n", - " end=end + timedelta(hours=24),\n", - " freq=\"H\",\n", - ")\n", - "sim_id = \"world_script_simulation\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this section, we begin by importing the necessary libraries and modules. Additionally, we define the database URI. For this instance, we will utilize a local SQLite database to store our results. In subsequent notebooks, we will transition to using a timescaledb database to store the results, which can then be visualized using the included Grafana dashboards. \n", - "\n", - "Subsequently, we instantiate the `World` class, the primary class responsible for managing the simulation. We also establish the simulation's start and end dates, define the simulation index and step size, and assign a simulation ID. This unique identifier is crucial for referencing the simulation in the database." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initializing the Simulation\n", - "Next, we initialize the simulation by executing the setup function. The setup function sets up the environment for the simulation. It initializes various parameters and components required for the simulation run, including the clock, learning configuration, forecaster, container, connection type, and output agents." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "await world.setup(\n", - " start=start,\n", - " end=end,\n", - " save_frequency_hours=48,\n", - " simulation_id=sim_id,\n", - " index=index,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuring market\n", - "Here, we define a market configuration, set up a market operator, and add the configured market to the simulation world." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "marketdesign = [\n", - " MarketConfig(\n", - " name=\"EOM\",\n", - " opening_hours=rr.rrule(rr.HOURLY, interval=24, dtstart=start, until=end),\n", - " opening_duration=timedelta(hours=1),\n", - " market_mechanism=\"pay_as_clear\",\n", - " market_products=[MarketProduct(timedelta(hours=1), 24, timedelta(hours=1))],\n", - " additional_fields=[\"block_id\", \"link\", \"exclusive_id\"],\n", - " )\n", - "]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This code segment sets up a market configuration named \"EOM\" with specific opening hours, market mechanism, products, and additional fields, providing the foundation for simulating and analyzing the behavior of this particular electricity market.\n", - "\n", - "In this code:\n", - "- `marketdesign` is a list containing a single market configuration.\n", - "\n", - "- `MarketConfig(...)` defines the configuration for a specific market. 
In this case, it's named \"EOM\" (End of Month).\n", - "\n", - " - `name=\"EOM\"` - Specifies the name of the market configuration as \"EOM\".\n", - "\n", - " - `opening_hours=rr.rrule(rr.HOURLY, interval=24, dtstart=start, until=end)` - Defines the opening hours for the market using a rule that repeats hourly with a 24-hour interval, starting at `start` and ending at `end`. This indicates that the market operates on a daily basis.\n", - "\n", - " - `opening_duration=timedelta(hours=1)` - Specifies the duration of each market opening as 1 hour.\n", - "\n", - " - `market_mechanism=\"pay_as_clear\"` - Indicates the market mechanism used, in this case, \"pay as clear\", which is a common mechanism in electricity markets where all accepted bids are paid the market-clearing price.\n", - "\n", - " - `market_products=[MarketProduct(timedelta(hours=1), 24, timedelta(hours=1))]` - Defines the market products available. In this case, it seems to be a single product with a duration of 1 hour, 24 periods, and a period duration of 1 hour.\n", - "\n", - " - `additional_fields=[\"block_id\", \"link\", \"exclusive_id\"]` - Specifies additional fields associated with this market configuration, such as \"block_id\", \"link\", and \"exclusive_id\"." 
- ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "mo_id = \"market_operator\"\n", - "world.add_market_operator(id=mo_id)\n", - "\n", - "for market_config in marketdesign:\n", - " world.add_market(market_operator_id=mo_id, market_config=market_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this section, we add a market operator to the simulation world and create a market with previously defined configuration.\n", - "\n", - "In this code:\n", - "- `mo_id = \"market_operator\"` assigns the identifier \"market_operator\" to the market operator.\n", - "\n", - "- `world.add_market_operator(id=mo_id)` adds a market operator to the simulation world with the specified identifier \"market_operator\". A market operator in this context represents an entity responsible for operating and managing one or more markets within the simulation.\n", - "\n", - "- The loop `for market_config in marketdesign:` iterates over the market configurations defined in the `marketdesign` list.\n", - "\n", - " - `world.add_market(market_operator_id=mo_id, market_config=market_config)` associates each market configuration with the market operator identified by \"market_operator\". This effectively adds the specified market configuration to the simulation world under the management of the market operator." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Adding Unit Operators and Units\n", - "\n", - "After initializing the simulation, and creating a market, we add unit operators and units to the simulation world." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "world.add_unit_operator(\"demand_operator\")\n", - "\n", - "demand_forecast = NaiveForecast(index, demand=100)\n", - "\n", - "world.add_unit(\n", - " id=\"demand_unit\",\n", - " unit_type=\"demand\",\n", - " unit_operator_id=\"demand_operator\",\n", - " unit_params={\n", - " \"min_power\": 0,\n", - " \"max_power\": 1000,\n", - " \"bidding_strategies\": {\"energy\": \"naive\"},\n", - " \"technology\": \"demand\",\n", - " },\n", - " forecaster=demand_forecast,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This code segment sets up a demand unit managed by the \"my_demand\" unit operator, equipped with a naive demand forecast, and establishes its operational parameters within the electricity market simulation framework.\n", - "\n", - "In this code:\n", - "- `world.add_unit_operator(\"demand_operator\")` adds a unit operator with the identifier \"my_demand\" to the simulation world. A unit operator manages a group of similar units within the simulation.\n", - "\n", - "- `demand_forecast = NaiveForecast(index, demand=100)` creates a naive demand forecast object named `demand_forecast`. 
This forecast is initialized with an index and a constant demand value of 100.\n", - "\n", - "- `world.add_unit(...)` adds a demand unit to the simulation world with the following specifications:\n", - "\n", - " - `id=\"demand_unit\"` assigns the identifier \"demand1\" to the demand unit.\n", - "\n", - " - `unit_type=\"demand\"` specifies that this unit is of type \"demand\", indicating that it represents a consumer of electricity.\n", - "\n", - " - `unit_operator_id=\"demand_operator\"` associates the unit with the unit operator identified as \"my_demand\".\n", - "\n", - " - `unit_params` provides various parameters for the demand unit, including minimum and maximum power, bidding strategies, and technology type.\n", - "\n", - " - `forecaster=demand_forecast` associates the demand forecast (`demand_forecast`) with the demand unit, allowing the unit to utilize this forecast for its behavior within the simulation." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "world.add_unit_operator(\"unit_operator\")\n", - "\n", - "nuclear_forecast = NaiveForecast(index, availability=1, fuel_price=3, co2_price=0.1)\n", - "\n", - "world.add_unit(\n", - " id=\"nuclear_unit\",\n", - " unit_type=\"power_plant\",\n", - " unit_operator_id=\"unit_operator\",\n", - " unit_params={\n", - " \"min_power\": 200,\n", - " \"max_power\": 1000,\n", - " \"bidding_strategies\": {\"energy\": \"naive\"},\n", - " \"technology\": \"nuclear\",\n", - " },\n", - " forecaster=nuclear_forecast,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This code segment sets up a nuclear power plant unit managed by the \"unit_operator\" unit operator, equipped with a naive availability and cost forecast, and establishes its operational parameters within the electricity market simulation framework.\n", - "\n", - "In this code:\n", - "- `world.add_unit_operator(\"unit_operator\")` adds a unit operator with the 
identifier \"unit_operator\" to the simulation world. This unit operator will manage a group of similar units within the simulation.\n", - "\n", - "- `nuclear_forecast = NaiveForecast(index, availability=1, fuel_price=3, co2_price=0.1)` creates a naive forecast for the nuclear power plant. This forecast is initialized with an index, a constant availability of 1, a fuel price of 3, and a CO2 price of 0.1.\n", - "\n", - "- `world.add_unit(...)` adds a nuclear power plant unit to the simulation world with the following specifications:\n", - "\n", - " - `id=\"nuclear_unit\"` assigns the identifier \"nuclear_unit\" to the nuclear power plant unit.\n", - "\n", - " - `unit_type=\"power_plant\"` specifies that this unit is of type \"power_plant\", indicating that it represents a power generation facility.\n", - "\n", - " - `unit_operator_id=\"unit_operator\"` associates the unit with the unit operator identified as \"unit_operator\".\n", - "\n", - " - `unit_params` provides various parameters for the nuclear power plant unit, including minimum and maximum power, bidding strategies, and technology type.\n", - "\n", - " - `forecaster=nuclear_forecast` associates the nuclear forecast (`nuclear_forecast`) with the nuclear power plant unit, allowing the unit to utilize this forecast for its behavior within the simulation." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Running the Simulation\n", - "\n", - "Finally, we run the simulation to observe the market behaviors and outcomes." 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "world_script_simulation 2023-12-05 00:00:00: : 5356801.0it [00:03, 1534875.35it/s] \n" - ] - } - ], - "source": [ - "world.run()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "In this notebook, we have demonstrated the basic steps involved in setting up and running a simulation using the ASSUME framework for simulating electricity markets. This example is intended to provide a detailed overview of internal workings of the framework and its components. This approach can be used for small simulations with a few agents and markets. In the next notebook we will explore how this process is automated for large scale simulation using input files." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# The whole code as a single cell\n" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - " 0%| | 0/7689600 [00:00 Orderbook:\n", + " \"\"\"\n", + " Calculate bids for a unit -> STEP 1 & 2\n", + "\n", + " :param unit: Unit to calculate bids for\n", + " :type unit: SupportsMinMax\n", + " :param market_config: Market configuration\n", + " :type market_config: MarketConfig\n", + " :param product_tuples: Product tuples\n", + " :type product_tuples: list[Product]\n", + " :return: Bids containing start time, end time, price and volume\n", + " :rtype: Orderbook\n", + "\n", + " \"\"\"\n", + "\n", + " bid_quantity_inflex, bid_price_inflex = 0, 0\n", + " bid_quantity_flex, bid_price_flex = 0, 0\n", + "\n", + " start = product_tuples[0][0]\n", + " end = product_tuples[0][1]\n", + " # get technical bounds for the unit output from the unit\n", + " min_power, max_power = unit.calculate_min_max_power(start, end)\n", + " min_power = min_power[start]\n", + 
" max_power = max_power[start]\n", + "\n", + " # =============================================================================\n", + " # 1. Get the Observations, which are the basis of the action decision\n", + " # =============================================================================\n", + " next_observation = self.create_observation(\n", + " unit=unit,\n", + " start=start,\n", + " end=end,\n", + " )\n", + "\n", + " # =============================================================================\n", + " # 2. Get the Actions, based on the observations\n", + " # =============================================================================\n", + " actions, noise = self.get_actions(next_observation)\n", + "\n", + " bids = actions\n", + "\n", + " return bids" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_4cJ8Y8uvMgV" + }, + "outputs": [], + "source": [ + "# magic to enable class definitions across colab cells\n", + "%%add_to RLStrategy\n", + "\n", + "def calculate_reward(\n", + " self,\n", + " unit,\n", + " marketconfig: MarketConfig,\n", + " orderbook: Orderbook,\n", + "):\n", + " \"\"\"\n", + " Calculate reward\n", + "\n", + " :param unit: Unit to calculate reward for\n", + " :type unit: SupportsMinMax\n", + " :param marketconfig: Market configuration\n", + " :type marketconfig: MarketConfig\n", + " :param orderbook: Orderbook\n", + " :type orderbook: Orderbook\n", + "\n", + " \"\"\"\n", + "\n", + " return None" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Jgjx14997Y9s" + }, + "source": [ + "## 3.2 Get an observation\n", + "\n", + "The decision about the observations received by each agent plays a crucial role when designing a multi-agent RL setup. The following describes the task of learning agents representing profit-maximizing electricity market participants who either sell a generating unit's output or optimize a storage unit's operation. 
They are represented through their plants' techno-economic parameters, such as minimal operational capacity $P^{min}$, start-up $c^{su}$, and shut-down $c^{sd}$ costs. This information is all known by the unit itself and, hence, also accessible in the bidding strategy.\n", + "\n", + "During the training phase, the centralized critic receives observations from all agents, resulting in an input size that grows linearly with the number of agents. This can lead to unstable training behavior of the critic networks, which limits the maximal number of agents in the simulation. This effect is known as the curse of dimensionality, which likely contributed to the small number of learning agents in existing approaches. To address it, we use a single observation that is the same for all agents and add a small number of unique observations for each agent to improve their performance. This modification allows the use of only one observation in the centralized critic, decoupled from the number of learning agents, significantly reducing the observation size and enabling simultaneous training of hundreds of learning agents with stable training behavior. The only limiting factor is the available working memory.\n", + "\n", + "At time-step $t$, agent $i$ receives the observation $o_{i,t}$ consisting of vectors $[L_{\\mathrm{h},t}, L_{\\mathrm{f},t}, M_{\\mathrm{h},t}, M_{\\mathrm{f},t}, mc_{i,t}]$. Here $L_{\\mathrm{h},t}, L_{\\mathrm{f},t}$ and $M_{\\mathrm{h},t}, M_{\\mathrm{f},t}$ are the past and the forecast residual loads and market prices, respectively. This information stems from the world, where an overall forecasting role generates it. The price forecast is calculated ahead of the simulation run using a simple merit order model based on the residual load forecast and the marginal cost of power plants. This part of the observation is the same for all agents. In addition, each agent receives its current marginal cost $mc_{i,t}$. 
Information about the marginal cost is shared with a centralized critic during the training phase. Still, it is not shared with other agents during the execution phase. All the inputs are normalized to improve the performance of the training process.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PngYyvs72UxB" + }, + "source": [ + "### **Task 1**\n", + "**Goal**: With the help of the *unit*, the *starttime* and the *endtime* we want to create the Observations for the unit.\n", + "\n", + "There are 4 different observations:\n", + "- residual load forecast\n", + "- price forecast\n", + "- total capacity of the unit\n", + "- marginal costs of the unit\n", + "\n", + "For all observations we need scaling factors. Why do you think it is important to scale the input? How would you define the scaling factors?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0ww-L9fABnw3" + }, + "outputs": [], + "source": [ + "#magic to enable class definitions across colab cells\n", + "%%add_to RLStrategy\n", + "\n", + "def create_observation(\n", + " self,\n", + " unit: SupportsMinMax,\n", + " start: datetime,\n", + " end: datetime,\n", + "):\n", + " \"\"\"\n", + " Create observation\n", + "\n", + " :param unit: Unit to create observation for\n", + " :type unit: SupportsMinMax\n", + " :param start: Start time\n", + " :type start: datetime\n", + " :param end: End time\n", + " :type end: datetime\n", + " :return: Observation\n", + " :rtype: torch.Tensor\n", + "\n", + " \"\"\"\n", + " \n", + " end_excl = end - unit.index.freq\n", + "\n", + " # get the forecast length depending on the time unit considered in the modelled unit\n", + " forecast_len = pd.Timedelta((self.foresight - 1) * unit.index.freq)\n", + "\n", + " # =============================================================================\n", + " # 1.1 Get the Observations, which are the basis of the action decision\n", + " # 
=============================================================================\n", + " scaling_factor_res_load = #TODO\n", + "\n", + " # price forecast\n", + " scaling_factor_price = #TODO\n", + "\n", + " # total capacity\n", + " scaling_factor_total_capacity = #TODO\n", + "\n", + " # marginal cost\n", + " # Obs[2*foresight+1:2*foresight+2]\n", + " scaling_factor_marginal_cost = #TODO\n", + "\n", + " # checks if we are at end of simulation horizon, since we need to change the forecast then\n", + " # for residual load and price forecast and scale them\n", + " if end_excl + forecast_len > unit.forecaster[\"residual_load_EOM\"].index[-1]:\n", + " scaled_res_load_forecast = (\n", + " unit.forecaster[\"residual_load_EOM\"].loc[start:].values\n", + " / scaling_factor_res_load\n", + " )\n", + " scaled_res_load_forecast = np.concatenate(\n", + " [\n", + " scaled_res_load_forecast,\n", + " unit.forecaster[\"residual_load_EOM\"].iloc[\n", + " : self.foresight - len(scaled_res_load_forecast)\n", + " ],\n", + " ]\n", + " )\n", + "\n", + " else:\n", + " scaled_res_load_forecast = (\n", + " unit.forecaster[\"residual_load_EOM\"]\n", + " .loc[start : end_excl + forecast_len]\n", + " .values\n", + " / scaling_factor_res_load\n", + " )\n", + "\n", + " if end_excl + forecast_len > unit.forecaster[\"price_EOM\"].index[-1]:\n", + " scaled_price_forecast = (\n", + " unit.forecaster[\"price_EOM\"].loc[start:].values / scaling_factor_price\n", + " )\n", + " scaled_price_forecast = np.concatenate(\n", + " [\n", + " scaled_price_forecast,\n", + " unit.forecaster[\"price_EOM\"].iloc[\n", + " : self.foresight - len(scaled_price_forecast)\n", + " ],\n", + " ]\n", + " )\n", + "\n", + " else:\n", + " scaled_price_forecast = (\n", + " unit.forecaster[\"price_EOM\"].loc[start : end_excl + forecast_len].values\n", + " / scaling_factor_price\n", + " )\n", + "\n", + " # get last accepted bid volume and the current marginal costs of the unit\n", + " current_volume = 
unit.get_output_before(start)\n", + " current_costs = unit.calc_marginal_cost_with_partial_eff(current_volume, start)\n", + "\n", + " # scale unit outputs\n", + " scaled_total_capacity = current_volume / scaling_factor_total_capacity\n", + " scaled_marginal_cost = current_costs / scaling_factor_marginal_cost\n", + "\n", + " # concat all observations into one array\n", + " observation = np.concatenate(\n", + " [\n", + " scaled_res_load_forecast,\n", + " scaled_price_forecast,\n", + " np.array([scaled_total_capacity, scaled_marginal_cost]),\n", + " ]\n", + " )\n", + "\n", + " # transfer array to GPU for NN processing\n", + " observation = (\n", + " th.tensor(observation, dtype=self.float_type)\n", + " .to(self.device, non_blocking=True)\n", + " .view(-1)\n", + " )\n", + "\n", + " return observation.detach().clone()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kDYKZGERKJ6V" + }, + "source": [ + "### **Solution 1**\n", + "\n", + "First, why do we scale?\n", + "\n", + "Scaling observations is a crucial preprocessing step in machine learning, including reinforcement learning. It involves transforming the features so that they all fall within a similar numerical range. This is important for several reasons. Firstly, it aids in numerical stability during training. Large input values can lead to numerical precision issues, potentially causing the algorithm to perform poorly or even fail to converge. By scaling the features, we mitigate this risk, ensuring a more stable and reliable learning process.\n", + "\n", + "Additionally, scaling promotes uniformity in the learning process. Many optimization algorithms, like gradient descent, adjust model parameters based on the magnitude of gradients. When features have vastly different scales, some may dominate the learning process, while others receive less attention. This imbalance can hinder convergence and result in a suboptimal model. 
Scaling addresses this issue, allowing the algorithm to treat all features equally and progress more efficiently towards an optimal solution. This not only expedites the learning process but also enhances the model's ability to generalize to new, unseen data. In essence, scaling observations is a fundamental practice that enhances the performance and robustness of machine learning models across a wide array of applications.\n", + "\n", + "Accordingly, the scaling should ensure a similar range for all input parameters. You can achieve that by choosing the following scaling factors. If you add new observations, choose your scaling factors wisely." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "id": "PYoI3ncSKJSX", + "outputId": "4b4341d7-5a21-49c4-ee25-b8c55f693cd1" + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#scaling factors for all observations\n", + "#residual load forecast\n", + "scaling_factor_res_load = self.max_demand\n", + "\n", + "# price forecast\n", + "scaling_factor_price = self.max_bid_price\n", + "\n", + "# total capacity\n", + "scaling_factor_total_capacity = unit.max_power\n", + "\n", + "# marginal cost\n", + "scaling_factor_marginal_cost = self.max_bid_price\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rW_1op6fCTV-" + }, + "source": [ + "## 3.3 Choose an action\n", + "\n", + "To differentiate between the inflexible and flexible parts of a plant's generation capacity, we split the bids into two parts. The first bid part allows agents to bid a very low or even negative price for the inflexible capacity; this reflects the agent's motivation to stay infra-marginal during periods of very low net load (e.g., in periods of high solar and wind power generation) to avoid the cost of a shut-down and subsequent start-up of the plant. 
The flexible part of the capacity can be offered at a higher price to provide chances for higher profits. The actions of agent $i$ at time-step $t$ are defined as $a_{i,t} = [ep^\\mathrm{inflex}_{i,t}, ep^\\mathrm{flex}_{i,t}] \\in [ep^{min},ep^{max}]$, where $ep^\\mathrm{inflex}_{i,t}$ and $ep^\\mathrm{flex}_{i,t}$ are bid prices for the inflexible and flexible capacities, and $ep^{min},ep^{max}$ are minimal and maximal bid prices, respectively.\n", + "\n", + "How do we learn to make good decisions? Basically by trial and error, also known as **exploration**. Exploration is a fundamental concept in reinforcement learning, representing the strategy by which an agent interacts with its environment to gather information about the consequences of its actions. This is crucial because without exploration, the agent might settle for suboptimal policies based on its initial knowledge, limiting its ability to discover more rewarding states or actions.\n", + "\n", + "In the initial stages of training, also often called initial exploration, it's imperative to employ almost random actions. This means having the agent take actions purely by chance. This seemingly counterintuitive approach serves a critical purpose. Initially, the agent lacks any meaningful information about the environment, making it impossible to make informed decisions. By taking random actions, it can quickly gather a broad range of experiences, allowing it to grasp the fundamental structure of the environment. These random actions serve as a kind of \"baseline exploration,\" providing a starting point from which the agent can refine its policy through learning. With our domain knowledge we can even guide the initial exploration process, to enhance learning capabilities.\n", + "\n", + "\n", + "Following up on these concepts the following tasks will:\n", + "1. obtain the action values from the neural net in the bidding strategy and\n", + "2. then transform these values into the actual bids of an order. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Cho84Pqs2N2G" + }, + "source": [ + "### **Task 2.1**\n", + "**Goal**: With the observations and noise we generate actions\n", + "\n", + "In the following task we define the actions for the initial exploration mode. As described before we can guide it by not letting it choose random actions but defining a base-bid on which we add a good amount of noise. In this way the initial strategy starts from a solution that we know works somewhat well. Define the respective base bid in the following code. Remember we are defining bids for a conventional power plant bidding in an Energy-Only-Market with a uniform pricing auction. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8ehlm5Z9CbRw" + }, + "outputs": [], + "source": [ + "#magic to enable class definitions across colab cells\n", + "%%add_to RLStrategy\n", + "def get_actions(self, next_observation):\n", + " \"\"\"\n", + " Get actions\n", + "\n", + " :param next_observation: Next observation\n", + " :type next_observation: torch.Tensor\n", + " :return: Actions\n", + " :rtype: torch.Tensor\n", + " \"\"\"\n", + "\n", + " # distinction whether we are in learning mode or not to handle exploration realised with noise\n", + " if self.learning_mode:\n", + " # if we are in learning mode the first x episodes we want to explore the entire action space\n", + " # to get a good initial experience, in the area around the costs of the agent\n", + " if self.collect_initial_experience_mode:\n", + " # define current action as solely noise\n", + " noise = (\n", + " th.normal(\n", + " mean=0.0, std=0.2, size=(1, self.act_dim), dtype=self.float_type\n", + " )\n", + " .to(self.device)\n", + " .squeeze()\n", + " )\n", + "\n", + " # =============================================================================\n", + " # 2.1 Get Actions and handle exploration\n", + " # 
=============================================================================\n", + " #==> YOUR CODE HERE\n", + " base_bid = #TODO\n", + "\n", + " # add noise to the last dimension of the observation\n", + " # needs to be adjusted if observation space is changed, because only makes sense\n", + " # if the last dimension of the observation space are the marginal cost\n", + " curr_action = noise + base_bid.clone().detach()\n", + "\n", + " else:\n", + " # if we are not in the initial exploration phase we choose the action with the actor neural net\n", + " # and add noise to the action\n", + " curr_action = self.actor(next_observation).detach()\n", + " noise = th.tensor(\n", + " self.action_noise.noise(), device=self.device, dtype=self.float_type\n", + " )\n", + " curr_action += noise\n", + " else:\n", + " # if we are not in learning mode we just use the actor neural net to get the action without adding noise\n", + "\n", + " curr_action = self.actor(next_observation).detach()\n", + " noise = tuple(0 for _ in range(self.act_dim))\n", + "\n", + " curr_action = curr_action.clamp(-1, 1)\n", + "\n", + " return curr_action, noise\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OTaqkwV3xcf6" + }, + "source": [ + "### **Solution 2.1**\n", + "\n", + "So how do we define the base bid?\n", + "\n", + "Assuming the described auction is an efficient market with full information and competition, we know that bidding the marginal costs of the power plant is the economically best bid. With the RL strategy we can recreate the abuse of market power and incomplete information, which enables us to model different market settings. Yet, starting off with the theoretically stylized optimal solution guides our RL agents properly. As the marginal costs of the power plant are part of the observations we can define the base bid in the following way. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "rfXJBGOKxbk7", + "outputId": "06f76c52-e215-4998-8f61-f7492b880e4d" + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#base_bid = marginal costs\n", + "base_bid = next_observation[-1] # = marginal_costs\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B5Hgh88Vz0wD" + }, + "source": [ + "### **Task 2.2**\n", + "**Goal**: Define the actual bids with the outputs of the actors\n", + "\n", + "Like the outputs of many neural networks, the actions lie in a bounded range; here they are clamped to [-1, 1] in get_actions(). These values need to be translated into the actual bids $a_{i,t} = [ep^\\mathrm{inflex}_{i,t}, ep^\\mathrm{flex}_{i,t}] \\in [ep^{min},ep^{max}]$. This can be done in a way that further helps the RL agent to learn, if we put some thought into it.\n", + "\n", + "For this we go back into the calculate_bids() function and instead of just defining bids=actions, which was just a placeholder, we actually make them into bids. Think about a smart way to transform them and fill the gaps in the following code. Remember:\n", + "\n", + " - *bid_quantity_inflex* represents the inflexible part of the bid. This is the minimum run capacity of the unit.\n", + " - *bid_quantity_flex* represents the flexible part of the bid. This is the flexible capacity of the unit." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Y81HzlkjNHJ0" + }, + "outputs": [], + "source": [ + "#magic to enable class definitions across colab cells\n", + "%%add_to RLStrategy\n", + "def calculate_bids(\n", + " self,\n", + " unit: SupportsMinMax,\n", + " market_config: MarketConfig,\n", + " product_tuples: list[Product],\n", + " **kwargs,\n", + ") -> Orderbook:\n", + " \"\"\"\n", + " Calculate bids for a unit\n", + "\n", + " :param unit: Unit to calculate bids for\n", + " :type unit: SupportsMinMax\n", + " :param market_config: Market configuration\n", + " :type market_config: MarketConfig\n", + " :param product_tuples: Product tuples\n", + " :type product_tuples: list[Product]\n", + " :return: Bids containing start time, end time, price and volume\n", + " :rtype: Orderbook\n", + " \"\"\"\n", + "\n", + " bid_quantity_inflex, bid_price_inflex = 0, 0\n", + " bid_quantity_flex, bid_price_flex = 0, 0\n", + "\n", + " start = product_tuples[0][0]\n", + " end = product_tuples[0][1]\n", + " # get technical bounds for the unit output from the unit\n", + " min_power, max_power = unit.calculate_min_max_power(start, end)\n", + " min_power = min_power[start]\n", + " max_power = max_power[start]\n", + "\n", + " # =============================================================================\n", + " # 1. Get the Observations, which are the basis of the action decision\n", + " # =============================================================================\n", + " next_observation = self.create_observation(\n", + " unit=unit,\n", + " start=start,\n", + " end=end,\n", + " )\n", + "\n", + " # =============================================================================\n", + " # 2. 
Get the Actions, based on the observations\n", + "    # =============================================================================\n", + "    actions, noise = self.get_actions(next_observation)\n", + "\n", + "    bids = actions\n", + "\n", + "    # =============================================================================\n", + "    # 3. Transform Actions into bids\n", + "    # =============================================================================\n", + "    #==> YOUR CODE HERE\n", + "    # actions are in the range [0,1], we need to transform them into actual bids\n", + "    # we can use our domain knowledge to guide the bid formulation\n", + "    bid_prices = actions * self.max_bid_price\n", + "\n", + "    # 3.1 formulate the bids for Pmin\n", + "    # Pmin, the minimum run capacity, is the inflexible part of the bid, which should always be accepted\n", + "    bid_quantity_inflex = min_power\n", + "    bid_price_inflex = #TODO\n", + "\n", + "    # 3.2 formulate the bids for Pmax - Pmin\n", + "    # Pmax - Pmin, the capacity above the minimum run capacity, is the flexible part of the bid\n", + "    bid_quantity_flex = max_power - bid_quantity_inflex\n", + "    bid_price_flex = #TODO\n", + "\n", + "    # actually formulate bids in orderbook format\n", + "    bids = [\n", + "        {\n", + "            \"start_time\": start,\n", + "            \"end_time\": end,\n", + "            \"only_hours\": None,\n", + "            \"price\": bid_price_inflex,\n", + "            \"volume\": bid_quantity_inflex,\n", + "        },\n", + "        {\n", + "            \"start_time\": start,\n", + "            \"end_time\": end,\n", + "            \"only_hours\": None,\n", + "            \"price\": bid_price_flex,\n", + "            \"volume\": bid_quantity_flex,\n", + "        },\n", + "    ]\n", + "\n", + "    # store results in unit outputs which are written to database by unit operator\n", + "    unit.outputs[\"rl_observations\"][start] = next_observation\n", + "    unit.outputs[\"rl_actions\"][start] = actions\n", + "    unit.outputs[\"rl_exploration_noise\"][start] = noise\n", + "\n", + "    return bids" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"3n-kJeOFCfRB" + }, + "source": [ + "### **Solution 2.2**\n", + "\n", + "So how do we define the actual bid from the action?\n", + "\n", + "We have the bid price for the minimum power (inflex) and the rest of the power. As the power plant needs to run at minimal the minum power in order to offer generation in general, it makes sense to offer this generation at a lower price than the rest of the power. Hence, we can alocate the actions to the bid prices in the following way. In addition, the actions need to be rescaled of course.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "id": "wB7X-pFkCje3", + "outputId": "ff905a9d-e3f2-4487-9e8a-9dbf4e855ab7" + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#calculate actual bids\n", + "#rescale actions to actual prices\n", + "bid_prices = actions * self.max_bid_price\n", + "\n", + "#calculate inflexible part of the bid\n", + "bid_quantity_inflex = min_power\n", + "bid_price_inflex = min(bid_prices)\n", + "\n", + "#calculate flexible part of the bid\n", + "bid_quantity_flex = max_power - bid_quantity_inflex\n", + "bid_price_flex = max(bid_prices)\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hr15xKuGCkbn" + }, + "source": [ + "## 3.4 Get a reward\n", + "This step is done in the *calculate_reward*()-function, which is called after the market is cleared and we get the market feedback, so we can calculate the profit. In RL, the design of a reward function is as important as the choice of the correct algorithm. During the initial phase of the work, pure economic reward in the form of the agent's profit was used. Typically, electricity market models consider only a single restart cost. 
Still, in the case of using RL, the split into shut-down and start-up costs allows the agents to better differentiate between these two events and learn a better policy.\n", + "\n", + "\n", + "\\begin{equation}\n", + "\\pi_{i,t} =\n", + "\\begin{cases}\n", + "P^\\text{conf}_{i,t} (M_t - mc_{i,t}) dt - c^{su}_i & \\text{if $P^\\text{conf}_{i,t}$ $\\geq P^{min}_i$} \\\\\n", + "& \\text{and $P_{i,t-1}$ $= 0$} \\\\\n", + "P^\\text{conf}_{i,t} (M_t - mc_{i,t}) dt & \\text{if $P^\\text{conf}_{i,t}$ $\\geq P^{min}_i$} \\\\\n", + "& \\text{and $P_{i,t-1}$ $\\neq 0$} \\\\\n", + "- c^{sd}_i & \\text{if $P^\\text{conf}_{i,t}$ $< P^{min}_i$} \\\\\n", + "& \\text{and $P_{i,t-1}$ $\\neq 0$} \\\\\n", + "0 & \\text{otherwise} \\\\\n", + "\\end{cases}\n", + "\\end{equation}\n", + "\n", + "\n", + "In this equation, $P^\\text{conf}$ is the confirmed capacity on the market, $P^{min}$ is the minimal stable capacity, $M$ is the market clearing price, $mc$ is the marginal generation cost, $dt$ is the market time resolution, and $c^{su}, c^{sd}$ are the start-up and shut-down costs, respectively.\n", + "\n", + "The profit-driven reward function was sufficient for a few agents, but the learning performance decreased significantly with more agents. Therefore, we add an additional regret term $cm$." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aGyaOUgo3Y8Q" + }, + "source": [ + "### **Task 3**\n", + "**Goal**: Define the reward guiding the learning process of the agent.\n", + "\n", + "As the reward plays such a crucial role in learning, think of ways to integrate further signals that go beyond the monetary profit. One example could be integrating a regret term, namely the opportunity costs. Your task is to define the reward using the opportunity costs and to scale it."
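, + "\n", + "\n", + "Before defining the reward, it can help to evaluate the profit equation from Section 3.4 once by hand. The numbers here are purely hypothetical: a unit with $P^{min}_i = 40$ MW that was offline in the previous step ($P_{i,t-1} = 0$) gets $P^\\text{conf}_{i,t} = 60$ MW confirmed at $M_t = 50$ EUR/MWh, with $mc_{i,t} = 30$ EUR/MWh, $dt = 1$ h and $c^{su}_i = 500$ EUR. This is the start-up case of the equation:\n", + "\n", + "\\begin{equation}\n", + "\\pi_{i,t} = 60 \\cdot (50 - 30) \\cdot 1 - 500 = 700\n", + "\\end{equation}\n", + "\n", + "Had the unit already been running ($P_{i,t-1} \\neq 0$), the start-up cost would drop out and the profit would be $1200$ EUR."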
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "U9HX41mODuBU" + }, + "outputs": [], + "source": [ + "%%add_to RLStrategy\n", + "# the cell magic above enables class definitions across colab cells; it has to be the first line of the cell\n", + "def calculate_reward(\n", + "    self,\n", + "    unit,\n", + "    marketconfig: MarketConfig,\n", + "    orderbook: Orderbook,\n", + "):\n", + "    \"\"\"\n", + "    Calculate reward\n", + "\n", + "    :param unit: Unit to calculate reward for\n", + "    :type unit: SupportsMinMax\n", + "    :param marketconfig: Market configuration\n", + "    :type marketconfig: MarketConfig\n", + "    :param orderbook: Orderbook\n", + "    :type orderbook: Orderbook\n", + "    \"\"\"\n", + "\n", + "    # =============================================================================\n", + "    # 3. Calculate Reward\n", + "    # =============================================================================\n", + "    # function is called after the market is cleared and we get the market feedback,\n", + "    # so we can calculate the profit\n", + "\n", + "    product_type = marketconfig.product_type\n", + "\n", + "    profit = 0\n", + "    reward = 0\n", + "    opportunity_cost = 0\n", + "\n", + "    # iterate over all orders in the orderbook, to calculate order specific profit\n", + "    for order in orderbook:\n", + "        start = order[\"start_time\"]\n", + "        end = order[\"end_time\"]\n", + "        end_excl = end - unit.index.freq\n", + "\n", + "        # depending on how the unit calculates marginal costs, we take the matching cost value\n", + "        if unit.marginal_cost is not None:\n", + "            marginal_cost = (\n", + "                unit.marginal_cost[start]\n", + "                if len(unit.marginal_cost) > 1\n", + "                else unit.marginal_cost\n", + "            )\n", + "        else:\n", + "            marginal_cost = unit.calc_marginal_cost_with_partial_eff(\n", + "                power_output=unit.outputs[product_type].loc[start:end_excl],\n", + "                timestep=start,\n", + "            )\n", + "\n", + "        duration = (end - start) / timedelta(hours=1)\n", + "\n", + "        # calculate profit as income - running_cost from this event\n", + "        price_difference = 
order[\"accepted_price\"] - marginal_cost\n", + "        order_profit = price_difference * order[\"accepted_volume\"] * duration\n", + "\n", + "        # calculate opportunity cost\n", + "        # as the loss of income we have because we are not running at full power\n", + "        order_opportunity_cost = (\n", + "            price_difference\n", + "            * (\n", + "                unit.max_power - unit.outputs[product_type].loc[start:end_excl]\n", + "            ).sum()\n", + "            * duration\n", + "        )\n", + "\n", + "        # if our opportunity costs are negative, we did not miss an opportunity to earn money and we set them to 0\n", + "        order_opportunity_cost = max(order_opportunity_cost, 0)\n", + "\n", + "        # collect profit and opportunity cost for all orders\n", + "        opportunity_cost += order_opportunity_cost\n", + "        profit += order_profit\n", + "\n", + "    # consideration of start-up costs, which are evenly divided between the\n", + "    # upward and downward regulation events\n", + "    if (\n", + "        unit.outputs[product_type].loc[start] != 0\n", + "        and unit.outputs[product_type].loc[start - unit.index.freq] == 0\n", + "    ):\n", + "        profit = profit - unit.hot_start_cost / 2\n", + "    elif (\n", + "        unit.outputs[product_type].loc[start] == 0\n", + "        and unit.outputs[product_type].loc[start - unit.index.freq] != 0\n", + "    ):\n", + "        profit = profit - unit.hot_start_cost / 2\n", + "\n", + "    # =============================================================================\n", + "    # =============================================================================\n", + "    # ==> YOUR CODE HERE\n", + "    # The straightforward implementation would be reward = profit, yet we would like to give the agent more guidance\n", + "    # in the learning process, so we add a regret term to the reward, which is the opportunity cost\n", + "    # define the reward and scale it\n", + "\n", + "    scaling = #TODO\n", + "    regret_scale = #TODO\n", + "    reward = #TODO\n", + "\n", + "    # store results in unit outputs which are written to database by unit operator\n", + "    
unit.outputs[\"profit\"].loc[start:end_excl] += profit\n", + "    unit.outputs[\"reward\"].loc[start:end_excl] = reward\n", + "    unit.outputs[\"regret\"].loc[start:end_excl] = opportunity_cost" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gWF7D4QA2-kz" + }, + "source": [ + "### **Solution 3**\n", + "\n", + "So how do we define the actual reward?\n", + "\n", + "We use the opportunity costs for further guidance. They quantify the contribution margin the unit could have earned with its unsold capacity, as defined by the following equation, with $P^{max}$ as the maximal available capacity.\n", + "\n", + "\\begin{equation}\n", + "cm_{i,t} = \\max[(P^{max}_i - P^\\text{conf}_{i,t}) (M_t - mc_{i,t}) dt, 0]\n", + "\\end{equation}\n", + "\n", + "The regret term gives a negative signal to the agent when there is opportunity cost due to the unsold capacity, thus correcting the agent's actions. This term also introduces an increased influence of the competition between agents in learning. By minimizing the regret, the agents drive the bid prices closer to the marginal generation cost, which drives the market price down.\n", + "\n", + "The reward of agent $i$ at time-step $t$ is defined by the equation below. The regret is subtracted, so that unsold but profitable capacity lowers the reward.\n", + "\n", + "\\begin{equation}\n", + "R_{i,t} = \\pi_{i,t} - \\beta cm_{i,t}\n", + "\\end{equation}\n", + "\n", + "Here, $\\beta$ is the regret scaling factor to adjust the ratio between profit-maximizing and regret-minimizing learning.\n", + "\n", + "The described reward function has proven to perform well even with many agents and to accelerate learning convergence. This is because minimizing the regret term drives the overall system to equilibrium. At a point close to the equilibrium point, the average reward of all agents would converge to a constant value since further policy changes would not lead to an additional reduction in regrets or an increase in profits. Therefore, the average reward value can also be a good indicator of learning performance and convergence."
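, + "\n", + "\n", + "A short numerical check may make the interplay of the two terms more concrete. It is only a sketch with hypothetical numbers, and it follows the sign convention of the solution code below, where the scaled regret is subtracted from the profit. Assume $P^{max}_i = 100$ MW, $P^\\text{conf}_{i,t} = 60$ MW, $M_t = 50$ EUR/MWh, $mc_{i,t} = 30$ EUR/MWh, $dt = 1$ h, no start-up or shut-down event, and $\\beta = 0.2$:\n", + "\n", + "\\begin{align}\n", + "\\pi_{i,t} &= 60 \\cdot (50 - 30) \\cdot 1 = 1200 \\\\\n", + "cm_{i,t} &= \\max[(100 - 60) \\cdot (50 - 30) \\cdot 1, 0] = 800 \\\\\n", + "\\pi_{i,t} - \\beta \\, cm_{i,t} &= 1200 - 0.2 \\cdot 800 = 1040\n", + "\\end{align}\n", + "\n", + "With the scaling factor $0.1 / P^{max}_i = 0.001$ used in the solution code, the reward passed to the agent would be $1.04$."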
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "id": "e1XdVXPSCo_k", + "outputId": "585d94a5-7475-4e96-d0a1-5e82b711c6a5" + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "scaling = 0.1 / unit.max_power\n", + "regret_scale = 0.2\n", + "reward = float(profit - regret_scale * opportunity_cost) * scaling\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L3flH5iY4x7Z" + }, + "source": [ + "## 3.5 Start the simulation\n", + "\n", + "We are almost done with all the changes needed to make ASSUME learn here in Google Colab. If you would rather load our pretrained strategies, you need a function for loading parameters, which can be found below.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZwVtpK3B5gR6" + }, + "outputs": [], + "source": [ + "%%add_to RLStrategy\n", + "# the cell magic above enables class definitions across colab cells; it has to be the first line of the cell\n", + "\n", + "def load_actor_params(self, load_path):\n", + "    \"\"\"\n", + "    Load actor parameters\n", + "\n", + "    :param load_path: Path to the directory that contains the saved actor parameters\n", + "    :type load_path: str\n", + "    \"\"\"\n", + "    directory = f\"{load_path}/actors/actor_{self.unit_id}.pt\"\n", + "\n", + "    params = th.load(directory, map_location=self.device)\n", + "\n", + "    self.actor = Actor(self.obs_dim, self.act_dim, self.float_type)\n", + "    self.actor.load_state_dict(params[\"actor\"])\n", + "\n", + "    if self.learning_mode:\n", + "        self.actor_target = Actor(self.obs_dim, self.act_dim, self.float_type)\n", + "        self.actor_target.load_state_dict(params[\"actor_target\"])\n", + "        self.actor_target.eval()\n", + "        self.actor.optimizer.load_state_dict(params[\"actor_optimizer\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cTlqMouufKyo" + }, + "source": [ + "To control the learning process, the config file determines the parameters of the learning 
algorithm. As we want to tinker with these values in the notebook, we will overwrite the learning config in the next cell and then load it into our world." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "moZ_UD7FfkOh" + }, + "outputs": [], + "source": [ + "learning_config = {\n", + "    \"observation_dimension\": 50,\n", + "    \"action_dimension\": 2,\n", + "    \"continue_learning\": False,\n", + "    \"load_model_path\": \"None\",\n", + "    \"max_bid_price\": 100,\n", + "    \"algorithm\": \"matd3\",\n", + "    \"learning_rate\": 0.001,\n", + "    \"training_episodes\": 100,\n", + "    \"episodes_collecting_initial_experience\": 5,\n", + "    \"train_freq\": 24,\n", + "    \"gradient_steps\": -1,\n", + "    \"batch_size\": 256,\n", + "    \"gamma\": 0.99,\n", + "    \"device\": \"cpu\",\n", + "    \"noise_sigma\": 0.1,\n", + "    \"noise_scale\": 1,\n", + "    \"noise_dt\": 1,\n", + "    \"validation_episodes_interval\": 5,\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iPz8v4N5hpfr" + }, + "outputs": [], + "source": [ + "# Read the YAML file\n", + "with open(\"assume/examples/inputs/example_02a/config.yaml\", \"r\") as file:\n", + "    data = yaml.safe_load(file)\n", + "\n", + "# store our modifications to the config file\n", + "data[\"base\"][\"learning_mode\"] = True\n", + "data[\"base\"][\"learning_config\"] = learning_config\n", + "\n", + "# Write the modified data back to the file\n", + "with open(\"assume/examples/inputs/example_02a/config.yaml\", \"w\") as file:\n", + "    yaml.safe_dump(data, file)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZlRnTgCy5d9W" + }, + "source": [ + "In order to let the simulation run with the integrated learning, we need to touch up the main file that runs it in the following way."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "ZlWxXxZr54WV", + "outputId": "e30f4279-7a4e-4efc-9cfb-61416e4fe2f1" + }, + "outputs": [], + "source": [ + "log = logging.getLogger(__name__)\n", + "\n", + "csv_path = \"./outputs\"\n", + "os.makedirs(\"./local_db\", exist_ok=True)\n", + "\n", + "if __name__ == \"__main__\":\n", + "    \"\"\"\n", + "    Available examples:\n", + "    - local_db: without database and grafana\n", + "    - timescale: with database and grafana (note: you need docker installed)\n", + "    \"\"\"\n", + "    data_format = \"local_db\"  # \"local_db\" or \"timescale\"\n", + "\n", + "    if data_format == \"local_db\":\n", + "        db_uri = \"sqlite:///./local_db/assume_db.db\"\n", + "    elif data_format == \"timescale\":\n", + "        db_uri = \"postgresql://assume:assume@localhost:5432/assume\"\n", + "\n", + "    input_path = \"assume/examples/inputs\"\n", + "    scenario = \"example_02a\"\n", + "    study_case = \"base\"\n", + "\n", + "    # create world\n", + "    world = World(database_uri=db_uri, export_csv_path=csv_path)\n", + "\n", + "    # we import our defined bidding strategy class including the learning into the world bidding strategies\n", + "    # in the example files we provided, the name of the learning bidding strategies in the input csv is \"pp_learning\"\n", + "    # hence we define this strategy to be one of the learning class\n", + "    world.bidding_strategies[\"pp_learning\"] = RLStrategy\n", + "\n", + "    # then we load the scenario specified above from the respective input files\n", + "    load_scenario_folder(\n", + "        world,\n", + "        inputs_path=input_path,\n", + "        scenario=scenario,\n", + "        study_case=study_case,\n", + "    )\n", + "\n", + "    # run learning if learning mode is enabled\n", + "    # needed as we simulate the modelling horizon multiple times to train reinforcement learning\n", + "\n", + "    if world.learning_config.get(\"learning_mode\", False):\n", + "        run_learning(\n", + "            world,\n", + "            inputs_path=input_path,\n", + "            scenario=scenario,\n", + "            study_case=study_case,\n", + "        )\n", + "\n", + "    # after the learning is done we make a normal run of the simulation, which equals a test run\n", + "    world.run()"
+ ] + } + ], + "metadata": { + "colab": { + "include_colab_link": true, + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "assume-framework", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + }, + "nbsphinx": { + "execute": "never" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/examples/notebooks/04_Reinforcement_learning_example.ipynb.license b/examples/notebooks/04_Reinforcement_learning_example.ipynb.license new file mode 100644 index 00000000..a6ae0636 --- /dev/null +++ b/examples/notebooks/04_Reinforcement_learning_example.ipynb.license @@ -0,0 +1,3 @@ +SPDX-FileCopyrightText: ASSUME Developers + +SPDX-License-Identifier: AGPL-3.0-or-later From ee5e7b9cc7cfe10f714a86f8df7cfcdfa7be9dc0 Mon Sep 17 00:00:00 2001 From: Nick Harder Date: Mon, 4 Dec 2023 10:57:39 +0100 Subject: [PATCH 3/4] -add license --- examples/notebooks/01_minimal_manual_example.ipynb.license | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 examples/notebooks/01_minimal_manual_example.ipynb.license diff --git a/examples/notebooks/01_minimal_manual_example.ipynb.license b/examples/notebooks/01_minimal_manual_example.ipynb.license new file mode 100644 index 00000000..a6ae0636 --- /dev/null +++ b/examples/notebooks/01_minimal_manual_example.ipynb.license @@ -0,0 +1,3 @@ +SPDX-FileCopyrightText: ASSUME Developers + +SPDX-License-Identifier: AGPL-3.0-or-later
From b717ee638d005c5a5e400d6f78764fd9f7fc1892 Mon Sep 17 00:00:00 2001 From: Nick Harder Date: Mon, 4 Dec 2023 11:03:00 +0100 Subject: [PATCH 4/4] -adjust docs config --- docs/source/conf.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/conf.py b/docs/source/conf.py index 7a68c651..45fdfe7a 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -75,7 +75,7 @@ .. note:: You can `download `_ this example as a Jupyter notebook - or open ir directly in `Google Colab `_. + or try it out directly in `Google Colab `_. """ nbsphinx_allow_errors = True