adjust query of reward during training #256
Conversation
- before: the query returned the mean of all units' rewards
- now: the reward is queried per unit, which is better
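A minimal sketch of what this changes, assuming episode rewards are logged per learning unit; the names and data layout here are illustrative, not the actual ASSUME query:

```python
import numpy as np

# hypothetical episode rewards, keyed by unit id
rewards = {"pp_1": [3.0, 4.0], "pp_2": [1.0, 2.0], "pp_3": [5.0, 6.0]}

# before: a single scalar, the mean over all rewards of all units
overall_mean = np.mean([r for unit in rewards.values() for r in unit])  # 3.5

# now: one value per unit, so each unit's learning progress stays visible
per_unit_mean = {uid: float(np.mean(r)) for uid, r in rewards.items()}
# {'pp_1': 3.5, 'pp_2': 1.5, 'pp_3': 5.5}
```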
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@           Coverage Diff           @@
##             main     #256   +/-   ##
=======================================
  Coverage   78.44%   78.45%
=======================================
  Files          39       39
  Lines        4259     4260    +1
=======================================
+ Hits         3341     3342    +1
  Misses        918      918
```
I am not quite sure if that is game-theoretically the smartest choice. We aim to have the highest sum of all rewards, not the highest average per unit.
Why the highest sum of all rewards? We want each unit to perform as well as possible. In the current approach, one unit performing really well outshines all the other units which haven't learned anything. For example, nuclear can earn a lot without much effort and has a huge reward, while the others didn't learn much, and their reward is lost.
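A toy illustration of this point, with made-up numbers: a single highly profitable unit can dominate the summed reward and hide that the remaining units have learned nothing:

```python
# hypothetical episode profits per unit
rewards = {"nuclear": 500.0, "lignite": 0.4, "gas": 0.3, "wind": 0.1}

total = sum(rewards.values())   # 500.8 -- the aggregate looks healthy
# but the per-unit view shows three of four units earning almost nothing
laggards = [u for u, r in rewards.items() if r < 1.0]
print(total, laggards)          # 500.8 ['lignite', 'gas', 'wind']
```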
Also, I have learned that taking the max reward is not anywhere close to the equilibrium point. We should introduce a mechanism in the future which checks changes in rewards per unit and exits if no changes in behavior are observed for some period of time.
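A sketch of such a convergence check, stopping once no unit's average reward has moved for a while; all names here are hypothetical, not part of the ASSUME codebase:

```python
def has_converged(
    reward_history: list[dict[str, float]],
    patience: int = 10,
    tol: float = 1e-3,
) -> bool:
    """True if no unit's average reward moved by more than `tol`
    over the last `patience` evaluation rounds."""
    if len(reward_history) < patience + 1:
        return False
    recent = reward_history[-(patience + 1):]
    for unit_id in recent[0]:
        values = [snapshot[unit_id] for snapshot in recent]
        if max(values) - min(values) > tol:
            return False
    return True
```

The training loop would then call `has_converged(history)` after each evaluation round and break early instead of always running the full number of episodes.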
Mhhh, I do not see that. According to game theory, the Nash Equilibrium is when the overall welfare is the highest, since we have a fixed demand that equals the production rent. If the overall welfare (so the absolute sum) is higher when the nuclear plant earns a shit ton of money and the rest do not earn anything, then this is the Nash Equilibrium, regardless of the fairness of the result.
@kim-mskw I don't agree with this definition. Maybe it is the case for some particular designs, but not for a general market setup. The NE is when no one deviates from their policy, so ultimately we should have such a condition for MADRL setups. But for now I believe the average reward of agents is a better representation than the sum of all rewards.
@nick-harder after our bilateral talk I thought about that a lot. You are right: the Nash Equilibrium (or one of multiple) is not the state where the sum of all profits/rewards is maximal, but neither is it the state where the average profits/rewards of all units are the highest. Both are approximations. Frankly, I could not find evidence in the literature hinting at which metric to use in multi-agent reinforcement learning. With the mean we just divide the sum by the number of agents right now, so I came to the conclusion that it should not make any difference anyhow. Hence, my initial thought that it needed to be the sum was wrong.
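The equivalence is easy to verify: with a fixed number of units n, the mean is just the sum divided by the constant n, so both metrics rank any two training states identically (arbitrary numbers below):

```python
rewards_a = [10.0, 2.0, 1.0]  # per-unit rewards in training state A
rewards_b = [5.0, 4.0, 3.0]   # state B, same number of units

n = len(rewards_a)
# dividing by the same constant n never changes which state ranks higher
assert (sum(rewards_a) > sum(rewards_b)) == (sum(rewards_a) / n > sum(rewards_b) / n)
```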