
adjust query of reward during training #256

Merged
3 commits merged from fix_avg_reward_query into main on Dec 7, 2023

Conversation

nick-harder
Member

- before, the query took the mean of all rewards
- now it is per unit, which is better

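A minimal sketch of what the change amounts to (the function and column names below are hypothetical, not the actual ASSUME code): instead of collapsing all units' rewards into one overall mean, the query keeps one reward value per unit.

```python
import pandas as pd

def query_avg_rewards(rewards: pd.DataFrame) -> pd.Series:
    """Hypothetical sketch: `rewards` has columns ['unit_id', 'reward'].

    Before: a single scalar, the mean over all units' rewards.
    After: one mean per unit, so each agent's learning progress stays visible.
    """
    # before: return rewards["reward"].mean()
    return rewards.groupby("unit_id")["reward"].mean()
```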
nick-harder requested a review from kim-mskw on November 30, 2023 at 10:22

codecov bot commented Nov 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (357f0ba) 78.44% compared to head (a1314f7) 78.45%.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main     #256   +/-   ##
=======================================
  Coverage   78.44%   78.45%
=======================================
  Files          39       39
  Lines        4259     4260    +1
=======================================
+ Hits         3341     3342    +1
  Misses        918      918
```

| Flag   | Coverage         | Δ            |
| ------ | ---------------- | ------------ |
| pytest | 78.45% <100.00%> | (+<0.01%) ⬆️ |


@kim-mskw
Contributor

I am not quite sure that this is the smartest choice game-theoretically. We aim for the highest sum of all rewards, not the highest average per unit.

@nick-harder
Member Author

Why the highest sum of all rewards? We want each unit to perform as well as possible. In the current approach, one unit that performs really well outshines all the other units which haven't learned anything. For example, nuclear can earn a lot without much effort and gets a huge reward, while the others didn't learn much, and their reward is lost in the aggregate.

@nick-harder
Member Author

Also, I have learned that taking the maximum reward is not anywhere close to the equilibrium point. We should introduce some mechanism in the future which checks changes in rewards per unit and exits training if no change in behavior is observed for some period of time.
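A possible shape for such a mechanism (purely illustrative, not part of this PR; names, window size, and tolerance are made up): track per-unit rewards over a sliding window and stop training once no unit's reward changes by more than a tolerance.

```python
import numpy as np

def rewards_converged(per_unit_rewards: dict[str, list[float]],
                      window: int = 10, tol: float = 1e-3) -> bool:
    """Return True if every unit's reward varied by less than `tol`
    over the last `window` episodes, i.e. behavior looks stable."""
    for history in per_unit_rewards.values():
        if len(history) < window:
            return False
        recent = np.asarray(history[-window:])
        if recent.max() - recent.min() > tol:
            return False
    return True
```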

@kim-mskw
Contributor

kim-mskw commented Dec 5, 2023

Hmm, I don't see it that way. According to game theory, the Nash equilibrium is where the overall welfare is the highest, since we have a fixed demand that equals the production rent. If the overall welfare (i.e. the absolute sum) is higher when the nuclear plant earns a huge amount of money and the rest do not earn anything, then that is the Nash equilibrium, regardless of the fairness of the result.

@nick-harder
Member Author

@kim-mskw I don't agree with this definition. Maybe it is the case for some particular market designs, but not for a general market setup. A Nash equilibrium is reached when no one deviates from their policy, so ultimately we should have such a condition for MADRL setups. But for now I believe the average reward of the agents is a better representation than the sum of all rewards.

@kim-mskw
Contributor

kim-mskw commented Dec 7, 2023

@nick-harder after our bilateral talk I thought about this a lot. You are right: the Nash equilibrium (or one of multiple equilibria) is not the state where the sum of all profits/rewards is maximal, but neither is it the state where the average profit/reward of all units is highest. Both are approximations, and frankly I could not find evidence in the multi-agent reinforcement learning literature hinting at which metric to prefer.

With the mean we just divide the sum by the number of agents, so I came to the conclusion that it should not make any difference anyway. Hence, my initial thought that it needs to be the sum was wrong.
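A tiny illustration of that point (made-up reward values): as long as the number of agents is fixed, ranking two training runs by the sum of rewards or by the mean gives the same answer, since the mean is just the sum divided by a constant.

```python
run_a = [120.0, 30.0, 10.0]  # per-unit rewards, run A (hypothetical)
run_b = [60.0, 55.0, 50.0]   # per-unit rewards, run B (hypothetical)

n_agents = len(run_a)  # same fixed number of agents in both runs
by_sum = sum(run_a) > sum(run_b)
by_mean = sum(run_a) / n_agents > sum(run_b) / n_agents
assert by_sum == by_mean  # the ordering is identical
```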

kim-mskw merged commit dd9a744 into main on Dec 7, 2023
4 checks passed
kim-mskw deleted the fix_avg_reward_query branch on December 7, 2023 at 14:27