update FAQ
Equim-chan committed Sep 14, 2023
1 parent 40cbe3f commit dced063
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions faq.md
@@ -36,7 +36,7 @@ For instance, if the game has pt setting $w$ and the players' scores are $[29000
| West | 27200 | 24.857 | 29.048 | 31.777 | 14.317 |
| North | 29600 | 37.990 | 28.533 | 23.800 | 9.677 |

Note that these probabilities are esitmates of the **final** rankings at the end of the whole *game*, not after the current *kyoku*.
Note that these probabilities are estimates of the **final** rankings at the end of the whole *game*, not after the current *kyoku*.

To get the $\Phi_k$ value for the player sitting in the East seat at South 1, we take the dot product of the probabilities and the pt setting, specifically $[0.29532, 0.32512, 0.27416, 0.10539] \cdot w$.
It's important to note that Mortal models are not guaranteed to use a fixed pt setting throughout their training.
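As a minimal sketch of the $\Phi_k$ computation, the Python snippet below takes the dot product of the East player's final-rank probabilities at South 1 with a pt vector. The pt values `[90, 45, 0, -135]` are a hypothetical example only, not Mortal's actual setting, and `phi` is an illustrative helper name.

```python
def phi(rank_probs: list[float], pt_setting: list[float]) -> float:
    """Expected pt: sum over ranks of P(final rank) * pt awarded for that rank."""
    if len(rank_probs) != len(pt_setting):
        raise ValueError("need one probability per final rank")
    return sum(p * w for p, w in zip(rank_probs, pt_setting))


# P(1st), P(2nd), P(3rd), P(4th) for the East seat at South 1, as quoted above.
east_probs = [0.29532, 0.32512, 0.27416, 0.10539]

# HYPOTHETICAL pt setting, for illustration only; Mortal's real setting may
# differ and is not guaranteed to stay fixed during training.
PT_SETTING = [90.0, 45.0, 0.0, -135.0]

print(phi(east_probs, PT_SETTING))
```

Since $\Phi_k$ is linear in $w$, swapping in a different pt setting only changes the second argument.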
@@ -52,7 +52,13 @@ $\pi_\tau(a|s)$, in simple terms, can be thought of something similar to the hei
$$\pi_\tau(a|s) = \frac{\exp(\hat Q^\pi(s, a) / \tau)}{\sum_i \exp(\hat Q^\pi(s, a_i) / \tau)}$$
where $\tau$ is temperature.

Wrapping up, $\hat Q^\pi(s, a)$ is only for advanced users, because it can be very misleading if the user does not understand the subtle details of how Mortal works under the hood. I have been considering whether to just remove the column or not, but in the end I decided to keep it as is. <ins>Just look up $\pi_\tau(a|s)$ as it is easier to understand.</ins>
Wrapping up, $\hat Q^\pi(s, a)$ is only for advanced users, because it can be very misleading if the user does not understand the subtle details of how Mortal works under the hood. To make it clear:

- $\hat Q^\pi(s, a)$ is not 局収支 (round EV).
- $\hat Q^\pi(s, a)$ is not pt.
- $\hat Q^\pi(s, a)$ is not 清算ポイント (end game score).

I have been considering whether to just remove the column or not, but in the end I decided to keep it as is. <ins>Just look up $\pi_\tau(a|s)$ as it is easier to understand.</ins>
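For readers who want to reproduce $\pi_\tau(a|s)$ from a set of $\hat Q^\pi(s, a)$ values, here is a minimal Python sketch of the temperature softmax defined earlier. The Q values and temperatures are hypothetical, and `pi_tau` is an illustrative name, not part of Mortal's code.

```python
import math


def pi_tau(q_values: list[float], tau: float) -> list[float]:
    """Softmax at temperature tau: exp(Q_a / tau) / sum_i exp(Q_i / tau)."""
    if tau <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / tau for q in q_values]
    m = max(scaled)  # subtract the max to avoid overflow; the result is unchanged
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# HYPOTHETICAL Q-hat values for three candidate actions in one state.
q_hat = [-0.12, -0.35, -1.80]
for tau in (0.1, 1.0):
    print(tau, [round(p, 3) for p in pi_tau(q_hat, tau)])
```

Subtracting the maximum before exponentiating is a standard overflow guard and does not change the output, because the softmax is invariant to adding the same constant to every input. A lower $\tau$ concentrates more probability on the best action.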

## (Mortal) Why do all actions except the best sometimes have significantly lower Q values than that of the best?
As mentioned above, $\hat Q^\pi(s, a) + \Phi_k$ is an estimate of the pt EV. However, evaluating this value is <ins>a means, not the objective</ins>. To be clear, Mortal's fundamental objective as a mahjong AI is to achieve the best performance in a mahjong game, not to calculate accurate scores for all actions. As a result, the evaluated values of all actions except the best may be inaccurate; they only serve to determine its exploration preferences during training.
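The toy example below illustrates why inaccurate values for non-best actions need not hurt play, assuming the final decision is simply the highest-valued action and that $\Phi_k$ is shared by every action in the same state (as in the South 1 example). The action names, Q values, and $\Phi_k$ are all hypothetical, and `best_action` is an illustrative helper.

```python
def best_action(values: dict[str, float]) -> str:
    """Return the highest-valued action (the first one wins on ties)."""
    return max(values, key=values.get)


# HYPOTHETICAL values for one state; not real Mortal output.
q_hat = {"discard 5m": 1.8, "riichi": 2.4, "discard 9p": -6.1}
phi_k = 27.0  # hypothetical, assumed identical for every action in this state

# Q-hat + Phi_k as the pt EV estimate: every action is shifted by the same amount.
pt_ev = {a: q + phi_k for a, q in q_hat.items()}

# Make every non-best value much worse, mimicking an inaccurate estimate.
best = best_action(pt_ev)
pessimistic = {a: v if a == best else v - 20.0 for a, v in pt_ev.items()}

print(best_action(q_hat), best_action(pt_ev), best_action(pessimistic))
```

All three calls pick the same action: the decision depends only on which action ranks first, not on how accurately the others are scored.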
@@ -66,9 +72,9 @@ ELI5: <ins>Mortal is optimized for playing, not reviewing or attribution.</ins>
Mortal is an end-to-end deep learning model that uses model-free reinforcement learning, so we are unlikely to be able to do any significant attribution work on it. If you insist on a reason for a decision made by Mortal, I would say that, in contrast to how humans play, Mortal does not rely on so-called "precise calculations" but rather on "intuition".

## (Mortal) The single-line output and the table are in conflict, is it a bug?
![figure](res/agarasu.webp)
![agarasu](res/agarasu.webp)

This is an intentional feature, and in the case shown in the figure, it is a rule-based fail-safe strategy against アガラス (win-to-last-place) in the all-last round.
Not really. This is an intentional feature, and in the case shown in the figure, it is a rule-based fail-safe strategy against アガラス (win-to-be-last-place) in the all-last round.

The single-line output (starting with `Mortal:`) is the actual final decision made by the AI, while the expanded table provides additional, intermediate information that is totally optional and may be altered or even removed in a future version. When they are in conflict, <ins>the single-line output should take precedence.</ins> Furthermore, the table is just a by-product of the AI, and focusing too much on building it may hinder finding better ways to achieve its goal.
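A hypothetical sketch of the kind of check such a fail-safe needs: given the scores that would result from winning now in the all-last round, would the player still finish last? It assumes ties are broken by initial seat order (index 0 is the first dealer) and that the win ends the game, so it ignores cases such as a dealer win that continues the game. `final_rank` and `is_agarasu` are illustrative names; this is not Mortal's actual implementation.

```python
def final_rank(scores: list[int], seat: int) -> int:
    """1-based final rank of `seat`.

    ASSUMPTION: ties are broken by initial seat order, so a lower index ranks higher.
    """
    me = scores[seat]
    ahead = sum(
        1
        for other, s in enumerate(scores)
        if other != seat and (s > me or (s == me and other < seat))
    )
    return ahead + 1


def is_agarasu(scores_after_win: list[int], seat: int) -> bool:
    """True if winning now (assumed to end the game) still leaves `seat` in last place."""
    return final_rank(scores_after_win, seat) == len(scores_after_win)


# HYPOTHETICAL scores after seat 3 wins a cheap hand in the all-last round.
print(is_agarasu([31000, 33000, 18100, 17900], seat=3))
```

A real fail-safe would also need the scores each candidate win would produce (including honba and riichi sticks), which this sketch leaves to the caller.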
