Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

general hat-based strategy discussion #6

Open
WuTheFWasThat opened this issue Feb 27, 2019 · 11 comments
Open

general hat-based strategy discussion #6

WuTheFWasThat opened this issue Feb 27, 2019 · 11 comments

Comments

@WuTheFWasThat
Copy link
Owner

WuTheFWasThat commented Feb 27, 2019

@felixbauckholt

Moving discussion from discord to here, as it's a bit easier to follow this way

Your original comment (modifying format a bit):

I spent a bunch of time getting increasingly confused about my "general hat-based strategy" idea, and now I think it might not be very useful: All the interesting nontrivial tweaks to a hat-based strategy I could think of involve conveying information that can't immediately be made common knowledge, and thus don't fit into a "standard hat-based template" well. The tweaks I was mostly thinking of were

  1. @florrat2 's approach (where a player will only be able to decode a hint once all players between the hinter and them have played)
  2. Whenever there's a choice between different hints that have the same "hat-value" (say two color hints that don't involve the "hint-index" card), using that choice in a "hat-based" way to establish common knowledge of something between all players but the hinted
  3. If the amount of information in a hint is a composite number, "splitting up" the hint into a part that is made common knowledge now, and a part that will be made common knowledge later

If you have other ideas that fit the "general hat-based strategy" template better, let me know!)

I haven't thought much along these lines, but initial thoughts:

  1. Could you explain florrat2's approach? It sounds very tricky, and I don't think I understand how it would work

  2. Very cool idea! Could be a big boost, esp in 4 and 5 player

  3. This sounds similar to 1, also not sure how it would work. What were you thinking exactly?

BTW, I think I imagine not so much "general hat-based strategy" as "general hat-based information strategy"! which the code might already be mostly set up for.

@fpvandoorn
Copy link

Funnily, I was just browsing this repository and saw this issue.

I have a description of my strategy here:
https://github.com/fpvandoorn/hanabi/blob/master/doc_hat_player.md
If you know the hat strategy from the original paper, then I have a short summary of differences at the top of that file.

I changed the strategy quite a bit over the last few days on a separate branch:
https://github.com/fpvandoorn/hanabi/blob/override_clue/players/hat_player.py
The scores I now achieve are closer to the scores here, but I see that Felix has also been improving the bot here.

@WuTheFWasThat
Copy link
Owner Author

WuTheFWasThat commented Feb 28, 2019

RE 1: the section explaining it is "Modified action", right? If I understand correctly, this works only if later players can infer the hint given via earlier player's actions. This seems like a nice trick, although it seems complicated once the "modified actions" start overlapping and stuff, and also with information-style hints, rather than action-style hints. This makes me more excited about 2, personally, as it seems like a clear win!

@felixbauckholt
Copy link
Collaborator

felixbauckholt commented Feb 28, 2019

Oh no, now all my confused claims and empty promises will be out in the open forever! 😱

@fpvandoorn those improvements look super impressive, congrats!

Regarding idea 3: For a stylized example, suppose A is giving a hint to B and C, B wants to know if card X is playable, and C wants to know if card Y is playable and/or dead after B plays/doesn't play X. Also suppose A has 6 possible hints. Then A could transmit information in two stages: First, A encodes whether B can play X and some unrelated ("binary") information for C, in a "hat" way, by choosing between "even" hints and "odd" hints. Then A encodes to C whether Y is playable, dead, or neither, by choosing between the three "even" hints (or the three "odd" hints, respectively).

I don't think this idea will be a "low hanging fruit" very soon unless we stumble upon some way of expressing such a strategy very naturally (and even then, situations like this might turn out not to matter).

Regarding idea 2: With 5 players, we'd need to keep track 6 "common knowledge groups" (one that includes everybody and five that are everybody except one player). Once we refactor our way towards a state where this is doable without too much bloat, we should totally try it! (OTOH, I can totally imagine a world in which it doesn't help too much because once all players except one establish some knowledge, the player left out will spend hints to redundantly establish the same knowledge!)

Other thinks I'm also thinking of:

  1. It's plausible that some of the comparisons with 1.0 are introducing errors somewhere. I'm considering sidestepping all this by replacing the floats with rational numbers.
  2. If we have, say, 10 information and a player wants to ask a question with 7 answers, we could make it so that if the answer is 0, 1, or 2, that player gets to ask another binary question. Similarly, if we have 10 information and a player asks a question with 3 answers, we could make it so that "info_remaining" is 4 if the answer was 0 and 3 otherwise. I previously thought this would be premature optimization, but now I think this could be very powerful because it means you can ask a complicated question that tells you a playable card, but still have a chance to learn one bit about your not-immediately-playable cards.
  3. We could also have channels of transmitting information that only sometimes work: Say A wants to discard, and has one card that's publicly known to be dead, and one card that isn't. A lets every other player ask a binary question, and "hat-sums" the answers. If the sum is 1, A discards the "unknown" card, and every other player learns one bit about their hand. If the sum is 0, A discards the known card, and nobody learns anything.
  4. Our card "probability distributions" currently use the fact that each slot has exactly one card, but not the fact that each card is in exactly one place. We could consider the latter either in an ad-hoc way (say by marking down a card in other slots once it becomes "determined" in one slot, as @WuTheFWasThat's TODOs suggest), but maybe we can use some math to build ourselves a more sophisticated model that's still easy to compute.

I know my language about "asking" questions is probably unclear, but I need sleep. I can clarify anything tomorrow.

@WuTheFWasThat
Copy link
Owner Author

WuTheFWasThat commented Feb 28, 2019

RE confusion: no worries, I always get confused thinking about this stuff!

RE 3: one difficulty is that B has to know that C's question is split, and so the decision to split C's questions cannot be based on non-public knowledge of B's hand. I think that intuitively makes it much less valuable, like I wouldn't want to pay the cost of a bit of information to learn "what should I do if B plays something", given that B probably has no idea what they're playing.

I think a much more obvious low hanging fruit is for the questions to take into account stuff that's public knowledge! like if B already knows he has a playable red 3, all players should understand that A's hint stops considering red 3 playable to players before B (aka nobody)

RE 2: are you sure? it seems useful even with just public info. with 5 players, if 1 gives a hint to another with 1 bit of ambiguity, then 3 players all learn something about their own hand which becomes public info.

RE other things (renumbered with +3):
4. sounds good!
5. i don't see how that works. how would you distinguish between "answer is 0 + 2nd question" and "answer is 3/4"?
6. is the second card privately known to be dead? if not, seems kinda dangerous. if so, leads to more general idea - when discarding, if you can, always discard cards that are only privately known to be dead (via process of elimination), since this gives other players info of the form "i have this card in my hand", which is a new type of info to track!
7. yep, also sounds useful!

Another low hanging fruit... endgame optimization! But I don't wanna think about that :)

@fpvandoorn
Copy link

If I understand correctly, Felix wants to incorporate a hat-based strategy in your framework? I would love to learn more about your framework, and how it exactly works with asking and answering questions between the players. It seems that your strategy nicely uses common knowledge to communicate a lot of information.

RE 1: There are multiple reasons why players cannot fully decode a hint until their turn. One of them is the "modified actions", where the cluer can use all information available to him to decide what the first player reacting to his clue should do. There is another important reason why players cannot immediately decode hints, and that is to avoid double plays.
Suppose Alice clues, and Cathy and Donald both have r1 as their (only) playable card. If Alice tells both of them to play r1, the team will get a strike. We cannot avoid it by only telling Cathy to play red 1, since Cathy will assume that Donald is going to play r1, and interpret the clue accordingly. However, we can only tell Donald to play the r1, and tell Cathy to do something differently. Bob (between Alice and Cathy) can also see all of this, and can therefore correctly interpret his action. However, Donald cannot interpret the clue until Cathy has demonstrated that she is not playing the r1.
The recent additions add a third reason that players cannot fully decode a hint when it is given: if a player tells someone to play r2 because you are going to play r1, there is no way for you to know that until after your turn.

@felixbauckholt
Copy link
Collaborator

I just proposed some refactoring that might hopefully get us closer to a "general hat-based information strategy"!

Regarding idea 5, I'm sorry I didn't elaborate earlier, but this seems like it was much easier to code up than to explain. The idea assumed that the answer to the first question (if it has n answers) is encoded as the hat-value modulo n; when I wrote about it earlier I didn't realize we weren't actually doing that! So I changed this in commit 0eb1522, and implemented the idea in commit 7eae894.

Regarding idea 6, yeah I meant cards that are privately known to be dead. I implemented that in commit 774a8dd. My intuition was that transferring 0.5 bits of information should be very useful since players can decide what they want to ask questions about (and also, the "new type of info to track" might be hard to track), but unfortunately win-rates have barely changed.

@WuTheFWasThat
Copy link
Owner Author

RE 5: ah! I was just being silly. in general, having question dependence seems good, and somehow I didn't notice it was possible before :)

Some other ideas for improving play, while I'm here:

  • re-define "playable" to be (recursively) "playable assuming all publicly playable cards are played"
  • endgame YOLO playing?

@fpvandoorn
Copy link

endgame YOLO playing?

In the last round of the game, you can deduce from what you see the identity of your own cards, but not necessarily the order. If you know that you have a playable card in your hand, but you don't know where, you should blind-play a card in your hand, to see if you get lucky.

For my strategy this was quite a big improvement of about 0.8pp:
fpvandoorn/hanabi@8e117d0
For you this will probably be less, since you get much more information on unplayable cards, but there will still be situations where this is useful.

@WuTheFWasThat
Copy link
Owner Author

WuTheFWasThat commented Mar 6, 2019

yep, that's exactly what i was imagining! good to know that it's a big improvement for you. I think I would a priori expect a similar improvement, as measured by "percentage progress towards perfect score"

edit: looks like it wasn't too big an improvement by that measure, for you. I think I expect a larger improvement, in that case :)

@fpvandoorn
Copy link

It was a 10+% decrease in the amount non-perfect games, which was higher than I expected (it was
a much higher difference than other end-game strategies I implemented).

@felixbauckholt
Copy link
Collaborator

Regarding endgame YOLOing, I tried it in commit a541481. (By the way, is there a good place to put this commit given that it depends on the open PR #7?)
It adds about 0.45 percentage points for three players, and much less for more players.

I'm guessing the next thing to do regarding endgame is to improve the way we calculate probabilities of cards being a specific value: Our CardPossibilityTable based prior is very reasonable at the beginning of the game, but at the end of the game, there are a bunch of things we should be able to deduce that should shift our probabilities by a bunch. This might be a bunch of work to implement, though.

Regarding the "re-define 'playable' to be (recursively) 'playable assuming all publicly playable cards are played'", I think this is also nontrivial (or maybe I'm just getting nerd-sniped), but once we did all this work to keep track of our deductions about cards, this will hopefully be a bit easier to implement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants