Amending papers after attack #26

Open
ftramer opened this issue Feb 23, 2021 · 7 comments

Comments

ftramer (Contributor) commented Feb 23, 2021

Aleksander, Wieland, Nicholas and I have had some discussions about the lack of "self-correction" among (broken) defense papers, and how this can make it hard for newcomers to navigate the field (i.e., after reading about a defense, you have to sift through the literature to find out whether an attack on it exists or not).
I think that a discussion of the "aftermath" of a defense evaluation would nicely fit in this report.

If this makes sense, some points worth discussing include:

  • What constitutes a "break" that is worth amending a paper over?

Essentially every robustness claim in the literature is "false", as you can always reduce accuracy by 0.1%-1% by fiddling with hyper-parameters. If a paper claims 60% robust accuracy, and a later attack reduces this to 59%, I doubt it's worth amending the paper. But what about 55%, 50%, or 10%?
Maybe the only good solution here is to set up a public leaderboard, but I doubt that many authors would maintain one.

  • Would it make sense to give examples of papers that were amended?

I know of very few examples (each of which involves one or more authors of this report):

@max-andr

Since you mentioned public leaderboards, a pointer to our project may be relevant to this discussion:
https://robustbench.github.io/
It is a standardized leaderboard that uses AutoAttack and only accepts models satisfying certain restrictions (no randomness, no non-differentiability, and no optimization at inference time), since those are features that mostly tend to make gradient-based attacks ineffective without substantially improving robustness.
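For concreteness, here is a minimal sketch of pulling a leaderboard entry from the RobustBench model zoo. It assumes the robustbench package (from the RobustBench GitHub repository) is installed, and uses 'Carmon2019Unlabeled' only as an example model-zoo identifier.

```python
# A minimal sketch, assuming the robustbench package is installed;
# 'Carmon2019Unlabeled' is used here only as an example identifier
# from the model zoo.
from robustbench.utils import load_model

# Download and load a leaderboard entry for the CIFAR-10 Linf threat model.
model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10',
                   threat_model='Linf')
model.eval()
```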

One of the main ideas behind our project is that we need to be able to systematically distinguish fine-grained differences between models. This seems to me very related to what @ftramer mentions: perhaps a 1% reduction in adversarial accuracy from using a different attack is not very interesting, but 5% may be quite important, as that is roughly the improvement one gets from using extra unlabeled data (e.g., as in Carmon et al., 2019). And by stacking several such ~5% improvements, one can get quite substantial overall gains. For example, the top entry from DeepMind on the Linf CIFAR-10 leaderboard has 65.87% adversarial accuracy (evaluated with AutoAttack), compared to 44.04% for standard adversarial training from Madry et al., 2018.

If you find it useful, we would be happy to include adaptive evaluations in our leaderboards for models that satisfy the restrictions mentioned above (we also mention this point in our whitepaper). So far I'm aware of only one case, among the datasets and threat models in our leaderboards, where adaptive attacks noticeably reduce the adversarial accuracy reported by AutoAttack: from 18.50% to 0.16% for the model from Enhancing Adversarial Defense by k-Winners-Take-All, as you report in On Adaptive Attacks to Adversarial Example Defenses. But if there are more cases like this, it would be great to know, so that we can provide a more complete picture with our leaderboards.
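As an illustration of the kind of AutoAttack evaluation the leaderboard numbers above refer to, here is a minimal, self-contained sketch. It assumes the robustbench and autoattack packages are installed, reuses the example model identifier from the sketch above, and evaluates only a 100-example subset for speed (the leaderboard itself uses the full test set); eps=8/255 is the usual CIFAR-10 Linf setting.

```python
# A minimal sketch, assuming the robustbench and autoattack packages are
# installed; the model name is only an example identifier from the model zoo,
# and the 100-example subset is for illustration, not a leaderboard-grade run.
import torch
from robustbench.data import load_cifar10
from robustbench.utils import load_model
from autoattack import AutoAttack

model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10',
                   threat_model='Linf')
model.eval()

x_test, y_test = load_cifar10(n_examples=100)

# 'standard' runs the full AutoAttack ensemble (APGD-CE, targeted APGD-DLR,
# targeted FAB, and Square) at the usual CIFAR-10 Linf budget of 8/255.
adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=100)

# Robust accuracy on the subset: the fraction of adversarial examples that
# are still classified correctly.
with torch.no_grad():
    robust_acc = (model(x_adv).argmax(1) == y_test).float().mean().item()
print(f'robust accuracy on the subset: {robust_acc:.2%}')
```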

ftramer (Contributor, Author) commented Feb 24, 2021

Yes, a discussion of common attack benchmarks is definitely warranted here.
The issue remains, though, that if the evaluation is performed by a third party, the original defense paper rarely (if ever) acknowledges the re-evaluation.

carlini (Member) commented Feb 25, 2021

For what it's worth, "Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness" is also fully broken by our adaptive attacks paper and is not robust, yet AutoAttack completely fails to find any good attack on it.

ftramer (Contributor, Author) commented Feb 25, 2021

You might be thinking of a different paper? From what I see in the AutoAttack paper, it goes down to 0% accuracy (row 25 in Table 2, Pang et al. 2020).

carlini (Member) commented Feb 25, 2021

Hm, you're right. I was looking at the leaderboard page:

"30 | Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness | 80.89% | 43.48% | × | ResNet-32 | ICLR 2020"

@davidwagner

One possible point of comparison might be the cryptography literature, where similar challenges arise. My sense is that the situation in crypto is similar: if a scheme is broken, that often shows up publicly only as a follow-up paper (and not in any other way), so to tell whether a scheme has resisted scrutiny, one has to check the more recent papers that cite it and see whether any of them reports a stronger attack against it. I wonder if it's more challenging in adversarial ML because ML publishes many more defense papers than crypto does?

ftramer (Contributor, Author) commented Feb 26, 2021

My understanding was that in crypto it is considered good practice to amend papers (on eprint, not in proceedings) after a break. I've definitely seen this a few times in the past but I don't know if the process is widespread.

I've also heard many stories in TCS or crypto where there is "folklore" knowledge that some scheme is broken or some lemma or theorem is incorrect, without this ever being written down anywhere. So maybe a smaller community with fewer papers does help in propagating this knowledge.

This is somewhat problematic in a much bigger field such as ML, but it may be hard to set incentives differently to avoid it.
