Don't recommend transfer attacks (controversial) #27

Open
carlini opened this issue Feb 25, 2021 · 1 comment

@carlini
Member

carlini commented Feb 25, 2021

I can't think of any evaluation in the last two years (other than obviously wrong ones ^1) where transfer attacks helped invalidate the robustness claims. At best, transfer attacks bring accuracy down from its clean level to ~50% or so, and most papers claim less robustness than this in the first place. So it's not surprising that transfer attacks don't do better than any halfway reasonable attack.

However, lots of papers include transfer attacks, cite (Athalye et al. 2018, Carlini et al. 2019), and then say "therefore our evaluation is probably correct". If we have a better idea (maybe exclusively ask for gradient-free attacks now? They're a lot better than they were in 2019), then it might be worth thinking about including that instead.

This is obviously controversial. There can exist defenses where running transfer attacks would help diagnose problems. I just think this is rare enough that we should either remove the recommendation or downgrade it to something that should only be done after everything else has been tried.

^1 There are some obviously-wrong evaluations where transfer attacks would have also shown they were wrong. But these are almost always papers that claim >>0% accuracy at linf eps=0.5 or something else absurd. So as long as there's another way to disprove it, arguably the transfer attack hasn't added much new value.
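
As a rough illustration of why a claim at linf eps=0.5 is a red flag: if pixel values are assumed to lie in [0, 1] (an assumption, not stated above), every image is within linf distance 0.5 of the same uniform gray image, so an attacker can replace any input with that one image. The sketch below checks this numerically with NumPy on random data; `gray_image_within_budget` is a hypothetical helper name.

```python
import numpy as np

def gray_image_within_budget(images, eps=0.5):
    """Return True if the all-0.5 (gray) image is within linf distance eps
    of every image in the batch. Assumes pixel values lie in [0, 1]."""
    gray = np.full_like(images, 0.5)
    # Per-image linf distance to the gray image.
    linf = np.abs(images - gray).reshape(len(images), -1).max(axis=1)
    return bool(np.all(linf <= eps))

# Random stand-in data (not a real dataset): 8 images, 32x32 pixels, 3 channels.
rng = np.random.default_rng(0)
images = rng.uniform(0.0, 1.0, size=(8, 32, 32, 3))
print(gray_image_within_budget(images))  # True
```

Under that assumption a classifier sees the same input for every example, so its accuracy can be no higher than the share of whichever class it assigns to the gray image.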

@ftramer
Contributor

ftramer commented Feb 25, 2021

The only one I can remember is Thermometer Encoding (which I wouldn't characterize as obviously wrong), where the original paper had results showing that transfer attacks worked better than white-box attacks, and this was viewed as a red flag.
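
A minimal sketch of that red-flag check, assuming an evaluation has already produced accuracy under each attack. The function name, the attack labels, and the numbers are all hypothetical and are not taken from any paper.

```python
def flag_suspicious_attacks(robust_acc, white_box="pgd", tol=0.0):
    """Return the attacks that lower accuracy more than the white-box attack.

    robust_acc maps an attack name to model accuracy under that attack.
    A white-box attack has strictly more information than a transfer or
    black-box attack, so it should be at least as strong. Any attack
    listed here suggests the white-box attack is failing (for example,
    because of masked gradients).
    """
    baseline = robust_acc[white_box]
    return sorted(
        name
        for name, acc in robust_acc.items()
        if name != white_box and acc < baseline - tol
    )

# Made-up numbers, for illustration only.
results = {"pgd": 0.79, "transfer": 0.45, "spsa": 0.81}
print(flag_suspicious_attacks(results))  # ['transfer']
```

The `tol` parameter is there so that small differences from evaluation noise are not flagged.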

But toning down transfer attacks seems in line with putting more focus on the stronger black-box attacks that have emerged over the past few years.
At the time this report was first written, I would have been a bit wary of black-box evaluations because the existing attacks (e.g., SPSA or Boundary) were quite brittle. Today, we have much stronger candidates, so transfer attacks are indeed somewhat obsolete in a white-box threat model.
