TEAM 1 WORK IN PROGRESS

Work documentation of Team 1 of the privML Software Project

Dates

  • "Daily" Scrum Meetings every Tuesday at 9:00 AM here and every Thursday at 12:00 PM here
  • Sprint Review & Planning every even calendar week on Wednesday at 12:00 PM here
  • Ad-hoc meetings are often planned via our Gitter channel

Literature

Collection of all relevant Links


Protocols

Protocol 27.05.21

Protocol 25.05.21

  • Marisa and Milos fixed their stuff in the PR
  • setup.py import error
    • still worked for Marisa
    • try with venv
      • Marisa will link some docs
  • is there a privacy risk score in ART?
  • privacy risk score
    • only for blackbox attack?
    • quantifies how good an attack model is?
    • we have to read the paper again
  • k datapoints
    • visualization of the priv risk score
    • how good is our attack model?
    • Claas will work on more ideas for visualization
    • infer returns only 1 or 0 for every datapoint (see the sketch below)
  • README abstracts
    • is a lot of work
      • since all of the attacks have to be fully understood
    • we should start working on it now and then improve iteratively
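Since infer only returns a hard 0 or 1 per datapoint (no continuous risk score), here is a minimal sketch of what a per-datapoint visualization for k selected points could look like; `inferred` is a random stand-in for the attack's actual output, so everything below is illustrative only:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for attack.infer(x_k, y_k) on k selected datapoints:
# a 0/1 array marking which points the attack claims as training members.
k = 20
inferred = np.random.randint(0, 2, size=k)

# With only hard 0/1 predictions, the simplest per-datapoint view is a
# bar per point showing whether it was flagged as a member.
plt.bar(range(k), inferred)
plt.xlabel("datapoint index")
plt.ylabel("inferred membership (0/1)")
plt.title("MIA prediction per datapoint")
plt.show()
```

A continuous per-point value (like the privacy risk score below) would make this plot more informative than plain 0/1 bars.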

TODOS:

  • everybody sets up venv
  • read the privacy risk score paper
  • link docs for venv in wiki -> @marisanest, see Collection of all relevant Links
  • more ideas for visualization -> @erdnaf

Daily Scrum 20.05.2021

  • PR of the last sprint

    • Should we commit directly to the PR branch or make a new PR?
    • Commit directly to the branch
    • Marisa already started fixing Franziska's requested changes
    • Small changes requested by Jury and Milos
    • Everybody fixes their own work? YES
    • Marisa and Claas will discuss the specific changes of Marisa's work
  • Installing via setup.py: did that work for anyone? (import error)

    • Marisa did the imports as in ART
    • importing modules works normally like that
    • create an issue
  • GitHub Actions and branches?

    • call the branch only team1?
      • nobody can push with overlapping branch names (Git can't have both a team1 and a team1/... branch)
      • so no.
    • create different branches for sprints
    • running tests for every branch?
      • might be annoying for team2
      • can't they then just ignore the tests?
      • tests take a really long time (since attack models are trained)
      • Milos will look into that
  • Assigning Issues/Work

    • Little Tasks:
      • Create team1sprint3 branch -> @marisanest
      • Create Wiki Page for work documentation and literature
      • fixing setup.py -> issue
      • create separate Webex room -> @blauertee
      • GitHub actions and branches -> @budmil
    • Issues:
      • See assignments in the issues
      • privacy score calculation should happen in the metrics class
        • visualization as in here should be possible
      • Too many issues about MIA on single data points?
        • Nah ... just close all of them with one PR
      • writing all the texts for the README
      • renaming model_card_info
        • not that important
        • will also require adjustments in the notebooks
      • metrics output as JSON
        • goes to the metrics class
        • only calculate that once, because these values do not change (see the sketch after this list)
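A hedged sketch of the "calculate once, output as JSON" idea; the class and attribute names below (Metrics, inferred_members, to_json, ...) are hypothetical and not the actual privML API:

```python
import json
import numpy as np

class Metrics:
    """Hypothetical metrics class: computes attack metrics once and caches them."""

    def __init__(self, inferred_members, inferred_nonmembers):
        # 0/1 arrays as returned by infer() on known members / non-members
        self.inferred_members = np.asarray(inferred_members)
        self.inferred_nonmembers = np.asarray(inferred_nonmembers)
        self._values = None  # cache: these values do not change for fixed inputs

    def calculate(self):
        # compute only once, then reuse the cached dict
        if self._values is None:
            self._values = {
                "member_accuracy": float(np.mean(self.inferred_members == 1)),
                "nonmember_accuracy": float(np.mean(self.inferred_nonmembers == 0)),
            }
        return self._values

    def to_json(self):
        # serialize the (cached) metrics
        return json.dumps(self.calculate())
```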

Open Questions

Privacy Risk Score

Trying to write up an easy-to-understand definition, which is WIP as you can see :)

Definition:
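A first hedged attempt, assuming the paper meant here is Song & Mittal, "Systematic Evaluation of Privacy Risks of Machine Learning Models" (please double-check against the paper): the privacy risk score of a sample z is the posterior probability that z was part of the target model F's training set D, given the model's observable behaviour O(F, z) on it (e.g. its loss). By Bayes' rule:

```latex
r(z) = P(z \in D \mid O(F, z))
     = \frac{P(O(F, z) \mid z \in D)\, P(z \in D)}
            {P(O(F, z) \mid z \in D)\, P(z \in D) + P(O(F, z) \mid z \notin D)\, P(z \notin D)}
```

So unlike infer's hard 0/1 output, this is a continuous per-datapoint value: the closer r(z) is to 1, the more confidently an attacker can claim that z was a training member.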

MIAs: Target Classifier and Shadow Models (@blauertee is confused)

In our MI black-box attack, we supply train and test data to train shadow models. These are not the same as the target classifier's train and test data, in order to simulate an actual attacker's behaviour. To measure how successful the attack was (or how vulnerable the target classifier is), these results should at some point be compared to the actual train and test data of the target classifier. But for our MIAs, only one set of train and test data is ever supplied. This feels contradictory to me. What am I missing?

Notes with respect to this question (by @marisanest):

  • The ART attack MembershipInferenceBlackBox does not use shadow models. It follows a different approach, which is sadly not explained well enough in the code, and I could not find a referenced paper either. So no idea where this implementation comes from, but to use this attack you need to know which data was used for the target model's training and which was not. You train the attack model on data that is split into training data and test/non-training data of the target model, use one part of it for training, and hold out the other part to later test the performance of the attack model (see the sketch below).
    • @blauertee: I just asked them in an issue
  • The other two ART attacks also follow a different approach. For both a paper is referenced (see Collection of all relevant Links).
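A hedged sketch of how this split could look with ART's MembershipInferenceBlackBox; `classifier` is assumed to be an ART estimator wrapping the target model, x_train/x_test are the target model's actual train and test data, and the 50/50 split is an arbitrary illustrative choice:

```python
import numpy as np
from art.attacks.inference.membership_inference import MembershipInferenceBlackBox

attack = MembershipInferenceBlackBox(classifier)

# First halves: known members / non-members used to train the attack model.
n, m = len(x_train) // 2, len(x_test) // 2
attack.fit(x_train[:n], y_train[:n], x_test[:m], y_test[:m])

# Second halves: held out to evaluate the attack model afterwards.
# No shadow models are needed because the real membership labels are known.
member_preds = attack.infer(x_train[n:], y_train[n:])    # ideally all 1
nonmember_preds = attack.infer(x_test[m:], y_test[m:])   # ideally all 0
print("accuracy on members:", np.mean(member_preds == 1))
print("accuracy on non-members:", np.mean(nonmember_preds == 0))
```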