Automate direct logit attribution process #172

Open
valedan opened this issue Apr 12, 2023 · 2 comments
Labels
research Research and Experimentation

Comments

@valedan
Contributor

valedan commented Apr 12, 2023

@mivanit @cmathw We discussed this in the research meeting yesterday and I think one of you wrote down some details - can you fill in here?

@mivanit
Member

mivanit commented Apr 13, 2023

short version:

@cmathw has put together a notebook in the experiments repo which performs direct logit attribution on the "produce the first path token" task. We'd like to extend this to more tasks, and the best interface would probably be along the lines of

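def direct_logit_attribution(
    model,
    # each item pairs a prompt with the response token the model should produce;
    # `prompt` and `response_token` are placeholder type names, quoted so the stub parses
    task_data: "list[tuple[prompt, response_token]]",
) -> "LogitAttributionResponse":
    ...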

extending the SolvedMaze class might also be an option, but it is probably easiest to just pass lists of tuples of whatever tokens the task consists of.
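
A minimal sketch of how such an interface might be filled in, written in plain Python so it runs without any ML library. It makes several assumptions that are not in the thread: the `AttributableModel` protocol, its two methods, and the fields of `LogitAttributionResponse` are all hypothetical, and `ToyModel` exists only to show the call. It also ignores the final layer norm and biases, which a real implementation would need to handle.

```python
from dataclasses import dataclass
from typing import Protocol

Token = str


class AttributableModel(Protocol):
    """Hypothetical protocol for this sketch, not an existing API."""

    def component_outputs(self, prompt: list[Token]) -> dict[str, list[float]]:
        """Vector each component writes to the residual stream at the last position."""
        ...

    def unembed_direction(self, token: Token) -> list[float]:
        """Unembedding direction for `token`."""
        ...


@dataclass
class LogitAttributionResponse:
    mean_attribution: dict[str, float]  # component name -> mean logit contribution
    n_examples: int


def direct_logit_attribution(
    model: AttributableModel,
    task_data: list[tuple[list[Token], Token]],
) -> LogitAttributionResponse:
    totals: dict[str, float] = {}
    for prompt, response_token in task_data:
        direction = model.unembed_direction(response_token)
        for name, vec in model.component_outputs(prompt).items():
            # a component's direct contribution to the correct-token logit is the
            # dot product of its output with that token's unembedding direction
            totals[name] = totals.get(name, 0.0) + sum(
                v * d for v, d in zip(vec, direction)
            )
    n = len(task_data)
    mean = {name: total / n for name, total in totals.items()}
    return LogitAttributionResponse(mean_attribution=mean, n_examples=n)


class ToyModel:
    """Two fake components with fixed outputs, only to demonstrate the call."""

    def component_outputs(self, prompt):
        return {"block_0": [1.0, 0.0], "block_1": [0.0, float(len(prompt))]}

    def unembed_direction(self, token):
        return [2.0, 1.0]


result = direct_logit_attribution(
    ToyModel(),
    [(["a", "b"], "x"), (["a", "b", "c", "d"], "x")],
)
```

Passing plain `(prompt, response_token)` tuples keeps the function independent of any maze class, so the same call would work for any task that can be expressed as "given this prompt, produce this token".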

  • currently, logit attribution in the notebook measures the importance of various blocks on the task of correctly predicting the first token after the path_start token, which should just be copying from the specification of the path start
  • another option is to try the same on the task of producing the path_end token, if the current token matches the target node
  • unclear what other sorts of tasks make sense -- hallway following? picking correct fork?
  • probably makes sense to write evals that pair with each task we want to run logit attribution on. is it reasonable to reuse code between these two areas?
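
One way the first two tasks above could be turned into `(prompt, response_token)` pairs, with a plain accuracy eval reusing the same pairs, is sketched below. The marker strings, the example token sequence, and all three function names are hypothetical; the real tokenizer's special tokens and sequence layout may differ.

```python
from typing import Callable, Optional

# hypothetical marker strings; the real tokenizer's special tokens may differ
PATH_START = "<PATH_START>"
PATH_END = "<PATH_END>"
TARGET_START = "<TARGET_START>"

Task = tuple[list[str], str]  # (prompt tokens, expected response token)


def first_path_token_task(tokens: list[str]) -> Task:
    """Prompt ends at the path-start marker; the response is the token after it."""
    i = tokens.index(PATH_START)
    return tokens[: i + 1], tokens[i + 1]


def path_end_task(tokens: list[str]) -> Optional[Task]:
    """Prompt ends at the last path node; only valid if that node is the target."""
    target = tokens[tokens.index(TARGET_START) + 1]
    prompt = tokens[: tokens.index(PATH_END)]
    return (prompt, PATH_END) if prompt[-1] == target else None


def task_accuracy(predict: Callable[[list[str]], str], task_data: list[Task]) -> float:
    """Fraction of prompts for which `predict` returns the expected token."""
    correct = sum(predict(prompt) == response for prompt, response in task_data)
    return correct / len(task_data)


# made-up tokenized maze, for illustration only
maze = [
    "<ORIGIN_START>", "(0,0)", "<ORIGIN_END>",
    "<TARGET_START>", "(1,1)", "<TARGET_END>",
    "<PATH_START>", "(0,0)", "(0,1)", "(1,1)", "<PATH_END>",
]
first = first_path_token_task(maze)
end = path_end_task(maze)
```

Because the eval and the attribution would consume the same list of pairs, the task builders are the natural piece of code to share between the two areas.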

@rusheb
Collaborator

rusheb commented Apr 13, 2023

Is this high priority? If so, I would be interested in working on it!

@mivanit mivanit added the research Research and Experimentation label Sep 4, 2023