
[Proposal] Add support for TracrBench #704

Open
HannesThurnherr opened this issue Aug 14, 2024 · 3 comments
Labels
complexity-high: Very complicated changes for people to address who are quite familiar with the code
new-architecture: This card involves adding a new architecture.

Comments

@HannesThurnherr

HannesThurnherr commented Aug 14, 2024

Proposal

Add support for TracrBench transformers

Motivation

@JeremyAlain and I recently wrote a paper introducing a dataset of 121 Tracr transformers. Tracr transformers are meant to be used as test beds or "sanity checks" in the development of novel interpretability methods. To make them as accessible as possible, we converted them from the DeepMind-internal "haiku" framework to Hooked Transformers (following this template made by Neel). We would like, and have been asked by multiple people, to make these toy models available from within TransformerLens.

Pitch

We have all the models uploaded to Hugging Face, and I have code to load them. It's a little different from the code used to load the typical LLMs: since the model requires input and output encoders, we wrap the HookedTransformer class in another simple class called "TracrModel".

My question is whether this is possible and, if so, where to put this code / the tracr_models.py file.
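To make the wrapper idea concrete, here is a minimal sketch of what such a "TracrModel" class could look like. This is an illustration only, not the actual TracrBench code: the class shape, the encoder interfaces, and the toy stand-ins below are all assumptions, and the real version would hold a HookedTransformer rather than a placeholder callable.

```python
# Hypothetical sketch of a TracrModel-style wrapper: it pairs a model with its
# input and output encoders so callers can work with raw Tracr tokens directly.
class TracrModel:
    def __init__(self, model, input_encoder, output_encoder):
        self.model = model                    # e.g. a HookedTransformer (assumed)
        self.input_encoder = input_encoder    # raw tokens -> model token ids
        self.output_encoder = output_encoder  # model token ids -> raw tokens

    def __call__(self, raw_tokens):
        ids = self.input_encoder(raw_tokens)
        out_ids = self.model(ids)
        return self.output_encoder(out_ids)


# Toy stand-ins so the sketch runs without transformer_lens installed:
vocab = ["BOS", "a", "b"]
encode = lambda toks: [vocab.index(t) for t in toks]
decode = lambda ids: [vocab[i] for i in ids]
identity_model = lambda ids: ids  # placeholder for the transformer forward pass

wrapped = TracrModel(identity_model, encode, decode)
print(wrapped(["BOS", "a", "b"]))  # -> ['BOS', 'a', 'b']
```

The point of the wrapper is that encoding and decoding stay bundled with the model they belong to, which is why loading these models differs from loading a typical LLM.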

Alternatives

An alternative would be to host the code for downloading the Tracr models for use within TransformerLens in a separate repo.

@bryce13950
Collaborator

Can you add a link to the models on Hugging Face, and a link to the source code? Most likely, you will be able to utilize the majority of existing components to add this, but some new components will need to be created.

@bryce13950 bryce13950 added the complexity-high (Very complicated changes for people to address who are quite familiar with the code) and new-architecture (This card involves adding a new architecture) labels Aug 16, 2024
@neelnanda-io
Collaborator

neelnanda-io commented Aug 16, 2024 via email

@HannesThurnherr
Author

Can you add a link to the models on Hugging Face, and a link to the source code? Most likely, you will be able to utilize the majority of existing components to add this, but some new components will need to be created.

I've added the links to the issue. The code is in the "another repo" link, linking to our TracrBench repo.

My personal inclination would be to just make this into another repo that builds on TransformerLens. What's the case for making this part of the core repo?

We are happy to make it into its own repo.
The case for making it part of TransformerLens is that the point of the dataset and the paper was to make the use of tracr for evaluating interp methods as easy as possible. Integrating this directly into TransformerLens would really help with that.
If we make it into its own repo, maybe the project could be mentioned in the docs somewhere?
