
[Proposal] Add support for TracrBench #704

Open
HannesThurnherr opened this issue Aug 14, 2024 · 3 comments
Labels
complexity-high: Very complicated changes for people to address who are quite familiar with the code
new-architecture: This card involves adding a new architecture.

Comments

@HannesThurnherr

HannesThurnherr commented Aug 14, 2024

Proposal

Add support for TracrBench transformers

Motivation

@JeremyAlain and I recently wrote a paper introducing a dataset of 121 Tracr transformers. Tracr transformers are meant to be used as test beds or "sanity checks" in the development of novel interpretability methods. To make them as accessible as possible, we converted them from the DeepMind-internal "haiku" framework to Hooked Transformers (following this template made by Neel). We would like, and have been asked by multiple people, to make these toy models available from within TransformerLens.

Pitch

We have all the models uploaded to Hugging Face, and I have code to load them. It's a little different from the code used to load the typical LLMs: since the model requires input and output encoders, we wrap the HookedTransformer class in another simple class called "TracrModel".

My question is whether this is possible and, if so, where to put this code / the tracr_models.py file.
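To make the wrapper idea concrete, here is a minimal sketch of what such a "TracrModel" class could look like. This is an illustration only, not the actual TracrBench code: the class shape, the encoder interfaces, and the toy stand-ins below are all assumptions, and the real version would hold a HookedTransformer rather than a placeholder callable.

```python
# Hypothetical sketch of a TracrModel-style wrapper: it pairs a model with its
# input and output encoders so callers can work with raw Tracr tokens directly.
class TracrModel:
    def __init__(self, model, input_encoder, output_encoder):
        self.model = model                    # e.g. a HookedTransformer (assumed)
        self.input_encoder = input_encoder    # raw tokens -> model token ids
        self.output_encoder = output_encoder  # model token ids -> raw tokens

    def __call__(self, raw_tokens):
        ids = self.input_encoder(raw_tokens)
        out_ids = self.model(ids)
        return self.output_encoder(out_ids)


# Toy stand-ins so the sketch runs without transformer_lens installed:
vocab = ["BOS", "a", "b"]
encode = lambda toks: [vocab.index(t) for t in toks]
decode = lambda ids: [vocab[i] for i in ids]
identity_model = lambda ids: ids  # placeholder for the transformer forward pass

wrapped = TracrModel(identity_model, encode, decode)
print(wrapped(["BOS", "a", "b"]))  # -> ['BOS', 'a', 'b']
```

The point of the wrapper is that encoding and decoding stay bundled with the model they belong to, which is why loading these models differs from loading a typical LLM.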

Alternatives

An alternative would be to host the code for downloading the Tracr models for use within TransformerLens in a separate repo.

@bryce13950
Collaborator

Can you add a link to the models on Hugging Face, and a link to the source code? Most likely, you will be able to utilize the majority of existing components to add this, but some new components will need to be created.

@bryce13950 bryce13950 added the complexity-high (Very complicated changes for people to address who are quite familiar with the code) and new-architecture (This card involves adding a new architecture) labels Aug 16, 2024
@neelnanda-io
Collaborator

neelnanda-io commented Aug 16, 2024 via email

@HannesThurnherr
Author

Can you add a link to the models on Hugging Face, and a link to the source code? Most likely, you will be able to utilize the majority of existing components to add this, but some new components will need to be created.

I've added the links to the issue. The code is in the "another repo" link, linking to our TracrBench repo.

My personal inclination would be to just make this into another repo that builds on TransformerLens. What's the case for making this part of the core repo?

We are happy to make it into its own repo.
The case for making it part of TransformerLens is that the point of the dataset and the paper was to make the use of tracr for evaluating interp methods as easy as possible. Integrating this directly into TransformerLens would really help with that.
If we make it into its own repo, maybe the project could be mentioned in the docs somewhere?
