Check if all neurons lead to ROME #2

lauritowal · 2023-04-11T20:37:58Z

Using interpretability tools (e.g., the causal tracing method from the ROME paper), we could check how truth is represented in the model and which neurons encode it. We could also combine this approach with the previous idea and see whether the two produce the same results.
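To make the causal tracing idea concrete, here is a minimal sketch on a toy tanh MLP. It is not the ROME implementation: ROME restores hidden states per token and layer in a transformer, while this example restores single units in a small fully connected network. The names `forward`, `causal_trace`, and the `patch` argument are hypothetical and only serve the illustration. The recipe is: run the clean input, run a corrupted input, then rerun the corrupted input while restoring one clean activation at a time and record how much of the target output comes back.

```python
import numpy as np


def forward(weights, x, patch=None):
    """Run a toy tanh MLP (hypothetical stand-in for a real model).

    patch is None or a tuple (layer_index, unit_index, value); when given,
    that single unit is overwritten with `value` after the layer is computed.
    Returns the final activation vector and the list of per-layer activations.
    """
    h = x
    activations = []
    for layer, W in enumerate(weights):
        h = np.tanh(W @ h)
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]  # restore one unit from the clean run
        activations.append(h)
    return h, activations


def causal_trace(weights, x_clean, x_corrupt, target):
    """Measure how much restoring each unit recovers the target output."""
    clean_out, clean_acts = forward(weights, x_clean)
    corrupt_out, _ = forward(weights, x_corrupt)

    n_layers = len(weights)
    width = weights[0].shape[0]  # assumes all layers share one width
    effects = np.zeros((n_layers, width))

    for layer in range(n_layers):
        for unit in range(width):
            restored = (layer, unit, clean_acts[layer][unit])
            out, _ = forward(weights, x_corrupt, patch=restored)
            # Effect of this unit: change relative to the corrupted run.
            effects[layer, unit] = out[target] - corrupt_out[target]

    return clean_out[target], corrupt_out[target], effects


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 4)) for _ in range(3)]
    x_clean = rng.normal(size=4)
    x_corrupt = x_clean + rng.normal(size=4)  # noise as the corruption
    clean, corrupt, effects = causal_trace(weights, x_clean, x_corrupt, target=1)
    print(effects)
```

Units with large entries in `effects` are the ones whose clean activation carries most of the information needed for the target output. In a real language model the same loop would run over hooks on the model's hidden states rather than over a list of weight matrices.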


KayKozaronek · 2023-04-14T10:35:25Z

I am interested in this, but I think it would take some time to understand the ROME paper and the causal tracing method.

I will look into the paper over the weekend and decide whether I can plausibly implement it before the paper deadline.
