eliciting-latent-sentiment

Disclaimers

This is research code and we sincerely apologise for not having time to neatly package it up. The most friendly and reusable code can be found in the utils directory; we also draw attention below to other key scripts that might be of use to others. The Dockerfile should be sufficient to run most of this code. We forked CircuitsVis as a submodule in order to make a couple of small changes that generate cleaner figures for the paper.

Acknowledgements

The Stanford Sentiment Treebank was downloaded from https://nlp.stanford.edu/sentiment/index.html

We are very grateful to Neel Nanda for his mentorship and the transformer-lens library.

SERI MATS provided funding for this research.

Cached sentiment direction

The data/gpt2-small directory contains, for each layer, a residual stream direction already computed using DAS on the "simple_train" dataset, stored as a numpy file.
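As a minimal loading sketch (the exact filename pattern here is an assumption, extrapolated from the kmeans example in the next section):

```python
# Minimal loading sketch; the filename pattern is an assumption,
# extrapolated from the kmeans example in the next section.
import numpy as np

layer = 1
direction = np.load(f"data/gpt2-small/das_simple_train_ADJ_layer{layer}.npy")
direction = direction / np.linalg.norm(direction)  # unit-normalize before projecting
print(direction.shape)  # expected (d_model,), i.e. (768,) for gpt2-small
```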

Training the sentiment direction

In fit_directions.py, you can specify (see the illustrative configuration sketch after this list):

  • a list of models (e.g. gpt2-small)
  • a list of methods (e.g. das, kmeans, logistic_regression)
  • a list of training datasets (we generally use simple_train)
  • a list of test datasets to use for evaluation during training (can simply be set to none to save time)
  • a scaffold (only necessary if using Stanford Sentiment Treebank): either continuation or classification.
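For illustration only, the configuration amounts to a handful of lists like the following; the variable names are assumptions rather than the script's actual identifiers:

```python
# Hypothetical configuration; names are assumptions, not the script's
# actual identifiers.
MODELS = ["gpt2-small"]
METHODS = ["das", "kmeans", "logistic_regression"]
TRAIN_DATASETS = ["simple_train"]
TEST_DATASETS = []   # evaluation during training can be skipped to save time
SCAFFOLD = None      # only for Stanford Sentiment Treebank:
                     # "continuation" or "classification"
```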

Then the for-loop between the comments # Training loop and # # END OF ACTUAL DIRECTION FITTING is the critical section.

This writes the directions to numpy files such as data/gpt2-small/kmeans_simple_train_ADJ_layer1.npy. The file names are fairly self-explanatory, but for completeness: there is a directory per model, and the file name encodes the method, training dataset, token position, and residual stream layer.

The only exception is the random directions, which are generated by random_directions.py.
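Given this naming scheme, the fitted directions for a model can be enumerated programmatically, as in this small sketch (the glob pattern simply mirrors the convention described above):

```python
# Enumerate fitted directions following the naming convention
# <method>_<train_dataset>_<position>_layer<L>.npy under data/<model>/.
from pathlib import Path
import numpy as np

directions = {}
for path in Path("data/gpt2-small").glob("*_layer*.npy"):
    directions[path.stem] = np.load(path)
print(sorted(directions))  # e.g. ['kmeans_simple_train_ADJ_layer1', ...]
```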

Patching the sentiment direction

In direction_patching_suite.py, you can select

  • a list of models
  • a list of filename patterns to load as directions
  • a list of evaluation datasets
  • a scaffold (if using Treebank)
  • a list of patching metrics (see utils/circuit_analysis.py::PatchingMetric)

The for-loop at the bottom of the file runs the directional activation patching experiments and writes the results to CSVs with names that begin with direction_patching_. The core idea is sketched below.
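As a rough sketch of the core idea (not the repository's exact implementation; the direction path and prompts here are assumptions): directional activation patching overwrites the component of the clean residual stream along the fitted direction with the corresponding component from a corrupted run, leaving the orthogonal complement untouched.

```python
# Sketch of directional activation patching with transformer-lens;
# not the repository's exact implementation. The direction path and
# prompts are assumptions for illustration.
import numpy as np
import torch
from transformer_lens import HookedTransformer

torch.set_grad_enabled(False)
model = HookedTransformer.from_pretrained("gpt2")

layer = 1
direction = torch.tensor(
    np.load(f"data/gpt2-small/das_simple_train_ADJ_layer{layer}.npy"),  # assumed path
    dtype=torch.float32,
)
direction = direction / direction.norm()

# Cache the corrupted (opposite-sentiment) run.
_, corrupted_cache = model.run_with_cache("The movie was terrible")
hook_name = f"blocks.{layer}.hook_resid_post"

def patch_direction(resid, hook):
    # Swap the clean coefficient along the direction for the corrupted one.
    corrupted = corrupted_cache[hook.name]
    n = min(resid.shape[1], corrupted.shape[1])  # guard against length mismatch
    clean_coef = resid[:, :n] @ direction      # [batch, pos]
    corr_coef = corrupted[:, :n] @ direction   # [batch, pos]
    resid[:, :n] += (corr_coef - clean_coef)[..., None] * direction
    return resid

patched_logits = model.run_with_hooks(
    "The movie was fantastic",
    fwd_hooks=[(hook_name, patch_direction)],
)
```

A patching metric would then compare patched_logits against the clean and corrupted baselines, e.g. via a logit difference.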

Then direction_patching_results.py is a quick and basic script that generates the plots shown in the paper from the CSV files cached in the first step.

Circuit Analyses

The code used to analyze circuits performing various functions can be found in the notebooks prefixed with circuit. mood_inference refers to circuits for the ToyMoodStories dataset or variants thereof; simple_sentiment refers to the ToyMovieReview dataset. In addition, we performed a number of analyses that we did not cover in the paper, e.g. sentiment continuation and classification in Pythia 1.4b.

The circuit analysis notebooks include a range of experiments that examine attention patterns and model components via activation patching. Depending on the task, additional experiments may be included. Each notebook is specific to a dataset and model and can be run top to bottom.

Note: The dataset generation code may be outdated in a few of these notebooks. We have updated the most important ones, but if you find this is the case for a notebook you use, you can replace the dataset generation code with a call to the latest get_dataset function in utils.py (see the sketch below).
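As a purely hypothetical usage sketch (the import path and arguments are assumptions; consult utils.py for the actual signature):

```python
# Hypothetical sketch: import path and arguments are assumptions;
# check utils.py for the real signature of get_dataset.
from utils import get_dataset

dataset = get_dataset(model, device)  # hypothetical arguments
```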

Summarization Experiments

Treebank data

Before the Treebank dataset can be used, first run treebank_data_gen.py to write pickle files locally.
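For example (the output file names are not documented here, so the path below is a placeholder):

```python
# Run `python treebank_data_gen.py` once, then load the generated pickles.
# The file name below is a placeholder; check the script for the actual paths.
import pickle

with open("data/treebank_train.pkl", "rb") as f:  # placeholder path
    treebank = pickle.load(f)
```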
