A collection of papers and codebases about influence, causality, and language.
Pull requests welcome!
Type | Description | Code |
---|---|---|
Semi-simulated | Given text (Amazon reviews), extracts treatments (0 or 5 stars) and confounds (product type), then samples outcomes (sales) conditioned on the extracted treatments and confounds. | git |
Fully synthetic | Samples outcomes, treatments, and confounds from binomial distributions, then words from a uniform distribution conditioned on those sampled variables. | git |
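
The fully synthetic generator can be pictured with a short sketch. This is a minimal illustration under stated assumptions, not the code behind the `git` link: the word pools, the probabilities, the dependency order (confound, then treatment, then outcome), and the names `sample_row` and `sample_dataset` are all hypothetical. For brevity the words here depend only on the sampled treatment and confound.

```python
import random

# Hypothetical word pools; the real generator's vocabulary is not shown in this list.
BASE_WORDS = ["the", "a", "item", "thing", "very"]
TREATMENT_WORDS = {0: ["plain", "basic"], 1: ["great", "superb"]}
CONFOUND_WORDS = {0: ["book", "novel"], 1: ["phone", "charger"]}


def sample_row(rng, doc_len=10):
    """Draw one (confound, treatment, outcome, text) example."""
    # Binomial draws with n=1 (Bernoulli). All probabilities are made up.
    confound = int(rng.random() < 0.5)
    treatment = int(rng.random() < (0.7 if confound else 0.3))
    outcome = int(rng.random() < 0.2 + 0.4 * treatment + 0.3 * confound)
    # Words are drawn uniformly from a pool selected by the sampled variables.
    pool = BASE_WORDS + TREATMENT_WORDS[treatment] + CONFOUND_WORDS[confound]
    text = " ".join(rng.choice(pool) for _ in range(doc_len))
    return {
        "confound": confound,
        "treatment": treatment,
        "outcome": outcome,
        "text": text,
    }


def sample_dataset(n_rows, seed=0):
    """Draw n_rows examples with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [sample_row(rng) for _ in range(n_rows)]
```

Because the confound drives both the treatment and the outcome, a naive comparison of outcomes across treatment groups on such data is biased, which is what makes it useful for testing adjustment methods.
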
Title | Description | Code |
---|---|---|
Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates Katherine A. Keith, David Jensen, and Brendan O’Connor | Survey of studies that use text to remove confounding. Also highlights numerous open problems in the space of text and causal inference. | |
Text Feature Selection for Causal Inference Reid Pryzant and Dan Jurafsky | Blog post about text as treatment (operationalized through lexicons). | git |
Econometrics Meets Sentiment: An Overview of Methodology and Applications Andres Algaba, David Ardia, Keven Bluteau, Samuel Borms, and Kris Boudt | Survey summarizing various methods to transform alternative data (with a focus on text) into a variable and use it in econometric models. Includes applications throughout. | git |
Title | Description | Code |
---|---|---|
Causal Effects of Linguistic Properties Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, Dhanya Sridhar | Develops an adjustment procedure for text-based causal inference with classifier-based treatments. Proves bounds on the bias. | git |
Challenges of Using Text Classifiers for Causal Inference Zach Wood-Doughty, Ilya Shpitser, Mark Dredze | Looks at different errors that can stem from estimating treatment labels with classifiers, and proposes adjustments to account for those errors. | git |
Deconfounded Lexicon Induction for Interpretable Social Science Reid Pryzant, Kelly Shen, Dan Jurafsky, Stefan Wager | Looks at the effect of text as manifested in lexicons or individual words. Proposes algorithms for estimating effects and evaluating lexicons. | git |
How to Make Causal Inferences Using Texts Naoki Egami, Christian J. Fong, Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart | (Also text as outcome.) Covers assumptions needed for text as treatment and concludes that you should use a train/test split. | |
Discovery of treatments from text corpora Christian Fong, Justin Grimmer | Proposes a new experimental design and statistical model to simultaneously discover treatments in a corpus and estimate causal effects for these discovered treatments. | |
The effect of wording on message propagation: Topic and author-controlled natural experiments on twitter Chenhao Tan, Lillian Lee, and Bo Pang | Controls for confounding by looking at tweets containing the same URL and written by the same user but employing different wording. | |
When do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception using Individual Treatment Effect Estimation Zhao Wang and Aron Culotta | Measures the effect of words on readers' perception. Compares multiple quasi-experimental methods. | git |
Title | Description | Code |
---|---|---|
Adapting Text Embeddings for Causal Inference Victor Veitch, Dhanya Sridhar, and David Blei | (Also text as confounder.) Adapts BERT embeddings for causal inference by predicting propensity scores and potential outcomes alongside the masked language modeling objective. | tensorflow pytorch |
Title | Description | Code |
---|---|---|
Estimating Causal Effects of Tone in Online Debates Dhanya Sridhar and Lise Getoor | (Also text as confounder.) Looks at the effect of reply tone on the sentiment of subsequent responses in online debates. | git |
How Judicial Identity Changes the Text of Legal Rulings Michael Gill and Andrew Hall | Looks at how the random assignment of a female judge or a non-white judge affects the language of legal rulings. | |
Measuring semantic similarity of clinical trial outcomes using deep pre-trained language representations Anna Koroleva, Sanjay Kamath, Patrick Paroubek | | |
Title | Description | Code |
---|---|---|
Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates Katherine A. Keith, David Jensen, and Brendan O’Connor | Survey of studies that use text to remove confounding. Also highlights numerous open problems in the space of text and causal inference. | |
Adjusting for confounding with text matching Margaret E Roberts, Brandon M Stewart, and Richard A Nielsen | Estimates a low-dimensional summary of the text and conditions on this summary via matching to remove confounding. Proposes a method of text matching, topical inverse regression matching, that matches on both the topical content and the propensity score. | |
Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, L Jason Anastasopoulos | Characterizes and empirically evaluates a framework for matching text documents that decomposes existing methods into two choices: the text representation and the distance metric. | |
Learning representations for counterfactual inference Fredrik Johansson, Uri Shalit, David Sontag | One of their semi-synthetic experiments has news content as a confounder. | |
Title | Description | Code |
---|---|---|
CausaLM: Causal Model Explanation Through Counterfactual Language Models Amir Feder, Nadav Oved, Uri Shalit and Roi Reichart | Suggested a method for generating causal explanations through counterfactual language representations. | git |
Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer and Stuart Shieber | Uses causal mediation analysis to interpret NLP models. | git |
Title | Description | Code |
---|---|---|
Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals Zhao Wang and Aron Culotta | Matching to identify causal terms, then generate counterfactuals for training. | git |
Identifying Spurious Correlations for Robust Text Classification Zhao Wang and Aron Culotta | Matching to identify spurious word features. | git |
Discovering and Controlling for Latent Confounds in Text Classification Using Adversarial Domain Adaptation Virgile Landeiro, Tuan Tran, and Aron Culotta | Control for unobserved confounders in text classification. | |
Robust Text Classification under Confounding Shift Virgile Landeiro and Aron Culotta | Control for changing confounders in text classification. | git |
Title | Description | Code |
---|---|---|
Decoupling entrainment from consistency using deep neural networks Andreas Weise, Rivka Levitan | Isolated the individual style of a speaker when modeling entrainment in speech. | |
Estimating causal effects of exercise from mood logging data Dhanya Sridhar, Aaron Springer, Victoria Hollis, Steve Whittaker, Lise Getoor | Confounder: Text of mood triggers. Confounding adjustment method: Propensity score matching. | |
Title | Description | Code |
---|---|---|
Predicting Sales from the Language of Product Descriptions Reid Pryzant, Young-Joo Chung, and Dan Jurafsky | Found features of product descriptions most predictive of sales while controlling for brand & price. | git |
Interpretable Neural Architectures for Attributing an Ad’s Performance to its Writing Style Reid Pryzant, Kazoo Sone, and Sugato Basu | Found features of ad copy most predictive of high CTR while controlling for advertiser and targeting. | git |
Title | Description | Code |
---|---|---|
Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online Emaad Manzoor, George H. Chen, Dokyun Lee, Michael D. Smith | Controls for unstructured argument text using neural models of language in the double machine-learning framework. | |
Title | Description | Code |
---|---|---|
The language of social support in social media and its effect on suicidal ideation risk Munmun De Choudhury and Emre Kiciman | Confounder: Previous text written in a Reddit forum. Confounding adjustment method: Stratified propensity score matching. | |
Discovering shifts to suicidal ideation from mental health content in social media Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, Mrinal Kumar | Confounder: User’s previous posts and comments received. Confounding adjustment method: Stratified propensity score matching. | |
Title | Description | Code |
---|---|---|
Increasing vegetable intake by emphasizing tasty and enjoyable attributes: A randomized controlled multisite intervention for taste-focused labeling Bradley Turnwald, Jaclyn Bertoldo, Margaret Perry, Peggy Policastro, Maureen Timmons, Christopher Bosso, Priscilla Connors, Robert Valgenti, Lindsey Pine, Ghislaine Challamel, Christopher Gardner, Alia Crum | Ran an RCT on cafeteria food labels, observing the effect on how much of those foods students took. | |
A social media study on the effects of psychiatric medication use Koustuv Saha, Benjamin Sugar, John Torous, Bruno Abrahao, Emre Kıcıman, Munmun De Choudhury | Confounder: Users' previous posts on Twitter. Confounding adjustment method: Stratified propensity score matching. | |
Title | Description | Code |
---|---|---|
A deep causal inference approach to measuring the effects of forming group loans in online non-profit microfinance platform Thai T Pham and Yuanyuan Shen | Confounder: Microloan descriptions on Kiva. Confounding adjustment method: A-IPTW, TMLE on embeddings. | |
Title | Description | Code |
---|---|---|
Unsupervised Discovery of Implicit Gender Bias | Propensity score matching and adversarial learning to get a model to focus on bias instead of other artifacts. | |
Tweetment Effects on the Tweeted: Experimentally Reducing Racist Harassment Kevin Munger | Ran an RCT sending de-escalation messages to racist Twitter users, varying the "from" user and observing effects on downstream behavior. | |
Title | Description | Code |
---|---|---|
Estimating the effect of exercising on users online behavior Seyed Amin Mirlohi Falavarjani, Hawre Hosseini, Zeinab Noorian, Ebrahim Bagheri | Confounder: Pre-treatment topical interest shift. Confounding adjustment method: Matching on topic models. | |
Distilling the outcomes of personal experiences: A propensity-scored analysis of social media Alexandra Olteanu, Onur Varol, Emre Kiciman | Confounder: Past word use on Twitter. Confounding adjustment method: Stratified propensity score matching. | |
Using longitudinal social media analysis to understand the effects of early college alcohol use Emre Kiciman, Scott Counts, Melissa Gasser | Confounder: Previous posts on Twitter. Confounding adjustment method: Stratified propensity score matching. | |
Using Matched Samples to Estimate the Effects of Exercise on Mental Health from Twitter Virgile Landeiro and Aron Culotta | Confounder: Gender, location, profile. Confounding adjustment method: Matching. | git |
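
Several entries in this list adjust for text confounders with stratified propensity scores. The sketch below shows the general idea (subclassification on the propensity score) and is not the procedure of any specific paper listed here. It assumes the propensity scores have already been estimated, for example by a classifier over a user's earlier posts. The function name `stratified_effect` and the equal-size strata are illustrative choices.

```python
def stratified_effect(treatments, outcomes, propensities, n_strata=5):
    """Estimate an average treatment effect by propensity-score stratification.

    Units are sorted by propensity score and cut into equal-size strata.
    Within each stratum the treated-minus-control mean difference is taken,
    and the differences are averaged with weights equal to stratum size.
    """
    order = sorted(range(len(propensities)), key=lambda i: propensities[i])
    n = len(order)
    weighted_sum = 0.0
    total_weight = 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        treated = [outcomes[i] for i in idx if treatments[i] == 1]
        control = [outcomes[i] for i in idx if treatments[i] == 0]
        if not treated or not control:
            continue  # no overlap in this stratum, so it cannot be compared
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += diff * len(idx)
        total_weight += len(idx)
    if total_weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / total_weight


# Toy data: high-propensity units have higher baseline outcomes,
# and the true effect is 2 in every stratum.
treatments = [1, 0, 0, 0, 1, 1, 1, 0]
outcomes = [3, 1, 1, 1, 12, 12, 12, 10]
propensities = [0.1, 0.2, 0.2, 0.3, 0.7, 0.8, 0.8, 0.9]
estimate = stratified_effect(treatments, outcomes, propensities, n_strata=2)
```

On this toy data the unadjusted difference in means is 6.5, while comparing units only within strata recovers the effect of 2 that was built in.
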