
Replication Materials

This repository contains code for replicating the main findings in "Users
choose to engage with more partisan news than they are exposed to on Google
Search," a paper on exposure to and engagement with partisan and unreliable
news on Google Search, published in Nature. The collected data types and our
metrics for exposure, engagement, and follows are described in detail in the
paper.

Full paper: https://doi.org/10.1038/s41586-023-06078-5
Preprint: https://arxiv.org/abs/2201.00074



Getting Started

The datasets needed to run this code are documented in the Datasets section
below and are available on Dataverse (https://doi.org/10.7910/DVN/WANAX3).

To run the code in this repository, you need to:

  1. Clone this repository
    git clone https://github.com/gitronald/google-exposure-engagement.git
  2. Follow the instructions for running each replication resource in the sections below:
    Descriptive Analysis: Main descriptive analysis in Jupyter notebooks
    Regression Analysis: Regression analysis and plotting in R and Jupyter notebooks
    Search Queries: Pivoted text scaling in R
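
Before running the notebooks, it can help to confirm that the datasets described
in the Datasets section below are in place. The following is a minimal sketch,
assuming the files have been downloaded from Dataverse and saved under a data/
directory with the file names documented below:

    from pathlib import Path

    # Datasets documented in the Datasets section below.
    expected = [
        "data/users2018.csv",
        "data/users2020.csv",
        "data/coefficients.csv",  # can also be produced by regressions/run_analysis.R
    ]

    missing = [path for path in expected if not Path(path).exists()]
    if missing:
        print("Missing files (download from https://doi.org/10.7910/DVN/WANAX3):")
        print("\n".join(missing))
    else:
        print("All expected datasets found.")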


Datasets

User-level aggregated data for replicating our main findings must be downloaded
from https://doi.org/10.7910/DVN/WANAX3 and placed in a data directory within
the cloned repository. Some columns have been removed to protect participant
privacy. These datasets contain merged columns from all data types -- exposure,
follows, and overall engagement -- and use column prefixes to distinguish among
them, which we list and explain for each dataset below (see also the sketch
after the dataset descriptions). Please see the Methods section of the paper
for additional details and context on each measure.

data/users2018.csv

  • Provides user-level aggregated data for participants from our 2018 study wave.
    Each row represents a participant and has a unique identifier in the caseid
    column, which we have replaced with an auto-incrementing integer in this dataset.
    The column prefixes for distinguishing data types are search_ for exposure,
    follow_ for follows, and browse_ for overall engagement. A secondary measure
    of overall engagement is provided in columns prefixed with history_, representing
    participants' complete Google History.

data/users2020.csv

  • Provides user-level aggregated data for participants from our 2020 study wave.
    Each row represents a participant and has a unique identifier in the user_id
    column, which we have replaced with an auto-incrementing integer in this dataset.
    The column prefixes for distinguishing data types are activity_gs_search_ for
    exposure, activity_gs_follow_ for follows, and browser_history_ for overall
    engagement. A secondary measure of overall engagement is provided in columns
    prefixed with activity_, representing participants' Tab Activity.

data/coefficients.csv

  • Provides regression coefficients, 95% CIs, t-values, and P-values for
    the main regression analysis. Produced in regressions/run_analysis.R and
    used in figure_coefficients.ipynb and table_coefficients.ipynb.
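
To illustrate how the prefixes delineate data types, the sketch below loads the
two user-level files with pandas and counts the columns under each documented
prefix. This is a minimal sketch rather than part of the replication code, and
the handling of the overlap between activity_ and activity_gs_ in the 2020 wave
is our assumption:

    import pandas as pd

    # Column prefixes documented above, per study wave and data type.
    WAVE_PREFIXES = {
        "data/users2018.csv": {
            "exposure": "search_",
            "follows": "follow_",
            "engagement": "browse_",
            "google_history": "history_",   # secondary engagement measure
        },
        "data/users2020.csv": {
            "exposure": "activity_gs_search_",
            "follows": "activity_gs_follow_",
            "engagement": "browser_history_",
            "tab_activity": "activity_",    # secondary engagement measure
        },
    }

    def columns_for(df, prefix):
        """Select columns by prefix; keep the broader activity_ prefix from
        swallowing the activity_gs_* exposure and follow columns."""
        cols = [c for c in df.columns if c.startswith(prefix)]
        if prefix == "activity_":
            cols = [c for c in cols if not c.startswith("activity_gs_")]
        return cols

    for path, prefixes in WAVE_PREFIXES.items():
        users = pd.read_csv(path)
        for data_type, prefix in prefixes.items():
            n_cols = len(columns_for(users, prefix))
            print(f"{path}: {n_cols} columns with prefix '{prefix}' ({data_type})")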


Descriptive Analysis

The descriptive analysis was done primarily in Jupyter notebooks, which we list
below along with brief descriptions. These notebooks import shared utility
functions from functions.py for reformatting data, adjusting plots, and
calculating statistics.

main_results.ipynb:

  • This notebook contains descriptive analyses for 2018 and 2020 data. It creates
    the figures that appear in the main manuscript (excluding the diagram in Figure 1),
    the figures and tables that appear in Extended Data, and the tables that appear in
    Supplementary Information.
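
    To execute this notebook non-interactively (for example, on a server), one
    option is nbconvert's ExecutePreprocessor. A minimal sketch, assuming the
    repository root as the working directory and a Python 3 kernel with the
    notebook's dependencies installed:

        import nbformat
        from nbconvert.preprocessors import ExecutePreprocessor

        # Read the notebook, run all cells in order, and save an executed copy.
        nb = nbformat.read("main_results.ipynb", as_version=4)
        executor = ExecutePreprocessor(timeout=600, kernel_name="python3")
        executor.preprocess(nb, {"metadata": {"path": "."}})  # run relative to the repo root
        nbformat.write(nb, "main_results_executed.ipynb")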

figure_individual_level.ipynb:

  • This notebook loads, reshapes, and plots participant-level distributions of
    partisan news exposure, follows, and engagement. The data needed to run this
    file are not publicly available because only aggregated data may be released.


Regression Analysis

The regression analysis was done using the R scripts in regressions/, and the
plots were made using Jupyter notebooks. Below we list each script and notebook
with a brief description.

regressions/run_analysis.R

  • Runs the regression models and produces the associated output.

regressions/helper_functions.R

  • Helper functions for regression modeling and organizing output.

figure_coefficients.ipynb:

  • This notebook loads, reshapes, and plots regression coefficients and CIs produced
    in run_analysis.R.

table_coefficients.ipynb:

  • This notebook loads and reshapes the regression coefficients, CIs, t-values,
    and P-values produced in run_analysis.R. It outputs LaTeX tables of
    formatted regression results that we further edited by hand to produce
    Extended Data Tables 4-7.
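
For a quick look at data/coefficients.csv outside the notebooks, the sketch
below draws a simple coefficient plot with 95% CIs in matplotlib. The column
names used here (term, estimate, conf_low, conf_high) are placeholders rather
than the file's documented schema, so inspect the CSV header and rename them
accordingly:

    import pandas as pd
    import matplotlib.pyplot as plt

    coefs = pd.read_csv("data/coefficients.csv")

    # NOTE: hypothetical column names; check coefs.columns and adjust.
    term_col, est_col, lo_col, hi_col = "term", "estimate", "conf_low", "conf_high"

    fig, ax = plt.subplots(figsize=(6, 0.4 * len(coefs) + 1))
    y = range(len(coefs))
    ax.errorbar(coefs[est_col], y,
                xerr=[coefs[est_col] - coefs[lo_col], coefs[hi_col] - coefs[est_col]],
                fmt="o", capsize=3)
    ax.set_yticks(list(y))
    ax.set_yticklabels(coefs[term_col])
    ax.axvline(0, color="grey", linewidth=1)
    ax.set_xlabel("Coefficient (95% CI)")
    fig.tight_layout()
    plt.show()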


Search Queries - Pivoted Text Scaling

We used pivoted text scaling to identify features in our participants' search
queries using the R scripts in pivot_scores/. We do not provide the text
data needed to regenerate these scores. Additional details on pivoted text
scaling and how we applied it to search queries are available in the paper.
Below we list each R script with a brief description.

pivot_scores/make_parrot_scores.R

  • Creates pivoted text scaling scores from participants' search queries.

pivot_scores/parrot_functions.R

  • Pivoted text scaling helper functions and pipeline.
