A data-driven framework for mapping domains of human neurobiology

Code repository for the article in Nature Neuroscience by Elizabeth Beam, Christopher Potts, Russell Poldrack, & Amit Etkin

Abstract

Functional neuroimaging has been a mainstay of human neuroscience for the past 25 years. Interpretation of fMRI data has often occurred within knowledge frameworks crafted by experts, which have the potential to amplify biases that limit the replicability of findings. Here, we employ a computational approach to derive a data-driven framework for neurobiological domains that synthesizes the texts and data of nearly 20,000 human neuroimaging articles. Across multiple levels of domain specificity, the structure-function links within domains better replicate in held-out articles than those mapped from dominant frameworks in neuroscience and psychiatry. We further show that the data-driven framework partitions the literature into modular subfields, for which domains serve as generalizable prototypes of structure-function patterns in single articles. The approach to computational ontology we present here is the most comprehensive characterization of human brain circuits quantifiable with fMRI and may be extended to synthesize other scientific literatures.

Pipelines

Data-driven framework

Approach to computational ontology. A data-driven framework was generated in an integrative manner from a training set of 12,708 human neuroimaging articles with brain coordinate data. First, 118 brain structures were clustered by k-means according to their co-occurrences with 1,683 terms for mental functions, with the co-occurrence matrix weighted by pointwise mutual information (PMI). Second, the top 25 terms for mental functions were assigned to each circuit based on the point-biserial correlation (r_pb) of their binarized occurrences with the centroid of occurrences across structures. Third, the number of terms was selected, over a range of term list lengths from 5 to 25, to maximize the average ROC-AUC of logistic regression classifiers predicting structure occurrences from term occurrences (forward inference) and term occurrences from structure occurrences (reverse inference). Fourth, the number of domains was selected based on the average ROC-AUC of forward and reverse inference classifiers. Occurrences were summed across terms in each list and structures in each circuit, then thresholded by their mean across articles. Fifth, each domain was named by the mental function term with the highest degree centrality of co-occurrences with other terms in the domain.
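As a rough illustration of the PMI weighting in the first step, the sketch below builds a structure-by-term co-occurrence matrix from binary article annotations and weights it by PMI. It is not taken from ontology/ontology.py: the function names, the toy data, the normalization by the total co-occurrence count, and the convention of setting undefined PMI values to 0 are all assumptions made for this example.

```python
import math

def cooccurrence(structs, terms):
    """Count the articles in which each structure and each term both occur.

    structs: per-article lists of 0/1 flags over brain structures
    terms:   per-article lists of 0/1 flags over mental function terms
    """
    n_structs, n_terms = len(structs[0]), len(terms[0])
    counts = [[0] * n_terms for _ in range(n_structs)]
    for s_row, t_row in zip(structs, terms):
        for i, s in enumerate(s_row):
            if not s:
                continue
            for j, t in enumerate(t_row):
                if t:
                    counts[i][j] += 1
    return counts

def pmi(counts, positive=False):
    """Weight a co-occurrence matrix by pointwise mutual information.

    Assumption: probabilities are estimated from the co-occurrence counts
    themselves, so PMI(i, j) = log(c_ij * total / (row_i * col_j)).
    Cells with zero co-occurrence are set to 0 (hypothetical convention).
    With positive=True, negative values are clipped to 0 (PPMI).
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weighted = []
    for i, row in enumerate(counts):
        out_row = []
        for j, c in enumerate(row):
            if c == 0:
                out_row.append(0.0)
                continue
            value = math.log(c * total / (row_sums[i] * col_sums[j]))
            out_row.append(max(value, 0.0) if positive else value)
        weighted.append(out_row)
    return weighted

# Toy data: 4 articles, 2 structures, 2 terms (illustrative only).
toy_structs = [[1, 0], [1, 0], [0, 1], [0, 1]]
toy_terms = [[1, 0], [1, 0], [0, 1], [1, 1]]
toy_counts = cooccurrence(toy_structs, toy_terms)
toy_pmi = pmi(toy_counts)
```

Each row of the weighted matrix is then a feature vector for one brain structure, which is the kind of input a k-means routine would cluster into circuits.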

Expert-determined frameworks

Approach to mapping expert-determined frameworks for brain function (RDoC) and mental illness (DSM). Seed terms from the RDoC and DSM frameworks were translated into the language of the human neuroimaging literature through a computational linguistics approach. Term embeddings of length 100 were trained using GloVe. For RDoC, embeddings were trained on a general human neuroimaging corpus of 29,828 articles (Supplementary Fig. 1b). For the DSM, embeddings were trained on a psychiatric human neuroimaging corpus of 26,070 articles (Supplementary Fig. 1c). Candidate synonyms included terms for mental functions in the case of RDoC and for both mental functions and psychopathology in the case of the DSM, as detailed in Supplementary Table 2. In the first step, the closest synonyms of seed terms were identified based on the cosine similarity of synonym term embeddings with the centroid of embeddings across seed terms in each domain. Second, the number of terms for each domain was selected to maximize cosine similarity with the centroid of seed terms. Third, the mental function term lists for each domain were mapped onto brain circuits based on positive pointwise mutual information (PPMI) of term and structure co-occurrences across the corpus of 18,155 articles with activation coordinate data (Supplementary Fig. 1a). Structures were included in the circuit if the FDR of the observed PPMI was less than 0.01, determined by comparison to a null distribution generated by shuffling term list features over 10,000 iterations.
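The synonym selection in the first step can be sketched as ranking candidate terms by cosine similarity to the centroid of the seed term embeddings. This is a minimal sketch rather than the repository's implementation: the function names and the 2-dimensional toy embeddings are hypothetical (the embeddings described above have length 100 and are trained with GloVe).

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def closest_synonyms(embeddings, seeds, candidates, n_terms):
    """Return the n_terms candidates most similar to the seed centroid."""
    seed_centroid = centroid([embeddings[s] for s in seeds])
    ranked = sorted(
        candidates,
        key=lambda term: cosine(embeddings[term], seed_centroid),
        reverse=True,
    )
    return ranked[:n_terms]

# Toy embeddings (hypothetical values, 2 dimensions for readability).
toy_embeddings = {
    "fear": [1.0, 0.0],
    "threat": [0.9, 0.1],
    "anxiety": [0.8, 0.3],
    "reward": [0.0, 1.0],
    "memory": [-1.0, 0.2],
}
toy_synonyms = closest_synonyms(
    toy_embeddings,
    seeds=["fear", "threat"],
    candidates=["anxiety", "reward", "memory"],
    n_terms=2,
)
```

In this toy case "anxiety" ranks first because its embedding points in nearly the same direction as the seed centroid. Choosing the list length, as in the second step, would amount to repeating the ranking for several values of `n_terms` and comparing each resulting list against the seed centroid.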

Index of Figures

Main Text

Figure	Files
1b	ontology/ontol_data-driven_lr.ipynb, ontology/ontology.py
1c	partition/part_splits.ipynb, partition/partition.py
1d	modularity/mod_kvals_lr.ipynb
1e	prototype/proto_kvals_lr.ipynb
2a	ontology/ontol_data-driven_lr.ipynb
2b	prediction/comp_frameworks_lr_k.ipynb, modularity/comp_frameworks_lr_k.ipynb, prototype/comp_frameworks_lr_k*.ipynb
2c	hierarchy/hier_data-driven_lr_k6-8-22.ipynb
3b	ontology/ontol_rdoc.ipynb, ontology/ontology.py
4a	ontology/ontol_rdoc.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
4b	ontology/ontol_data-driven_lr.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
4c	ontology/ontol_ontol_dsm.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
5b, e	prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5c, f	prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5d, g	prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5h	prediction/comp_frameworks_lr.ipynb
6a-f	mds/mds.ipynb, mds/mds.py
6g	modularity/mod_data-driven_lr.ipynb, modularity/modularity.py
6h	modularity/mod_rdoc.ipynb, modularity/modularity.py
6i	modularity/mod_dsm.ipynb, modularity/modularity.py
6j	modularity/comp_frameworks_lr.ipynb, modularity/modularity.py
6k	prototype/proto_data-driven_lr.ipynb, prototype/prototype.py
6l	prototype/proto_rdoc.ipynb, prototype/prototype.py
6m	prototype/proto_dsm.ipynb, prototype/prototype.py
6n	prototype/comp_frameworks_lr.ipynb, prototype/prototype.py

Extended Data

Figure	Files
1	corpus/cohorts.ipynb
2-3	ontology/ontol_kvals_lr.ipynb, ontology/ontology.py
4a-b	ontology/ontol_data-driven_nn.ipynb, ontology/ontology.py
4c	mds/mds.ipynb, mds/mds.py
4d	modularity/mod_data-driven_nn.ipynb, modularity/modularity.py
4e	prototype/proto_data-driven_nn.ipynb, prototype/prototype.py
5a	ontology/ontol_data-driven_terms.ipynb, ontology/ontol_sim_terms.ipynb, ontology/ontology.py
5b-e	ontology/ontol_sim_terms.ipynb
6a, d	prediction/comp_frameworks_lr_k09.ipynb
6b-c, e-f	prediction/pred_data-driven_lr_k09.ipynb
6g-h	partition/part_data-driven_lr_k09.ipynb, mds/mds.ipynb
6i Left	modularity/comp_frameworks_lr_k09.ipynb
6i Right	modularity/mod_data-driven_lr_k09.ipynb
6j Left	prototype/comp_frameworks_lr_k09.ipynb
6j Right	prototype/proto_data-driven_lr_k09.ipynb
7b, e	prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7c, f	prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7d, g	prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7h-j	prediction/comp_frameworks_lr.ipynb
8b, e; 9b, e	prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8c, f; 9c, f	prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8d, g; 9d, g	prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8h; 9h-j	prediction/comp_frameworks_nn.ipynb
10a	partition/part_data-driven_lr.ipynb, partition/partition.py
10b	partition/part_rdoc.ipynb, partition/partition.py
10c	partition/part_dsm.ipynb, partition/partition.py
10d-f	tsne/tsne.ipynb

Supplementary Material

Figure	Files
1	validation/val_brainmap_top.ipynb
2	validation/val_brainmap_sims.ipynb
3-4	ontology/ontol_kvals_nn.ipynb, ontology/ontology.py
5	stability/stab_data-driven_lr_top.ipynb
6a, d; 7a, d	prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6b, e; 7b, e	prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6c, f; 7c, f	prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6g; 7g-i	prediction/comp_frameworks_lr.ipynb
8a, d; 9a, d	prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8b, e; 9b, e	prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8c, f; 9c, f	prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8g; 9g-i	prediction/comp_frameworks_nn.ipynb

Table	Files
1	data/data_table_coord.ipynb
2	lexicon/preproc_cogneuro.py, lexicon/preproc_psychiatry.py, lexicon/preproc_rdoc.py, lexicon/preproc_dsm.py
3	data/text/pubmed/gen_190428/query.txt, data/text/pubmed/psy_190428/query.txt
4-5	prediction/table_lr-nn.ipynb
