This code repository contains an implementation of Graph Agreement Models [1].
Neural structured learning methods such as Neural Graph Machines [2], Graph Convolutional Networks [3], and their variants have successfully combined the expressiveness of neural networks with graph structures to improve performance on learning tasks. Graph Agreement Models (GAM) is a technique that can be applied to these methods to handle the noisy nature of real-world graphs. Traditional graph-based algorithms, such as label propagation, were designed with the underlying assumption that the label of a node can be imputed from those of its neighboring nodes and the edge weights. However, most real-world graphs are either noisy or have edges that do not correspond uniformly to label agreement across the graph. Graph Agreement Models introduce an auxiliary model that predicts the probability of two nodes sharing the same label as a learned function of their features. This agreement model is then used when training a node classification model by encouraging agreement only for those pairs of nodes that it deems likely to have the same label, thus guiding the classifier's parameters to a better local optimum. The classification and agreement models are trained jointly in a co-training fashion.
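To make the regularization idea concrete, here is a minimal sketch (plain NumPy, not the repository's actual code; the function and argument names are illustrative) of how the agreement model's output can gate a pairwise agreement penalty:

```python
import numpy as np

def agreement_regularizer(pred_u, pred_v, agreement_prob):
    """Pairwise regularization term gated by an agreement model.

    pred_u, pred_v: predicted class distributions for the two endpoints of
        each edge, shape (num_edges, num_classes).
    agreement_prob: the agreement model's probability that each edge's
        endpoints share a label, shape (num_edges,), values in [0, 1].
    """
    # Distance between the classifier's predictions at the two endpoints.
    disagreement = np.sum((pred_u - pred_v) ** 2, axis=-1)
    # Edges the agreement model trusts are pulled toward agreement; edges it
    # deems noisy contribute little, instead of being forced to agree.
    return np.mean(agreement_prob * disagreement)
```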
The code is organized into the following folders:
- data: Classes and methods for accessing semi-supervised learning datasets.
- models: Classes and methods for classification models and graph agreement models.
- trainer: Classes and methods for training the classification and agreement models, both individually and in a co-training fashion (a high-level sketch of the co-training loop follows this list).
- experiments: Python run scripts for training Graph Agreement Models on CIFAR10 and other datasets.
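As referenced in the trainer entry above, the co-training schedule alternates between the two models. Below is a high-level sketch in plain Python; the callables and their signatures are illustrative assumptions, not the trainer's actual API:

```python
def cotrain(fit_agreement, fit_classifier, self_label, labeled_ids,
            num_iterations=10):
    """Alternates agreement-model and classifier training, growing the
    labeled set with confidently self-labeled nodes each round.

    The three callables stand in for the corresponding training and
    self-labeling steps; their exact signatures here are assumptions.
    """
    classifier = None
    for _ in range(num_iterations):
        # 1. Fit the agreement model on node pairs drawn from the labeled
        #    set (same label -> target 1, different labels -> target 0).
        agreement_model = fit_agreement(labeled_ids)
        # 2. Train the classifier with the agreement-gated pairwise term
        #    added to its supervised loss.
        classifier = fit_classifier(labeled_ids, agreement_model)
        # 3. Self-label: add the classifier's most confident predictions on
        #    unlabeled nodes to the labeled set for the next round.
        labeled_ids = labeled_ids | self_label(classifier, labeled_ids)
    return classifier
```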
The implementations of Graph Agreement Models (GAMs) are provided in the gam
folder on a strict "as is" basis, without warranties or conditions of any kind.
Also, these implementations may not be compatible with certain TensorFlow
versions (such as 2.0 or above) or Python versions.
More details can be found in our paper, supplementary material, slides or poster.
A bug was discovered in the implementation of the GAM agreement regularization term after publication. We have fixed the bug (PR #82) and have rerun the affected experiments. Below are the updated results (note that the GAM* results are not affected).
Although some of these numbers are lower than what was originally reported, the takeaways presented in our paper still hold: GAM adds a significant boost to the original base models, and also performs better than other forms of regularization reported in our paper. Nevertheless, we apologize for any inconvenience caused by this bug!
To run GAM on a graph-based dataset (e.g., Cora, Citeseer, Pubmed), from this folder run:

$ python3.7 -m gam.experiments.run_train_gam_graph --data_path=<path_to_data>
To run GAM on datasets without a graph (e.g., CIFAR10), from this folder run:

$ python3.7 -m gam.experiments.run_train_gam
We recommend running on a GPU. With CUDA, this can be done by prepending CUDA_VISIBLE_DEVICES=<your-gpu-number> to the run command.
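For example, to train on GPU 0:

$ CUDA_VISIBLE_DEVICES=0 python3.7 -m gam.experiments.run_train_gam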
For running on different datasets and configurations, please check the command line flags in each of the run scripts. The configurations used in our paper can be found in the file run_configs.txt.
To visualize the results in TensorBoard, use the following command, adjusting the dataset name accordingly:

$ tensorboard --logdir=outputs/summaries/cora
An example of such a visualization for a GCN + GAM model on the Cora dataset is the following:
Similarly, we can run with multiple parameter configurations and plot the results together for comparison. An example showing the accuracy per co-training iteration of a GCN + GAM model on the Cora dataset, for 3 runs with 3 different random seeds, is the following:
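Such repeated runs could be launched in a loop like the one below; note that the --seed flag name is a hypothetical placeholder, so check the actual command line flags in the run script:

$ for seed in 1 2 3; do python3.7 -m gam.experiments.run_train_gam --seed=$seed; done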