
This page accompanies the paper Compression as a fast measure of motif relevance. It describes which parts of the paper correspond to which classes, how to run the experiment, and where to get the data.

Code

Most of the code used in the paper resides in the nodes library. If you want to reproduce the experiments precisely, you should check out revision 0aa2f354. Otherwise, I recommend the latest version.

  • The code for the motif model and the three null models can be found in the class org.nodes.models.MotifModel. The function sizeBeta(...) corresponds to the degree-sequence model. The other functions use the names from the paper.
  • org.nodes.models.MotifSearchModel implements the Fibonacci search (a simplified, self-contained sketch of this kind of unimodal search follows this list).
  • DPlainMotifExtractor and UPlainMotifExtractor implement the motif sampling.
  • DSequenceEstimator and USequenceEstimator implement the importance sampling algorithm for the degree-sequence model.
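
To illustrate the kind of search MotifSearchModel performs, here is a minimal, self-contained sketch of finding the optimum of a unimodal score over an integer parameter. For simplicity it uses plain ternary search rather than the Fibonacci probe placement the library uses, and the objective in main is a toy stand-in for the compression score; treat it as an illustration of the idea, not as the library's implementation.

```java
import java.util.function.LongToDoubleFunction;

public class UnimodalSearch {

    /**
     * Returns an integer x in [lo, hi] that maximizes a unimodal function f.
     * MotifSearchModel uses Fibonacci-style probe placement; this sketch uses
     * plain ternary search, but the interval-shrinking idea is the same.
     */
    public static long argmax(LongToDoubleFunction f, long lo, long hi) {
        while (hi - lo > 2) {
            long third = (hi - lo) / 3;
            long m1 = lo + third;
            long m2 = hi - third;

            // For a unimodal objective, this comparison tells us which side
            // of the interval cannot contain the maximum.
            if (f.applyAsDouble(m1) < f.applyAsDouble(m2))
                lo = m1 + 1;
            else
                hi = m2 - 1;
        }

        // At most three candidates remain; evaluate them directly.
        long best = lo;
        for (long x = lo + 1; x <= hi; x++)
            if (f.applyAsDouble(x) > f.applyAsDouble(best))
                best = x;

        return best;
    }

    public static void main(String[] args) {
        // Toy unimodal stand-in for a compression score, peaked at x = 42.
        long best = argmax(x -> -(x - 42.0) * (x - 42.0), 0, 1000);
        System.out.println("maximum found at x = " + best);
    }
}
```

Note that this version re-evaluates the objective several times per point; with an expensive objective (such as recompressing the graph) you would cache evaluations, which is exactly what the Fibonacci placement is designed to make easy.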

The experiments can be found in the repository Lilian-experimental, in the package org.lilian.motifs:

  • The main experiments are contained in Sythetic.java, Compare.java, and CompareLarge.java.
  • The coverage experiment from the supplementary information is contained in Coverage.java.

These experiments were run using an experimental workflow system called ducktape, which is no longer supported; I would not recommend using it. Instead, it is probably easiest to copy/paste the code and tweak it as needed.

The Python script that produces the plots shown in the paper can be found here. These plots use the CSV and edgelist files produced by the Java experiments.

I hope to make this code more user-friendly in the future, but these instructions should be enough to get you started if you're interested. If you have trouble with anything, please don't hesitate to ask questions: email p "at" peterbloem "dot" nl, contact @pbloemesquire on Twitter, or open a ticket here on GitHub.

Data

Most data comes from the Koblenz network collection. Using the name given in the paper and the network statistics, you should be able to find the correct files. They can be read with the methods in org.nodes.data.Data.
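
If you just want a quick look at one of these files outside the library, here is a minimal plain-Java sketch that reads a whitespace-separated edge list into an adjacency map. The file name is a placeholder, and the assumptions about the layout ("%"-prefixed comment lines, source and target in the first two columns) should be checked against the file you actually download; for the experiments themselves, use the loaders in org.nodes.data.Data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EdgeListPeek {
    public static void main(String[] args) throws IOException {
        // "out.network" is a placeholder for an edge list file from the
        // Koblenz collection; such files typically contain "%"-prefixed
        // comment lines followed by whitespace-separated node pairs
        // (possibly with extra columns), but check the file you download.
        Map<String, Set<String>> adjacency = new HashMap<>();
        long links = 0;

        try (BufferedReader in = Files.newBufferedReader(Paths.get("out.network"))) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("%"))
                    continue; // skip blank lines and comments

                String[] tokens = line.split("\\s+");
                String from = tokens[0], to = tokens[1];

                // Store the link in both directions (undirected view).
                adjacency.computeIfAbsent(from, k -> new HashSet<>()).add(to);
                adjacency.computeIfAbsent(to, k -> new HashSet<>()).add(from);
                links++;
            }
        }

        System.out.println(adjacency.size() + " nodes, " + links + " links");
    }
}
```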

A few of the datasets from the paper can also be loaded directly using the class org.nodes.data.Examples.
