xstats.MINE is a Python library wrapping the Maximal Information-based Nonparametric Exploration (MINE) statistical library, which is currently available only as a Java implementation.
xstats.MINE can be used either with the Jython interpreter or with Python through JPype.
MINE is a set of statistics that can be used to identify important relationships in datasets and characterize these relationships. Given a relationship between two vectors of scalars, MINE produces the following scores:
- MIC (maximum information coefficient), which captures relationship strength
- MAS (maximum asymmetry score), which captures departure from monotonicity
- MEV (maximum edge value), which captures closeness to being a function
- MCN (minimum cell number), which captures complexity
A complete description of MINE and examples of use can be found in the following article: D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander, M. Mitzenmacher, P. Sabeti. Detecting novel associations in large datasets. Science 334, 6062 (2011).
Contact: Aurelien Mazurie <[email protected]>
Keywords: MINE, Statistics, Python, Jython, JPype
Please follow these steps to ensure a proper installation of xstats.MINE; note that the JPype installation step can be skipped if you only intend to use xstats.MINE with Jython.
The file MINE.jar, which you can retrieve at http://www.exploredata.net/Downloads/MINE-Application, must be downloaded to your computer. It is advisable to place this file in a stable location, e.g., a directory on your computer dedicated to Java .jar files.
Once downloaded, MINE.jar must be made visible to the Java interpreter that lies behind Jython and JPype. This typically means adding the path to this file (wherever you placed it) to the CLASSPATH environment variable. If you are not familiar with the concept of environment variables, a quick introduction is available here.
Depending on whether you are using Windows or a flavor of Unix, the technique for modifying CLASSPATH differs slightly. A good tutorial is available here; simply replace references to PATH with references to CLASSPATH.
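To confirm that the setting took effect, CLASSPATH can be inspected from Python. The following is a minimal sketch and not part of xstats.MINE: the helper name `jars_on_classpath` is hypothetical, and it only recognizes entries that name a `.jar` file explicitly (wildcard entries such as `lib/*` are ignored).

```python
import os

def jars_on_classpath(classpath=None):
    """Return the base names of the .jar entries listed in a CLASSPATH string.

    Hypothetical helper, not part of xstats.MINE. When no string is given,
    the CLASSPATH environment variable is read instead.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    # Entries are separated by ';' on Windows and ':' on Unix (os.pathsep).
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return [os.path.basename(entry) for entry in entries
            if entry.lower().endswith(".jar")]

if __name__ == "__main__":
    if "MINE.jar" in jars_on_classpath():
        print("MINE.jar is listed in CLASSPATH")
    else:
        print("MINE.jar is not listed in CLASSPATH")
```

The check only tells you that the entry is present in the variable; it does not verify that the file exists or that Java can load it.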
Please note that this version of xstats.MINE is compatible with MINE.jar versions 1.0.1b through 1.0.1d.
If you plan to use xstats.MINE with Python, you need to have JPype installed first. An easy way to do so, if you have setuptools installed, is to type
easy_install JPype
(see the relevant documentation)
You will also need to download the commons-io-X.X.jar file from http://commons.apache.org/io/, where X.X is the version of the Commons IO library (2.1 at the time of writing). This file must be declared in your CLASSPATH the same way you did for MINE.jar; see the instructions in Step 1.
Finally, to install xstats.MINE for both Python and Jython, please follow these steps:
- Download the latest version of the library from http://github/ajmazurie/xstats.MINE/downloads
- Unzip the downloaded file, and cd into the resulting directory
- Run python setup.py install
To update xstats.MINE to a newer version, just repeat Step 3.
The method analyze_pair() can be used to calculate the various MINE scores on a pair of scalar vectors. For example,
import xstats.MINE

x = [40,50,None,70,80,90,100,110,120,130,140,150,
     160,170,180,190,200,210,220,230,240,250,260]
y = [-0.07,-0.23,-0.1,0.03,-0.04,None,-0.28,-0.44,-0.09,0.12,0.06,
     -0.04,0.31,0.59,0.34,-0.28,-0.09,-0.44,0.31,0.03,0.57,0,0.01]

print "x y", xstats.MINE.analyze_pair(x, y)
will return the following scores:
{'MCN': 2.5849625999999999, 'MAS': 0.040419996, 'pearson': 0.31553724, 'MIC': 0.38196000000000002, 'MEV': 0.27117000000000002, 'non_linearity': 0.28239626000000001}
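The non_linearity value appears to be MIC minus the square of the Pearson coefficient, which is the nonlinearity measure used in the Reshef et al. article. This is inferred from the numbers above rather than stated by the library, so treat the formula in the short check below as an assumption.

```python
# Scores copied from the analyze_pair() output shown above.
scores = {
    "MIC": 0.38196000000000002,
    "pearson": 0.31553724,
    "non_linearity": 0.28239626000000001,
}

# Assumption: non_linearity = MIC - pearson ** 2.
recomputed = scores["MIC"] - scores["pearson"] ** 2

# Compare the recomputed value with the one reported by the library.
difference = abs(recomputed - scores["non_linearity"])
print("recomputed non_linearity: %.6f" % recomputed)
print("difference from reported value: %.1e" % difference)
```

A small difference is expected, because the reported scores are themselves rounded.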
The method analyze_file() can be used to calculate the various MINE scores on values read from a comma- or tab-delimited file. The function can consider all pairs of variables in the file, only adjacent variables, or compare all variables in turn against a master variable.
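To make the three comparison strategies concrete, the sketch below lists which pairs of column indices each one would cover. It is illustrative only: the function `variable_pairs` and the mode labels "all", "adjacent" and "master" are hypothetical names, not constants or functions defined by xstats.MINE.

```python
from itertools import combinations

def variable_pairs(n_variables, mode, master=0):
    """List the (i, j) index pairs compared under each strategy.

    Hypothetical helper: the mode labels are not xstats.MINE constants.
    """
    if mode == "all":
        # Every unordered pair of distinct variables.
        return list(combinations(range(n_variables), 2))
    if mode == "adjacent":
        # Each variable against the next one in the file.
        return [(i, i + 1) for i in range(n_variables - 1)]
    if mode == "master":
        # One master variable against every other variable.
        return [(master, j) for j in range(n_variables) if j != master]
    raise ValueError("unknown mode: %r" % (mode,))

for mode in ("all", "adjacent", "master"):
    print("%s: %s" % (mode, variable_pairs(4, mode)))
```

For n variables, the "all" strategy covers n(n-1)/2 pairs, while the other two cover n-1 pairs each, which matters for large files.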
If the input file has a .csv extension the function will assume it is a comma-delimited file; if not it assumes it is a tab-delimited file.
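The extension rule can be pictured with the following sketch. The helper `guess_delimiter` is hypothetical and is not the code used inside xstats.MINE; in particular, matching the extension case-sensitively is an assumption.

```python
import os

def guess_delimiter(filename):
    """Pick a field delimiter from the file extension.

    Hypothetical helper mirroring the rule described above:
    a .csv extension means comma-delimited, anything else means tab-delimited.
    """
    extension = os.path.splitext(filename)[1]
    return "," if extension == ".csv" else "\t"
```

With this rule, a comma-delimited file saved as data.txt would be read as tab-delimited, so it is worth renaming such files to .csv before analyzing them.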
For example, analyzing the Spellman.csv file which can be found at http://www.exploredata.net/Downloads/Gene-Expression-Data-Set
import xstats.MINE

for a, b, scores in xstats.MINE.analyze_file("Spellman.csv", xstats.MINE.MASTER_VARIABLE, 0, cv = 0.7):
    print a, b, scores
will display the following (only the first lines are shown; lines are truncated):
time YER044C {'MCN': 2.5849625999999999, 'MAS': 0.16225999999999999, ...}
time YNL178W {'MCN': 2.5849625999999999, 'MAS': 0.46802998000000001, ...}
time YCR098C {'MCN': 2.0, 'MAS': 0.0, ...}
time YEL050C {'MCN': 2.0, 'MAS': 0.0, ...}
Note that this example replicates the one shown in the MINE documentation (see http://www.exploredata.net/Usage-instructions/Parameters):
java -jar MINE.jar Spellman.csv 0 cv=0.7
xstats.MINE is released under an MIT/X11 license.
MINE.jar is released under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license by its authors.