-
Notifications
You must be signed in to change notification settings - Fork 5
/
INSTALL.txt
253 lines (179 loc) · 8.94 KB
/
INSTALL.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
.. -*- rest -*-
.. vim:syntax=rest
.. NB! Keep this document a valid restructured document.
Building and installing the Approximate Bayesian Inference Toolkit (ApBsInT)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
:Authors: Matthias Seeger <[email protected]>
.. Contents::
Introduction
============
The Approximate Bayesian Inference Toolkit (ApBsInT) is a lightweight
implementation of a number of algorithms for approximate Bayesian inference,
applicable to a range of probabilistic models. The range of approximations,
algorithms and supported models is set to grow. At present, the main focus
is as follows.
1) *Probabilistic models*: Factor graphs of continuous variables. Each factor
is a univariate potential of a linear combination of the variables.
* Generalized linear models
- Linear regression: Gaussian noise, double exponential noise
- Binary classification: probit and logistic regression
- Count data: Poisson and negative binomial regression
- Quantile regression
* Sparse (generalized) linear models
- Laplace prior
- Mixture of Gaussian prior
- Spike and slab prior
2) *Bayesian inference approximations*: At present, the focus is on variational
(deterministic) approximations. The posterior distribution over latent
variables is approximated by a Gaussian distribution (called the
*backbone*), which can either be unconstrained or fully factorized (all
variables are pairwise independent).
* Expectation propagation
3) *Algorithms*
* General (fully coupled) Gaussian backbone. Parallel (synchronous) updates
on all potentials
* General (fully coupled) Gaussian backbone. Sequential updates on
potentials, random or actively chosen schedule. The backbone is updated
by way of rank one updates of a Cholesky decomposition
* Fully factorized Gaussian backbone. Sequential updates on potentials,
random or user-defined schedule. This is a generalization of loopy
belief propagation to non-Gaussian models.
ApBsInT is written in Python. A number of core routines are implemented in
C++, both for efficiency and to interface with numerical libraries. A part
of the functionality is duplicated in Matlab, mainly for the purpose of
testing.
This toolbox is in an early alpha stage. At present, it appeals to users with
a certain degree of knowledge about probabilistic modelling and the abilities
to get their hands dirty. The build system is only partly automated, and apart
from test examples, there are no general-purpose applications implemented at
present.
On the other hand, the implemented techniques are well documented, and great
care is taken to ensure numerically robust behaviour. Unit test code is
supplied on a case-by-case basis. Contributions on all levels are highly
encouraged.
Prerequisites
=============
ApBsInT consists of Python code, Matlab code, and C++ code. The latter exports
APIs both towards Python and Matlab. These APIs can be built seperately. The
Python (Matlab) code works only together with its successfully built C++ API
(respectively).
1) Prerequisites for C++ code (apart from APIs)
The C++ code is written under Linux, it compiles with GNU gcc/g++ 4.7.2.
It may compile with other gcc versions as well. At present, the code does
not compile under Windows or Mac. Having said that, the code is basic and
has essentially no dependencies, porting it should not be difficult
(please contribute a port if you did one, even better if you could pledge
to maintain it).
2) Prerequisites for Python code and Python API to C++ code
The Python code is written in Python 2.7, it may work for earlier Python
2.x versions as well. It does not run in Python 3.x.
You also need recent versions of NumPy (code written with version 1.7.1)
and SciPy (code written with version 0.12.0).
The API requires Cython (code written with version 0.19) and the distutils
module.
3) Prerequisites for Matlab code and Matlab API to C++ code
The Matlab code is written in Matlab 8, it may work for earlier versions
of Matlab as well. It does not run in Octave.
The Matlab API is a set of MEX functions. The MEX interface is a part of
any Matlab installation. Unfortunately, it is not a portable API. The
present code is tested under Linux. It does not compile under Windows or
Mac. Again, a port should not be difficult to do (please contribute a port
if you did one).
Building the Python API
=======================
The Python API interfaces C++ code by way of Cython.
Denote the repository root by [ROOT].
* Configure setup.py
Make sure that both Cython and distutils are present.
$ cd [ROOT]/python/cython
$ cp apbsint_profile.py.template apbsint_profile.py
Edit apbsint_profile.py. 'get_include_dirs' must return include directories,
'get_library_dirs' library directories for the C++ compiler and linker.
Make sure to use absolute pathnames.
In particular, the list returned by 'get_include_dirs' must include [ROOT]
(absolute pathname).
* Build
$ cd [ROOT]/python/cython
$ make
This should produce a number of shared libraries, most important
[ROOT]/python/apbsint/eptools_ext.so (all shared libraries are moved
to [ROOT]/python/apbsint). The suffix may be different from '.so' for
your system.
* Extend Python sys.path
Insert [ROOT]/python into the module search path sys.path. If
you use IPython, you can do the following. Locate your profile:
$ ipython locate profile
Go there and edit the file ipython_config.py. Look for sys.path, or just
append the following to the end::
import sys
sys.path.append('[ROOT]/python')
* Test
There is some test code in [ROOT]/python/test/. For example:
$ cd [ROOT]/python/test/binclass
$ ipython eptest_binclass.py
This should run some simple binary classification example. You can edit
eptest_binclass.py and thereby test different aspects.
Building the Matlab API
=======================
Note: The main focus of ApBsInT is on Python development. The Matlab API should
be regarded as add-on, which we mainly use to debug and test the Python code.
Also, while Cython and distutils are portable across different platforms, this
is not the case for the Matlab MEX API. If you are working on Windows or Mac,
you may have to do some serious porting (please contribute a port if you did
one).
Denote the repository root by [ROOT].
* Prerequisites
Make sure you have a fairly recent version of Matlab, and that you can
compile MEX files (Matlab mex command).
* Configure make.inc.def
$ cd [ROOT]/matlab
$ cp make.inc.example make.inc.def
Edit make.inc.def. You have to set::
ROOTDIR=[ROOT]
Also, check the other variables and make sure this makes sense for your
system. In particular, set EXINCS_MATLAB to the XXX/extern/include
directory for your Matlab installation (it has to contain mex.h).
Matlab MEX creates shared libraries, whose filenames have a specific
suffix (.mexglx or .mexa64 in Linux). Find out what this is for you, and
set MEXSUFFIX accordingly.
* Build
$ cd [ROOT]/matlab
$ make
This should produce MEX shared libraries in [ROOT]/matlab/bin.
* Modify Matlab path
The directory [ROOT]/matlab and its subdirectories must be in the
Matlab path. The Matlab script [ROOT]/matlab/startup.m adds them.
In order to run this script automatically when starting Matlab, add the
following to your ~/matlab/startup.m file::
s=pwd;
cd(strcat(getenv('HOME'),'[ROOT]/matlab'));
startup;
cd(s); clear('s');
* Test
There is some test code in [ROOT]/matlab/test/. For example, try to run
[ROOT]/matlab/test/binclass/eptest_binclass.m. This should run some
simple binary classification example. You can edit the code and thereby
test different aspects.
Documentation
=============
A bit patchy for now. In particular, there is not much user documentation
right now. Look into [ROOT]/doc.
* [ROOT]/doc/techreps/ep_toolbox.pdf: Technical report with details about
the Expectation Propagation (EP) part of ApBsInT
* [ROOT]/doc/design: Some files describing design aspects
Apart from that, example and test code for the Matlab API are in
[ROOT]/matlab/test/, for the Python API in [ROOT]/python/test/. Also,
the code is well documented.
What is the "Workaround"?
=========================
You may note a certain incompleteness of the system (for example, some
methods in src/src/eptools/potentials/SpecfunServices throw a
NotImplemException) and come across the term "workaround" (for example, in
python/cython/setup.py). Trying to build the system in "workaround mode" does
not work.
These problems are due to licensing issues. While we will always ensure that
a core part of the system builds properly, we have to exclude some more recent
extensions, as they employ code which is copy-left licensed.
Our working solution is to explore new directions based on software under any
license. Once an extension proves worthwile, we may spend the effort to purge
the viral links. This process could be sped up by more contributors.