forked from cooperative-computing-lab/awe
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
272 lines (194 loc) · 8.67 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
# OVERVIEW
Accelerated Weighted Ensemble or AWE package provides a Python library for
adaptive sampling of molecular dynamics. The framework decomposes the
resampling computations and the molecular dynamics simulations into tasks that
are dispatched for execution on resources allocated from
clusters, clouds, grids, or any idle machines.
AWE uses Work Queue, which is part of the Cooperating Computing Tools (CCTools)
package, for dispatching jobs for execution on allocated resources.
Documentation on downloading, installing, and using Work Queue can be found
<a href="http://www.nd.edu/~ccl/software/manuals/workqueue.html">here</a>.
# REQUIREMENTS
- [CCTools][cctools] 3.7 or higher
- [Python][python] 2.6 or 2.7
- [Gromacs][gmx] 4.5 or higher
- [Gromacs XTC library][xdrfile] 1.1 or higher
- [Numpy][numpy] 1.5 or higher
- [Prody][prody] 1.0 or higher
- [GNU Scientific Library][gsl]
- [Matplotlib][matplotlib]
[cctools]: http://www.cse.nd.edu/~ccl/software/download.shtml
[python]: http://python.org/
[gmx]: http://www.gromacs.org/
[xdrfile]: http://www.gromacs.org/Developer_Zone/Programming_Guide/XTC_Library
[numpy]: http://www.numpy.org/
[prody]: http://www.csb.pitt.edu/prody
[gsl]: http://www.gnu.org/software/gsl/
[matplotlib]: http://matplotlib.org/
# BUILDING AND INSTALLING
First, determine the location where AWE is to be installed. For example:
```
$ export AWE_INSTALL_PATH=$HOME/awe
```
Compile and install AWE in the location pointed by $AWE_INSTALL_PATH using:
```
$ tar xf awe-src.tar.gz
$ cd awe
$ ./configure --prefix $AWE_INSTALL_PATH
$ make install
```
Next, set PATH to include the installed AWE binaries:
```
$ export PATH=${AWE_INSTALL_PATH}/bin:${PATH}
```
Finally, set PYTHONPATH to include the installed AWE Python modules:
```
$ export PYTHONPATH=${AWE_INSTALL_PATH}/lib/python2.6/site-packages:${PYTHONPATH}
```
Note that the AWE Python modules will be built for the version of Python
accessible in your installed environment. The installation script creates a
directory (under $AWE_INSTALL_PATH/lib) named with the version of Python for
which the modules are built and copies the modules to this directory. So if your
environment has a Python version different from 2.6, replace the version string
accordingly when setting PYTHONPATH.
You can check if AWE was correctly installed by running:
```
$ awe-verify
```
# RUNNING
First, create a directory from which you want to instantiate and run the AWE
program. Here, we are going to run AWE to sample the state transitions of the
Alanine Dipeptide protein using the code in awe-ala.py.
```
$ cd $HOME
$ mkdir awe-alanine
$ cd awe-alanine
```
To run AWE to sample a protein molecule, you will need to have the files
describing the topology of atoms in that molecule, the coordinates of the
walkers, and the coordinates of the cells. In addition, AWE transfers the
executables from the GROMACS package required for running the simulations of
each walker.
These files and executables can be fetched to the current working directory by
running:
```
$ awe-prepare
```
This will create two directories named awe-generic-data and awe-instance-data.
awe-generic-data will contain files that all AWE runs will require, such as the
task executables and Gromacs forcefield files. awe-instance-data will contain
files that are particular to a protein system such as the state definitions,
initial protein coordinates, etc. Note that awe-prepare, by default, will
transfer the files for the Alanine Dipeptide protein molecule. Further,
awe-prepare will also copy the example program awe-ala.py provided in the AWE
source that samples the state transitions for the Alanine Dipeptide protein.
To run the example in awe-ala.py, do
```
$ python awe-ala.py
```
You will see this output right away:
```
$ python awe-ala.py
Running on port 9123...
Loading cells and walkers
```
The AWE master program successfully started and began loading the cells and
walkers for running the simulations. After that, the master waits for workers
to connect so it can dispatch the simulation tasks for execution by the
connected workers.
Now, start a worker for this master on the same machine:
```
$ work_queue_worker localhost 9123
```
However, to run a really large sampling, you will need to run as many workers
as possible. A simple (but tiresome) way of doing so is to log into several
machines and manually run work_queue_worker as above. But, if you have access
to a batch system like Condor or SGE, you can use them to start many workers
with a single submit command.
We have provided some scripts to make this easy. For example, to submit 10
workers to your local Condor pool:
```
$ condor_submit_workers master.somewhere.edu 9123 10
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 298.
```
Or, to submit 10 worker processes to your SGE cluster:
```
$ sge_submit_workers master.somewhere.edu 9123 10
Your job 1054781 ("worker.sh") has been submitted
Your job 1054782 ("worker.sh") has been submitted
Your job 1054783 ("worker.sh") has been submitted
...
```
Once the workers begin running, the AWE master can dispatch tasks to each one
very quickly. It's ok if a machine running a worker crashes or is turned off;
the work will be silently sent elsewhere to run.
When the AWE master process completes, your workers will still be available, so
you can either run another master with the same workers, remove them from the
batch system, or wait for them to expire. If you do nothing for 15 minutes,
they will automatically exit.
# OUTPUT
The AWE master creates the following output files on completion:
- cell-weights.csv describing the weight of the cells
- walker-history.csv describing the cells that were visited by the walkers
- color-transition-matrix.csv that describes the number of transitions between
the defined cell groups
- walker-weights.csv that describes the weights of the walkers
You can use these output files to generate different plots that visualize the
output of the AWE run. To generate a Ramachandran plot (free energy landscape
on phi-psi space) using cell-weights.csv, run the following command.
```
$ python awe-rama-ala.py -w cell-weights.csv -p awe-instance-data/topol.pdb -c awe-instance-data/cells.dat -n 100
```
where
* -w specifies the csv file recording cell weights
* -p the structure file of simulation target
* -c the data file recording coordinates of cell centers
* -n the number of cells
This produces an output file named awe-rama-ala.png that contains the
Ramachandran plot.
To generate and visualize forward and backward fluxes from output
color-transition-matrix.csv, use the script awe-flux.py.
```
python awe-flux.py -i color-transition-matrix.csv -l 0.01
```
where
* -i specifies the data file recording transitions at every iteration
* -l specifies the scale iteration length to actual unit (nano/pico/femto-seconds)
* -o specifies the output directory
This will produce the following outputs:
* instan-forward-flux.dat: instantaneous forward flux
* instan-backward-flux.dat: instantaneous backward flux
* forward-flux.png: plot of forward flux
* backward-flux.png: plot of backward flux
Finally, to generate transition probability matrix from AWE output, run script
awe-transMatrix.py
```
$ python awe-transMatrix.py -p walker-history.dat -w walker-weights.dat -t 1 -n 100
```
where the options
* -p specifies the data file recording dependencies of walkers
* -w specifies the data file recording weights and cell ID of walkers
* -t specifies the time lag (number of iterations) for calculating transition matrix
* -n specifies the number of cells
This prints the matrix to a file called trans-probability-matrix.csv.
# RUNNING AWE ON DIFFERENT PROTEIN SYSTEMS
You can run AWE to sample a different protein system by following the steps
below:
1. Sample the conformations space using ensemble simulations, replica
exchange, etc.
2. Cluster the data to obtain the cell definitions.
3. Extract the conformations from each cluster as individual walkers.
Specifically, these steps translate to the following:
1. Describe the topology of the system in topol.pdb.
2. Prepare the state definitions and list them in cells.dat.
3. Select the subset of atoms from the cell definitions cells.dat and list
them in CellIndices.dat.
4. Select the subset of atoms from the walker topology file topol.pdb and
list them in StructureIndices.dat.
5. Define the initial coordinates for the walkers in State$i-$j.pdb where i
is the index of the cell and j is the index of the walker.
6. Specify the parameters for the walker simulation by GROMACS in sim.mdp.
# CONTRIBUTERS
Please see the AUTHORS file.