This repository is derived from spacy-ray (dev branch).
It trains models using the new collective calls available in the current ray-1.2-dev, replacing the original get() and set() usage in spacy-ray.
The runtime comparison for 1000 updates using the spaCy pipeline = ["tok2vec", "ner"] is shown below:
| Comparison | spacy-ray | spacy-ray-nccl | ratio |
| ------------- | ------------- | -------------- | ------- |
| 1 worker | 137.5 ± 2.1 | 116.7 ± 2.51 | 1.18x |
| 2 workers | 354.1 ± 16.8 | 171.1 ± 1.11 | 2.07x |
| 4 workers | 523.9 ± 10.4 | 179.6 ± 2.91 | 2.92x |
| 8 workers | 710.1 ± 3.0 | 205.8 ± 1.20 | 3.45x |
| 16 workers | 1296.1 ± 42.1 | 248.3 ± 3.63 | 5.22x |
Means and standard deviations are computed over three trials (unit: seconds). Each worker runs on a different node.
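The ratio column can be reproduced from the mean runtimes. The sketch below is only an illustration: the dictionary restates the means from the table, and the names `mean_runtime` and `speedup` are hypothetical, not part of the repository.

```python
# Mean runtimes in seconds, copied from the table above:
# workers -> (spacy-ray, spacy-ray-nccl)
mean_runtime = {
    1: (137.5, 116.7),
    2: (354.1, 171.1),
    4: (523.9, 179.6),
    8: (710.1, 205.8),
    16: (1296.1, 248.3),
}


def speedup(baseline_s, nccl_s):
    """Return how many times faster the NCCL version is than the baseline."""
    return baseline_s / nccl_s


# Round to two decimals, as in the table's ratio column.
ratios = {n: round(speedup(a, b), 2) for n, (a, b) in mean_runtime.items()}

for workers, ratio in ratios.items():
    print(f"{workers:>2} workers: {ratio:.2f}x")
```

The ratio grows with the number of workers because the spacy-ray runtime rises much faster than the spacy-ray-nccl runtime as nodes are added.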
The ideal plot should be a horizontal line.
The trivial speedup is a horizontal line y = 1.
The ideal scalability is the line y = x.
conda create -n spacy-ray python=3.7.3
conda activate spacy-ray
pip install spacy-nightly[cuda]
- This will take some time. If you observe a build error in cupy, try: `pip install cupy-cuda[version]`
  e.g. for cudatoolkit 11.0: `pip install cupy-cuda110`
pip install spacy-ray
- Run `python -m spacy ray --help` to check whether the installation succeeded.
- The collective calls are only available in the current ray GitHub repository. Instead, we use the latest ray-1.1 from pip to test the runtime and add the collective code to it:
- Get the collective code: `git clone`
- Go to the installed code of ray 1.1: `cd [path-to-packages]/ray`
  If using conda, the path is typically `[path-to-conda]/anaconda3/envs/spacy-ray/lib/python3.7/site-packages/`
- Copy the code over: `cp -r [path-to-github-ray]ray/python/ray/util/collective ./ray/util`
- Add it to the `__init__` file: `vim ./ray/util/`
  -> `from ray.util import collective`, and append "collective" to the `__all__` list.
- The last step is to replace the installed spacy-ray with this repository:
  - `git clone`
  - `mv [path-to-github-spacy-ray-nccl] [path-to-packages]`
  - Keep a backup of the original spacy_ray in case you would like to reproduce the comparison: `mv [path-to-packages]/spacy_ray [path-to-packages]/spacy_ray_original`
  - `mv [path-to-packages]/spacy_ray_nccl [path-to-packages]/spacy_ray`
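After these steps, one quick way to confirm that the patched modules are visible to Python is to ask the import system for them. This is a minimal sketch and not part of the repository: the helper name `is_importable` is hypothetical, and the module names `ray.util.collective` and `spacy_ray` are taken from the paths used in the steps above.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate module_name.

    Parent packages are imported while looking the module up.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the module has no valid spec.
        return False


for name in ("ray.util.collective", "spacy_ray"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, the copy or move step for that package most likely targeted a different site-packages directory than the one used by the active environment.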
git clone
cd spacy_ray_example/tmp/experiments/en-ent-wiki
- Set up the ray cluster across the different machines. The code will detect the available ray cluster and attach to it.
- Modify the config (for training hyperparameters) and project.yml (for the number of workers).
- Download and process the necessary files: (reference:
  - `spacy project assets`
  - `spacy project run corpus`
spacy project run ray-train
This repository turns off score evaluation for the comparison, because evaluation takes a long time and we only want to measure the speedup during training.
To turn it on: comment out the hard-coded scores in the train() function at, change the if condition from `if self.rank ==0` to `if True`, and uncomment `socres = self.evaluate()`
The original spacy-ray uses a sharded parameter server and updates the model parameters in each worker asynchronously. The current implementation uses an all_reduce strategy, which performs similarly to a sharded parameter server. Each worker holds a full copy of the model parameters and, on each update, calls collective.allreduce() to synchronize gradients.
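To illustrate the synchronous scheme, the sketch below simulates an all-reduce over gradients in plain Python, without ray or NCCL. It is an illustration under stated assumptions rather than the repository's code: `allreduce_sum` and `sgd_step` are hypothetical helpers standing in for collective.allreduce() and the optimizer, the gradients are made-up numbers, and averaging the summed gradients over the number of workers is one common choice that the text above does not specify.

```python
def allreduce_sum(per_worker_grads):
    """Simulate all-reduce: every worker ends up with the elementwise sum."""
    total = [sum(values) for values in zip(*per_worker_grads)]
    # Each worker receives its own copy of the reduced result.
    return [list(total) for _ in per_worker_grads]


def sgd_step(params, grad, lr):
    """Plain SGD update on one worker's full copy of the parameters."""
    return [p - lr * g for p, g in zip(params, grad)]


num_workers = 4
# Every worker starts from an identical full copy of the parameters.
replicas = [[1.0, 2.0, 3.0] for _ in range(num_workers)]
# Hypothetical local gradients, one list per worker (different data shards).
local_grads = [
    [0.5, 0.0, -1.0],
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]

reduced = allreduce_sum(local_grads)
# Average the summed gradients, then apply the same update on every worker.
replicas = [
    sgd_step(params, [g / num_workers for g in grads], lr=0.5)
    for params, grads in zip(replicas, reduced)
]

# All workers applied the same averaged gradient, so the copies stay identical.
assert all(r == replicas[0] for r in replicas)
print(replicas[0])  # [0.75, 1.875, 3.125]
```

Because every worker applies the same reduced gradient, no separate parameter server is needed to keep the copies consistent.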
A template for setting up a 16-machine ray cluster:
#!/bin/bash

MY_IPADDR=$(hostname -i)
echo $MY_IPADDR

ray stop --force
sleep 3
ray start --head --port=6380 --object-manager-port=8076 --object-store-memory=32359738368
sleep 2

for i in {1..15}
do
  echo "=> node $i"
  ssh -o StrictHostKeyChecking=no h$i.ray-dev-16.BigLearning "cd spacy_ray_example/tmp/experiments/en-ent-wiki; source ~/anaconda3/bin/activate; conda activate spacy-ray; ray stop --force; ray start --address='$MY_IPADDR:6380' --object-manager-port=8076 --object-store-memory=32359738368";
done
wait