- Tunchanok Ngamsaowaros [email protected]
- Pooja Vijayakumar [email protected]
- Juran Guo [email protected]
In this work, the experiments of Pruning Neural Networks at Initialization: Why are We Missing the Mark? [1] are reproduced. Four pruning methods are implemented for VGG16 on CIFAR-10 (GraSP: /trials/grasp, SNIP: /trials/SNIP, Magnitude: /trials/magnitude, SynFlow: /trials/synflow). In each trial, the pruned model is trained for 80 epochs to observe its performance.
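As a rough illustration of what each trial does, the sketch below scores the weights of a VGG16 at initialization (here with the magnitude criterion; SNIP, GraSP, and SynFlow substitute their own score functions) and applies one global mask at the target sparsity. The helper names and the use of torchvision's VGG16 are assumptions made for illustration, not the actual code in trials/.

```python
import torch
import torch.nn as nn
import torchvision

def prunable_weights(model):
    # The conv and linear weights are the tensors pruned in each trial.
    return [m.weight for m in model.modules()
            if isinstance(m, (nn.Conv2d, nn.Linear))]

def magnitude_masks(model, sparsity):
    # Score every weight by |w| and remove the `sparsity` fraction with the
    # smallest scores, using a single global threshold across all layers.
    weights = prunable_weights(model)
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores, k).values if k > 0 else scores.min() - 1
    return [(w.detach().abs() > threshold).float() for w in weights]

def apply_masks(model, masks):
    # Zero the pruned weights; during training the masks must be re-applied
    # (or enforced with hooks) so pruned weights stay at zero.
    with torch.no_grad():
        for w, m in zip(prunable_weights(model), masks):
            w.mul_(m)

model = torchvision.models.vgg16(num_classes=10)  # illustrative; the repo's VGG16 may differ
masks = magnitude_masks(model, sparsity=0.9)      # e.g. remove 90% of weights
apply_masks(model, masks)
```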
The results match the original paper except for the inversion ablation. When inversion is applied to SynFlow and GraSP, the test accuracy does not drop sharply as expected; instead, the trend continues to increase. This discrepancy needs further investigation.
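For context, the inversion ablation flips each method's scores so that the weights the method rates worst are the ones kept, which the original paper expects to hurt accuracy sharply. A minimal sketch, assuming per-layer score tensors have already been computed (the function name is illustrative):

```python
import torch

def inverted_masks(scores, sparsity):
    # `scores` is a list of per-layer score tensors produced by SNIP, GraSP,
    # SynFlow, etc.  Negating them before thresholding keeps the weights the
    # method rates *worst*, which is exactly the inversion ablation.
    flat = torch.cat([-s.flatten() for s in scores])
    k = int(sparsity * flat.numel())
    threshold = torch.kthvalue(flat, k).values if k > 0 else flat.min() - 1
    return [((-s) > threshold).float() for s in scores]
```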
Training and validation accuracies for different numbers of training epochs
The original paper typically trained for 160 epochs, which, depending on the sparsity level of the network, took 2 to 3 hours. As the figures above show, there is no significant change in accuracy when the model is trained for 80 epochs rather than 160. For this reason, training in this reproduction was limited to 80 epochs to reduce training time.
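For reference, a minimal sketch of such an 80-epoch run is shown below. The optimizer, learning-rate schedule, batch size, and augmentation are typical CIFAR-10/VGG16 choices assumed for illustration, not necessarily the settings used in trials/.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = T.Compose([T.RandomCrop(32, padding=4),
                       T.RandomHorizontalFlip(),
                       T.ToTensor()])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                         transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.vgg16(num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 60], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(80):          # 80 epochs instead of the original 160
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
        # the pruning masks would be re-applied here so pruned weights stay zero
    scheduler.step()
```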
PyTorch >= 1.4.0
Torchvision >= 0.5.0
Torchbearer
GPU (recommended; in this work, a P100 GPU from Kaggle's free resources is used)
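A quick way to confirm the environment meets these requirements (a convenience sketch, not part of the trial code):

```python
import torch
import torchvision
import torchbearer

print("PyTorch:", torch.__version__)            # expect >= 1.4.0
print("Torchvision:", torchvision.__version__)  # expect >= 0.5.0
print("Torchbearer:", torchbearer.__version__)
print("CUDA available:", torch.cuda.is_available())
```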
Simply go to /SNIP, /grasp, /magnitude, or /synflow inside trials/, then follow the instructions there. All code is inside those directories.
Test Accuracy of Ablations on Pruning Methods
This reimplementation produces outcomes that align with those reported in the original study for all pruning strategies except inverted SynFlow.
[1] Frankle, Jonathan, et al. "Pruning neural networks at initialization: Why are we missing the mark?." arXiv preprint arXiv:2009.08576 (2020). https://doi.org/10.48550/arXiv.2009.08576
[2] Tanaka, Hidenori, et al. "Pruning neural networks without any data by iteratively conserving synaptic flow." Advances in neural information processing systems 33 (2020): 6377-6389. https://doi.org/10.48550/arXiv.2006.05467
[3] Wang, Chaoqi, Guodong Zhang, and Roger Grosse. "Picking winning tickets before training by preserving gradient flow." arXiv preprint arXiv:2002.07376 (2020). https://doi.org/10.48550/arXiv.2002.07376
[1] alecwangcq/GraSP: https://github.com/alecwangcq/GraSP
[2] ganguli-lab/Synaptic-Flow: https://github.com/ganguli-lab/Synaptic-Flow