Runs the full YOLO v2 model on an iPad at 1280x720. A rough proof of concept.
- Pre-trained model weights from the Coursera Deep Learning Specialization, Course 4 (Convolutional Neural Networks).
- Custom layers for tf.space_to_depth and the final activations, written with Apple's Accelerate framework (this is probably suboptimal; see below). A sketch of the space_to_depth rearrangement follows this list.
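For reference, here is a minimal plain-Swift sketch of the index mapping tf.space_to_depth performs (block size 2, CHW buffer layout, TF's convention where the block offset forms the high-order part of the output channel). The repo's actual custom layer uses Accelerate, and its exact layout may differ; treat this as illustration only.

```swift
// Sketch of space_to_depth: (C, H, W) -> (C * b * b, H / b, W / b).
// Layout and channel-ordering assumptions are noted above, not taken from the repo.
func spaceToDepth(_ input: [Float], channels: Int, height: Int, width: Int,
                  blockSize: Int = 2) -> [Float] {
    let outH = height / blockSize
    let outW = width / blockSize
    var output = [Float](repeating: 0, count: input.count)
    for c in 0..<channels {
        for by in 0..<blockSize {
            for bx in 0..<blockSize {
                // Block offset (by, bx) becomes the high-order part of the output channel.
                let outC = (by * blockSize + bx) * channels + c
                for y in 0..<outH {
                    for x in 0..<outW {
                        let inIdx = c * height * width
                                  + (y * blockSize + by) * width
                                  + (x * blockSize + bx)
                        let outIdx = outC * outH * outW + y * outW + x
                        output[outIdx] = input[inIdx]
                    }
                }
            }
        }
    }
    return output
}
```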
Since iOS 12, Apple provides framework support so that you don't have to implement your own score filtering and non-max suppression post-processing. As far as I can tell, this currently applies only to models created with Create ML; since this pre-trained model comes from Keras, I have yet to figure out how to take advantage of it. The custom layers are written with the Accelerate framework; presumably, rewriting them in lower-level code or as Metal Performance Shaders would speed things up. Currently, an inference takes roughly 250-300 ms, which is too slow for a live stream and for tracking moving objects.
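For illustration, here is a minimal Swift sketch of the greedy non-max suppression step mentioned above. The `Detection` type and the IoU threshold are hypothetical for this example, not taken from this repo.

```swift
import CoreGraphics

// Hypothetical detection record for this sketch.
struct Detection {
    let rect: CGRect
    let score: Float
    let classIndex: Int
}

// Intersection-over-union of two boxes; 0 if they don't overlap.
func iou(_ a: CGRect, _ b: CGRect) -> Float {
    let inter = a.intersection(b)
    guard !inter.isNull else { return 0 }
    let interArea = inter.width * inter.height
    let unionArea = a.width * a.height + b.width * b.height - interArea
    return unionArea > 0 ? Float(interArea / unionArea) : 0
}

// Greedy NMS: keep the highest-scoring box, drop same-class boxes that overlap it too much.
func nonMaxSuppression(_ detections: [Detection], iouThreshold: Float = 0.5) -> [Detection] {
    var kept: [Detection] = []
    for candidate in detections.sorted(by: { $0.score > $1.score }) {
        let suppressed = kept.contains {
            $0.classIndex == candidate.classIndex && iou($0.rect, candidate.rect) > iouThreshold
        }
        if !suppressed { kept.append(candidate) }
    }
    return kept
}
```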
- Git clone this project.
- Go to the Releases page and download YoloV2DetectionModel.mlmodel.
- Open the project in Xcode, drop the .mlmodel into the mlModels folder, and build/run. The project is configured to run only in landscape on iPad, but you can change this as desired. A sketch of driving the model through Vision follows below.
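Once the .mlmodel is in the project, inference can be driven through Vision roughly like this. This is a sketch, not the repo's exact code: `YoloV2DetectionModel` is the class Xcode generates from the model file name, and error handling is elided.

```swift
import CoreML
import Vision

// Build a Vision request around the Xcode-generated model class.
func makeDetectionRequest() throws -> VNCoreMLRequest {
    let vnModel = try VNCoreMLModel(for: YoloV2DetectionModel().model)
    let request = VNCoreMLRequest(model: vnModel) { request, error in
        // The raw outputs still need the custom decoding / NMS described above.
        guard let outputs = request.results as? [VNCoreMLFeatureValueObservation] else { return }
        print("Got \(outputs.count) output feature(s)")
    }
    request.imageCropAndScaleOption = .scaleFill
    return request
}

// Usage with a camera frame:
// let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
// try handler.perform([makeDetectionRequest()])
```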
References and Inspirations: