With advances in deep neural networks (DNNs), there has been a proliferation of revolutionizing products. Yet, the popularization of such DNN-powered applications is hindered because their success is built on intensive computations and powerful GPUs. This project aims at accelerating deep neural networks on edge devices using Intel Neural Compute Stick 2 (NCS). We show that NCS is capable of speeding up the inference time of complicated neural networks, which can be efficient enough to run locally on edge devices. Such acceleration paves the way to develop ensemble learning on the edge for performance improvement. As a motivating example, we exploit object detection to be the application and utilize the well-known algorithm YOLOv3 [Link] as the detector.
We develop a web-based visualization platform to support (ensemble) object detection. Photos and videos from local files or webcam are supported. Frame per second (FPS) is displayed to indicate the speed-up made by one or more NCS devices running in parallel. The architecture of the system is as below:
We provide a video below to demonstrate this project. You may run the source code to replicate the demo using NCS devices. Make sure you have turned on the "Subtitles" on YouTube.
This project consists of two components: (1) client-side for detection visualization and (2) server-side hosting the Intel Compute Sticks. You should follow the instruction carefully to start each component. The following instruction assumes that you have already installed OpenVINO according to the guideline [Link] provided by Intel to run NCS on your machine.
The client is a React app. The UI allows three types of inputs - image, video and webcam stream. If the input is an image, it is converted to a base64 encoded string and sent to server. The server decodes the image and runs the detection algorithm on it. Server then sends back the prediction results - class, score and bounding box locations - which are then displayed on the UI. For video or video stream captured through UI, the workflow is same, except that in this case, each frame of the video is encoded as base64 string and sent to server.
To run the client, follow the below steps
cd app/
npm install
npm start
The client runs on port 3000.
The server is a Flask server:
- Install Flask with
pip install Flask
- Setup OpenVINO env as instructed by Intel
- Run
python3 server_parallel.py
- Plugin the device one by one according to the instruction
The server supports two APIs
-
POST /detect_objects
This API is used to send the image data to the server.
URL:
http://{device ip}:5000/detect_objects
request body:
{ "image": "Base 64 encoded image, as a string", "mode": "“parallel” or “ensemble”", "models": "one or more model names, as an array" }
The server accepts the request data and immediately responds with 200 response code. The data is processed asynchronously and the prediction results are stored in the server until requested by the client.
-
GET /detect_objects_response
This API is used to retrieve the prediction results from the server. The client repeatedly polls the server with this API until a response is given.
URL:
http://{device ip}:5000/detect_objects_response?models=<model names>
Response:
with 'mode' == 'parallel'
{ "model1": [ { "bbox": [ 1, 0, 200, 200 ], "class": "person", "score": 0.838 } ], "model2": [ { "bbox": [ 1, 0, 200, 200 ], "class": "person", "score": 0.838 } ], "model3": [ { "bbox": [ 1, 0, 200, 200 ], "class": "person", "score": 0.838 } ] }
with 'mode' == 'ensemble'
{ "all": [ { "bbox": [ 1, 0, 200, 200 ], "class": "person", "score": 0.838 } ] }
If you encounter the following message multiple times during the model loading, unplug NCS devices, wait for a few seconds, and reinstall them to the host.
Failed, trying again in 2 second(s)
Apart from the installation of OpenVINO for NCS devices, the dependencies of this project have been listed in requirements.txt
. The current version of this project is tested on the following environment:
- Operating System: Ubuntu 18.04.3 LTS
- Python Version: Python 3.6.8
- Web Browser: Firefox
The object detection models in this repository are ten tiny YOLOv3 trained with the PASCAL VOC dataset having different input resolution.
This project is developed based on two repositories:
- PINTO0309/OpenVINO-YoloV3: https://github.com/PINTO0309/OpenVINO-YoloV3
- opencv/open_model_zoo: https://github.com/opencv/open_model_zoo
This project is managed and maintained by Ka-Ho Chow with the following contributors from Georgia Tech (ordered according to the last name):
- Ka-Ho Chow
- Yu-Lin Chung
- Quang Huynh
- Hung-Yi Li
- Sonia Mathew
- Víctor Castro Serrano
Contributions are welcomed! Please contact Ka-Ho Chow ([email protected]) if you have any problem.