Quick Start for PP-Human

Environment Preparation

Environment requirements： PaddleDetection>= release/2.4 or develop version

Installation of PaddlePaddle and PaddleDetection

# PaddlePaddle CUDA10.1
python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

# PaddlePaddle CPU
python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple

#Clone PaddleDetection repositories
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git

# Install dependencies
cd PaddleDetection
pip install -r requirements.txt

For installation details, please refer to Installation Tutorials
If you need TensorRT inference acceleration (speed measurement), please install PaddlePaddle with TensorRT version. You can download and install it from the PaddlePaddle Installation Package or follow the [Instructions](https://www. paddlepaddle.org.cn/inference/master/optimize/paddle_trt.html) or use docker, or self-compiling to prepare the environment.

Model Download

PP-Human provides object detection, attribute recognition, behaviour recognition and ReID pre-trained models for different applications. Developers can download them directly.

Task	End-to（ms）	Model Solution	Model Size
Pedestrian Detection (high precision)	25.1ms	Multi-Object Tracking	182M
Pedestrian Detection (Lightweight)	16.2ms	Multi-Object Tracking	27M
Pedestrian Tracking (high precision)	31.8ms	Multi-Object Tracking	182M
Pedestrian Tracking (Lightweight)	21.0ms	Multi-Object Tracking	27M
Attribute Recognition (high precision)	Single Person 8.5ms	Object Detection Attribute Recognition	Object Detection：182M Attribute Recogniton：86M
Attribute Recognition (Lightweight)	Single Person 7.1ms	Object Detection Attribute Recogniton	Object Detection：182M Attribute Recogniton：86M
Falling Detection	Single Person 10ms	Multi-Object Tracking Keypoint Detection Skeleton Based Action Recognition	Multi-Object Tracking：182M Keypoint Detection：101M Skeleton Based Action Recognition：21.8M
Breaking-In Detection	31.8ms	Multi-Object Tracking	Multi-Object Tracking：182M
Fighting Detection	19.7ms	Video Classification	90M
Smoking Detection	Single Person 15.1ms	Object Detection Object Detection Based On Body ID	Object Detection：182M Object Detection Based On Body ID：27M
Phone-calling Detection	Single Person 6.0ms	Object Detection Image Classification Based On Body ID	Object Detection：182M Image Classification Based On Body ID：45M

Download the model and unzip it into the . /output_inference folder.

In the configuration file, the model path defaults to the download path of the model. If the user does not change it, the corresponding model will be downloaded automatically upon inference.

Note:

Model accuracy is tested on fused datasets, which contain both open source and enterprise datasets.
ReID model accuracy is tested on the Market1501 dataset
Prediction speed is obtained at T4 with TensorRT FP16 enabled, which includes data pre-processing, model inference and post-processing.

Configuration

The PP-Human-related configuration is located in deploy/pipeline/config/infer_cfg_pphuman.yml, and this configuration file contains all the features currently supported by PP-Human. If you want to see the configuration for a specific feature, please refer to the relevant configuration in deploy/pipeline/config/examples/. In addition, the contents of the configuration file can be modified with the -ocommand line parameter. E.g. to modify the model directory of an attribute, developers can run ```-o ATTR.model_dir="DIR_PATH"``.

The features and corresponding task types are as follows.

Input	Feature	Task	Config
Image	Attribute Recognition	Object Detection Attribute Recognition	DET ATTR
Single-camera video	Attribute Recognition	Multi-Object Tracking Attribute Recognition	MOT ATTR
Single-camera video	Behaviour Recognition	Multi-Object Tracking Keypoint Detection Falling detection	MOT KPT SKELETON_ACTION

Take attribute recognition based on video input as an example: Its task type includes multi-object tracking and attributes recognition. The specific configuration is as follows.

crop_thresh: 0.5
attr_thresh: 0.5
visual: True

MOT:
  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
  tracker_config: deploy/pipeline/config/tracker_config.yml
  batch_size: 1
  enable: True

ATTR:
  model_dir:  https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip
  batch_size: 8
  enable: True

Note:

If developer needs to carry out different tasks, set the corresponding enables option to be True in the configuration file.

Inference Deployment

Use the default configuration directly or the configuration file in examples, or modify the configuration in infer_cfg_pphuman.yml

# Example: In pedestrian detection model, specify configuration file path and test image, and image input opens detection model by default
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --image_file=test_image.jpg --device=gpu
# Example: In pedestrian attribute recognition, directly configure the examples
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4 --device=gpu

Use the command line to enable functions or change the model path.

# Example: Pedestrian tracking, specify config file path, model path and test video. The specified model path on the command line has a higher priority than the config file.
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o MOT.enable=True MOT.model_dir=ppyoloe_infer/ --video_file=test_video.mp4 --device=gpu

# Example: In behaviour recognition, with fall recognition as an example, enable the SKELETON_ACTION model on the command line
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o SKELETON_ACTION.enbale=True --video_file=test_video.mp4 --device=gpu

For rtsp stream, use --rtsp RTSP [RTSP ...] parameter to specify one or more rtsp streams. Separate the multiple addresses with a space, or replace the video address directly after the video_file with the rtsp stream address), examples as follows

# Example: Single video stream for pedestrian attribute recognition
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE]  --device=gpu
# Example: Multiple-video stream for pedestrian attribute recognition
python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE1] rtsp://[YOUR_RTSP_SITE2] --device=gpu                                                                      |

Jetson Deployment

Due to the large gap in computing power of the Jetson platform compared to the server, we suggest:

choose a lightweight model, especially for tracking model, ppyoloe_s: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip is recommended
For frame skipping of tracking; we recommend 2 or 3: skip_frame_num: 3

With this recommended configuration, it is possible to achieve higher speeds on the TX2 platform. It has been tested with attribute case, with speeds up to 20fps. The configuration file can be modified directly (recommended) or from the command line (not recommended due to its long fields).

Parameters

Parameters	Necessity	Implications
--config	Yes	Path to configuration file
-o	Option	Overwrite the corresponding configuration in the configuration file
--image_file	Option	Images to be predicted
--image_dir	Option	Path to the images folder to be predicted
--video_file	Option	Video to be predicted, or rtsp stream address (rtsp parameter recommended)
--rtsp	Option	rtsp video stream address, supports one or more simultaneous streams input
--camera_id	Option	The camera ID for prediction, default is -1 ( for no camera prediction, can be set to 0 - (number of cameras - 1) ), press `q` in the visualization interface during the prediction process to output the prediction result to: output/output.mp4
--device	Option	Running device, options include `CPU/GPU/XPU`, and the default is `CPU`.
--output_dir	Option	The root directory for the visualization results, and the default is output/
--run_mode	Option	For GPU, the default is paddle, with (paddle/trt_fp32/trt_fp16/trt_int8) as optional
--enable_mkldnn	Option	Whether to enable MKLDNN acceleration in CPU prediction, the default is False
--cpu_threads	Option	Set the number of cpu threads, and the default is 1
--trt_calib_mode	Option	Whether TensorRT uses the calibration function, and the default is False; set to True when using TensorRT's int8 function and False when using the PaddleSlim quantized model
--do_entrance_counting	Option	Whether to count entrance/exit traffic flows, the default is False
--draw_center_traj	Option	Whether to map the trace, the default is False
--region_type	Option	'horizontal' (default), 'vertical': traffic count direction; 'custom': set break-in area
--region_polygon	Option	Set the coordinates of the polygon multipoint in the break-in area. No default.
--do_break_in_counting	Option	Area break-in checks

Solutions

The overall solution for PP-Human v2 is shown in the graph below:

Pedestrian detection

Take PP-YOLOE L as the object detection model
For detailed documentation, please refer to [PP-YOLOE](... /... /... /... /configs/ppyoloe/) and Multiple-Object-Tracking

Pedestrian tracking

Vehicle tracking by SDE solution
Adopt PP-YOLOE L (high precision) and S (lightweight) for detection models
Adopt the OC-SORT solution for racking module
Refer to [OC-SORT](... /... /... /... /configs/mot/ocsort) and Multi-Object Tracking for details

Multi-camera & multi-pedestrain tracking

Use PP-YOLOE & OC-SORT to acquire single-camera multi-object tracking trajectory
Extract features for each frame using ReID (StrongBaseline network).
Match multi-camera trajectory features to obtain multi-camera tracking results.
Refer to Multi-camera & multi-pedestrain tracking for details.

Attribute Recognition

Use PP-YOLOE + OC-SORT to track the human body.
Use PP-HGNet, PP-LCNet (multi-classification model) to complete the attribute recognition. Main attributes include age, gender, hat, eyes, top and bottom dressing style, backpack.
Refer to attribute recognition for details.

Behaviour Recognition:

Four behaviour recognition solutions are provided:
1. Behaviour recognition based on skeletal points, e.g. falling recognition
1. Behaviour recognition based on image classification, e.g. phone call recognition
1. Behaviour recognition based on detection, e.g. smoking recognition
1. Behaviour recognition based on Video classification, e.g. fighting recognition
For details, please refer to Behaviour Recognition

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PPHuman_QUICK_STARTED_en.md

PPHuman_QUICK_STARTED_en.md

Quick Start for PP-Human

Contents

Environment Preparation

Model Download

Configuration

Inference Deployment

Jetson Deployment

Parameters

Solutions

Pedestrian detection

Pedestrian tracking

Multi-camera & multi-pedestrain tracking

Attribute Recognition

Behaviour Recognition:

Files

PPHuman_QUICK_STARTED_en.md

Latest commit

History

PPHuman_QUICK_STARTED_en.md

File metadata and controls

Quick Start for PP-Human

Contents

Environment Preparation

Model Download

Configuration

Inference Deployment

Jetson Deployment

Parameters

Solutions

Pedestrian detection

Pedestrian tracking

Multi-camera & multi-pedestrain tracking

Attribute Recognition

Behaviour Recognition: