The project takes a video of the user as input and produces a realistic cloth try-on video as output, using pose estimation, bone mapping, mesh movement in Unity, and superimposition to generate the final result.
- About the Project
- Process Flow
- Applications
- File Structure
- Installations and Execution
- Tech Stack
- Future Prospects
- Contributors
- Acknowledgements and Resources
Our approach first performs pose estimation on the given video. The pose obtained from this step is sent to the Unity engine, where the received keypoints are mapped onto the mesh.
We create a pose simulation of the cloth using the keypoints received from Mediapipe. This simulation runs on a rigged mesh, i.e. a mesh skinned to a skeleton with weights painted for each joint.
Once the simulation video is created, it is passed to a Python script that uses OpenCV functions for superimposition. The script aligns the two videos using the keypoints of the human body, specifically the shoulder and hip coordinates.
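How the keypoints travel from Python to Unity depends on the integration. A common pattern, sketched below, is to stream each frame's landmarks as JSON over UDP to a listener inside Unity; the host, port, and message schema here are illustrative assumptions, not the project's actual protocol.

```python
import json
import socket

# Hypothetical transport: one JSON message per frame to a UDP listener
# running inside Unity. Host, port, and schema are assumptions.
UNITY_HOST, UNITY_PORT = "127.0.0.1", 5065
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_frame(frame_index, landmarks):
    """landmarks: list of 33 (x, y, z, visibility) tuples from Mediapipe."""
    message = {
        "frame": frame_index,
        "keypoints": [
            {"x": x, "y": y, "z": z, "visibility": v}
            for (x, y, z, v) in landmarks
        ],
    }
    sock.sendto(json.dumps(message).encode("utf-8"), (UNITY_HOST, UNITY_PORT))
```

On the Unity side, a matching C# listener would parse each message and drive the bones of the rigged mesh, as described in the process flow below.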
- Input: The input is a video of the user, in .mp4 format.
- Pose estimation: Pose estimation is applied to the given video. Since we use Mediapipe, we get the coordinates of 33 keypoints for each frame (a minimal extraction sketch follows this list). These coordinates are then sent to Unity for creating the cloth simulation video.
- Replicating the human pose in Unity: Each keypoint received from Mediapipe is mapped to its corresponding bone/joint in the mesh imported into Unity. We make sure the mesh is rigged to avoid unexpected movements, and directions are mapped to the bones to avoid deformation of the mesh while the output is being generated.
- Generating the simulation video: The simulation video is generated from the movements of the cloth mesh. We make sure the simulation and input videos have the same length so the cloth mesh does not lag in the final output.
- Superimposition of the videos: We use functions offered by OpenCV to superimpose the input video and the simulation video.
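As a reference for the pose-estimation step, here is a minimal sketch of extracting Mediapipe's 33 pose landmarks from every frame of an input .mp4. The output filename and JSON layout are illustrative; the actual scripts under 3D Pose Estimation may differ.

```python
import json

import cv2
import mediapipe as mp

def extract_keypoints(video_path, out_path="keypoints.json"):
    """Run Mediapipe Pose on each frame and save the 33 landmarks per frame."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Mediapipe expects RGB input; OpenCV decodes frames as BGR.
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks:
                frames.append([
                    {"x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
                    for lm in result.pose_landmarks.landmark
                ])
            else:
                frames.append(None)  # no person detected in this frame
    cap.release()
    with open(out_path, "w") as f:
        json.dump(frames, f)

extract_keypoints("input.mp4")
```

The landmark coordinates are normalized to the frame size, so they are multiplied by the frame width and height wherever pixel positions are needed (e.g. during superimposition).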
- Online Shopping: Allows customers to virtually try on clothing before purchasing, helping them make more informed decisions and potentially reducing return rates.
- Lighting: Visualize different lighting options and their effects.
- Virtual Photo Booths: Create fun, shareable images with virtual costumes and props.
📦Virtual-Try-On-Products
┣ 📂assets # Contains gifs, objs and images of the results
┣ 📂2D # Approaches and scripts for 2D TryOn
┃ ┣ ACGPN # Used to create and save masks of objects from input image
┃ ┣ End-to-End # Run this notebook to get results
┣ 📂3D
┃ ┣ 3D Pose Estimation # Used to estimate pose and keypoints
┃ ┣ Unity # Scripts for integrating mediapipe to Unity and pose update scripts
┃ ┣ DigiHuman # Scripts for rigging and pose update in Unity.
┃ ┣ Superimposition # Blend simulated mesh with reference video.
┣ 📜README.md
┣ 📜demo_video.gif # Demo Video
┣ 📜project_report.docx # Project Report
┗ 📜requirements.txt # Requirements
1. Clone DigiHuman:
   ```bash
   git clone https://github.com/Danial-Kord/DigiHuman.git
   ```
2. Add your input video in DigiHuman.
3. Save the simulation video that is generated.
1. Clone this repository:
   ```bash
   git clone https://github.com/Mr-MVP/Virtual-Try-On-Products-.git
   cd Virtual-Try-On-Products-
   ```
2. Create a virtual environment for the project and install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Create a new folder for saving your input/output videos.
4. Move into the 3D directory and set your videos as the input in superimpose.py:
   ```bash
   cd 3D
   ```
5. Run the script (a sketch of the blending it performs follows these steps):
   ```bash
   python superimpose.py
   ```
6. The output will be saved as a .mp4 file.
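For reference, here is a minimal sketch of the kind of keypoint-anchored blending a script like superimpose.py performs: size the rendered cloth frame from the shoulder width, position it at the shoulder midpoint, and copy it over the user's frame. The function name, the proportions, and the masking rule (treating near-black simulation pixels as background) are illustrative assumptions, not the script's actual logic.

```python
import cv2

def overlay_cloth(user_frame, cloth_frame, left_shoulder, right_shoulder):
    """Blend one rendered cloth frame onto the matching user frame.

    left_shoulder / right_shoulder are (x, y) pixel coordinates derived
    from the Mediapipe keypoints for this frame.
    """
    shoulder_width = abs(right_shoulder[0] - left_shoulder[0])
    # Assumed proportion: the cloth spans roughly 1.6x the shoulder width.
    width = max(int(1.6 * shoulder_width), 1)
    height = int(width * cloth_frame.shape[0] / cloth_frame.shape[1])
    cloth = cv2.resize(cloth_frame, (width, height))

    # Place the cloth so its top edge sits just above the shoulder line.
    cx = (left_shoulder[0] + right_shoulder[0]) // 2
    x0, y0 = cx - width // 2, left_shoulder[1] - height // 8

    out = user_frame.copy()
    h, w = out.shape[:2]
    # Clip the cloth region to the frame boundaries.
    x1, y1 = max(x0, 0), max(y0, 0)
    x2, y2 = min(x0 + width, w), min(y0 + height, h)
    if x1 >= x2 or y1 >= y2:
        return out
    region = cloth[y1 - y0:y2 - y0, x1 - x0:x2 - x0]
    # Treat near-black simulation pixels as transparent background.
    mask = region.sum(axis=2) > 30
    out[y1:y2, x1:x2][mask] = region[mask]
    return out
```

Running this per frame, with both videos read via cv2.VideoCapture and the result written with cv2.VideoWriter, yields the final try-on .mp4.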
- Unity Engine
- OpenCV
- Mediapipe
- Python, C#
- We are able to create an output video; in the future, we aim to make the project run in real time.
- The mesh is not properly scaled to the human body during superimposition. We aim to improve the mesh quality either by modifying the mesh we currently use or by making one from scratch.
- Currently, we do not take the lighting of the video's environment into account. We plan to consider lighting conditions, which will improve output quality across the user's different surroundings.