This project addresses the problem of generating detailed test cases for digital product features from screenshots. The tool combines a UI-detection model, a multimodal large language model (MLLM), and a user-friendly front end to achieve this goal.
DEMO - https://www.loom.com/share/ab548ceebb32480e98b2e8899e89eb95?sid=368eccde-75a4-4d73-b432-b937358b7d0c
The front end is built using Gradio, providing a simple and intuitive interface for users. Key features include:
- Text Box: For optional context.
- Multi-Image Uploader: For uploading screenshots.
- Buttons: To trigger image processing and test case generation.
The backend integrates two primary models:
- UI-Detector
  - Purpose: Detects and annotates UI elements in screenshots.
  - Usage: Processes images to identify and highlight various UI components.
- OpenGVLab/InternVL2-2B
  - Purpose: Generates detailed test cases based on the processed images and optional text context.
  - Usage: Receives annotated images and a mixed-approach prompt to produce comprehensive test case descriptions.
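The annotation side of the detection step can be sketched with Pillow. The bounding-box format (`xyxy` plus a label) is a hypothetical stand-in for the UI-Detector's actual output:

```python
from PIL import Image, ImageDraw

def annotate(image, boxes):
    """Draw labeled bounding boxes (hypothetical detector output) on a copy
    of the screenshot so the MLLM receives visually highlighted UI elements."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for box in boxes:
        x1, y1, x2, y2 = box["xyxy"]
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
        draw.text((x1, max(0, y1 - 12)), box["label"], fill="red")
    return annotated

# Example with a blank stand-in screenshot and one fake detection:
screenshot = Image.new("RGB", (200, 100), "white")
result = annotate(screenshot, [{"xyxy": (10, 10, 80, 40), "label": "button"}])
```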
The tool utilizes a mixed approach to prompt the model effectively, combining the visual information from the UI-Detector with contextual text to generate accurate and useful test cases.
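The README does not show the prompt itself; the sketch below illustrates how such a mixed-approach prompt might be assembled, with entirely illustrative wording:

```python
def build_prompt(num_images, context=""):
    # Illustrative mixed-approach prompt (assumption): a fixed instruction,
    # a reference to the annotated screenshots, and the user's optional context.
    parts = [
        f"You are given {num_images} annotated screenshot(s) of a digital product.",
        "Detected UI elements are highlighted with bounding boxes.",
        "For each feature visible in the screenshots, write a detailed test case",
        "with a description, pre-conditions, testing steps, and expected results.",
    ]
    if context:
        parts.append(f"Additional context from the user: {context}")
    return "\n".join(parts)

prompt = build_prompt(2, context="This is the checkout flow of a shopping app.")
```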
- Image Upload: Users upload screenshots through the Gradio interface.
- UI Detection: The UI-Detector model processes the images to detect and highlight UI elements.
- Image Preprocessing: Screenshots are preprocessed into a format suitable for the MLLM.
- Test Case Generation: The OpenGVLab/InternVL2-2B model, guided by a mixed-approach prompt, generates detailed test cases based on the processed images and context.
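The workflow above can be sketched as a single pipeline. `detect_ui` and `generate` are stubs standing in for the real UI-Detector and OpenGVLab/InternVL2-2B calls, and the 448x448 resize is an assumption about the MLLM's expected input size:

```python
from PIL import Image

def detect_ui(image):
    # Stand-in for the UI-Detector: the real model would return the
    # screenshot with detected UI elements annotated.
    return image

def preprocess(image, size=448):
    # Resize to a square RGB input; 448x448 is an assumed MLLM input size.
    return image.convert("RGB").resize((size, size))

def generate(images, context):
    # Stand-in for InternVL2-2B inference with the mixed-approach prompt.
    return f"Test cases for {len(images)} screenshot(s). Context: {context}"

def pipeline(screenshots, context=""):
    processed = [preprocess(detect_ui(img)) for img in screenshots]
    return generate(processed, context)

report = pipeline([Image.new("RGB", (1920, 1080))], context="login screen")
```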
To install the required dependencies, run:

```
pip install -r requirements.txt
```