This project addresses the problem of generating detailed test cases for digital product features from screenshots. The tool combines a UI-detection model, a multimodal large language model (MLLM), and a user-friendly front end to achieve this goal.
DEMO - https://www.loom.com/share/ab548ceebb32480e98b2e8899e89eb95?sid=368eccde-75a4-4d73-b432-b937358b7d0c
The front end is built using Gradio, providing a simple and intuitive interface for users. Key features include:
- Text Box: For optional context.
- Multi-Image Uploader: For uploading screenshots.
- Buttons: To trigger image processing and test case generation.
The backend integrates two primary models:
- UI-Detector
  - Purpose: Detects and annotates UI elements in screenshots.
  - Usage: Processes images to identify and highlight various UI components.
- OpenGVLab/InternVL2-2B
  - Purpose: Generates detailed test cases based on the processed images and optional text context.
  - Usage: Receives annotated images and a mixed-approach prompt to produce comprehensive test case descriptions.
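The annotation side of the detection step can be sketched with Pillow. The bounding-box format (`xyxy` plus a label) is a hypothetical stand-in for the UI-Detector's actual output:

```python
from PIL import Image, ImageDraw

def annotate(image, boxes):
    """Draw labeled bounding boxes (hypothetical detector output) on a copy
    of the screenshot so the MLLM receives visually highlighted UI elements."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for box in boxes:
        x1, y1, x2, y2 = box["xyxy"]
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
        draw.text((x1, max(0, y1 - 12)), box["label"], fill="red")
    return annotated

# Example with a blank stand-in screenshot and one fake detection:
screenshot = Image.new("RGB", (200, 100), "white")
result = annotate(screenshot, [{"xyxy": (10, 10, 80, 40), "label": "button"}])
```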
The tool utilizes a mixed approach to prompt the model effectively, combining the visual information from the UI-Detector with contextual text to generate accurate and useful test cases.
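The README does not show the prompt itself; the sketch below illustrates how such a mixed-approach prompt might be assembled, with entirely illustrative wording:

```python
def build_prompt(num_images, context=""):
    # Illustrative mixed-approach prompt (assumption): a fixed instruction,
    # a reference to the annotated screenshots, and the user's optional context.
    parts = [
        f"You are given {num_images} annotated screenshot(s) of a digital product.",
        "Detected UI elements are highlighted with bounding boxes.",
        "For each feature visible in the screenshots, write a detailed test case",
        "with a description, pre-conditions, testing steps, and expected results.",
    ]
    if context:
        parts.append(f"Additional context from the user: {context}")
    return "\n".join(parts)

prompt = build_prompt(2, context="This is the checkout flow of a shopping app.")
```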
- Image Upload: Users upload screenshots through the Gradio interface.
- UI Detection: The UI-Detector model processes the images to detect and highlight UI elements.
- Image Preprocessing: Screenshots are preprocessed into a format suitable for the MLLM.
- Test Case Generation: The OpenGVLab/InternVL2-2B model, guided by a mixed-approach prompt, generates detailed test cases based on the processed images and context.
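The workflow above can be sketched as a single pipeline. `detect_ui` and `generate` are stubs standing in for the real UI-Detector and OpenGVLab/InternVL2-2B calls, and the 448x448 resize is an assumption about the MLLM's expected input size:

```python
from PIL import Image

def detect_ui(image):
    # Stand-in for the UI-Detector: the real model would return the
    # screenshot with detected UI elements annotated.
    return image

def preprocess(image, size=448):
    # Resize to a square RGB input; 448x448 is an assumed MLLM input size.
    return image.convert("RGB").resize((size, size))

def generate(images, context):
    # Stand-in for InternVL2-2B inference with the mixed-approach prompt.
    return f"Test cases for {len(images)} screenshot(s). Context: {context}"

def pipeline(screenshots, context=""):
    processed = [preprocess(detect_ui(img)) for img in screenshots]
    return generate(processed, context)

report = pipeline([Image.new("RGB", (1920, 1080))], context="login screen")
```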
To install the required dependencies, run:

```
pip install -r requirements.txt
```