Artifact for TOSEM Submission: GiantRepair
- I. Introduction
- II. Project Structure
- III. Environment
- IV. How to Run
- V. Ablation Results
- VI. Discussion Results
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. Recently, LLM-based APR methods have shown promise in repairing real-world bugs. However, existing APR methods often utilize patches generated by LLMs without further optimization, resulting in reduced effectiveness due to the lack of program-specific knowledge. Furthermore, the evaluations of these APR methods have typically been conducted under the assumption of perfect fault localization, which may not accurately reflect their real-world effectiveness. To address these limitations, this paper introduces an innovative APR approach called GIANTREPAIR. Our approach leverages the insight that LLM-generated patches, although not necessarily correct, offer valuable guidance for the patch generation process. Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs through context-aware patch generation by instantiating the skeletons. To evaluate the performance of our approach, we conduct two large-scale experiments. The results demonstrate that GIANTREPAIR not only effectively repairs more bugs (an average of 27.78% on Defects4J v1.2 and 23.40% on Defects4J v2.0) than using LLM-generated patches directly, but also outperforms state-of-the-art APR methods by repairing at least 42 and 7 more bugs under perfect and automated fault localization scenarios, respectively.
├── GiantRepair: GiantRepair's Java implementation
├── LLM_Inference: Code to apply LLMs to APR task
│ ├── Models
│ ├── run_apr.py
│ ├── script_runapr.sh
│ ├── test_llm.py
│ └── utils
├── README.md
├── doc
├── results: Specific results used in the paper.
└── d4j-info: Analysis results of the Defects4J and GrowingBugs datasets
├── filelist.json
├── growing_bugs_filelist.json
├── growing_bugs_single_function.json
├── growing_bugs_single_function_expand.json
├── linelist.json
└── single_function_repair.json
- OS: Linux (Tested on Ubuntu 20.04.6 LTS)
- OpenJDK 1.8.0_382 and OpenJDK 11.0.20.1
- Download and configure Defects4J and ExpressAPR.
- More runtime configurations can be found in the config-file.
- Python==3.9
- transformers==4.33.3
- Defects4J Setting:

```shell
defects4j checkout -p Chart -v 1b -w ${buggy_program_path}/chart/chart_1_buggy
```
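To check out several buggy programs at once, a loop along these lines can be used (the bug list and root directory are placeholders; the sketch only prints the commands, so it runs even without Defects4J on the PATH):

```shell
# Print defects4j checkout commands for a few sample bugs,
# following the <proj>/<proj>_<num>_buggy directory layout above.
buggy_program_path="/tmp/d4j"   # placeholder root directory
for bug in Chart:1 Lang:57 Math:27; do
  proj="${bug%%:*}"
  num="${bug##*:}"
  lower="$(printf '%s' "$proj" | tr '[:upper:]' '[:lower:]')"
  echo "defects4j checkout -p ${proj} -v ${num}b -w ${buggy_program_path}/${lower}/${lower}_${num}_buggy"
done
```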
- ExpressAPR Setting: shown in Link.
- Modify GiantRepair's settings in the config file, then run:

```shell
java -jar GiantRepair repair -d4j {bugid} -d4jhome {buggy_program_path} -modelname {modelName}
```

`bugid` should be of the form `proj_idnum`, all in lowercase (e.g., `chart_1`).
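A minimal sketch of deriving a well-formed `bugid` (the project, bug number, paths, and model name below are placeholders, not values prescribed by the artifact):

```shell
# Derive the lowercase proj_idnum bug id from a Defects4J project name
# and bug number (placeholder values: Chart, bug 1).
proj="Chart"
num=1
bugid="$(printf '%s' "$proj" | tr '[:upper:]' '[:lower:]')_${num}"
echo "$bugid"   # chart_1

# The resulting repair command (printed only; adjust paths/model first):
echo "java -jar GiantRepair repair -d4j ${bugid} -d4jhome /path/to/buggy_programs -modelname starcoder"
```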
In order to study the contribution of various components in GIANTREPAIR to the overall performance, we set up the following three variants:
- GiantRepair<sub>selection</sub> randomly selects code elements from the project to fill the code skeletons, rather than being constrained by syntactic rules.
- GiantRepair<sub>context</sub> tests the generated patches in the order of generation, rather than ranking them by similarity.
- GiantRepair<sub>adaptive</sub> randomly selects modifications from LLM patches, rather than applying coarse-grained modifications.
We conduct the experiment on Defects4J v1.2 single-function bugs; the results are shown in the following table:
Variant | #Plausible Fixes | #Correct Fixes | %Precision |
---|---|---|---|
GiantRepair<sub>selection</sub> | 123 | 46 | 37.40% |
GiantRepair<sub>context</sub> | 129 | 51 | 39.53% |
GiantRepair<sub>adaptive</sub> | 125 | 49 | 39.20% |
GiantRepair<sub>ori</sub> | 135 | 55 | 40.74% |
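The %Precision column is simply #Correct Fixes divided by #Plausible Fixes; e.g., for the first row (46 correct out of 123 plausible):

```shell
# Precision = correct fixes / plausible fixes * 100 (first row of the table).
awk 'BEGIN { printf "%.2f%%\n", 46 / 123 * 100 }'   # prints 37.40%
```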
This table shows the number of plausible fixes, correct fixes, and the precision value for each of the three variants. We first observe that randomly filling code skeletons yields the lowest number of plausible fixes and the lowest precision. Disabling the context similarity ranking or the adaptive application likewise reduces the number of plausible and correct fixes. As a result, all the components contribute to the overall effectiveness of GiantRepair, and GiantRepair can effectively produce more plausible/correct fixes by utilizing LLM-generated patches.
To investigate whether GiantRepair is still effective at repairing unique bugs when compared to the most advanced LLMs, we conducted another experiment with GPT-4. Specifically, we randomly selected ten bugs that were correctly repaired by GIANTREPAIR but not by the studied LLMs, and then invoked GPT-4 via API requests to generate 20 patches for each bug. The outcome is shown in the table below:
Bug ids | Closure-19 | Closure-36 | Closure-113 | Lang-57 | Math-27 | Math-85 | Cli-32 | Codec-4 | Compress-1 | Jsoup-33 |
---|---|---|---|---|---|---|---|---|---|---|
GPT-4-1106-preview | | | | | | | | | | |
GiantRepair | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
In the Data Leakage part of the Discussion, we not only showcase GiantRepair's effectiveness in addressing data leakage concerns by examining the StarCoder training dataset, but also seek to further substantiate this conclusion. To achieve this, we employed the GrowingBugs dataset for additional experimentation. Remarkably, GiantRepair managed to successfully rectify 10 of the 51 identified bugs. The detailed data are presented in the table below:
Project | #Single-Function Bugs | GiantRepair |
---|---|---|
Canvas_api | 2 | 1 |
Dosgi_common | 1 | 1 |
Hono_client | 2 | 0 |
Tika_app | 1 | 1 |
HttpClient5 | 2 | 0 |
JacksonDatatypeJsr310 | 1 | 0 |
JacksonModuleAfterburner | 1 | 1 |
Switchyard_admin | 1 | 1 |
Qpidjms_client | 1 | 0 |
Tiles_api | 1 | 0 |
Tiles_core | 2 | 0 |
Wicket_request | 5 | 0 |
Wicket_util | 4 | 1 |
Wicket_spring | 1 | 0 |
Struts1_core | 2 | 0 |
Wicket_core | 10 | 2 |
Cargo_container | 3 | 0 |
Jcodemodel | 1 | 1 |
Vectorz | 2 | 0 |
Restfixture | 2 | 0 |
Xades4j | 1 | 0 |
Render_app | 1 | 0 |
Leshan_core | 4 | 1 |
Total | 51 | 10 |