Represent Code as Action Sequence for Predicting Next Method Call

This repository implements the approach described in the paper "Represent Code as Action Sequence for Predicting Next Method Call". The core idea is to model method calls in Python code as "actions", analogous to human actions, and to use both the coding context and the sequence of method invocations to predict future method calls. The aim is more accurate, context-aware code completion.

Overview

Code completion is an essential feature in modern IDEs, directly boosting developer productivity. This project focuses on utilizing large-scale Python code repositories to predict the next method call a developer might write. Inspired by natural language processing (NLP) models, we treat code as sequences of actions, combining coding context (project, file, function) with method invocations and their parameters.

The method proposed and implemented in this repository shows a 32.36% improvement over baseline models such as GPT-2 (a strong baseline at the time the paper was written), particularly in next-method-call token prediction.

Key Features

Contextual Action Modeling:
- Code is modeled as a sequence of actions, including the context (project, file, and function structure), allowing the model to understand the developer's intent better.
Transformer-Based Predictions:
- A GPT-like architecture is used to predict the next method call based on the past sequence of actions.
Action Extraction Tools:
- Provides utilities for parsing Python code into Abstract Syntax Trees (ASTs) and extracting method calls, assignments, and other relevant actions.
Data Processing and Analysis:
- Tools to collect, preprocess, and analyze large Python codebases, making this approach scalable for large-scale experiments.
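
To make the idea of an "action" concrete, here is a minimal sketch of how one might be represented in Python. The class and field names (`Context`, `Action`, `actor`, `method`, `parameters`) are assumptions made for illustration; they are not taken from the scripts in src/analysis/.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Context:
    """Where an action happens (hypothetical field names)."""
    project: str
    file: str
    function: str


@dataclass
class Action:
    """One method call viewed as an action (hypothetical layout)."""
    context: Context
    actor: str    # the object the method is called on
    method: str   # the method being invoked
    parameters: List[str] = field(default_factory=list)


# The call `session.get(url)` inside fetch() in client.py might become:
action = Action(
    context=Context(project="demo", file="client.py", function="fetch"),
    actor="session",
    method="get",
    parameters=["url"],
)
```

Keeping the context in its own immutable record lets many actions from the same function share one `Context` value.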

Repository Structure

├── README.md                              # Project description and usage instructions
├── Represent_Code_as_Action_Sequence_for_Predicting_Next_Method_Call_draft.pdf  # Paper draft
├── slides_internetware_2022_Represent Code as Action Sequence for Predicting Next Method Call.pdf # Conference slides
├── sample/py_demo.py                      # Example for quick demonstration of the idea
└── src/analysis/                          # Core analysis and model scripts

The repository also includes a paper draft and the slides presented at the Internetware 2022 conference for a deeper understanding of the work.

How It Works

Action Extraction:
- The call_extractor.py script parses Python code into ASTs. It then extracts method calls and related actions, such as assignments, and models them as action sequences. Each action contains attributes like context, actor, and parameters.
Modeling:
- The extracted action sequences are flattened and processed into a format suitable for transformer-based models. We use a fine-tuned GPT-2 to predict the next method call based on past actions.
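
The two steps above can be sketched with the standard `ast` module. This is an illustrative approximation, not the code in call_extractor.py: the `CallCollector` class, the dictionary keys, and the marker tokens (`<ctx>`, `<actor>`, `<call>`, `<args>`) are all assumptions. It requires Python 3.9+ for `ast.unparse`.

```python
import ast

SOURCE = '''
def fetch(session, url):
    resp = session.get(url, timeout=5)
    data = resp.json()
    return data
'''


class CallCollector(ast.NodeVisitor):
    """Collect method calls together with the enclosing function name."""

    def __init__(self):
        self.function = "<module>"
        self.actions = []

    def visit_FunctionDef(self, node):
        # Track the enclosing function as part of the context.
        previous, self.function = self.function, node.name
        self.generic_visit(node)
        self.function = previous

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node):
        # Visit children first so nested calls are recorded before outer ones.
        self.generic_visit(node)
        if isinstance(node.func, ast.Attribute):  # obj.method(...) only
            params = [ast.unparse(a) for a in node.args]
            params += [(f"{k.arg}=" if k.arg else "**") + ast.unparse(k.value)
                       for k in node.keywords]
            self.actions.append({
                "function": self.function,
                "actor": ast.unparse(node.func.value),
                "method": node.func.attr,
                "parameters": params,
            })


def flatten(actions):
    """Turn actions into one token string (marker tokens are hypothetical)."""
    tokens = []
    for a in actions:
        tokens += ["<ctx>", a["function"], "<actor>", a["actor"],
                   "<call>", a["method"], "<args>", *a["parameters"]]
    return " ".join(tokens)


collector = CallCollector()
collector.visit(ast.parse(SOURCE))
actions = collector.actions
print(flatten(actions))
```

For the sample source, `actions` holds two entries, `session.get` followed by `resp.json`, both tagged with the enclosing function `fetch`. The flattened string is the kind of input a GPT-style model could be fine-tuned on.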

Training the Model

To train (or fine-tune) the model for method call prediction:

First, collect and prepare Python project files.
Then, run the scripts under the src/analysis/ folder to extract the action sequences and prepare the training data.
Finally, feed the processed data into the transformer-based model (such as GPT) for training.
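
As a rough illustration of the data preparation steps, the sketch below walks a directory of Python files, extracts the method names called in each file, and builds (history, target) pairs for next-method-call prediction. The function names are hypothetical and the extraction is deliberately simplified (method names only, no context or parameters); the actual scripts under src/analysis/ may work differently.

```python
import ast
import tempfile
from pathlib import Path


def method_calls(source):
    """Method names called in `source`, ordered by where each call ends.

    Ordering by end position puts inner and chained calls before the
    calls that contain them, which roughly matches evaluation order
    for straight-line code.
    """
    calls = [n for n in ast.walk(ast.parse(source))
             if isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)]
    calls.sort(key=lambda n: (n.end_lineno, n.end_col_offset))
    return [n.func.attr for n in calls]


def collect_sequences(root):
    """One call sequence per parseable .py file under `root`."""
    sequences = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            calls = method_calls(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # skip files that cannot be read or parsed
        if calls:
            sequences.append(calls)
    return sequences


def next_call_pairs(sequence):
    """(history, target) pairs: predict each call from the calls before it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Tiny demonstration corpus: one valid file and one that fails to parse.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.py").write_text(
        "s = text.strip()\nparts = s.split(',')\nout = '-'.join(parts)\n",
        encoding="utf-8",
    )
    Path(tmp, "broken.py").write_text("def oops(:\n", encoding="utf-8")
    sequences = collect_sequences(tmp)

pairs = next_call_pairs(sequences[0])
```

Each pair gives the model a history of calls and the call that followed it. A causal language model such as GPT-2 learns the same objective implicitly when trained on the flattened sequences.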

For more details, refer to the original paper or the provided slides.

JY0284/code_completion_as_human_action_prediction
