AI4Code

A Coding Challenge to match Markdown comments to code in Jupyter notebook

"The goal of this competition is to understand the relationship between code and comments in Python notebooks. You are challenged to reconstruct the order of markdown cells in a given notebook based on the order of the code cells, demonstrating comprehension of which natural language references which code."

Notebook 1: A code snippet to reconstruct Jupyter notebooks from the JSON file input
Notebook 2: A test of Assymetric Syntax Tree (AST) that allows us to parse the python code and extract crucial features
Notebook 3: A test of NLP cleaning methods
Notebook 4: Preprocessing step. This Notebook does most of the heavy lifting and provides most of the annotations. It defines a python decorator method of implement a pipeline. The main functions are:
- skims and divides the data
- processes text with traditional NLP methods
- apply AST and find out about potential inheritance information
- pair cells to make final matrix ready for classifiers
- creates custom validation set, as the validation and training samples must not share entries from the same jupyter notebook
Notebook 5: Machine Learning Classifiers and plot and metric of result
Notebook X: A demonstration of the metric used in the competition and make sense of it

Data is not included in this repo, as it is 2GB large. Instructions to download can be found here: https://www.kaggle.com/c/AI4Code

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
visualize		visualize
N1_reconstruct_nb.ipynb		N1_reconstruct_nb.ipynb
N2_test_ast.ipynb		N2_test_ast.ipynb
N3_nlpclean.ipynb		N3_nlpclean.ipynb
N4_preprocess_pipeline.ipynb		N4_preprocess_pipeline.ipynb
N5_Training.ipynb		N5_Training.ipynb
NX_kendall_tau.ipynb		NX_kendall_tau.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI4Code

A Coding Challenge to match Markdown comments to code in Jupyter notebook

Table of Contents

About

Releases

Packages

Languages

sciencecw/AI4Code

Folders and files

Latest commit

History

Repository files navigation

AI4Code

A Coding Challenge to match Markdown comments to code in Jupyter notebook

Table of Contents

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages