---
layout: default
title: A Survey of Machine Learning for Big Code and Naturalness
---

Machine Learning on Source Code

The billions of lines of source code that have been written contain implicit knowledge about how to write good code, code that is easy to read and to debug. A recent line of research aims to find statistical patterns in large corpora of code to drive new software development tools and program analyses.

This website and the accompanying article survey the work in this emerging area.

Like writing and speaking, software development is an act of human communication. At its core, research on the naturalness of software applies statistical modeling to big code to reason about the rich variety of programs that developers write. This new line of research is inherently interdisciplinary, uniting the machine learning and natural language processing communities with the software engineering and programming language communities.

🏷 Browse Papers by Tag

{% assign rawtags = "" | split: "" %}
{% for publication in site.publications %}
{% assign rawtags = rawtags | concat: publication.tags %}
{% endfor %}
{% assign rawtags = rawtags | uniq | sort_natural %}
{% for tag in rawtags %}{{ tag }} {% endfor %}

About This Site

This site is an experiment: a living literature review that allows you to explore, [search and navigate]({% link papers.html %}) the literature in this area. The full survey is available as a research paper. Please cite it as:

@article{allamanis2018survey,
  title={A survey of machine learning for big code and naturalness},
  author={Allamanis, Miltiadis and Barr, Earl T and Devanbu, Premkumar and Sutton, Charles},
  journal={ACM Computing Surveys (CSUR)},
  volume={51},
  number={4},
  pages={81},
  year={2018},
  publisher={ACM}
}

Contributing

This research area is evolving so fast that a static review cannot keep up. But a website can! We hope to make this site a living document. Anyone can add a paper to this website, essentially by creating one Markdown file. To contribute, open a pull request on GitHub by following these instructions for contributing.

Contributors

The core survey and the original taxonomy were created by

Miltos Allamanis (Microsoft Research, Cambridge, UK)
Earl T. Barr (University College London, London, UK)
Prem Devanbu (University of California, Davis, USA)
Charles Sutton (University of Edinburgh and The Alan Turing Institute, UK)

Contributors to the website

This website accepts external contributions. Please feel free to add your name below once you contribute to this website. A comprehensive list can be found here.

Uri Alon (Technion, Israel)
Shaked Brody (Technion, Israel)
Nghi D. Q. Bui (Singapore Management University, Singapore)
Rajaswa Patil (Microsoft PROSE)
