
[PROPOSAL] Source code transformations #241

Open
bzz opened this issue Aug 28, 2018 · 12 comments

@bzz
Contributor

bzz commented Aug 28, 2018

Idea comes from #233 (comment)

  • Title: Source code match/traverse/transform APIs
  • Author(s): Alex, ?
  • Short description: An overview of existing approaches to match/traverse/transform source code
  • Categories: language analysis
  • Deadlines: no

Table of contents

Very rough ground that would be covered:

OSS:

  • Golang: go fix / gofmt -r
  • Cpp: clang-tidy
  • C: coccinelle
  • Java: JTransformer
  • Example-based refactorings: Java: Error-Prone / Golang: eg
  • Python: Bowler

Proprietary/from talks or papers (material)

  • ClangMR/JavacFlume (see details section below)
  • Semmle QL (query only, e.g. through the non-OSS CLI)

with some basic examples of using each API, and a conclusion on why bblfsh is the best tool for us.

Management

This section will be filled by @campoy.

  • State: (proposed | writing | written | published)
  • Scheduled:
  • Link to post:

Social Media

  • Wording for tweet:
  • Hashtags:
  • Subreddits:

NOTE Please write in short lines so the review is easier to do.


Preliminary content comes from a previous blog post:

{{% center %}} … {{% /center %}}

## Technical details

Building on the internal success story of the [ClangMR tool](https://research.google.com/pubs/pub41342.html) for C++,
which matches, traverses and transforms Abstract Syntax Trees (ASTs) at scale, similar tooling was built for Java.

{{% youtube ZpvvmvITOrk %}}

The [Error-Prone](https://github.com/google/error-prone) project is a compiler extension that can perform
arbitrary analysis on the *fully typed AST*. Note that such input cannot be obtained from a parser alone,
even one as advanced as [babelfish](https://doc.bblf.sh/): things like symbol resolution require running
a full build. After running a number of checker plugins, Error-Prone outputs simple text replacements
as suggested code fixes.

The project is open source and is well documented in a [number](https://research.google.com/pubs/pub38275.html)
of [papers](https://research.google.com/pubs/pub41876.html). Another, closed-source tool called JavacFlume was built
to scale the application of those fixes to the whole codebase; I would guess it looks something like
an Apache Spark job that applies patches in some generic format.
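
To make "simple text replacements" a bit more concrete, here is a minimal Go sketch of how such output could be
applied to a file. The `Replacement` type and the `Apply` function are hypothetical and are not Error-Prone's actual
data model: we only assume that each suggested fix is a byte offset, a length and a replacement string, and that
fixes within one file must not overlap.

```go
package main

import (
	"fmt"
	"sort"
)

// Replacement is a hypothetical representation of one suggested fix:
// replace Length bytes starting at byte Offset with Text.
type Replacement struct {
	Offset int
	Length int
	Text   string
}

// Apply applies non-overlapping replacements to src. They are applied from
// the end of the text towards the beginning, so the offsets of the edits that
// are still pending stay valid even when a replacement changes the length.
func Apply(src string, reps []Replacement) (string, error) {
	sorted := append([]Replacement(nil), reps...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Offset > sorted[j].Offset })

	out := src
	limit := len(src) // no pending edit may extend past this offset
	for _, r := range sorted {
		if r.Offset < 0 || r.Length < 0 || r.Offset+r.Length > limit {
			return "", fmt.Errorf("replacement at offset %d overlaps another one or is out of range", r.Offset)
		}
		out = out[:r.Offset] + r.Text + out[r.Offset+r.Length:]
		limit = r.Offset
	}
	return out, nil
}

func main() {
	src := "foo.Bar(1); foo.Bar(2)"
	fixed, err := Apply(src, []Replacement{
		{Offset: 4, Length: 3, Text: "Baz"},
		{Offset: 16, Length: 3, Text: "Baz"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(fixed) // foo.Baz(1); foo.Baz(2)
}
```

Applying edits back to front is a common design choice: it keeps the offsets of the remaining edits valid
without having to recompute them after every change.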

Here is an example of what a full pipeline looks like for C++:

{{% grid %}}
{{% caption src="https://cdn-images-1.medium.com/max/4224/1*KpJ5fj4njR1HTDfzhLCQkg.png" title="ClangMR processing pipeline illustration"%}}
“Large-Scale Automated Refactoring Using ClangMR” by
[Hyrum Wright](https://research.google.com/pubs/HyrumWright.html), Daniel Jasper, Manuel Klimek, [Chandler Carruth](https://research.google.com/pubs/ChandlerCarruth.html), Zhanyong Wan
{{% /caption %}}
{{% /grid %}}

Although it is not disclosed, an attentive reader might notice that the **Compilation Index** part of the
pipeline is very similar to a [Compilation Database](https://kythe.io/docs/kythe-compilation-database.html)
in the open source Kythe project.

Let us take a closer look at an example of the C++ API for AST querying and transformation.

### C++ Example
> *rename all calls to Foo::Bar with 1 argument to Foo::Baz, independent of the name of the instance variable,
> or whether it is called directly or by pointer or reference*

{{% grid %}}
{{% grid-cell %}}
![API example: invoke a callback function on call to Foo::Bar](https://cdn-images-1.medium.com/max/2000/1*vOYemTlJ2QZyzXvizSy5Og.png)
{{% /grid-cell %}}
{{% grid-cell %}}
This fragment will invoke a callback function on any occurrence of a call to *Foo::Bar* with a single argument.
{{% /grid-cell %}}
{{% /grid %}}

{{% grid %}}
{{% grid-cell %}}
![API example: replace matching text of the function name with the "Baz"](https://cdn-images-1.medium.com/max/2116/1*JiUgO-gimsIi2JpRB9LYeg.png)
{{% /grid-cell %}}
{{% grid-cell %}}
This callback will generate a code transformation: for the matched nodes, it will replace the matching text of
the function name with “Baz”.

Regarding code transformations in Java, **Error-Prone** has a similar low-level [patching API](http://errorprone.info/docs/patching)
that is very close to the native AST manipulation API. It was found to have a steep learning curve, similar to Clang's,
and thus poses a high barrier to entry: even an experienced engineer would need a few weeks before becoming
productive at creating fix suggestions or refactorings.
{{% /grid-cell %}}
{{% /grid %}}

That is why a higher-level API was built for Java: first as the separate [Refaster](https://research.google.com/pubs/pub41876.html)
project, and later [integrated into Error-Prone](http://errorprone.info/docs/refaster).

A usual workflow would then look like this: after running all the checks and emitting a collection of suggested
fixes, shard the diffs into smaller patches, run all the tests over the changes and, if they pass, submit the
patches for code review.
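
The sharding step could look roughly like the following Go sketch. `Shard` is a hypothetical helper, and
grouping by directory with a `maxFiles` limit per patch is an assumption made for illustration only; the
source does not say how the real tools split their diffs.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Shard groups changed files by directory and splits every group into
// patches of at most maxFiles files, so that each patch can be tested and
// reviewed on its own. Grouping by directory is an illustrative assumption.
func Shard(files []string, maxFiles int) [][]string {
	if maxFiles < 1 {
		maxFiles = 1
	}
	byDir := map[string][]string{}
	for _, f := range files {
		dir := path.Dir(f)
		byDir[dir] = append(byDir[dir], f)
	}
	dirs := make([]string, 0, len(byDir))
	for d := range byDir {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs) // map iteration order is random; sort for stable output

	var shards [][]string
	for _, d := range dirs {
		group := byDir[d]
		sort.Strings(group)
		for len(group) > 0 {
			n := maxFiles
			if n > len(group) {
				n = len(group)
			}
			shards = append(shards, group[:n])
			group = group[n:]
		}
	}
	return shards
}

func main() {
	changed := []string{"a/x.cc", "b/z.cc", "a/y.cc", "a/w.cc"}
	for i, patch := range Shard(changed, 2) {
		fmt.Println(i, patch)
	}
	// Prints: 0 [a/w.cc a/x.cc], then 1 [a/y.cc], then 2 [b/z.cc]
}
```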

{{% center %}} … {{% /center %}}

{{% center %}}
##### Thank you for reading, stay tuned and keep your codebase healthy!
{{% /center %}}
@campoy
Contributor

campoy commented Oct 12, 2018

Hey Alex, maybe I'm lacking knowledge here but the title doesn't mean anything to me.
Could you make it more beginner friendly?

@bzz
Contributor Author

bzz commented Oct 24, 2018

Thank you for the feedback, Francesc! It's totally WIP, as I'm just gaining confidence in the existing tools in this field.

The plan is basically to cover some "state of the art" tools for AST transformation (AKA refactoring), so the learnings could be applied to the Bblfsh UAST manipulation API.

How about the title along the lines of "Source code transformations"?

OSS:

Proprietary/from talks or papers (material)

  • ClangMR/JavacFlume
  • Semmle QL (only query)

@bzz bzz changed the title [PROPOSAL] Source code match/traverse/transform APIs [PROPOSAL] Source code transformations Oct 24, 2018
@campoy
Contributor

campoy commented Oct 24, 2018

Source code transformations makes it much more clear to me, yeah.
Let me know when you have a draft of the blog so I can review.

I'd be curious to see if we can make it so the blog doesn't feel like a series of tools, and instead there's a story tying everything up.

@bzz
Contributor Author

bzz commented Oct 25, 2018

> I'd be curious to see if we can make it so the blog doesn't feel like a series of tools, and instead there's a story tying everything up.

That is very useful feedback, thank you; please let me think more about that.
I expect that even an initial draft will take some time, but I will post it here ASAP.

Thanks again.

@bzz
Contributor Author

bzz commented Nov 12, 2018

@campoy One story I can think of is:

Take a simple but educational example (or examples) of some issue in the code as motivation, and then go through implementing, in each of those systems:

  • code to detect it
  • code to suggest a fix for it

Due to differences in host languages it could be hard to pick a single example, so it can be adjusted a bit for each specific language, keeping it sufficiently high-level.

A nice 🍒 on top could be finishing it with a link to the blog post on "how to wrap it as a lookout analyzer" from #249.

WDYT?

@campoy
Contributor

campoy commented Nov 14, 2018

I like it. Even if we find an example that only works for a specific language, it should be easy to get people from other language communities to understand the point of the article.

@kuba--

kuba-- commented Nov 29, 2018

Refactoring prolog code: https://pdfs.semanticscholar.org/b48b/bc30427ef7429db83e190f91a579442121b6.pdf

@vcoisne
Contributor

vcoisne commented Nov 30, 2018

@bzz did you get a chance to start a draft?

@bzz
Contributor Author

bzz commented Dec 3, 2018

Very preliminary: this is fairly ambitious and requires a lot of research. I would expect a shareable draft early next year.

@vcoisne
Contributor

vcoisne commented Jan 18, 2019

@bzz Trying to plan our blog schedule for the upcoming weeks. Did you get a chance to work on this draft?

@bzz
Contributor Author

bzz commented Jan 24, 2019

@vcoisne I made some progress on the research but am not there yet.
I will ping you as soon as I have some results to share!

@bzz
Contributor Author

bzz commented Oct 29, 2019

This is still in my backlog.

Two more interesting contenders added to the description:

  • https://github.com/google/pasta for Python
  • https://comby.dev for assembly, Bash, C/C++, C#, Clojure, CSS, Dart, Elm, Elixir, Erlang, Fortran, F#, Go, Haskell, HTML/XML, Java, Javascript/Typescript, JSON, Julia, LaTeX, Lisp, OCaml, Pascal, PHP, Python, Ruby, Rust, Scala, SQL, Swift, Text


4 participants