Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use erlfmt for parsing #979

Merged
merged 12 commits into from
May 3, 2021
Merged

Conversation

gomoripeti
Copy link
Contributor

Description

Use erlfmt instead of els_dodger for parsing. Uses modified version of erlfmt_ast from PR WhatsApp/erlfmt#237 to convert from erlfmt parse tree to erl_syntax:syntaxTree. Expected benefits:

  • more precise locations
  • parse exotic macros

TODO

@gomoripeti
Copy link
Contributor Author

In commit 608fa18 I needed to apply a workaround for hover macro definitions because the magic tuples from erlfmt_ast started to surface https://github.com/erlang-ls/erlang_ls/pull/979/files#diff-cc9d509044808779837f7175898eeab88501b2666fcf925adfc43abc1104d1d5R483-R484

@@ -0,0 +1,1290 @@
%% Copyright (c) Facebook, Inc. and its affiliates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why forking it, instead of adding it as a dependency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, saw the branch now. What's the plan long term for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that branch will eventually be merged and erlfmt_ast be a part of erlfmt and we can remove this temporary copy from erlang_ls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michalmuskala Any insights on that?

@robertoaloi
Copy link
Member

@gomoripeti I don't have a strong opinion on this. What we can do is to try out the new parser by merging this PR. We can then see how the "early adopters" (who point to HEAD) feel and then we decide is to definitely move to erlfmt or roll back. Yes, it would be nice to get rid of the fork. We should also look at removing the dodger completely (right now its's also used by the bound-variable-in-pattern code. So, if you squash this, we can get it in and follow up with the rest.

@gomoripeti
Copy link
Contributor Author

Hi Roberto, sorry, I had little time to move this forward. I did the squash (kept erlfmt_ast changes separate so its easier to adapt erlfmt upstream changes)
I didn't have time to test this with a large code base like OTP. If you could give me a few days before merging.
For example testing with an old version of OTP 21 I hit this issue erlang/otp#2348 (resulting in parsing files with map accoss to crash, that's bad). Turns out it was fixed in a later patch for OTP 21, but there might be other glitches that need to be worked around. Also making sure performance does not drop.

@gomoripeti gomoripeti force-pushed the erlfmt_phase2 branch 2 times, most recently from b79ada8 to 0daf0fd Compare April 20, 2021 00:24
- handle argumentless macro as type/opaque name
- fix spec/callback location
- support record_types in type context
- support bitstring_type in type context
- convert record field types in a type context
- convert spec fun name
- silence dialyzer about erl_syntax:function_type
Since OTP 24 (commit erlang/otp@f30514e#diff-9f3b412e8fbd809c6473fcd5bee30f63931c7349010640d76316cfcde504e733)
 `erl_syntax:set_pos` started to require a proper `erl_anno:anno()` instead of any term.

`erlfmt_ast` used to just copy the `erlfmt_scan:anno()` map from the input to
`syntaxTree()`, but that does not work any more. Now it is converted to an
`erl_anno:anno()` to make dialyzer happy. (There is a bit of cheating with
`end_location` in the hope that it will be a valid key in `erl_anno` soon.)
Use erlfmt instead of els_dodger for parsing. Uses modified version of
erlfmt_ast from PR WhatsApp/erlfmt#237 to convert from erlfmt parse tree to
erl_syntax:syntaxTree. Expected benefits are more precise locations and parsing
exotic macros.
Only store the range of the value in `define` POIs and take the text from the
original document. Formerly els_dodger used to create a text tree for definition
values it couldn't parse. However erlfmt can parse more cases and erlfmt_ast
currently converts some of them to "magic tuples". This change is a workaround
to avoid showing magic tuples.

An example
```
-define(M, A when A > 1).
```
would show
```
?M = {'*when*', A, A > 1}
```
Results from multiple diagnostics can arrive in arbitrary order.
@gomoripeti
Copy link
Contributor Author

gomoripeti commented Apr 22, 2021

Tested indexing all of OTP. All files are processed without any crash.

The new implementation is somewhat slower, indexing on select OTP applications (in seconds)
(this is most probably not because of erlfmt_parse, but later steps. It is also possible that more forms can be parsed which are just skipped by els_dodger)
els_dodger
0.989364 "common_test"
1.10195 "compiler"
2.65633 "dialyzer"
2.163496 "kernel"
4.819365 "stdlib"

erlfmt
1.177932 "common_test"
1.320494 "compiler"
3.129172 "dialyzer"
2.58872 "kernel"
6.771605 "stdlib"

I would say this is ready to be merged from my side.

@robertoaloi
Copy link
Member

Thanks for the analysis @gomoripeti !

Interesting results. Do you think we should proceed anyway? Performance degradation is minimal, but still...
We should also consider that the dodger is still in place, so this does not result in less code, either.
One thing we could check is how many POIs / Documents are identified with the two approaches to quantify the gain?

@gomoripeti
Copy link
Contributor Author

I will give it a bit more thinking/analyses.
(I haven't removed the modules but els_dodger and els_io_string is not used any more (els_dodger call deleted), also there is less fiddling with string length in els_range, so some code is eliminated)

I will gather some concrete benefit examples, to be able to judge if its worth it. Few things that pop in my mind right now:

  • range of unnecessarily quoted atoms will be correct (eg 'deep' won't be highlighted as 'dee)
  • it also allows to remove the fiddling and "fundamental flaw" in els_rename_provider

@gomoripeti
Copy link
Contributor Author

I tried to do a bit more profiling. While not full, I think

  • the els_parser part of the code is not slower.
  • erl_scan with text option (which is used by erlfmt to calculate end locations) is considerable slower than without the option compared to itself. This might not be the only reason of the new els_parser being slower but it's definitely a significant part. Even if we wouldn't use erlfmt_parse, we would need to use the text option to get proper end locations in one way or the other, so I think this is an acceptable price.

I tried to look at the diff in POIs

  • the count of POIs is not outstandingly different
  • lot of difference in actual range values
  • redundant atoms are created in different places (this makes comparison a bit hard)

the OTP test suites are good with edge cases that the old parser couldn't parse, but I think are rare in real, common code

  • concatenation of strings and macros, the whole function is missing with the old parser eg erl_eval_SUITE.erl
error_check("{" ?MODULE_STRING ",module_info}().",
            {badfun,{?MODULE,module_info}}),
  • complicated define with arg, c.erl
-define(RENDERABLE_FORMAT(Format),
        Format =:= ?NATIVE_FORMAT;
        binary_part(Format, 0, 5) =:= <<"text/">>).
  • old parser: missing folding range for two line functions with any char between dot and \n
f() ->
    ok. %% post comment

All in all, I think it's worth to include this change as it eliminates the need for some heuristics and assumptions

@robertoaloi
Copy link
Member

Sounds great. I guess we can follow up with @michalmuskala @alanz and the rest of the crew to see if we can get rid of the fork for erlfmt.

@robertoaloi robertoaloi merged commit fb4aada into erlang-ls:main May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants