The right approach to use slang as a formatter #828
-
Hi there ! I am currently starting to work on a SystemVerilog source formatter tool (alongside a language server) using slang.
I am also allowing myself to move around bits of code to make it easier on my eyes (one statement per line, in example) and easier to implement. My approach is to use a custom visitor (class derived from My problem is that it seems quite complicated to handle comments, empty lines and anything that is not directly represented by a token. (the trivia, I guess ?) While I am not surprised by the tediousness of the job, I am wondering if I am going down the right path or if someone did such thing with a more clever approach that I am not aware of. Best regards ! |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 13 replies
-
The SyntaxRewriter class (which inherits from SyntaxVisitor) might be helpful here, but I agree that it's probably going to be some work to make it less tedious. Note that you don't necessarily have to write a handler for every type of syntax node; you can write handlers for the base classes (like DataTypeSyntax) and that will be called for all types of nodes that don't have more specific handlers. If I were starting this project I would look at how clang-format works, and also probably how some of the C# auto-formatter tools work (slang's parser design is inspired by clang and the Roslyn C# compiler). If you want to add helpers / utilities / make upstream changes to make the job of writing the formatter easier I'm happy to take PRs. |
Beta Was this translation helpful? Give feedback.
-
Hello again, For reference, after fiddling around quite a bit, I found the following code structure to perform fairly well for my goal (Here, attaching an end of line commend to a port declaration) in a module declaration such as module instruction_counter #(
parameter K_ADWIDTH = 855, //! Address width
parameter int K_DWIDTH_IS_VERY_VERY_LONG = 8 + 35, K_TOTO = 5
) (
input logic i_clk , //! Input clock
input logic i_rstn , //! Reset, active low
input logic i_incr_en , //! Increment enable
output logic [K_ADWIDTH-1:0] o_addr , //! Output address
input logic i_set_addr, //! Override counter value
input logic i_jump , //! Jump one address
input logic [K_ADWIDTH-1:0] i_addr //! Address to set the counter
); I came up with something along the lines of void VisitorModuleBlackBox::handle(const ImplicitAnsiPortSyntax& port)
{
// bb.ports hold the list of ports of the module I'm analyzing
if(bb.ports.size() > 0)
{
for(const slang::parsing::Trivia t : port.header->getFirstToken().trivia())
{
// Assuming that I will only have one comment, could be refined.
// in particular by checking the line number with the port declaration.
if(t.getRawText().starts_with("//"))
{
bb.ports.back().comment = t.getRawText();
break;
}
}
}
// Do quite a bit of other things...
} I'm starting to get around mixing the visitor strategy (with handles) and manually descending in interesting nodes manually. |
Beta Was this translation helpful? Give feedback.
-
So, after tackling several point on another project of mine, I came back to have a try at this whole formatter thing. The issue with my previous approach is the risk of losing elements during the conversion. My idea would have been to just edit the nodes/tokens/trivia required to perform the wanted changes (In example, add a whitespace trivia before a specified token to perform indentation). Would there be a possibility to:
This way, I could be sure that no information is lost without my knowledge. |
Beta Was this translation helpful? Give feedback.
You have the right idea. It's going to be clunky because the memory management is important, but once you have it figured out you can wrap it all up in some nice helper methods and make it look pretty for your actual rewriter logic so I wouldn't worry about it too much.
Some thoughts: if the string you're adding is an actual string literal (in this case " !=") then you don't need to allocate memory for it; string literals live in a fixed location in memory for the lifetime of the program so you can just pass a pointer to it without issue. You only need the alloc and copy for dynamically constructed strings.
Also you need to be careful about this line: