-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize generated code for size, deduplication, quality #256
Comments
P.s. if you're accepting some clean up PR's here. I got pretty familiar with the generator code, but very much unfamiliar with your branching strategies. Which branch should I base a PR on if you accept them? |
Here's my 2 cents, just as someone who was historically involved, not the primary maintainer: There's a tradeoff to be made here. Making the generated code cleaner is nice, but it will likely require making the generator more complicated. I think that's usually the wrong direction to go, since humans work on the generator, but they don't generally work on the generated code. Redundant package aliases are a good example - we do Code duplication affects the source code size for sure, but that doesn't seem very consequential. If we're talking about kilobytes of source code, that's basically free; adding abstraction and complexity to the generator probably doesn't pull its weight unless there's good evidence that it helps a lot with the size of the compiled binary, and even then I'm skeptical. Some of your points are fair though, in my opinion. There's no real reason we need to make a new My feeling is that cosmetic improvements to the generated code should be pretty much the lowest priority possible for the Twirp project. Code generators are like compilers - who cares whether machine code is pleasant, we just want it to be correct and fast; we do care that the compiler itself is well-written and maintained though. |
Thank you for a thought out response, with which I mostly agree. The caveat is obviously that the generated code doesn't pass some smell tests, out of which the cosmetic issues are the least of the worries. Often some changes would result in various unseen behaviour at development time. For example, when deduplicating the code, I ended up with the following http handler code: func (s *uploadServiceServer) serveFilePut(resp http.ResponseWriter, req *http.Request) {
ctx := uploadServiceContext(req.Context(), "FilePut")
reqContent := new(FilePutRequest)
var respContent *FilePutResponse
call := func() (err error) {
respContent, err = s.UploadService.FilePut(ctx, reqContent)
return
}
var err error
switch uploadServiceContentType(req) {
case "application/json":
err = s.processRequestJSON(ctx, resp, req, reqContent, call, respContent)
case "application/protobuf":
err = s.processRequestProtobuf(ctx, resp, req, reqContent, call, respContent)
default:
msg := fmt.Sprintf("unexpected Content-Type: %q", req.Header.Get("Content-Type"))
err = badRouteError(msg, req.Method, req.URL.Path)
}
if err != nil {
s.writeError(ctx, resp, err)
}
} There are some obvious improvements:
I assume this has little bearing on the compiler, as all it needs to produce is a few utility functions (uploadServiceContext, uploadServiceContentType) and the generic processRequest methods (equivalent to the clients From what I saw, these changes can be made quickly, and should not affect the general quality or even the structure of the compiler itself. We can hopefully agree, that in the particular case of improvements 2 and 3, the generated code is actually becoming correct with the suggested changes. Thank you. |
Thanks for the concrete example. To me, I don't think it's obvious that this is an improvement. It's certainly more abstract and has less duplication, but that's not always better; switching the code to use a closure with a free variable referencing the response object seems clearly more complex. It will be harder to statically analyze for escape analysis by the Go compiler, which would negatively impact performance versus the simple, straight-line behavior we have today, since it would likely require a dynamic heap allocation. I'd guess it's a very minor impact for most services, though. My main point is that I think it's not clear this code is better, and I prefer less abstraction I think in generated code.
It lets
I agree that this is probably wrong, but it's much easier to fix than this. We should just call
I don't understand why this is better, can you explain how this benefits Twirp's users?
I think I agree with 2, but that's a much simpler change. I don't agree that 3 changes correctness in any way. Most of this hinges on how it affects the code generator. If you think it improves the code generator's quality, I think that's a much more persuasive argument. |
My 2 cents: Optimize generated code for:
Besides the opinions stated on issues like this one, I don't believe anything is going to change in Twirp unless there are specific pull requests addressing specific, measurable changes. In the other hand, issues like this one are a great way to get a sense of the values from the maintainers. For reference, the Contributing Guidelines also have important values stated in the section for Twirp Design Principles. |
Consistency is important, unless you count code generated assets as completely black-box. It's clear that the current codebase isn't consistent in terms how the client is structured, or how the RPC service itself is, or even how the errors are handled (some errors are given an incomplete context with missing method name). Ideally, the code should be human-readable and also reduce ambiguity (6 writeError calls for what should have been 1). Of course none of this is defined as a guiding value in the design principles. I'm happy for the work you have done, but it seems that I have to take a better look at gRPC. The current opinions stated in this tread read as "we don't want to do maintenance work, twirp isn't a priority" and that's fine (even if I was offering PRs), but it's also discouraging given specific examples and issues. We're allowed to disagree, right? |
As an exercise, I started with one of my smaller RPC projects, to see if I can optimize some low hanging fruit.
This optimisation is basically in line with what is done for
doProtobufRequest
anddoJSONRequest
. That's a 18% saving for size/SLOC. When taking into account that the Utils code is 22K, that's about 39% savings in size. My largest project to date has 125KB, which seems could be reduced to about 80KB with little effort.There is a set of issues with the generated code that I attempted to fix and/or remain:
import io "io"
), cosmetic but against good practices,There's a lot of small inconsistencies which annoy me, which can ultimately, as with everything else, be argued as cosmetic. In my opinion, improving the generated code quality to be comparable to handwritten code would make Twirp better.
The text was updated successfully, but these errors were encountered: