Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize generated code for size, deduplication, quality #256

Closed
titpetric opened this issue Jul 30, 2020 · 6 comments
Closed

Optimize generated code for size, deduplication, quality #256

titpetric opened this issue Jul 30, 2020 · 6 comments

Comments

@titpetric
Copy link
Contributor

titpetric commented Jul 30, 2020

As an exercise, I started with one of my smaller RPC projects, to see if I can optimize some low hanging fruit.

  • size: 42862 bytes, 1243 SLOC my starting point with a RPC api with 3 calls,
  • size: 41848 bytes, 1216 SLOC deduplicate ctxsetters calls, move error cast to callClientError, use prefix const in ServeHTTP)
  • size: 42083 bytes, 1224 SLOC wrote a processRequestJSON from one of the RPC calls
  • size: 39491 bytes, 1146 SLOC updated all JSON server RPCs to use processRequestJSON
  • size: 37152 bytes, 1072 SLOC after doing the same with processRequestProtobuf
  • size: 35160 bytes, 1010 SLOC after basically removing all *JSON and *Protobuf into main handler per RPC,

This optimisation is basically in line with what is done for doProtobufRequest and doJSONRequest. That's a 18% saving for size/SLOC. When taking into account that the Utils code is 22K, that's about 39% savings in size. My largest project to date has 125KB, which seems could be reduced to about 80KB with little effort.

There is a set of issues with the generated code that I attempted to fix and/or remain:

  • lack of const definition for repeating values like header matching, twirp version (this hurts git diffs as well),
  • redundant package import aliases (import io "io"), cosmetic but against good practices,
  • code duplication is a big issue, as shown with some numbers/examples above,
  • error handling looks painful - not only don't errors bubble up the stack, they are getting written with different contexts
  • ServeHTTP doesn't use the URL Prefix constant declared,
  • Setting a response writer to the context, which is never used/read (dead code)
  • ctxsetter method name not available in serveFileGet, but set in serveFileGetJSON, serveFileGetProtobuf (duplicate)
  • Jsonpb Unmarshaller/Marshaller created on each request and not part of service server (avoidable GC pressure)?

There's a lot of small inconsistencies which annoy me, which can ultimately, as with everything else, be argued as cosmetic. In my opinion, improving the generated code quality to be comparable to handwritten code would make Twirp better.

@titpetric
Copy link
Contributor Author

P.s. if you're accepting some clean up PR's here. I got pretty familiar with the generator code, but very much unfamiliar with your branching strategies. Which branch should I base a PR on if you accept them?

@spenczar
Copy link
Contributor

Here's my 2 cents, just as someone who was historically involved, not the primary maintainer:

There's a tradeoff to be made here. Making the generated code cleaner is nice, but it will likely require making the generator more complicated. I think that's usually the wrong direction to go, since humans work on the generator, but they don't generally work on the generated code.

Redundant package aliases are a good example - we do import io "io" to make sure import naming is correct; it helps avoid bugs when a proto package has a name like io. We could do more sophisticated tracking in the generator... but I'm not sure that's better. The code generator is already pretty gnarly and complicated, and adding more to it has real cost to improving Twirp. On the other hand, stuff like redundant package import aliases really have no effect on users in practice, since the generated code isn't going to be modified by users anyway. It has the standard comment prefix, as described in golang/go#13560, so it should be skipped by linters.

Code duplication affects the source code size for sure, but that doesn't seem very consequential. If we're talking about kilobytes of source code, that's basically free; adding abstraction and complexity to the generator probably doesn't pull its weight unless there's good evidence that it helps a lot with the size of the compiled binary, and even then I'm skeptical.

Some of your points are fair though, in my opinion. There's no real reason we need to make a new jsonpb.Marshaler on every JSON request. It makes sense to move setting the method name on contexts into the general serve<Method> method, too. If there's true dead code we should eliminate it; note that we put the ResponseWriter into the context to support the SetHTTPResponseHeader method (see docs).

My feeling is that cosmetic improvements to the generated code should be pretty much the lowest priority possible for the Twirp project. Code generators are like compilers - who cares whether machine code is pleasant, we just want it to be correct and fast; we do care that the compiler itself is well-written and maintained though.

@titpetric
Copy link
Contributor Author

titpetric commented Jul 30, 2020

Thank you for a thought out response, with which I mostly agree. The caveat is obviously that the generated code doesn't pass some smell tests, out of which the cosmetic issues are the least of the worries. Often some changes would result in various unseen behaviour at development time. For example, when deduplicating the code, I ended up with the following http handler code:

func (s *uploadServiceServer) serveFilePut(resp http.ResponseWriter, req *http.Request) {
        ctx := uploadServiceContext(req.Context(), "FilePut")

        reqContent := new(FilePutRequest)
        var respContent *FilePutResponse

        call := func() (err error) {
                respContent, err = s.UploadService.FilePut(ctx, reqContent)
                return
        }

        var err error
        switch uploadServiceContentType(req) {
        case "application/json":
                err = s.processRequestJSON(ctx, resp, req, reqContent, call, respContent)
        case "application/protobuf":
                err = s.processRequestProtobuf(ctx, resp, req, reqContent, call, respContent)
        default:
                msg := fmt.Sprintf("unexpected Content-Type: %q", req.Header.Get("Content-Type"))
                err = badRouteError(msg, req.Method, req.URL.Path)
        }
        if err != nil {
                s.writeError(ctx, resp, err)
        }
}

There are some obvious improvements:

  1. satisfies a http.HandlerFunc so a person can remap the function into their own mux path (feature),
  2. context is set consistently (before: method name was set in leaf JSON and Protobuf functions, so writeError didn't have methodname here) - IMO this may actually fix a bug depending on what writeError does with ctx,
  3. all the errors which can possibly be created are passed into a single s.writeError call (before? about 6?)

I assume this has little bearing on the compiler, as all it needs to produce is a few utility functions (uploadServiceContext, uploadServiceContentType) and the generic processRequest methods (equivalent to the clients do<Type>Request). It does take care of about 80% of the concerns, with, hopefully 20% of effort.

From what I saw, these changes can be made quickly, and should not affect the general quality or even the structure of the compiler itself. We can hopefully agree, that in the particular case of improvements 2 and 3, the generated code is actually becoming correct with the suggested changes.

Thank you.

@spenczar
Copy link
Contributor

Thanks for the concrete example. To me, I don't think it's obvious that this is an improvement. It's certainly more abstract and has less duplication, but that's not always better; switching the code to use a closure with a free variable referencing the response object seems clearly more complex. It will be harder to statically analyze for escape analysis by the Go compiler, which would negatively impact performance versus the simple, straight-line behavior we have today, since it would likely require a dynamic heap allocation. I'd guess it's a very minor impact for most services, though. My main point is that I think it's not clear this code is better, and I prefer less abstraction I think in generated code.

satisfies a http.HandlerFunc so a person can remap the function into their own mux path (feature),

It lets serve<Method> be acceptable as an input to http.HandlerFunc, it is true... but that only matters if the method is exported, changing it to Serve<Method>. That is a much more dramatic change. The only way to access that method would be for a user to make a type assertion onto an interface they define, since a TwirpServer (the return value of the server constructor) doesn't have that exported method in its method set. I think the resulting API would be unpleasant and brittle and I don't think it would be an improvement, but we can discuss that more on a separate issue if you like.

context is set consistently (before: method name was set in leaf JSON and Protobuf functions, so writeError didn't have methodname here) - IMO this may actually fix a bug depending on what writeError does with ctx,

I agree that this is probably wrong, but it's much easier to fix than this. We should just call ctx = ctxsetters.WithMethodName(ctx, "MakeHat") at the top of serveMakeHat. I don't think the rest of the changes are necessary.

all the errors which can possibly be created are passed into a single s.writeError call (before? about 6?)

I don't understand why this is better, can you explain how this benefits Twirp's users?

We can hopefully agree, that in the particular case of improvements 2 and 3, the generated code is actually becoming correct with the suggested changes.

I think I agree with 2, but that's a much simpler change. I don't agree that 3 changes correctness in any way.

Most of this hinges on how it affects the code generator. If you think it improves the code generator's quality, I think that's a much more persuasive argument.

@marioizquierdo
Copy link
Contributor

My 2 cents:

Optimize generated code for:

  • size: Not imporant. While there's a balance to strike, Twirp is too small to care about this.
  • deduplication: Somewhat useful, but not enough to be a "priority". I believe there's value on having readable generated code, because it is easier to understand the generated code than the code generator. But at the end of the day, the code generator is the one that really needs to be readable and maintainable.
  • quality: Always important ... but it depends on how you define quality 😄

Besides the opinions stated on issues like this one, I don't believe anything is going to change in Twirp unless there are specific pull requests addressing specific, measurable changes. In the other hand, issues like this one are a great way to get a sense of the values from the maintainers. For reference, the Contributing Guidelines also have important values stated in the section for Twirp Design Principles.

@titpetric
Copy link
Contributor Author

Consistency is important, unless you count code generated assets as completely black-box. It's clear that the current codebase isn't consistent in terms how the client is structured, or how the RPC service itself is, or even how the errors are handled (some errors are given an incomplete context with missing method name). Ideally, the code should be human-readable and also reduce ambiguity (6 writeError calls for what should have been 1). Of course none of this is defined as a guiding value in the design principles.

I'm happy for the work you have done, but it seems that I have to take a better look at gRPC. The current opinions stated in this tread read as "we don't want to do maintenance work, twirp isn't a priority" and that's fine (even if I was offering PRs), but it's also discouraging given specific examples and issues. We're allowed to disagree, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants