Upstream proxy feature #17

Closed
jerry-wolf opened this issue Jan 18, 2018 · 32 comments

Comments

@jerry-wolf

jerry-wolf commented Jan 18, 2018

Any plans for upstream proxy support?
Sometimes we want to reach a host through another proxy, but that proxy is not secure and might be blocked, so connecting to it through a secure & hidden proxy would be useful.

@sergeyfrolov
Member

@l5h5t7 I am not sure if I completely understand your use-case. Is it:
(client) <------> (proxy server 1) <------> (proxy server 2) <------> (internet) such that everything between client and proxy server #2 is going to be encrypted.
If so, an easy way to achieve this setup would be to make server 1 a transparent proxy and server 2 a forward proxy. Both servers could be Caddy, with server 1 using the proxy middleware. In such a setup, you will also have to set up the client to resolve the domain name of your forwardproxy to the IP address of proxy server 1. Would that work for you?

If I did not understand you correctly, please describe the setup in more detail, including what exactly both proxy servers will be doing.

@jim3ma
Contributor

jim3ma commented Jan 19, 2018

@sergeyfrolov
Here is a scenario.
The client wants to browse site X, but neither the client nor proxy server 1 can access X directly (blocked by the gateway).
The client can only connect to proxy server 1; proxy server 1 can connect to proxy server 2, and proxy server 2 can access any site.
Now we need an HTTPS proxy to encrypt all traffic to X.
Set up an HTTPS proxy on both proxy server 1 and 2, and set the upstream proxy of server 1 to server 2.

All of this can be done with proxy chain support.

@jerry-wolf
Author

What @jim3ma said is exactly what I'm talking about.
Here is another scenario: the upstream proxy modifies pages or routes different hosts to different proxies.
The forwardproxy plugin needn't implement these features; the upstream will handle them. The forwardproxy plugin can focus on keeping the connection secure.
The proxy middleware doesn't accept HTTP CONNECT and doesn't have probe resistance, so I can't use it.

@mholt
Member

mholt commented Jan 25, 2018

@sergeyfrolov and I have talked about this a little, but I have been too busy with other things to run any experiments myself. It might very well be possible without too much trouble, but we'd have to see if forwardproxy can be configured in conjunction with proxy; including perhaps moving the forwardproxy directive higher in the chain than proxy, if necessary.

If someone wants to look at this, it'd be good to start by setting up a proxy that forwards to some localhost forwardproxy and then step into the code and see where it breaks down. The simplest change would be preferred.

@sergeyfrolov
Member

To clarify a bit, there's potentially the following solution. Assuming that

  • forwardproxy can learn that the proxy plugin is enabled
  • proxy can just pass traffic as it is to the upstream
  • proxy is after forwardproxy in the middleware chain (we can move it if need be)

...we can keep forwardproxy features like auth and probe resistance, then just push the traffic down the middleware chain.

@mholt
Member

mholt commented Jan 25, 2018

Can a subdirective to forwardproxy be set to tell it that yes, we are using proxy, if it can't learn it by itself? That might simplify implementation a little (but complicate end use).

@sergeyfrolov
Member

Yes, we can totally do it this way. Say we name the option passthrough, then the whole config would look something like

mywebsite.com {
  forwardproxy {
    passthrough
    # other configs
  }
  proxy / upstreamserver.com
}

I think it's fine to require an explicit option; it also makes sure we don't inadvertently affect other uncommon setups.

@sergeyfrolov
Member

@jim3ma @l5h5t7
Please confirm that this solution works for you. You can build forwardproxy with the passthrough directive from the branch named "passthrough".

@jim3ma
Contributor

jim3ma commented Feb 4, 2018

@sergeyfrolov
The "passthrough" feature is very nice. I think it's useful for #12.

I think what @l5h5t7 needs is a proxy chain: forwardproxy must forward the traffic to an upstream proxy.

@sergeyfrolov
Member

sergeyfrolov commented Feb 4, 2018

@jim3ma passthrough was introduced exactly to let you forward the traffic to an upstream proxy, if combined with the default proxy plugin as I mentioned in #17 (comment)

@jim3ma
Contributor

jim3ma commented Feb 5, 2018

@sergeyfrolov @mholt
I have tried the passthrough directive.
Unfortunately, proxy does not support the CONNECT method (it treats CONNECT as an ordinary method like GET and does not hijack the underlying connection).
Currently, proxy is a reverse proxy, not a forward proxy. But how about combining forwardproxy and proxy?

@sergeyfrolov
Member

sergeyfrolov commented Feb 5, 2018

@jim3ma
That's too bad. It would have been nice to chain them together and keep changes to forwardproxy minimal, which is one of our priorities. Apparently, proxy only hijacks the websocket connections and doesn't do transparent proxying in the same way squid does.
I'll close #20 to avoid overloading forwardproxy with features. We don't need it for #12, but we can always reopen the PR if we find good use-cases for it in the future.

@mholt
Member

mholt commented Feb 5, 2018

I'd rather have smaller, independent middlewares that work together rather than making a big middleware that does everything; perhaps there is a (simple) change to proxy that could work.

Or, better yet, what if it was a straight-up TCP proxy? Caddy has one of those. 😎 https://caddyserver.com/docs/net

@jim3ma
Contributor

jim3ma commented Feb 6, 2018

Implementing the upstream proxy requires connecting to a proxy server such as a socks5 or http/https proxy.
I'm afraid https://caddyserver.com/docs/net is not able to do this.

There are many proxy types:

  • http proxy
  • https proxy
  • socks4 proxy
  • socks5 proxy
  • ssh2 proxy

And we can also enable some optimizations, like QUIC, with the proxy.
We can find a library or create a new one to do this.

@sergeyfrolov
Member

If we put the simple net plugin after forwardproxy, that should cover http proxy upstreaming.
If said plugin is able to establish tls connections, that should cover https proxy upstreaming.
I'd guess the developers would be happy to add https (which is a trivial change: swap net.Dial for tls.Dial if Scheme == "https"), and might be open to ssh and socks targets. We can ask.
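
A minimal Go sketch of the net.Dial / tls.Dial swap mentioned above. It is not the net plugin's code: `upstreamAddr` and `dialUpstream` are hypothetical helper names, and the default ports (80 and 443) are an assumption.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/url"
)

// upstreamAddr (hypothetical helper) parses rawURL and returns its scheme plus
// a dialable host:port, filling in an assumed default port when none is given.
func upstreamAddr(rawURL string) (string, string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	defaultPort := map[string]string{"http": "80", "https": "443"}[u.Scheme]
	if defaultPort == "" {
		return "", "", fmt.Errorf("unsupported upstream scheme %q", u.Scheme)
	}
	if u.Port() != "" {
		return u.Scheme, u.Host, nil
	}
	return u.Scheme, net.JoinHostPort(u.Hostname(), defaultPort), nil
}

// dialUpstream (hypothetical helper) connects to the upstream proxy and uses
// TLS only when the scheme is https.
func dialUpstream(rawURL string) (net.Conn, error) {
	scheme, addr, err := upstreamAddr(rawURL)
	if err != nil {
		return nil, err
	}
	if scheme == "https" {
		// tls.Dial derives the server name for verification from addr.
		conn, err := tls.Dial("tcp", addr, &tls.Config{MinVersion: tls.VersionTLS12})
		if err != nil {
			return nil, err
		}
		return conn, nil
	}
	return net.Dial("tcp", addr)
}

func main() {
	for _, raw := range []string{"http://127.0.0.1:8080", "https://upstream.example.com"} {
		scheme, addr, err := upstreamAddr(raw)
		fmt.Println(scheme, addr, err)
	}
}
```

Either branch returns a plain net.Conn, so whatever relays bytes afterwards does not need to know whether the hop is encrypted.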

@cjhenck

cjhenck commented Feb 6, 2018

Out of curiosity, how is the Caddy-as-client proxying being conducted? My understanding is that Go's built-in capacity for HTTP proxies doesn't support an arbitrary bytestream for HTTP CONNECT but is instead closely tied to an individual request.

E.g. if a go client opens an HTTPS connection to a remote server over a proxy, you can do:

Arbitrary Data over HTTPS over HTTP CONNECT over TLS

but not:

Arbitrary Data over HTTP CONNECT over TLS

It seems like the latter is what you'd need for this

@mholt
Member

mholt commented Feb 7, 2018

@cjhenck I'm not sure I understand your question, sorry.

@l5h5t7 @jim3ma Wouldn't it be simpler to just use iptables or something to forward all the bytes on port 443 (or whatever) upstream to your proxy? Then you get probe resistance because the proxy acts exactly like the upstream.

@jim3ma
Contributor

jim3ma commented Feb 7, 2018

@mholt It's about the protocol.
E.g. the upstream is an http proxy. When a client request comes in, caddy forwardproxy connects to the upstream proxy with the CONNECT method and sets the correct domain or IP with port. Then the upstream sends a response with a 200 status. Finally, all data bytes are transferred as-is.
We can't just forward the data.
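
The handshake described above can be sketched in Go as follows. This is only an illustration, not forwardproxy's implementation: `connectVia`, `tunnel`, `fakeUpstream` and `demo` are hypothetical names, and the fake upstream is an in-memory echo server that exists only to exercise the client side.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
)

// tunnel reads through the bufio.Reader used for the handshake, so bytes the
// upstream sent right after its response headers are not lost.
type tunnel struct {
	net.Conn
	r *bufio.Reader
}

func (t *tunnel) Read(p []byte) (int, error) { return t.r.Read(p) }

// connectVia sends CONNECT for target ("host:port") over an already open
// connection to the upstream proxy and returns a tunnel once it answers 200.
func connectVia(conn net.Conn, target string) (net.Conn, error) {
	if _, err := fmt.Fprintf(conn, "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n", target, target); err != nil {
		return nil, err
	}
	br := bufio.NewReader(conn)
	resp, err := http.ReadResponse(br, &http.Request{Method: http.MethodConnect})
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("upstream refused CONNECT: %s", resp.Status)
	}
	return &tunnel{Conn: conn, r: br}, nil
}

// fakeUpstream stands in for the upstream proxy: it reads one request,
// answers with statusLine, then echoes every byte it receives.
func fakeUpstream(conn net.Conn, statusLine string) {
	defer conn.Close()
	br := bufio.NewReader(conn)
	if _, err := http.ReadRequest(br); err != nil {
		return
	}
	fmt.Fprintf(conn, "%s\r\n\r\n", statusLine)
	io.Copy(conn, br)
}

// demo runs the handshake over an in-memory pipe and returns the echoed bytes.
func demo(statusLine string) (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	go fakeUpstream(server, statusLine)
	t, err := connectVia(client, "example.com:443")
	if err != nil {
		return "", err
	}
	if _, err := t.Write([]byte("ping")); err != nil {
		return "", err
	}
	buf := make([]byte, 4)
	if _, err := io.ReadFull(t, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := demo("HTTP/1.1 200 Connection established")
	fmt.Println(got, err)
	_, err = demo("HTTP/1.1 403 Forbidden")
	fmt.Println(err)
}
```

Only after the 200 response does the connection become an opaque byte stream, which is why a plain byte forwarder in front of an http proxy is not enough on its own.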

@mholt
Member

mholt commented Feb 7, 2018

Okay. I'm still not sure I follow -- the iptables approach doesn't care about protocol. It should just forward the bytes along as if it wasn't even there. But what is the goal again? The OP says:

Sometimes we want to reach a host through another proxy, but that proxy is not secure and might be blocked, so connecting to it through a secure & hidden proxy would be useful.

So it looks like the goal is indirection. If one proxy is inaccessible, and you have to proxy through a proxy to get to another proxy, why not just configure that "middle" proxy to be "the" (only) proxy? Like, why do you need two?

You'll have to forgive me, I don't personally have much experience with needing to use proxies that get blocked, but I do want to understand this as thoroughly as possible, especially since I'm speaking about it at Internet Freedom Festival in less than a month!

@billryan

billryan commented Feb 8, 2018

@mholt I think the most common scenario that needs this feature is browsing the internet behind China's GFW. In China, if you want to access Google, Facebook and Twitter, you need a VPN or proxy.

The upstream feature sounds like squid's parent proxy or the nghttp2 proxy with a backend. Squid's ACL and parent proxy support is very powerful, and HTTP/2 with TLS is easy with caddy. If we could chain these two features in caddy, it would be great.

Currently, nghttp2 + squid or cow can achieve the upstream feature without caddy.

Compared with the proxy method, iptables can be set up on a router or local machine, but that is a little difficult for most people; the proxy method is much easier, and everyone can use it in a browser.

@jim3ma
Contributor

jim3ma commented Feb 8, 2018

One case:

I run caddy with forwardproxy on my Mac for debugging some web sites or IPs, such as google.com, in mainland China. My Mac cannot connect to google.com because of the GFW, so I need an upstream proxy to enable connectivity to google.com.

Another case:

Data link optimization

    low speed
A <----------> X
    low speed 
A <---------> C(with caddy)
    high speed 
C <---------> X
    high speed 
B <---------> C

    high speed               low speed
A <---------> B(with caddy) <-----------> X

    high speed               high speed                            high speed
A <---------> B(with caddy) <-----------> C(with upstream caddy) <------------> X 

@bash99

bash99 commented Jun 4, 2018

Some additional info about @billryan's cow mode.
browser -> cow -> caddy(forward proxy) -> internet
vs
browser -> caddy(forward proxy) -> internet

In the latter mode, we enjoy full http/2 support, and you'll find very few connections to the caddy server. (You can test with netstat -an)
But with cow in the middle, you'll find that cow establishes many connections to caddy, one for each request.

I hope the new feature can also have full http/2 support.

@jeddytier4

I use Caddy as a forward proxy specifically for the http/2 support. Also, squid and others tend to spawn separate processes for each request. This is untenable in today's https environment with some sites requiring connections to 3-5 other sites just to show a page. Performance requires a big beefy box, but with Caddy, I'm able to run the same amount of connections on a much smaller platform.

As for the need for an upstream directive, a considerable number of corporate environments run a proxy for filtering sites, and do not allow outgoing connections from anything other than the proxy. An upstream option in this plugin would allow testing and debugging behind any proxy.

@mholt
Member

mholt commented Jun 8, 2018

Okay, I can see why this would be useful. I don't have time to do it, unfortunately. Maybe somebody else does.

@jeddytier4

I have an existing pr in for a different function of forwardproxy, but @jim3ma was very close to having this solved. Once I clear that pr, I can fork his changes and integrate them.

@sergeyfrolov
Member

Alright. It seems like both #22 and #26 do the same thing, but #26 also supports multiple upstream IPs. I like #22 more than #26, since it adds about 6 times fewer lines of code and is, as a result, cleaner. We certainly don't aim to be a cutting-edge squid replacement; squid may lag behind on http/2 support, but it's on their roadmap.

That being said, we can add multiple upstream IPs, as long as the code is small and easily testable (I'd like to have at least a very simple sanity check for everything). @jeddytier4 is there a compelling reason to have so many policies? It would be substantially neater if we had just one. Do you have any suggestions on which policy would be best? Hashing IPs?

@jeddytier4

@sergeyfrolov #26 changes the outgoing IP address on the host instead of letting the local system choose the best route based on the target IP; it does not allow for an upstream proxy request. I will trim the policies down to ip_hash and random, as these are the only 2 that I have seen any benefit from using. Any other suggestions for cutting the code back would be greatly appreciated.
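
For illustration, the two remaining policies could look roughly like this in Go. This is a sketch and not the code from #26: the function names are hypothetical, and hashing the client IP with FNV-1a is an assumption about what ip_hash means here.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// pickByIPHash (hypothetical) maps a client IP to one outgoing address, so
// the same client keeps using the same outgoing address.
func pickByIPHash(clientIP string, outgoing []string) string {
	if len(outgoing) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return outgoing[h.Sum32()%uint32(len(outgoing))]
}

// pickRandom (hypothetical) chooses an outgoing address uniformly at random.
func pickRandom(outgoing []string) string {
	if len(outgoing) == 0 {
		return ""
	}
	return outgoing[rand.Intn(len(outgoing))]
}

func main() {
	// Documentation-range addresses, used here as placeholders only.
	ips := []string{"198.51.100.1", "198.51.100.2", "198.51.100.3"}
	fmt.Println(pickByIPHash("203.0.113.7", ips))
	fmt.Println(pickRandom(ips))
}
```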

@sergeyfrolov
Member

sergeyfrolov commented Jun 9, 2018

My bad, I got confused and thought #26 was your fork of #22, and didn't look too closely at it. I've checked it out now and left some comments: let's continue the discussion of #26 in the PR.
Regarding #22: I cleaned it up a bit, added tests, and resubmitted it as #27. I want to get another pair of eyes to look at it, but it should be good to go soon.

@sergeyfrolov
Member

We are trying to decide which schemes we are going to support.
Current PR has support for https, http and socks5 upstreams, but it concerns me how horribly insecure the latter two are: both http and socks5 leak everything in plaintext including all requests made and credentials to upstream proxy. They also allow any adversary on the network to redirect your requests, inject arbitrary responses and do other sorts of Man-in-The-Middle attacks. To reiterate, it's a bad idea to use unencrypted http or socks5 in general, and even more so for GFW jumping.
Because of that, I'd like to disable insecure schemes. But before I do that, I'd like to consult with the users: do you have any good use-cases, for which you'd want to use insecure schemes, and absolutely cannot spin up https upstream? I don't want to do any harm to people who may use those schemes without realizing how bad they are, and I'd rather not provide a gun for people to shoot themselves in the foot.
We will merge after this discussion settles.

cc: @l5h5t7 @jim3ma @billryan @bash99 @jeddytier4
If https works for your use-case well, you can just give this comment a 👍

@billryan

billryan commented Jun 15, 2018

http traffic is insecure and should not be used for an upstream proxy over a public network, but for a private network or a local server, support for insecure schemes such as http/socks is a nice feature.

For my use case,

https proxy request <--(public network)--> Caddy <--(private network)--> upstream proxy(insecure)

Currently I can use caddy(compiled with jim3ma's fork) as https(HTTP/2 and HTTP 1.1) proxy to use in public network and forward proxy request to upstream proxy(in the same server with caddy).

If forwardproxy supports https upstreams only, then we have to secure the http proxy with TLS first. Without caddy, we have to set up TLS and the related security settings ourselves, which is something caddy does really well.
On the other hand, an https upstream would hurt performance in a private network environment.

@sergeyfrolov
Member

sergeyfrolov commented Jun 15, 2018

@billryan I agree with your point on localhost upstreaming: it allows insecure schemes.
As for a private network, you should be able to tunnel it through localhost:port with either an explicit tunnel or a simple iptables rule.

@sergeyfrolov
Member

Implemented in #27.
Supported schemes to remote host: https.
Supported schemes to localhost: socks5, http, https (certificate check is ignored).
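
The rule above can be expressed as a small check. This Go sketch is not the code merged in #27: `checkUpstream` and `isLocalhost` are hypothetical names, treating "localhost" and loopback IPs as local is an assumption, and the skipped certificate check for localhost https is not modeled.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// isLocalhost (hypothetical) reports whether host is "localhost" or a
// loopback IP such as 127.0.0.1 or ::1.
func isLocalhost(host string) bool {
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// checkUpstream (hypothetical) accepts https upstreams anywhere, and http or
// socks5 upstreams only when they point at localhost.
func checkUpstream(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	switch u.Scheme {
	case "https":
		return nil
	case "http", "socks5":
		if isLocalhost(u.Hostname()) {
			return nil
		}
		return fmt.Errorf("insecure scheme %q is only allowed for localhost upstreams", u.Scheme)
	default:
		return fmt.Errorf("unsupported upstream scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"https://proxy.example.com",
		"socks5://127.0.0.1:1080",
		"http://proxy.example.com:8080",
	} {
		fmt.Println(raw, checkUpstream(raw))
	}
}
```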
