[enhancement] Implement auth provider for GitHub #128

Closed
4 tasks done
vit-zikmund opened this issue Jan 4, 2024 · 11 comments

Comments

@vit-zikmund
Contributor

vit-zikmund commented Jan 4, 2024

A pretty straightforward use case for giftless is to divert the LFS objects of a regular GitHub repo to one's own storage. But that storage suddenly needs to deal with authentication and authorization, most likely with something a bit more capable than the crude "Dummy authentication module" and a bit less demanding on one's infrastructure than the otherwise inspiring JWT module.

A nice and readily deployable middle ground might be to tap into GitHub's existing access management and allow giftless access with the same credentials normally used for GitHub.
One of the contemporary ways of logging into GitHub is via HTTPS, where the same login credentials (a personal access token) can also be used to access the GitHub REST/GraphQL API. It's then possible to verify whether the holder of such a token is allowed to read/write a particular repository. This information can easily be leveraged to grant or restrict access to the "same" path (org/repo) on the LFS storage.
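A minimal sketch of such a check, in the shell style used later in this thread. It assumes the token is sent to GitHub's REST endpoint `GET /repos/{owner}/{repo}`, whose response for an authenticated caller carries a `permissions` object (`push`, `pull`, ...). The function names are hypothetical, and the JSON is inspected with crude string matching rather than a real parser.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, a sketch only: decide LFS access from a GitHub token.
# Assumption: for an authenticated caller, GET /repos/{owner}/{repo} returns a
# "permissions" object such as {"admin":false,"push":true,"pull":true,...}.

# Fetch the repository as seen by the token holder (performs a network call).
fetch_repo_json() {
  local token=$1 owner=$2 repo=$3
  curl -fsS \
    -H "Authorization: Bearer ${token}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${owner}/${repo}"
}

# Read repository JSON on stdin and print the implied LFS access level:
# "write" if push is allowed, "read" if only pull is, "none" otherwise
# (including empty input, e.g. when curl -f failed on a 401 or 404).
lfs_access_from_json() {
  local json
  # Drop all whitespace so the substring checks don't depend on formatting;
  # this is string matching, not JSON parsing.
  json=$(tr -d '[:space:]')
  if [[ $json == *'"push":true'* ]]; then
    echo write
  elif [[ $json == *'"pull":true'* ]]; then
    echo read
  else
    echo none
  fi
}
```

Usage would look like `fetch_repo_json "$TOKEN" myorg myrepo | lfs_access_from_json` (`myorg`/`myrepo` being placeholders). A bad token, or a private repo the token cannot see, makes `curl -f` fail with no output, which this sketch reports as `none`.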

This has apparently already been proven to work in a couple of PoCs I know of (#121, an accidental kickoff 😇, still too raw; lsst-sqre/giftless-github-proxy-auth, more mature, but abandoned).

There are as-yet unexplored limitations to using this with SSH, but testing how GitHub plays along with the git-lfs-authenticate protocol is surely worth a shot.

I'd also like to explore/discuss some potential solutions for hiding the two different hostnames (github.com & your.private.giftless.com) behind one hostname, so that any git credential helper would offer the same credentials one already uses for GitHub for both the git and git-lfs operations. This should be possible thanks to the automatic LFS server discovery (which works pretty much by appending `.git/info/lfs` to the usual git URL). Such a solution likely requires some kind of reverse proxy routing (thinking of an nginx/envoy/haproxy sidecar). One downside is that this will drag (otherwise readily available) files from GitHub through the proxy (those transfers should be small, though). If credential sharing without a proxy could be done with some git config, that might be better, but I don't know of any such thing.

Otherwise, one is likely supposed to override the LFS URL with `git config remote.origin.lfsurl https://your.private.giftless.com/org/repo` or similar, and provide the same HTTPS credentials twice.
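For reference, a rough sketch of the default endpoint derivation that automatic discovery relies on, following Git LFS's server-discovery documentation for HTTPS remotes: `.git/info/lfs` is appended to the remote URL, or just `/info/lfs` if the URL already ends in `.git`. The `lfsurl` override above replaces this computed value. SSH remotes are not handled, and `default_lfs_url` is a hypothetical name.

```bash
# Hypothetical helper, a sketch of Git LFS's default endpoint for an HTTPS remote.
# Only covers HTTPS remote URLs; SSH remotes and lfsurl/lfs.url overrides are ignored.
default_lfs_url() {
  local remote=$1
  if [[ $remote == *.git ]]; then
    # Remote already ends in ".git": only "/info/lfs" is appended.
    printf '%s/info/lfs\n' "$remote"
  else
    printf '%s.git/info/lfs\n' "$remote"
  fi
}
```

This is why a single hostname can work: with the remote `https://git.example.com/org/repo`, LFS traffic goes to `https://git.example.com/org/repo.git/info/lfs`, which a proxy can tell apart from regular git traffic by path alone.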

Maybe there are other options I missed? Thanks for your help!


@rufuspollock
Member

> A pretty straightforward use case for giftless is to divert the LFS objects of a regular GitHub repo to one's own storage. But that storage suddenly needs to deal with authentication and authorization, most likely with something a bit more capable than the crude "Dummy authentication module" and a bit less demanding on one's infrastructure than the otherwise inspiring JWT module.

👍👍👍

> There are as-yet unexplored limitations to using this with SSH, but testing how GitHub plays along with the git-lfs-authenticate protocol is surely worth a shot.

Big 👍

> I'd also like to explore/discuss some potential solutions for hiding the two different hostnames (github.com & your.private.giftless.com) behind one hostname, so that any git credential helper would offer the same credentials one already uses for GitHub for both the git and git-lfs operations. This should be possible thanks to the automatic LFS server discovery (which works pretty much by appending `.git/info/lfs` to the usual git URL). Such a solution likely requires some kind of reverse proxy routing (thinking of an nginx/envoy/haproxy sidecar). One downside is that this will drag (otherwise readily available) files from GitHub through the proxy (those transfers should be small, though). If credential sharing without a proxy could be done with some git config, that might be better, but I don't know of any such thing.

I don't have a lot of thoughts here yet. However, I do think getting some kind of PoC working, perhaps a combination of a PR to the actual code and an example in a (new) `examples/` directory, could work well.

@vit-zikmund
Contributor Author

vit-zikmund commented Jan 11, 2024

> I'd also like to explore/discuss some potential solutions for hiding the two different hostnames (github.com & your.private.giftless.com) behind one hostname, so that any git credential helper would offer the same credentials one already uses for GitHub for both the git and git-lfs operations.

Today I managed to bring up a PoC in which properly configured Kubernetes objects act as a single server! It took quite some trial and error, but it's done!

The core part is an Ingress backed by kubernetes/ingress-nginx. Some URL rewriting was necessary, which is configured via the ingress-specific annotations:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: giftless
  labels:
    app.kubernetes.io/name: giftless
    app.kubernetes.io/instance: giftless
  annotations:
    # automatic TLS provisioning
    #cert-manager.io/cluster-issuer: letsencrypt
    # forwarding to github requires TLS (that means giftless uwsgi also needs to use TLS)
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    # arbitrarily higher limit than usual
    nginx.ingress.kubernetes.io/proxy-body-size: 5g
    # pretending the caller targets github.com
    nginx.ingress.kubernetes.io/upstream-vhost: github.com
    # needed for path matching and rewriting
    nginx.ingress.kubernetes.io/use-regex: "true"
    # strip the '.git/info/lfs' part from the default LFS URI to please giftless demands
    nginx.ingress.kubernetes.io/rewrite-target: /$1$2
spec:
  ingressClassName: nginx
  #tls:
  #  - hosts:
  #      - "git.example.com"
  #    secretName: git.example.com-tls
  rules:
    - host: "git.example.com"
      http:
        paths:
          # only this path routes to the giftless service
          - path: /(.+)\.git/info/lfs(/.*|$)
            pathType: ImplementationSpecific
            backend:
              service:
                name: giftless
                port:
                  number: 443
          # anything else goes to an "externalName" service effectively routing to github.com
          - path: /(.+|$)
            pathType: ImplementationSpecific
            backend:
              service:
                name: giftless-github-forward
                port:
                  number: 443
```

The remaining cunning part is the `ExternalName` service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: giftless-github-forward
  labels:
    app.kubernetes.io/name: giftless
    app.kubernetes.io/instance: giftless
spec:
  type: ExternalName
  externalName: github.com
```

While this happened entirely outside of giftless, I definitely spotted room for improvement 1️⃣: making giftless handle the URI in the default git-lfs format `/<org>/<repo>[/<potentially-anything-else>].git/info/lfs` instead of the current `/<org>/<repo>`. This would let me ditch the URI rewriting at the reverse proxy.
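To make the rewriting concrete, here is a rough emulation of what the Ingress above does with a request path: which service receives it, and the path after `rewrite-target: /$1$2`. It is a sketch only. Bash's POSIX `=~` stands in for nginx's PCRE, the `(/.*|$)` group is written as an optional `(/.*)?`, ingress-nginx's case-insensitive matching is ignored, and `ingress_route` is a hypothetical name.

```bash
# Hypothetical helper: emulate the Ingress routing and rewrite for one request path.
# Prints "<service> <path sent upstream>".
ingress_route() {
  local path=$1
  # Anchored POSIX ERE approximation of /(.+)\.git/info/lfs(/.*|$)
  local re='^/(.+)\.git/info/lfs(/.*)?$'
  if [[ $path =~ $re ]]; then
    # rewrite-target /$1$2 drops the ".git/info/lfs" part
    echo "giftless /${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    # /(.+|$) with /$1$2 leaves the path unchanged (there is no second group)
    echo "giftless-github-forward $path"
  fi
}
```

With native endpoint support as described in 1️⃣, the rewrite branch would become unnecessary and the path could be passed through unchanged.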

I'm also glad that giftless is not sensitive to the Host header, as that says github.com now 😄

This setup indeed works, and I can confirm the command-line git client only asks for the credentials once. Combined with the auth provider my dear colleague @paluyana wrote (and leaked in #121 😉), it identified the user and properly admitted or rejected them according to their repo access rights.

Once the token is saved in one's credential helper, the user experience is so far completely transparent and no LFS config needs to be done (even `lfs.locksverify = false` adds itself to the config after the first failed `git push`):

```console
$ git clone https://git.example.com/$GITHUB_ORG/$GITHUB_REPO
Cloning into 'github_repo'...
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (66/66), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 66 (delta 10), reused 65 (delta 9), pack-reused 0
Receiving objects: 100% (66/66), 6.64 KiB | 2.21 MiB/s, done.
Resolving deltas: 100% (10/10), done.
Filtering content: 100% (2/2), 2.00 MiB | 856.00 KiB/s, done. <<< here come the LFS files
```

I noticed that giftless returned 401 Unauthorized when the user's GitHub repo access was revoked, so the git client kept asking the user for credentials forever. Any clue what she did wrong? 2️⃣ I'd expect giftless to return 403 Forbidden in that case, but I haven't yet ruled out that this is simply how giftless works at the moment 🤔
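The endless prompting is consistent with how the Git LFS batch API documentation describes a 401: credentials are needed, so the client obtains credentials and retries. Below is a hypothetical helper (not giftless code) sketching the distinction argued for here: 401 only when credentials are missing or invalid, 403 once the token is known but lacks access. The same docs also list 404 for a repository that does not exist for the user, which would likewise stop the re-prompting.

```bash
# Hypothetical helper, not giftless code: pick a status for an LFS request.
# $1: "yes" if the credentials were valid, anything else otherwise
# $2: access level they grant for the repo ("write", "read" or "none")
# $3: requested operation ("download" or "upload")
lfs_auth_status() {
  local creds_valid=$1 access=$2 operation=$3
  if [[ $creds_valid != yes ]]; then
    echo 401   # missing/bad credentials: git-lfs asks for credentials and retries
  elif [[ $access == none ]]; then
    echo 403   # authenticated, but no access to this repo: no point re-prompting
  elif [[ $operation == upload && $access != write ]]; then
    echo 403   # read-only user attempting an upload
  else
    echo 200
  fi
}
```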

@rufuspollock
Member

@vit-zikmund wow, great progress 👏👏

> I noticed that giftless returned 401 Unauthorized when the user's GitHub repo access was revoked, so the git client kept asking the user for credentials forever. Any clue what she did wrong? 2️⃣ I'd expect giftless to return 403 Forbidden in that case, but I haven't yet ruled out that this is simply how giftless works at the moment 🤔

This is potentially something to change inside of Giftless if we can identify the relevant part.

@athornton
Collaborator

Oooh, that is elegant!

@vit-zikmund
Contributor Author

Thanks! 😉 But that solution has a small problem I found today. Quoting myself:

> I'm also glad that giftless is not sensitive to the Host header, as that says github.com now 😄

While using the `basic_streaming` transfer adapter (my storage backend is on a private network), the file URLs in the response are inferred from the Host header. That's pretty cunning, but it means the files are expected to be served at `http://<Host>:<port>/<org>/<repo>/objects/storage/<oid>`, and the Host header now says github.com 😮‍💨

One can set the `SERVER_NAME` config option for giftless/Flask (it first needs to be mentioned in the default config for one to be able to override it). This binds the server to that hostname, though, and rejects all requests with a different Host header. So this doesn't help either. Catch-22 😵‍💫

All in all, the ingress/proxy routing needs better rules that rewrite the Host header only for traffic heading to [api.]github.com. Alas, this isn't nicely doable with the Kubernetes means I used before, so I suppose we're about to witness the standalone reverse proxy example @rufuspollock suggested earlier.
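A quick way to spot the symptom described above is to look at the `href` values in a batch response: behind the Ingress, their host is github.com instead of the public hostname. Below is a hypothetical diagnostic sketch; it uses `grep`/`sed` instead of a JSON parser, so it assumes the hrefs are plain quoted strings without escaped quotes.

```bash
# Hypothetical helper: read a Git LFS batch response on stdin and print every
# "href" whose host differs from the expected one ($1).
unexpected_hrefs() {
  local expected_host=$1 href host
  grep -o '"href": *"[^"]*"' \
    | sed -E 's/^"href": *"([^"]*)"$/\1/' \
    | while IFS= read -r href; do
        host=${href#*://}   # strip the scheme
        host=${host%%/*}    # strip the path
        host=${host%%:*}    # strip the port
        [[ $host == "$expected_host" ]] || printf '%s\n' "$href"
      done
}
```

Piping a captured batch response into `unexpected_hrefs git.example.com` would then list the object URLs that point elsewhere.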

@vit-zikmund
Contributor Author

vit-zikmund commented Mar 11, 2024

OK, here's a second take with an Envoy proxy sidecar. The config is a little verbose, but it does the same job as the setup above, rewrites the Host header only for traffic bound to [api.]github.com, and is less of a routing spaghetti hell.

It should also support git-credential-manager, which is clever enough to detect that the backend is GitHub and offers a couple of GitHub-specific login options 👍

There's also no URI rewrite for giftless anymore, as it's now supposed to support the native endpoints.

envoy.yaml:

```yaml
static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 80  # proxy port
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
              suppress_envoy_headers: true
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
          generate_request_id: false
          preserve_external_request_id: true
          route_config:
            name: ingress_route
            virtual_hosts:
            - name: gitlfs
              domains:
              - "*"
              routes:
              - name: local_gitlfs
                # Only this goes to the gitlfs service
                match:
                  safe_regex:
                    regex: (?:/[^/]+){2,}\.git/info/lfs(?:/.*|$)
                route:
                  timeout: 0s  # don't break long-running downloads
                  cluster: local_gitlfs
              - name: api_github_com
                # Routing 3rd party tools assuming this is a GitHub Enterprise URL /api/v#/X to public api.github.com/X
                match:
                  safe_regex: &api_regex
                    regex: /api/v\d(?:/(.*)|$)
                route:
                  regex_rewrite:
                    pattern: *api_regex
                    substitution: /\1
                  host_rewrite_literal: api.github.com
                  cluster: api_github_com
                request_headers_to_remove:
                  - x-forwarded-proto
              - name: github_com
                # Anything else is forwarded directly to GitHub
                match:
                  prefix: "/"
                route:
                  host_rewrite_literal: github.com
                  cluster: github_com
                request_headers_to_remove:
                  - x-forwarded-proto
  clusters:
  - name: local_gitlfs
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: local_gitlfs
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080  # local giftless port
  - name: api_github_com
    type: logical_dns
    # Comment out the following line to test on v6 networks
    dns_lookup_family: v4_only
    load_assignment:
      cluster_name: api_github_com
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: api.github.com
                port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        sni: api.github.com
  - name: github_com
    type: logical_dns
    # Comment out the following line to test on v6 networks
    dns_lookup_family: v4_only
    load_assignment:
      cluster_name: github_com
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: github.com
                port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        sni: github.com
```

So far this has only been tested locally with a fake backend, but the idea is there. Also, because Envoy applies upstream TLS per cluster, we don't have to bake TLS into giftless, which would surely be a PITA inside an otherwise TLS-terminated environment.
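The three routes can be sanity-checked without running Envoy. The sketch below approximates them with anchored POSIX regexes in bash (Envoy's `safe_regex` uses RE2 and must match the whole path; here the non-capturing groups are written as plain groups and `\d` as `[0-9]`) and prints the cluster plus the path sent upstream. `envoy_route` is a hypothetical name.

```bash
# Hypothetical helper: approximate which envoy.yaml route handles a request path.
# Prints "<cluster> <path sent upstream>".
envoy_route() {
  local path=$1
  local lfs_re='^(/[^/]+){2,}\.git/info/lfs(/.*)?$'   # local_gitlfs route
  local api_re='^/api/v[0-9](/(.*))?$'                # api_github_com route
  if [[ $path =~ $lfs_re ]]; then
    echo "local_gitlfs $path"
  elif [[ $path =~ $api_re ]]; then
    # regex_rewrite substitution "/\1": keep only what follows /api/vN/
    echo "api_github_com /${BASH_REMATCH[2]}"
  else
    # prefix "/" catch-all, forwarded unchanged
    echo "github_com $path"
  fi
}
```

For example, `envoy_route /api/v3/user` gives `api_github_com /user`, while `/org/repo.git/info/refs` (regular git traffic) falls through to `github_com`.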

Testing setup:

backend.sh:

```bash
#!/bin/bash
set -eux
BACKEND_NAME=echo-backend
exec docker run --rm -it --name "$BACKEND_NAME" --publish 8888:80 mendhak/http-https-echo:31
```

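proxy.sh:

```bash
#!/bin/bash
set -eux
PROXY_NAME=gh-proxy
BACKEND_NAME=echo-backend
# Share the backend container's network namespace, so Envoy's listener on port 80
# is the one published as localhost:8888 by backend.sh.
exec docker run --rm -it \
  --name "$PROXY_NAME" \
  --network "container:$BACKEND_NAME" \
  --volume "$PWD/envoy.yaml:/etc/envoy/envoy.yaml" \
  envoyproxy/envoy:v1.29-latest /usr/local/bin/envoy -c /etc/envoy/envoy.yaml
```

With both running, test from another terminal (`exec` replaces the script's shell, so anything after it in `proxy.sh` would never run):

```bash
curl -v http://localhost:8888/api/v3/user
```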

@vit-zikmund
Contributor Author

Just verified that the Envoy settings work. All I had to change in my production setup were the port numbers.

@rufuspollock
Member

@vit-zikmund this is great 👏👏 Thank-you so much for sharing 🙏

@vit-zikmund
Contributor Author

Just played a bit with the SSH possibilities and got as far as calling GitHub's git-lfs-authenticate:

```console
ssh '--' 'git@github.com' 'git-lfs-authenticate $ORG/$REPO.git download'
{
  "href": "https://lfs.github.com/$ORG/$REPO",
  "header": {
    "Authorization": "RemoteAuth gitauth-v1-gsa0wDFqfuz8wDtKwSK08A0SG7AZY5Edc-hl3OWe0eDA5iSrDpLOZg6jXoKmbWVtYmVyuXVzZXI6NzU0NDM0NDg6dml0LXppa211bmSlcHJvdG-jc3No"
  },
  "expires_at": "2024-04-04T12:55:58Z",
  "expires_in": 599
}
```
The returned Authorization header doesn't work for the GitHub API. Heck, I wasn't even able to make it work for the URL suggested in the href field 🤕 (but that's likely because I didn't provide valid /objects/batch request data).
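For reference, the Git LFS batch API docs describe a batch request as a POST to `<href>/objects/batch` with `application/vnd.git-lfs+json` as both `Accept` and `Content-Type`. Below is a hedged sketch of sending one with the `href` and `Authorization` value returned above. The function names are hypothetical, and whether lfs.github.com accepts the `RemoteAuth` header this way was not verified in this thread.

```bash
# Hypothetical helpers, a sketch of a Git LFS batch API request.

# Print a minimal batch request body for one object using the "basic" transfer.
lfs_batch_body() {
  local operation=$1 oid=$2 size=$3
  printf '{"operation":"%s","transfers":["basic"],"objects":[{"oid":"%s","size":%d}]}\n' \
    "$operation" "$oid" "$size"
}

# POST a body to <href>/objects/batch, passing the Authorization value verbatim,
# e.g. the "href" and "header.Authorization" fields from git-lfs-authenticate.
lfs_batch_request() {
  local href=$1 authorization=$2 body=$3
  curl -sS -X POST \
    -H "Accept: application/vnd.git-lfs+json" \
    -H "Content-Type: application/vnd.git-lfs+json" \
    -H "Authorization: ${authorization}" \
    --data "$body" \
    "${href}/objects/batch"
}
```

A call might look like `lfs_batch_request "https://lfs.github.com/$ORG/$REPO" "$AUTH" "$(lfs_batch_body download "$OID" "$SIZE")"`, where `$AUTH`, `$OID` and `$SIZE` are placeholders for the returned header value and a real object's oid and size.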

Anyway, this means that verifying whether the SSH login has proper permissions to a certain GitHub repo won't work this way. If only there were a public endpoint to verify this token's permissions, or a way to get a real personal access token via SSH... alas, we're not in the '90s anymore 💾😄.

Every other workaround I could think of eventually circles back to the personal access token, so closing this investigation as a dead end.

@vit-zikmund
Contributor Author

Alright folks, it's about time for me to tie this up and move on:

@vit-zikmund
Contributor Author

The last pieces of the puzzle are merged to master. Here's the full feature with a ribbon on top. Thank you so much @athornton for your guidance and help, and, naturally, @rufuspollock & co. for their dedication to open source ❤️

Now don't be a stranger and carve us that new release, right? 😏 The plate Adam prepared is hardly getting any more silver. Thank you all once again!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants