Htsget data-retrieval with encryption #7

Closed
mrtamm opened this issue Apr 16, 2024 · 12 comments

@mrtamm
Contributor

mrtamm commented Apr 16, 2024

Add support for requesting genomic data in encrypted (crypt4gh) format.

Htsget (more specifically htsget-rs) is supposed to support this functionality, as described here:
https://github.com/umccr/htsget-rs/blob/194457b077d3387414800fd5ffcb2a2141a6d1b3/docs/crypt4gh/ARCHITECTURE.md

Funnel needs to implement the referred htsget protocol for downloading encrypted files.

This means extending the current htsget protocol implementation:

  1. forward client-public-key in HTTP headers
  2. detect that the referred file is encrypted (c4gh)
  3. forward server-public-key in HTTP headers when downloading parts
  4. decrypt the downloaded data
  5. add configuration parameters for the key-pair and the server-public-key
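
Steps 1 and 3 can be sketched roughly as follows. This is only an illustration in Python, not Funnel's (Go) implementation: `ticket_headers` and `part_requests` are hypothetical helpers, the header names are the ones mentioned in this issue, and whether the server key arrives in the ticket or comes from configuration is left open (the sketch accepts both).

```python
CLIENT_KEY_HEADER = "client-public-key"  # header name used in this issue
SERVER_KEY_HEADER = "server-public-key"  # header name taken from step 3 (assumed)


def ticket_headers(client_public_key_b64):
    """Headers for the htsget ticket request (step 1)."""
    return {CLIENT_KEY_HEADER: client_public_key_b64}


def part_requests(ticket, server_public_key_b64=None):
    """Return (url, headers) pairs for every part listed in an htsget ticket.

    Headers supplied by the ticket are kept as they are; a configured server
    key is only added when the ticket did not already provide one (step 3).
    """
    requests = []
    for part in ticket["htsget"]["urls"]:
        headers = dict(part.get("headers", {}))
        if server_public_key_b64 is not None:
            headers.setdefault(SERVER_KEY_HEADER, server_public_key_b64)
        requests.append((part["url"], headers))
    return requests
```

Each returned pair can then be fetched in order and the bodies concatenated before decryption (step 4).
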
@xhejtman
Contributor

I believe client keys should be generated ad hoc by Funnel.

@mrtamm
Contributor Author

mrtamm commented Apr 24, 2024

Initial development is here: https://github.com/mrtamm/funnel-gdi/tree/dev-htsget-crypt4gh

At the moment, I still need to do more full-scale testing (and potentially fixing) before opening a PR, so I'm estimating May 8 for the PR.

@xhejtman
Contributor

From slack:

"inputs": [
    {
      "name": "pub key input",
      "description": "Public C4GH key.",
      "type": "FILE",
      "path": "/tmp/c4gh.pub",
      "content": "PUBKEY AS STRING"
    }
  ],

@mrtamm
Contributor Author

mrtamm commented May 15, 2024

HTSGET storage configuration in Funnel now looks like this:

HTSGETStorage:
  Disabled: false
  Protocol: https
  SendPublicKey: false

When SendPublicKey is true, Funnel generates the key-pair if existing key files are not found. Funnel itself cannot detect whether the Htsget server sends the data encrypted or not, so the user must specify it explicitly.
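
The "generate if missing" behaviour could look roughly like the sketch below. It assumes the key files are named `.private.key` and `.public.key` in the working directory (as in the testing steps later in this thread) and that keys are created with the `crypt4gh-keygen` CLI. Funnel's actual logic may differ, and `ensure_keypair` is a hypothetical helper.

```python
import subprocess
from pathlib import Path


def ensure_keypair(secret_key=".private.key", public_key=".public.key",
                   run=subprocess.run):
    """Create a passphrase-less Crypt4GH key-pair unless both files exist.

    Returns True when new keys were generated, False when existing ones
    are reused. `run` is injectable so the logic can be tested offline.
    """
    sk, pk = Path(secret_key), Path(public_key)
    if sk.is_file() and pk.is_file():
        return False
    # Same flags as the crypt4gh-keygen command used in this thread.
    run(["crypt4gh-keygen", "-f", "--nocrypt", "--sk", str(sk), "--pk", str(pk)],
        check=True)
    return True
```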

Protocol specifies the protocol that replaces the htsget:// scheme when calling the htsget API (default is https).
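
As an illustration of what the Protocol setting does, here is a minimal Python sketch of the scheme replacement. The function name `to_http_url` is hypothetical and this is not Funnel's code; it only shows the intended URL mapping.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http_url(htsget_url, protocol="https"):
    """Replace the htsget:// scheme with the configured protocol."""
    parts = urlsplit(htsget_url)
    if parts.scheme != "htsget":
        raise ValueError(f"not an htsget URL: {htsget_url}")
    # Host, path and query are passed through unchanged.
    return urlunsplit((protocol, parts.netloc, parts.path, parts.query, parts.fragment))
```

With `Protocol: http`, `htsget://localhost:9090/variants/test_data?class=header` would be requested as `http://localhost:9090/variants/test_data?class=header`.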

@mrtamm
Contributor Author

mrtamm commented May 15, 2024

Overview of the local testing setup.

Testing dependencies

  1. htsget-rs: https://github.com/umccr/htsget-rs/tree/crypt4gh
  2. htsget: https://pypi.org/project/htsget/
  3. crypt4gh: https://pypi.org/project/crypt4gh/

Htsget Docker Image

Inside the htsget-rs directory:

cp deploy/Dockerfile .
docker build -t ghcr.io/umccr/htsget-rs:latest .

Htsget configuration

formatting_style = "Compact"

# The main ticket-server:
ticket_server_addr = "0.0.0.0:8080"

# The local-data-server:
data_server_enabled = true
data_server_local_path = "/data" # This is INSIDE the container

[[resolvers]]

[resolvers.storage]
response_url = "http://localhost:9091/"
forward_headers = true

[resolvers.storage.endpoints]
file = "http://localhost:8081/"
index = "http://localhost:8081/"

[resolvers.object_type]
send_encrypted_to_client = true
private_key = "/crypt4gh/private.key"
public_key = "/crypt4gh/public.key"

Folder-structure for Docker-Compose Data

./htsget/
  - crypt4gh/
    - private.key
    - public.key
  - data/
    - test_data.vcf.gz.c4gh
    - test_data.vcf.gz.tbi
  - htsget.toml

Generate the private and public keys with the command:
crypt4gh-keygen -f --nocrypt --sk private.key --pk public.key

Sample VCF for testing:
https://github.com/EGA-archive/beacon2-ri-tools/blob/main/test/test_1000G.vcf.gz

Generate index (TBI) for the VCF: bcftools index -t test_data.vcf.gz
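
The steps above do not show how the `.c4gh` file itself is produced. One possible way, sketched in Python around the `crypt4gh encrypt` CLI (which reads from stdin and writes to stdout), is shown below. It assumes the file is encrypted for the server's own key-pair generated above; the helper names are hypothetical.

```python
import subprocess


def encrypt_command(secret_key="private.key", recipient_key="public.key"):
    """Build the crypt4gh CLI call that encrypts stdin to stdout."""
    return ["crypt4gh", "encrypt", "--sk", secret_key, "--recipient_pk", recipient_key]


def encrypt_file(src, dst, secret_key="private.key", recipient_key="public.key",
                 run=subprocess.run):
    """Encrypt `src` into `dst` by piping it through the crypt4gh CLI."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        run(encrypt_command(secret_key, recipient_key),
            stdin=fin, stdout=fout, check=True)


# Example (requires the crypt4gh package to be installed):
# encrypt_file("test_data.vcf.gz", "test_data.vcf.gz.c4gh")
```

The index is built from the unencrypted file, so it makes sense to run `bcftools index` before removing the plain `test_data.vcf.gz`.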

Htsget on Docker-Compose

services:
  htsget:
    container_name: htsget
    image: ghcr.io/umccr/htsget-rs:latest
    command: htsget-actix --config /etc/htsget.toml
    ports:
      - "9090:8080"
      - "9091:8081"
    volumes:
      - "./htsget/data:/data:ro"
      - "./htsget/crypt4gh:/crypt4gh:ro"
      - "./htsget/htsget.toml:/etc/htsget.toml:ro"

After docker compose up, call the API (for testing):

curl -H 'client-public-key: Qjn...' 'http://localhost:9090/variants/test_data?class=header'
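
For scripting the same check, a rough Python equivalent is sketched below. It assumes the header value is the base64 line from the Crypt4GH public key file (the text between the BEGIN/END markers) and that the file ID is `test_data`, matching the data folder above; both helper names are hypothetical.

```python
import urllib.request


def public_key_b64(key_file_text):
    """Return the base64 body of a Crypt4GH public key file."""
    lines = [ln.strip() for ln in key_file_text.splitlines() if ln.strip()]
    body = [ln for ln in lines if not ln.startswith("-----")]
    if len(body) != 1:
        raise ValueError("unexpected public key file layout")
    return body[0]


def ticket_request(url, key_file_text):
    """Build (but do not send) the htsget ticket request with the client key."""
    headers = {"client-public-key": public_key_b64(key_file_text)}
    return urllib.request.Request(url, headers=headers)


# Example (needs the running docker-compose setup):
# with open("htsget/crypt4gh/public.key") as f:
#     req = ticket_request("http://localhost:9090/variants/test_data?class=header", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```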

Htsget configuration in Funnel

Copy config/default-config.yaml to my-config.yaml and modify HTSGETStorage:

HTSGETStorage:
  Disabled: false
  Protocol: http
  SendPublicKey: true

Htsget storage testing

# copy the keys:
cp htsget/crypt4gh/private.key .private.key
cp htsget/crypt4gh/public.key  .public.key

go run . storage get "htsget://localhost:9090/variants/test_data?class=header" header.vcf.gz -c my-config.yaml

xhejtman added a commit that referenced this issue May 22, 2024
#7 Htsget data-retrieval with encryption
@MalinAhlberg

First of all, really nice that you are implementing support for htsget!
I'm testing this implementation together with starter-kit-htsget + starter-kit-storage-and-interfaces, and have two questions:

  • I am not able to use a private key that uses a passphrase, even if the passphrase is empty. Is it possible?
  • Would it not be safer/sounder to decrypt the file inside of the execution container, instead of first decrypting and then copying the decrypted file to the container?

Thanks :)

@mrtamm
Contributor Author

mrtamm commented May 28, 2024

Hi and thank you for the feedback!

I am not able to use a private key that uses a passphrase, even if the passphrase is empty. Is it possible?

As shown above, I used the --nocrypt option to generate the keys without a passphrase, so it should work in that case. However, if the environment where Funnel is running contains the variable C4GH_PASSPHRASE, crypt4gh uses that value to decrypt the key. At the moment, this is the only way to make a passphrase-protected key work. In theory, the passphrase could also be added to the Funnel configuration file.
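
To illustrate the environment-variable route, the sketch below prepares an environment for a `crypt4gh decrypt` subprocess. The idea of taking the passphrase from a configuration value is hypothetical, as is the helper name `crypt4gh_env`; only the `C4GH_PASSPHRASE` variable comes from this thread.

```python
import os


def crypt4gh_env(passphrase=None, base_env=None):
    """Return an environment for the crypt4gh CLI.

    When a passphrase is given (for example from a config file), it is
    exported as C4GH_PASSPHRASE; otherwise the environment is passed on as is.
    """
    env = dict(os.environ if base_env is None else base_env)
    if passphrase is not None:
        env["C4GH_PASSPHRASE"] = passphrase
    return env


# Example (requires the crypt4gh package to be installed):
# import subprocess
# with open("data.c4gh", "rb") as src, open("data", "wb") as dst:
#     subprocess.run(["crypt4gh", "decrypt", "--sk", ".private.key"],
#                    stdin=src, stdout=dst, env=crypt4gh_env("secret"), check=True)
```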

Would it not be safer/sounder to decrypt the file inside of the execution container, instead of first decrypting and then copying the decrypted file to the container?

It depends. If it has to be done in the container, this (additional) task would be left to the container developer. However, the private key is already on the host system, so the decryption can be executed outside the container as well. For the sake of user experience, I decided to decrypt the file beforehand and leave the security of the host system (where Funnel is running) to its maintainer.

This is the approach I figured would work best, but if there are other ways to solve it, I would gladly discuss them.

@MalinAhlberg

Thanks for the answers, @mrtamm ! I think your reasoning makes sense, and I now have the complete setup running 👍 .

A side note, in case someone else finds it useful: the htsget command (cmd1) might hang if something goes wrong with the decryption (cmd2) and it stops reading from the pipe, for example when an old version of crypt4gh is used.
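
One way such a hang could be guarded against is sketched below in Python. This is not how Funnel runs the commands; `run_pipeline` is a hypothetical helper showing two general precautions for a `cmd1 | cmd2` pipeline on a POSIX system: closing the parent's copy of the pipe and waiting with a timeout.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd, timeout=60):
    """Run `producer | consumer` and return both exit codes.

    Raises subprocess.TimeoutExpired (after killing both processes) if
    either process is still running after `timeout` seconds of waiting.
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close the parent's copy of the read end. Without this, the producer
    # never sees a broken pipe when the consumer exits early and may block.
    producer.stdout.close()
    try:
        consumer_rc = consumer.wait(timeout=timeout)
        producer_rc = producer.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        producer.kill()
        consumer.kill()
        producer.wait()
        consumer.wait()
        raise
    return producer_rc, consumer_rc
```

A non-zero exit code from either side can then be reported as a failed download instead of leaving the task stuck.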

@mrtamm
Contributor Author

mrtamm commented May 30, 2024

Thanks for the feedback! I do need to check how problems can be detected when something goes wrong with the commands. I'm also considering support for other crypt4gh implementations (they have different CLI flags), or otherwise integrating decryption into the Funnel source code. I estimate this will be ready by the end of June.

@mrtamm
Contributor Author

mrtamm commented Aug 7, 2024

I added a new PR that builds Htsget+Crypt4gh support directly into the Funnel source code: #12

@mrtamm
Contributor Author

mrtamm commented Sep 2, 2024

Related PR #12 is ready for review and merge.

@mrtamm
Contributor Author

mrtamm commented Oct 8, 2024

It is now available in the master branch.

@mrtamm mrtamm closed this as completed Oct 8, 2024

3 participants