Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(build): housekeeping for cl-postgresql and cl-es TLS #4617

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cweider
Copy link
Collaborator

@cweider cweider commented Oct 24, 2024

Strict verification of certificates in development isn't something that is necessary, but it is still productive. Matching or exceeding the rigor required by production can shake out configuration issues before they reach production.

With a few adjustments, we can provide cl-django the information it needs to verify the cl-postgresql and cl-es services:

  • Remove unsafe default for ELASTICSEARCH_CA_CERT and instead specify this value in docker-compose.yml.
  • Create ELASTICSEARCH_VERIFY_CERT and use it to configure certificate verification in development.
  • Provide cl-postgres.crt and cl-es.crt to cl-django and cl-celery

Strict verification of certificates in development isn't something
that is necessary, but it is still productive. Matching or
exceeding the rigor required by production can shake out
configuration issues before they reach production.

With a few adjustments, we can provide `cl-django` the information
it needs to verify the `cl-postgresql` and `cl-es` services.

- Remove unsafe default for `ELASTICSEARCH_CA_CERT` and instead
  specify this value in `docker-compose.yml`.
- Create `ELASTICSEARCH_VERIFY_CERT` and use it to configure
  certificate verification in development.
- Provide `cl-postgres.crt` and `cl-es.crt` to `cl-django` and
  `cl-celery`
@cweider
Copy link
Collaborator Author

cweider commented Oct 24, 2024

I should say: no urgency to this change (I understand that Alberto is on vacation for the moment).

@@ -80,6 +80,8 @@ services:
volumes:
- ${CL_POSTGRES_RUN_DIR:-/var/run/postgresql}:/var/run/postgresql
- ${CL_BASE_DIR:-../../}:/opt/courtlistener
- ${CL_BASE_DIR:-../../}/docker/postgresql/cl-postgres.crt:/root/.postgresql/root.crt
- ${CL_BASE_DIR:-../../}/docker/elastic/cl-es.crt:/run/secrets/cl-es.crt
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elastic/cl-es.crt is being used here, instead of elastic/ca.crt, because using with ca.crt yields this warning with verification on:

TLS error caused by: TlsError(TLS error caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: CA cert does not include key usage extension (_ssl.c:1020))), Retrying in 10 seconds...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the problem. It appears that the CA file didn't have the correct extensions. In the past, this wasn't an issue since the verification process didn't check extensions, but in recent versions of elastic_transport, this verification has become more strict.

I believe it works using cl-es.crt instead of ca.crt because it matches the exact certificate in cl-es, but it's not performing proper verification using the CA.

I've regenerated the certificates using a proper CA and elasticsearch-certutil.

Which now contains:

X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign

You can use these files to replace the ones in /docker/elastic:

ca.crt.txt
cl-es.crt.txt
cl-es.key.txt

And use:
ELASTICSEARCH_CA_CERT=/run/secrets/ca.crt

Instead of:
ELASTICSEARCH_CA_CERT=/run/secrets/cl-es.crt

ELASTICSEARCH_CA_CERT = env(
"ELASTICSEARCH_CA_CERT",
default="/opt/courtlistener/docker/elastic/ca.crt",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m fairly confident that this is given a real value in production. However, in the unlikely event that this is not, cl-django will not be happy that it disappeared (but will tolerate this since certificates are not verified).

Silver lining is that it will reveal an issue that should be fixed? (the private key for cl-es.crt is in the public repo).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I believe we don't have the ELASTICSEARCH_CA_CERT value set in production. @mlissner could help confirm this.

The reason for this is that we decided to skip certificate verification in production because we were unable to obtain the correct CA for verifying self-signed certificates, as described in #2958 so we just set verify_certs to False

I think we have two options moving forward:

  • We could try again to obtain the correct CA from the ES cluster to use alongside the ALB's certificate, which would allow us to verify certificates in production.
  • We could continue to skip certificate verification in production, as this is the default behavior set by ELASTICSEARCH_VERIFY_CERT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, pretty sure that's not set.

Copy link
Contributor

@albertisfu albertisfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cweider, this looks good to me. I've just left some comments above that might be helpful.

Let me know what you think.

ELASTICSEARCH_CA_CERT = env(
"ELASTICSEARCH_CA_CERT",
default="/opt/courtlistener/docker/elastic/ca.crt",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I believe we don't have the ELASTICSEARCH_CA_CERT value set in production. @mlissner could help confirm this.

The reason for this is that we decided to skip certificate verification in production because we were unable to obtain the correct CA for verifying self-signed certificates, as described in #2958 so we just set verify_certs to False

I think we have two options moving forward:

  • We could try again to obtain the correct CA from the ES cluster to use alongside the ALB's certificate, which would allow us to verify certificates in production.
  • We could continue to skip certificate verification in production, as this is the default behavior set by ELASTICSEARCH_VERIFY_CERT.

@@ -80,6 +80,8 @@ services:
volumes:
- ${CL_POSTGRES_RUN_DIR:-/var/run/postgresql}:/var/run/postgresql
- ${CL_BASE_DIR:-../../}:/opt/courtlistener
- ${CL_BASE_DIR:-../../}/docker/postgresql/cl-postgres.crt:/root/.postgresql/root.crt
- ${CL_BASE_DIR:-../../}/docker/elastic/cl-es.crt:/run/secrets/cl-es.crt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the problem. It appears that the CA file didn't have the correct extensions. In the past, this wasn't an issue since the verification process didn't check extensions, but in recent versions of elastic_transport, this verification has become more strict.

I believe it works using cl-es.crt instead of ca.crt because it matches the exact certificate in cl-es, but it's not performing proper verification using the CA.

I've regenerated the certificates using a proper CA and elasticsearch-certutil.

Which now contains:

X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign

You can use these files to replace the ones in /docker/elastic:

ca.crt.txt
cl-es.crt.txt
cl-es.key.txt

And use:
ELASTICSEARCH_CA_CERT=/run/secrets/ca.crt

Instead of:
ELASTICSEARCH_CA_CERT=/run/secrets/cl-es.crt

@@ -112,7 +117,8 @@ services:
volumes:
- ${CL_POSTGRES_RUN_DIR:-/var/run/postgresql}:/var/run/postgresql
- ${CL_BASE_DIR:-../../}:/opt/courtlistener
- ${CL_BASE_DIR:-../../}/.postgresql:/root/.postgresql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine; however, I wanted to provide some context about this file.
We mounted it to emulate this error, which occurred in production after libpq and postgresql-client were upgraded.

Because Docker containers run as root in development, no permission error was raised, preventing us from catching this issue in development before it reached production. The solution was to mount an empty postgresql.crt file to trigger the same error due to the file being blank. This allowed us to confirm that adding the line ENV PGSSLCERT=/tmp/postgresql.crt in the Dockerfile resolved the issue.

While the file .postgresql/postgresql.crt is not strictly necessary and it's okay to remove it. We left it in this PR so we could catch potential problems in development in case we forget about it and remove the line ENV PGSSLCERT=/tmp/postgresql.crt from the Dockerfile.

So the risk of removing this file is low since we already have a comment in the Dockerfile explaining the necessity of that line. However, if you think keeping the file isn’t redundant and is productive, an alternative is to mount the file individually instead of mounting the entire directory:

- ${CL_BASE_DIR:-../../}/.postgresql/postgresql.crt:/root/.postgresql/postgresql.crt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants