Spike data export for restoring prod db to local #2469
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
# This script does three things: | ||
# | ||
# 1. Gets a DATABASE_URL from the environment or the first argument and | ||
# normalizes it to be able to connect to postgres's CLI tools | ||
# 2. Validates that it's possible to connect to the URL provided | ||
# 3. Sets a validated URL as the `_SCRIPT_DATABASE_URL` environment variable for | ||
# use in other scripts. This only happens if the script detects it's not | ||
# being invoked directly. | ||
# | ||
# This script can be used on its own for validating connections (useful for | ||
# debugging different environments and catching problems early) or as a | ||
# utility script in other scripts that need to connect to a database. | ||
|
||
REQUIRED_POSTGRES_VERSION="16" | ||
|
||
# Check for required tools | ||
REQUIRED_TOOLS="createdb psql" | ||
for tool in $REQUIRED_TOOLS; do | ||
if ! command -v "$tool" >/dev/null 2>&1; then | ||
echo "Error: $tool is required but not installed." >&2 | ||
exit 1 | ||
fi | ||
done | ||
|
||
|
||
# Get the database URL | ||
# TODO: we might want this to be its own script | ||
# 1. Check if DATABASE_URL is provided as the first argument | ||
if [ -n "${1:-}" ]; then | ||
echo "Getting DATABASE_URL from the provided argument" | ||
DATABASE_URL="$1" | ||
# 2. Check DATABASE_URL is set in the environment | ||
elif [ -n "$DATABASE_URL" ]; then | ||
echo "Getting DATABASE_URL from the environment" | ||
DATABASE_URL="$DATABASE_URL" | ||
fi | ||
|
||
# Normalize if DATABASE_URL starts with "postgis://" | ||
# We do this because `dj-database-url` uses "postgis://" | ||
# to alter the Django engine that's used, but the postgres | ||
# cli tools don't support this protocol. | ||
case "$DATABASE_URL" in postgis://*) | ||
DATABASE_URL="postgres://${DATABASE_URL#postgis://}" | ||
;; | ||
esac | ||
|
||
# Check if DATABASE_URL is set after all attempts | ||
if [ -z "$DATABASE_URL" ]; then | ||
echo "Error: DATABASE_URL is not provided." | ||
echo "please the environment variable DATABASE_URL or pass it in as an argument" | ||
Review comment: "please the environment variable" -> "please set the environment variable" |
||
echo "The format must comply with \033[4mhttps://www.postgresql.org/docs/$REQUIRED_POSTGRES_VERSION/libpq-connect.html#LIBPQ-CONNSTRING-URIS\033[0m" | ||
Review comment: It looks like this is supposed to be doing nice formatting, but for me this just printed the characters literally. |
||
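The literal output the reviewer describes is consistent with how `echo` treats backslash escapes: POSIX leaves that behaviour implementation-defined, and some shells print `\033` as-is. A minimal sketch of one alternative, using `printf`, which does interpret octal escapes in its format string. The helper name `print_docs_link` is hypothetical and not part of the PR.

```shell
#!/bin/sh
REQUIRED_POSTGRES_VERSION="16"

# Hypothetical helper (not in the PR): print the libpq docs link underlined.
# printf interprets \033 in the format string. The URL is passed as an
# argument so that any % characters in it are not read as format directives.
print_docs_link() {
    printf 'The format must comply with \033[4m%s\033[0m\n' \
        "https://www.postgresql.org/docs/$1/libpq-connect.html#LIBPQ-CONNSTRING-URIS"
}

print_docs_link "$REQUIRED_POSTGRES_VERSION"
```

Terminals without ANSI support would still show the raw sequence, so dropping the underline entirely is also a reasonable choice.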
exit 1 | ||
fi | ||
|
||
# Extract the database name from the database URL. | ||
# 1. Use sed to remove any trailing slashes | ||
# 2. Use `tr` to replace slashes with newlines | ||
# 3. Use tail to get the last line, i.e. the last element after a slash | ||
# 4. Use the same method to strip off any query arguments after a `?` | ||
DB_NAME=$(echo "$DATABASE_URL" | sed 's:/*$::' | tr "/" "\n" | tail -n 1 | tr "?" "\n" | head -n 1) | ||
|
||
# Create the database if it doesn't exist. | ||
# If it already exists, we don't fail. At this point, | ||
# we're only making a DB to ensure that we can connect to the | ||
# database URL in the next step, so we can ignore fails here. | ||
# Because of this, we route the output of `createdb` to /dev/null. | ||
# Without this, the script prints an error that might confuse users | ||
echo "Creating the DB if it doesn't exist." | ||
createdb "$DB_NAME" >/dev/null 2>&1 || true | ||
|
||
# Check that we can connect to the local DB before returning | ||
Review comment: Is this only intended for local DBs? If so, should we validate that in the URL? |
Reply: I'm not sure how we validate it in the local case because I don't know if |
||
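The thread above does not settle how a "local only" check would work. As one possible sketch, the host part of the URL could be extracted with POSIX parameter expansion and compared against a short allow-list. The function names `url_host` and `is_local_host` and the allow-list itself are assumptions for illustration. An empty host is treated as local on the assumption that libpq then connects over a Unix-domain socket, and bracketed IPv6 literals such as `[::1]` are not handled.

```shell
#!/bin/sh

# Hypothetical helper: print the host portion of a postgres:// style URL.
url_host() {
    _rest=${1#*://}        # drop the scheme
    _rest=${_rest%%/*}     # drop the path (and anything after it)
    _rest=${_rest%%\?*}    # drop a query string that follows the host directly
    _rest=${_rest##*@}     # drop user:password@
    printf '%s\n' "${_rest%%:*}"   # drop :port
}

# Hypothetical allow-list of hosts considered "local" (an assumption).
is_local_host() {
    case "$1" in
        ""|localhost|127.0.0.1) return 0 ;;
        *) return 1 ;;
    esac
}

host=$(url_host "postgres://user:secret@localhost:5432/app?sslmode=disable")
if is_local_host "$host"; then
    echo "host '$host' looks local"
else
    echo "host '$host' does not look local"
fi
```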
psql "$DATABASE_URL" -c "\q" | ||
if [ $? -ne 0 ]; then | ||
Review comment: Here's another example. This probably does something standalone, but when invoked from |
||
echo "❌ Failed to connect to $DATABASE_URL" | ||
exit 1 | ||
fi | ||
|
||
|
||
# Check the server version | ||
SERVER_POSTGRES_VERSION=$(psql -At -c "SHOW server_version;" -d "$DATABASE_URL" | cut -d '.' -f 1) | ||
if [ "$SERVER_POSTGRES_VERSION" != "$REQUIRED_POSTGRES_VERSION" ]; then | ||
echo "❌ Postgres version $REQUIRED_POSTGRES_VERSION required, found $SERVER_POSTGRES_VERSION" | ||
fi | ||
|
||
echo "✅ Successfully connected to the local database '$DB_NAME'" | ||
|
||
|
||
# Check if the basename of $0 (the file that was executed) is the same | ||
# as this file name. If not, this script is being called as a 'utility' | ||
# so we should set an environment variable. | ||
if [ "${0##*/}" != "check-database-url.sh" ]; then | ||
# Script is being sourced, export a "private" DATABASE URL | ||
# that we can use in other scripts | ||
export _SCRIPT_DATABASE_URL="$DATABASE_URL" | ||
fi |
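The URL handling in `check-database-url.sh` can be exercised without a running database. The sketch below wraps the same `postgis://` normalization and the same `sed | tr | tail | tr | head` pipeline in two functions. The function names are hypothetical and the sample URL is made up.

```shell
#!/bin/sh

# Hypothetical wrapper around the script's normalization step:
# rewrite postgis:// to postgres://, leave anything else untouched.
normalize_database_url() {
    case "$1" in
        postgis://*) printf 'postgres://%s\n' "${1#postgis://}" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# Hypothetical wrapper around the script's name-extraction pipeline:
# strip trailing slashes, keep the last path element, drop any ?query.
database_name_from_url() {
    printf '%s\n' "$1" | sed 's:/*$::' | tr "/" "\n" | tail -n 1 | tr "?" "\n" | head -n 1
}

url=$(normalize_database_url "postgis://user@localhost:5432/app?sslmode=disable")
echo "normalized: $url"
echo "db name:    $(database_name_from_url "$url")"
```

Because the pipeline splits on every slash, a query parameter whose value contains a slash would yield the wrong name. Whether that matters depends on the URLs the project actually uses.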
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,12 +28,16 @@ for tool in $REQUIRED_TOOLS; do | |
fi | ||
done | ||
|
||
# Check the DB URL and get the cleaned $_SCRIPT_DATABASE_URL | ||
. ./scripts/check-database-url.sh | ||
|
||
|
||
# Create a temporary file and set up clean up on script exit | ||
TEMP_FILE=$(mktemp) | ||
trap 'rm -f "$TEMP_FILE"' EXIT | ||
|
||
# Invoke AWS Lambda and store the result in the temp file | ||
# The result is a presigned URL to the dump file on S3 | ||
# The result is a pre-signed URL to the dump file on S3 | ||
echo "Invoking Lambda to get DB URL. This might take a few minutes..." | ||
aws lambda invoke \ | ||
--function-name "$LAMBDA_FUNCTION_NAME" \ | ||
|
@@ -46,12 +50,22 @@ aws lambda invoke \ | |
# Extract the URL from the response | ||
# This is because the response is quoted, so we just need to remove the quotation marks | ||
URL=$(sed 's/^"\(.*\)"$/\1/' "$TEMP_FILE") | ||
echo "Got URL: $(URL)" | ||
case "$URL" in | ||
https://*) | ||
echo "Got URL: $URL" | ||
|
||
;; | ||
*) | ||
echo "The received URL looks invalid. This might mean the database export failed." | ||
echo "Check the logs of the '$LAMBDA_FUNCTION_NAME' Lambda function" | ||
exit 1 | ||
;; | ||
esac | ||
|
||
echo "Dropping DB $(LOCAL_DB_NAME)" | ||
dropdb --if-exists "$LOCAL_DB_NAME" | ||
echo "Creating DB $(LOCAL_DB_NAME)" | ||
createdb "$LOCAL_DB_NAME" | ||
echo "Dropping DB $_SCRIPT_DATABASE_URL" | ||
dropdb --if-exists "$_SCRIPT_DATABASE_URL" | ||
echo "Creating DB $_SCRIPT_DATABASE_URL" | ||
createdb "$_SCRIPT_DATABASE_URL" | ||
Review comment on lines +65 to +68: I was not able to get this to work at all. As far as I can see, |
||
|
||
echo "Downloading and restoring DB $(LOCAL_DB_NAME)" | ||
wget -qO- "$URL" | pg_restore -d "$LOCAL_DB_NAME" -Fc --no-owner --no-privileges | ||
echo "Downloading and restoring DB $_SCRIPT_DATABASE_URL" | ||
wget -qO- "$URL" | pg_restore -d "$_SCRIPT_DATABASE_URL" -Fc --no-owner --no-privileges |
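The response handling in this hunk (strip the surrounding quotes, then accept only an `https://` URL) can be tried in isolation. This is a minimal sketch: the function names are hypothetical, and the sample responses are invented rather than real Lambda output.

```shell
#!/bin/sh

# Hypothetical helper: remove one pair of surrounding double quotes,
# using the same sed expression as the script.
strip_quotes() {
    printf '%s\n' "$1" | sed 's/^"\(.*\)"$/\1/'
}

# Hypothetical helper: succeed only if the value starts with https://
looks_like_https_url() {
    case "$1" in
        https://*) return 0 ;;
        *) return 1 ;;
    esac
}

# Invented sample responses, for illustration only.
for response in '"https://example.com/dump?sig=abc"' '{"errorMessage": "export failed"}'; do
    URL=$(strip_quotes "$response")
    # "$URL" expands the variable; "$(URL)" would instead try to run a
    # command named URL.
    if looks_like_https_url "$URL"; then
        echo "Got URL: $URL"
    else
        echo "The received URL looks invalid: $URL"
    fi
done
```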
Review comment: This script exhibits different behaviour when run standalone vs when called by `get-prod-db.sh`.

If I run `scripts/check-database-url.sh`, this block runs and outputs a help message.

If I run `scripts/get-prod-db.sh`, this just fails with

I think the reason for this is that we've set `set -euxo` in `get-prod-db.sh` but not in this script. So when it is called from `get-prod-db.sh` those settings are inherited, whereas in standalone mode it just tries to plough on.
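The reviewer's explanation fits how `set -e` interacts with a later `$?` check: a script sourced with `.` runs in the caller's shell, so the caller's options apply to it. The sketch below reproduces the effect with `false` standing in for a failing `psql` call. Each pattern runs in a child `sh -c` so that its `set -e` state is isolated. The function names are hypothetical, and the thread does not confirm that this is the actual cause.

```shell
#!/bin/sh

# Pattern A: run the command, then inspect $?.
# With `set -e` active the shell exits at `false`, so nothing is printed.
run_pattern_a() {
    sh -c 'set -e; false; if [ $? -ne 0 ]; then echo "handled failure"; fi'
}

# Pattern A without `set -e`, as when the script is run standalone.
run_pattern_a_standalone() {
    sh -c 'false; if [ $? -ne 0 ]; then echo "handled failure"; fi'
}

# Pattern B: test the command directly. Commands used as an `if`
# condition are exempt from `set -e`, so the branch runs either way.
run_pattern_b() {
    sh -c 'set -e; if ! false; then echo "handled failure"; fi'
}

printf 'A with set -e:    [%s]\n' "$(run_pattern_a)"
printf 'A standalone:     [%s]\n' "$(run_pattern_a_standalone)"
printf 'B with set -e:    [%s]\n' "$(run_pattern_b)"
```

Under these assumptions, writing the connection check as `if ! psql ...; then` would behave the same in both modes.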