Manually roll back a deploy
Although we have revert.sh to recover builds, it does not always succeed. This page is here to help you if you need to roll back and the automation is failing.
To roll back ce-deploy builds you need to know the following things:
- the location of the code
- the location of the symlink for the live site
- (sometimes) the location of the database backups
- (sometimes) the database name and credentials
Paths are set in the _init role and are rarely overridden, although they can be, so do check repo variables if you're not sure. To figure out the default paths you need these variables:
- project_name
- build_type
- deploy_user - found in the ce-deploy-config implementation for the deploy server, almost always deploy
The key paths are:
- deploy_base_path - defaults to /home/{{ deploy_user }}/deploy/{{ project_name }}_{{ build_type }}
- live_symlink_dest - defaults to {{ deploy_base_path }}/live.{{ project_name }}_{{ build_type }}
So if project_name is acme-website and the build_type is dev then the paths would be:
/home/deploy/deploy/acme-website_dev
/home/deploy/deploy/acme-website_dev/live.acme-website_dev
On standalone servers the deployment method is always the same, and so is rollback. The /home/deploy/deploy directory will be full of sites, and within each site directory (e.g. acme-website_dev) there will be numbered builds. The link at the path defined in live_symlink_dest will be pointing to the last successful build (note, this may not be the latest build if builds have been failing).
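You can check which build is currently live by inspecting the link - for example, with our acme-website_dev example:
# Show where the live symlink currently points
ls -l /home/deploy/deploy/acme-website_dev/live.acme-website_dev
# Or just print the target
readlink /home/deploy/deploy/acme-website_dev/live.acme-website_dev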
If you are using the rolling database management strategy then to roll back you just delete the link at the path defined in live_symlink_dest and recreate it pointing at the build directory you desire. For example, to re-point the link to build 26 I would just do this:
# Remove the link to the bad build
sudo rm /home/deploy/deploy/acme-website_dev/live.acme-website_dev
# Recreate the link to the last known good build
sudo ln -s /home/deploy/deploy/acme-website_dev/acme-website_dev_build_26 /home/deploy/deploy/acme-website_dev/live.acme-website_dev
It is a good idea to clear caches. With Drupal this can be achieved with drush doing a cache rebuild (drush cr) - for other applications see their respective documentation. Reloading or restarting PHP is usually a good idea too, in case of opcode caching. If you need to restore a database (next section) then do the cache rebuild after that step.
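For our example site that might look something like this (assuming drush is on the PATH - adjust the PHP version and service commands for your server):
# Rebuild Drupal caches from within the live build
cd /home/deploy/deploy/acme-website_dev/live.acme-website_dev
drush cr
# Reload PHP-FPM in case of opcode caching - where X.X is your PHP version
sudo service phpX.X-fpm reload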
In most cases mysql_backup.handling will be set to rolling. If this is the case there will be nothing to do - each build has its own database, so simply moving the link will make it use the correct previous database.
If the handling is set to dump then things are a little more involved. This is unusual, but can be the case to save disk/cost or speed up build times. In this case you need to know the dumps directory location. This is set in mysql_backup.dumps_directory and is usually not overridden, but again, check project variables to be sure. To follow our example, assuming the path isn't altered we would expect the dump files to be here:
/home/deploy/shared/acme-website_dev/db_backups/mysql/build
Note they are in the shared directory. For standalone servers this doesn't matter, but this directory will be mounted storage on ASGs and other types of highly available server layouts, so the dumps are available to all servers.
You will also need to know the database name and credentials so you can restore the database. These can be found in settings.php for Drupal - for other applications check their docs. For Drupal the settings.php file can be found somewhere like this:
/home/deploy/shared/acme-website_dev/acme-website_dev_build_26/web/sites/default/
Note, web might be different and is often set in the webroot variable in the code repo in common.yml.
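For a typical modern Drupal site that variable would look something like this (illustrative only - check your project's actual common.yml):
# Illustrative value - the webroot depends on the project
webroot: "web"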
Alternatively, you can also use drush to restore the database. If you move the symlink first then the built-in drush MySQL handling will use the correct database, and you can use drush sql:cli instead of the mysql CLI to restore the database, so you don't need to hunt for credentials.
The dumps will be numbered in the same way as build directories, so all you need to do is decompress and restore the dump corresponding to the code you intend to restore, e.g. if you are linking to build_26 then restore database 26 as well.
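As a sketch, assuming the dumps are gzipped and named after the build number (check the actual filenames in the dumps directory before running anything):
# Decompress the dump for the build you are rolling back to (filename is illustrative)
gunzip /home/deploy/shared/acme-website_dev/db_backups/mysql/build/acme-website_dev_build_26.sql.gz
# With the symlink already moved, restore via drush so the correct credentials are used
cd /home/deploy/deploy/acme-website_dev/live.acme-website_dev
drush sql:cli < /home/deploy/shared/acme-website_dev/db_backups/mysql/build/acme-website_dev_build_26.sql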
If you are using AWS RDS then in certain circumstances it may be quicker to restore to a point in time via the AWS console. If you do this, bear in mind a new endpoint URL will be created, so your application settings will need to be updated accordingly.
Rollback on ASGs depends on the deploy_code.mount_type variable. For smaller applications this is often set to tarball, and in this case you can use the same method as for standalone servers, not forgetting to do it on each web server in turn. To get a list of web servers you can run ansible-inventory --graph from the ce-deploy directory on the respective deploy server.
Once you have done that you will need to re-pack the tarball and copy it up to the correct location on the shared directory so future ASG events use the correct code. These are the tasks that create and move the tarball to the correct location. The commands from those tasks, which you can copy, are as follows:
sudo tar -cvf /tmp/{{ project_name }}_{{ build_type }}_{{ build_number }}.tar --owner=0 --group=0 {{ deploy_base_path }}
sudo mv /tmp/{{ project_name }}_{{ build_type }}_{{ build_number }}.tar {{ deploy_code.mount_sync }}/{{ project_name }}_{{ build_type }}.tar
As you can see, you need the deploy_code.mount_sync variable to know where to copy the file to, and the format for the filename is {{ project_name }}_{{ build_type }}.tar. So to give an example with our dev version of acme-website and a build ID of 26:
# Create a tarball of the application directory
sudo tar -cvf /tmp/acme-website_dev_26.tar --owner=0 --group=0 /home/deploy/deploy/acme-website_dev
# Move the tarball to the shared drive so it gets used in future autoscale events
sudo mv /tmp/acme-website_dev_26.tar /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.tar
Once the code has been packaged and copied to the shared drive you are done.
For larger applications with many files the tarball approach is too slow to deploy on an autoscale event, so we use SquashFS images instead. This includes Drupal from version 8.0.0 onwards.
Normally the build process will have created a previous image that you can copy down and mount. Before ce-deploy replaces the currently deployed SquashFS image it uses a task to copy it to the shared volume.
The path it is copied to is {{ deploy_code.mount_sync }}/{{ project_name }}_{{ build_type }}_previous.sqsh, so if we extend the dev build of the acme-website project example above, you can expect to find the previous build here:
/home/deploy/shared/acme-website_dev/deploy/acme-website_dev_previous.sqsh
As long as you are using the rolling database management strategy, simply unmounting the current image and mounting the old one will roll back your build, as the old code points to the database just prior to the build. Remember to do this on all servers. Work as the deploy user, and ensure you are not in the live deployed code directory on the server or the mount will be locked:
# Copy down the previous build to the server
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_previous.sqsh /home/deploy/builds/acme-website_dev/deploy.sqsh
# Stop any services that might lock the mount - typically just PHP
sudo service phpX.X-fpm stop # where X.X is your PHP version
# Unmount the bad image
sudo umount -f /home/deploy/deploy/acme-website_dev
# Mount the previous image you copied down
sudo mount /home/deploy/builds/acme-website_dev/deploy.sqsh /home/deploy/deploy/acme-website_dev -t squashfs -o loop
# Start your services again
sudo service phpX.X-fpm start # where X.X is your PHP version
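It is worth confirming the previous image has mounted correctly, for example:
# Confirm a SquashFS image is mounted at the expected path
mount | grep acme-website_dev
# And that the site code is visible again
ls /home/deploy/deploy/acme-website_dev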
If you are happy your rollback was successful, then you should replace the SquashFS image on the shared drive, so that future autoscaling events use the correct image:
# Copy the failed build out of the way
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_failed.sqsh
# Copy the previous build so it becomes the live build
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_previous.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh
If you are not using rolling database management you will still need to restore your database, as described above.
In the very rare event that ce-deploy fails to create the previous image correctly, you will need to pack a new SquashFS image manually. To do this you should follow a mix of the standalone server instructions above and the hotfix instructions below, for example if build 26 was the last good build for acme-website and the dev environment - and note we are operating in the builds directory as deploy is read only:
# Remove the link to the bad build
sudo rm /home/deploy/builds/acme-website_dev/live.acme-website_dev
# Recreate the link to the last known good build - /home/deploy/deploy in the first part is deliberate!
sudo ln -s /home/deploy/deploy/acme-website_dev/acme-website_dev_build_26 /home/deploy/builds/acme-website_dev/live.acme-website_dev
# Copy the old image to one side, just in case
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_failed.sqsh
# Create a new SquashFS image, excluding the current image
mksquashfs /home/deploy/builds/acme-website_dev /tmp/acme-website_dev_26.sqsh -e /home/deploy/builds/acme-website_dev/deploy.sqsh
# Copy the image to the shared drive so it gets used in future autoscale events
mv /tmp/acme-website_dev_26.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh
You can do this on a single server. However, once you have done that you must follow the instructions in the previous section to copy down your new image and unmount and remount it, which you must do on all servers.
On standalone servers and ASGs using the mount_type of tarball you can just edit the code, remembering to do so on each web server in an ASG and also to clear caches. Restarting PHP isn't a bad idea either, to clear any opcode caching. As before, ensure you recreate the tarball and copy it to the shared directory to future-proof autoscaling events, as per the instructions above.
Hotfixing SquashFS is tricky, again because mounted SquashFS images are read only. SquashFS code is built in a separate location, /home/deploy/builds. A quick way to get a writable copy of the site for hotfixing is to edit your vhost, change the document root from /home/deploy/deploy to /home/deploy/builds, also change from the live symlink to the target build directory, and restart the web server. You can then edit the code in /home/deploy/builds and it is identical to the code in the SquashFS image. When you edit the code you will now see the changes directly. Remember, because you are hotfixing you'll have to do this on all servers.
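As a sketch of the vhost change, assuming Apache and a vhost file named after the site (the filename, paths and build number here are illustrative - check what your server actually uses):
# Edit the vhost for the site (filename is illustrative)
sudo nano /etc/apache2/sites-available/acme-website_dev.conf
# Change the document root, e.g. from:
#   /home/deploy/deploy/acme-website_dev/live.acme-website_dev/web
# to the target build directory on the builds disk:
#   /home/deploy/builds/acme-website_dev/acme-website_dev_build_26/web
# Then restart the web server to pick up the change
sudo service apache2 restart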
Once you have completed your hotfixes and you want to ensure they are captured, the approach is similar to the one above for packing a new tarball, except we are of course packing a new SquashFS image instead. You can do this on a single server:
# Copy the old image to one side, just in case
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_failed.sqsh
# Set your link back to the live deploy website
rm /home/deploy/builds/acme-website_dev/live.acme-website_dev
ln -s /home/deploy/deploy/acme-website_dev/acme-website_dev_build_26 /home/deploy/builds/acme-website_dev/live.acme-website_dev
# Create a new SquashFS image, excluding the current image
mksquashfs /home/deploy/builds/acme-website_dev /tmp/acme-website_dev_26.sqsh -e /home/deploy/builds/acme-website_dev/deploy.sqsh
# Copy the image to the shared drive so it gets used in future autoscale events
mv /tmp/acme-website_dev_26.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh
# Fix the link again so your patched site is served
rm /home/deploy/builds/acme-website_dev/live.acme-website_dev
ln -s /home/deploy/builds/acme-website_dev/acme-website_dev_build_26 /home/deploy/builds/acme-website_dev/live.acme-website_dev
At this point you are done. Effectively your live servers are not using the mounted SquashFS image - you have left them serving the site from the server's normal disk in /home/deploy/builds - but this doesn't actually matter. If they become unhealthy then autoscaling will replace them with a new machine, which will correctly grab the SquashFS image from the shared drive and mount it on boot.
Sometimes it is hard to deal with a rollback situation because your AWS cluster is unstable and the servers you are working with keep getting terminated. To deal with this you can use the AWS console to pause autoscaling:
- Log in to AWS, go to EC2
- Go to Auto Scaling Groups at the bottom of the left menu
- Click on the name of the affected ASG
- Scroll down to 'Advanced configurations' and click Edit
- Under 'Suspended processes' choose Terminate and Health Check, then click Update
This will prevent your EC2 instances from being terminated while you are working on them. Now you can stabilise your application in peace. Once you are done, you can remove the suspensions from those processes and order an instance refresh, to assure yourself everything is back to normal.
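If you prefer the command line, the AWS CLI can do the same thing - a sketch, assuming the CLI is installed and configured and using an illustrative ASG name:
# Suspend the processes that terminate instances (the ASG name is illustrative)
aws autoscaling suspend-processes --auto-scaling-group-name acme-website-dev-asg --scaling-processes Terminate HealthCheck
# When you are done, resume them
aws autoscaling resume-processes --auto-scaling-group-name acme-website-dev-asg --scaling-processes Terminate HealthCheck
# Optionally, start an instance refresh to cycle the instances
aws autoscaling start-instance-refresh --auto-scaling-group-name acme-website-dev-asg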