Manually roll back a deploy
Although we have revert.sh to recover builds, it does not always succeed. This page is here to help you if you need to roll back and the automation is failing.
To roll back ce-deploy builds you need to know the following things:
- the location of the code
- the location of the symlink for the live site
- (sometimes) the location of the database backups
- (sometimes) the database name and credentials
Paths are set in the _init role and are rarely overridden, although they can be, so do check repo variables if you're not sure. To figure out the default paths you need these variables:
- project_name
- build_type
- deploy_user - found in the ce-deploy-config implementation for the deploy server, almost always deploy
The key paths are:
- deploy_base_path - defaults to /home/{{ deploy_user }}/deploy/{{ project_name }}_{{ build_type }}
- live_symlink_dest - defaults to {{ deploy_base_path }}/live.{{ project_name }}_{{ build_type }}
So if project_name is acme-website and the build_type is dev then the paths would be:
/home/deploy/deploy/acme-website_dev
/home/deploy/deploy/acme-website_dev/live.acme-website_dev
On standalone servers the deployment method is always the same, and so is rollback. The /home/deploy/deploy directory will be full of sites, and within each site directory (e.g. acme-website_dev) there will be numbered builds. The link at the path defined in live_symlink_dest will be pointing to the last successful build (note, this may not be the latest build if builds have been failing).
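You can check which build is currently live by inspecting the link - for example, with our acme-website_dev example:
# Show where the live symlink currently points
ls -l /home/deploy/deploy/acme-website_dev/live.acme-website_dev
# Or just print the target
readlink /home/deploy/deploy/acme-website_dev/live.acme-website_dev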
If you are using the rolling database management strategy then to roll back you just delete the link at the path defined in live_symlink_dest and recreate it pointing at the build directory you desire. For example, to re-point the link to build 26 I would just do this:
# Remove the link to the bad build
sudo rm /home/deploy/deploy/acme-website_dev/live.acme-website_dev
# Recreate the link to the last known good build
sudo ln -s /home/deploy/deploy/acme-website_dev/acme-website_dev_build_26 /home/deploy/deploy/acme-website_dev/live.acme-website_dev
It is a good idea to clear caches. With Drupal this can be achieved with drush doing a cache rebuild (drush cr) - for other applications see their respective documentation. Reloading or restarting PHP is usually a good idea too, in case of opcode caching. If you need to restore a database (next section) then do the cache rebuild after that step.
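For our example site that might look something like this (assuming drush is on the PATH - adjust the PHP version and service commands for your server):
# Rebuild Drupal caches from within the live build
cd /home/deploy/deploy/acme-website_dev/live.acme-website_dev
drush cr
# Reload PHP-FPM in case of opcode caching - where X.X is your PHP version
sudo service phpX.X-fpm reload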
In most cases mysql_backup.handling will be set to rolling. If this is the case there will be nothing to do - each build has its own database, so simply moving the link will make it use the correct previous database.
If the handling is set to dump then things are a little more involved. This is unusual, but can be the case to save disk/cost or speed up build times. In this case you need to know the dumps directory location. This is set in mysql_backup.dumps_directory and is usually not overridden, but again, check project variables to be sure. To follow our example, assuming the path isn't altered we would expect the dump files to be here:
/home/deploy/shared/acme-website_dev/db_backups/mysql/build
Note they are in the shared directory. For standalone servers this doesn't matter, but this directory will be mounted storage on ASGs and other types of highly available server layouts, so the dumps are available to all servers.
You will also need to know the database name and credentials so you can restore the database. These can be found in settings.php for Drupal - for other applications check their docs. For Drupal the settings.php file can be found somewhere like this:
/home/deploy/shared/acme-website_dev/acme-website_dev_build_26/web/sites/default/
Note, web might be different and is often set in the webroot variable in the code repo in common.yml.
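For a typical modern Drupal site that variable would look something like this (illustrative only - check your project's actual common.yml):
# Illustrative value - the webroot depends on the project
webroot: "web"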
Alternatively, you can also use drush to restore the database. If you move the symlink first then the built-in drush MySQL handling will use the correct database, and you can use drush sql:cli instead of the mysql CLI to restore the database, so you don't need to hunt for credentials.
The dumps will be numbered in the same way as build directories, so all you need to do is decompress and restore the dump corresponding to the code you intend to restore, e.g. if you are linking to build_26 then restore database 26 as well.
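As a sketch, assuming the dumps are gzipped and named after the build number (check the actual filenames in the dumps directory before running anything):
# Decompress the dump for the build you are rolling back to (filename is illustrative)
gunzip /home/deploy/shared/acme-website_dev/db_backups/mysql/build/acme-website_dev_build_26.sql.gz
# With the symlink already moved, restore via drush so the correct credentials are used
cd /home/deploy/deploy/acme-website_dev/live.acme-website_dev
drush sql:cli < /home/deploy/shared/acme-website_dev/db_backups/mysql/build/acme-website_dev_build_26.sql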
If you are using AWS RDS then in certain circumstances it may be quicker to restore to a point in time via the AWS console. If you do this, bear in mind a new endpoint URL will be created, so your application settings will need to be updated accordingly.
Rollback on ASGs depends on the deploy_code.mount_type variable. For smaller applications this is often set to tarball, and in this case you can use the same method as for standalone servers, not forgetting to do it on each web server in turn. To get a list of web servers you can run ansible-inventory --graph from the ce-deploy directory on the respective deploy server.
Once you have done that you will need to re-pack the tarball and copy it up to the correct location on the shared directory so future ASG events use the correct code. These are the tasks that create and move the tarball to the correct location. The commands from those tasks, which you can copy, are as follows:
sudo tar -cvf /tmp/{{ project_name }}_{{ build_type }}_{{ build_number }}.tar --owner=0 --group=0 {{ deploy_base_path }}
sudo mv /tmp/{{ project_name }}_{{ build_type }}_{{ build_number }}.tar {{ deploy_code.mount_sync }}/{{ project_name }}_{{ build_type }}.tar
As you can see, you need the deploy_code.mount_sync variable to know where to copy the file to, and the format for the filename is {{ project_name }}_{{ build_type }}.tar. So to give an example with our dev version of acme-website and a build ID of 26:
# Create a tarball of the application directory
sudo tar -cvf /tmp/acme-website_dev_26.tar --owner=0 --group=0 /home/deploy/deploy/acme-website_dev
# Move the tarball to the shared drive so it gets used in future autoscale events
sudo mv /tmp/acme-website_dev_26.tar /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.tar
Once the code has been packaged and copied to the shared drive you are done.
For larger applications with many files the tarball approach is too slow to deploy on an autoscale event, so we use SquashFS images instead. This includes Drupal from version 8.0.0 onwards.
Normally the build process will have created a previous image that you can copy down and mount. Before ce-deploy replaces the currently deployed SquashFS image it uses a task to copy it to the shared volume.
The path it is copied to is {{ deploy_code.mount_sync }}/{{ project_name }}_{{ build_type }}_previous.sqsh, so if we extend the dev build of the acme-website project example above, you can expect to find the previous build here:
/home/deploy/shared/acme-website_dev/deploy/acme-website_dev_previous.sqsh
As long as you are using the rolling database management strategy, simply unmounting the current image and mounting the old one will roll back your build, as the old code points to the database just prior to the build. Remember to do this on all servers. Work as the deploy user, and ensure you are not in the live deployed code directory on the server or the mount will be locked:
# Copy down the previous build to the server
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_previous.sqsh /home/deploy/builds/acme-website_dev/deploy.sqsh
# Stop any services that might lock the mount - typically just PHP
sudo service phpX.X-fpm stop # where X.X is your PHP version
# Unmount the bad image
sudo umount -f /home/deploy/deploy/acme-website_dev
# Mount the previous image you copied down
sudo mount /home/deploy/builds/acme-website_dev/deploy.sqsh /home/deploy/deploy/acme-website_dev -t squashfs -o loop
# Start your services again
sudo service phpX.X-fpm start # where X.X is your PHP version
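It is worth confirming the previous image has mounted correctly, for example:
# Confirm a SquashFS image is mounted at the expected path
mount | grep acme-website_dev
# And that the site code is visible again
ls /home/deploy/deploy/acme-website_dev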
If you are happy your rollback was successful, then you should replace the SquashFS image on the shared drive, so that future autoscaling events use the correct image:
# Copy the failed build out of the way
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_failed.sqsh
# Copy the previous build so it becomes the live build
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_previous.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh
If you are not using rolling database management you will still need to restore your database, as described above.
In the very rare event that ce-deploy fails to create the previous image correctly, you will need to pack a new SquashFS image manually. To do this you should follow a mix of the standalone server instructions above and the hotfix instructions below, for example if build 26 was the last good build for acme-website and the dev environment - and note we are operating in the builds directory as deploy is read only:
# Remove the link to the bad build
sudo rm /home/deploy/builds/acme-website_dev/live.acme-website_dev
# Recreate the link to the last known good build - /home/deploy/deploy in the first part is deliberate!
sudo ln -s /home/deploy/deploy/acme-website_dev/acme-website_dev_build_26 /home/deploy/builds/acme-website_dev/live.acme-website_dev
# Copy the old image to one side, just in case
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_failed.sqsh
# Create a new SquashFS image, excluding the current image
mksquashfs /home/deploy/builds/acme-website_dev /tmp/acme-website_dev_26.sqsh -e /home/deploy/builds/acme-website_dev/deploy.sqsh
# Copy the image to the shared drive so it gets used in future autoscale events
mv /tmp/acme-website_dev_26.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh
You can do this on a single server. However, once you have done that you must follow the instructions in the previous section to copy down your new image and unmount and remount it, which you must do on all servers.
On standalone servers and ASGs using the mount_type of tarball you can just edit the code, remembering to do so on each web server in an ASG and also to clear caches. Restarting PHP isn't a bad idea either, to clear any opcode caching. As before, ensure you recreate the tarball and copy it to the shared directory to future-proof autoscaling events, as per the instructions above.
Hotfixing SquashFS is tricky, again because mounted SquashFS images are read only. SquashFS code is built in a separate location, /home/deploy/builds. A quick way to get a writable copy of the site for hotfixing is to edit your vhost, change the document root from /home/deploy/deploy to /home/deploy/builds, also change from the live symlink to the target build directory, and restart the web server. You can then edit the code in /home/deploy/builds and it is identical to the code in the SquashFS image. When you edit the code you will now see the changes directly. Remember, because you are hotfixing you'll have to do this on all servers.
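As a sketch of the vhost change, assuming Apache and a vhost file named after the site (the filename, paths and build number here are illustrative - check what your server actually uses):
# Edit the vhost for the site (filename is illustrative)
sudo nano /etc/apache2/sites-available/acme-website_dev.conf
# Change the document root, e.g. from:
#   /home/deploy/deploy/acme-website_dev/live.acme-website_dev/web
# to the target build directory on the builds disk:
#   /home/deploy/builds/acme-website_dev/acme-website_dev_build_26/web
# Then restart the web server to pick up the change
sudo service apache2 restart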
Once you have completed your hotfixes and you want to ensure they are captured, the approach is similar to the one above for packing a new tarball, except we are of course packing a new SquashFS image instead. You can do this on a single server:
# Copy the old image to one side, just in case
cp /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev_failed.sqsh
# Set your link back to the live deploy website
rm /home/deploy/builds/acme-website_dev/live.acme-website_dev
ln -s /home/deploy/deploy/acme-website_dev/acme-website_dev_build_26 /home/deploy/builds/acme-website_dev/live.acme-website_dev
# Create a new SquashFS image, excluding the current image
mksquashfs /home/deploy/builds/acme-website_dev /tmp/acme-website_dev_26.sqsh -e /home/deploy/builds/acme-website_dev/deploy.sqsh
# Copy the image to the shared drive so it gets used in future autoscale events
mv /tmp/acme-website_dev_26.sqsh /home/deploy/shared/acme-website_dev/deploy/acme-website_dev.sqsh
# Fix the link again so your patched site is served
rm /home/deploy/builds/acme-website_dev/live.acme-website_dev
ln -s /home/deploy/builds/acme-website_dev/acme-website_dev_build_26 /home/deploy/builds/acme-website_dev/live.acme-website_dev
At this point you are done. Effectively your live servers are not using the mounted SquashFS image - you have left them serving the site from the server's normal disk in /home/deploy/builds - but this doesn't actually matter. If they become unhealthy then autoscaling will replace them with a new machine, which will correctly grab the SquashFS image from the shared drive and mount it on boot.
Sometimes it is hard to deal with a rollback situation because your AWS cluster is unstable and the servers you are working with keep getting terminated. To deal with this you can use the AWS console to pause autoscaling:
- Log in to AWS, go to EC2
- Go to Auto Scaling Groups at the bottom of the left menu
- Click on the name of the affected ASG
- Scroll down to 'Advanced configurations' and click Edit
- Under 'Suspended processes' choose Terminate and Health Check, then click Update
This will prevent your EC2 instances from being terminated while you are working on them. Now you can stabilise your application in peace. Once you are done, you can remove the suspensions from those processes and order an instance refresh, to assure yourself everything is back to normal.
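If you prefer the command line, the AWS CLI can do the same thing - a sketch, assuming the CLI is installed and configured and using an illustrative ASG name:
# Suspend the processes that terminate instances (the ASG name is illustrative)
aws autoscaling suspend-processes --auto-scaling-group-name acme-website-dev-asg --scaling-processes Terminate HealthCheck
# When you are done, resume them
aws autoscaling resume-processes --auto-scaling-group-name acme-website-dev-asg --scaling-processes Terminate HealthCheck
# Optionally, start an instance refresh to cycle the instances
aws autoscaling start-instance-refresh --auto-scaling-group-name acme-website-dev-asg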