Using tar to stream a directory to a remote server

Greg Bowler edited this page Dec 2, 2022 · 3 revisions

tar is the tape archive tool that comes as standard on most Linux distributions. It's super useful when you know how to use it.

A "tape archive" is a single binary file that represents a whole directory of files and subdirectories. The file would historically be encoded to a magnetic tape, but that's probably not what many people do these days.

However, the nature of tape, a linear and probably slow medium, is very similar to a socket connection on the internet: data arrives in order, one byte after another, and it might take a while to transfer lots of data, depending on bandwidth. That's why tar is still used so much for this kind of thing!

Simple usage:

tar -cf my-archive.tar /path/to/project will create (c) an archive file (f) called my-archive.tar that contains all the data located at /path/to/project.

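tar -xf my-archive.tar -C /path/to/destination will extract (x) an existing archive file (f) called my-archive.tar into the directory given after -C, which must already exist. Without -C, the archive is extracted into the current working directory; any extra path placed after the archive name selects which members to extract rather than where to put them.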
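To try creation and extraction safely, here is a minimal sketch that works entirely inside a temporary directory. The directory and file names are made up for illustration; -C is tar's option for changing directory before it acts.

```shell
# Build a throwaway project directory to archive (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/project/src"
echo "hello" > "$work/project/src/file.txt"

# Create (c) an archive file (f). -C changes directory first, so the
# stored paths are relative ("project/...") rather than absolute.
tar -cf "$work/my-archive.tar" -C "$work" project

# Extract (x) the archive file (f) into a directory chosen with -C.
mkdir "$work/restore"
tar -xf "$work/my-archive.tar" -C "$work/restore"

# The restored copy has the same content as the original.
cat "$work/restore/project/src/file.txt"
```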

When bandwidth is an issue, tar can automatically compress the archive with gzip, so the binary content is much smaller but can still be extracted (a.k.a. inflated) without loss of data. This is done by adding the gzip (z) flag. It's also customary to indicate that the archive is compressed by using an appropriate file extension, for example my-zipped-archive.tar.gz, although the filename is arbitrary and you can call it whatever you want. Example create: tar -czf archive.tar.gz /path/to/project, and example extraction: tar -xzf archive.tar.gz -C /path/to/destination.
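The effect of the z flag can be checked with a small experiment. This is only a sketch: the file names are hypothetical, and the test data (a long list of numbers) compresses far better than a typical project would.

```shell
work=$(mktemp -d)
mkdir "$work/project"
# Generate a repetitive text file; real projects will compress less well.
seq 1 20000 > "$work/project/numbers.txt"

# Same directory, archived without and with gzip compression.
tar -cf  "$work/plain.tar"     -C "$work" project
tar -czf "$work/zipped.tar.gz" -C "$work" project

plain_size=$(wc -c < "$work/plain.tar")
gzip_size=$(wc -c < "$work/zipped.tar.gz")
echo "plain: $plain_size bytes, gzipped: $gzip_size bytes"

# Extract the compressed archive and confirm nothing was lost.
mkdir "$work/restore"
tar -xzf "$work/zipped.tar.gz" -C "$work/restore"
cmp "$work/project/numbers.txt" "$work/restore/project/numbers.txt" \
    && echo "contents identical"
```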

To help me remember the flags, I like to read out the flags as their full names as I type them; "with tar, I'm going to extract this zipped file".

Using tar to stream data

Streaming is an extremely useful feature of tar that exists because of the slow and linear mechanisms of tape storage, and we can take advantage of it to avoid intermediate archive files and reduce time spent.

Rather than waiting to create a whole tar archive locally, then waiting to copy the file across the internet, then waiting to extract the whole tar archive on the remote server, tar can stream the archive out as soon as the bytes are ready. This even works when compressing!

Some Linux commands take the hyphen (-) character with special meaning. With tar, a hyphen given as the archive name (f) means to use the standard input/output stream instead of a file. So if you used tar -czf - /path/to/project on its own, the gibberish binary content would be sent straight to your terminal (some versions, GNU tar among them, notice this and refuse with an error instead). However, we can use the powerful concept of Linux pipes to redirect the binary content elsewhere.
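The pipe can be tried out on a single machine first, with a second tar process standing in for the remote end. This is a sketch with made-up directory names; it uses -C to pick the directory each tar works in.

```shell
work=$(mktemp -d)
mkdir -p "$work/source/docs" "$work/destination"
echo "streamed" > "$work/source/docs/readme.txt"

# Left side: create a gzipped archive and write it to standard output (f -).
# Right side: read the archive from standard input (f -) and extract it.
# No archive file is ever written to disk.
tar -czf - -C "$work/source" . | tar -xzf - -C "$work/destination"

cat "$work/destination/docs/readme.txt"
```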

We can pipe the output of the archive creation into an SSH process, which will stream the contents over the internet and onto a remote server, at which point we can re-pipe the stream back into the tar command on the remote server and extract the archive in place - crazy!

tar -czf - /path/to/project | ssh user@example.com "tar -xzvf -"

The above command will stream the archive data to the example.com server via SSH. As the data comes in, it is extracted in place. This means there's never a .tar file created: the data is sent directly where it needs to go. The v flag stands for "verbose", and its purpose is to list out each file path as it's extracted, acting as a progress indicator of sorts.

Usage notes

When extracting an archive created with absolute paths, the current directory is treated as the root directory, because tar strips the leading / from the stored paths by default. For example, an archive created with tar -cf example.tar /var/www/example will extract to a var directory inside the current working directory, with www/example beneath it.
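This behaviour is easy to see by listing an archive's member names with the t flag. The sketch below builds a hypothetical var/www/example tree inside a temporary directory, so the absolute path being archived is whatever mktemp returns on your machine.

```shell
work=$(mktemp -d)
mkdir -p "$work/var/www/example"
echo "<h1>hi</h1>" > "$work/var/www/example/index.html"

# Archive using an absolute path. tar warns on stderr that it is removing
# the leading "/" from member names; that warning is silenced here.
tar -cf "$work/example.tar" "$work/var/www/example" 2>/dev/null

# List (t) the member names: none of them start with "/".
tar -tf "$work/example.tar"

# Extract from a different directory: the whole directory chain is
# recreated underneath it instead of at the real root.
mkdir "$work/elsewhere"
(cd "$work/elsewhere" && tar -xf "$work/example.tar")
ls "$work/elsewhere$work/var/www/example"
```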

The way this behaviour is managed within php-actions/deploy-ssh is by changing to the GitHub Workspace directory, and then using . as the directory to archive. . means the current directory, so all paths within the archive are relative and are extracted directly into whichever directory tar is run from. With this knowledge, the remote server and the GitHub Action runner can treat the archive identically, so long as they both have the current working directory set to the project directory.
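The same idea can be applied by hand. The following is a sketch only, not the actual php-actions/deploy-ssh implementation: stream_to_remote is a hypothetical helper name, it uses tar's -C option instead of cd to set the working directory on each side, and it assumes the remote directory already exists and contains no single quotes in its path.

```shell
# stream_to_remote LOCAL_DIR SSH_TARGET REMOTE_DIR  (hypothetical helper)
# Archives LOCAL_DIR as "." so every stored path is relative, then
# extracts the stream inside REMOTE_DIR on the other end of the SSH pipe.
stream_to_remote() {
    local_dir=$1
    ssh_target=$2
    remote_dir=$3
    tar -czf - -C "$local_dir" . \
        | ssh "$ssh_target" "tar -xzf - -C '$remote_dir'"
}

# Example call (placeholder user, host and path):
# stream_to_remote . user@example.com /path/to/project
```

Because both sides are anchored to a project directory, the files land in the same relative positions on the server as they had locally.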

// TODO: User ownership - is it possible to change ownership to the remote user? Is that the most appropriate in our case?
