2 Using the StorageLoader

Yali Sassoon edited this page May 7, 2013 · 3 revisions

HOME > SNOWPLOW SETUP GUIDE > Step 4: setting up alternative data stores > Using the StorageLoader

  1. Overview
  2. Command-line options
  3. Running
  4. Troubleshooting
  5. Next steps
## 1. Overview

Running the StorageLoader is straightforward: populate a configuration file, then invoke the command-line tool. Please review the command-line options in the next section first.

## 2. Command-line options

Invoke StorageLoader using Bundler's bundle exec syntax:

$ bundle exec bin/snowplow-storage-loader

Note the bin/ sub-folder, and that the bundle exec command will only work when you are inside the storage-loader folder.

The command-line options for StorageLoader look like this:

Usage: snowplow-storage-loader [options]

Specific options:
    -c, --config CONFIG              configuration file
    -s, --skip download|delete,load,archive
                                     skip work step(s)

Common options:
    -h, --help                       Show this message
    -v, --version                    Show version

A note on the --skip option: this skips the work steps listed. So for example --skip download,load would only run the final archive step. This is useful if you have an error in your load and need to re-run only part of it.
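To make the skip semantics concrete, here is a minimal sketch that prints which work steps would still run for a given skip list. The `remaining_steps` helper is hypothetical and is not part of StorageLoader; it only assumes the three step names shown in the usage text (download, load, archive) and that they run in that order.

```shell
#!/bin/sh
# Illustrative only: remaining_steps is a hypothetical helper, not part of StorageLoader.
# It takes a comma-separated skip list and prints the steps that would still run.
remaining_steps() {
  skip=",$1,"                              # pad with commas so whole names match
  out=""
  for step in download load archive; do
    case "$skip" in
      *",$step,"*) ;;                      # step is in the skip list: omit it
      *) out="${out:+$out }$step" ;;       # step still runs: append it
    esac
  done
  echo "$out"
}

remaining_steps "download,load"   # prints: archive
remaining_steps "download"        # prints: load archive
```

The matching real invocation for the first case would be `bundle exec bin/snowplow-storage-loader --config my-config.yml --skip download,load`, where my-config.yml is a placeholder configuration file name.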

## 3. Running

As described above, running StorageLoader is a matter of populating your configuration file (called my-config.yml in this example) and then invoking StorageLoader like so:

$ bundle exec bin/snowplow-storage-loader --config my-config.yml
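Because bundle exec only works from inside the storage-loader folder, a small wrapper can be handy. The following is a sketch only: `run_loader` and both of its arguments are hypothetical names, and the paths in the example call are placeholders. It checks that the configuration file exists before changing into the folder and invoking the loader.

```shell
#!/bin/sh
# Hypothetical wrapper; run_loader and its arguments are illustrative names.
run_loader() {
  loader_dir="$1"   # path to the storage-loader folder
  config="$2"       # absolute path to the configuration file
  if [ ! -f "$config" ]; then
    echo "configuration file not found: $config" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$loader_dir" && bundle exec bin/snowplow-storage-loader --config "$config" )
}

# Example call (placeholder paths):
# run_loader /path/to/storage-loader /path/to/my-config.yml
```

Pass the configuration file as an absolute path, since the wrapper changes directory before running the loader.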
## 4. Troubleshooting

### locate command missing

StorageLoader depends on Snowplow's Infobright Ruby Loader, which in turn uses the locate shell command. If your shell complains that locate is missing, you can install it separately.

To install and configure locate on Debian/Ubuntu:

$ sudo apt-get install mlocate
$ sudo updatedb
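To check in advance whether locate is present, a quick test along these lines may help. This is a sketch: `have_cmd` is a hypothetical helper name, and it relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd locate; then
  echo "locate is available"
else
  echo "locate is missing; install it (e.g. the mlocate package on Debian/Ubuntu)" >&2
fi
```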
## 5. Next steps

All done? Then schedule the StorageLoader to regularly migrate new data into your data store (e.g. Infobright or Redshift).
