
Setting up alternative data stores

Yali Sassoon edited this page Aug 5, 2013 · 20 revisions

HOME > SNOWPLOW SETUP GUIDE > Step 4: setting up alternative data stores

Snowplow supports storing your data in three different data stores:

| Storage | Description | Status |
|---|---|---|
| S3 | Data is stored in the S3 file system, where it can be analysed using EMR (e.g. Hive, Pig, Mahout) | Production-ready |
| Redshift | A columnar database offered as a service on AWS. Optimized for performing OLAP analysis. Scales to petabytes | Production-ready |
| PostgreSQL | A popular, open source relational database (RDBMS) | Production-ready |

By setting up the EmrEtlRunner (in the previous step), you are already successfully loading your Snowplow event data into S3 where it is accessible to EMR for analysis.

If you wish to analyse your data using a wider range of tools (e.g. BI tools like ChartIO or Tableau, or statistical tools like R), you will want to load your data into a database such as Amazon Redshift or PostgreSQL to enable use of these tools.

The StorageLoader is an application that makes it simple to keep an up-to-date copy of your data in Redshift and / or PostgreSQL. To set up Snowplow to automatically populate a PostgreSQL and / or Redshift database with Snowplow data, you first need to:

  1. Create a database and table for Snowplow data in Redshift, and / or
  2. Create a database and table for Snowplow data in PostgreSQL

Afterwards, you will need to set up the StorageLoader to regularly transfer Snowplow data into your new store(s).

(Note that instructions on setting up both Redshift and PostgreSQL on EC2 are included in this setup guide and referenced from the links above.)

All done? Then start analysing your data.

Note: We recommend running all Snowplow AWS operations through an IAM user with the bare minimum permissions required to run Snowplow. Please see our IAM user setup page for more information on doing this.
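
As a rough illustration of what "bare minimum permissions" can look like, the Python sketch below builds an IAM policy document scoped to a single S3 bucket. The bucket name `my-snowplow-events`, the helper `s3_policy` and the chosen actions are assumptions made for this example, not the permissions Snowplow actually requires; the IAM user setup page remains the reference for those.

```python
import json


def s3_policy(bucket, object_actions):
    """Build an IAM policy document scoped to a single S3 bucket (illustrative helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions are granted only on keys inside that bucket
                "Effect": "Allow",
                "Action": sorted(object_actions),
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Hypothetical bucket name and action list, for illustration only
policy = s3_policy("my-snowplow-events", {"s3:GetObject", "s3:PutObject"})
print(json.dumps(policy, indent=2))
```

The point of the sketch is the scoping: each statement names one bucket and a short, explicit list of actions rather than granting `s3:*` on every resource.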


Setup Snowplow

  • [Step 1: Setup a Collector] (setting-up-a-collector)
  • [Step 2: Setup a Tracker] (setting-up-a-tracker)
  • [Step 3: Setup EmrEtlRunner] (setting-up-EmrEtlRunner)
  • [Step 4: Setup alternative data stores] (setting-up-alternative-data-stores)
    • [4.1: setup Redshift] (setting-up-redshift)
    • [4.2: setup PostgreSQL] (setting-up-postgresql)
    • [4.3: installing the StorageLoader] (1-installing-the-storageloader)
    • [4.4: using the StorageLoader] (2-using-the-storageloader)
    • [4.5: scheduling the StorageLoader] (3-scheduling-the-storageloader)
  • [Step 5: Analyze your data!] (Getting started analyzing Snowplow data)

