Update the EmrEtlRunner configuration YAML file


The final step is to update your ETL process to work with the Clojure Collector rather than the default CloudFront Collector.

This step is necessary because, although the Clojure Collector and the CloudFront Collector log raw Snowplow events in exactly the same format, they name their log files differently. (The Clojure Collector's filename format cannot be changed to match: if we attempt to change it, Elastic Beanstalk stops storing the log files on S3 correctly.)

If you are using EmrEtlRunner, updating your ETL process to work with the Clojure Collector takes two edits to your config.yml configuration file. First, change:

:etl:
  :collector_format: cloudfront

to:

:etl:
  :collector_format: clj-tomcat

Second, update the In Bucket (the :in: entry under :buckets:) so that it points at the logs written by your Clojure Collector:

:s3:
  :region: eu-west-1
  :buckets:
    # ...
    :in: s3://elasticbeanstalk-{{REGION NAME}}-{{UUID}}/resources/environments/logs/publish/{{SECURITY GROUP IDENTIFIER}}

Replace each {{x}} placeholder with the value for your environment (you should have written these down in the Enable logging to S3 stage).

Important: do not include an {{INSTANCE IDENTIFIER}} at the end of your path. This is because your Clojure Collector may end up logging into multiple {{INSTANCE IDENTIFIER}} folders. By specifying your In Bucket only to the level of the Security Group identifier, you make sure that Snowplow can process all logs from all instances.
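As a rough sketch, the relevant parts of config.yml after both edits might look like the fragment below. It only combines the two snippets above: the {{x}} placeholders are left unfilled, other keys are elided with # ..., and the commented "Avoid" line is a hypothetical illustration of the path form to stay away from, not something taken from a real configuration.

```yaml
:etl:
  :collector_format: clj-tomcat   # previously: cloudfront
:s3:
  :region: eu-west-1              # value from the snippet above; adjust if your setup differs
  :buckets:
    # ...
    # Stop at the Security Group identifier so that logs from all instances are processed.
    :in: s3://elasticbeanstalk-{{REGION NAME}}-{{UUID}}/resources/environments/logs/publish/{{SECURITY GROUP IDENTIFIER}}
    # Avoid: the same path with /{{INSTANCE IDENTIFIER}} appended (a single instance's folder).
```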

That's it! Once you have made these two changes, you can start processing your raw log files from the Clojure Collector. The rest of the ETL and storage processes are unchanged.


