Update the EmrEtlRunner configuration YAML file
The final step is to update your ETL process to work with the Clojure Collector rather than the default CloudFront Collector.
This is a necessary step because, although the Clojure Collector and the CloudFront Collector log raw Snowplow events in exactly the same format, they name their files differently. (If we attempt to change the filename formats, then Elastic Beanstalk will cease to store the log files on S3 correctly.)
If you are using EmrEtlRunner, updating your ETL process to work with the Clojure Collector is a matter of editing your config.yml configuration file. First, change:
:etl:
  :collector_format: cloudfront
to:
:etl:
  :collector_format: clj-tomcat
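If you would rather script this change than edit the file by hand, the substitution could be sketched in Python as follows. This is a minimal sketch and not part of EmrEtlRunner: the function name `switch_collector_format` is hypothetical, and it assumes the `:collector_format:` key appears exactly once, on its own line, with no trailing comment.

```python
import re

# Matches a line such as "  :collector_format: cloudfront" and captures
# everything up to (but not including) the value.
_COLLECTOR_FORMAT_LINE = re.compile(
    r"^([ \t]*:collector_format:[ \t]*)\S+[ \t]*$", re.MULTILINE
)


def switch_collector_format(config_text, new_format="clj-tomcat"):
    """Return config_text with the :collector_format: value replaced.

    Hypothetical helper: it works on the raw text of config.yml so that
    comments, key order and indentation elsewhere are left untouched.
    """
    updated, count = _COLLECTOR_FORMAT_LINE.subn(
        lambda match: match.group(1) + new_format, config_text
    )
    if count != 1:
        # Assumption: a valid config has exactly one such line.
        raise ValueError(
            f"expected exactly one :collector_format: line, found {count}"
        )
    return updated
```

Reading config.yml into a string, passing it through this function, and writing the result back would apply the first of the two changes. Working on the text rather than parsing the YAML is a deliberate choice here, since it avoids reordering or reformatting the rest of the file.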
Second, you will need to update the In Bucket specified:
:s3:
  :region: eu-west-1
  :buckets:
    # ...
    :in: s3://elasticbeanstalk-{{REGION NAME}}-{{UUID}}/resources/environments/logs/publish/{{SECURITY GROUP IDENTIFIER}}
Replace all of these {{x}} variables with the appropriate values for your environment (which you should have written down in the Enable logging to S3 stage).
Important: do not include an {{INSTANCE IDENTIFIER}} at the end of your path. This is because your Clojure Collector may end up logging into multiple {{INSTANCE IDENTIFIER}} folders. By specifying your In Bucket only to the level of the Security Group identifier, you make sure that Snowplow can process all logs from all instances.
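The path rule above can be sketched as a small Python helper that assembles the In Bucket value from its three parts and refuses anything that looks like it goes one level too deep. This is an illustrative sketch only: `build_in_bucket` is a hypothetical name, and the check that instance identifier folders start with `i-` is an assumption based on the usual EC2 instance ID format, not something this guide specifies.

```python
def build_in_bucket(region_name, uuid, security_group_id):
    """Assemble the In Bucket path down to the Security Group identifier.

    Hypothetical helper: the three arguments stand in for the
    {{REGION NAME}}, {{UUID}} and {{SECURITY GROUP IDENTIFIER}} placeholders.
    """
    parts = {
        "region_name": region_name,
        "uuid": uuid,
        "security_group_id": security_group_id,
    }
    for name, value in parts.items():
        if not value:
            raise ValueError(f"{name} must not be empty")
        if "{{" in value or "}}" in value:
            raise ValueError(f"{name} still contains an unfilled placeholder")
        if "/" in value:
            # A slash here usually means an extra folder, such as an
            # instance identifier, was pasted onto the end of the path.
            raise ValueError(f"{name} must be a single path segment")
    if security_group_id.startswith("i-"):
        # Assumption: instance identifier folders start with "i-".
        raise ValueError("path must stop at the Security Group identifier")
    return (
        f"s3://elasticbeanstalk-{region_name}-{uuid}"
        f"/resources/environments/logs/publish/{security_group_id}"
    )
```

Calling it with the values noted during the Enable logging to S3 stage yields the string to paste after `:in:`. A value with an instance folder appended raises a `ValueError` instead of silently limiting processing to a single instance's logs.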
That's it! Once you have made these two changes, you can start processing your raw log files from the Clojure Collector. The rest of the ETL and storage processes are unchanged.
Copyright © 2012-2013 Snowplow Analytics Ltd