An open source project that demonstrates a data pipeline from a REST API all the way to Elasticsearch through Kafka connectors, with some data manipulation along the way.

meticulo3366/cassandra-kafka-elasticsearch-open-source

Open Source Big Data Workbench

This open source project provides a solid foundation for building an end-to-end data pipeline using only open source technology, with no licensed components. You can use it for your own learning and incorporate it into your data science workflows.

Our motivation is to empower you to create big data workflows from beginning to end.

Requirements

  • Install python3 on your workstation
  • Install Docker on your workstation

Note: if you are running Docker Desktop, allocate at least 4 CPUs and 4 GB of memory:

  1. Right click on Docker Desktop icon
  2. Select Preferences
  3. Select Resources
  4. Set CPUs = 4
  5. Set Memory to at least 4GB
  6. Press the Apply & Restart button to make the changes.
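
After Docker restarts, it can be useful to confirm that the new limits took effect. The following is a minimal sketch, not part of the project: `check_resources` is a hypothetical helper, and the minimums simply mirror the steps above. It accepts 90% of the memory target as a heuristic, on the assumption that the Docker VM reports slightly less memory than the configured value.

```shell
#!/bin/sh
# Minimums taken from the Docker Desktop steps above.
MIN_CPUS=4
MIN_MEM_GB=4

# check_resources CPUS MEM_BYTES  (hypothetical helper, not part of the project)
# Returns 0 when both values meet the minimums, 1 otherwise.
check_resources() {
  cpus="$1"
  mem_bytes="$2"
  # Heuristic: the Docker VM tends to report a little less memory than
  # configured, so accept 90% of the target.
  min_bytes=$(( MIN_MEM_GB * 1024 * 1024 * 1024 * 9 / 10 ))
  status=0
  if [ "$cpus" -lt "$MIN_CPUS" ]; then
    echo "CPUs: $cpus (want at least $MIN_CPUS)" >&2
    status=1
  fi
  if [ "$mem_bytes" -lt "$min_bytes" ]; then
    echo "Memory: $mem_bytes bytes (want about $MIN_MEM_GB GB)" >&2
    status=1
  fi
  return "$status"
}
```

With the Docker daemon running, the helper could be called as `check_resources "$(docker info --format '{{.NCPU}}')" "$(docker info --format '{{.MemTotal}}')"`, which passes the CPU count and total memory (in bytes) that Docker reports.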

Set up the environment

  • Set up and install Docker
  • Download the Kafka connectors:
  mkdir jars
  cd jars/
  curl -L -O https://cassandra-kafka-elasticsearch-open-source.s3-us-west-1.amazonaws.com/kafka-connect-rest-plugin-1.0.3-shaded.jar
  curl -L -O https://cassandra-kafka-elasticsearch-open-source.s3-us-west-1.amazonaws.com/kafka-connect-transform-add-headers-1.0.3-shaded.jar
  curl -L -O https://cassandra-kafka-elasticsearch-open-source.s3-us-west-1.amazonaws.com/kafka-connect-transform-from-json-plugin-1.0.3-shaded.jar
  curl -L -O https://cassandra-kafka-elasticsearch-open-source.s3-us-west-1.amazonaws.com/kafka-connect-transform-velocity-eval-1.0.3-shaded.jar
  curl -L -O https://cassandra-kafka-elasticsearch-open-source.s3-us-west-1.amazonaws.com/kafka-connect-elastic6-1.2.3-2.1.0-all.jar
  curl -L -O https://cassandra-kafka-elasticsearch-open-source.s3-us-west-1.amazonaws.com/kafka-connect-cassandra-1.2.3-2.1.0-all.jar
  cd ..
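
The download commands above can also be written as a loop that is safe to re-run. This is a sketch under stated assumptions: the base URL and file names are copied from the commands above, while `connector_jars` and `download_connectors` are hypothetical helper names. Unlike the plain `curl -L -O` calls, it adds `--fail` so that an HTTP error does not leave an error page saved as a `.jar` file, and it skips files that are already present.

```shell
#!/bin/sh
# Base URL and file names copied from the curl commands above.
BASE_URL="https://cassandra-kafka-elasticsearch-open-source.s3-us-west-1.amazonaws.com"

# Print one connector jar name per line.
connector_jars() {
  cat <<'EOF'
kafka-connect-rest-plugin-1.0.3-shaded.jar
kafka-connect-transform-add-headers-1.0.3-shaded.jar
kafka-connect-transform-from-json-plugin-1.0.3-shaded.jar
kafka-connect-transform-velocity-eval-1.0.3-shaded.jar
kafka-connect-elastic6-1.2.3-2.1.0-all.jar
kafka-connect-cassandra-1.2.3-2.1.0-all.jar
EOF
}

# download_connectors [DIR]  (hypothetical helper; DIR defaults to "jars")
# Downloads each jar into DIR, skipping files that already exist.
download_connectors() {
  dir="${1:-jars}"
  mkdir -p "$dir" || return 1
  connector_jars | while read -r jar; do
    [ -f "$dir/$jar" ] && continue
    # Download to a temporary name first so an interrupted transfer
    # is not mistaken for a complete jar on the next run.
    curl --fail -L -o "$dir/$jar.part" "$BASE_URL/$jar" || exit 1
    mv "$dir/$jar.part" "$dir/$jar" || exit 1
  done
}
```

Running `download_connectors` from the project root should leave the same six files in `jars/` as the commands above.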

Deploy the docker environment

  1. docker-compose up --force-recreate -V
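
In the command above, `-V` is the short form of `--renew-anon-volumes`, so anonymous volumes are recreated rather than reused. The services can take a while to become reachable after the containers start. A generic retry helper is one way to wait for them; the sketch below is not part of the project, and `retry` is a hypothetical name.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}
```

For example, `retry 30 5 curl --fail --silent http://localhost:8083/connectors` would poll the Kafka Connect REST API every 5 seconds for up to 30 attempts. The host and port here are assumptions based on Kafka Connect's default REST port; check the port mappings in the project's docker-compose file before relying on them.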

Lab Exercises
