Cloudera Manager (CM) is a great tool to manage Hadoop (CDH) clusters. It consists of CM server and agents. Both of them run on a Linux box, but need some prerequisite software (e.g. JDK, RDB) before installation. CM has 3 installation path (which is called Installation Path A, B, C) and the cloudera-managerizer here is currently for Installation Path B on AWS EC2 hosts.
A CM agent is prepared with the following software automatically installed
- set hostname
- set /etc/hosts for cluster
- set cloudera repo for yum
- install JDK (currently jdk-8u102)
- install MySQL Connector (currently mysql-connector-java-5.1.39)
- set some kernel parameters for CM not to complain about it (e.g. change vm.swappiness)
A CM server is prepared with the following software automatically installed
- all agent software (above)
- install cloudera-manager-daemons and cloudera-manager-server
- install/configure external DB (currently the latest mariadb)
- start CM server
- Mac or Linux (Client)
- AMI image should be RHEL or CentOS
- install awscli & configure AWS API key
$ brew install awscli
$ aws configure
- create a VPC and a public subnet in it
- create a security group for cluster hosts
- create a key-pair
- Edit a cluster.csv file (which contains a cluster hosts information)
$ vi cluster.csv
$ cat cluster.csv
# role (server|agent), private-ip, hostname, security-group-ids, instance-type, volume-size (gb), subnet-id, ebs-opt (yes|no), tags
server, 10.0.1.20, kobe, sg-27afd14e, t2.large, 50, subnet-2d59df44, no, owner=tsuyo testkey=testval
agent, 10.0.1.21, tokyo, sg-27afd14e, t2.large, 50, subnet-2d59df44, no, owner=tsuyo testkey=testval
- Launch cluster (ami-c74789a9 is CentOS 7.2)
$ ./cloudera-managerizer --region <region> --key-name <key-name> --image-id ami-c74789a9 --db-pass <db-password>
-
Why not just use all-set customized AMI? A custom AMI, once created, becomes obsoleted too quickly. Also storing AMI is not free of charge on AWS. On the other hand, this repo's scripts just use AWS-prepared standard AMIs (of course, no charge at all) and every time create new instances from scratch.
-
Why not use Cloudera Director? Yes, Cloudera Director is a great tool for fairly large clusters. Basically cloudera-managerizer solves another problem - e.g. you need a 3-node cluster for a demo just now with least effort. Also you can use cloudera-managerizer to create bootstrap scripts so that they can be used on your non-cloud (on-premise) environment.