CloudFormation Template with MySQL¶
Overview¶
Fire can be installed on AWS using a CloudFormation template (CFT). This CFT works with EMR 5.8 and later.
The steps below start an EMR cluster and set up Fire on it.
The CFT does the following:
- Creates an external MySQL database that Fire uses as its metastore.
- Creates an EMR cluster with 1 master node and 2 worker nodes by default.
- Once the cluster is ready, runs the job/script that deploys Fire (deployment takes about 1 to 1.5 minutes).
Relevant Files¶
| File | Description | URL |
|---|---|---|
| emr-fire-mysql.json | CloudFormation Template | https://s3.amazonaws.com/sparkflows-cft/mysql-db/emr-fire-mysql.json |
| deploy-fire-mysql.sh | Script for deploying Fire with MySQL | https://s3.amazonaws.com/sparkflows-cft/mysql-db/deploy-fire-mysql.sh |
| script-runner.jar | Script Runner | https://s3.amazonaws.com/sparkflows-cft/mysql-db/script-runner.jar |
Ports¶
- With this CFT and deploy-fire-mysql.sh, Fire listens on ports 8085 and 8086 once it comes up.
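A quick way to confirm that these ports are reachable from your machine is a plain TCP connection test. The sketch below is a minimal example using only the Python standard library; the helper names `port_is_open` and `check_fire_ports` are illustrative and not part of Fire or the CFT, and the host passed in is whatever master node address you have.

```python
import socket

# Ports named in this guide for Fire.
FIRE_PORTS = (8085, 8086)


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_fire_ports(host: str, ports=FIRE_PORTS) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, `check_fire_ports("<master-node-public-dns>")` returns a dictionary keyed by port. A `False` value may simply mean that the security group attached to the cluster does not allow inbound traffic on that port.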
Download Files and Upload to your S3 Bucket¶
- Download the CFT emr-fire-mysql.json from the link above.
- Download deploy-fire-mysql.sh and script-runner.jar from the links above and upload them to your S3 bucket.
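To keep track of where each file should end up, it can help to write the mapping down before uploading. The sketch below only computes the target `s3://` locations for the two files listed above; it does not download or upload anything. The bucket name `my-fire-bucket`, the `fire` prefix, and the helper `plan_uploads` are placeholders chosen for illustration.

```python
import posixpath
from urllib.parse import urlparse

# Source URLs from the Relevant Files table.
SOURCE_URLS = [
    "https://s3.amazonaws.com/sparkflows-cft/mysql-db/deploy-fire-mysql.sh",
    "https://s3.amazonaws.com/sparkflows-cft/mysql-db/script-runner.jar",
]


def plan_uploads(bucket: str, prefix: str = "") -> list:
    """Pair each source URL with the s3:// URI it would be uploaded to."""
    plan = []
    clean_prefix = prefix.strip("/")
    for url in SOURCE_URLS:
        filename = posixpath.basename(urlparse(url).path)
        key = posixpath.join(clean_prefix, filename) if clean_prefix else filename
        plan.append((url, f"s3://{bucket}/{key}"))
    return plan


# Hypothetical bucket and prefix, used only to show the output shape.
for url, dest in plan_uploads("my-fire-bucket", "fire"):
    print(f"{url} -> {dest}")
```

The actual transfer can then be done with any tool you prefer, for example by downloading each file in a browser and copying it with `aws s3 cp`.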
Update Cloudformation template based on your environment¶
Update the CFT emr-fire-mysql.json to match your requirements and the environment you are deploying into.
ElasticMapReduce-Master-SecurityGroup under mastersg:
From AWS console -> EC2 -> Security Groups -> search for "ElasticMapReduce-master"
ElasticMapReduce-Slave-SecurityGroup under slavesg:
From AWS console -> EC2 -> Security Groups -> search for "ElasticMapReduce-slave"
Applications:
By default the CFT deploys Hadoop, Hive & Spark. Add any other Applications which you need.
EbsRootVolumeSize:
If required, change the root (/) EBS volume size. By default the CFT uses a 50 GB root volume.
SizeInGB for Master and Core Instances:
If required, change the SizeInGB under EbsConfiguration. By default the CFT uses a 50 GB volume (used for HDFS).
VolumesPerInstance for Master and Core Instances:
If required, change the VolumesPerInstance under EbsConfiguration. By default the CFT uses 1, which means one additional 50 GB disk is added to each instance (for HDFS). For example, if you change it to 2, two disks of SizeInGB (50 GB by default) are added to each instance.
deploy-fire-mysql.sh and script-runner.jar:
Change the S3 bucket path for these two files. The bucket must be the same one you pass as the S3Bucket parameter when creating the CloudFormation stack.
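As a worked example of the SizeInGB and VolumesPerInstance settings described above, the additional EBS storage is simply the product of the two values, multiplied by the number of instances. The function name below is illustrative, and the figures use the defaults stated in this guide (50 GB volumes, 1 master node, 2 core nodes); the root volume is not included.

```python
def additional_ebs_gb(size_in_gb: int, volumes_per_instance: int, instances: int = 1) -> int:
    """Total additional EBS storage in GB, excluding the root volume."""
    return size_in_gb * volumes_per_instance * instances


# Defaults from this guide: one 50 GB volume per instance.
print(additional_ebs_gb(50, 1))        # per instance with the default settings
# VolumesPerInstance changed to 2: two 50 GB disks per instance.
print(additional_ebs_gb(50, 2))        # per instance
# Same setting across 1 master node and 2 core nodes.
print(additional_ebs_gb(50, 2, 1 + 2)) # across the default cluster
```

With the defaults this gives 50 GB per instance; with VolumesPerInstance set to 2 it gives 100 GB per instance, or 300 GB across the default three-node cluster.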
Steps to Create EMR Cluster and Deploy Fire¶
- AWS web Console -> Management tools -> CloudFormation
- Click on Create Stack.
- Next page is Select Template
- Select the radio-button Upload a template to Amazon S3
- Select the updated emr-fire-mysql.json from your system
- Click Next
- Next page is Specify Details
- Enter CloudFormation stack name
| Name of Parameter | Description |
|---|---|
| AdditionalSecurityGroups | Choose an additional security group (SG) from the list. It is required because the default EMR security groups do not open the ports needed for SSH, Fire, etc. |
| AmiId | EMR cluster can be launched using Custom AMI, pass the value if you have a Custom AMI |
| ClusterName | Name for EMR Cluster |
| CoreInstanceType | Provide the required instance type for core nodes, default instance type is m4.xlarge |
| CoreNodes | Choose the required number of core nodes, by default it’s 2 |
| EmrVersion | Choose the required EMR version; it should be EMR 5.8.x or later |
| Environment | By default dev |
| FireVersion | Enter the required version of Fire |
| KeyName | Enter the valid pem key name to connect to emr nodes |
| MasterInstanceType | Provide the required instance type for master nodes, default instance type is m4.xlarge |
| MasterNodes | By default 1 |
| Owner | Provide the name of the team or person creating the cluster |
| ReleaseVersion | Enter the required ReleaseVersion; it must match the Fire version |
| S3Bucket | Provide the S3 bucket name. This must be the same bucket where deploy-fire-mysql.sh and script-runner.jar are uploaded |
| Subnet | Provide a subnet that has sufficient resources to create the EMR cluster |
| TaskInstanceType | Optional; needed only if you use TaskNodes. Provide the required instance type for task nodes, default instance type is m4.xlarge |
| TaskNodes | Optional; needed only if you want to create the cluster with task nodes. By default zero, enter the required number of nodes |
- Click Next
- Next page is Options
- If required (not mandatory) enter tag details
- Click Next
- Next page is Review
- Review all the details provided to create an EMR stack
- Click on Create
- It will start creating the Stack
- Next page is back to the CloudFormation page
- Choose your Stack name
- Click on Events to check the progress
- Click on Resources to get the EMR Cluster ID
- Once the stack is created successfully, your EMR cluster and Fire are ready to use. Cluster creation time depends on your EMR cluster configuration.
- To cross check the Fire installation
- Go to EMR from AWS web console
- Choose your EMR Cluster
- Identify the Master Node Public DNS
- Go to http://masternodeip:8085/index.html, replacing masternodeip with the master node's public DNS
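If you prefer to script the stack creation instead of using the console, the parameters from the table above need to be expressed as a list of `ParameterKey`/`ParameterValue` pairs, which is the shape CloudFormation accepts. The sketch below only builds that list. The parameter names come from the table; every value shown (cluster name, bucket, key name) is a placeholder, and the helper `to_cfn_parameters` is illustrative rather than part of the CFT.

```python
import json


def to_cfn_parameters(values: dict) -> list:
    """Convert a plain dict into CloudFormation's parameter list format."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


# Parameter names are from the table above; values are placeholders.
stack_parameters = {
    "ClusterName": "fire-emr-dev",      # hypothetical cluster name
    "CoreInstanceType": "m4.xlarge",    # default stated in the table
    "CoreNodes": 2,                     # default stated in the table
    "Environment": "dev",               # default stated in the table
    "KeyName": "my-key",                # hypothetical key pair name
    "MasterInstanceType": "m4.xlarge",  # default stated in the table
    "MasterNodes": 1,                   # default stated in the table
    "S3Bucket": "my-fire-bucket",       # hypothetical bucket name
    "TaskNodes": 0,                     # default stated in the table
}

print(json.dumps(to_cfn_parameters(stack_parameters), indent=2))
```

The printed JSON can be saved to a file and passed to `aws cloudformation create-stack` through its `--parameters` option, or the list can be passed as the `Parameters` argument of boto3's `create_stack`. The remaining parameters in the table (for example Subnet, FireVersion, and AdditionalSecurityGroups) would be added the same way with values from your environment.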
Connect Fire to the New Cluster¶
- Go to Administration/Configuration
- Click on Infer Hadoop Configuration
- Click on the Save button
Load Examples¶
- In Fire, click on Load Examples
- ssh to the master node
- cd /opt/fire/fire-3.1.0
- hadoop fs -put data
Create hadoop user¶
- Go to Administration/User
- Click on Add User
- Create a new user with username hadoop
- Log out and log back in as user hadoop
Start running the Examples¶
- Go to Applications
- Start building your Applications.
Summary¶
Using the above CFT, you now have an EMR cluster with Fire running on it.