CloudFormation Template with MySQL¶

Overview¶

Using CloudFormation Templates, Fire can be easily installed on AWS. This CFT works with EMR 5.8 onwards.

The below steps would allow you to start up an EMR Cluster and have Fire setup on it.

The CFT does the following:

Creates External DB for Fire to be used as the metastore for Fire data
Creates EMR cluster with 1 master node and 2 worker nodes by default.
Once the cluster is ready it runs the job/script to deploy Fire (takes around 1-1:30 min for deploying app!).

Relevant Files¶

Below are the Relevant Files¶
Title	Description	File
emr-file-mysql.json	CloudFormation Template	https://s3.amazonaws.com/sparkflows-cft/mysql-db/emr-fire-mysql.json
deploy-fire-mysql.sh	Script for deploying Fire with MySQL	https://s3.amazonaws.com/sparkflows-cft/mysql-db/deploy-fire-mysql.sh
script-runner.jar	Script Runner	https://s3.amazonaws.com/sparkflows-cft/mysql-db/script-runner.jar

Ports¶

With this CFT and deploy-fire-mysql.sh, when Fire comes up, it would be listening on ports 8085 and 8086.

Download Files and Upload to your S3 Bucket¶

Download CFT emr-fire-mysql.json from the above link.
Download deploy-fire-mysql.sh and script-runner.jar from the above links and upload them to your s3 bucket

Update Cloudformation template based on your environment¶

Update the CFT emr-fire-mysql.json according to your requirement and environment in which you are deploying.

ElasticMapReduce-Master-SecurityGroup under mastersg:

From AWS console -> EC2 -> Security Groups -> search for "ElasticMapReduce-master"

ElasticMapReduce-Slave-SecurityGroup under slavesg:

From AWS console -> EC2 -> Security Groups -> search for "ElasticMapReduce-slave"

Applications:

By default the CFT deploys Hadoop, Hive & Spark. Add any other Applications which you need.

EbsRootVolumeSize:

If required change the root(/) ebs volume size. By default CFT has 50GB disk volume

SizeInGB for Master and Core Instances:

If required change the SizeInGB under EbsConfiguration. By default CFT has 50GB disk volume (used for hdfs)

VolumesPerInstance for Master and Core Instances:

If required change the VolumesPerInstance under EbsConfiguration By default cft has 1. It means one additional disk of 50GB added to each instance(for hdfs). e.g. If you change it 2, two 50GB (SizeInGB size) disks will be added to each instances.

deploy-fire-mysql.sh and script-runner.jar:

Change the s3 bucket path for these two files, this s3 bucket  must be same bucket as S3Bucket. You'll pass the S3Bucket value while creating the cloudformation stack.

Steps to Create EMR Cluster and Deploy Fire¶

AWS web Console -> Management tools -> CloudFormation
- Click on Create Stack.
Next page is Select Template
- Select the radio-button Upload a template to Amazon S3
- Select the updated emr-fire-mysql.json from your system
- Click Next
Next page is Specify Details
- Enter CloudFormation stack name

Update Parameters where needed¶
Name of Parameter	Description
AdditionalSecurityGroups	From the list choose the additional secuirty group(sg), it’s required because default emr sg’s ports are not opened for ssh, fire & etc…
AmiId	EMR cluster can be launched using Custom AMI, pass the value if you have a Custom AMI
ClusterName	Name for EMR Cluster
CoreInstanceType	Provide the required instance type for core nodes, default instance type is m4.xlarge
CoreNodes	Choose the required number of core nodes, by default it’s 2
EmrVersion	Choose the required EMR version, it’s should be above EMR v.5.8.x
Environment	By default dev
FireVersion	Enter the required version of Fire
KeyName	Enter the valid pem key name to connect to emr nodes
MasterInstanceType	Provide the required instance type for master nodes, default instance type is m4.xlarge
MasterNodes	By default 1
Owner	provide the name of a team or person creating the cluster
ReleaseVersion	Enter the required ReleaseVersion, it has to match with fire version
S3Bucket	Provide the s3 bucket name, this s3 bucket should be same s3 bucket where deploy-fire.sh and script-runner.jar are uploaded
Subnet	Provide the proper subnet name, which has sufficient resources to create emr cluster
TaskInstanceType	Optional, required only if you’re choosing TaskNodes. Provide the required instance type for task nodes, default instance type is m4.xlarge
TaskNodes	Optional, required only if you want to create the cluster with tasknodes.By default zero, enter the required number of nodes

Click Next
Next Page is Options
- If required (not mandatory) enter tag details
- Click Next
Next Page is Review
- Review all the details provided to create an EMR stack
- Click on Create
- It will start creating the Stack
Next page is back to Cloudformation Page
- Choose your Stack name
- Click on Events to check the process
- Click on Resources to get the EMR Cluster id
Once the stack runs successfully, your EMR Cluster and Fire is ready to use. Cluster creation time depends on your EMR cluster configuration
To cross check the Fire installation
- Go to EMR from AWS web console
- Choose your EMR Cluster
- Identify the Master Node Public DNS
- Go to http://masternodeip:8085/index.html

Connect Fire to the New Cluster¶

Go to Administration/Configuration
Click on Infer Hadoop Configuration
Click on the Save button

Load Examples¶

In Fire, click on Load Examples
ssh to the master node
cd /opt/fire/fire-3.1.0
hadoop fs -put data

Create hadoop user¶

Go to Administration/User
Click on Add User
Create a new user with username hadoop
Log out and log back in as user hadoop

Start running the Examples¶

Go to Applications
Start building your Applications.

Summary¶

Using the above CFT you have your EMR cluster with Fire running seamlessly.