Databricks Python Integration Steps¶
Fire Insights integrates with Databricks and can submit Python jobs to it. It submits jobs to Databricks clusters through the Databricks REST API and displays the results back in Fire Insights.
Below are the steps for integrating Fire Insights with your Databricks clusters to run Python jobs.
Note
The machine on which Fire Insights is installed should have Python 3.7.0 or above.
Python Installation Steps:
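The version requirement in the note above can be checked on the Fire Insights machine before continuing. The following is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of Fire Insights.

```python
import sys

# Minimum version stated in the note above.
MIN_PYTHON = (3, 7, 0)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is >= minimum."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor, micro) as plain tuples.
    return tuple(version[:3]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for Fire Insights"
    print(f"Python {found}: {status}")
```

Run it with the same interpreter that will run Fire Insights, since a machine may have several Python versions installed.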
Install Fire Insights¶
Install Fire Insights on your machine. The machine has to be reachable from the Databricks cluster.
Upload Fire wheel file to Databricks¶
The Fire Insights wheel file has to be uploaded to Databricks. Fire Insights jobs running on Databricks make use of this wheel file.
Upload fire-x.y.z/dist/fire-3.1.0-py3-none-any.whl to Databricks. You can upload it to DBFS and add it as a Library under Workspace, or place it in an S3 bucket that is accessible from the Databricks cluster.
- Log in to the Databricks Cluster
- Click on workspace in the left side pane
- Create a new Library
Select DBFS as the Library Source and Python Whl as the Library Type, enter any Library Name, and set File Path to the location of fire-3.1.0-py3-none-any.whl in DBFS.
After you click the Create button, you are asked which Databricks cluster to install the library on. Select the cluster you want.
Once the wheel file is installed successfully on the Databricks cluster, it is displayed under Libraries.
Another option is to upload the fire-3.1.0-py3-none-any.whl file to an S3 bucket that is accessible from the Databricks cluster.
Once the file is in the S3 bucket, log in to the Databricks cluster and open the Libraries tab.
Choose Install New Library, select DBFS/S3 as the Library Source and Python Whl as the Library Type, paste the S3 location of the wheel file into File Path, and click Install.
Once it is installed successfully, the Python wheel is listed under Libraries.
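The UI steps above can also be expressed as a request to the Databricks Libraries API (`POST /api/2.0/libraries/install`). The sketch below only builds the JSON body and sends nothing. The cluster ID and the DBFS location are hypothetical placeholders, and the helper name is illustrative rather than part of Fire Insights.

```python
import json


def build_library_install_payload(cluster_id, wheel_uri):
    """Build the JSON body for POST /api/2.0/libraries/install."""
    if not wheel_uri.endswith(".whl"):
        raise ValueError(f"not a wheel file: {wheel_uri}")
    # The "whl" library type accepts DBFS and S3 URIs.
    return {"cluster_id": cluster_id, "libraries": [{"whl": wheel_uri}]}


# Hypothetical cluster ID and upload location; replace with your own.
payload = build_library_install_payload(
    "0123-456789-abcde123",
    "dbfs:/FileStore/jars/fire-3.1.0-py3-none-any.whl",
)
print(json.dumps(payload, indent=2))
```

For a wheel stored in S3, pass the `s3://` location of the file as `wheel_uri` instead.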
Install Python dependencies¶
Install the Python dependencies required by Fire Insights on the machine by running the command below from the fire-x.y.z/dist/fire/ directory:
pip install -r requirements.txt
Note: Make sure that pip is already installed on that machine.
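One way to confirm that pip is available for the interpreter you intend to use is to invoke it as a module. This is a small sketch; `pip_available` is a hypothetical helper name.

```python
import subprocess
import sys


def pip_available():
    """Return True if `python -m pip --version` succeeds for this interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("pip available:", pip_available())
```

Running pip through `sys.executable` checks the same Python installation that will later import the dependencies, which matters when several versions are installed side by side.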
Install dependency for AWS¶
Copy the hadoop-aws and aws-java-sdk jars to the pyspark jar path.
If your Custom Processors need any additional Python package, install it on both the Databricks cluster and the Fire Insights machine.
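If you are unsure where the pyspark jar path is, the sketch below prints a likely location. It assumes pyspark was installed with pip, in which case the jars normally sit in a `jars` directory inside the package. If you run Spark from a separate `SPARK_HOME`, the jars directory is under that location instead. The helper name is hypothetical.

```python
import importlib.util
import os


def pyspark_jars_dir():
    """Return the expected jars directory of a pip-installed pyspark, or None."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None  # pyspark is not importable from this interpreter
    # spec.origin points at pyspark/__init__.py; the jars live next to it.
    return os.path.join(os.path.dirname(spec.origin), "jars")


if __name__ == "__main__":
    location = pyspark_jars_dir()
    print(location if location else "pyspark not found for this interpreter")
```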
Use the command below to install it on the Fire Insights machine:
pip install scorecardpy
Install it on your Databricks cluster as follows:
* Open a Notebook and attach it to the Databricks cluster.
* Run %sh pip install scorecardpy
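After installing, you can confirm that the package is importable in both places. The sketch below can be run on the Fire Insights machine and pasted into a notebook cell attached to the cluster; `missing_packages` is a hypothetical helper, and `scorecardpy` is simply the example package used above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["scorecardpy"])
    print("missing:", missing if missing else "none")
```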
Upload Fire workflowexecutedatabricks.py file to DBFS¶
This file is needed for submitting Python jobs to the Databricks cluster.
Upload the fire-x.y.z/dist/workflowexecutedatabricks.py file to DBFS or to an S3 bucket.
You can also upload it using the DBFS Browser.
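As an alternative to the DBFS Browser, a small file can be uploaded through the DBFS API (`POST /api/2.0/dbfs/put`), which takes the file contents as a base64 string. The sketch below only builds the request body. The target DBFS path is a hypothetical placeholder, and the size check reflects the 1 MB limit that applies, to my knowledge, when contents are passed inline.

```python
import base64

# Inline uploads via the "contents" field are limited in size (assumed 1 MB).
MAX_INLINE_BYTES = 1024 * 1024


def build_dbfs_put_payload(contents, dbfs_path, overwrite=True):
    """Build the JSON body for POST /api/2.0/dbfs/put from raw file bytes."""
    if len(contents) > MAX_INLINE_BYTES:
        raise ValueError("file too large for a single inline dbfs/put call")
    return {
        "path": dbfs_path,  # absolute DBFS path, without the dbfs: scheme
        "contents": base64.b64encode(contents).decode("ascii"),
        "overwrite": overwrite,
    }


if __name__ == "__main__":
    # Run from the fire-x.y.z/dist/ directory; the target path is hypothetical.
    with open("workflowexecutedatabricks.py", "rb") as handle:
        body = build_dbfs_put_payload(
            handle.read(), "/FileStore/fire/workflowexecutedatabricks.py"
        )
    print(sorted(body))
```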
Configure the Uploaded Library in Fire Insights¶
In Fire Insights, configure the path of the uploaded workflowexecutedatabricks.py file under databricks.pythonFile and the path of the uploaded Fire Python wheel package under databricks.pythonPackages.
Each path can come from either of two sources: DBFS or S3.
If you have uploaded the files to a DBFS path.
If you have uploaded the files to an S3 path.
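To illustrate the two cases, the sketch below shows what the two settings might look like and checks which source each path points to. All paths and the bucket name are hypothetical, and the exact path format Fire Insights expects is an assumption here. The sketch also assumes, going by the property names, that databricks.pythonFile holds the .py file and databricks.pythonPackages holds the wheel.

```python
def storage_source(path):
    """Classify a configured path as 'DBFS' or 'S3' based on its URI scheme."""
    lowered = path.lower()
    if lowered.startswith("dbfs:/"):
        return "DBFS"
    if lowered.startswith(("s3://", "s3a://", "s3n://")):
        return "S3"
    raise ValueError(f"neither a DBFS nor an S3 path: {path}")


# Hypothetical values: one file kept in DBFS, the other in S3.
settings = {
    "databricks.pythonFile": "dbfs:/FileStore/fire/workflowexecutedatabricks.py",
    "databricks.pythonPackages": "s3://my-bucket/fire/fire-3.1.0-py3-none-any.whl",
}

for key, value in settings.items():
    print(f"{key} -> {storage_source(value)}")
```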
Job Submission using Pyspark Engine¶
You can now submit PySpark jobs to the Databricks cluster from Fire Insights.
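This guide does not show the exact request Fire Insights sends, but a one-time Python run submitted through the Databricks Jobs API (`POST /api/2.1/jobs/runs/submit`) has roughly the shape below, which shows how the uploaded Python file, the wheel, and the cluster fit together. Treat it as a sketch: the run name, task key, host, token, cluster ID, and paths are all hypothetical, and the parameters Fire Insights passes to workflowexecutedatabricks.py are not known here, so the list is left empty.

```python
import json
import urllib.request


def build_run_submit_request(host, token, cluster_id, python_file, wheel):
    """Build (but do not send) a Jobs API runs/submit request."""
    payload = {
        "run_name": "fire-insights-pyspark-job",  # hypothetical name
        "tasks": [
            {
                "task_key": "fire_workflow",  # hypothetical key
                "existing_cluster_id": cluster_id,
                "spark_python_task": {
                    "python_file": python_file,
                    "parameters": [],  # workflow arguments, unknown here
                },
                "libraries": [{"whl": wheel}],
            }
        ],
    }
    return urllib.request.Request(
        url=host.rstrip("/") + "/api/2.1/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace, token, cluster, and upload locations.
request = build_run_submit_request(
    "https://example.cloud.databricks.com",
    "dapiXXXXXXXX",
    "0123-456789-abcde123",
    "dbfs:/FileStore/fire/workflowexecutedatabricks.py",
    "dbfs:/FileStore/jars/fire-3.1.0-py3-none-any.whl",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit the run. In normal use Fire Insights does this for you once the settings above are configured.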