Bike Rental Prediction¶
This workflow reads in a dataset and then predicts the number of bikes to be rented in any given hour.
Workflow¶
Below is the workflow. It does the following:
- Reads data from a sample dataset.
- Extracts the hour from the time field, using datatype timestamp.
- Casts the Count field to datatype double.
- Assembles features for modelling.
- Indexes categorical features with VectorIndexer.
- Splits the dataset into training and test sets.
- Trains a GBTRegression model.
- Makes predictions.
- Evaluates the predictions with RegressionEvaluator.
- Computes the correlation between columns.
- Runs a summary analysis.
- Calculates the rental count per hour.
- Analyses the result using a graph.
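The per-hour count step in the list above has no section of its own, so here is a minimal plain-Python sketch of that aggregation. The `(hour, count)` row layout and the function name `rentals_per_hour` are illustrative assumptions, not the workflow's actual column names or node implementation.

```python
from collections import defaultdict


def rentals_per_hour(rows):
    """Sum the rental count for each hour of the day.

    `rows` is an iterable of (hour, count) pairs. This layout is an
    assumption made for the sketch, not the workflow's real schema.
    """
    totals = defaultdict(float)
    for hour, count in rows:
        totals[hour] += count
    # Sort by hour so the result is ready to plot as a bar or line chart.
    return dict(sorted(totals.items()))


sample = [(8, 120.0), (17, 310.0), (8, 95.0), (3, 4.0)]
per_hour = rentals_per_hour(sample)
```

Plotting `per_hour` with the hour on the x-axis gives the kind of graph the final step refers to.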
Extract hour from time using datatype timestamp¶
It extracts the hour from the time field, using datatype timestamp, with the DateTimeFieldExtract Node.
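As a rough illustration of what this extraction does to a single value, the sketch below parses a timestamp string and returns its hour. The format string and the helper name `extract_hour` are assumptions; the node's own configuration determines the real format.

```python
from datetime import datetime


def extract_hour(timestamp_text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string and return its hour (0-23).

    The default format is an assumed example, not taken from the dataset.
    """
    return datetime.strptime(timestamp_text, fmt).hour


hour = extract_hour("2011-01-01 17:00:00")
```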
Processor Configuration¶
Processor Output¶
Cast Count to datatype double¶
It casts the Count field to datatype double using the CastColumnType Node.
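In plain Python the equivalent of this cast is a conversion to `float`. The sketch below assumes rows are dictionaries and that the column is named `count`; both are illustrative choices.

```python
def cast_to_double(rows, column):
    """Return copies of the rows with `column` converted to float."""
    return [{**row, column: float(row[column])} for row in rows]


# Hypothetical rows: the count may arrive as an integer or as text.
rows = [{"hour": 8, "count": 16}, {"hour": 9, "count": "40"}]
casted = cast_to_double(rows, "count")
```

Regression algorithms generally expect a floating-point label, which is why the cast comes before the modelling steps.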
Processor Configuration¶
Processor Output¶
Assemble features for modelling¶
It assembles the feature columns into a single feature vector using the VectorAssembler Node.
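Conceptually, assembling means picking the chosen input columns from each row, in a fixed order, and packing them into one numeric vector. A minimal sketch, with hypothetical column names:

```python
def assemble_features(row, input_columns):
    """Pack the selected columns of one row into a list of floats.

    The order of `input_columns` fixes the position of each feature.
    """
    return [float(row[name]) for name in input_columns]


# Column names here are illustrative; the label column is left out.
row = {"hour": 17, "temp": 0.42, "humidity": 0.8, "count": 310.0}
features = assemble_features(row, ["hour", "temp", "humidity"])
```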
Processor Configuration¶
Processor Output¶
Calculate vectorindexer¶
It identifies categorical features and indexes them using the VectorIndexer Node.
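One common way to decide which features are categorical, and the one used by Spark ML's VectorIndexer, is a `maxCategories` threshold: a feature with at most that many distinct values is treated as categorical and its values are replaced by indices. The sketch below assumes the node follows that convention; the ascending ordering of indices is a simplification.

```python
def fit_vector_indexer(vectors, max_categories):
    """Return {feature_position: {raw_value: index}} for categorical features."""
    category_maps = {}
    for position in range(len(vectors[0])):
        distinct = sorted({vector[position] for vector in vectors})
        # Few distinct values: treat the feature as categorical.
        if len(distinct) <= max_categories:
            category_maps[position] = {
                value: index for index, value in enumerate(distinct)
            }
    return category_maps


def transform(vectors, category_maps):
    """Replace categorical values by their index; leave other features as-is."""
    return [
        [
            float(category_maps[pos][value]) if pos in category_maps else value
            for pos, value in enumerate(vector)
        ]
        for vector in vectors
    ]


# Position 0 has three distinct values, position 1 has four.
vectors = [[1.0, 0.24], [3.0, 0.31], [4.0, 0.56], [1.0, 0.62]]
maps = fit_vector_indexer(vectors, max_categories=3)
indexed = transform(vectors, maps)
```

With `max_categories=3`, only the first feature is indexed; the second stays continuous.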
Processor Configuration¶
Processor Output¶
Split the dataset¶
It splits the dataset into separate training and test sets using the Split Node.
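A minimal sketch of a reproducible train/test split follows. The 80/20 ratio and the seed are assumed values, and the shuffle-then-cut approach is one simple option; the Split Node may assign rows differently.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    # A seeded generator makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))
train, test = train_test_split(rows)
```

Keeping the test rows out of training is what makes the later evaluation an honest estimate of performance on unseen data.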
Processor Configuration¶
Processor Output¶
GBTRegression¶
It trains a gradient-boosted tree regression model on the training set using the GBTRegression Node.
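To show the idea behind gradient-boosted tree regression, the sketch below boosts depth-one trees (stumps) on a single feature with squared-error loss: each new stump is fitted to the residuals of the current predictions. This is a teaching sketch, not the node's implementation, and the data, tree count, and learning rate are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold that minimises squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def fit_gbt(xs, ys, n_trees=50, learning_rate=0.1):
    """Boost stumps on the residuals, starting from the mean label."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict_gbt(model, x):
    """Add up the scaled stump outputs on top of the base prediction."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Hypothetical hourly data: hour of day and rental count.
hours = [0, 3, 6, 8, 12, 17, 20, 23]
counts = [20.0, 5.0, 60.0, 300.0, 180.0, 350.0, 150.0, 40.0]
model = fit_gbt(hours, counts)
```

A small learning rate with many trees tends to generalise better than a few large steps, at the cost of longer training.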
Processor Configuration¶
Processor Output¶
Prediction¶
It makes predictions on future data with the trained model using the Prediction Node.
Processor Configuration¶
Processor Output¶
RegressionEvaluator¶
It evaluates the predictions on the held-out test set using the RegressionEvaluator Node, which indicates how much confidence to place in the model.
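Two metrics commonly reported for regression are root mean squared error (RMSE) and mean absolute error (MAE). The source does not say which metric the node is configured with, so the sketch below simply shows how both are computed on hypothetical labels and predictions.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error: penalises large misses more heavily."""
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def mae(labels, predictions):
    """Mean absolute error: the average size of a miss."""
    absolute = [abs(y - p) for y, p in zip(labels, predictions)]
    return sum(absolute) / len(absolute)


# Hypothetical test-set values.
labels = [100.0, 150.0, 200.0]
predictions = [110.0, 140.0, 200.0]
score = rmse(labels, predictions)
average_miss = mae(labels, predictions)
```

Both metrics are in the same unit as the label, here bikes per hour, so lower values mean better predictions.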
Processor Configuration¶
Processor Output¶
Correlation with columns¶
It analyses the correlation between the various columns using the Correlation Node.
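Assuming the correlation in question is the Pearson coefficient (the source does not name the method), the computation for one pair of columns can be sketched as follows. The column values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical columns with a perfectly linear relationship.
temp = [0.2, 0.4, 0.6, 0.8]
count = [50.0, 100.0, 150.0, 200.0]
r = pearson(temp, count)
```

Values near 1 or -1 indicate a strong linear relationship with the label; values near 0 suggest the column carries little linear signal.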
Processor Configuration¶
Processor Output¶
Summary analysis¶
It summarizes the data, to get a sense of whether the features are meaningful, using the Summary Node.
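A summary of this kind typically reports count, mean, standard deviation, minimum, and maximum per column. The exact statistics the Summary Node produces are not listed in the source, so the set below is an assumption.

```python
import statistics


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical column values.
summary = summarize([10.0, 20.0, 30.0, 40.0])
```

A column whose standard deviation is close to zero barely varies and is unlikely to help the model.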