Data Wrangling

Data wrangling is the process of gathering, selecting, and transforming data to answer an analytical question. Also known as data cleaning or “munging”. This workflow reads in a dataset. It then wrangles the dataset based on provided conditions and prints the results.

Workflow

Below is the workflow. It does the following:

  • Reads data from a dataset
  • It then create new DataFrame based on the rules provided
  • Prints the results
data-wrangling

Reading from Dataset

DatasetStructured processor creates a Dataframe of your dataset by reading data from HDFS, HIVE etc. which had been defined earlier in Fire by using the Dataset feature.

Processor Output

data-wrangling

Data Wrangling

DataWrangling processor creates new DataFrame after applying the provided rules

Processor Configuration

data-wrangling

Processor Output

data-wrangling

Prints the Results

It prints the first few records onto the screen.