Profiling-Correlation

This workflow reads a dataset, then runs a correlation analysis on it and computes its summary statistics.

Workflow

The workflow is shown below. It does the following:

  • Reads data from a dataset.
  • Performs correlation analysis on the required columns.
  • Provides summary statistics of the dataset.
[Figure: workflow diagram]

Performing Correlation Analysis

The Correlation processor performs correlation analysis on the selected columns, as shown below:

Processor Configuration

[Figure: Correlation processor configuration]

Processor Output - Correlation Matrix

[Figure: correlation matrix output]

Processor Output - Correlation Matrix Heat Map

[Figure: correlation matrix heat map]

Processor Output - Sample Rows of Input Dataset

[Figure: sample rows of the input dataset]
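
For readers who want to see what a correlation matrix contains, here is a minimal sketch in plain Python. It assumes the Pearson coefficient, which this page does not specify, and it is not the processor's own implementation. The function names and the sample columns (`price`, `units`, `rating`) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {col_a: {col_b: r}} for every pair of the selected columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sample data; the column names are illustrative only.
data = {
    "price": [10.0, 12.0, 14.0, 16.0],
    "units": [40.0, 35.0, 30.0, 25.0],
    "rating": [3.0, 4.0, 2.0, 5.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Each column correlates perfectly with itself, so the diagonal is 1, and the matrix is symmetric. A heat map such as the one above is a colour-coded rendering of these same values.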

Summary Statistics

The Summary processor provides summary statistics of the input dataset.

Summary statistics give useful information about the sample data, e.g. measures of spread.

The processor outputs a table with the number of non-null entries (count), the mean, the standard deviation, and the minimum and maximum values for each numerical column.
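
The statistics listed above can be illustrated with a short sketch in plain Python. It is an approximation under stated assumptions, not the processor's implementation: nulls are represented as `None`, and the standard deviation is the sample (n - 1) form, which this page does not specify. The `summarize` function and the sample columns are hypothetical.

```python
import statistics


def summarize(column):
    """Count, mean, sample standard deviation, min, and max of one column.

    None entries are treated as nulls and skipped.
    """
    values = [v for v in column if v is not None]
    return {
        "count": len(values),                # number of non-null entries
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical sample data; "price" contains one null entry.
data = {
    "price": [10.0, 12.0, None, 14.0, 16.0],
    "units": [40.0, 35.0, 30.0, 25.0, 20.0],
}

summary = {name: summarize(col) for name, col in data.items()}
for name, stats in summary.items():
    print(name, stats)
```

Because the null is skipped, `price` reports a count of 4 while `units` reports 5, which is why the count column is useful for spotting missing values.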

Processor Configuration

[Figure: Summary processor configuration]

Processor Output - Summary Statistics

[Figure: summary statistics output]

Processor Output - Sample Rows of Input Dataset

[Figure: sample rows of the input dataset]