GaussianMixture

This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions with associated mixing weights specifying each’s contribution to the composite.

Input

It takes in a DataFrame as input and performs GaussianMixture clustering

Output

The input DataFrame is passed along to the next Processors

Type

ml-estimator

Class

fire.nodes.ml.NodeGaussianMixture

Fields

Name Title Description
featuresCol Features Column Features column of type vectorUDT for model fitting.
k K The number of clusters to create.
maxIter Max Iterations The maximum number of iterations.
predictionCol Prediction Column The prediction column created during model scoring.
seed Seed Random Seed.
tol Tolerence The convergence tolerance for iterative algorithms.

Details

GaussianMixture clustering will maximize the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol, or until it has reached the max number of iterations. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.

More at Spark MLlib/ML docs page : https://spark.apache.org/docs/2.2.0/mllib-clustering.html#gaussian-mixture