GaussianMixture
This class fits multivariate Gaussian Mixture Models (GMMs) using expectation maximization (EM). A GMM represents a composite distribution of independent Gaussian distributions, with associated mixing weights specifying each component's contribution to the composite.
Input
It takes a DataFrame as input and performs GaussianMixture clustering on it.
Output
The input DataFrame is passed along to the next Processors.
Type
ml-estimator
Class
fire.nodes.ml.NodeGaussianMixture
Fields
| Name | Title | Description |
|---|---|---|
| featuresCol | Features Column | Features column of type VectorUDT used for model fitting. |
| k | K | The number of clusters to create. |
| maxIter | Max Iterations | The maximum number of iterations. |
| predictionCol | Prediction Column | The prediction column created during model scoring. |
| seed | Seed | Random seed. |
| tol | Tolerance | The convergence tolerance for iterative algorithms. |
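
The fields above share their names with parameters of Spark ML's `pyspark.ml.clustering.GaussianMixture` estimator. The following is a minimal sketch of how they could be passed to that estimator directly. It assumes the node forwards the fields unchanged, which this page does not confirm; the values in `NODE_FIELDS` and the helper `fit_gaussian_mixture` are illustrative and not part of the node.

```python
# Node fields from the table above, with illustrative (assumed) values.
NODE_FIELDS = {
    "featuresCol": "features",      # column of type VectorUDT
    "k": 3,                         # number of clusters
    "maxIter": 100,                 # maximum number of EM iterations
    "predictionCol": "prediction",  # column created during scoring
    "seed": 42,                     # random seed
    "tol": 0.01,                    # convergence tolerance
}


def fit_gaussian_mixture(df, fields=None):
    """Fit a GMM on `df` and return (model, scored DataFrame).

    Hypothetical helper: `df` is assumed to be a Spark DataFrame that
    already contains the vector column named by `featuresCol`.
    """
    # Imported lazily so the field mapping can be inspected without Spark.
    from pyspark.ml.clustering import GaussianMixture

    params = dict(NODE_FIELDS if fields is None else fields)
    estimator = GaussianMixture(**params)
    model = estimator.fit(df)
    # transform() appends the prediction column to the input rows.
    return model, model.transform(df)
```

With a DataFrame `df` whose `features` column was produced by, for example, a `VectorAssembler`, `fit_gaussian_mixture(df)` would return the fitted model together with the scored rows.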
Details
GaussianMixture clustering maximizes the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than the convergence tolerance (tol) or until the maximum number of iterations (maxIter) is reached. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum, so different seeds can yield different results.
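The stopping rule described above can be illustrated with a small, self-contained EM loop. This is a sketch under simplifying assumptions, not Spark's implementation: it handles one-dimensional data rather than multivariate vectors, and it initialises the means deterministically from data quantiles instead of using a random seed. The function name `fit_gmm_1d` is hypothetical.

```python
import math


def fit_gmm_1d(data, k, max_iter=100, tol=0.01):
    """EM for a 1-D Gaussian mixture.

    Returns (weights, means, variances, n_iter).
    """
    n = len(data)
    ordered = sorted(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Initialisation: equal weights, means at evenly spaced quantiles,
    # and the overall variance for every component.
    weights = [1.0 / k] * k
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [max(overall_var, 1e-6)] * k

    def pdf(x, mu, var):
        return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

    prev_ll = None
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # E-step: responsibilities of each component for each point,
        # accumulating the log-likelihood under the current parameters.
        resp = []
        ll = 0.0
        for x in data:
            p = [w * pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            ll += math.log(total)
            resp.append([pj / total for pj in p])

        # Stop once the log-likelihood changes by less than tol.
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # keep variances strictly positive

    return weights, means, variances, n_iter
```

The loop ends either when the change in log-likelihood falls below `tol` or when `max_iter` iterations have run, mirroring the roles of the tol and maxIter fields. Because EM only climbs to a local optimum, the result depends on the initialisation.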
More information is available on the Spark MLlib clustering documentation page: https://spark.apache.org/docs/2.2.0/mllib-clustering.html#gaussian-mixture