LDA
LDA takes a collection of documents as input, specified via the featuresCol parameter. Each document is represented as a Vector of length vocabSize, where each entry is the count of the corresponding term (word) in the document.
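
The term-count representation described above can be illustrated with a minimal plain-Python sketch. The helper `to_count_vector`, the vocabulary, and the sample document are all hypothetical and exist only for illustration; they are not part of the node.

```python
from collections import Counter


def to_count_vector(tokens, vocabulary):
    """Return a dense term-count vector of length len(vocabulary)."""
    counts = Counter(tokens)
    # Terms outside the vocabulary are ignored; vocabulary terms that do
    # not occur in the document get a count of 0.
    return [float(counts[term]) for term in vocabulary]


# Hypothetical vocabulary (vocabSize = 4) and tokenized document.
vocabulary = ["spark", "topic", "model", "data"]
document = ["spark", "data", "spark", "model"]

vector = to_count_vector(document, vocabulary)
print(vector)  # [2.0, 0.0, 1.0, 1.0]
```

In a Spark pipeline, a column of this shape is typically produced by a feature stage such as CountVectorizer rather than by hand.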
Input
Takes a DataFrame as input and fits an LDA model on it.
Output
The fitted LDA model is passed to the next node for prediction or storage.
Type
ml-estimator
Class
fire.nodes.ml.NodeLDA
Fields
| Name | Title | Description |
|---|---|---|
| featuresCol | Features Column | Features column of type VectorUDT used for model fitting. |
| k | K | The number of topics to create. |
| maxIter | Max Iterations | The maximum number of iterations. |
| optimizer | Optimizer | Optimizer or inference algorithm used to estimate the LDA model. |
| topicDistributionCol | TopicDistributionColumn | Output column with estimates of the topic mixture distribution for each document. |
| checkpointInterval | checkpointInterval | The checkpoint interval (>= 1), or -1 to disable checkpointing. For example, 10 means that the cache is checkpointed every 10 iterations. |
| subsamplingRate | subsamplingRate | Fraction of the corpus to be sampled and used in each iteration of mini-batch gradient descent, in range (0, 1]. |
| seed | Seed | Random seed. |
| maxTermsPerTopic | MaxTermsPerTopic | Maximum number of terms to include for each topic. |
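
The value ranges stated in the table can be checked before a workflow is run. The following is a minimal sketch, not the node's own validation logic: the function `validate_lda_params` and the sample values are hypothetical, and the checks on `k`, `maxIter`, and `optimizer` are assumptions based on Spark ML's LDA rather than on this page.

```python
def validate_lda_params(params):
    """Return a list of problems found in a dict of LDA field values."""
    problems = []

    # Assumption: Spark ML's LDA requires more than one topic.
    k = params.get("k")
    if not isinstance(k, int) or k < 2:
        problems.append("k must be an integer greater than 1")

    # Assumption: the iteration count must be a non-negative integer.
    max_iter = params.get("maxIter")
    if not isinstance(max_iter, int) or max_iter < 0:
        problems.append("maxIter must be a non-negative integer")

    # Assumption: Spark ML's LDA accepts the optimizers "online" and "em".
    if params.get("optimizer") not in ("online", "em"):
        problems.append("optimizer must be 'online' or 'em'")

    # From the table: checkpoint interval is >= 1, or -1 to disable.
    interval = params.get("checkpointInterval")
    if not isinstance(interval, int) or not (interval == -1 or interval >= 1):
        problems.append("checkpointInterval must be >= 1 or -1")

    # From the table: subsampling rate lies in the range (0, 1].
    rate = params.get("subsamplingRate")
    if not isinstance(rate, (int, float)) or not (0 < rate <= 1):
        problems.append("subsamplingRate must be in (0, 1]")

    return problems


# Hypothetical field values, for illustration only.
params = {
    "featuresCol": "features",
    "k": 10,
    "maxIter": 20,
    "optimizer": "online",
    "topicDistributionCol": "topicDistribution",
    "checkpointInterval": 10,
    "subsamplingRate": 0.05,
    "seed": 42,
    "maxTermsPerTopic": 10,
}

print(validate_lda_params(params))  # []
```

Returning a list of problems rather than raising on the first one lets all invalid fields be reported together.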