LDA

LDA (Latent Dirichlet Allocation) takes a collection of documents as input, specified by the featuresCol parameter. Each document is a Vector of length vocabSize, where each entry is the count of the corresponding term (word) in the document.
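To make the input format concrete, the following is a minimal sketch in plain Python of how tokenized documents map to term-count vectors of length vocabSize. The sample documents and the helper name to_count_vector are hypothetical and exist only for illustration; they are not part of the node.

```python
from collections import Counter

# Hypothetical tokenized documents; real input would come from an upstream tokenizer.
docs = [
    ["spark", "lda", "topic", "spark"],
    ["topic", "model", "topic"],
]

# Fix a vocabulary order so that each term maps to exactly one vector index.
vocab = sorted({term for doc in docs for term in doc})
index = {term: i for i, term in enumerate(vocab)}


def to_count_vector(tokens):
    """Return a list of length vocabSize holding the count of each term."""
    vec = [0.0] * len(vocab)
    for term, n in Counter(tokens).items():
        vec[index[term]] = float(n)
    return vec


vectors = [to_count_vector(doc) for doc in docs]
print(vocab)    # ['lda', 'model', 'spark', 'topic']
print(vectors)  # [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 2.0]]
```

In a Spark pipeline this step is typically handled by a feature stage such as CountVectorizer, which produces the vector column that featuresCol points to.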

Input

Takes a DataFrame as input and fits an LDA model on it.

Output

The fitted LDA model is passed to the next node for prediction or storage.

Type

ml-estimator

Class

fire.nodes.ml.NodeLDA

Fields

Name                  Title                    Description
featuresCol           Features Column          Features column of type VectorUDT used for model fitting.
k                     K                        The number of topics to create. Must be greater than 1.
maxIter               Max Iterations           The maximum number of iterations.
optimizer             Optimizer                Optimizer or inference algorithm used to estimate the LDA model. Spark ML supports "online" and "em".
topicDistributionCol  TopicDistributionColumn  Output column with estimates of the topic mixture distribution for each document.
checkpointInterval    checkpointInterval       The checkpoint interval (>= 1), or -1 to disable checkpointing. For example, 10 means the cache is checkpointed every 10 iterations.
subsamplingRate       subsamplingRate          Fraction of the corpus to be sampled and used in each iteration of mini-batch gradient descent, in the range (0, 1]. Applies to the online optimizer only.
seed                  Seed                     Random seed.
maxTermsPerTopic      MaxTermsPerTopic         Maximum number of terms reported for each topic.