Word2Vec

Transforms sequences of words into fixed-size numeric vectors for further processing by NLP or machine learning algorithms.

Input

Takes a DataFrame as input and transforms it into another DataFrame.

Output

A new column containing the feature vector is added to the incoming DataFrame.

Type

ml-transformer

Class

fire.nodes.ml.NodeWord2Vec

Fields

Name                  Title              Description
inputCol              Input Column       Column containing sequences of words
inputColStringArrCol  Text Array Column  The text array column which is produced
outputCol             Output Column      Output column name
vectorSize            Vector Size        Dimension of each word vector
minCount              Min Count          Minimum number of times a word must appear to be included in the vocabulary

Details

Word2Vec is an Estimator which takes sequences of words representing documents and trains a Word2VecModel. The model maps each word to a unique fixed-size vector. The Word2VecModel transforms each document into a vector by averaging the vectors of all words in the document; this vector can then be used as features for prediction, document similarity calculations, etc.
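
The averaging step described above can be illustrated with a small, self-contained Python sketch. It is not the Spark implementation: the embedding table WORD_VECTORS, the function name document_vector, and the choice to skip unknown words and divide by the number of known words are all assumptions made for illustration. A trained Word2VecModel would learn the word vectors from the input data.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embedding table with vectorSize = 3.
# A trained Word2VecModel would learn these values; they are made up here.
WORD_VECTORS: Dict[str, List[float]] = {
    "spark": [1.0, 0.0, 2.0],
    "is": [0.0, 1.0, 0.0],
    "fast": [2.0, 2.0, 1.0],
}


def document_vector(
    words: Sequence[str],
    vectors: Dict[str, List[float]],
    vector_size: int,
) -> List[float]:
    """Average the vectors of the known words in a document.

    Assumption for this sketch: words missing from the table are skipped,
    and a document with no known words maps to the zero vector.
    """
    total = [0.0] * vector_size
    known = 0
    for word in words:
        vec = vectors.get(word)
        if vec is None:
            continue  # word not in the vocabulary (e.g. below minCount)
        known += 1
        for i, value in enumerate(vec):
            total[i] += value
    if known == 0:
        return total
    return [value / known for value in total]


print(document_vector(["spark", "is", "fast"], WORD_VECTORS, 3))
# prints [1.0, 1.0, 1.0]
```

Each document, whatever its length, ends up as a single vector of length vectorSize, which is what lets the output column feed directly into downstream classifiers or similarity calculations.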

More at the Spark MLlib/ML docs page: http://spark.apache.org/docs/latest/ml-features.html#word2vec