Word2Vec
Transforms sequences of words into fixed-size numeric vectors for further processing by NLP or machine learning algorithms.
Input
Takes a DataFrame as input and transforms it into another DataFrame.
Output
A new column containing the feature vector is added to the incoming DataFrame.
Type
ml-transformer
Class
fire.nodes.ml.NodeWord2Vec
Fields
| Name | Title | Description |
|---|---|---|
| inputCol | Input Column | Contains sequences of words |
| inputColStringArrCol | Text Array Column | The text array column which is produced |
| outputCol | Output Column | Output column name |
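| vectorSize | Vector Size | Dimension of the vector learned for each word |
| minCount | Min Count | Minimum number of times a word must appear to be included in the vocabulary |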
Details
Word2Vec is an Estimator that takes sequences of words representing documents and trains a Word2VecModel. The model maps each word to a unique fixed-size vector. The Word2VecModel transforms each document into a vector by averaging the vectors of all words in the document; this vector can then be used as features for prediction, document similarity calculations, etc.
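The averaging step described above can be illustrated with a small pure-Python sketch. It does not use Spark and is not the node's implementation: the lookup table `WORD_VECTORS`, the constant `VECTOR_SIZE`, and the function `document_vector` are hypothetical names made up for illustration, and the vectors are toy values rather than trained ones. The sketch also assumes that words missing from the vocabulary contribute nothing to the sum while still counting toward the document length; check the Spark source for the exact behavior.

```python
from typing import Dict, List

# Hypothetical toy word vectors (vectorSize = 3).
# A trained Word2VecModel would learn these values from the input documents.
WORD_VECTORS: Dict[str, List[float]] = {
    "spark": [1.0, 0.0, 2.0],
    "is": [0.0, 1.0, 0.0],
    "fast": [2.0, 2.0, 1.0],
}
VECTOR_SIZE = 3


def document_vector(
    words: List[str],
    word_vectors: Dict[str, List[float]] = WORD_VECTORS,
    size: int = VECTOR_SIZE,
) -> List[float]:
    """Average the vectors of the words in one document.

    Assumption of this sketch: unknown words add nothing to the sum,
    but the sum is still divided by the total number of words.
    """
    if not words:
        # An empty document maps to the zero vector.
        return [0.0] * size
    total = [0.0] * size
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # word not in the vocabulary
        for i, value in enumerate(vec):
            total[i] += value
    return [component / len(words) for component in total]


doc = ["spark", "is", "fast"]
print(document_vector(doc))  # [1.0, 1.0, 1.0]
```

In the node itself, the document would be one row of the input column (a sequence of words) and the resulting vector would be written to the output column.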
More in the Spark ML documentation: http://spark.apache.org/docs/latest/ml-features.html#word2vec