NGramTransformer¶
Converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array length is less than n (the number of elements per n-gram), no n-grams are returned.
Input¶
It takes a DataFrame as input and transforms it into another DataFrame.
Output¶
It adds a new column to the incoming DataFrame. The column holds a sequence of n-grams, where each n-gram is represented by a space-delimited string of n consecutive words.
Type¶
ml-transformer
Class¶
fire.nodes.ml.NodeNGramTransformer
Fields¶
| Name | Title | Description |
|---|---|---|
| inputCol | Input Column | Column containing a sequence of strings |
| inputColStringArrCol | List of Words | Sequence of words |
| outputCol | Output Column | Consists of a sequence of n-grams where each n-gram is represented by a space-delimited string of n consecutive words |
| numberOfGrams | Number of Grams | Integer n, the number of consecutive words in each n-gram |
Details¶
This node converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array length is less than n (the number of elements per n-gram), no n-grams are returned.
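The behavior described above can be illustrated with a small plain-Python sketch. It is not the node's implementation and does not use Spark; the function name `ngrams` and its parameters are hypothetical, chosen only to mirror the rules stated here (nulls ignored, empty input gives an empty result, inputs shorter than n give no n-grams).

```python
from typing import List, Optional, Sequence


def ngrams(words: Optional[Sequence[Optional[str]]], n: int) -> List[str]:
    """Illustrative sketch of the described n-gram rules (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Null (None) entries are dropped before any window is formed.
    tokens = [w for w in (words or []) if w is not None]
    # range() is empty when len(tokens) < n, so short or empty inputs yield [].
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams(["Hi", "I", "heard", "about", "Spark"], 2))
# ['Hi I', 'I heard', 'heard about', 'about Spark']
```

In this sketch, `n` plays the role of the Number of Grams field, and each returned string corresponds to one element of the output column's array.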
More at the Spark MLlib/ML docs page: http://spark.apache.org/docs/latest/ml-features.html#n-gram