RegexTokenizer¶
This node creates a new DataFrame by the process of taking text (such as a sentence) and breaking it into individual terms (usually words) based on regular express
Type¶
transform
Class¶
fire.nodes.etl.NodeRegexTokenizer
Fields¶
| Name | Title | Description |
|---|---|---|
| inputCol | Column | input column for tokenizing |
| outputCol | Tokenized Column | New output column after tokenization |
| pattern | Pattern | The regex pattern used to match delimiters |
| gaps | Gaps | Indicates whether the regex splits on gaps |