Text Processors API
Tokenizers
class Tokenizer
Tokenizer is the base class of all tokenizers.
detokenize(x)
- Parameters
  x (Union[List[str], List[Tuple[str, str]]]) – The result of Tokenizer.tokenize(): either a list of tokens or a list of (token, pos) tuples.
- Returns
  A sentence.
- Return type
  str
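Because the input may be either plain tokens or (token, pos) tuples, a detokenizer has to normalize it before rebuilding the sentence. The following is a minimal sketch of that step, not the library's implementation: `join_tokens` is a hypothetical helper, and joining with single spaces is an assumption.

```python
from typing import List, Tuple, Union

TokenList = Union[List[str], List[Tuple[str, str]]]


def join_tokens(x: TokenList) -> str:
    """Hypothetical helper: rebuild a sentence from tokens or (token, pos) tuples."""
    # Drop the POS tag when an item is a (token, pos) tuple.
    words = [item[0] if isinstance(item, tuple) else item for item in x]
    # Assumption: tokens are joined with single spaces; a real tokenizer
    # may restore the original spacing around punctuation.
    return " ".join(words)


plain = join_tokens(["hello", "world"])
tagged = join_tokens([("hello", "other"), ("world", "noun")])
print(plain == tagged)
```

Both calls yield the same sentence, since the tags carry no information needed for reconstruction in this simplified version.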
tokenize(x, pos_tagging=True)
- Parameters
  x (str) – A sentence.
  pos_tagging (bool) – Whether to return POS tagging results.
- Returns
  A list of tokens if pos_tagging is False; a list of (token, pos) tuples if pos_tagging is True.
- Return type
  Union[List[str], List[Tuple[str, str]]]
Each POS tag must be one of the following:
["noun", "verb", "adj", "adv", "other"]
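To illustrate the contract described above, here is a toy class that mirrors the two documented methods. It is a sketch under stated assumptions: `WhitespaceTokenizer` and its `LEXICON` are hypothetical, the class stands alone rather than subclassing the real `Tokenizer` (whose internal hooks are not documented here), and whitespace splitting with a lookup table stands in for a real tokenizer and tagger.

```python
from typing import Dict, List, Tuple, Union

# The five tags permitted by the documentation above.
POS_TAGS = ["noun", "verb", "adj", "adv", "other"]


class WhitespaceTokenizer:
    """Hypothetical stand-in that mirrors the documented Tokenizer interface."""

    # Toy lexicon (assumption, for illustration only); unknown words map to "other".
    LEXICON: Dict[str, str] = {
        "cat": "noun",
        "sat": "verb",
        "small": "adj",
        "quietly": "adv",
    }

    def tokenize(
        self, x: str, pos_tagging: bool = True
    ) -> Union[List[str], List[Tuple[str, str]]]:
        tokens = x.split()
        if not pos_tagging:
            return tokens
        # Pair each token with a tag drawn from POS_TAGS.
        return [(tok, self.LEXICON.get(tok.lower(), "other")) for tok in tokens]

    def detokenize(self, x: Union[List[str], List[Tuple[str, str]]]) -> str:
        # Accept either output form of tokenize().
        return " ".join(item[0] if isinstance(item, tuple) else item for item in x)


tokenizer = WhitespaceTokenizer()
sentence = "the small cat sat quietly"
tagged = tokenizer.tokenize(sentence)
untagged = tokenizer.tokenize(sentence, pos_tagging=False)
print(tagged)
print(untagged)
print(tokenizer.detokenize(tagged))
```

For whitespace-normalized input, `detokenize(tokenize(x))` returns `x` in this sketch; a production tokenizer that splits punctuation would need extra logic to preserve that round trip.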