Text Processors API

Tokenizers

class Tokenizer

Tokenizer is the base class of all tokenizers.

detokenize(x)
Parameters

x (Union[List[str], List[Tuple[str, str]]]) – The result of Tokenizer.tokenize(): either a list of tokens or a list of (token, pos) tuples.

Returns

A sentence.

Return type

str

tokenize(x, pos_tagging=True)
Parameters
  • x (str) – A sentence.

  • pos_tagging (bool) – Whether to return POS tagging results.

Returns

A list of tokens if pos_tagging is False.

A list of (token, pos) tuples if pos_tagging is True.

Return type

Union[List[str], List[Tuple[str, str]]]

Each POS tag must be one of the following: ["noun", "verb", "adj", "adv", "other"]
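
The tokenize/detokenize contract described above can be sketched as follows. ToyTokenizer is a hypothetical stand-in that does not derive from OpenAttack's Tokenizer; it only mirrors the documented signatures, splits on whitespace, and tags every token as "other".

```python
from typing import List, Tuple, Union

# The five POS tags allowed by the documentation above.
ALLOWED_POS = ("noun", "verb", "adj", "adv", "other")

Tokens = Union[List[str], List[Tuple[str, str]]]


class ToyTokenizer:
    """Hypothetical class mirroring the documented Tokenizer interface."""

    def tokenize(self, x: str, pos_tagging: bool = True) -> Tokens:
        words = x.split()
        if not pos_tagging:
            return words
        # A real tokenizer would run a POS tagger here;
        # this sketch simply labels every token "other".
        return [(word, "other") for word in words]

    def detokenize(self, x: Tokens) -> str:
        # Accept both plain tokens and (token, pos) tuples.
        words = [item if isinstance(item, str) else item[0] for item in x]
        return " ".join(words)


tokenizer = ToyTokenizer()
tagged = tokenizer.tokenize("the cat sat")
plain = tokenizer.tokenize("the cat sat", pos_tagging=False)
restored = tokenizer.detokenize(tagged)
```

Because detokenize accepts either form, the output of tokenize can be passed back unchanged regardless of the pos_tagging setting.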

JiebaTokenizer

class JiebaTokenizer(OpenAttack.text_process.tokenizer.Tokenizer)

Tokenizer based on jieba.posseg

Package Requirements
  • jieba

Language

Chinese

PunctTokenizer

class PunctTokenizer(OpenAttack.text_process.tokenizer.Tokenizer)

Tokenizer based on nltk.word_tokenizer.

Language

English
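
English taggers such as NLTK's default tagger emit Penn Treebank tags, which must be collapsed to the five coarse tags that tokenize() is allowed to return. The sketch below shows one way to do that. The helper coarse_pos is hypothetical, and whether PunctTokenizer uses this exact mapping internally is an assumption, not something this page states.

```python
def coarse_pos(penn_tag: str) -> str:
    """Map a Penn Treebank tag to one of the five documented POS tags."""
    if penn_tag.startswith("NN"):   # NN, NNS, NNP, NNPS
        return "noun"
    if penn_tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return "verb"
    if penn_tag.startswith("JJ"):   # JJ, JJR, JJS
        return "adj"
    if penn_tag.startswith("RB"):   # RB, RBR, RBS
        return "adv"
    return "other"


# Hand-written (token, Penn tag) pairs used purely as sample input.
sample = [("dogs", "NNS"), ("bark", "VBP"), ("very", "RB"), ("loud", "JJ"), (".", ".")]
coarse = [(token, coarse_pos(tag)) for token, tag in sample]
```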

TransformersTokenizer

class TransformersTokenizer(OpenAttack.text_process.tokenizer.Tokenizer)

Pretrained Tokenizer from transformers.

Usually returned by TransformersClassifier.

Lemmatizer

class Lemmatizer

Base class of all lemmatizers.

delemmatize(lemma, pos)
Parameters
  • lemma (str) – A lemma of some token.

  • pos (str) – POS tag of input lemma.

Returns

The original token.

Return type

str

lemmatize(token, pos)
Parameters
  • token (str) – A token.

  • pos (str) – POS tag of input token.

Returns

Lemma of this token.

Return type

str
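
The relationship between lemmatize and delemmatize can be illustrated with a minimal sketch. ToyLemmatizer is hypothetical and backed by a tiny hand-written table; it does not reflect how any OpenAttack lemmatizer stores its data.

```python
from typing import Dict, Tuple


class ToyLemmatizer:
    """Hypothetical class mirroring the documented Lemmatizer interface."""

    def __init__(self) -> None:
        # (token, pos) -> lemma; illustrative entries only.
        self._lemmas: Dict[Tuple[str, str], str] = {
            ("running", "verb"): "run",
            ("mice", "noun"): "mouse",
        }
        # Reverse table: (lemma, pos) -> token.
        self._tokens: Dict[Tuple[str, str], str] = {
            (lemma, pos): token for (token, pos), lemma in self._lemmas.items()
        }

    def lemmatize(self, token: str, pos: str) -> str:
        # Unknown tokens are returned unchanged.
        return self._lemmas.get((token, pos), token)

    def delemmatize(self, lemma: str, pos: str) -> str:
        # Unknown lemmas are returned unchanged.
        return self._tokens.get((lemma, pos), lemma)


lemmatizer = ToyLemmatizer()
lemma = lemmatizer.lemmatize("running", "verb")
token = lemmatizer.delemmatize(lemma, "verb")
```

Both methods take the POS tag as a second argument because the same surface form can map to different lemmas depending on its part of speech.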

WordnetLemmatimer

class WordnetLemmatimer(OpenAttack.text_process.lemmatizer.Lemmatizer)

Lemmatizer based on nltk.wordnet

Language

English

ConstituencyParser

class ConstituencyParser

Base class of all constituency parsers.

__call__(sentence)
Parameters

sentence (str) – A sentence.

Returns

The constituency parse of the sentence.

Return type

str
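
Since a parser is invoked by calling the instance and returns a string, the interface can be sketched as below. ToyConstituencyParser is hypothetical, and the bracketed output format is an assumption; this page only states that the return type is str.

```python
class ToyConstituencyParser:
    """Hypothetical class mirroring the documented __call__ interface."""

    def __call__(self, sentence: str) -> str:
        # Wrap every whitespace-separated word in a placeholder "X" node
        # under a single "S" root; a real parser would assign real labels.
        leaves = " ".join(f"(X {word})" for word in sentence.split())
        return f"(S {leaves})"


parser = ToyConstituencyParser()
tree = parser("the cat sat")
```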

StanfordParser

class StanfordParser(OpenAttack.text_process.constituency_parser.ConstituencyParser)

Constituency parser based on the Stanford Parser.

Requirements
  • Java