Metric API

Attacker Metrics

class AttackMetric[source]

Base class of all metrics.

BLEU

class BLEU(tokenizer)[source]
__init__(tokenizer)[source]
Parameters

tokenizer (OpenAttack.text_process.tokenizer.base.Tokenizer) – A tokenizer that will be used in this metric. Must be an instance of Tokenizer

Language

english

Return type

None

calc_score(tokenA, tokenB)[source]
Parameters
  • tokenA (List[str]) – The first list of tokens.

  • tokenB (List[str]) – The second list of tokens.

Return type

float

Returns

The BLEU score.

Make sure the two token lists have the same length.
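To illustrate what a BLEU score measures, here is a minimal standalone sketch of smoothed sentence-level BLEU between two token lists. This is not OpenAttack's implementation (which delegates to the supplied Tokenizer); the add-one smoothing is an assumption made so short sentences do not score zero.

```python
from collections import Counter
import math

def sentence_bleu(candidate, reference, max_n=4):
    """Smoothed sentence-level BLEU between two token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # clipped n-gram overlap: each reference n-gram can be matched at most
        # as many times as it occurs in the reference
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        # add-one smoothing so one missing n-gram order does not zero the score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty discourages candidates shorter than the reference
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Identical token lists score 1.0; the score falls as n-gram overlap decreases.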

GPT2LM

class GPT2LM[source]
__init__()[source]

Language Models are Unsupervised Multitask Learners. [pdf] [code]

Language

english
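A GPT-2 language metric typically scores fluency as perplexity: the exponential of the negative mean log-probability the model assigns to each token. The sketch below shows only the perplexity formula on made-up per-token probabilities; it is not OpenAttack's implementation, which loads an actual GPT-2 model.

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token probabilities assigned by a language model.
    Lower perplexity means the model found the text more fluent."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

# made-up probabilities: a fluent sentence gets higher per-token probabilities
fluent = [0.4, 0.3, 0.5, 0.35]
disfluent = [0.01, 0.05, 0.02, 0.03]
```

For example, a sentence whose tokens each receive probability 0.5 has perplexity exactly 2.0.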

GPT2LMChinese

class GPT2LMChinese[source]
__init__()[source]

Language Models are Unsupervised Multitask Learners. [pdf] [code]

Package Requirements
  • tensorflow>=2

Language

chinese

JaccardChar

class JaccardChar[source]
__init__()

Initialize self. See help(type(self)) for accurate signature.

calc_score(senA, senB)[source]
Parameters
  • senA (str) – First sentence.

  • senB (str) – Second sentence.

Returns

Jaccard char similarity of two sentences.

Return type

float

JaccardWord

class JaccardWord(tokenizer)[source]
__init__(tokenizer)[source]
Parameters

tokenizer (OpenAttack.text_process.tokenizer.base.Tokenizer) – A tokenizer that will be used in this metric. Must be an instance of Tokenizer

calc_score(sentA, sentB)[source]
Parameters
  • sentA (str) – First sentence.

  • sentB (str) – Second sentence.

Returns

Jaccard word similarity of two sentences.

Return type

float
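Both Jaccard metrics measure set overlap: JaccardChar over the characters of the raw sentences, JaccardWord over the tokens produced by the tokenizer. A minimal sketch of the standard Jaccard computation (OpenAttack's exact normalization may differ):

```python
def jaccard_char(sen_a: str, sen_b: str) -> float:
    """Jaccard similarity over the sets of characters in two sentences."""
    a, b = set(sen_a), set(sen_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def jaccard_word(tokens_a, tokens_b) -> float:
    """Jaccard similarity over the sets of tokens (after tokenization)."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```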

LanguageTool

class LanguageTool[source]
__init__()[source]

Use language_tool_python to check grammar.

Package Requirements
  • language_tool_python

Language

english

Return type

None

LanguageToolChinese

class LanguageToolChinese[source]
__init__()[source]

Use language_tool_python to check grammar.

Package Requirements
  • language_tool_python

Language

chinese

Return type

None

Levenshtein

class Levenshtein(tokenizer)[source]
__init__(tokenizer)[source]
Parameters

tokenizer (OpenAttack.text_process.tokenizer.base.Tokenizer) – A tokenizer that will be used in this metric. Must be an instance of Tokenizer

Return type

None

calc_score(a, b)[source]
Parameters
  • a (List[str]) – The first list.

  • b (List[str]) – The second list.

Returns

Levenshtein edit distance between two sentences.

Return type

int

Both parameters can be either str or list: a str gives char-level edit distance, while a list gives token-level edit distance.
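Because Python strings and lists both support indexing, a single dynamic-programming routine handles the char-level and token-level cases. A minimal sketch (not OpenAttack's implementation):

```python
def levenshtein(a, b) -> int:
    """Edit distance via dynamic programming; works for strings (char-level)
    or token lists (token-level) alike."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution (free on match)
        prev = curr
    return prev[-1]
```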

Modification

class Modification(tokenizer)[source]
__init__(tokenizer)[source]
Parameters

tokenizer (OpenAttack.text_process.tokenizer.base.Tokenizer) – A tokenizer that will be used in this metric. Must be an instance of Tokenizer

calc_score(tokenA, tokenB)[source]
Parameters
  • tokenA (List[str]) – The first list of tokens.

  • tokenB (List[str]) – The second list of tokens.

Returns

Modification rate.

Return type

float

Make sure the two token lists have the same length.
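One common way to define the modification rate over two equal-length token lists is the fraction of positions where the tokens differ. This position-wise definition is an assumption for illustration; OpenAttack may compute the rate differently (e.g. via edit operations):

```python
def modification_rate(tokens_a, tokens_b) -> float:
    """Fraction of positions where two equal-length token lists differ."""
    assert len(tokens_a) == len(tokens_b), "token lists must have the same length"
    if not tokens_a:
        return 0.0
    return sum(x != y for x, y in zip(tokens_a, tokens_b)) / len(tokens_a)
```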

SentenceSim

class SentenceSim[source]
__init__()[source]
Package Requirements
  • sentence_transformers

Language

english

calc_score(sen1, sen2)[source]
Parameters
  • sen1 (str) – The first sentence.

  • sen2 (str) – The second sentence.

Returns

Sentence similarity.

Return type

float

UniversalSentenceEncoder

class UniversalSentenceEncoder[source]
__init__()[source]

Universal Sentence Encoder in tensorflow_hub. [pdf] [page]

Data Requirements

AttackAssist.UniversalSentenceEncoder

Package Requirements
  • tensorflow >= 2.0.0

  • tensorflow_hub

Language

english

calc_score(sentA, sentB)[source]
Parameters
  • sentA (str) – The first sentence.

  • sentB (str) – The second sentence.

Returns

Cosine distance between two sentences.

Return type

float
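The score is based on the cosine between the two sentences' Universal Sentence Encoder embeddings. Obtaining the embeddings requires tensorflow_hub; the cosine computation itself, shown here on placeholder vectors, is just:

```python
import math

def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)
```

Identical directions give 1.0, orthogonal vectors give 0.0; cosine distance is typically 1 minus this value.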

Metrics Selector

class MetricSelector[source]

Base class of all metric selectors.

MetricSelector is a helper class that lets OpenAttack select an appropriate AttackMetric for a given language.
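The selector pattern amounts to mapping a language to a concrete metric configuration, as the tables below show for each selector. A hypothetical sketch of that pattern; the class, method, and key names here are illustrative, not OpenAttack's actual internals:

```python
# Illustrative selector: maps a language name to the metric/tokenizer pairing
# listed in the documentation for EditDistance.
class EditDistanceSelector:
    _by_language = {
        "english": lambda: ("Levenshtein", "PunctTokenizer"),
        "chinese": lambda: ("Levenshtein", "JiebaTokenizer"),
    }

    def select(self, lang: str):
        try:
            return self._by_language[lang]()
        except KeyError:
            raise ValueError(f"no metric registered for language {lang!r}")
```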

EditDistance

class EditDistance[source]
English

Levenshtein ( PunctTokenizer )

Chinese

Levenshtein ( JiebaTokenizer )

Fluency

class Fluency[source]
English

GPT2LM

Chinese

GPT2LMChinese

GrammaticalErrors

class GrammaticalErrors[source]
English

LanguageTool

Chinese

LanguageToolChinese

ModificationRate

class ModificationRate[source]
English

Modification ( PunctTokenizer )

Chinese

Modification ( JiebaTokenizer )

SemanticSimilarity

class SemanticSimilarity[source]
English

UniversalSentenceEncoder