Customized Attack Model

Attacker is the core module of OpenAttack. In this example, we write a new attacker that randomly swaps tokens. This is a simple way to generate adversarial samples, and it requires only the “Blind” capability (no access to the victim model's internals).

Initialize Attacker with Options

import OpenAttack
from OpenAttack.tags import Tag
from OpenAttack.text_process.tokenizer import PunctTokenizer

class MyAttacker(OpenAttack.attackers.ClassificationAttacker):
    TAGS = { Tag("english", "lang"), Tag("get_pred", "victim") }
    def __init__(self):
        self.tokenizer = PunctTokenizer()

We create a new class called MyAttacker and instantiate a PunctTokenizer in its __init__ method, which will be used for tokenization and detokenization.
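The tokenize/detokenize round trip can be pictured with a plain whitespace tokenizer; the two helpers below are illustrative stand-ins, not OpenAttack APIs (a real PunctTokenizer also handles punctuation and optional POS tagging):

```python
# Illustrative stand-ins for PunctTokenizer.tokenize / detokenize;
# not OpenAttack code, just the round-trip idea.
def tokenize(sentence):
    return sentence.split()

def detokenize(tokens):
    return " ".join(tokens)

tokens = tokenize("a quick test")
print(tokens)               # ['a', 'quick', 'test']
print(detokenize(tokens))   # a quick test
```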

Besides writing the __init__ method, we also indicate the attacker’s supported language and required capabilities via the TAGS attribute.

The TAGS help OpenAttack automatically check parameters and avoid situations where the attacker and victim use different languages, or where the victim model lacks a capability the attacker requires.
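As a rough sketch of the kind of check TAGS makes possible (the Tag definition and compatible function below are hypothetical illustrations, not OpenAttack's actual implementation):

```python
from collections import namedtuple

# Hypothetical Tag and compatibility check illustrating the idea behind
# OpenAttack's TAGS mechanism; not the library's actual code.
Tag = namedtuple("Tag", ["name", "type"])

def compatible(attacker_tags, victim_tags):
    # The victim must provide every capability the attacker requires,
    # and the two sides must share a language.
    required = {t for t in attacker_tags if t.type == "victim"}
    attacker_langs = {t for t in attacker_tags if t.type == "lang"}
    victim_langs = {t for t in victim_tags if t.type == "lang"}
    return required <= victim_tags and bool(attacker_langs & victim_langs)

attacker_tags = {Tag("english", "lang"), Tag("get_pred", "victim")}
victim_tags = {Tag("english", "lang"), Tag("get_pred", "victim")}
print(compatible(attacker_tags, victim_tags))   # True

chinese_victim = {Tag("chinese", "lang"), Tag("get_pred", "victim")}
print(compatible(attacker_tags, chinese_victim))  # False: language mismatch
```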

Randomly Swap Tokens

def swap(self, tokens):
    random.shuffle(tokens)
    return tokens

In the swap method, we shuffle the tokens of the input sentence to generate a candidate.
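A quick check that swap merely permutes the tokens (seeding random for reproducibility; the exact order depends on the seed):

```python
import random

def swap(tokens):
    # Shuffle in place and return the same list, as in the method above.
    random.shuffle(tokens)
    return tokens

random.seed(0)
original = ["the", "movie", "was", "great"]
shuffled = swap(list(original))
print(shuffled)  # some permutation of the original tokens
assert sorted(shuffled) == sorted(original)
```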

Check Candidate Sentence and Return

def attack(self, victim, input_, goal):
    x_new = self.tokenizer.detokenize(
        self.swap( self.tokenizer.tokenize(input_, pos_tagging=False) )
    )
    y_new = victim.get_pred([ x_new ])
    if goal.check(x_new, y_new):
        return x_new
    return None

The attack method is the main procedure of an Attacker. In this method, we generate a candidate sentence and use Classifier.get_pred to get the victim classifier's prediction. We then check the prediction against the goal: if the attack succeeds, we return the adversarial sample; otherwise we return None.
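The control flow of attack can be exercised with stub victim and goal objects; StubVictim and StubGoal below are hypothetical stand-ins written for this sketch, not OpenAttack classes:

```python
import random

class StubVictim:
    # Toy "classifier": predicts 0 if the sentence starts with "the", else 1.
    def get_pred(self, sentences):
        return [0 if s.startswith("the") else 1 for s in sentences]

class StubGoal:
    # Succeeds when the candidate's predicted label equals the target.
    def __init__(self, target):
        self.target = target
    def check(self, x_new, y_new):
        return y_new[0] == self.target

def attack(victim, input_, goal):
    tokens = input_.split()        # stand-in for PunctTokenizer.tokenize
    random.shuffle(tokens)         # the swap step
    x_new = " ".join(tokens)       # stand-in for detokenize
    y_new = victim.get_pred([x_new])
    if goal.check(x_new, y_new):
        return x_new
    return None

random.seed(0)
result = attack(StubVictim(), "the movie was great", StubGoal(target=1))
# result is a reordered sentence, or None if the shuffle left "the" first
print(result)
```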

See Attacker for details.

Complete Code

import OpenAttack
import random
import datasets

from OpenAttack.tags import Tag
from OpenAttack.text_process.tokenizer import PunctTokenizer

class MyAttacker(OpenAttack.attackers.ClassificationAttacker):
    TAGS = { Tag("english", "lang"), Tag("get_pred", "victim") }
    def __init__(self):
        self.tokenizer = PunctTokenizer()

    def attack(self, victim, input_, goal):
        x_new = self.tokenizer.detokenize(
            self.swap( self.tokenizer.tokenize(input_, pos_tagging=False) )
        )
        y_new = victim.get_pred([ x_new ])
        if goal.check(x_new, y_new):
            return x_new
        return None

    def swap(self, sentence):
        random.shuffle(sentence)
        return sentence


def main():
    victim = OpenAttack.loadVictim("BERT.SST")
    def dataset_mapping(x):
        return {
            "x": x["sentence"],
            "y": 1 if x["label"] > 0.5 else 0,
        }
    # Select a split: load_dataset without one returns a DatasetDict,
    # which cannot be iterated as a list of examples.
    dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)

    attacker = MyAttacker()
    attack_eval = OpenAttack.attack_evals.DefaultAttackEval(attacker, victim)
    attack_eval.eval(dataset, visualize=True)

if __name__ == "__main__":
    main()

Run python examples/custom_attacker.py to see visualized results.