Customized Attack Model¶
Attacker is the core module of OpenAttack. In this example, we write a new attacker which swaps tokens randomly. This is a simple way to generate adversarial samples and it requires a capacity of “Blind”.
Initialize Attacker with Options¶
1 2 3 4 5 6 | from OpenAttack.tags import Tag
from OpenAttack.text_process.tokenizer import PunctTokenizer
class MyAttacker(OpenAttack.attackers.ClassificationAttacker):
TAGS = { Tag("english", "lang"), Tag("get_pred", "victim") }
def __init__(self):
self.tokenizer = PunctTokenizer()
|
We create a new class called MyAttacker and create a PunctTokenizer in its initialization phase of MyAttacker for tokenization and detokenization.
Besides writing the __init__ method, we also indicate the attacker’s supported language and required capabilities via the TAGS attribute.
The TAGS are used to help OpenAttack automatically check the parameters to avoid situations where attacker and victim are using different languages or victim model has insufficient capabilities.
Randomly Swap Tokens¶
1 2 3 | def swap(self, tokens):
random.shuffle(tokens)
return tokens
|
In swap method, we shuffle the tokens of input sentence to generate a candidate.
Check Candidate Sentence and Return¶
1 2 3 4 5 6 7 8 | def attack(self, victim, input_, goal):
x_new = self.tokenizer.detokenize(
self.swap( self.tokenizer.tokenize(input_, pos_tagging=False) )
)
y_new = victim.get_pred([ x_new ])
if goal.check(x_new, y_new):
return x_new
return None
|
attack method is the main procedure of Attacker. In this method, we generate a candidate sentence
and use Classifier.get_pred to get the prediction of victim classifier. Then we check the prediction, return
adversarial_sample if succeed and return None if failed.
See Attacker for detail.
Complete Code¶
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | import OpenAttack
import random
import datasets
from OpenAttack.tags import Tag
from OpenAttack.text_process.tokenizer import PunctTokenizer
class MyAttacker(OpenAttack.attackers.ClassificationAttacker):
TAGS = { Tag("english", "lang"), Tag("get_pred", "victim") }
def __init__(self):
self.tokenizer = PunctTokenizer()
def attack(self, victim, input_, goal):
x_new = self.tokenizer.detokenize(
self.swap( self.tokenizer.tokenize(input_, pos_tagging=False) )
)
y_new = victim.get_pred([ x_new ])
if goal.check(x_new, y_new):
return x_new
return None
def swap(self, sentence):
random.shuffle(sentence)
return sentence
def main():
victim = OpenAttack.loadVictim("BERT.SST")
def dataset_mapping(x):
return {
"x": x["sentence"],
"y": 1 if x["label"] > 0.5 else 0,
}
dataset = datasets.load_dataset("sst").map(function=dataset_mapping)
attacker = MyAttacker()
attack_eval = OpenAttack.attack_evals.DefaultAttackEval(attacker, victim)
attack_eval.eval(dataset, visualize=True)
|
Run python examples/custom_attacker.py to see visualized results.