What is meant by 'Adversarial Perturbation' in Neural Structured Learning?


Neural Structured Learning (NSL) has recently been introduced in TensorFlow 2.0. I have gone through this guide on NSL on the TensorFlow site, as well as this tutorial on 'Adversarial regularization for image classification'. Conceptually, it is not clear to me how this works. How are the additional adversarial samples generated, what is meant by adversarial training, and how does it help achieve greater accuracy/performance? The additional code is really short, but what it does behind the scenes is not clear. I would be grateful for a step-by-step explanation from a layman's point of view.


BEST ANSWER

Typically, adversarial examples are created by taking the gradient of the loss w.r.t. the input and then perturbing the input to maximize the loss. E.g. if you have a classification task for cats and dogs and you want to create adversarial examples, you feed a 256 x 256 cat image into your network, compute the gradient of the loss w.r.t. the input (which will also be a 256 x 256 tensor), then add a small step in the direction of that gradient (the perturbation) to your image, repeating until the network classifies it as a dog. By training on these generated images again, with the correct label, the network becomes more robust to noise/perturbations.
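To make this concrete, here is a minimal sketch of that procedure, commonly known as the fast gradient sign method (FGSM). The model, image size, and epsilon value are illustrative placeholders, not taken from the NSL tutorial:

```python
import tensorflow as tf

# Toy stand-in for a trained cat/dog classifier; in practice you would
# load your own trained model here.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(256, 256, 3)),
    tf.keras.layers.Dense(2, activation='softmax'),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def make_adversarial(image, label, epsilon=0.01):
    """Perturb `image` one FGSM step so the loss for `label` increases."""
    image = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)  # add batch dim
    with tf.GradientTape() as tape:
        tape.watch(image)                      # track gradients w.r.t. the input
        loss = loss_fn([label], model(image))
    grad = tape.gradient(loss, image)          # same shape as the input image
    # Step *along* the sign of the gradient: gradient ascent on the loss.
    return image + epsilon * tf.sign(grad)
```

Repeating this step (and clipping back to the valid pixel range) will eventually flip the prediction; feeding such images back into training with the original 'cat' label is the adversarial training step.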

There are also other, more sophisticated approaches. For example, this paper explains how a pattern in the input can corrupt the output of an optical flow estimation network.
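Coming back to NSL: the short additional code from the tutorial essentially automates the gradient-based perturbation and the extra training step described above by wrapping your base Keras model. A minimal sketch using the public NSL API (the hyperparameter values and input shapes are illustrative):

```python
import neural_structured_learning as nsl
import tensorflow as tf

# Base classifier; the input layer name must match the feature key
# used in the training dict below.
base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28), name='image'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# adv_step_size plays the role of epsilon above; multiplier weighs the
# adversarial loss against the ordinary training loss.
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(
    base_model, label_keys=['label'], adv_config=adv_config)

adv_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
# Inputs are passed as a dict so the wrapper knows which features to perturb:
# adv_model.fit({'image': x_train, 'label': y_train}, epochs=5)
```

During fit, the wrapper perturbs each batch of 'image' features in the direction that increases the loss and adds the loss on the perturbed batch, weighted by multiplier, to the ordinary loss, which is what makes the trained model more robust.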