Difference between adversarial training/perturbation with FGSM in TensorFlow nsl versus cleverhans


I've implemented what I believe to be the same model training loop in both TensorFlow's Neural Structured Learning (nsl) library and the cleverhans library, and curiously, models trained adversarially with the two libraries (via nsl.AdversarialRegularization and cleverhans.attacks.FastGradientMethod) do not achieve comparable performance. However, this question isn't about those specific results, so I don't attempt to reproduce them here.

More generally, I'm curious what the implementation differences are for adversarial perturbation in nsl.AdversarialRegularization.perturb_on_batch() versus the cleverhans implementation of the same/similar functionality, FastGradientMethod.generate().

The nsl docs aren't especially clear, but they seem to imply that nsl uses the Fast Gradient Sign Method of Goodfellow et al. 2014, which is supposedly the same method implemented in FastGradientMethod. For example, nsl refers to the Goodfellow et al. paper in its adversarial training tutorial and in some of the function docs. Both libraries allow similar parameters to be specified, e.g. an epsilon to control the magnitude of the perturbation and a choice of norm to constrain it. However, the difference in adversarially trained performance leads me to believe that the two libraries are not using the same underlying implementation. The nsl source is difficult to parse, so I am particularly curious what might be happening under the hood there.
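
For reference, my understanding is that both libraries should be computing something along the lines of the following minimal TF2 sketch of FGSM (the function and variable names here are mine, not taken from either library):

```python
import tensorflow as tf

def fgsm_perturb(model, x, y, epsilon,
                 loss_fn=tf.keras.losses.sparse_categorical_crossentropy):
    """x_adv = x + epsilon * sign(grad_x loss(y, model(x))), per Goodfellow et al. 2014."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)
```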

What are the implementation differences between nsl.AdversarialRegularization.perturb_on_batch() and cleverhans.attacks.FastGradientMethod.generate() that could cause different perturbations for the same inputs? Are there other differences in these functions that might contribute to differences in their performance? (I am not interested in speed or efficiency, but in ways in which the results of the two perturbations might differ for the same model, epsilon, and norm.)

There is 1 answer below.

Yes, both nsl.AdversarialRegularization.perturb_on_batch() and cleverhans.attacks.FastGradientMethod.generate() implement the Fast Gradient Sign Method from Goodfellow et al. 2014, and both offer parameters like epsilon and norm type to control the perturbation. Since both nsl and cleverhans implement FGSM, the generated perturbations should be the same when the configurations are carefully matched. However, some implementation details are handled differently, especially in the default configurations. For example:

  • cleverhans by default takes the model's predictions as labels when generating adversarial perturbations, while nsl takes the true labels.
  • cleverhans usually expects the model to output logits (since the default loss_fn is softmax_cross_entropy_with_logits), while nsl's models may output something else; in nsl's adversarial training tutorial, the model outputs probability distributions. (See the sketch below for how these defaults could be aligned.)
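
To get comparable perturbations from the two libraries, you would want something along these lines (a rough sketch only, assuming cleverhans 3.x with its KerasModelWrapper and a TF1-style session, plus the model setup from the nsl adversarial training tutorial; model, sess, x_batch, y_batch, and the 'feature'/'label' keys are placeholders, not exact names from either library):

```python
import numpy as np
import neural_structured_learning as nsl
from cleverhans.attacks import FastGradientMethod          # cleverhans 3.x API
from cleverhans.utils_keras import KerasModelWrapper

# cleverhans: pass the true labels via `y` so the attack doesn't fall back to
# the model's own predictions, and wrap the Keras model so logits are exposed.
fgsm = FastGradientMethod(KerasModelWrapper(model), sess=sess)
x_adv_ch = fgsm.generate_np(x_batch, eps=0.2, ord=np.inf, y=y_batch)

# nsl: adv_step_size / adv_grad_norm play the roles of eps / ord, and the true
# labels are supplied in the batch dict under `label_keys`.
adv_config = nsl.configs.make_adv_reg_config(
    multiplier=0.2, adv_step_size=0.2, adv_grad_norm='infinity')
adv_model = nsl.keras.AdversarialRegularization(
    model, label_keys=['label'], adv_config=adv_config)
adv_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['acc'])
perturbed = adv_model.perturb_on_batch({'feature': x_batch, 'label': y_batch})
x_adv_nsl = perturbed['feature']
```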

There could be more differences in other places. I can take a closer look if you could provide a concrete example.

Regarding adversarial training, nsl.keras.AdversarialRegularization treats the adversarial loss as a regularization term, which means the model is trained on both original and adversarial examples. cleverhans.loss.CrossEntropy also calculates the loss on both original and adversarial examples, but the weighting scheme is slightly different: in nsl the original and adversarial examples are weighted 1:multiplier, while in cleverhans they are weighted (1-adv_coeff):adv_coeff. Note that another training approach appears in some of the literature, where the model is trained only on adversarial examples.
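
Written out, the two weighting schemes look roughly like this (these helper functions just illustrate the formulas above, not actual library code):

```python
def nsl_style_loss(loss_orig, loss_adv, multiplier):
    # nsl: the adversarial loss is added as a regularization term
    return loss_orig + multiplier * loss_adv

def cleverhans_style_loss(loss_orig, loss_adv, adv_coeff):
    # cleverhans CrossEntropy: a convex combination of clean and adversarial loss
    return (1 - adv_coeff) * loss_orig + adv_coeff * loss_adv

# Up to an overall scale of (1 + multiplier), the two schemes match when
# adv_coeff = multiplier / (1 + multiplier),
# e.g. multiplier = 0.2  <->  adv_coeff = 0.2 / 1.2 ≈ 0.167.
```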