I've implemented what I believe to be the same model training loop in both TensorFlow's Neural Structured Learning (`nsl`) and the `cleverhans` library, and curiously, models trained using adversarial training with the two libraries (via `nsl.AdversarialRegularization` and `cleverhans.attacks.FastGradientMethod`) do not achieve comparable performance. However, this question isn't about those specific results, so I don't attempt to replicate them here.
I'm curious more generally about the implementation differences for adversarial perturbation in `nsl.AdversarialRegularization.perturb_on_batch()` versus the `cleverhans` implementation of the same/similar functionality, which would be `FastGradientMethod.generate()`.
The `nsl` docs aren't especially clear, but they seem to imply that `nsl` uses the Fast Gradient Sign Method (FGSM) of Goodfellow et al. 2014, which is supposedly the same method implemented in `FastGradientMethod`. For example, `nsl` refers to the Goodfellow et al. paper in its adversarial training tutorial and in some of the function docs. Both libraries allow specification of similar parameters, e.g. an `epsilon` to control the level of perturbation and the norm used to constrain it. However, the differences in adversarially-trained performance lead me to believe that these libraries are not using the same underlying implementation. The `nsl` source is difficult to parse, so I am particularly curious what might be happening under the hood there.
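For reference, here is the update rule from the paper as I understand it, written as a schematic TF2 sketch (this is the textbook FGSM step, not a claim about either library's actual code):

```python
import tensorflow as tf

def fgsm_perturb(model, x, y, epsilon, loss_fn):
    # One signed-gradient step of size epsilon, as in Goodfellow et al. 2014:
    #   x_adv = x + epsilon * sign(grad_x loss(y, model(x)))
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)
```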
What are the differences in implementation between `nsl.AdversarialRegularization.perturb_on_batch()` and `cleverhans.attacks.FastGradientMethod.generate()` which could cause different perturbations for the same inputs? And are there other differences in these functions which might contribute to differences in their performance? (I am not interested in speed or efficiency, but in ways the results of the two perturbations might differ for the same model, epsilon, and norm.)
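For concreteness, here is roughly how I would match the two configurations (a minimal sketch with a placeholder model and random data; the `cleverhans` half assumes the v3.x TF1-style API and a session, so it wouldn't run in the same eager program and is left as comments):

```python
import numpy as np
import tensorflow as tf
import neural_structured_learning as nsl

# Placeholder model: an input named 'feature' and 10-class logits output.
inputs = tf.keras.Input(shape=(28, 28, 1), name='feature')
logits = tf.keras.layers.Dense(10)(tf.keras.layers.Flatten()(inputs))
base_model = tf.keras.Model(inputs, logits)

# nsl: the epsilon-like step size and the norm are set via the adv config.
adv_model = nsl.keras.AdversarialRegularization(
    base_model,
    label_keys=['label'],
    adv_config=nsl.configs.make_adv_reg_config(
        multiplier=0.2,
        adv_step_size=0.2,          # plays the role of epsilon
        adv_grad_norm='infinity'))  # L-infinity constraint, as in FGSM
adv_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x_batch = np.random.rand(32, 28, 28, 1).astype('float32')
y_batch = np.random.randint(0, 10, size=(32,))
perturbed = adv_model.perturb_on_batch({'feature': x_batch, 'label': y_batch})

# cleverhans (v3.x, TF1-style graph mode; needs a tf.compat.v1 session,
# so it can't run in the same eager program as the nsl code above):
# from cleverhans.attacks import FastGradientMethod
# from cleverhans.utils_keras import KerasModelWrapper
# fgsm = FastGradientMethod(KerasModelWrapper(base_model), sess=sess)
# x_adv = fgsm.generate_np(x_batch, eps=0.2, ord=np.inf)
```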
Yes, both `nsl.AdversarialRegularization.perturb_on_batch()` and `cleverhans.attacks.FastGradientMethod.generate()` implement the Fast Gradient Sign Method from Goodfellow et al. 2014, and both offer parameters like epsilon and norm type to control the perturbation. Since both `nsl` and `cleverhans` implement FGSM, the generated perturbations should not differ when the configurations are carefully matched. Yet some implementation details are handled differently, especially in the default configurations. For example, `cleverhans` by default takes the model's predictions as labels for generating adversarial perturbations, while `nsl` takes the true labels. `cleverhans` usually expects the model to output logits (since the default `loss_fn` is `softmax_cross_entropy_with_logits`), while `nsl`'s models may output different things; in `nsl`'s adversarial training tutorial, the model outputs probability distributions. There could be more differences in other places. I can take a closer look if you could provide a concrete example.
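For instance, to make `cleverhans` use true labels the way `nsl` does, you can pass them explicitly (a sketch against the v3.x API; `wrapped_model`, `sess`, and the batch arrays are placeholders):

```python
import numpy as np
from cleverhans.attacks import FastGradientMethod

# `wrapped_model` is assumed to be a cleverhans Model (e.g. a
# KerasModelWrapper around a Keras model that outputs logits, matching
# the default loss_fn of softmax_cross_entropy_with_logits).
fgsm = FastGradientMethod(wrapped_model, sess=sess)
x_adv = fgsm.generate_np(
    x_batch,
    y=y_batch_one_hot,  # true labels (one-hot); if omitted, cleverhans
                        # falls back to the model's own predictions
    eps=0.2,            # match nsl's adv_step_size
    ord=np.inf)         # match nsl's adv_grad_norm='infinity'
```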
Regarding adversarial training, `nsl.keras.AdversarialRegularization` treats the adversarial loss as a regularization term, which means the model is trained on both original and adversarial examples. `cleverhans.loss.CrossEntropy` also calculates the loss on both original and adversarial examples, but the weighting scheme is a bit different. In `nsl` the original and adversarial examples are weighted as `1 : multiplier`, while in `cleverhans` they are weighted as `(1 - adv_coeff) : adv_coeff`. Note that another training approach appears in some of the literature, where the model is trained only on adversarial examples.
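Schematically (this is pseudocode illustrating the weighting, not either library's code):

```python
# nsl.keras.AdversarialRegularization: 1 : multiplier weighting
loss_nsl = ce(y, model(x)) + multiplier * ce(y, model(x_adv))

# cleverhans.loss.CrossEntropy: (1 - adv_coeff) : adv_coeff weighting
loss_ch = (1 - adv_coeff) * ce(y, model(x)) + adv_coeff * ce(y, model(x_adv))

# The two objectives agree up to an overall scale factor when
#   multiplier = adv_coeff / (1 - adv_coeff),
# e.g. adv_coeff = 0.5 corresponds to multiplier = 1.0.
```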