In the blog post Breaking Linear Classifiers on ImageNet, the author proposes the following way to create adversarial images that fool ConvNets:
In short, to create a fooling image we start from whatever image we want (an actual image, or even a noise pattern), and then use backpropagation to compute the gradient of the image pixels on any class score, and nudge it along. We may, but do not have to, repeat the process a few times. You can interpret backpropagation in this setting as using dynamic programming to compute the most damaging local perturbation to the input. Note that this process is very efficient and takes negligible time if you have access to the parameters of the ConvNet (backprop is fast), but it is possible to do this even if you do not have access to the parameters but only to the class scores at the end. In this case, it is possible to compute the data gradient numerically, or to use other local stochastic search strategies, etc. Note that due to the latter approach, even non-differentiable classifiers (e.g. Random Forests) are not safe (but I haven’t seen anyone empirically confirm this yet).
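The procedure described above can be sketched for the simplest case, a linear classifier, where the gradient of a class score with respect to the pixels is just the corresponding weight row. The setup below is a hypothetical toy model with random weights, not the blog post's actual ImageNet network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a linear classifier over 28*28 = 784 pixels.
n_classes, n_pixels = 10, 28 * 28
W = rng.standard_normal((n_classes, n_pixels)) * 0.01
b = np.zeros(n_classes)

def scores(x):
    return W @ x + b

x = rng.random(n_pixels)   # start from any image (here: a noise pattern)
target = 6                 # the class whose score we want to push up

before = scores(x)[target]

# For a linear model, the gradient of score[target] w.r.t. the pixels is
# just the corresponding weight row -- backprop in miniature.
grad = W[target]

# Nudge the image along the gradient, repeating a few times.
step = 0.5
for _ in range(25):
    x = np.clip(x + step * grad, 0.0, 1.0)   # keep pixels in [0, 1]

after = scores(x)[target]
print(before, after)   # the target score goes up
```

For a real ConvNet the gradient has to be obtained by backpropagation rather than read off a weight matrix, but the nudge-and-repeat loop is the same.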
I know I can calculate the gradient of an image like this:
np.gradient(img)
But how do I compute the gradient of an image relative to another image class using TensorFlow or NumPy? I probably need to do something similar to the process in this tutorial, such as:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
But I'm not sure exactly how. Specifically, I have an image of the digit 2, as below:
array([[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.14117648, 0.49019611, 0.74901962,
0.85490203, 1. , 0.99607849, 0.99607849, 0.9450981 ,
0.20000002, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.80000007, 0.97647065, 0.99215692, 0.99215692,
0.99215692, 0.99215692, 0.99215692, 0.99215692, 0.99215692,
0.98039222, 0.92156869, 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.34509805,
0.9450981 , 0.98431379, 0.99215692, 0.88235301, 0.55686277,
0.19215688, 0.04705883, 0.04705883, 0.04705883, 0.41176474,
0.99215692, 0.99215692, 0.43529415, 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.37254903, 0.88235301,
0.99215692, 0.65490198, 0.44313729, 0.05490196, 0. ,
0. , 0. , 0. , 0. , 0.0627451 ,
0.82745105, 0.99215692, 0.45882356, 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.35686275, 0.9333334 , 0.99215692,
0.66666669, 0.10980393, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.58823532, 0.99215692, 0.45882356, 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0.38431376, 0.98431379, 0.85490203, 0.18823531,
0.01960784, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.58823532, 0.99215692, 0.45882356, 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0.43921572, 0.99215692, 0.43921572, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.03529412,
0.72156864, 0.94901967, 0.07058824, 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0.07843138, 0.17647059, 0.01960784, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.26274511,
0.99215692, 0.94117653, 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.10588236, 0.91764712,
0.97254908, 0.41176474, 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.17254902, 0.6156863 , 0.99215692,
0.51764709, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.04313726, 0.74117649, 0.99215692, 0.7960785 ,
0.10588236, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.04313726, 0.61176473, 0.99215692, 0.96470594, 0.3019608 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.04313726,
0.61176473, 0.99215692, 0.79215693, 0.26666668, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.04313726, 0.61176473,
0.99215692, 0.88627458, 0.27843139, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.11764707, 0.12941177,
0.12941177, 0.54901963, 0.63921571, 0.72941178, 0.99215692,
0.88627458, 0.14901961, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0.04705883, 0.31764707, 0.95686281, 0.99215692,
0.99215692, 0.99215692, 0.99215692, 0.99215692, 0.99215692,
0.99215692, 0.72941178, 0.27450982, 0.09019608, 0. ,
0. , 0.08627451, 0.61176473, 0.3019608 , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0.3137255 , 0.76470596, 0.99215692, 0.99215692, 0.99215692,
0.99215692, 0.99215692, 0.97254908, 0.91764712, 0.65098041,
0.97254908, 0.99215692, 0.99215692, 0.94117653, 0.58823532,
0.28627452, 0.56470591, 0.40784317, 0.20000002, 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.02745098,
0.97254908, 0.99215692, 0.99215692, 0.99215692, 0.99215692,
0.99215692, 0.94901967, 0.41176474, 0. , 0. ,
0.41960788, 0.94901967, 0.99215692, 0.99215692, 0.99215692,
0.96078438, 0.627451 , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.22352943,
0.98039222, 0.99215692, 0.99215692, 0.99215692, 0.96862751,
0.52941179, 0.08235294, 0. , 0. , 0. ,
0. , 0.08235294, 0.45882356, 0.71764708, 0.71764708,
0.18823531, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0.47450984, 0.48235297, 0.6901961 , 0.52941179, 0.0627451 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ]], dtype=float32)
How do I compute the gradient of this image relative to the digit 6 image class (with an example shown below)? (I guess I need to compute the gradient for all digit 6 images using backpropagation.)
array([[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.19215688, 0.70588237, 0.99215692,
0.95686281, 0.19607845, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.72156864, 0.98823535, 0.98823535,
0.90980399, 0.64313728, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.25882354, 0.91764712, 0.98823535, 0.53333336,
0.14901961, 0.21960786, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.07450981, 0.92549026, 0.98823535, 0.6901961 , 0.01568628,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.29803923, 0.98823535, 0.98823535, 0.21960786, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.54509807, 0.99215692, 0.67843139, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.08627451,
0.83137262, 0.98823535, 0.27058825, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.45490199,
0.99215692, 0.94117653, 0.19607845, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.6156863 ,
0.99215692, 0.80784321, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.90196085,
0.99215692, 0.40000004, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.90588242,
1. , 0.70588237, 0.5411765 , 0.70588237, 0.99215692,
1. , 0.99215692, 0.8705883 , 0.38039219, 0.01176471,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.90196085,
0.99215692, 0.98823535, 0.98823535, 0.98823535, 0.98823535,
0.82745105, 0.98823535, 0.98823535, 0.98823535, 0.45882356,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.90196085,
0.99215692, 0.94117653, 0.71764708, 0.34901962, 0.27058825,
0.02745098, 0.27058825, 0.67058825, 0.98823535, 0.98823535,
0.33333334, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.52941179,
0.99215692, 0.60000002, 0. , 0. , 0. ,
0. , 0. , 0.0509804 , 0.84313732, 0.98823535,
0.45490199, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.45490199,
0.99215692, 0.80784321, 0. , 0. , 0. ,
0. , 0. , 0. , 0.60784316, 0.98823535,
0.45490199, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.41568631,
1. , 0.82745105, 0.02745098, 0. , 0. ,
0. , 0. , 0.19215688, 0.91372555, 0.99215692,
0.45490199, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.62352943, 0.98823535, 0.60392159, 0.03529412, 0. ,
0. , 0.11764707, 0.77254909, 0.98823535, 0.98823535,
0.37254903, 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0.06666667, 0.89019614, 0.98823535, 0.60392159, 0.27450982,
0.31764707, 0.89411771, 0.98823535, 0.89019614, 0.50980395,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.19607845, 0.89019614, 0.98823535, 0.98823535,
0.99215692, 0.98823535, 0.72549021, 0.19607845, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.18823531, 0.7019608 , 0.98823535,
0.74509805, 0.45882356, 0.02352941, 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ]], dtype=float32)
Thanks in advance for any help!
Here are two related questions that I asked:
How to use image and weight matrix to create adversarial images in TensorFlow?
How to create adversarial images for ConvNet?
And here's my script.
Class scores only

If, as you suggest, you only have access to the class scores for a given image, there is not much you can do to truly compute a gradient.

If what is returned can be seen as a relative score for each category, it is a vector v that is the result of some function f acting on a vector A containing all the information in the image. The true gradient of the function is given by the matrix D(A), which depends on A, such that D(A)*B = (f(A + epsilon*B) - f(A))/epsilon in the limit of small epsilon, for any B. You could approximate this numerically using some small value of epsilon and a number of test vectors B (one for each element of A should be enough), but this is likely to be needlessly expensive.

What you are trying to do is maximize the difficulty the algorithm has in recognizing the image. That is, for a given algorithm f, you want to maximize some appropriate measure of how poorly the algorithm recognizes each of your images A. There is a plethora of methods for this. I'm not too familiar with them, but a talk I saw recently had some interesting material on this (https://wsc.project.cwi.nl/woudschoten-conferences/2016-woudschoten-conference/PRtalk1.pdf, see page 24 and onwards). Computing the whole gradient is usually far too expensive when the input is high-dimensional. Instead, you modify a randomly chosen coordinate and take many (many) small, cheap steps, each more or less in the right direction, rather than going for somehow-optimal large but expensive steps.

Model available and suitable

If you know the model in full and can write it explicitly as v = f(A), then you can compute the gradient of the function f. This would be the case if the algorithm you're trying to beat is a linear regression, possibly with multiple layers. The form of the gradient should be easier for you to figure out than for me to write down here. With this gradient available, and fairly cheap to evaluate for different images A, you can proceed with, for example, a steepest descent (or ascent) approach to making the image less recognizable to the algorithm.

Important note

It's probably best not to forget that your approach should not render the image illegible to humans as well; that would make it all rather pointless.
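To make both cases concrete, here is a small numpy sketch using a hypothetical linear softmax model with made-up weights (W, b, and the 8x8 input size are all assumptions for illustration): the class-scores-only route estimates the data gradient by finite differences, the model-available route writes it down analytically, and for a linear model the two agree.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "model": a linear classifier over 8x8 images, 10 classes.
# W and b are made-up weights, purely for illustration.
n_classes, n_pixels = 10, 8 * 8
W = rng.standard_normal((n_classes, n_pixels)) * 0.1
b = rng.standard_normal(n_classes) * 0.1

def class_scores(x):
    # Black-box view: all we get back is the score vector v = f(A).
    return W @ x + b

x = rng.random(n_pixels)
target = 6

# Case 1 -- class scores only: estimate d(score[target])/dx by finite
# differences, one probe per pixel (64 here, ~150k for a 224x224x3 image,
# which is the "needlessly expensive" part).
eps = 1e-5
base = class_scores(x)[target]
num_grad = np.empty_like(x)
for i in range(n_pixels):
    xp = x.copy()
    xp[i] += eps
    num_grad[i] = (class_scores(xp)[target] - base) / eps

# Case 2 -- model available: for this linear model, the gradient of
# score[target] with respect to the pixels is simply the weight row.
true_grad = W[target]

# The two routes agree up to floating-point error, since f is linear.
print(np.max(np.abs(num_grad - true_grad)))

# One steepest-ascent step on the target score, keeping pixels in [0, 1].
x_adv = np.clip(x + 0.5 * true_grad, 0.0, 1.0)
```

The final line is one step of the steepest-ascent approach suggested above for the model-available case; repeating it (recomputing the gradient each step for a nonlinear model) gives the nudge-and-repeat loop from the quoted blog post.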