iOS - bitwise XOR on a vector using Accelerate.framework


I am trying to perform a bitwise XOR between a predetermined value and each element of an array.

This can clearly be done in a loop like so (in pseudocode):

int scalar = 123;
int result[VECTOR_LENGTH];
for (int i = 0; i < VECTOR_LENGTH; i++) {
    result[i] = scalar ^ a[i];
}

but I'm starting to learn about the performance enhancements by using the Accelerate.framework.

I'm looking through the docs for Accelerate.framework, but I haven't seen any way to do an element-wise bitwise XOR. Does anyone know if this is possible?

2 Answers

Accepted Answer

Accelerate doesn't implement the operation in question. You can pretty easily write your own vector code to do it, however. One nice approach is to use clang vector extensions:

#include <stddef.h>

// 4-byte-aligned vectors of 8, 4, and 2 ints (clang ext_vector_type).
// The aligned(4) attribute lets us load directly through an ordinary int *.
typedef int vint8 __attribute__((ext_vector_type(8),aligned(4)));
typedef int vint4 __attribute__((ext_vector_type(4),aligned(4)));
typedef int vint2 __attribute__((ext_vector_type(2),aligned(4)));

int vector_xor(int *x, size_t n) {
    // XOR-reduce the array eight ints at a time.
    vint8 xor8 = 0;
    while (n >= 8) {
        xor8 ^= *(vint8 *)x;
        x += 8;
        n -= 8;
    }
    // Fold the eight partial results down to a single int.
    vint4 xor4 = xor8.lo ^ xor8.hi;
    vint2 xor2 = xor4.lo ^ xor4.hi;
    int xor = xor2.lo ^ xor2.hi;
    // Handle any leftover elements one at a time.
    while (n > 0) {
        xor ^= *x++;
        n -= 1;
    }
    // Finally fold in the predetermined scalar (123 in the question).
    return xor ^ 123;
}

This is pretty nice because (a) it doesn't require use of intrinsics and (b) it doesn't tie you to any specific architecture. It generates pretty decent code for any architecture you compile for. On the other hand, it ties you to clang, whereas if you use intrinsics your code may work with other compilers as well.
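
For comparison, here is a minimal sketch of the same XOR reduction written with ARM NEON intrinsics. This is my own translation, not part of the original answer, and the function name vector_xor_neon is made up for illustration; it assumes the same behavior as the code above (XOR-reduce the array, then fold in 123).

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

int vector_xor_neon(const int32_t *x, size_t n) {
    int32x4_t acc4 = vdupq_n_s32(0);          // four running XOR accumulators
    while (n >= 4) {
        acc4 = veorq_s32(acc4, vld1q_s32(x)); // XOR in 4 elements at a time
        x += 4;
        n -= 4;
    }
    // Fold the four lanes down to a single int.
    int32x2_t acc2 = veor_s32(vget_low_s32(acc4), vget_high_s32(acc4));
    int xr = vget_lane_s32(acc2, 0) ^ vget_lane_s32(acc2, 1);
    // Scalar tail for any leftover elements.
    while (n > 0) {
        xr ^= *x++;
        n -= 1;
    }
    return xr ^ 123;
}

The tradeoff is the one described above: the intrinsics version is portable across compilers that support arm_neon.h, but it only runs on ARM, whereas the ext_vector version compiles for any target clang supports.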

Second Answer

Stephen's answer is useful, but as you're looking at Accelerate, keep in mind that it is not a magic "go fast" library. Unless VECTOR_LENGTH is very large (say 10,000 -- EDIT: Stephen disagrees on this scale, and tends to know more about this subject than I do; see comments), the cost of the function call will often overwhelm any benefits you get. Remember, at the end of the day, Accelerate is just code. Very often, simple hand-written loops like yours (especially with good compiler optimizations) are going to be just as good or better on simple operations like xor.

But in many cases you need to let the compiler help you. Clang knows how to do all kinds of useful vector optimizations (just like in Stephen's answer) automatically. But in most cases, the default optimization setting is -Os (Fastest, Smallest). That says "clang, you may do any optimizations you want, but not if it makes the resulting binary any larger." You might notice that Stephen's example is a little larger than yours. That means that the compiler is often forbidden from applying the automatic vector optimizations it knows how to do.

But, if you switch to -Ofast, then you give clang permission to improve performance, even if it increases binary size (and on modern hardware, even mobile hardware, that is often a very good tradeoff). In the Build Settings panel, this is called "Optimization Level: Fastest, Aggressive Optimizations." In nearly every case, that is the correct setting for iOS and OS X apps. (It is not currently the default because of history; I expect that Apple will make it the default in the future.)
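
As a rough illustration (my own example, assuming the goal is the element-wise XOR from the question), a plain loop like the following is exactly the kind of thing clang's auto-vectorizer handles once the optimization level allows it. The name xor_array is made up:

#include <stddef.h>

// Element-wise XOR of a scalar into each element; at -O3/-Ofast clang
// will usually vectorize this loop automatically, with no intrinsics.
void xor_array(const int *a, int *out, size_t n, int scalar) {
    for (size_t i = 0; i < n; i++) {
        out[i] = scalar ^ a[i];
    }
}

Compiling with -Rpass=loop-vectorize asks clang to report which loops it vectorized, which is a handy way to confirm that the optimization setting is actually doing something.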

For more discussion of the limitations of Accelerate (wonderful library that it is), you may be interested in "Introduction to Fast Bézier (and Trying the Accelerate.framework)". I also highly recommend "What's New in the LLVM Compiler" (Session 402 from WWDC 2013), which I found even more useful than the introduction to Accelerate. Clang can do some really amazing optimizations if you get out of its way.