I have an algorithm I would like to implement, which involves
- coordinatewise addition,
- coordinatewise multiplication, and
- cyclic rotation of coordinates.
My addition and multiplication are a little annoying (they are arithmetic modulo n), but I imagine I can port over standard algorithms, e.g. using Montgomery arithmetic, so I am not focusing on that for now.
By the last step, I mean mapping a vector [a, b, c, d] -> [b, c, d, a] -> [c, d, a, b] -> [d, a, b, c] -> [a, b, c, d]. In general my vectors will have ~1000 dimensions, so the general cyclic rotation will be somewhat more complex.
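To pin down the general case, here is a minimal host-side C++ sketch of a rotation by k positions. It assumes 0-based indices and a left rotation, as in the example above; `rotate_left` is only an illustrative name, not a library function.

```cpp
#include <cstddef>
#include <vector>

// Illustrative helper (hypothetical name): left-rotate v by k positions,
// i.e. out[i] = v[(i + k) mod n].
std::vector<int> rotate_left(const std::vector<int>& v, std::size_t k)
{
    const std::size_t n = v.size();
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = v[(i + k) % n];  // read from the shifted source position
    return out;
}
```

With k = 1 this maps [a, b, c, d] to [b, c, d, a], and applying it n times returns the original vector.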
What is the best way to accomplish this third step that I require? Looking into the Thrust library, it seems that the simplest thing to do is to use a permutation_iterator. This should suffice to provide a correct implementation, but given that the permutation I want to compute is exceedingly simple, I was wondering if this is the fastest option.
Also, while this question is tagged thrust, if there is some obviously better library/framework to use, I would appreciate pointers.
If you use a transform iterator to build the permutation on-the-fly, it should be very efficient; certainly faster than actually moving the data.
This should work:
Prints