I am writing RC4 for the DCPU-16, but I have some questions before I begin.
RC4 algorithm:
// KSA
for i from 0 to 255
    S[i] := i
endfor
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor

// PRGA
i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    output K
endwhile
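For reference, the pseudocode above can be sketched in Python as follows. This is only a byte-oriented illustration for checking a port against, not DCPU-16 code; the function names `rc4_ksa` and `rc4_keystream` are made up for this example, and the key is assumed to be a `bytes` object.

```python
from itertools import islice


def rc4_ksa(key):
    """Key-scheduling algorithm: build the 256-entry permutation S from the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]  # swap S[i] and S[j]
    return s


def rc4_keystream(key):
    """PRGA: yield keystream bytes (each in 0..255) indefinitely."""
    s = rc4_ksa(key)
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]  # swap S[i] and S[j]
        yield s[(s[i] + s[j]) % 256]


# Take the first four keystream bytes for the key b"Key".
first_bytes = bytes(islice(rc4_keystream(b"Key"), 4))
print(first_bytes.hex())
```

Comparing the first few keystream bytes against a published RC4 test vector is a quick way to confirm that a port on another machine follows the same algorithm.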
Since I am working with 16-bit words, each element of S[] can range from 0 to 65535 instead of the expected 0 to 255, and K needs to be in the range 0 to 65535. What would be the best approach to deal with this problem?
The options I see (and their problems) are:
- Still use mod 256 everywhere and populate each output word with two rounds concatenated (this will take longer to run, and I want to keep my CPB as low as possible).
- Tweak RC4 so that K will be a 16-bit number while still using an array of length 256 for S[] (I want to do the crypto right, so I am concerned about making mistakes by tinkering with RC4).
What is my best option? I feel that I may have to do #1, but I am hoping people here can give me the confidence to do #3.
Option 2 will make the encryption weaker.
you can do
This is about as fast as you can make it (57 cycles per 16-bit word, unless I missed something). It assumes that S is static (the arr value in my code) and that i and j are stored in registers (you can save them just before or after S when you are outside of the code).
Trying to pack the array will make everything slower, as you need to unpack it each time.