For fun I am writing a stream cipher in assembly for the DCPU-16 (the fictional CPU for the game 0x10c). The processor only has 16 bit registers and it runs at 100 Khz. However, for now, memory access and things like multiplication and division are obscenely fast (A DIV using two indirect memory accesses only takes 5 cycles).
With those kinds of constraints, what would my best option be? I was considering RC4, however I am concerned that other people using my code will not use it correctly and we could have another WEP disaster on our hands. I feel that I will need to do something more complicated than RC4 so to protect other programmers from themselves, however I am concerned about algorithms that use magic numbers or precomputed tables that are expecting 8 bit bytes as the smallest unit instead of 16 bit words.
 
                        
That would be true of any cipher - RC4 should work just fine.
You might also want to check out the eStream ciphers. In particular, Grain claims to be "designed for restricted hardware environments." I have no experience with it, though, so I don't know how easy it would be to implement in software.