parallel get_range() phpcassa


I am trying to make something similar to map reduce, but without Hadoop.

I plan to use several PHP processes, each doing $cf->get_range($begin, $end) and iterating over every row.

But because of the random partitioner, the rows do not come back sorted by key. This means I cannot really choose good $begin and $end values, so it will be difficult to start 30-40 processes in parallel.

Cassandra supports get_range by token, but this is not exposed in phpcassa.

I have several possibilities, but I do not like them because they seem unprofessional:

  1. Put all keys in a single row and use ColumnSlice() + multiget() after that.
  2. Put all keys in a single row, but with their MD5 values. Then use the MD5 value to get the key, and do get_range().
  3. Do something similar with a secondary index.
  4. Import all keys into Redis.
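
To make the idea behind these options concrete: once the list of keys is available somewhere, the work could be divided by hashing each key with MD5 and assigning it to a worker bucket. Below is a minimal sketch of that step only, in plain PHP. The function names bucket_for_key() and partition_keys() and the sample keys are made up for illustration, and the Cassandra read itself is left as a comment.

```php
<?php
// Hypothetical helper: map a row key to one of $workers buckets via its MD5 hash.
function bucket_for_key(string $key, int $workers): int
{
    // The first 7 hex digits (28 bits) fit in a PHP int even on 32-bit builds.
    return hexdec(substr(md5($key), 0, 7)) % $workers;
}

// Hypothetical helper: group keys so that each worker gets its own list.
function partition_keys(array $keys, int $workers): array
{
    $buckets = array_fill(0, $workers, []);
    foreach ($keys as $key) {
        $buckets[bucket_for_key($key, $workers)][] = $key;
    }
    return $buckets;
}

// Sample keys, invented for this example.
$keys = ['user:1', 'user:2', 'user:3', 'user:4', 'user:5', 'user:6'];
$buckets = partition_keys($keys, 3);

foreach ($buckets as $worker => $assigned) {
    // In a real job, worker $worker would fetch its rows here,
    // for example with $cf->multiget($assigned).
    echo "worker $worker: " . implode(', ', $assigned) . PHP_EOL;
}
```

Because the bucket depends only on the key, every process can compute its own share independently, without coordinating with the others. The buckets will only be roughly equal in size, and this still needs the full key list to be stored somewhere first.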