Ruby library for distributed computing?

I'm developing an algorithm for a realtime data analysis task in Ruby. The task is CPU-bound because the dataset is quite large. To reach the required performance, I need to use more cores in parallel, probably across several machines.

My question is whether there is an existing Ruby library providing the following features:

  • Cluster management, ideally masterless, with dynamic reconfiguration (nodes joining and leaving) and some level of fault tolerance
  • Distribution of computation jobs to the active nodes, with error handling (job retry, etc.)
  • Fast (direct?) communication to ensure realtime capabilities

Stuff I've looked at already:

  • DRb: Too low-level, manual node-handling, no fault tolerance?
  • DCell: Mature? Automatic cluster-management?
  • Resque/Sidekiq: Nice, but too slow (polling Redis, sleeping workers, ...)
  • Riak Map/Reduce: Nice, but not recommended for real-time queries
  • Spark: Complex stuff, enterprisy?

Last resort: if there's no solution for Ruby, is there one for other platforms? Perhaps Java (yeah, JRuby!) or node.js.

There is 1 best solution below

If you're finding yourself with a CPU-bound problem that would benefit from greater scale and greater concurrency, I'd highly recommend checking out the Go language. Concurrency and parallelism aren't Ruby's strong suits, and in my experience trying to make them work is always an uphill battle.

You'll find that with Go, you'll be able to scale out to multiple cores and machines much better, have excellent communication between goroutines (via channels), and a really nice concurrency-based router.
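
To make the goroutine point a bit more concrete, here is a minimal sketch of a worker pool that spreads a CPU-bound job over several goroutines and collects the results over channels. The function name `sumSquares` and the squaring "work" are made up for illustration, and this only covers multiple cores on one machine; spreading work across machines would need a networking layer on top, which isn't shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical example job: it fans the inputs out to
// `workers` goroutines, squares each number, and sums the results.
func sumSquares(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}

	jobs := make(chan int)    // work items going to the workers
	results := make(chan int) // computed values coming back

	// Start the workers; each one pulls jobs until the channel is closed.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then signal that no more work is coming.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Collect results until the channel is closed.
	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 2))
}
```

In a real analysis job you'd probably set the worker count from `runtime.NumCPU()` and send chunks of the dataset instead of single numbers, so that channel overhead doesn't dominate the actual computation.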

For an introduction to concurrency in Go, I'd check out Rob Pike's 'Concurrency Is Not Parallelism' talk.