Intel Cilk Plus code example with cilk_for keyword


cilk_for is a keyword of Intel Cilk Plus, and we can use it in the following way:

cilk_for (int i = 0; i < 8; ++i)
{
    do_work(i);
}

I need more example code that uses the cilk_for keyword of Intel Cilk Plus.
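For reference, here is a minimal self-contained version of the loop above. It is a sketch that assumes a Cilk Plus compiler defines the __cilk macro. On other compilers (Cilk Plus was removed in GCC 8), cilk_for is defined as a plain for, which gives the serial elision of the same program. do_work and process_all are hypothetical names: do_work simply stores the square of its index in its own output slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__cilk)
#include <cilk/cilk.h>  // provides the cilk_for keyword macro
#else
#define cilk_for for    // serial elision: same loop, run sequentially
#endif

// Hypothetical work function: iteration i touches only out[i],
// so iterations are independent and safe to run in parallel.
void do_work(std::size_t i, std::vector<long>& out)
{
    assert(i < out.size());
    out[i] = static_cast<long>(i) * static_cast<long>(i);
}

// Runs do_work over every index; with Cilk Plus, the runtime decides
// how the iterations are split across workers.
std::vector<long> process_all(std::size_t n)
{
    std::vector<long> out(n);
    cilk_for (std::size_t i = 0; i < n; ++i)
    {
        do_work(i, out);
    }
    return out;
}
```

Because each iteration writes only its own element, the parallel build and the serial elision produce the same vector.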


That's pretty much all there is. A cilk_for loop is one of the easiest ways you can parallelize your code. Things to watch out for:

  • Don't try to size your loop to the number of cores. Tuning your code like this is inherently fragile. Instead, expose the full range of your data in the for loop and let the Cilk Plus runtime worry about scheduling the loop iterations.
  • Beware of races! If you haven't tested your application with a race detector like Cilkscreen or Intel Inspector, you probably have races that, at best, slow you down and, at worst, generate anomalous results.
  • cilk_for loops are implemented using a divide-and-conquer algorithm that recursively splits the range in half until the number of iterations remaining is less than the "grainsize". The runtime calculates the grainsize by dividing the range by 8P, or 8 times the number of cores. This is usually a pretty good value: not so small that the overhead of splitting dominates, and not so large that you're starved for parallelism. You can specify the grainsize using a pragma of the form "#pragma cilk grainsize = value", where "value" can be a constant or an expression. But our experience is that there are some specialized places where the correct grainsize is 1, and in most others you're best off using the default.
  • If your code is accumulating a result, consider using reducers instead of locks. Reducers provide lock-free "views" of the data that get merged automatically by the Cilk Plus runtime so that sequential ordering is preserved.
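The sketch below puts these four points together. It is a minimal example, assuming that a Cilk Plus compiler (for example ICC, or GCC 5 through 7 with -fcilkplus) defines __cilk. Otherwise it falls back to the serial elision plus a small serial stand-in for cilk::reducer_opadd, which is not part of Cilk Plus. The functions scale, row_sums and sum_all are hypothetical, and the grainsize of 1 in row_sums only illustrates the specialized case mentioned above; it is not a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

#if defined(__cilk)
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>  // cilk::reducer_opadd
#else
#define cilk_for for  // serial elision for compilers without Cilk Plus
namespace cilk {
// Serial stand-in (NOT part of Cilk Plus) exposing only the two
// reducer_opadd operations used below.
template <typename T>
class reducer_opadd {
public:
    reducer_opadd& operator+=(const T& x) { value_ += x; return *this; }
    T get_value() const { return value_; }
private:
    T value_{};
};
}  // namespace cilk
#endif

// Race-free loop over the full data range: iteration i writes only out[i],
// and the runtime, not the caller, decides how to split the range.
void scale(const std::vector<double>& in, std::vector<double>& out, double k)
{
    assert(in.size() == out.size());
    cilk_for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = k * in[i];
}

// Few, large, uneven iterations: a grainsize of 1 lets each row be stolen
// on its own. Assumption: only worth it if measurements show it helps.
std::vector<double> row_sums(const std::vector<std::vector<double>>& rows)
{
    std::vector<double> sums(rows.size());
#if defined(__cilk)
#pragma cilk grainsize = 1
#endif
    cilk_for (std::size_t r = 0; r < rows.size(); ++r)
        sums[r] = std::accumulate(rows[r].begin(), rows[r].end(), 0.0);
    return sums;
}

// Accumulating into one value: a reducer replaces a lock-protected total.
long long sum_all(const std::vector<long long>& v)
{
    cilk::reducer_opadd<long long> sum;
    cilk_for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum.get_value();
}
```

Calling sum_all({1, 2, 3, 4}) returns 10 in both the parallel build and the serial elision, since the reducer merges the per-worker views with +.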

Barry Tannenbaum, Intel Cilk Plus Development