I have a perfectly good Perl subroutine written as part of a Perl module. Without going into too many details, it takes a string and a short list as arguments (often taken from the terminal) and spits out a value (right now, always a floating point, but this may not always be the case).
Right now, the list portion of my argument takes two values, say (val1,val2). I save the output of my subroutine for hundreds of different values for val1 and val2 using for loops. Each iteration takes almost a second to complete--so completing this entire process takes hours.
I recently read of a mystical (to me) computational tool called "threading" that apparently can replace for loops with blazing fast execution time. I have been having trouble understanding what these are and do, but I imagine they have something to do with parallel computing (and I would like to have my module as optimized as possible for parallel processors.)
If I save all the values I would like to pass to val1 as a list, say @val1 and the same for val2, how can I use these "threads" to execute my subroutine for every combination of the elements of val1 and val2? Also, it would be helpful to know how to generalize this procedure to a subroutine that also takes val3, val4, etc.
As Sinan says, the "threading" you were probably thinking of is "PDL threading", now renamed (as of 2.075) to "broadcasting" to match the general terminology (see docs). It allows you to replace something like this:
with just this, since "+=" fundamentally operates on one thing (a zero-dimensional scalar), so with more dimensions than a scalar (such as this 1-dimensional sequence) it can "broadcast":
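As a minimal sketch of the kind of replacement described here (assuming PDL is installed; the variable names, the 10-element sequence and the increment of 2 are made up for illustration and are not the original snippets), an explicit element-by-element loop and its broadcast equivalent might look like this:

```perl
use strict;
use warnings;
use PDL;

# Hypothetical data: a 1-dimensional ndarray holding 0..9.
my $looped = sequence(10);

# Explicit Perl loop: every iteration goes back through Perl
# to pick out and update a single element.
for my $i (0 .. $looped->nelem - 1) {
    my $elem = $looped->slice("($i)");  # zero-dimensional view onto element $i
    $elem += 2;                         # in-place update flows back to $looped
}

# Broadcast version: one statement, and the loop over elements
# happens inside PDL's compiled code.
my $broadcast = sequence(10);
$broadcast += 2;

print "Looped:    $looped\n";
print "Broadcast: $broadcast\n";
print "Same result\n" if all($looped == $broadcast);
```

Both forms should leave the same values in the ndarray; the difference is only in where the iteration happens.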
This is also faster because, unlike the "for" loop, it doesn't have to keep leaving and re-entering the Perl environment (aka "Perl-land"), but can stay in extremely fast "C-land" to do the calculations with no overhead.
The motivation behind its original name was that these "broadcasted" calculations are all independent, and therefore "embarrassingly parallel", so can be automatically parallelised. See doc - as of 2.059, PDL by default sets parallel processing to happen automatically, on the number of CPU cores available.
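To tie this back to the question about every combination of val1 and val2, here is a rough sketch. It assumes the subroutine's arithmetic can be written in terms of PDL operations, which the question does not tell us; "my_calc" and the sample values are hypothetical stand-ins, not the asker's code. The idea is to give the second list a dummy leading dimension so that broadcasting pairs each of its elements with every element of the first list:

```perl
use strict;
use warnings;
use PDL;

# Hypothetical input lists, converted to ndarrays.
my @val1 = (1, 2, 3);
my @val2 = (10, 20);
my $v1 = pdl(@val1);   # dims: (3)
my $v2 = pdl(@val2);   # dims: (2)

# Hypothetical stand-in for the real subroutine; it only uses
# operators that PDL overloads, so it works on whole ndarrays.
sub my_calc {
    my ($x, $y) = @_;
    return $x * $y + $x;
}

# dummy(0) inserts a size-1 dimension in front of $v2, giving dims (1,2).
# Broadcasting (3) against (1,2) yields a (3,2) grid in which
# element (i,j) is my_calc($val1[i], $val2[j]).
my $grid = my_calc($v1, $v2->dummy(0));

print "dims: ", join(",", $grid->dims), "\n";
print $grid;
```

Under the same assumption, a third list could take part by giving it two dummy dimensions (for example `$v3->dummy(0)->dummy(0)`), so that each list varies along its own axis of the result. If the subroutine cannot be expressed as PDL operations (string handling, external calls and so on), this approach does not apply as written.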