I am trying to solve LU decomposition with the Doolittle Algorithm – according to this document. Without parallelization, code works fine. However, I would like to make this code run in parallel - trying to make a parallel outer and two inner loops. Unfortunately I am still missing something. If I tried to first make the outer loop run in parallel, I received a bad result.
I tried to detect where there might be a collision. I locked those places afterwards, but I still did not receive the right result. I added them to the copied code as comments. What am I doing wrong, do I need lock out other places?
What´s the right implementation of outer loop?
How would inner loops look like?
Thank you in advance.
Implementation of algorithm (sequential)
//upper triangle
var upper = new double[arr.GetLength(0), arr.GetLength(0)];
//lower triangle
var lower = new double[arr.GetLength(0), arr.GetLength(0)];
//field initialization
for (int i = 0; i < n; i++)
{
for (int j = i; j < n; j++)
upper[i, j] = arr[i, j];
for (int j = i + 1; j < n; j++)
lower[j, i] = arr[j, i];
lower[i, i] = 1;
}
for(int i=0; i<n; i++)
{
for (int j = i; j < n; j++)
{
for (int k = 0; k < i; k++)
{
upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);
}
}
for (int j = i + 1; j < n; j++)
{
for (int k = 0; k < i; k++)
{
lower[j, i] = lower[j, i] - (lower[j, k] * upper[k, i]);
}
lower[j, i] = lower[j, i] / upper[i, i];
}
}
Implementation of algorithm (parallel)
//upper triangle
var upper = new double[arr.GetLength(0), arr.GetLength(0)];
//lower triangle
var lower = new double[arr.GetLength(0), arr.GetLength(0)];
//field initialization
for (int i = 0; i < n; i++)
{
for (int j = i; j < n; j++)
upper[i, j] = arr[i, j];
for (int j = i + 1; j < n; j++)
lower[j, i] = arr[j, i];
lower[i, i] = 1;
}
//making outer loop parallel
Parallel.For(0, n, (i, state) =>
{
//possibly make this loop parallel also
for (int j = i; j < n; j++)
{
for (int k = 0; k < i; k++)
{
//lower[i,k] is it potential problem?
/*
* I tried this solution
* double a;
* lock(lowerLock){
* a = lower[i,k];
* }
* upper[i, j] = upper[i, j] - (a * upper[k, j])
*/
upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);
}
}
//possibly make this loop parallel also
for (int j = i + 1; j < n; j++)
{
for (int k = 0; k < i; k++)
{
//upper [k,i] is it potential problem?
/*
* I tried this solution
* double b;
* lock(upperLock){
* b = upper[k, i];
* }
* lower[j, i] = lower[j, i] - (lower[j, k] * b);
*/
lower[j, i] = lower[j, i] - (lower[j, k] * upper[k, i]);
}
lower[j, i] = lower[j, i] / upper[i, i];
}
});
sequential right result
Concatenation Upper triangle Lower triangle
2 -1 -2 2 -1 -2 1 0 0
-2 4 -1 0 4 -1 -2 1 0
-2 -1 3 0 0 3 -2 -1 1
parallel bad result
Concatenation Upper triangle Lower triangle
2 -1 -2 2 -1 -2 1 0 0
-2 4 -1 0 4 -1 -2 1 0
-2 -1 3 0 0 10 -->BAD -2 -1 1
EDIT I tried to lock all the approaches to the fields with one lock. I am aware of loosing all parallelization in this way. However, I wanted to achieve at least the right result, unfortunately without success.
static object mylock = new object();
//making outer loop parallel
Parallel.For(0, n, (i, state) =>
{
for (int j = i; j < n; j++)
{
for (int k = 0; k < i; k++)
{
lock (mylock)
{
upper[i, j] = upper[i, j] - (lower[i, k] * upper[k, j]);
}
}
}
for (int j = i + 1; j < n; j++)
{
for (int k = 0; k < i; k++)
{
lock (mylock)
{
lower[j, i] = lower[j, i] - (lower[j, k] * upper[k, i]);
}
}
lock (mylock)
{
lower[j, i] = lower[j, i] / upper[i, i];
}
}
});
The parallel loops write to the same array, right?
But it is not defined, when which loop will write to a position in your array. So two loops will NOT write to the same index but read from an index, where the other loop might have already written to. You can't parallize your algorithm this way.