Automatic parallelization despite data dependency
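It appears that data dependencies aren't always detected, as in the code below: the outer loop over j is parallelized even though each iteration reads b[j], which earlier iterations have modified.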
n = 600
A = rand(n,n)
b = rand(n)
%#pragma force_serial
for j = 0..n-2
    b[j] = b[j] / A[j,j]
    for i = j+1..n-1
        b[i] -= A[i,j]*b[j]
    end
end
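A small experiment shows why the outer loop must stay serial. The Python sketch below is an illustrative port, not Quasar; the function name `forward_sub` and its `order` parameter are hypothetical. It runs the same loop nest with the outer iterations visited in a chosen order. If the iterations over j were independent, any order would give the same result. Reversing them changes the answer, because iteration j reads b[j] only after earlier iterations have subtracted from it. Unlike the loop in the question, the sketch runs j up to n-1 so that the last entry is also divided by its diagonal element.

```python
def forward_sub(A, b, order):
    """Column-oriented forward substitution for a lower-triangular A,
    visiting the outer (column) iterations in the given order."""
    b = list(b)                          # work on a copy
    n = len(b)
    for j in order:
        b[j] = b[j] / A[j][j]            # reads b[j], updated by earlier columns
        for i in range(j + 1, n):        # inner iterations touch distinct b[i]
            b[i] -= A[i][j] * b[j]
    return b


L = [[2, 0, 0],
     [1, 1, 0],
     [3, 2, 4]]
rhs = [2, 3, 19]                         # rhs = L times [1, 2, 3]

print(forward_sub(L, rhs, [0, 1, 2]))    # serial order: [1.0, 2.0, 3.0]
print(forward_sub(L, rhs, [2, 1, 0]))    # another schedule: [1.0, 2.0, -4.25]
```

The inner loop is different. For a fixed j, each iteration writes its own b[i] and reads only b[j] with j < i, so reordering the inner iterations would not change the result.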
Thanks for the report! Because of the atomic operation b[i] -= A[i,j]*b[j], the loop was incorrectly identified as parallelizable. I had to add a special case to the compiler that detects b[j] on the right-hand side of this statement.
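Here is a rough illustration of the kind of special case described above. It is a toy in Python, not the actual Quasar compiler code, and `reads_own_target` is a hypothetical name. The check asks whether an accumulation `x[...] -= expr` also reads `x` inside `expr`. If it does, making each update atomic is not enough, because the value read from b[j] depends on which updates from other iterations have already happened.

```python
import ast


def reads_own_target(stmt_src):
    """Return True if an augmented assignment like `b[i] -= ...` also reads
    the array `b` on its right-hand side (a toy dependency check)."""
    stmt = ast.parse(stmt_src).body[0]
    if not isinstance(stmt, ast.AugAssign):
        return False
    target = stmt.target
    if not (isinstance(target, ast.Subscript) and isinstance(target.value, ast.Name)):
        return False
    name = target.value.id
    # Look for any subscript of the same array anywhere in the right-hand side
    return any(
        isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and node.value.id == name
        for node in ast.walk(stmt.value)
    )


print(reads_own_target("b[i] -= A[i,j]*b[j]"))   # True: b is read, keep serial
print(reads_own_target("b[i] -= A[i,j]*c[j]"))   # False: b is not read
```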
An alternative workaround (besides #pragma force_serial) is to write the update as an explicit assignment instead of using -=:
for j = 0..n-2
    b[j] = b[j] / A[j,j]
    for i = j+1..n-1
        b[i] = b[i] - A[i,j]*b[j]
    end
end
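The rewrite does not change what the loop computes, only the form of the statement the compiler analyzes. A quick Python check of this follows. It is a hypothetical port with hypothetical function names that keeps the question's bounds (j from 0 to n-2, i from j+1 to n-1), runs both spellings serially, and compares the results. With these bounds the last entry is updated but never divided by its diagonal element.

```python
def solve_aug(A, b):
    b = list(b)
    n = len(b)
    for j in range(n - 1):               # j = 0..n-2
        b[j] = b[j] / A[j][j]
        for i in range(j + 1, n):        # i = j+1..n-1
            b[i] -= A[i][j] * b[j]       # original form
    return b


def solve_explicit(A, b):
    b = list(b)
    n = len(b)
    for j in range(n - 1):               # j = 0..n-2
        b[j] = b[j] / A[j][j]
        for i in range(j + 1, n):        # i = j+1..n-1
            b[i] = b[i] - A[i][j] * b[j] # workaround form
    return b


L = [[2, 0, 0], [1, 1, 0], [3, 2, 4]]
rhs = [2, 3, 19]
print(solve_aug(L, rhs) == solve_explicit(L, rhs))  # True
print(solve_explicit(L, rhs))                       # [1.0, 2.0, 12.0]
```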
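% Back substitution on the n-by-n upper-triangular block of A whose
% top-left corner is (i_start, j_start); solves for b[i_start..i_start+n-1] in place.
function [] = bsub_row(A: mat'const'unchecked, b: vec'unchecked, i_start: int, j_start: int, n: int)
    % the last unknown depends only on its own diagonal entry
    b[i_start+n-1] = b[i_start+n-1] / A[i_start+n-1,j_start+n-1]
    % each row needs the already-solved rows below it, so keep this loop serial
    #pragma force_serial
    for i = n-2..-1..0
        for j = i+1..n-1
            b[i_start+i] = b[i_start+i] - A[i_start+i,j_start+j]*b[i_start+j]
        end
        b[i_start+i] = b[i_start+i] / A[i_start+i,j_start+i]
    end
end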
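For checking the index arithmetic, here is a line-by-line Python port of bsub_row. It is a sketch, not Quasar code, and lists of lists stand in for `mat` and `vec`. It solves, in place, the n-by-n upper-triangular block of A whose top-left corner is (i_start, j_start). The outer loop runs bottom-up because row i needs the already-solved entries below it, which is the same kind of loop-carried dependency that #pragma force_serial protects.

```python
def bsub_row(A, b, i_start, j_start, n):
    """In-place back substitution on an n-by-n upper-triangular block of A."""
    last = n - 1
    b[i_start + last] = b[i_start + last] / A[i_start + last][j_start + last]
    for i in range(n - 2, -1, -1):            # i = n-2 down to 0 (serial)
        for j in range(i + 1, n):             # subtract already-solved entries
            b[i_start + i] = b[i_start + i] - A[i_start + i][j_start + j] * b[i_start + j]
        b[i_start + i] = b[i_start + i] / A[i_start + i][j_start + i]


# Upper-triangular block embedded at (1, 1) of a 4x4 matrix; b[0] is untouched.
A = [[9, 9, 9, 9],
     [0, 2, 1, 1],
     [0, 0, 3, 2],
     [0, 0, 0, 4]]
b = [5, 7, 12, 12]        # entries 1..3 hold the block times [1, 2, 3]
bsub_row(A, b, 1, 1, 3)
print(b)                  # [5, 1.0, 2.0, 3.0]
```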