Illegal address error
The following code gives an IllegalAddressError when executing on a GPU. The commented code is my workaround. I’d love to know what the specific reason behind this error is as the description doesn’t give much explanation.
max_GPU = 2048
blk_size = 128

function [] = __device__ fsub_col_unit(A: mat'const'unchecked, b: vec'unchecked, i_start: int, j_start: int, n: int)
    for j = 0..n-2
        for i = j+1..n-1
            b[i_start+i] -= A[i_start+i,j_start+j]*b[i_start+j]
        end
    end
end

function [] = __kernel__ trsv(A: mat'unchecked, i_start: int, j_start: int, k: int, jb: int, pos: ivec1)
    temp = A[:, j_start+k+jb+pos]
    fsub_col_unit(A, temp, i_start, j_start, jb)
    A[:, j_start+k+jb+pos] = temp
end

function [] = f(A: mat'unchecked, i_start: int, j_start: int, cols: int)
    % if cols-blk_size > max_GPU
    %     for k = 0..max_GPU..cols-blk_size-1
    %         jb2 = min(max_GPU, cols-blk_size-k)
    %         parallel_do(jb2, A, i_start, j_start, k, blk_size, trsv)
    %     end
    % else
    parallel_do(cols-blk_size, A, 0, 0, 0, blk_size, trsv)
    % endif
end

dim = 2200
A = rand(dim,dim)
f(A, 0,0,dim)
Most likely, because each thread copies a vector of length 2200 into the variable temp, and because of memory alignment, the kernel uses up all of the available dynamic kernel memory, so a memory allocation fails inside the kernel function. See section 8.3 of the quick reference manual.
In this case, you can easily refactor the code to not use any dynamic kernel memory at all (which is also much faster).
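As a rough sketch of what such a refactoring could look like (untested, and assuming each thread only ever touches its own column), the kernel can update the column of A in place instead of copying it into temp. The names fsub_col_inplace and col are made up for this illustration and are not part of the original code:

```
% Hypothetical in-place variant: no per-thread copy of the column is made,
% so no dynamic kernel memory is needed.
function [] = __device__ fsub_col_inplace(A: mat'unchecked, col: int, i_start: int, j_start: int, n: int)
    for j = 0..n-2
        for i = j+1..n-1
            % Same update as before, with b[...] replaced by A[..., col]
            A[i_start+i, col] -= A[i_start+i, j_start+j] * A[i_start+j, col]
        end
    end
end

function [] = __kernel__ trsv(A: mat'unchecked, i_start: int, j_start: int, k: int, jb: int, pos: ivec1)
    % Each thread owns exactly one column, located to the right of the block being read
    col = j_start+k+jb+pos[0]
    fsub_col_inplace(A, col, i_start, j_start, jb)
end
```

This should give the same result as the original as long as k and pos are non-negative: the columns that are read (j_start up to j_start+jb-2) then never overlap with the column a thread writes (j_start+jb or higher), so the threads do not interfere with each other. The call to parallel_do in f stays the same.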
I will check if I can get a more informative error message, although for these things the CUDA error handling mechanism does not allow a lot of flexibility…