How to’s

The Three Function Types of Quasar from a Technical Perspective

March 20, 2018

To give programmers ultimate control on the acceleration of functions on compute devices (GPU, multi-core CPU), Quasar has three distinct function types:

Host Functions: host functions are the default functions in Quasar. Depending on the compilation target, they are either interpreted or compiled to binary CPU code. Host functions run on the CPU and form a glue for the code that runs on other devices. When host functions contain for-loops, these loops are (when possible) automatically parallelized by the Quasar compiler. The parallelized version then runs on multi-core CPU or GPU; what happens in the background is that a kernel function is generated. Within the Quasar Redshift IDE, host functions are interpreted, which also allows stepping through the code and putting breakpoints.
Kernel Functions: given certain data dimensions, kernel functions specify what needs to be done to each element of the data. The task can be either a parallel task or a serial task. Correspondingly, kernel functions are launched either using the parallel_do function or using the serial_do function:
```
function [] = __kernel__ filter(image, pos)
    ...
endfunction
parallel_do(size(image),image,filter)
```
In the above example, the kernel function filter is applied in parallel to each pixel component of the image image. The first parameter of parallel_do/serial_do is always the data dimension (i.e., the size of the data). The last parameter is the kernel function to be called, and in the middle are the parameters to be passed to the kernel function. pos is a special kernel function parameter that indicates the position within the data (several other parameters exist, such as blkpos, blkdim etc., see the documentation for a detailed list).

Kernel functions can call other kernel functions, but only through serial_do and/or parallel_do. The regular function calling syntax (direct call, i.e, without serial_do/parallel_do) on kernel functions is not permitted. This is because kernel functions have other infrastructure (e.g., shared memory, device-specific parameters) which would cause conflicts when kernel functions would call each other.
Device functions: in contrast to host functions, device functions (indicated with __device__) are intended to be natively compiled using the back-end compiler. They can be seen as auxiliary functions that can be called from either kernel functions, device functions or even host functions (note that a kernel function cannot call a host function). The compiler back-end may generate C++, CUDA or OpenCL code that is then further compiled to binary code to be executed on CPU and/or GPU. Currently, apart from the CPU there are no accelerators that can run a device function directly when called from a host function. This means, for a GPU, the only way to call a device function is through a kernel function.
```
function y = __device__ norm(vector : vec)
    y = sqrt(sum(vector.^2))
endfunction
```

Functions in Quasar are first-class variables. Correspondingly, they have a data type. Below are some example variable declarations:

host : [(cube, cube) -> scalar]
filter : [__kernel__ (cube) -> ()]
norm : [__device__ (vec) -> scalar]

This allows passing functions to other functions, where the compiler can statically check the types. Additionally, to prevent functions from being accessed unintendedly, functions can be nested. For example, a gamma correction operation can be implemented using a host function containing a kernel function.

    function y : mat = gamma_correction(x : mat, gamma_fn : [__device__ scalar->scalar])
        function [] = __kernel__ my_kernel(gamma_fn : [__device__ scalar->scalar], x : cube, y : cube, pos : ivec2)    
           y[pos] = gamma_fn(x[pos])
        endfunction
        y = uninit(size(x)) % Allocation of uninitialized memory
        parallel_do(size(x), gamma_fn, x, y, my_kernel)
    endfunction

Because functions support closures (i.e., they capture the variable values of the surround contexts at the moment the function is defined), the above code can be simplified to:

    function y : mat = gamma_correction(x : mat, gamma_fn : [__device__ scalar->scalar])
        function [] = __kernel__ my_kernel(pos : ivec2)    
           y[pos] = gamma_fn(x[pos])
        endfunction
        y = uninit(size(x))
        parallel_do(size(x), my_kernel)
    endfunction

Or even in shorter form, using lambda expression syntax:

function y : mat = gamma_correction(x : mat, gamma_fn : [__device__ scalar->scalar])
    y = uninit(size(x))
    parallel_do(size(x), __kernel__ (pos) -> y[pos] = gamma(x[pos]))
end

In many cases, the type of the variables can safely be omitted (e.g., pos in the above example) because the compiler can deduce it from the context.

Now, experienced GPU programmers may wonder about the performance cost of a function pointer call from a kernel function. The good news is that the function pointer call is avoided, by automatic function specialization (a bit similar to template instantiation in C++). With this mechanism, device functions passed via the function parameters or closure can automatically be inlined in a kernel function.

function y = gamma_correction(x, gamma_fn)
    y = uninit(size(x))
    parallel_do(size(x), __kernel__ (pos) -> y[pos] = gamma(x[pos]))
end
gamma = __device__ (x) -> 255*(x/255)^2.4
im_out = gamma_correction(im_in, gamma)

This will cause the following kernel function to be generated, behind the hood:

function [] = __kernel__ opt__gamma_correction(pos:int,y:vec,x:vec)
      % Inlining of function 'gamma' with parameters 'x'=x[pos]
      y[pos]=(255*((x[pos]/255)^2.4))
endfunction

Another optimization was performed in the background: kernel flattening. Because each element of the data is accessed separately, the data is processed as a vector (using raster scanning) rather than as a matrix (or cube) representation. This entirely eliminates costly index calculations.

Summarizing, three function types offer the flexibility of controlling on which target devices (CPU or GPU) certain code fragments might run, while offering a unified programming interface: the code written for GPU can be executed on CPU as well, thereby triggering other target-specific optimizations.

Focal Black & White effect

April 27, 2016

Focal Black & White Effect

A well known Google Picasa effect is the Focal Black & White Effect. This effect preserves the color within a focal region and converts pixels outside this region to grayscale.

The algorithm is surprisingly simple: it consists of calculating a weighting factor (that depends on the focal radius), converting the pixel RGB values at each position to grayscale, and calculating a weighted average of this value with the original RGB value. Let us see how this can be achieved in Quasar:

function [] = __kernel__ focus_bw(x, y, focus_spot, falloff, radius, pos)

    % Calculation of the weight
    p = (pos - focus_spot) / max(size(x,0..1))
    weight = exp(-1.0 / (0.05 + radius * (2 * dotprod(p,p)) ^ falloff))

    % Conversion to grayscale & averaging
    rgbval = x[pos[0],pos[1],0..2]
    grayval = dotprod(rgbval,[0.3,0.59,0.11])*[1,1,1]
    y[pos[0],pos[1],0..2] = lerp(grayval, rgbval, weight)

end

Code explanation

First, the kernel function focus_bw is defined. __kernel__ is a special function qualifier that identifies a kernel function (similar to OpenCL’s __kernel or kernel qualifier). Kernel functions are natively compiled to any target architecture that you have in mind. This can be multi-core CPU x86/64 ELF, even ARM v9 with Neon instructions; up to NVidia PTX code. Next, a parameter list follows, the type of the parameters is (in this case) not specified.

Note the special parameter pos. In Quasar, pos is a parameter name that is reserved for kernel functions to obtain the current position in the image.

The kernel function contains two major blocks:

In the weight calculation step, first the relative position compared to the focus spot coordinates is computed. This relative position is normalized by dividing by the maximum size in the first two dimensions (note that Quasar uses base-0), so this is the maximum of the width and the height of the image. Next, the weight is obtained as being inversely proportional to distance to the the focal spot. A special built-in function dotprod, that can also be found in high-level shader languages (HLSL/GLSL) and that calculates the dot product between two vectors, is used for this purpose.
For extracting the RGB value at the position pos, we use a matrix slice indexer: 0..2 constructs a vector of length 3 (actually [0,1,2]), which vector is used for indexing. In fact:
```
x[pos[0],pos[1],0..2] = [x[pos[0],pos[1],0],x[pos[0],pos[1],1],x[pos[0],pos[1],2]]        
```
which form do you prefer, the left-handed side, or the right-handed side? You can choose. Note that it is not possible to write: x[pos[0..1],0..2], because this expression would construct a matrix of size 2 x 3.
The gray value is calculated by performing the dot product of [0.3,0.59,0.11] with the original RGB value. Finally, the gray value is mixed with the original RGB value using the lerp “linear interpolation” function. In fact, lerp is nothing more than the function:
```
lerp = (x,y,a) -> (1-a) * x + a * x
```
The resulting RGB value is written to the output image y. That’s it!

Finally, we still need to call the kernel function. For this, we use the parallel_do construct:

img_in = imread("quasar.jpg")
img_out = zeros(size(img_in))

parallel_do(size(img_out,0..1),img_in,img_out,[256,128],0.5,10,focus_bw)
imshow(img_out)

First, an input image img_in is loaded using the function imread “image read”. Then, an output image is allocated with the same size as the input image.

The parallel_do function is called, with some parameters. The first parameter specifies the dimensions of the “work items” that can run in parallel. Here, each pixel of the image can be processed in parallel, hence the dimensions are the size (i.e., height + width) of the output image. The following parameters are argument values that are passed to the kernel function and that are declared in the kernel function definition. Finally, the kernel function to be called is passed.

Note that in contrast to scripting languages that are dynamically typed, the Quasar language is (mostly) statically typed and the Quasar compiler performs type inference in order to derive the data types of all the parameters. This is done based on the surrounding context. Here, Quasar will find out that img_in is a cubedata type (a 3D array) and it will derive all other missing data types based on that. Consequently, efficient parallel code can be generated in a manner that is independent of the underlying platform.

Now: the complete code again:

function [] = __kernel__ focus_bw(x, y, focus_spot, falloff, radius, pos)  
    p = (pos - focus_spot) / max(size(x,0..1))
    weight = exp(-1.0 / (0.05 + radius * (2 * dotprod(p,p)) ^ falloff))
    rgbval = x[pos[0],pos[1],0..2]
    grayval = dotprod(rgbval,[0.3,0.59,0.11])*[1,1,1]
    y[pos[0],pos[1],0..2] = lerp(grayval, rgbval, weight)                    
end

img_in = imread("flowers.jpg")
img_out = zeros(size(img_in))

parallel_do(size(img_out,0..1),img_in,img_out,[256,128],0.5,10,focus_bw)
imshow(img_out)

Example

With eleven lines of code, you have a beautifully shining Focal Black & White effect:

Posted in How to's, Use case and tagged Quasar example, visualization

Quasar for Matlab/Octave users

April 19, 2016 |

This section gives an overview of various functions and operators in Matlab and Quasar. Please feel free to extend this page if you find something that is missing.

Quick hints (see reference manual for a complete description):

Matrices are passed by reference rather than by value. If necessary, use the copy(.) function.
Indices start with 0 rather than with 1.
end variable, like in A(end,end) is currently not supported.
All functions work transparently on the GPU (in MATLAB you need to use special datatypes GPUsingle, GPUdouble, …)

Arithmetic operators

	MATLAB/Octave	Quasar
Assignment	a=1; b=2;	a=1; b=2
Multiple assignment	[a,b]=[1,2];	[a,b]=[1,2]
Addition	a+b	a+b
Subtraction	a-b	a-b
Multiplication	a*b	a*b
Division	a/b	a/b
Power	a^b	a^b
Remainder	rem(a,b)	periodize(a,b)
Addition (atomic)	a+=1	a+=1
Pointwise multiplication	a.*b	a.*b
Pointwise division	a./b	a./b
Pointwise power	a.^b	a.^b

Relational operators

	MATLAB/Octave	Quasar
Equality	a==b	a==b
Less than	a < b	a < b
Greater than	a > b	a > b
Less than or equal	a <= b	a <= b
Greater than or equal	a >= b	a >= b
Inequal	a ~= b	a != b

Logical operators

	MATLAB/Octave	Quasar
Logical AND	a && b	a && b
Logical OR	a \|\| b	a \|\| b
Logical NOT	~a	!a
Logical XOR	xor(a,b)	xor(a,b)
Bitwise AND	a & b	and(a,b)
Bitwise OR	a \| b	or(a,b)
Bitwise NOT	~a	not(a)
Bitwise XOR	xor(a,b)	xor(a,b)

Square root, logarithm

	MATLAB/Octave	Quasar
Square root	sqrt(a)	sqrt(a)
Logarithm, base e	log(a)	log(a)
Logarithm, base 2	log2(a)	log2(a)
Logarithm, base 10	log10(a)	log10(a)
Exponential function	exp(a)	exp(a)

Rounding

	MATLAB/Octave	Quasar
Round	round(a)	round(a)
Round up	ceil(a)	ceil(a)
Round down	floor(a)	floor(a)
Round towards zero	fix(a)	Not implemented yet
Fractional part	frac(a)	frac(a)

Mathematical constants

	MATLAB/Octave	Quasar
pi	pi	pi
e	exp(1)	exp(1)

Missing values (IEEE-754 floating point status flags)

	MATLAB/Octave	Quasar
Not a number	NaN	0/0
+Infinity	+inf	1/0
-Infinity	-inf	-1/0
Is not a number	isnan(x)	isnan(x)
Is infinity	isinf(x)	isinf(x)
Is finite	isfinite(x)	isfinite(x)

Complex numbers

	MATLAB/Octave	Quasar
Imaginary unit	i	1i or 1j
Complex number 3+4i	z=3+4i	z=3+4i or z=complex(3,4)
Modulus	abs(z)	abs(z)
Argument	arg(z)	angle(z)
Real part	real(z)	real(z)
Imaginary part	imag(z)	imag(z)
Complex conjugate	conj(z)	conj(z)

Trigonometry

	MATLAB/Octave	Quasar
Sine	sin(x)	sin(x)
Cosine	cos(x)	cos(x)
Tangent	tan(x)	tan(x)
Hyperbolic Sine	sinh(x)	sinh(x)
Hyperbolic Cosine	cosh(x)	cosh(x)
Hyperbolic Tangent	tanh(x)	tanh(x)
Arcus Sine	asin(x)	asin(x)
Arcus Cosine	acos(x)	acos(x)
Arcus Tangent	atan(x)	atan(x)
Arcus Hyperbolic Sine	asinh(x)	asinh(x)
Arcus Hyperbolic Cosine	acosh(x)	acosh(x)
Arcus Hyperbolic Tangent	atanh(x)	atanh(x)

Random numbers

	MATLAB/Octave	Quasar
Uniform distribution [0,1[	rand(1,10)	rand(1,10)
Gaussian distribution	randn(1,10)	randn(1,10)
Complex Gaussian distribution	randn(1,10)+1i*randn(1,10)	complex(randn(1,10),randn(1,10))

Vectors

	MATLAB/Octave	Quasar	Comment
Row vector	a=[1 2 3 4];	a=[1,2,3,4]
Column vector	a=[1; 2; 3; 4];	a=transpose([1,2,3,4])
Sequence	1:10	1..10
Sequence with steps	1:2:10	1..2..10
Decreasing sequence	10:-1:1	10..-1..1
Linearly spaced vectors	linspace(1,10,5)	linspace(1,10,5)
Reverse	reverse(a)	reverse(a)	import “system.q”

Concatenation

Quasar notes: please avoid because this programming approach is not very efficient. More efficient is to create the resulting matrix using zeros(x), then use slice accessors to fill in the different parts.

	MATLAB/Octave	Quasar	Comments
Horizontal concatenation	[a b]	cat(a,b)	import “system.q”
Vertical concatenation	[a; b]	vertcat(a,b)	import “system.q”
Vertical concatenation with sequences	[0:5; 6:11]	vertcat(0..5,6..11)	import “system.q”
Repeating	repmat(a,[1 2])	repmat(a,[1,2])

Matrices

	MATLAB/Octave	Quasar	Comment
The first 10 element X+Y	a(1:10,1:10);	a[0..9,0..9]
Skip the first elements X+Y	a(2:end,2:end);	a[1..size(a,0)-1,1..size(a,1)-1]
The last elements	a(end,end)	a[size(a,0..1)-1]
10th row	a(10,:)	a[9,:]
10th column	a(:,10)	a[:,9]
Diagonal elements	diag(a)	diag(a)	import “system.q”
Matrix with diagonal 1,2,…,10	diag(1..10)	diag(0..9)	import “system.q”
Reshaping	reshape(a,[1 2 3])	reshape(a,[1,2,3])
Transposing	transpose(a)	transpose(a)
Hermitian Transpose	a’	herm_transpose(a)	import “system.q”
Copying matrices	a=b	a=copy(b)	Matrices passed by reference for efficiency
Flip upsize-down	flipud(a)	flipud(a)
Flip left-right	fliplr(a)	fliplr(a)

Matrix dimensions

	MATLAB/Octave	Quasar	Comments
Number of elements	numel(a)	numel(a)
Number of dimensions	ndims(a)	ndims(a)	import “system.q”
Size	size(a)	size(a)
Size along 2nd dimension	size(a,2)	size(a,1)
Size along 2nd & 3rd dimension	[size(a,2),size(a,3)]	size(a,1..2)

Array/matrix creation

	MATLAB/Octave	Quasar	Comment
Identity matrix	eye(3)	eye(3)
Zero matrix	zeros(2,3)	zeros(2,3)
Ones matrix	ones(2,3)	ones(2,3)
Uninitialized matrix	–	uninit(2,3)
Complex identity matrix	complex(eye(3))	complex(eye(3))
Complex zero matrix	complex(zeros(2,3))	czeros(2,3)
Complex ones matrix	complex(ones(2,3))	complex(ones(2,3))
Complex Uninitialized matrix	–	cuninit(2,3)
Vector	zeros(1,10)	zeros(10) or vec(10)	vec enables generic programming
Matrix	zeros(10,10)	zeros(10,10) or mat(10,10)	mat enables generic programming
Cube	zeros(10,10,10)	zeros(10,10,10) or cube(10,10,10)	cube enables generic programming

Maximum and minimum

	MATLAB/Octave	Quasar
Maximum of two values	max(a,b)	max(a,b)
Minimum of two values	min(a,b)	min(a,b)
Maximum of a matrix	max(max(a))	max(a)
Minimum of a matrix along dimension X=1,2	min(a,[],X)	min(a,X-1)
Maximum of a matrix along dimension X=1,2	max(a,[],X)	max(a,X-1)

Matrix assignment

	MATLAB/Octave	Quasar	Comments
Assigning a constant to a matrix slice	a(:,1)=44;	a[:,0]=44
Assigning a vector to a matrix row slice	a(:,1)=0:99	a[:,0]=0..99
Assigning a vector to a matrix column slice	a(1,:)=(0:99)’	a[0,:]=transpose(0..99)
Replace all positive elements by one	a(a>0)=1	a[a>0]=1	New

Boundary handling, clipping (coordinate transforms)

	MATLAB/Octave	Quasar	Comments
Periodic extension	rem(x,N)	periodize(x,N)
Mirrored extension	–	mirror_ext(x,N)
Clamping	–	clamp(x,N)	returns 0 when x=N, otherwise x.
Saturation between 0 and 1	–	sat(x)	returns 0 when x=1, otherwise x.

Fourier transform

	MATLAB/Octave	Quasar	Comments
1D FFT	fft(x)	fft1(x)
2D FFT	fft2(x)	fft2(x)
3D FFT	fftN(x)	fft3(x)
1D inverse FFT	ifft(x)	ifft1(x)
2D inverse FFT	ifft2(x)	ifft2(x)
3D inverse FFT	ifftN(x)	ifft3(x)
1D fftshift	fftshift(x)	fftshift1(x)	import “system.q”
2D fftshift	fftshift(x)	fftshift2(x)	import “system.q”; Works correctly or images with size MxNx3
3D fftshift	fftshift(x)	fftshift3(x)	import “system.q”
1D inverse fftshift	ifftshift(x)	ifftshift1(x)	import “system.q”
2D inverse fftshift	ifftshift(x)	ifftshift2(x)	import “system.q”; Works correctly or images with size MxNx3
3D inverse fftshift	ifftshift(x)	ifftshift3(x)	import “system.q”

Conversion between integer and floating point

	MATLAB/Octave	Quasar
To single	single(x)	float(x)	32-bit FP precision mode only
To double	double(x)	float(x)	64-bit FP precision mode only
To 32-bit signed integer	int32(x)	int(x) or int32(x)	import “inttypes.q”
To 16-bit signed integer	int16(x)	int16(x)	import “inttypes.q”
To 8-bit signed integer	int8(x)	int8(x)	import “inttypes.q”
To 32-bit unsigned integer	uint32(x)	uint32(x)	import “inttypes.q”
To 16-bit unsigned integer	uint16(x)	uint16(x)	import “inttypes.q”
To 8-bit unsigned integer	uint8(x)	uint8(x)	import “inttypes.q”

Posted in Documentation, How to's and tagged Matlab, Must-read Resources, Octave, programming, quasar

1 Comment