Now that we’ve written a complete program, let’s introduce a central concept in CSL: memory Data Structure Descriptors (DSDs). Memory DSDs provide an efficient mechanism for performing operations on entire tensors.
Learning objectives
After completing this tutorial, you should know how to:
- Define memory DSDs for tensor accesses
- Use memory DSDs in builtin operations on tensors
- Use builtins to initialize tensors
Example overview
Our program will run on a single processing element (PE). Like the previous tutorial, we will demonstrate the program with a simulated fabric consisting of an 8 x 3 block of PEs. Our problem steps are identical to the previous tutorial. Our layout file, host code, and compile and run commands are also identical. We only need to modify pe_program.csl, and we’ll take
a closer look at changes to this file.
Modifying the CSL
In the previous tutorial, we created a complete CSL program using a single PE to initialize and compute y = Ax + b.
What do we need to do in pe_program.csl to take advantage of
memory DSDs and builtin operations on tensors?
- We need to define DSDs for accessing our tensors
- We need to rewrite the
gemv function to operate on these DSDs
We do not need to modify layout.csl, which is the same for
this tutorial as the previous one.
We include the new pe_program.csl below, and highlight the
changes in this code.
Defining our memory DSDs
First, let’s take a look at the DSDs we define for accessing b and y:
b_dsd and y_dsd are the memory DSDs for
accessing b and y, respectively.
The tensor_access field defines the access pattern of these DSDs.
|i| specifies the induction variable, and {M} specifies
the loop bound; i.e., these DSDs will access M elements.
After ->, an expression is given for accessing a memory location
using the induction variable.
This expression must be affine, i.e., a linear function of the induction variable plus a constant.
The access pattern for these DSDs is straightforward: these DSDs
loop over all M elements, in order, of their respective tensors.
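The Python model below makes this access pattern concrete. It assumes we only care about which flat element indices a one-dimensional memory DSD visits. The `Mem1dDsd` class and its `length`, `stride`, and `offset` fields are hypothetical names: `length` is the loop bound, and `stride` and `offset` are the two coefficients of the affine expression. They are not part of CSL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mem1dDsd:
    """Hypothetical model of |i|{length} -> t[stride*i + offset]."""
    length: int      # loop bound, the {M} part
    stride: int = 1  # coefficient of the induction variable i
    offset: int = 0  # constant term of the affine expression

    def indices(self):
        """Flat element indices visited, in order, for i = 0 .. length-1."""
        return [self.stride * i + self.offset for i in range(self.length)]

M = 4                       # hypothetical size
b_dsd = Mem1dDsd(length=M)  # models |i|{M} -> b[i]
y_dsd = Mem1dDsd(length=M)  # models |i|{M} -> y[i]
print(b_dsd.indices())      # [0, 1, 2, 3]
```

For b_dsd and y_dsd the stride is 1 and the offset is 0, so the model visits elements 0 through M-1 in order.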
Now let’s take a look at the DSD for accessing A:
A_dsd also accesses M elements of A, but strided by N elements;
i.e., A_dsd accesses elements 0, N, 2*N, ..., (M-1)*N.
Because A is stored in row major format, this means that A_dsd
as defined here accesses the 0th column of A.
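The plain-Python sketch below checks this claim on a small, hypothetical 4 x 6 matrix stored row-major in a flat list. It confirms that the indices 0, N, 2*N, ..., (M-1)*N all map back to column 0. This is not CSL; it only models the index arithmetic.

```python
M, N = 4, 6  # hypothetical sizes

# Row-major storage: element (r, c) lives at flat index r*N + c.
A_flat = [float(r * N + c) for r in range(M) for c in range(N)]

# The pattern |i|{M} -> A[i*N] visits 0, N, 2*N, ..., (M-1)*N.
a_indices = [i * N for i in range(M)]
column0 = [A_flat[k] for k in a_indices]

# divmod(k, N) recovers (row, column) from a flat row-major index.
print([divmod(k, N) for k in a_indices])  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```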
Note
These memory DSDs are of type
mem1d_dsd, which are one-dimensional
memory DSDs. CSL also provides mem4d_dsd, multidimensional memory
DSDs for up to four dimensions. You can learn more about memory DSDs in our language reference guide,
Data Structure Descriptors.
Using our DSDs to compute GEMV
Now that we’ve defined our DSDs, let’s take a look at how to use them to compute GEMV. Recall that our previous gemv() function was defined as follows:
Our new gemv() looks like this:
This version of gemv() uses only a single loop of length N,
instead of two explicit loops.
At each iteration, this @fmacs operation does the following:
- performs a vector-scalar multiplication between the column of A referenced by A_dsd and the scalar x[i],
- performs an elementwise vector addition between this result and the vector y,
- and stores this final result into y.
In other words, this @fmacs operation increments the M elements of y
by the vector-scalar product of column i of A
and element i of x.
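The minimal Python sketch below shows what one such step computes. It assumes the semantics described above, with y serving as both an input and the destination. The `fmacs` function and the sample values are hypothetical. The sketch models the arithmetic only, not the CSL builtin or its argument order.

```python
def fmacs(dest, addend, vec, scalar):
    """Hypothetical model: dest[k] = addend[k] + vec[k] * scalar for all k."""
    for k in range(len(dest)):
        dest[k] = addend[k] + vec[k] * scalar

y = [0.5, 1.0, 1.5]        # hypothetical current contents of y
A_col_i = [1.0, 2.0, 3.0]  # hypothetical column i of A
x_i = 2.0                  # hypothetical scalar x[i]

# y is both the addend and the destination, as described in the text.
fmacs(y, y, A_col_i, x_i)
print(y)  # [2.5, 5.0, 7.5]
```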
The @increment_dsd_offset operation at each loop iteration increments
A_dsd to reference the next column of A.
This builtin operation takes A_dsd and creates a new DSD by offsetting
its access by 1 f32 element.
For instance, the first time this operation occurs, A_dsd will now
access elements 1, N+1, 2*N+1, ..., (M-1)*N+1 of A.
Again, because A is stored row major,
this will access the 1st column of A.
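The plain-Python sketch below uses hypothetical sizes. It shows how repeatedly adding one element to the offset of the pattern i*N walks across the columns of a row-major matrix. Offsets are counted in elements, and the real builtin's interface is not modeled.

```python
M, N = 3, 4  # hypothetical sizes

offset = 0          # the pattern starts as |i|{M} -> A[i*N + 0]
visited = []        # flat indices touched at each loop iteration
for _ in range(N):
    visited.append([i * N + offset for i in range(M)])
    offset += 1     # "increment the offset by one element"

for col, idx in enumerate(visited):
    # In row-major order, k % N is the column number of flat index k.
    print(col, idx, [k % N for k in idx])
```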
Once this loop over the N columns of A is complete,
y contains the result of A*x.
The @fadds operation performs an elementwise vector addition between
y and b, storing the result back in y.
Now y contains the result of A*x + b.
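The sketch below puts the pieces together. It emulates the DSD-based GEMV in plain Python on a flat row-major buffer and compares the result to the direct nested-loop computation. The matrix values and sizes are hypothetical. x and b use the all-ones and all-twos values this tutorial gives them.

```python
M, N = 4, 6  # hypothetical sizes

A = [float(r + 2 * c) for r in range(M) for c in range(N)]  # row-major M x N
x = [1.0] * N
b = [2.0] * M
y = [0.0] * M

offset = 0                              # A_dsd starts at column 0
for i in range(N):
    # fmacs step: y[k] = y[k] + A[k*N + offset] * x[i] for k in 0..M-1
    for k in range(M):
        y[k] = y[k] + A[k * N + offset] * x[i]
    offset += 1                         # move to the next column of A

# fadds step: y[k] = y[k] + b[k]
for k in range(M):
    y[k] = y[k] + b[k]

# Reference: explicit nested loops over rows and columns.
expected = [sum(A[r * N + c] * x[c] for c in range(N)) + b[r] for r in range(M)]
print(y == expected)
```

The column-by-column ordering is the key design choice. Each iteration updates all M entries of y at once, which is what lets a single vector operation replace the inner loop.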
Using builtins to initialize tensors
You may have noticed one other slight change to this code. Instead of initializing x, b, and y in the initialize function,
we make use of builtins to provide values for them at declaration:
The @constants builtin returns a tensor of the specified type,
with all elements initialized to the specified value.
Thus, x is initialized as an N element tensor of all ones,
and b is initialized as an M element tensor of all twos.
The @zeros builtin works similarly, filling a tensor with zeros: y is initialized as an M
element tensor of all zeros.
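The Python sketch below is a quick analog of these declarations. The `constants` and `zeros` helpers are hypothetical stand-ins for the builtins, not CSL. With x all ones, each entry of A*x is simply a row sum of A, so every entry of y ends up as its row sum plus 2. The matrix A below is hypothetical.

```python
M, N = 4, 6  # hypothetical sizes

def constants(n, value):
    """Hypothetical analog of @constants: a list of n copies of value."""
    return [float(value)] * n

def zeros(n):
    """Hypothetical analog of @zeros: a list of n zeros."""
    return constants(n, 0.0)

x = constants(N, 1.0)  # N ones
b = constants(M, 2.0)  # M twos
y = zeros(M)           # M zeros

# Hypothetical matrix: every row is 1, 2, ..., 6, so each row sums to 21.
A = [[float(c + 1) for c in range(N)] for _ in range(M)]

# Accumulate y = A*x + b into the zero-initialized y.
for r in range(M):
    for c in range(N):
        y[r] += A[r][c] * x[c]
    y[r] += b[r]

print(y)  # [23.0, 23.0, 23.0, 23.0]
```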
Compiling and running the program
As with the previous tutorial, we compile and run this code using the same commands. You should see a SUCCESS! message at the end of execution.
Exercises
A is stored row-major in the above code.
How would you rewrite A_dsd and the gemv function
if A were stored column major instead?
Next
In the next tutorial, we’ll introduce host-to-device memcpy,
and copy host-initialized values for A, x, and b onto
the device.
