We’ve already written a program that launches a kernel and copies the result back to the host, so let’s extend this to copying the initial tensors from the host to the device. This program will now have three phases:
- Host-to-device memcpy of A, x, and b
- Kernel launch
- Device-to-host memcpy of y
Learning objectives
After completing this tutorial, you should know how to:
- Copy data from host to device using SdkRuntime’s memcpy_h2d function
Example overview
Our program will run on a single processing element (PE). Like the previous tutorials, we will demonstrate the program with a simulated fabric consisting of an 8 x 3 block of PEs.
Our problem steps are nearly identical to the previous tutorials, except we now copy A, x, and b to the
device after initializing them on the host.
pe_program.csl no longer needs to initialize A, x,
and b, but both CSL files will need to be updated to
export symbols for these tensors.
The host code will need to introduce three memcpy_h2d
calls to copy the tensors to the device.
Problem Steps
Visually, this program consists of the following steps:
1. Host copies A, x, b to device.
2. Host launches the kernel.
3. Host copies the result y back from the device.


Modifying the CSL
Our previous tutorials initialized A, x, and b on device
before computing GEMV.
What else do we need for our device code to support a host-to-device
memcpy of A, x, and b, so that we need only initialize
them on the host?
- We need our layout file to export the symbol names for A, x, and b.
- We need our PE program to export pointers to A, x, and b. The PE program no longer needs to initialize these tensors.
Let’s first look at the changes to layout.csl.
@export_name makes symbol names visible
to the host program.
Notice that we now have @export_name calls for A, x, and b.
Unlike y, the mutability of these symbols is set to true,
since the host will write to these symbols.
Now let’s take a look at pe_program.csl.
It no longer needs an initialize function.
When init_and_compute is called, we assume A, x, and b
have already been initialized.
We additionally now define pointers A_ptr, x_ptr, and b_ptr
to A, x, and b, respectively.
These pointers are exported with @export_symbol,
so that they will be visible to the host.
Modifying the host code
The host code is largely similar to the previous tutorials, except we now must copy A, x, and b to the device after
initializing them on the host.
We do this with memcpy_h2d, which has similar syntax to
the previously introduced memcpy_d2h.
Our modified run.py introduces three memcpy_h2d calls, one for each of A,
x, and b.
The arguments are the same as those of memcpy_d2h, other than the first two.
For memcpy_h2d, the first argument is the symbol on device that
points to the array to which you want to copy.
The next argument is the numpy array from which you are copying.
Note that the arrays passed to memcpy must be 1D.
See GEMV Tutorial 1: A Complete Program for an explanation of the remaining
arguments.
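To make the 1D requirement concrete, here is a minimal host-side sketch of preparing the three tensors and issuing one copy per tensor. It does not use the SDK: RecordingRuntime is a hypothetical stand-in for SdkRuntime that only checks and stores what each copy would send, and plain strings stand in for the device symbols. The dimensions M and N, the initial values, and the layout of the remaining positional arguments (PE region origin, region width and height, elements per PE) are assumptions for illustration; the real calls also take the keyword arguments described in the first tutorial.

```python
import numpy as np

# Dimensions assumed for illustration: A is M x N, x has N entries, b has M.
M, N = 4, 6

# Initialize the tensors on the host.
A = np.arange(M * N, dtype=np.float32).reshape(M, N)
x = np.full(N, 1.0, dtype=np.float32)
b = np.full(M, 2.0, dtype=np.float32)


class RecordingRuntime:
    """Hypothetical stand-in for SdkRuntime (not the SDK class)."""

    def __init__(self):
        self.device = {}  # symbol -> data that would be written to the PE

    def memcpy_h2d(self, dest, src, px, py, w, h, elem_per_pe, **kwargs):
        # Reject anything that is not a flat array of the declared size.
        if src.ndim != 1:
            raise ValueError("memcpy source must be a 1D array")
        if src.size != w * h * elem_per_pe:
            raise ValueError("source size must equal w * h * elem_per_pe")
        self.device[dest] = src.copy()


runner = RecordingRuntime()

# The region is a single PE: origin (0, 0), width 1, height 1.
runner.memcpy_h2d("A", A.flatten(), 0, 0, 1, 1, M * N)  # 2D -> 1D, row-major
runner.memcpy_h2d("x", x, 0, 0, 1, 1, N)
runner.memcpy_h2d("b", b, 0, 0, 1, 1, M)
```

Only A needs flattening here, since x and b are already 1D. Passing the 2D array A directly would be rejected by the stand-in, which mirrors the 1D rule stated above.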
Compiling and running the program
We compile and run this code as in the previous tutorial. You should see a SUCCESS! message at the end of execution.
Exercises
Try initializing A, x, and b to other values.
Modify the host code to do multiple matrix-vector products:
Try using your output y from a matrix-vector product
as your input x to another matrix-vector product.
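When chaining products, it helps to have expected values on the host to compare against. The sketch below is a NumPy-only reference, assuming the kernel computes y = Ax + b as in the earlier tutorials. Note that feeding y back in as x only works when A is square, so the size N = 4 and the initial values here are assumptions chosen for illustration, and gemv_reference is a hypothetical helper name.

```python
import numpy as np

# Chaining y back in as x requires a square A; N = 4 is an assumed size.
N = 4

A = np.arange(N * N, dtype=np.float32).reshape(N, N)
x = np.full(N, 1.0, dtype=np.float32)
b = np.full(N, 2.0, dtype=np.float32)


def gemv_reference(A, x, b):
    """Host-side reference for one product, assuming y = A x + b."""
    return A @ x + b


# First product uses the original x; second product reuses y as its input.
y_first = gemv_reference(A, x, b)
y_second = gemv_reference(A, y_first, b)
```

The result copied back from the device after each launch can then be compared against y_first and y_second, for example with np.testing.assert_allclose.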

