This section presents the SdkRuntime Python host API reference and associated utilities to develop kernels for the Cerebras Wafer-Scale Engine.

sdkruntimepybind module

Python API for SdkRuntime functions.

MemcpyDataType

class cerebras.sdk.runtime.sdkruntimepybind.MemcpyDataType
Bases: Enum
Specifies the data size for transfers that use SdkRuntime.memcpy_d2h() and SdkRuntime.memcpy_h2d() in copy mode.
Values:
  • MEMCPY_16BIT
  • MEMCPY_32BIT

MemcpyOrder

class cerebras.sdk.runtime.sdkruntimepybind.MemcpyOrder
Bases: Enum
Specifies the mapping of data for transfers that use SdkRuntime.memcpy_d2h() and SdkRuntime.memcpy_h2d() in copy mode.
Values:
  • ROW_MAJOR
  • COL_MAJOR

SdkCompileArtifacts

class cerebras.sdk.runtime.sdkruntimepybind.SdkCompileArtifacts(artifacts_path: str)
Bases: object
Specifies compile artifacts for execution.

SdkExecutionPlatform

class cerebras.sdk.runtime.sdkruntimepybind.SdkExecutionPlatform
Bases: object
Specifies the simulator or system target and architecture for execution.

SdkRuntime

class cerebras.sdk.runtime.sdkruntimepybind.SdkRuntime(bindir: Union[pathlib.Path, str], **kwargs)
Bases: object
Manages the execution of SDK programs on the Cerebras Wafer-Scale Engine (WSE) or simfabric. The constructor analyzes the WSE ELFs in bindir and prepares the WSE or simfabric for a run. WSE runs require the CM IP address and port.

SdkTarget

class cerebras.sdk.runtime.sdkruntimepybind.SdkTarget
Bases: Enum
Specifies a target compilation architecture.
Values:
  • WSE2
  • WSE3

SimfabConfig

class cerebras.sdk.runtime.sdkruntimepybind.SimfabConfig(num_threads: int = 16, suppress_trace: bool = False, dump_core: bool = False, core_path: Optional[Union[pathlib.Path, str]] = None)
Bases: object
Specifies simfab configuration for simulator runs.

Task

class cerebras.sdk.runtime.sdkruntimepybind.Task
Handle to a task launched by SdkRuntime.

get_platform

cerebras.sdk.runtime.sdkruntimepybind.get_platform(cmaddr: Optional[str] = None, config: SimfabConfig = SimfabConfig(), target: SdkTarget = SdkTarget.WSE3) → SdkExecutionPlatform
Constructs an SdkExecutionPlatform object configured by simulator or system settings and target architecture.

get_simulator

cerebras.sdk.runtime.sdkruntimepybind.get_simulator(config: SimfabConfig = SimfabConfig(), target: SdkTarget = SdkTarget.WSE3) → SdkExecutionPlatform
Constructs an SdkExecutionPlatform object for simulator.

get_system

cerebras.sdk.runtime.sdkruntimepybind.get_system(cmaddr: str) → SdkExecutionPlatform
Constructs an SdkExecutionPlatform object for a real system.

sdk_utils module

Utility functions for common operations with SdkRuntime. Import from cerebras.sdk.sdk_utils.

calculate_cycles

cerebras.sdk.sdk_utils.calculate_cycles(timestamp_buf: numpy.ndarray) → numpy.int64
Converts values in timestamp_buf returned from device into a human-readable elapsed cycle count.
  • Returns: Elapsed cycle count.
  • Return type: numpy.int64
Example: Consider the following CSL snippet, which records a start and an end timestamp and packs them into a single array that is copied back to the host to compute an elapsed cycle count:
// import time module and create timestamp buffers
const timestamp = @import_module("<time>");
var tsc_end_buf = @zeros([timestamp.tsc_size_words]u16);
var tsc_start_buf = @zeros([timestamp.tsc_size_words]u16);

// create elapsed timer buffer and advertise to host
var timer_buf = @zeros([3]f32);
var ptr_timer_buf: [*]f32 = &timer_buf;

timestamp.enable_tsc();
// record starting timestamp
timestamp.get_timestamp(&tsc_start_buf);

// perform some operation for which you want to calculate elapsed cycles

// record ending timestamp
timestamp.get_timestamp(&tsc_end_buf);
timestamp.disable_tsc();

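// pack the six u16 timestamp words (three start, three end) into three
// 32-bit words, stored bit-for-bit in the f32 buffer
var lo_: u16 = 0;
var hi_: u16 = 0;
var word: u32 = 0;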

lo_ = tsc_start_buf[0];
hi_ = tsc_start_buf[1];
timer_buf[0] = @bitcast(f32, (@as(u32,hi_) << @as(u16,16)) | @as(u32, lo_) );

lo_ = tsc_start_buf[2];
hi_ = tsc_end_buf[0];
timer_buf[1] = @bitcast(f32, (@as(u32,hi_) << @as(u16,16)) | @as(u32, lo_) );

lo_ = tsc_end_buf[1];
hi_ = tsc_end_buf[2];
timer_buf[2] = @bitcast(f32, (@as(u32,hi_) << @as(u16,16)) | @as(u32, lo_) );
Then the elapsed cycles can be calculated on the host with:
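# Imports assumed by this snippet; runner is an existing SdkRuntime instance
import numpy as np
from cerebras.sdk import sdk_utils
from cerebras.sdk.runtime.sdkruntimepybind import MemcpyDataType, MemcpyOrder

# Get symbol for timer_buf on device
symbol_timer_buf = runner.get_id("timer_buf")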

# Copy back timer_buf from all width x height PEs
data = np.zeros((width*height*3, 1), dtype=np.uint32)
runner.memcpy_d2h(data, symbol_timer_buf, 0, 0, width, height, 3, streaming=False,
  data_type=MemcpyDataType.MEMCPY_32BIT, order=MemcpyOrder.ROW_MAJOR, nonblock=False)
elapsed_time_hwl = data.view(np.float32).reshape((height, width, 3))

# Print elapsed cycles for each PE
for pe_x in range(width):
  for pe_y in range(height):
    cycle_cnt = sdk_utils.calculate_cycles(elapsed_time_hwl[pe_y, pe_x, :])
    print("Elapsed cycles on PE ", pe_x, ", ", pe_y, ": ", cycle_cnt)
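To make the packing concrete, here is a minimal sketch of how the three 32-bit words in timer_buf could be unpacked into two 48-bit timestamps on the host. It is not the sdk_utils implementation: the helper name elapsed_cycles_from_words is hypothetical, and the word layout is assumed to match the CSL snippet above exactly.

```python
import numpy as np


def elapsed_cycles_from_words(timestamp_buf: np.ndarray) -> np.int64:
    """Hypothetical stand-in for calculate_cycles; assumes the CSL packing above."""
    arr = np.ascontiguousarray(timestamp_buf)
    if arr.dtype.itemsize != 4 or arr.size != 3:
        raise ValueError("expected three 32-bit words")
    # Reinterpret the values as raw 32-bit words without changing any bits.
    w0, w1, w2 = (int(w) for w in arr.view(np.uint32).ravel())
    # word 0: start[1]:start[0], word 1: end[0]:start[2], word 2: end[2]:end[1]
    start = w0 | ((w1 & 0xFFFF) << 32)
    end = (w1 >> 16) | (w2 << 16)
    return np.int64(end - start)


# Round trip with made-up timestamps (not real device data).
start, end = 0x0001_2345_6789, 0x0001_2345_9999
s = [(start >> (16 * i)) & 0xFFFF for i in range(3)]
e = [(end >> (16 * i)) & 0xFFFF for i in range(3)]
packed = np.array(
    [(s[1] << 16) | s[0], (e[0] << 16) | s[2], (e[2] << 16) | e[1]],
    dtype=np.uint32,
)
cycles = elapsed_cycles_from_words(packed.view(np.float32))
```

The buffer is typed f32 only so that it can be transferred as 32-bit data; the host must reinterpret the bits rather than convert the values, which is why the sketch uses view() and not astype().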

input_array_to_u32

cerebras.sdk.sdk_utils.input_array_to_u32(arr: numpy.ndarray, sentinel: Optional[int], fast_dim_sz: int) → numpy.ndarray
Converts a 16-bit tensor to a 32-bit tensor of type u32 for use with memcpy. The sentinel parameter selects one of two ways to extend the 16-bit data. If sentinel is None, the upper 16 bits are zero-padded. If sentinel is not None, the index along the innermost dimension of the array is packed into the upper 16 bits.
  • Returns: Numpy view into arr with specified numpy data type.
  • Return type: numpy.ndarray.view
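The two extension modes can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the library's code: the name extend_u16_to_u32 is hypothetical, the output is assumed to be flattened, the index is assumed to be the element's position modulo fast_dim_sz, and the sketch only checks whether sentinel is None (how its value is otherwise used is not described here).

```python
from typing import Optional

import numpy as np


def extend_u16_to_u32(
    arr: np.ndarray, sentinel: Optional[int], fast_dim_sz: int
) -> np.ndarray:
    """Hypothetical sketch of the two 16-to-32-bit extensions described above."""
    if arr.dtype.itemsize != 2:
        raise ValueError("expected a 16-bit array")
    # Reinterpret the 16-bit payload as raw bits, then widen to 32 bits.
    low = np.ascontiguousarray(arr).view(np.uint16).ravel().astype(np.uint32)
    if sentinel is None:
        # Zero-padded: the upper 16 bits of every word stay 0.
        return low
    # Assumption: pack the position along the innermost dimension
    # (length fast_dim_sz) into the upper 16 bits.
    idx = np.arange(low.size, dtype=np.uint32) % np.uint32(fast_dim_sz)
    return (idx << np.uint32(16)) | low


data = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float16)
padded = extend_u16_to_u32(data, None, 2)
tagged = extend_u16_to_u32(data, 0, 2)
```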

memcpy_view

cerebras.sdk.sdk_utils.memcpy_view(arr: numpy.ndarray, datatype: numpy.dtype) → numpy.ndarray.view
Returns a 32-, 16-, or 8-bit view of a 32-bit NumPy array. For 16- and 8-bit views, only the lower 16 or 8 bits of each 32-bit word are exposed.
  • Returns: Numpy view into arr with specified numpy data type.
  • Return type: numpy.ndarray.view
Example: memcpy_view() simplifies the use of various precision data types when copying between host and device. Consider the following Python host code, which creates a float16 view into a NumPy array. The container array must be 32-bit. The user can fill the view with float16 data and then copy the container to an array on the device with CSL data type f16.
x_symbol = runner.get_id('x')
# This container array must be 32-bit
x_container = np.zeros(N, dtype=np.uint32)

x = sdk_utils.memcpy_view(x_container, np.float16)
x.fill(0.5)

runner.memcpy_h2d(x_symbol, x_container, 0, 0, 1, 1, N,
            streaming=False, data_type=MemcpyDataType.MEMCPY_16BIT,
            order=MemcpyOrder.ROW_MAJOR, nonblock=False)
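A view onto the low bits of each 32-bit word can be reproduced with ordinary NumPy slicing, which may help when reasoning about what ends up in the container. The sketch below is an assumption about the idea rather than the library's implementation, and low_bits_view is a hypothetical name.

```python
import sys

import numpy as np


def low_bits_view(arr: np.ndarray, datatype) -> np.ndarray:
    """Hypothetical sketch: view the low bits of each 32-bit word as `datatype`."""
    dt = np.dtype(datatype)
    if arr.dtype.itemsize != 4 or dt.itemsize not in (1, 2, 4):
        raise ValueError("need a 32-bit container and an 8-, 16-, or 32-bit dtype")
    per_word = 4 // dt.itemsize
    # On little-endian hosts the low bits come first within each word.
    offset = 0 if sys.byteorder == "little" else per_word - 1
    # Slicing a view yields another view, so writes land in `arr` itself.
    return arr.view(dt)[..., offset::per_word]


container = np.zeros(4, dtype=np.uint32)
x = low_bits_view(container, np.float16)
x.fill(0.5)  # each container word now holds the float16 bit pattern of 0.5
```

Because x shares memory with container, filling x is enough; the upper 16 bits of every word stay zero and the container can be passed to the copy call unchanged.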

debug_util module

Utilities for parsing debug output and core files of a simulator run. Import from cerebras.sdk.debug.debug_util.

debug_util

class cerebras.sdk.debug.debug_util.debug_util(bindir: Union[pathlib.Path, str])
Bases: object
Loads ELF files in bindir in order to dump symbols for debugging.
The user does not need to export the symbols in the kernel. debug_util dumps the core and looks for the symbols in the ELFs. If the symbol at Px.y is not found in the corresponding ELF, debug_util emits an error.
The most common errors are either: 1) a wrong coordinate passed in debug_util.get_symbol(), or 2) a correct coordinate, but the symbol has been removed due to compiler optimization. One can use readelf to check if the symbol exists or not. If not, the user can export the symbol in the kernel to keep the symbol in the ELF.
The functionality of this class is only supported in the simulator.