Libraries are imported by enclosing the name of the library in angle brackets.
<complex>
The complex library provides structs containing real and imag
components and basic complex functions.
complex is a generic struct parameterized by its field type. The
complex_32 and complex_64 non-generic names are also provided; these
define a complex number using two f16 values and a complex number using
two f32 values, respectively.
get_complex is a generic constructor that returns a complex struct based on
the type of its inputs. The non-generic get_complex_32 and
get_complex_64 constructor functions are provided as well:
In addition to the generic functions, non-generic complex_32 and complex_64
functions are provided. These functions have names suffixed with _32 and
_64, respectively.
<control>
The control library provides utilities for constructing control wavelets.
The following functions and enums are provided by the library:
encode_control_task_payload returns a control wavelet payload which
activates a control task on all receiving PEs. It has one argument:
- entrypoint: a control_task_id which is bound to the control task activated on a CE by the receipt of this wavelet.
encode_single_payload returns a control wavelet payload containing one
switch command, along with an optional control task entrypoint with 16-bit
data argument. The function has the following arguments:
- cmd: a switching opcode to be consumed by the receiving PE router. This command will instruct the router to modify the configuration of the color on which the control wavelet is sent. This command can advance the switch position, reset the switch position, tear down the color, or do nothing. If the router of the PE on which the control wavelet is sent pops this command, then no additional receiving PEs will receive a switching opcode.
- ce_ignore: a boolean which determines whether this control wavelet is to be ignored by the CE of PEs which receive it. If true, the control wavelet will not be forwarded to the CE. If false, and the receiving color is configured to transmit down the RAMP, the control wavelet will be forwarded to the CE. ce_ignore must be false for an entrypoint to be activated by a receiving PE.
- entrypoint: a control_task_id which will be activated on a CE by the receipt of this wavelet. Passing {} indicates that no control task activation on receiving PEs is desired. The control task will only be activated on a CE if ce_ignore is false, and the receiving color is configured to transmit down the RAMP.
- data: The control task activated by entrypoint may take a single 16-bit argument. If the control task takes no argument, then this value will be ignored.
encode_payload can encode a general control wavelet payload with up to
eight switching commands. The function has the following arguments:
- N: number of commands to encode in the control wavelet. The maximum number of commands is eight.
- cmd: an array of switching opcodes to be consumed by PE routers. Each command will instruct the router to modify the configuration of the color on which the control wavelet is sent. Each command can advance the switch position, reset the switch position, tear down the color, or do nothing. If the router of the PE on which a command is executed pops the command, then the next command will be executed by the next receiving router.
- ce_ignore: an array of booleans which determines whether this control wavelet is to be ignored by the CE of PEs which receive it. Each ce_ignore value is processed along with the associated cmd, i.e., the same rules for popping commands apply. If the processed value is true, the control wavelet will not be forwarded to the CE. If false, and the receiving color is configured to transmit down the RAMP, the control wavelet will be forwarded to the CE. ce_ignore must be false for an entrypoint to be activated by a receiving PE.
- ce_ignore_remaining: a boolean which determines whether all other commands contained in this control wavelet are to be ignored by the CE of PEs receiving it. When ce_ignore_remaining is set to false, each unspecified command will travel down the RAMP and reach the CE (as a NOP command).
- entrypoint: a control_task_id which is bound to the control task activated on a CE by the receipt of this wavelet. Passing {} indicates that no control task activation on receiving PEs is desired. The control task will only be activated on a CE if ce_ignore is false, and the receiving color is configured to transmit down the RAMP. Because this function can encode up to eight switching commands, no data payload can be provided for this control task.
Unlike encode_single_payload, encode_payload does not take a data
argument. If a control payload only contains a single switching command,
then a 16-bit data argument can be supplied as an argument to the control task
activated on receipt of the wavelet. data is not meaningful if there is
more than one switching command in the control wavelet, because the bits
that would encode data encode the additional switching commands instead.
A control task that declares no arguments will ignore data, and
furthermore, data is ignored if the wavelet is not forwarded to the CE
(the current command’s ce_ignore value is true).
Example
The task main_task sends out a control wavelet along the color comm,
which encodes a control task ID:
A PE that receives this wavelet on the color comm will activate a control
task bound to this control task ID. For instance, if the receiving PE has the
following code, then upon receipt of the control wavelet, it will activate a
task which increments the value my_global:
<data_utils>
The data_utils library provides low-level data manipulation and bit
extraction functions.
The following functions return the lower or higher 16 bits of a 32-bit
variable. The lo16 function can also be called on a 16-bit data type.
Similarly, variants for 64-bit data types are also available.
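As a language-neutral illustration of what these helpers compute, the following Python sketch models splitting a 32-bit value into its 16-bit halves. The names lo16 and hi16 mirror the text, but the Python functions are only a model and not the CSL implementation; the 64-bit variants are not modeled here.

```python
MASK16 = 0xFFFF

def lo16(value: int) -> int:
    """Model: return the lower 16 bits of an unsigned value."""
    return value & MASK16

def hi16(value: int) -> int:
    """Model: return the upper 16 bits of a 32-bit unsigned value."""
    return (value >> 16) & MASK16

word = 0x1234ABCD
print(hex(lo16(word)), hex(hi16(word)))
```

Applying lo16 to a value that already fits in 16 bits returns it unchanged, which matches the statement that lo16 can also be called on a 16-bit data type.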
<debug>
The debug library provides a tracing mechanism to record tagged values.
<debug> library:
<directions>
The directions library provides utility functions for manipulating
directions.
<dsd_ops>
The dsd_ops library provides wrappers around DSD op builtins that select an
appropriate builtin depending on an argument indicating the type of the
underlying data. These wrappers are guaranteed to expand to a single call to a
DSD op builtin. The wrappers may be used with any combination of DSD, DSR,
scalar, or pointer-to-scalar operands that is supported by the underlying
builtin operation.
Each function operates on a limited set of types. For DSD operations, the
programmer must ensure that the specified type accurately reflects the type of
the data being accessed in memory or streamed via the DSD.
The final argument, named config, is a configuration struct for the
underlying DSD op builtin. See Builtins for more details on
the builtins underlying these functions.
Note that the config argument must be completely comptime-known. This
means that runtime .activate or .unblock values are not allowed with
these wrapper functions. We hope to lift this limitation in a future release.
Example
The following example illustrates the use of dsd_ops to build a generic
module that instantiates a local task with a given ID, and moves data from the
given input color via the given input queue, into a user-specified buffer
buf.
<empty>
This library is empty on purpose. This allows a conditional module import as
follows:
<layout>
This library provides access to information about where the PE is located.
Specifically, the x and y coordinates in the rectangle can be
accessed at runtime, allowing code to be shared between PEs at different
locations.
<malloc>
The malloc library implements an arena allocator using a statically
allocated buffer.
In arena allocators, a single buffer (arena) is used to ensure that
all objects are allocated sequentially in memory. Allocating and
deallocating memory are fast operations, requiring an addition and/or
assignment. The free operation frees all allocated objects at once.
The parameter buffer_num_words specifies the number of words of the
statically allocated buffer.
If the param asserts_enabled is true, all allocations assert that the
buffer has enough free memory. The default is false.
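The arena scheme described above can be sketched in Python as a conceptual model. The class name Arena and the method names malloc and free are hypothetical choices for illustration and are not taken from the library; only the parameter names buffer_num_words and asserts_enabled come from the text.

```python
class Arena:
    """Conceptual model of an arena allocator over a fixed buffer of words."""

    def __init__(self, buffer_num_words: int, asserts_enabled: bool = False):
        self.buffer = [0] * buffer_num_words  # statically sized backing store
        self.next_free = 0                    # index of the first unused word
        self.asserts_enabled = asserts_enabled

    def malloc(self, num_words: int) -> int:
        """Reserve num_words sequentially; return the start index."""
        if self.asserts_enabled:
            assert self.next_free + num_words <= len(self.buffer), "arena exhausted"
        start = self.next_free
        self.next_free += num_words  # allocation is a single addition
        return start

    def free(self) -> None:
        """Release every allocation at once by resetting the cursor."""
        self.next_free = 0

arena = Arena(buffer_num_words=8, asserts_enabled=True)
first = arena.malloc(3)
second = arena.malloc(2)
arena.free()
print(first, second, arena.malloc(4))
```

In this model, leaving asserts_enabled at its default of false means an oversized request is not detected at allocation time, which is why enabling the checks can be useful while debugging.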
<math>
Math constants
The following can be used anywhere a floating point number is needed.
Math functions
The math library provides standard mathematical functions. They are
written as generic functions to facilitate use in other libraries or
abstractions. In addition, non-generic @fp16() and f32 functions are
provided. These functions have names suffixed with _f16 and _f32,
respectively.
The following functions are provided:
Example
Note on sin and cos accuracy
Both f16 and f32 versions of sin and cos will produce
incorrect results when abs(x) ≥ 16384π (approximately 51472).
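One way to respect this limit is to check the argument before calling sin or cos. The Python sketch below only models that check; the constant name SIN_COS_LIMIT and the helper in_accurate_range are hypothetical and are not part of the math library.

```python
import math

# Threshold from the note above: results are incorrect when abs(x) >= 16384 * pi.
SIN_COS_LIMIT = 16384 * math.pi

def in_accurate_range(x: float) -> bool:
    """Return True when x lies strictly inside the documented accurate range."""
    return abs(x) < SIN_COS_LIMIT

print(round(SIN_COS_LIMIT))
print(in_accurate_range(1000.0), in_accurate_range(60000.0))
```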
<random>
The random library provides utility functions that wrap the @random16
builtin to create random values across various ranges and distributions.
See @random16 for information on the PRNG used
by these functions.
<simprint>
The simprint library contains functions to print strings and various numeric
data types to the simulator logs. This is intended primarily for debugging, as
the printed output is not visible when running on hardware.
Messages produced by the simprint library are stored by the simulator in
fixed-size buffers, with one buffer per PE. A buffer will be flushed, with its
contents printed to the simulator logs, when the buffer is full or a "\n"
newline character is produced. Any data remaining in a PE’s print buffer at
the end of simulator execution will be silently discarded.
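The flushing rules above can be modeled with a short Python sketch. The class PrintBuffer, its methods, and the capacity values used are hypothetical; the actual buffer size is not stated in this section. The sketch shows that text without a trailing newline can be lost at the end of a run.

```python
class PrintBuffer:
    """Model of one PE's fixed-size print buffer."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # hypothetical size; the real value is not given here
        self.pending = ""         # characters not yet flushed
        self.log = []             # what would appear in the simulator logs

    def write(self, text: str) -> None:
        for ch in text:
            self.pending += ch
            # Flush when a newline is produced or the buffer becomes full.
            if ch == "\n" or len(self.pending) == self.capacity:
                self.log.append(self.pending)
                self.pending = ""

    def finish(self) -> list:
        """End of simulation: remaining data is silently discarded."""
        self.pending = ""
        return self.log

buf = PrintBuffer(capacity=16)
buf.write("step 1 done\n")
buf.write("step 2")  # no newline and buffer not full: never flushed
print(buf.finish())
```

Under this model, ending each message with a newline is the simplest way to make sure it reaches the logs.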
Basic printing functions
Format strings
Two functions are provided to print formatted strings. The values to be formatted are passed as args. Available format specifiers are:
- {d}: print the argument as a decimal number. Argument must have type u16 or u32.
- {X}: print the argument as a hexadecimal number in upper case. Argument must have type u16 or u32.
- {b}: print the argument as a binary number. Argument must have type u16 or u32.
- {f}: print the argument as a floating-point number. Argument must have type @fp16() or f32.
A literal { character may be escaped by doubling it. For example,
{{hello} will print as {hello}.
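To make the specifier and escaping rules concrete, here is a small Python model of the substitution. The function format_model is hypothetical and is not the simprint API; in particular, the exact digits printed for {f} by the simulator are an assumption.

```python
def format_model(fmt: str, *args) -> str:
    """Expand {d}, {X}, {b}, {f} and the {{ escape, as described above."""
    out = []
    remaining = list(args)
    i = 0
    while i < len(fmt):
        if fmt.startswith("{{", i):
            out.append("{")            # doubled brace prints one literal brace
            i += 2
        elif fmt[i] == "{":
            end = fmt.index("}", i)    # closing brace of the specifier
            spec = fmt[i + 1:end]
            value = remaining.pop(0)   # specifiers consume args in order
            if spec == "d":
                out.append(str(value))
            elif spec == "X":
                out.append(format(value, "X"))
            elif spec == "b":
                out.append(format(value, "b"))
            elif spec == "f":
                out.append(str(float(value)))  # float formatting is an assumption
            else:
                raise ValueError("unsupported specifier: " + spec)
            i = end + 1
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)

print(format_model("{{hello}"))
print(format_model("v={d} h={X} b={b}", 10, 255, 5))
```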
For example:
Disabling output
Sometimes it is useful to disable all of the debug prints produced by a particular instance of the simprint module, while keeping the option to
turn them back on later. This helps save on runtime and space overhead, and
can also be used to conditionally enable or disable debug printing on certain
PEs. Prints originating from a specific simprint instance can be disabled
by setting the enable parameter to false at import time.
The enable parameter is optional. Its default value is true, which
means that printing is enabled.
<string>
The string library contains functions for converting comptime_int values
to comptime_string values and for formatting strings at compile time.
comptime_int to comptime_string conversion
Format strings
Format specifiers in the format string are replaced by the corresponding args. Currently, only the {d} format specifier is
supported, which corresponds to comptime_int arguments.
A literal { character can be escaped by doubling it, such as {{foo}, which
will be formatted as {foo}.
For example:
<tile_config>
The tile_config library contains APIs relating to the hardware configuration
of a PE. It contains the following top-level constants:
The tile_config library also contains an API to access the PE’s coordinates
in the rectangle at runtime.
color_config
This submodule of tile_config contains APIs and an enum type for changing
the configuration of a given color during a teardown phase.
First of all, the color_config submodule defines the following enum type:
The color_config submodule consists of the following functions:
control_transform
This submodule of tile_config contains a function for setting the mask for
transforming the index part of control wavelets. This function is to be used
together with the DSD property control_transform to XOR the first six
bits of the index portion of a wavelet with the specified mask.
The set_mask function can be used either at comptime or at runtime. Only the
first six bits of the mask are taken into account.
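The effect of the mask can be modeled in Python as below. This is only a sketch: it assumes that "the first six bits" means the six least-significant bits, which this section does not state explicitly, and the helper transform_index is hypothetical.

```python
SIX_BIT_MASK = 0x3F  # assumption: "first six bits" = six least-significant bits

def transform_index(index: int, mask: int) -> int:
    """XOR the low six bits of index with the low six bits of mask."""
    return index ^ (mask & SIX_BIT_MASK)

# Bits of the mask above the sixth are ignored in this model.
print(bin(transform_index(0b000001, 0xFF)))
```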
exceptions
This submodule of tile_config contains functions for setting values in
the exception mask register.
The exception mask register determines which exceptions cause the
processor to stop.
An unmasked exception causes the processor to immediately stop execution.
A masked exception allows execution to continue.
By default, all exceptions are masked.
The functions in this submodule can be used to unmask them.
set_exception_mask overwrites the exception mask register.
Multiple exceptions can be unmasked simultaneously as follows:
filters
This submodule of tile_config contains APIs for configuring filters:
input_queue_status
This submodule of tile_config contains APIs for inspecting input queue
status.
main_thread_priority
This submodule of tile_config contains APIs for configuring main thread
priority. The main thread is the thread that executes non-async
operations. Operations tagged with async execute on a microthread, which
is associated with a fabric input or output queue. Main thread priority and
microthread priority determine the relative scheduling priority of the
threads.
output_queue_status
This submodule of tile_config contains APIs for inspecting output queue
status.
switch_config
This submodule of tile_config contains APIs and enum types that can be
used to change the switch configuration of a given color during a teardown
phase.
First of all, the switch_config submodule defines the following enum types:
The switch_config submodule consists of the following
functions:
task_priority
This submodule of tile_config contains APIs for configuring task priority:
task_id can be a data_task_id or local_task_id to set
the priority of the associated task.
In addition, the priority of tasks activated by wavelets, including tasks
bound to a control_task_id, can be specified using the color
on WSE-2, or the input_queue on WSE-3, that carries the wavelets.
Note that updates to task priority made at runtime may take a few clock cycles
to take effect. These functions may be used at comptime or at runtime.
These functions can be used like:
teardown
This submodule of tile_config contains teardown APIs:
<time>
The time library returns the current 48-bit timestamp
counter as three 16-bit unsigned integers in little endian form.
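Because the counter arrives as three 16-bit words in little-endian order, a consumer typically reassembles them into a single 48-bit value. The Python sketch below models that step; the function names are hypothetical, and treating the counter as wrapping modulo 2^48 when computing elapsed time is an assumption.

```python
def timestamp_from_words(words) -> int:
    """Combine three little-endian 16-bit words into one 48-bit integer."""
    low, middle, high = words  # least-significant word comes first
    return low | (middle << 16) | (high << 32)

def elapsed(start_words, end_words) -> int:
    """Cycle difference between two timestamps (assumes 48-bit wraparound)."""
    diff = timestamp_from_words(end_words) - timestamp_from_words(start_words)
    return diff % (1 << 48)

print(hex(timestamp_from_words([0x0001, 0x0002, 0x0003])))
print(elapsed([0xFFFF, 0x0000, 0x0000], [0x0000, 0x0001, 0x0000]))
```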
<timer>
The timer library provides some additional utilities for managing
multiple timers in a program and calculating elapsed time.
<types>
The types library provides several type-related functions.
Basic queries
The is_numeric function returns true for all types on which numerical
computations can be done, i.e. floating point types and integer types.
is_signed returns true if the type is signed, which is true for floating
point types and signed integer types.
is_float returns true if the data type is a floating point type. The
function returns true for f16, cb16, bf16, f32, and comptime_float.
Note that it returns true for all half-precision types regardless of what
@fp16() type is enabled at runtime. is_float16 has a similar behavior but
returns true for half-precision types only. In contrast, is_enabled_float
returns true only if the data type is @fp16() or f32.
The is_signed_int and is_unsigned_int functions allow you to perform tests
on integer types. Note that is_signed_int also returns true for
comptime_int.
is_dsd returns true if the type is a DSD type, while is_dsr returns
true if the type is a DSR type.
The has_dsd_type and has_dsr_type functions are provided to check if a
given expression has a DSD or DSR type.
Size and alignment
The types module also provides functions for querying low-level
information, such as size and alignment, on a given type T:
<kernels>
This library differs from all other libraries in that it provides kernels, as
opposed to individual functions. The tally kernel implements a two-phase tally,
used to coordinate the work done by multiple PEs. The fft kernel library
implements a 3D FFT.
<fft>
The FFT library implements a 3D FFT across a rectangle of PEs.
The library consists of several modules:
- <kernels/fft/fft3d_layout>: Provides a full implementation of the 3D FFT, including host exported functions for launching FFT and iFFT computations. Imported once in a program’s layout file.
- <kernels/fft/fft3d>: Underlying implementation of 3D FFT. If using the fft3d_layout module, then this module is not necessary to import. Using this module requires the user to manually construct the layout and host exported functions.
- <kernels/fft/get_params>: Imported once in a program’s layout file to provide correct FFT parameters for the fft3d module. If using the fft3d_layout module, then this module is not necessary to import.
Use of the fft3d_layout module in a program
is as follows:
fft3d_layout module.
<tally>
The tally library implements a two-phase tally kernel that allows PEs within a
rectangle to communicate progress/completion to the host.
The library consists of two modules:
- <kernels/tally/layout>: imported once and used in the layout block to parameterize each PE’s tally behavior.
- <kernels/tally/pe>: imported once by each PE, consuming the parameters generated by the layout module.
phase2_tally parameter, the kernel signals
completion by sending the total to the North on output_color from the
PE at (kernel_width - 1, 0).
The second phase is optional. If phase2_tally == 0, the second phase will
be skipped and the output signal on output_color will be 0.
<collectives_2d>
This library implements collective communication directives that allow
PEs to communicate data with one another.
The library consists of two modules:
- <collectives_2d/params>: Imported once to parameterize each PE in the layout block.
- <collectives_2d/pe>: Imported once per dimension per PE. Contains collective communication directives for a single axis.
<collectives_2d/params>
The parameter module exposes a compile-time helper function for configuring
PEs to use <collectives_2d>.
- Px is the PE’s x-coordinate.
- Py is the PE’s y-coordinate.
- ids is a struct that is expected to have either the x-related fields, the y-related fields, or all four, of the following:
  - x_colors: a struct containing 2 distinct colors as anonymous fields
  - x_entrypoints: a struct containing 2 distinct local task IDs as anonymous fields
  - y_colors: a struct containing 2 distinct colors as anonymous fields
  - y_entrypoints: a struct containing 2 distinct local task IDs as anonymous fields
- Returns a struct containing the parameters necessary to import library modules for the specified PE. This struct contains:
  - x: an opaque struct containing parameters needed to configure collective communications in the x-dimension.
  - y: an opaque struct containing parameters needed to configure collective communications in the y-dimension.
<collectives_2d/pe>
The following directives are currently supported:
init initializes the library. It must be invoked for each axis.
broadcast transmits the contents of buf from the root PE to the buf
of other PEs in the row or column. count should be the length of buf.
It is akin to MPI_Bcast.
scatter transmits count-many elements from send_buf from the
root PE to the recv_buf of other PEs in the row/column. It is akin
to MPI_Scatter.
gather accumulates count-many elements from send_buf of other
PEs into the recv_buf of the root PE. It is akin to MPI_Gather.
When distributing or aggregating elements using scatter or gather
for N PEs, the send_buf or recv_buf should have space for
count * N elements, respectively.
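The buffer-sizing rule for scatter and gather can be illustrated with a Python model that uses plain lists in place of PE memory. The functions below are hypothetical models, not the library's directives; the ordering of chunks by PE index and whether the root is counted among the N PEs are assumptions made for illustration.

```python
def scatter(send_buf, count, num_pes):
    """Model: the root splits send_buf into num_pes chunks of count elements."""
    assert len(send_buf) >= count * num_pes, "send_buf needs count * N elements"
    return [send_buf[pe * count:(pe + 1) * count] for pe in range(num_pes)]

def gather(send_bufs, count):
    """Model: the root concatenates count elements from each PE's send_buf."""
    recv_buf = []
    for buf in send_bufs:
        recv_buf.extend(buf[:count])
    return recv_buf  # holds count * N elements

chunks = scatter([1, 2, 3, 4, 5, 6], count=2, num_pes=3)
print(chunks)
print(gather(chunks, count=2))
```

In this model, gather applied to the output of scatter reproduces the original buffer, which is a quick way to check that count and the buffer lengths are consistent.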
reduce_fadds computes a sum reduction over buffers of f32. It is akin to MPI_Reduce with the MPI_SUM operation.
In general, all PEs must call the same directive with the same root
and count. The primitives have the following common parameters:
- root is the root PE for network configuration,
- send_buf is a buffer containing data to be transmitted,
- recv_buf is a buffer for holding data received,
- count is the number of elements to be transmitted,
- callback is activated when the primitive completes.
collectives_2d. Each
imported module must be assigned queue IDs (queues) and DSR
IDs (dest_dsr_ids, src0_dsr_ids, src1_dsr_ids). If the
user does not specify these parameters explicitly, the default values
apply. The following example shows the default values of queue IDs
and DSR IDs of collectives_2d.
A minimal example that sets up PEs to broadcast 10 elements from
the root PE to every other PE in the row/column consists of
the following layout code:

