This section documents the builtins available in CSL. Builtins related to remote procedure calls (RPC) are documented in Builtins for Supporting Remote Procedure Calls (RPC), and builtins for operations on DSDs are documented in Builtins for DSD Operations.
@activate
Set the status of a local task to Active, allowing it to be picked by the task picker if it is also unblocked.
Syntax
- id is an expression of type local_task_id that is bound to a local task.
Example
@allocate_fifo
Create a FIFO DSD value. See Data Structure Descriptors for details. See Data Structure Registers for details about use with DSRs.
@as
Coerce an input value from one numeric or boolean type to another.
Syntax
- result_type is a numeric (i.e. boolean, integer, or float) type.
- value is a numeric value.
Example
Semantics
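Float-to-integer type coercion rounds the value towards zero. For example:
- @as(i16, 11.2) == 11
- @as(i16, 10.8) == 10
- @as(i16, -10.8) == -10
Coercion to bool yields false for zero values (including -0.0) and true for every other value, including NaN:
- @as(bool, 0) == false
- @as(bool, -0.0) == false
- @as(bool, -5) == true
- @as(bool, nan) == true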
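The coercion rules above can be mirrored in a few lines of Python. This is only a model of the documented behaviour, not CSL code: the helper names as_i16 and as_bool are hypothetical, and the sketch does not model what happens when a value falls outside the range of the target integer type.

```python
import math


def as_i16(value):
    """Model of float-to-integer coercion: round towards zero."""
    # math.trunc drops the fractional part, which rounds towards zero
    # for both positive and negative inputs.
    return math.trunc(value)


def as_bool(value):
    """Model of numeric-to-bool coercion: only zero maps to false."""
    # -0.0 compares equal to 0, and NaN compares unequal to everything,
    # so NaN maps to True, matching the examples in the text.
    return value != 0
```

For instance, as_i16(-10.8) evaluates to -10 and as_bool(float("nan")) evaluates to True, in line with the listed results.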
@assert
Asserts that a condition is true.
Syntax
- cond is an expression of type bool.
Example
Semantics
Causes program execution to abort if the assert condition is false. Note: aborting happens only in simulation; hardware executions of the program ignore the assert. If the assert expression is encountered in a comptime context, the builtin
is equivalent to @comptime_assert.
@bitcast
Reinterpret the raw bits of the input value as a value of another type.
Syntax
- result_type is a pointer or numeric (i.e. boolean, integer, or float) type.
- value is a numeric value. It must be an integer if result_type is a pointer.
- The bit width of value must match the bit width of values of type result_type.
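To illustrate the difference between reinterpreting bits and converting a value, the following Python sketch models a bitcast between a 32-bit float and a 32-bit unsigned integer. It is not CSL code; the helper names are hypothetical and the little-endian byte order is an assumption made only so the round trip is well defined.

```python
import struct


def bitcast_f32_to_u32(value):
    """Reinterpret the 32 bits of an IEEE-754 float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bitcast_u32_to_f32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Note that the bit pattern of 1.0 is 0x3F800000, whereas a value coercion of 1.0 to an integer would give 1. Both types are 32 bits wide, which mirrors the requirement that the bit widths match.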
Example
@bind_control_task
Bind a task to a control_task_id, so that each time a control wavelet
containing this ID in its payload is received, the task is activated
and can be scheduled for execution.
Syntax
- this_control_task is the name of a task.
- this_control_task_id is an identifier of type control_task_id.
Example
Semantics
The @bind_control_task builtin must appear in a top-level comptime
block.
@bind_data_task
Bind a task to a data_task_id, so that each time a wavelet is received
along the routable color underlying data_task_id, the task is activated
and can be scheduled for execution.
Syntax
- this_data_task is the name of a task.
- this_data_task_id is an identifier of type data_task_id.
Example
Semantics
The @bind_data_task builtin must appear in a top-level comptime
block.
Tasks passed into this builtin must take at least one argument.
@bind_local_task
Bind a task to a local_task_id, so that each time that local_task_id
is unblocked and activated, the task is activated and can be scheduled
for execution.
Syntax
- this_local_task is the name of a task.
- this_local_task_id is an identifier of type local_task_id.
Example
Semantics
The @bind_local_task builtin must appear in a top-level comptime
block.
Tasks passed into this builtin cannot take any arguments.
@block
Block the task associated with the input color, data_task_id, or
local_task_id so that the task is prevented from running when the task
identifier is activated.
For color and data_task_id inputs, @block prevents incoming
wavelets on the associated color from activating tasks.
This also applies to control wavelets carried by a color,
preventing a control task bound to the ID in a control wavelet’s
payload from activating.
Syntax
- id is an expression of type:
  - WSE-2: color, data_task_id, or local_task_id.
  - WSE-3: input_queue, data_task_id, local_task_id, or ut_id.
Example
@comptime_assert
Assert a compile-time condition to be true; abort compilation otherwise.
Syntax
- cond is a comptime expression of type bool.
- message is an expression of type comptime_string.
Example
@comptime_print
Prints values at compile-time whenever the compiler evaluates this statement.
Syntax
- All arguments are comptime expressions.
Example
Semantics
A @comptime_print statement causes the compiler to print information for its
arguments whenever the compiler evaluates the builtin.
During comptime evaluation, the builtin is evaluated whenever control-flow
reaches that line.
Whenever the compiler is analysing reachable non-comptime code, the builtin
is evaluated exactly once. For instance, a @comptime_print builtin inside a
non-comptime loop causes the compiler to evaluate it exactly once.
@constants
Initialize a tensor with a value.
Syntax
- tensor_type is a comptime tensor type.
- The type of value is the same as the base type of tensor_type.
Example
@dimensions
Returns a 1D array in which the i’th element equals the size of the i’th dimension of the input array type. The length of the returned array equals the rank of the input array type. The type of each element in the returned array is u32.
Syntax
- array_type is a type defining an array.
Example
@element_count
Returns the total number of elements in the input array type as a u32.
Syntax
- array_type is a type defining an array.
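The relationship between the array-type queries can be sketched in Python by modelling an array type as a tuple of dimension sizes. This is an illustration only, not CSL code; the function names simply echo @dimensions, @rank (documented later in this section) and @element_count, and the tuple representation is an assumption.

```python
from math import prod


def dimensions(shape):
    """One entry per dimension, in order."""
    return list(shape)


def rank(shape):
    """Number of dimensions of the array type."""
    return len(shape)


def element_count(shape):
    """Total number of elements: the product of all dimension sizes."""
    return prod(shape)


# Hypothetical 3 x 4 x 5 array type.
shape = (3, 4, 5)
```

Under this model the length of dimensions(shape) always equals rank(shape), as the description of @dimensions states.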
Example
@element_type
Returns the element type of the input array type as a type.
Syntax
- array_type is a type defining an array.
Example
@export
Creates a symbol in the output object file that refers to a global function or variable.
Syntax
- ptr is a pointer to a global function or variable.
- options is a struct literal containing the required field:
  - .name: comptime_string is the name of the object file symbol.
Example
Semantics
The @export builtin must be called within a comptime block. The first
argument must be a direct pointer to a global function or variable. The second
argument must be a struct literal containing the required .name field with a
comptime_string value.
Calling @export has the same effect as declaring a symbol with the export
storage class. This builtin is useful when the exported name needs to be
computed at compile-time or when symbols need to be conditionally exported
based on compile-time parameters.
When @export is called on a symbol that is already declared via export, the
builtin’s name takes precedence. In addition, a symbol may only be exported
once via @export. Multiple @export calls on the same symbol result in an
error.
@field
Access the value of a given struct field.
Syntax
- some_struct is a value of a struct type.
- field_name is a string.
Example
Semantics
The builtin returns the value stored in the field_name field of
some_struct if and only if such a field exists.
A call to @field can also be used as the left-hand side of an
assignment as shown in the example. In this scenario, the underlying field
of some_struct named field_name will be updated.
@fp16
Returns the selected runtime FP16 format.
Syntax
Semantics
The builtin returns a type representing the runtime FP16 format specified by
the --fp16-format command line option: f16, cb16, or bf16.
Example
@get_array
Convert a string to an array of bytes.
Syntax
- string is an expression of type comptime_string.
Semantics
Given a value s of type comptime_string, @get_array(s) returns an
array of type [@strlen(s)]u8. This array contains the bytes inside the
string.
Note that:
- Strings in CSL are not null-terminated, so the length of the array returned by @get_array(s) is @strlen(s), not @strlen(s)+1. If a null-terminated array is required, this can be constructed by concatenating the string "\x00" onto the end of the string before passing it to @get_array.
- Strings in CSL are strings of bytes, not of characters. String literals are interpreted as UTF-8, so if a string contains non-ASCII Unicode characters, the length of the array returned by @get_array will not match the number of characters in the string.
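Both notes can be checked with a small Python model that treats a string as its UTF-8 byte sequence. This is not CSL code; the get_array function below is a hypothetical stand-in for the builtin, written only to show the byte-versus-character distinction.

```python
def get_array(s):
    """Model of @get_array: the UTF-8 bytes of the string, no terminator."""
    return list(s.encode("utf-8"))


ascii_bytes = get_array("abc")            # three bytes, no trailing zero
terminated = get_array("abc" + "\x00")    # explicit null terminator appended
accented = get_array("\u00e9")            # one character, two UTF-8 bytes
```

The last case shows why the array length can exceed the character count: the single character U+00E9 occupies two bytes in UTF-8.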
Example
@get_color
Create a value of type color with the provided identifier.
Syntax
- color_id is an integer value.
Semantics
If color_id is comptime-known then it must be within the range of valid
routable colors as defined by the target architecture.
If color_id is not comptime-known its type must be a 16-bit unsigned
integer. No runtime checks are performed in this case to ensure that the color
id is within the range of valid colors.
@get_config
Read the value of a PE configuration register.
Syntax
- addr is a machine-word-sized unsigned integer expression that represents the word-address of the configuration register.
- access_range, if specified, is a comptime-known 2-element tuple of integers specifying an inclusive range of addresses that addr falls within.
- accessed_ranges, if specified, is a tuple of comptime-known 2-element tuples of integers that each specify an inclusive range of addresses that addr may fall within.
Example
Semantics
The @get_config builtin can only be called at runtime or during the
evaluation of a top-level comptime block.
It cannot be evaluated in any other comptime context.
If @get_config is encountered during the evaluation of a top-level
comptime block then it will retrieve any configuration value that was
previously stored at addr. If no user-defined value has previously been
written to addr, and a default value exists for the register at addr,
the default value will be returned. Otherwise, @get_config will raise an
error at compile time.
A call to @get_config at runtime will become a volatile runtime read
operation (i.e., a read that should never be optimized by the compiler)
that will return any configuration value stored to addr. In that
scenario the addr expression does not have to be comptime-known.
If addr is comptime-known then it must be a comptime-known integer value
that falls within the valid configuration address range for the selected target
architecture.
In cases where addr may be runtime but is known to occur within a specific
range or set of ranges, a second argument can be provided to communicate this
assumption.
A single, contiguous range may be specified as a tuple of integers,
.{access_start, access_end}. In this case, it is required that
access_start <= addr <= access_end.
Multiple ranges may be specified as a nested tuple,
.{.{access_start_1, access_end_1}, ..., .{access_start_N, access_end_N}}. In
this case, each inner tuple .{access_start_i, access_end_i} must satisfy
access_start_i <= access_end_i, and addr must fall within one of the
specified ranges.
An error is emitted if the compiler is able to detect a violation of the above
requirements. If a violation occurs at runtime that the compiler cannot detect,
behavior is undefined.
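The range requirements above amount to a simple membership test. The Python sketch below models that test; it is not CSL code and does not reflect how the compiler implements the check. The function name and the addresses in the usage lines are hypothetical.

```python
def addr_in_access_ranges(addr, ranges):
    """Return True if addr lies in one of the inclusive (start, end) ranges.

    `ranges` is either a single (start, end) pair or a sequence of pairs,
    mirroring the single-tuple and nested-tuple forms described above.
    """
    # Normalise the single-range form to a one-element sequence of pairs.
    if len(ranges) == 2 and all(isinstance(bound, int) for bound in ranges):
        ranges = (ranges,)
    for start, end in ranges:
        if start > end:
            raise ValueError("each range must satisfy start <= end")
    return any(start <= addr <= end for start, end in ranges)


# Hypothetical addresses, chosen only for illustration.
single_ok = addr_in_access_ranges(0x7010, (0x7000, 0x70FF))
multi_ok = addr_in_access_ranges(0x7210, ((0x7000, 0x70FF), (0x7200, 0x72FF)))
outside = addr_in_access_ranges(0x7100, ((0x7000, 0x70FF), (0x7200, 0x72FF)))
```

In the model an out-of-range address simply yields False; in CSL, a violation the compiler cannot detect results in undefined behavior.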
In addition, if @get_config is called during the evaluation of a top-level
comptime block then it is not allowed to specify an address that falls
within a configuration range that is reserved by the compiler. These ranges
correspond to the following configurations:
- All DSRs
- Filters
- Basic routing
- Switches
- Input queues
- Task table
addr must be coercible to a machine-word-sized unsigned integer expression
regardless of whether it’s comptime-known or not.
@get_config_unchecked
Read the value of a PE configuration register unsafely.
Syntax
- addr is a machine-word-sized unsigned integer expression that represents the word-address of the configuration register.
Example
Semantics
@get_config_unchecked is identical to @get_config (see
@get_config) with two exceptions:
- The compiler will not attempt to check if a reserved address is accessed by @get_config_unchecked. Accessing a reserved address may result in undefined behavior.
- The code generated for @get_config may insert delays to guarantee that it observes effects of preceding writes to configuration space. In some cases, this may be overly conservative. @get_config_unchecked will not cause such delays to be inserted. It is the programmer’s responsibility to ensure that @get_config_unchecked does not observe indeterminate states of configuration space.
@get_control_task_id
Create a value of type control_task_id with the provided identifier.
Syntax
- id is a comptime-known expression of any unsigned integer type, or a runtime expression of type u16.
Semantics
If id is comptime-known, the builtin will only accept integers
in the corresponding target architecture’s valid range for control task IDs.
If id is not comptime-known, its type must be u16.
No runtime checks are performed in this case to ensure that id
is within the range of valid control task IDs.
@get_data_task_id
Create a value of type data_task_id with the provided identifier.
Syntax
- id is an expression of type:
  - WSE-2: color.
  - WSE-3: input_queue.
Semantics
On WSE-2, if id is comptime-known, it must be within the
range of valid routable colors as defined by the target architecture.
If id is not comptime-known, no runtime checks are performed in this case
to ensure that id is within the range of valid routable colors.
@get_dsd
Create either a memory or fabric DSD value. See Data Structure Descriptors for details.
@get_dsr
Create a unique DSR identifier value. This value will uniquely identify a physical DSR along with its DSR file. See Data Structure Registers for details.
@get_filter_id
Get the integer identifier of the filter associated with a given color.
Syntax
- color_value is a value of type color.
Semantics
The input color_value must be comptime-known and the builtin is guaranteed
to be evaluated at compile-time.
It returns the filter’s identifier (if any) as an unsigned 16-bit integer value.
If there is no filter set for color_value, an error is emitted. An error is
also emitted if the compiler is unable to determine a unique filter identifier
for all the PEs that share the same code and parameter values.
@get_input_queue
Create a value of type input_queue with the provided identifier.
Syntax
- queue_id is a comptime-known non-negative integer expression.
Semantics
The provided queue_id must be a non-negative comptime-known
integer expression that is within the range of valid input queue ids as defined
by the target architecture.
@get_int
For types containing an underlying integer, return that integer value.
Syntax
- value is an expression with any of the following types:
  - color
  - control_task_id
  - data_task_id
  - dsr_dest
  - dsr_fifo_dest
  - dsr_fifo_src1
  - dsr_src0
  - dsr_src1
  - any enum type
  - input_queue
  - any integer type
  - local_task_id
  - output_queue
  - sr
  - ut_id
  - xdsr
Semantics
The @get_int builtin must have a single argument value having one of the
types listed above. The underlying integer value of value is returned.
@get_int can be evaluated at both comptime and runtime.
- If value has enum type, a value of the enum’s underlying integer type is returned.
- If value has integer type, it is returned unchanged.
- A u16 is returned if value has type color, control_task_id, data_task_id, input_queue, local_task_id, or output_queue.
Example
@get_local_task_id
Create a value of type local_task_id with the provided identifier.
Syntax
- id is a comptime-known expression of any unsigned integer type, or a runtime expression of type u16.
Semantics
If id is comptime-known, the builtin will only accept integers
in the corresponding target architecture’s valid range for local task IDs.
If id is not comptime-known, its type must be u16.
No runtime checks are performed in this case to ensure that id
is within the range of valid local task IDs.
@get_output_queue
Create a value of type output_queue with the provided identifier.
Syntax
- queue_id is a comptime-known non-negative integer expression.
Semantics
The provided queue_id must be a non-negative comptime-known
integer expression that is within the range of valid output queue ids as defined
by the target architecture.
@get_rectangle
Access the size of the rectangular region that was given to @set_rectangle,
and other layout information.
Syntax
The builtin returns a struct with u16 fields width and height, and additional
information about the underlying fabric and offsets.
Example
Semantics
@get_rectangle returns the width and height provided to
@set_rectangle as a struct. This struct also contains fabric, a nested
struct that contains the width and height of the underlying fabric, and
offsets, a nested struct that contains the width and height of the offset
of the rectangle.
The @get_rectangle builtin can be used anywhere.
In a layout block, @get_rectangle is only valid after the call to
@set_rectangle.
@get_string_from_byte
Given a comptime-known, non-negative integer small enough to fit in one byte, returns a one-byte comptime_string containing only that byte.
Syntax
- byte is a non-negative integer that fits in one byte (i.e., is in the range [0, 255]).
Example
@has_field
Checks whether a given struct value or struct type has a field with a given name.
Syntax
- some_struct is a value of a struct type, or a struct type.
- field_name is a string.
Example
Semantics
The builtin returns true if and only if the struct some_struct has a field
called field_name.
The builtin is guaranteed to be evaluated at compile-time. The input
expressions are guaranteed to have no run-time effects.
@import_module
Import a group of global symbols defined in a CSL file, while optionally initializing parameters in the imported file. See Modules for details.
@increment_dsd_offset
Set the offset of a memory DSD value. See Data Structure Descriptors for details.
@initialize_queue
Associates a routable color with a queue ID, and optionally sets the priority of the microthread associated with the queue.
Syntax
- queue is a comptime-known expression of type input_queue or output_queue.
- config is a comptime-known struct expression with the following fields:
  - color is a comptime-known expression of type color.
  - priority is an optional field that can be either .{ .high = true }, .{ .medium = true }, or .{ .low = true }.
  - ctrl_table_id is:
    - WSE-2: not supported.
    - WSE-3: an optional field that must be a comptime-known integer expression.
  - dense_mode is:
    - WSE-2: not supported.
    - WSE-3: an optional field that must be a comptime-known boolean expression.
Example
Semantics
The @initialize_queue builtin will initialize the input or output queue
configuration associated with the input or output queue ID queue
respectively.
The builtin can be called at most once per queue during the evaluation
of a top-level comptime block.
On WSE-2:
- If the argument queue is an expression of type output_queue then the builtin must have no more than a single argument (i.e., the queue argument).
- If the argument queue is an expression of type input_queue then the config argument must be supplied, and must be a comptime-known struct with fields color (required) and priority (optional).
- The color field is required and specifies the routable fabric color to which the input queue with ID queue will be bound.
- The priority field is optional and can be used to specify the priority of the microthread that will be attached to the respective input queue with ID queue. See Microthread Priority for more information on microthread priority. The default value is .{ .high = true }.
On WSE-3:
- Both input and output queues require both queue and config arguments.
- The config comptime-known struct argument must have the color field but not the priority field.
- The color field specifies the routable fabric color that the input or output queue with ID queue will be bound to.
- The ctrl_table_id field is optional and allowed on input queues only. It specifies an index that identifies a per-queue local control task table: control wavelets arriving through the input queue with ID queue will be associated with the control task table identified by the ctrl_table_id value. The default value is 0.
- Multiple input queues can have the same value for ctrl_table_id, which means that they share the same control task table. For example, if ctrl_table_id is never specified for any input queue, then all input queues share the control task table with ctrl_table_id=0, which is the same behavior as on WSE-2.
- When dense_mode is enabled on an output queue, 16-bit data are sent as half wavelets rather than full wavelets. A half wavelet is a special kind of wavelet that is processed more efficiently by the hardware by allowing queues (input and output) to operate on a finer granularity. By default, dense_mode is disabled, meaning that data is sent as full wavelets.
- An input queue must have dense_mode enabled in order to process half wavelets. Otherwise, the behavior is undefined. Input queues have dense_mode disabled by default.
@is_arch
Returns true if the current CSL program is being compiled for the given target architecture.
Syntax
- an_arch is a comptime-known string value that represents the architecture mnemonic. The available mnemonics are:
  - "wse2": for the WSE-2 architecture
  - "wse3": for the WSE-3 architecture
Example
@is_comptime
Returns true if this expression is being evaluated as a comptime expression,
and false otherwise. (See is_constant_evaluated for details about the
analogous function in C++.)
Syntax
Example
@is_same_type
Returns true if the two type arguments to this function are the same.
Syntax
- this_type and another_type are values of type type.
Example
@load_to_dsr
Load a DSD value into a DSR. See Data Structure Registers for details.
@map
Given a function, a list of input arguments and an optional output argument, perform a mapping of the input arguments to the output argument (if any) using the provided function.
Syntax
- callback is a function that accepts as many arguments as the number of Input arguments. It may optionally produce a value.
- Input is a list of zero or more input arguments.
- Output is an output argument.
Example
Semantics
The @map builtin requires at least one of its arguments to be a DSD or DSR
(input or output).
If callback returns a non-void value, the Output argument is mandatory
and must be either a DSD, DSR of type dsr_dest, or a non-const pointer value
whose base-type must match the return type of callback. A fabin_dsd
value is not allowed as the Output argument.
The Input arguments may include non-DSD/DSR values whose types must be
compatible with the corresponding parameter types of callback. Values of
type fabout_dsd are not allowed as Input arguments. If a DSR is used as
an Input argument, it must be of type dsr_src1.
For each DSD or DSR argument to @map, the corresponding parameter type or
return type of callback must be an ABI-compatible numeric type.
Currently, these types are: i16, i32, u16, u32,
@fp16(), f32. Note that @fp16() gives the type of the selected
runtime FP16 format (see @fp16).
DSR arguments to @map are expected to be loaded with the single_step
property (see single_step).
Execution semantics
The @map builtin repeatedly calls the callback function for each element
of the DSD/DSR argument(s). Before each call to callback, the next available
value from each Input DSD/DSR is read and passed to callback while the
non-DSD/DSR Input arguments are forwarded to callback. The value
returned from the callback call - if any - is written back to the Output
DSD/DSR or to the memory address that is specified by the Output non-const
pointer.
After reading or writing a DSD/DSR element value the length (or extent for
fabric DSDs) of the respective DSD/DSR is decremented by one. If the
length/extent is zero then the read/write operation fails and the implicit
@map loop terminates. If DSD/DSR operands have different lengths/extents, it
is possible for values to be read and discarded. Similarly, the computed value
from callback may be discarded.
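The implicit loop described above can be modelled in Python, with lists standing in for DSD/DSR operands and plain values standing in for non-DSD inputs. This is a behavioural sketch under simplifying assumptions, not CSL code: the name map_model and the output_capacity parameter are hypothetical, and real DSDs carry more state than a list.

```python
from collections import deque


def map_model(callback, inputs, output_capacity=None):
    """Model of the implicit @map loop.

    Lists and tuples model DSD/DSR inputs; other values are forwarded
    unchanged on every call. `output_capacity` models the length of an
    output DSD; None models an unbounded destination such as a pointer.
    """
    streams = [deque(x) if isinstance(x, (list, tuple)) else x for x in inputs]
    if output_capacity is None and not any(isinstance(s, deque) for s in streams):
        raise ValueError("at least one argument must model a DSD or DSR")
    written = []
    while True:
        args = []
        for stream in streams:
            if isinstance(stream, deque):
                if not stream:
                    # A read failed: the loop ends and any values already
                    # read in this iteration are discarded.
                    return written
                args.append(stream.popleft())
            else:
                args.append(stream)
        result = callback(*args)
        if output_capacity is not None and len(written) >= output_capacity:
            # The write failed: the computed value is discarded.
            return written
        written.append(result)


sums = map_model(lambda a, b: a + b, [[1, 2, 3], [10, 20]])
scaled = map_model(lambda a, k: a * k, [[1, 2, 3], 2])
```

In the first call the element 3 is read from the longer operand and then discarded, because the shorter operand has run out.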
@ptrcast
Casts a value of pointer type to a different pointer type.
Syntax
- destination_ptr_type is a pointer type.
- ptr is a value of pointer type.
Semantics
The builtin returns a pointer with the same memory address as ptr, but
whose type is destination_ptr_type.
The destination_ptr_type must not be a pointer whose base type is only
valid in comptime expressions.
See Comptime.
This builtin is not valid in comptime expressions.
Example
@random16
Generates a 16-bit pseudo-random value.
Syntax
Semantics
The builtin returns a u16 value, generated through the LFSR algorithm with
polynomial (x^23 + x^18 + 1). The LFSR state is advanced 128 iterations
after every use. The initial state of the algorithm, when the program starts, is
set to 0xdeadbeef. The state is not shared between PEs, but it is shared
between tasks.
The builtin is not valid in comptime expressions.
Example
@range
Generates a sequence of evenly spaced numbers.
Syntax
- elem_type is an integer type.
- start, stop and step are numeric values.
Examples
Semantics
The range of elements is defined as follows:
- start defines the first element of the sequence.
- step defines how to generate the next element of the sequence given the previous element: next = previous + step.
- stop defines an upper bound on the sequence such that all elements in the sequence are strictly less than stop.
start, stop and step are coerced to the
type elem_type. If this is not possible, a compilation error is issued.
step != 0 is required. If step > 0 and stop <= start or
step < 0 and stop >= start, then the resulting sequence is empty.
The two-argument version of @range is equivalent to the common
scenario where start == 0 and step == 1.
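The rules above closely resemble Python's built-in range, which makes for a compact model. The sketch below is not CSL code: csl_range is a hypothetical name, the elem_type argument is omitted, and the treatment of a negative step (elements strictly greater than stop) is an assumption borrowed from Python rather than something the text states.

```python
def csl_range(start, stop=None, step=1):
    """Model of @range without the elem_type argument."""
    if stop is None:
        # Single-bound form: equivalent to start == 0 and step == 1.
        start, stop = 0, start
    if step == 0:
        raise ValueError("step != 0 is required")
    return list(range(start, stop, step))


short_form = csl_range(5)
strided = csl_range(2, 10, 3)
empty_up = csl_range(5, 5, 1)     # step > 0 and stop <= start
empty_down = csl_range(3, 7, -1)  # step < 0 and stop >= start
```

Every element of strided is strictly less than the stop value 10, and both degenerate cases yield an empty sequence, as described.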
@range_start, @range_stop, @range_step
Returns the start, stop or step value of a given range.
Syntax
Examples
@rank
Returns the rank (number of dimensions) of the input array type as a u16.
Syntax
- array_type is a type defining an array.
Example
@set_active_prng
Sets the active PRNG (Pseudo-Random Number Generator).
Syntax
- prng_id is a 16-bit unsigned integer expression that specifies the PRNG ID to be set.
Semantics
The input integer expression prng_id specifies the active PRNG ID as
prng_id % N where N is the total number of PRNGs for the given
architecture.
This builtin cannot be evaluated at comptime.
Example
@set_color_config, @set_local_color_config
Specify the color configuration for a specific color at a specific processing element (PE) from a layout block (@set_color_config) or
from a processing element’s top-level comptime block
(@set_local_color_config). A color configuration includes routing,
switching and filter configurations.
Syntax
- x_coord and y_coord are comptime-known integers indicating the PE coordinates.
- this_color is a comptime-known expression yielding a color value.
- config is a comptime-known anonymous struct with the following fields and sub-fields:
  - routes
    - rx
    - tx
    - pop_mode (deprecated, moved to switches field)
    - color_swap_x
    - color_swap_y
  - switches
    - pos1
    - pos2
    - pos3
    - current_switch_pos
    - ring_mode
    - pop_mode
  - filter
  - teardown
Example
Semantics
Both @set_color_config and @set_local_color_config builtins will set
the color configuration - provided by the config field - to the input color
value of one or more PEs.
Calls to @set_color_config are only allowed during the evaluation of a
layout block. As a result they always refer to a specific PE that is specified
by the coordinate fields x_coord and y_coord.
Calls to @set_local_color_config are only allowed during the evaluation of
a top-level comptime block that belongs to a specific PE’s code and thus
explicit coordinates are not needed as in calls to @set_color_config.
However, since one or more PEs may share the same code - and thus the same
top-level comptime block - a call to @set_local_color_config may be
associated with multiple PEs depending on the rectangle’s PE-to-code mapping
defined by calls to the @set_tile_code builtin.
Any two calls to @set_color_config and/or @set_local_color_config
that refer to the same combination of PE and color are not allowed.
Finally, a color configuration without a routes field is not allowed and
will result in an error. That’s because both the switches and filter
configurations are not valid without routes. If needed (e.g., for testing),
a user can specify an empty routes field as .routes = .{}.
Routing Configuration Semantics
rx and tx
The rx and tx fields specify the receive and transmit route
configurations for the given color. In particular, rx specifies the
direction(s) (i.e., EAST, WEST, SOUTH, NORTH and RAMP)
from which we are expecting to receive data and tx specifies the
direction(s) in which we wish to transmit data for the given color. The
example above demonstrates how these fields can be used in calls to the
@set_color_config and @set_local_color_config builtins. Both rx and
tx fields expect comptime-known structs with nameless fields of
unique direction values (e.g., .tx = .{WEST, EAST, NORTH}) or a single
direction (e.g., .tx = WEST and .rx = RAMP).
Bitvector Format: Alternatively, the entire routes field can be
specified as a 10-bit integer bitvector that encodes both RX and TX directions.
The format is as follows:
- RX directions (bits 0-4): WEST=0x1, EAST=0x2, SOUTH=0x4, NORTH=0x8, RAMP=0x10
- TX directions (bits 5-9): WEST=0x20, EAST=0x40, SOUTH=0x80, NORTH=0x100, RAMP=0x200
For example, .routes = 0x210 configures RX from RAMP (0x10) and TX to
RAMP (0x200). The <directions> library provides helper constants
(RX_RAMP, TX_RAMP, etc.) and conversion functions (dirToRxBits,
dirToTxBits).
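The bit layout can be checked with a short Python sketch that builds and decodes the 10-bit value from the constants listed above. This is not the directions library and not CSL code; encode_routes and decode_routes are hypothetical helpers written only to make the encoding concrete.

```python
# Bit values copied from the format description above.
RX_BITS = {"WEST": 0x1, "EAST": 0x2, "SOUTH": 0x4, "NORTH": 0x8, "RAMP": 0x10}
TX_BITS = {"WEST": 0x20, "EAST": 0x40, "SOUTH": 0x80, "NORTH": 0x100, "RAMP": 0x200}


def encode_routes(rx, tx):
    """Combine RX and TX direction names into one 10-bit routes value."""
    bits = 0
    for direction in rx:
        bits |= RX_BITS[direction]
    for direction in tx:
        bits |= TX_BITS[direction]
    return bits


def decode_routes(bits):
    """Split a routes bitvector back into (rx, tx) lists of direction names."""
    rx = [name for name, bit in RX_BITS.items() if bits & bit]
    tx = [name for name, bit in TX_BITS.items() if bits & bit]
    return rx, tx


ramp_to_ramp = encode_routes(["RAMP"], ["RAMP"])
west_in_fanout = encode_routes(["WEST"], ["EAST", "NORTH"])
```

The first value equals 0x210, matching the worked example in the text; the second shows a single RX direction fanning out to two TX directions.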
Note that it is only safe to enable multiple input directions for a given color
if it is known that wavelets will never arrive from multiple directions on this
color at once. If wavelets arrive from two enabled directions on the same color
at once, the behavior of the hardware router is undefined.
pop_mode
This field is deprecated as a route setting and has therefore moved to the switch configuration semantics section. See Switching Configuration Semantics.
color_swap_x and color_swap_y
Both color_swap_x and color_swap_y fields expect a boolean value that
indicates whether we want to enable color swapping for the horizontal and
vertical direction respectively. More details about color swapping can be found
in Color Swapping.
Switching Configuration Semantics
pos1, pos2, pos3
The pos1, pos2 and pos3 fields expect either an integer bitvector, as
described in rx and tx, or an anonymous struct value with only one
of the following fields:
- rx
- tx
- invalid
If we regard the top-level rx and tx fields as the initial configuration of the
receive and transmit routes respectively, then pos1, pos2 and
pos3 are additional configurations we can switch to in-sequence from
pos1 to pos3 whenever we receive route-switching control wavelets on the
given color. In particular, the first route-switching control wavelet will set
the pos1 configuration. The next one will cause an advance to the pos2
configuration and the third will cause an advance to the pos3 configuration.
Any additional route-switching control wavelets will either have no effect or go
back to the initial configuration (defined by the rx and tx fields)
depending on whether the ring_mode field is specified (see next section
about the ring_mode). Unlike the top-level rx field, the one nested
under the pos1, pos2 and pos3 fields can only have a single
direction value (i.e., EAST, WEST, NORTH, SOUTH or RAMP).
On the other hand, the tx field can accept a single direction value or
more than one (if the target supports it) just like the top-level tx field.
In the following example, the call to the @set_local_color_config builtin will
configure routing for color red such that receiving a route-switching
control wavelet will change the receive direction to EAST:
The invalid field expects a boolean that must always be true, which will
indicate that we can never advance to the corresponding switch position and we
will either remain on the previous one (if ring_mode is not enabled) or
advance back to the original switch position indicated by the top-level rx
and tx fields (if ring_mode is enabled). All switch positions will
default to .{ .invalid = true }.
pop_mode
The pop_mode field expects an anonymous struct value with only one of the
following fields:
- no_pop
- always_pop
- pop_on_advance
- pop_on_advance_nop
The chosen field must be set to true. In other words,
the pop_mode field can be viewed as an enum value that can take one of the 4
possible values above. By specifying the pop_mode field in a color
configuration we can effectively mutate the sequence of instructions carried by
control wavelets as they pass through a PE on a given color. In particular, when
we select the no_pop mode the instruction sequence of control wavelets
remains as is. If we select the always_pop mode then the first instruction
in the sequence is always popped every time a control wavelet arrives and the
respective instruction executed. If we select pop_on_advance then the first
instruction is popped only if the control wavelet has advanced the route
configuration to a new switch position (see section about pos1, pos2 and
pos3 fields). If we select pop_on_advance_nop then the
first instruction is popped if the control wavelet has advanced the route
configuration to a new switch position or is a no-op.
ring_mode
The ring_mode field expects a boolean value. If true then
route-switching control wavelets will cause the route configuration to loop back
to the original configuration (specified by the tx and rx fields) once
all valid switch positions have been visited (see previous section about
pos1, pos2 and pos3 fields). If false then route-switching
control wavelets will have no effect on the routing configuration once we reach
the last valid switch position. In the following example, if at a given point in
time, the route configuration for color red is at switch position 3
(specified by the pos3 field) then receiving a route-switching control
wavelet will cause the route configuration for red to loop back to the
initial one specified by fields rx and tx:
current_switch_pos
The current_switch_pos field expects a non-negative integer in the range
[0-3] with 0 representing the initial route configuration (specified by the
tx and rx fields) and 1, 2 and 3 representing switch
positions 1, 2 and 3 respectively specified by pos1, pos2
and pos3 fields. The switch position pointed to by the
current_switch_pos field will specify the initial route configuration for
the given color.
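The interaction between the pos1, pos2 and pos3 fields, the invalid marker, ring_mode and current_switch_pos can be sketched as a small state machine. The Python below is only an illustrative model of the prose above, not CSL and not the hardware implementation; the class name SwitchModel and its method are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SwitchModel:
    """Toy model of switch-position advancement for a single color."""

    # valid[i] is True when position i+1 (pos1..pos3) is configured, and
    # False when it is left at the default .{ .invalid = true }.
    valid: tuple = (False, False, False)
    ring_mode: bool = False
    # 0 is the top-level rx/tx configuration, 1..3 are pos1..pos3.
    current_switch_pos: int = 0

    def on_switch_wavelet(self) -> int:
        """Apply one route-switching control wavelet; return the new position."""
        nxt = self.current_switch_pos + 1
        if nxt <= 3 and self.valid[nxt - 1]:
            # Advance to the next valid switch position.
            self.current_switch_pos = nxt
        elif self.ring_mode:
            # Loop back to the initial rx/tx configuration.
            self.current_switch_pos = 0
        # Otherwise the wavelet has no effect on the route configuration.
        return self.current_switch_pos
```

For instance, a model created with all three positions valid, ring_mode enabled and current_switch_pos set to 3 returns to position 0 on the next route-switching control wavelet, while a model with only pos1 valid and ring_mode disabled stays at position 1.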
Filter Configuration Semantics
The filter field expects an anonymous struct value with the following
fields:
- kind
- count_data
- count_control
- init_counter
- limit1
- limit2
- max_counter
- filter_control
- max_idx
- min_idx
The kind field specifies the kind of the filter (or filter mode) which is
an anonymous struct value with a single boolean field that is always true. The
name of the boolean field represents the filter kind mnemonic which is one of:
- counter
- sparse_counter
- range
In other words, the kind field can be viewed as an enum value that can take one of the 3
values above. The kind of the filter will determine the subset of filter fields
that are legal for that kind. Let's look at each of the three possible filter
kinds separately.
Counter filter
The legal filter fields for a counter filter are the following:
- count_data
- count_control
- init_counter
- limit1
- max_counter
- filter_control
Wavelet counting is enabled through the count_data
and/or count_control fields, which expect a boolean value where true means
that we enable data and control wavelet counting respectively. The default is
false for both fields meaning that neither data nor control wavelets will
cause the counter to get incremented. We can initialize the active wavelet
counter by specifying the init_counter field that expects a non-negative
integer value (the default value is zero). The filter counter will get
incremented up to a certain value (inclusive) and then get reset to zero.
That value is specified by the limit1 field that expects a non-negative
integer value that defaults to zero. The counter filter will reject all
wavelets whose active counter is greater than a maximum value (exclusive)
specified by the max_counter field that expects a non-negative integer value
that defaults to zero. Finally, the filter_control field expects a boolean
value. If true then only control wavelets that arrive when the value of
the active counter is equal to max_counter are allowed to pass. All other
wavelets will be rejected by the filter.
In the following example, we set a counter filter for color red so that
every 5th data wavelet is rejected:
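The counting rules described in this section can be approximated with a short Python model. This is a sketch under stated assumptions rather than a description of the hardware: it assumes only counted wavelets arrive, that the accept/reject test is applied before the counter is updated, and it ignores filter_control. The function name and the values limit1 = 4 and max_counter = 3 are illustrative choices that happen to reject every 5th wavelet in this model.

```python
def counter_filter(num_wavelets, limit1, max_counter, init_counter=0):
    """Return a list of booleans, True where the i-th counted wavelet passes.

    Assumption: the wavelet is tested against the current counter value
    first, and the counter is updated afterwards.
    """
    counter = init_counter
    passed = []
    for _ in range(num_wavelets):
        # Reject wavelets whose active counter is greater than max_counter.
        passed.append(counter <= max_counter)
        # Count up to limit1 (inclusive), then reset to zero.
        counter = 0 if counter >= limit1 else counter + 1
    return passed


# Ten wavelets through a filter with limit1 = 4 and max_counter = 3.
result = counter_filter(10, limit1=4, max_counter=3)
```

In this model the counter cycles through 0, 1, 2, 3, 4, so the wavelets that arrive while the counter is 4 (the 5th and 10th) are the ones rejected.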
Sparse counter filter
The legal filter fields for a sparse counter filter are the following:
- count_data
- count_control
- init_counter
- limit1
- limit2
- max_counter
A sparse counter filter is similar to a counter filter: the semantics of the count_data,
count_control and max_counter fields are identical. The difference is
that, when the limit2 field is specified and the active counter reaches
the limit1 value (inclusive), the limit2 value is copied to
limit1 before the active counter is reset to zero. When this happens
limit2 is effectively disabled and won’t be copied to limit1 again. If
the wavelet counter reaches limit1 a second time - after limit2 was
copied to limit1 - then the filter will no longer filter any wavelets and
thus it will be effectively disabled. Both limit1 and limit2 counters
expect non-negative integer values that default to zero. However, it is
important to note that when limit2 is not specified and the active counter
reaches limit1 then no other wavelets are filtered which means that the
sparse counter filter is effectively disabled. The same is true when no
limit1 value is provided, i.e., the filter is effectively disabled and thus
no filtering is done.
Range filter
The legal filter fields for a range filter are the following:
- max_idx
- min_idx
If the filter kind is range then all control wavelets are
accepted. However, data wavelets are filtered based on their index value. In
particular, a data wavelet will be rejected iff its index value is not within
the range [min_idx, max_idx].
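The range filter rule is small enough to state as a predicate. The Python function below is a hypothetical helper that mirrors the sentence above; it is not part of CSL or of any Cerebras library.

```python
def range_filter_accepts(is_control, index, min_idx, max_idx):
    """Return True if a wavelet passes a range filter."""
    if is_control:
        # Control wavelets are always accepted.
        return True
    # Data wavelets pass only if their index lies in [min_idx, max_idx].
    return min_idx <= index <= max_idx
```

With min_idx = 2 and max_idx = 5, a data wavelet with index 5 passes, one with index 6 is rejected, and any control wavelet passes regardless of its index.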
Teardown Configuration Semantics
The teardown field expects a comptime-known boolean expression. If true
then the color associated with the given configuration will be set to teardown
mode when the program starts. This means that all traffic will be suspended on
that color until the teardown mode is exited explicitly at runtime through
the <tile_config/teardown.csl> standard library API.
While a color is in teardown mode, all the configuration settings can be
re-set at runtime using a standard library API. For example, filters can be
configured using the <tile_config/filters.csl> standard library API.
@set_config
Write the value of a PE configuration register.
Syntax
- addr is a machine-word-sized unsigned integer expression that represents the word-address of the configuration register.
- config_value is a machine-word-sized unsigned integer expression that represents the new configuration value.
- access_range, if specified, is a comptime-known 2-element tuple of integers specifying an inclusive range of addresses that addr falls within.
- accessed_ranges, if specified, is a tuple of comptime-known 2-element tuples of integers that each specify an inclusive range of addresses that addr may fall within.
Example
Semantics
The @set_config builtin can be called at runtime and during the
evaluation of a top-level comptime block.
It cannot be evaluated at comptime unless it is during the evaluation
of a top-level comptime block.
If @set_config is encountered during the evaluation of a top-level
comptime block then it will store config_value to addr during
link time and therefore the configuration value is guaranteed to be present
before the program begins execution.
A call to @set_config during top-level comptime evaluation will always
overwrite any configuration value that was previously stored at addr.
A call to @set_config at runtime will become a volatile runtime write
operation that will store config_value to addr. In that scenario,
both config_value and addr expressions don’t have to be comptime-known.
If addr is comptime-known then it must be a comptime-known integer value
that falls within the valid configuration address range for the selected target
architecture.
In cases where addr may be runtime but is known to occur within a specific
range or set of ranges, a second argument can be provided to communicate this
assumption.
A single, contiguous range may be specified as a tuple of integers,
.{access_start, access_end}. In this case, it is required that
access_start <= addr <= access_end.
Multiple ranges may be specified as a nested tuple,
.{.{access_start_1, access_end_1}, ..., .{access_start_N, access_end_N}}. In
this case, each inner tuple .{access_start_i, access_end_i} must satisfy
access_start_i <= access_end_i, and addr must fall within one of the
specified ranges.
An error is emitted if the compiler is able to detect a violation of the above
requirements. If a violation occurs at runtime that the compiler cannot detect,
behavior is undefined.
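The range requirements above can be summarized as a single predicate. The Python sketch below is only a model of those rules, with a hypothetical function name; it does not describe how the compiler performs its checks.

```python
def access_is_valid(addr, ranges):
    """Check addr against a single (start, end) range or a tuple of ranges."""
    # Normalize a single range into a one-element tuple of ranges.
    if len(ranges) == 2 and all(isinstance(bound, int) for bound in ranges):
        ranges = (ranges,)
    # Every range must satisfy start <= end ...
    well_formed = all(start <= end for start, end in ranges)
    # ... and addr must fall within at least one of them (bounds inclusive).
    return well_formed and any(start <= addr <= end for start, end in ranges)
```

For example, address 0x10 satisfies the single range (0x10, 0x1f), while address 5 violates the pair of ranges ((0, 3), (8, 9)).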
In addition, if @set_config is called during the evaluation of a top-level
comptime block then it is not allowed to specify an address that falls
within a configuration range that is reserved by the compiler. These ranges
correspond to the following configurations:
- All DSRs
- Filters
- Basic routing
- Switches
- Input queues
- Task table
Some of these configurations are set through dedicated builtins (i.e., the
@set_color_config, @set_local_color_config and
@initialize_queue builtins) while the rest are managed automatically by
the compiler (i.e., DSRs and task tables).
Both addr and config_value must be coercible to machine-word-sized
unsigned integers regardless of whether they are comptime-known
or not.
@set_config_unchecked
Write the value of a PE configuration register unsafely.
Syntax
- addr is a machine-word-sized unsigned integer expression that represents the word-address of the configuration register.
- config_value is a machine-word-sized unsigned integer expression that represents the new configuration value.
Example
Semantics
@set_config_unchecked is identical to @set_config (see
@set_config) with two exceptions:
- The compiler will not attempt to check if a reserved address is accessed by @set_config_unchecked. Accessing a reserved address may result in undefined behavior.
- The code generated for @set_config may insert delays to guarantee that its effects are observed by subsequent writes to configuration space. In some cases, this may be overly conservative. @set_config_unchecked will not cause such delays to be inserted. It is the programmer’s responsibility to ensure that @set_config_unchecked does not cause indeterminate states of configuration space to be observed.
@set_dsd_base_addr
Set the base-address of a memory DSD value. See Data Structure Descriptors for details.
@set_dsd_length
Set the length of a 1D memory DSD value. See Data Structure Descriptors for details.
@set_dsd_stride
Set the stride of a 1D memory DSD value. See Data Structure Descriptors for details.
@set_fifo_read_length
Set the read length of a FIFO DSD. See Data Structure Descriptors for details.
@set_fifo_write_length
Set the write length of a FIFO DSD. See Data Structure Descriptors for details.
@set_rectangle
Specify the size of the rectangular region of processing elements that will execute this code.
Syntax
- width and height are comptime integers.
Example
Semantics
The @set_rectangle builtin must appear only in a layout block.
Additionally, there must be exactly one call to @set_rectangle in a
layout block.
@set_teardown_handler
Set a function to be the teardown handler for a given color.
Syntax
- this_func is the name of a function with no input parameters and ‘void’ return type.
- this_color is a comptime-known expression of type ‘color’.
Example
Semantics
The @set_teardown_handler builtin must appear in a top-level
comptime block. When a color goes into teardown mode at runtime,
then the function associated with that color, through a call to
@set_teardown_handler, will be executed.
Calling @set_teardown_handler for the same color more than once
is not allowed and will result in an error.
The color will not automatically exit from the teardown mode.
The user is responsible for exiting teardown mode explicitly from within
the respective teardown handler function.
The color that is passed to a @set_teardown_handler call must be
within the range of routable colors for the given target architecture.
If there is at least 1 call to @set_teardown_handler in the program
then no task is allowed to be bound to the teardown task ID and vice-versa.
The teardown task ID is the value returned by the CSL standard library through
the teardown API (see teardown).
@set_tile_code
Specify the file that contains instructions to execute on a specific processing element, while optionally initializing parameters defined in the file.
Syntax
- x_coord and y_coord are comptime integers.
- filename is a comptime string.
- param_binding is a comptime anonymous struct.
Example
Semantics
The @set_tile_code builtin must appear only in a layout block.
Additionally, there must be exactly one call to @set_tile_code for each
coordinate contained in the dimensions specified in the call to
@set_rectangle. Unless the specified file path is an absolute path, it is
interpreted as relative to the path of the file that contains the
@set_tile_code() builtin call.
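The two rules above (exactly one call per coordinate, and relative paths resolved against the calling file) can be illustrated with a small Python model. Both helper names are hypothetical, coordinates are assumed to be zero-based, and POSIX-style paths are used so the result does not depend on the host platform.

```python
import posixpath
from collections import Counter


def check_tile_coverage(width, height, calls):
    """Report coordinates that are missing, duplicated, or outside the rectangle.

    calls is a list of (x, y) pairs, one per modelled @set_tile_code call.
    Assumption: coordinates run from 0 to width-1 and 0 to height-1.
    """
    counts = Counter(calls)
    expected = {(x, y) for x in range(width) for y in range(height)}
    missing = sorted(expected - set(counts))
    duplicated = sorted(c for c, n in counts.items() if n > 1)
    outside = sorted(set(counts) - expected)
    return missing, duplicated, outside


def resolve_tile_file(calling_file, filename):
    """Resolve filename relative to the file containing the builtin call."""
    if posixpath.isabs(filename):
        return filename
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(calling_file), filename)
    )
```

A layout is consistent in this model when all three lists returned by check_tile_coverage are empty.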
@strcat
Concatenates compile-time strings.
Syntax
- each argument is an expression of type comptime_string.
Semantics
The @strcat builtin returns a value of type comptime_string that results
from concatenating its arguments.
Example
@strlen
Returns the length of a compile-time string.
Syntax
- str is an expression of type comptime_string.
Semantics
The @strlen builtin returns a value of type comptime_int equal to
the length of its argument, i.e., the number of characters in the string.
Example
@type_of
Returns the type of an expression.
Syntax
- any_expression is any valid expression.
Semantics
The @type_of builtin returns a value of type type describing the
evaluated type of the input expression.
The builtin is always evaluated at compile-time, but the input expression does
not need to be comptime.
No code is generated for the input expression; as such, this expression will
not have run-time effects on the program.
@unblock
Unblock the task associated with the input color, data_task_id, or
local_task_id so that the task can be run when the task identifier
is activated.
Syntax
- id is an expression of type:
  - WSE-2: color, data_task_id, or local_task_id.
  - WSE-3: input_queue, data_task_id, local_task_id, or ut_id.
Example
@zeros
Initialize a tensor with zeros.
Syntax
- tensor_type is a comptime-known numeric tensor type.
Example
Builtins for Supporting Remote Procedure Calls (RPC)
This category includes builtins that enable users to advertise device symbols to the host so that the host can interact with them through wavelets akin to RPC. The advertised symbols could be data or functions forming a host-callable API. In addition, this builtin category includes builtins that allow users to interpret incoming wavelets by associating them with the respective advertised symbols.
@export_name
Declare that a given symbolic name can be advertised from one or more processing elements with a specific type and mutability.
Syntax
- name is an expression of type comptime_string.
- type is an expression of type type.
- isMutable is a comptime-known expression of type bool.
Example
Semantics
Calls to the @export_name builtin can only appear during the evaluation
of a layout block. A given name can only be exported once with
@export_name. The third isMutable parameter must be provided unless
type is a function type.
The type parameter cannot be a comptime-only type (e.g., comptime_int,
comptime_float etc.) with the exception of function types.
In addition, the type parameter cannot be an aggregate type like an array
or struct.
The type parameter cannot be an enum type either.
If type is a function type, then the same rules apply to the respective
function parameter types and return type.
If type is a function type, then it can have a maximum of 15 input
parameters.
@export_symbol
Advertise a device symbol to the host with a given name, if provided.
Syntax
- symbol is a reference to a global device symbol.
- name is an expression of type comptime_string.
Example
Semantics
Calls to the @export_symbol builtin can only appear during the evaluation
of a top-level comptime block.
Its first argument must be a global symbol that is used at least once by
code that is not comptime evaluated.
A given symbol can be exported multiple times as long as each time the name
argument is provided and it is unique. If name is not provided then the name
of the symbol is used as the advertised name instead.
The advertised name (whether it is explicitly provided or defaulted to the
symbol’s actual name) must always correspond to a name that was exported during
layout evaluation using the @export_name builtin. That is, there must be a
name exported with @export_name during layout evaluation that has the same
name, type and mutability.
The compiler will collect all exported symbols and advertise them to the host
by producing a JSON file containing meta-data for each one. The schema of the
produced JSON file is as follows:
@get_symbol_id
Returns the unique integer identifier for an advertised symbol.
Syntax
- symbol is a reference to an advertised global device symbol.
Example
Semantics
The @get_symbol_id builtin can be called at comptime or runtime but it
is not allowed to appear during layout evaluation. The input symbol
must have been advertised using @export_symbol.
@get_symbol_value
Returns the value of an advertised global symbol given a runtime integer identifier value.
Syntax
- type is an expression of type type.
- id is a runtime-only integer identifier value.
Example
Semantics
The @get_symbol_value builtin can only be called at runtime.
If no symbol was advertised with the given integer identifier then
the behavior is undefined.
If there is a symbol advertised with the given integer identifier,
then the builtin will return a copy of its value.
There has to be at least one global symbol advertised with type type.
@get_tensor_ptr
Returns the value of an exported tensor pointer given a runtime integer identifier value.
Syntax
- id is a runtime-determined integer identifier value.
Example
Semantics
The @get_tensor_ptr builtin can only be called at runtime.
If no tensor pointer has been advertised with the given integer
identifier then the behavior is undefined.
If there is a tensor pointer advertised with the given integer
identifier, then the builtin will return a copy of the pointer
bit-casted into a [*]u16 type.
@get_xdsr
Create a unique XDSR identifier value. This value will uniquely identify a physical XDSR. See Data Structure Registers for details.
@has_exported_tensors
Returns true iff there is at least 1 tensor pointer exported
and false otherwise.
Syntax
Example
Semantics
The @has_exported_tensors builtin cannot be called from a top-level
comptime block or a layout block.
It is guaranteed to be evaluated at comptime.
It will return true iff there is at least 1 exported tensor pointer
and false otherwise.
@rpc
Creates an RPC server listening to a given color.
Syntax
- task_id is an expression of type data_task_id.
Example
Semantics
The @rpc builtin can only be called during the evaluation of a top-level
comptime block.
A call to @rpc will produce a wavelet-triggered task (WTT) that is bound
to task_id and would receive data from the underlying routable color.
Note that the user is responsible for routing the data through that color
such that they are received by the produced WTT.
No other task may be bound to task_id and vice-versa.
The WTT-based RPC server is expected to receive sequences of wavelets. Each one
of these sequences corresponds to a single RPC that consists of a unique integer
identifier corresponding to an exported function along with its input arguments.
If the input arguments of an RPC do not match the expected number of arguments
for a given exported function, a runtime assertion is triggered.
If the unique integer identifier of an RPC does not match any exported function,
the call will be ignored and the server will be ready for the next RPC sequence.
If the function called by the RPC server returns a value, then this value will
be ignored.
No more than 1 call to @rpc is allowed for a given tile code which means
that we can always have up to 1 RPC server per PE.
Builtins for DSD Operations
These builtins perform bulk operations on a set of elements described by DSDs, by exploiting native hardware instructions. The destination operand is always the first argument, and the subsequent arguments are either DSDs, scalars, or pointers. Additionally, many of these builtins have a SIMD (single instruction, multiple data) mode. For more information, see SIMD Mode.
Syntax
For the DSD operation builtins below, the arguments are labeled as follows:
- dest_dsd, src_dsd1, and src_dsd2 are constants or variables created using the @get_dsd builtin or the @get_dsr builtin.
- dest_dsd is the destination DSD or DSR. If this is a DSR value it must be of type dsr_dest or dsr_src0.
- src_dsd1 and src_dsd2 are source DSDs or DSRs or any combination of them. If any of the source operands are DSRs then they cannot be of type dsr_dest.
- i16_value is a value of type i16.
- i32_value is a value of type i32.
- u16_value is a value of type u16.
- u32_value is a value of type u32.
- fp16_value is a value of type @fp16(), where @fp16() gives the type of the selected runtime FP16 format (see @fp16).
- f32_value is a value of type f32.
- i16_pointer is a pointer to a value of type i16.
- i32_pointer is a pointer to a value of type i32.
- u16_pointer is a pointer to a value of type u16.
- u32_pointer is a pointer to a value of type u32.
- fp16_pointer is a pointer to a value of type @fp16().
- f32_pointer is a pointer to a value of type f32.
@add16
Add two 16-bit integers.
@addc16
Add two 16-bit integers, with carry.
@and16
Bitwise-and two 16-bit integers.
@clz
Count leading zeros.
@ctz
Count trailing zeros.
@fabsh
Absolute value of a 16-bit floating point.
@fabss
Absolute value of a 32-bit floating point.
@faddh
Add two 16-bit floating point values.
@faddhs
Add a 16-bit and 32-bit floating point value.
@fadds
Add two 32-bit floating point values.
@fh2s
Convert a 16-bit floating point value to a 32-bit floating point value.
@fh2xp16
Convert a 16-bit floating point value to a 16-bit integer.
@fmach
16-bit floating point multiply-add.
@fmachs
16-bit floating point multiply with 32-bit addition.
@fmacs
32-bit floating point multiply-add.
@fmaxh
16-bit floating point max.
@fmaxs
32-bit floating point max.
@fmovh
Move a 16-bit floating point value.
@fmovs
Move a 32-bit floating point value.
@fmulh
Multiply 16-bit floating point values.
@fmuls
Multiply 32-bit floating point values.
@fnegh
Negate a 16-bit floating point value.
@fnegs
Negate a 32-bit floating point value.
@fnormh
Normalize a 16-bit floating point value.
@fnorms
Normalize a 32-bit floating point value.
@fs2h
Convert a 32-bit floating point value to a 16-bit floating point value.
@fs2xp16
Convert a 32-bit floating point value to a 16-bit integer.
@fscaleh
16-bit floating point multiplied by a constant.
@fscales
32-bit floating point multiplied by a constant.
@fsubh
Subtract two 16-bit floating point values.
@fsubs
Subtract two 32-bit floating point values.
@mov16
Move a 16-bit integer.
@mov32
Move a 32-bit integer.
@or16
Bitwise-or on two 16-bit integers.
@popcnt
Population count of an integer.
@sar16
Arithmetic shift right of a 16-bit integer.
@sll16
Logical shift left of a 16-bit integer.
@slr16
Logical shift right of a 16-bit integer.
@sub16
Subtract two 16-bit integers.
@xor16
Xor two 16-bit integers.
@xp162fh
Convert a 16-bit integer into a 16-bit floating point value.
@xp162fs
Convert a 16-bit integer into a 32-bit floating point value.
Example
@dfilt
Instructs an input queue to drop all data wavelets until a certain number of control wavelets are encountered.
Syntax
- dsd is a fabin_dsd or DSR that contains a fabin_dsd.
  - If a DSR is used, it must have type dsr_src1 and be loaded with the async configuration (see Data Structure Registers). Behavior is undefined if @dfilt is used with a DSR that does not meet these conditions.
- configuration is the configuration struct that is optionally provided to other DSD operations (see Data Structure Descriptors).
Semantics
The first argument to @dfilt must be a fabin_dsd or a DSR representing
an ‘async’ fabin_dsd. A call to @dfilt will drop data wavelets arriving
on the input queue associated with the input DSD. The extent of the DSD
determines the number of control wavelets the operation expects. The input
queue will drop all data wavelets until the specified number of control
wavelets is encountered.
Unlike other DSD operations, the configuration struct is required, and the
async configuration must be true. @dfilt does not support the
on_control or index configurations.
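The dropping behavior can be pictured with a small Python model of a wavelet stream. This is only a sketch of the description above: the function name is hypothetical, wavelets are reduced to the strings "data" and "control", and the model assumes the counted control wavelets are consumed by the operation, which the text above does not state.

```python
def model_dfilt(stream, extent):
    """Drop data wavelets until `extent` control wavelets have been seen.

    Returns (dropped, consumed): the number of data wavelets dropped and the
    total number of wavelets taken from the stream before the operation ends.
    """
    dropped = 0
    controls_seen = 0
    consumed = 0
    for kind in stream:
        if controls_seen == extent:
            # The expected number of control wavelets has been reached.
            break
        consumed += 1
        if kind == "control":
            controls_seen += 1
        else:
            dropped += 1
    return dropped, consumed
```

With an extent of 2 and the stream data, data, control, data, control, data, the model drops the first three data wavelets and stops after the second control wavelet, leaving the final data wavelet in the queue.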

