Version: v1.4.1

SIMT Intrinsics

On the CUDA backend, Taichi supports warp-level and block-level intrinsics, which are needed to write high-performance SIMT kernels. You can use them in Taichi much as you would in CUDA kernels. Currently, the following functions are supported:

Operation                      Mapped CUDA intrinsic
ti.simt.warp.all_nonzero       __all_sync
ti.simt.warp.any_nonzero       __any_sync
ti.simt.warp.unique            __uni_sync
ti.simt.warp.ballot            __ballot_sync
ti.simt.warp.shfl_sync_i32     __shfl_sync
ti.simt.warp.shfl_sync_f32     __shfl_sync
ti.simt.warp.shfl_up_i32       __shfl_up_sync
ti.simt.warp.shfl_up_f32       __shfl_up_sync
ti.simt.warp.shfl_down_i32     __shfl_down_sync
ti.simt.warp.shfl_down_f32     __shfl_down_sync
ti.simt.warp.shfl_xor_i32      __shfl_xor_sync
ti.simt.warp.match_any         __match_any_sync
ti.simt.warp.match_all         __match_all_sync
ti.simt.warp.active_mask       __activemask
ti.simt.warp.sync              __syncwarp

See Taichi's API reference for more information on each function.
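
To make the vote intrinsics listed above more concrete, here is a minimal pure-Python model of what `__all_sync`, `__any_sync`, and `__ballot_sync` compute across one warp. It runs on the CPU and does not call Taichi. The names `WARP_SIZE`, `FULL_MASK`, `active_lanes`, `model_all_nonzero`, `model_any_nonzero`, and `model_ballot` are hypothetical helpers introduced for illustration and are not part of Taichi's API.

```python
# Pure-Python reference model of warp vote semantics (no GPU required).
# All names below are hypothetical helpers, not Taichi functions.
WARP_SIZE = 32
FULL_MASK = 0xFFFFFFFF  # every lane of the warp participates


def active_lanes(mask):
    """Return the lane indices whose bit is set in the participation mask."""
    return [lane for lane in range(WARP_SIZE) if (mask >> lane) & 1]


def model_all_nonzero(mask, predicates):
    """1 if every participating lane holds a nonzero predicate, else 0."""
    return int(all(predicates[lane] != 0 for lane in active_lanes(mask)))


def model_any_nonzero(mask, predicates):
    """1 if at least one participating lane holds a nonzero predicate, else 0."""
    return int(any(predicates[lane] != 0 for lane in active_lanes(mask)))


def model_ballot(mask, predicates):
    """Bitmask whose bit N is set when participating lane N has a nonzero predicate."""
    bits = 0
    for lane in active_lanes(mask):
        if predicates[lane] != 0:
            bits |= 1 << lane
    return bits


# Example: odd-numbered lanes vote 1, even-numbered lanes vote 0.
predicates = [lane % 2 for lane in range(WARP_SIZE)]
all_result = model_all_nonzero(FULL_MASK, predicates)
any_result = model_any_nonzero(FULL_MASK, predicates)
ballot_result = model_ballot(FULL_MASK, predicates)
print(all_result, any_result, hex(ballot_result))
```

In a real kernel each lane supplies its own predicate and every participating lane receives the same result; the lists here simply stand in for the per-lane registers.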

Here is an example of performing data exchange within a warp in Taichi:

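import taichi as ti

ti.init(arch=ti.cuda)  # SIMT intrinsics require the CUDA backend

a = ti.field(dtype=ti.i32, shape=32)

@ti.kernel
def foo():
    # Use 32 threads per block so the 32 iterations map onto one warp
    ti.loop_config(block_dim=32)
    for i in range(32):
        # Each lane reads the value held by the lane one position below it
        a[i] = ti.simt.warp.shfl_up_i32(ti.u32(0xFFFFFFFF), a[i], 1)

# Initialize the field from the Python scope
for i in range(32):
    a[i] = i * i

foo()

# Lane 0 has no lower neighbor, so the check starts at lane 1
for i in range(1, 32):
    assert a[i] == (i - 1) * (i - 1)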
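
The assertion above starts at index 1 because a shuffle-up leaves a lane's own value unchanged when the source lane would fall below lane 0. As a rough sketch of that behavior, the following pure-Python model reproduces the expected contents of `a` without a GPU. It assumes a full 32-lane warp with every lane participating; `WARP_SIZE` and `model_shfl_up` are hypothetical names used only for this illustration.

```python
# Pure-Python model of a full-warp shuffle-up (no GPU required).
# WARP_SIZE and model_shfl_up are hypothetical names, not Taichi functions.
WARP_SIZE = 32


def model_shfl_up(values, delta):
    """Return the value each lane would receive from the lane `delta` below it.

    Lanes with index < delta have no source lane and keep their own value.
    """
    return [
        values[lane - delta] if lane >= delta else values[lane]
        for lane in range(WARP_SIZE)
    ]


before = [i * i for i in range(WARP_SIZE)]  # same initial data as the field `a`
after = model_shfl_up(before, 1)

# Lanes 1..31 now hold the square of the previous index; lane 0 keeps its own value.
for i in range(1, WARP_SIZE):
    assert after[i] == (i - 1) * (i - 1)
assert after[0] == before[0]
```

Comparing this list against the field after running the kernel can be a handy sanity check when experimenting with other shuffle distances.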