triton.autotune

triton.autotune(configs, key, prune_configs_by=None, reset_to_zero=None, restore_value=None, pre_hook=None, post_hook=None, warmup=None, rep=None, use_cuda_graph=False, do_bench=None, cache_results=False)

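Decorator for autotuning a triton.jit'd function. It benchmarks the given configurations and launches the kernel with the fastest one.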

import triton
import triton.language as tl

@triton.autotune(configs=[
    triton.Config(kwargs={'BLOCK_SIZE': 128}, num_warps=4),
    triton.Config(kwargs={'BLOCK_SIZE': 1024}, num_warps=8),
  ],
  key=['x_size'] # the two configs above are evaluated again
                 # whenever the value of x_size changes
)
@triton.jit
def kernel(x_ptr, x_size, BLOCK_SIZE: tl.constexpr):
    ...
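The key argument decides when the benchmarks are repeated. The following pure-Python model does not use Triton. It assumes a simplified cache indexed only by the values of the key arguments, and the helper names tuning_cache_key and lookup_or_tune are hypothetical. Triton's real cache key may contain more information (for example, argument dtypes in some versions).

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

# Hypothetical model of the autotuner's cache: one best config per distinct key tuple.
best_config_cache: Dict[Tuple[Hashable, ...], Any] = {}


def tuning_cache_key(named_args: Dict[str, Any], key: List[str]) -> Tuple[Hashable, ...]:
    # Only the arguments listed in `key` take part; x_ptr is ignored with key=['x_size'].
    return tuple(named_args[name] for name in key if name in named_args)


def lookup_or_tune(named_args: Dict[str, Any], key: List[str], tune: Callable[[], Any]) -> Any:
    # Benchmark all configs (modelled by `tune`) only for a key tuple not seen before.
    cache_key = tuning_cache_key(named_args, key)
    if cache_key not in best_config_cache:
        best_config_cache[cache_key] = tune()
    return best_config_cache[cache_key]
```

With key=['x_size'], two calls with x_size=1024 benchmark once, and a later call with x_size=2048 benchmarks again, so a kernel whose key arguments take many distinct values pays the tuning cost many times.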

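Note: When all the configurations are evaluated, the kernel runs multiple times. This means that any value the kernel updates is updated multiple times. To avoid this undesired behavior, you can use the reset_to_zero argument, which resets the values of the provided tensors to zero before each configuration is run.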
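A small pure-Python model (not Triton code) makes the problem concrete: an accumulating "kernel" launched once per config overcounts unless its output is zeroed before each run. The names accumulate and run_every_config are hypothetical, and real benchmarking launches each config many times, so the overcount in practice is larger.

```python
from typing import List, Sequence


def accumulate(out: List[int], values: Sequence[int]) -> None:
    # Stand-in for a kernel that adds into its output (for example with atomic adds).
    out[0] += sum(values)


def run_every_config(n_configs: int, out: List[int], values: Sequence[int], reset_to_zero: bool) -> None:
    # Simplified autotuning loop: one launch per config.
    for _ in range(n_configs):
        if reset_to_zero:
            out[0] = 0  # models reset_to_zero=["out"] before each run
        accumulate(out, values)


out = [0]
run_every_config(2, out, [1, 2, 3], reset_to_zero=False)
print(out[0])  # 12: the sum 6 was added twice
```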

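If the environment variable TRITON_PRINT_AUTOTUNING is set to "1", Triton prints a message to stdout after autotuning each kernel, including the time spent autotuning and the best configuration.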
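The variable can also be set from Python. This is a minimal sketch; because the point at which Triton reads the variable may differ between versions, it assumes that setting it before importing triton is the safest choice.

```python
import os

# Enable Triton's autotuning report. Set this before importing triton and
# before the first launch of any autotuned kernel.
os.environ["TRITON_PRINT_AUTOTUNING"] = "1"

# import triton  # import afterwards in a real script
```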

Parameters:

configs (list[triton.Config]) – a list of triton.Config objects.
key (list[str]) – a list of argument names whose change in value triggers the evaluation of all provided configs.
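prune_configs_by (dict) – a dict of functions used to prune configs, with the fields: ‘perf_model’: a performance model that predicts the running time of each config and returns it; ‘top_k’: the number of configs to benchmark (only the configs with the lowest predicted running time are kept); ‘early_config_prune’: a function used to prune configs. It should have the signature

early_config_prune(configs: List[triton.Config], named_args: Dict[str, Any], **kwargs: Any) -> List[triton.Config]

and return the pruned configs. It should return at least one config.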
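reset_to_zero (list[str]) – a list of argument names whose values are reset to zero before each config is evaluated.
restore_value (list[str]) – a list of argument names whose values are restored after each config is evaluated.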
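pre_hook (lambda kwargs, reset_only) – a function called before the kernel is launched. It overrides the default pre_hook used for ‘reset_to_zero’ and ‘restore_value’. ‘kwargs’: a dict of all arguments passed to the kernel. ‘reset_only’: a boolean indicating whether the pre_hook is called only to reset values, without a corresponding post_hook.
post_hook (lambda kwargs, exception) – a function called after the kernel is launched. It overrides the default post_hook used for ‘restore_value’. ‘kwargs’: a dict of all arguments passed to the kernel. ‘exception’: the exception raised by the kernel in case of a compilation or runtime error.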
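warmup (int) – warmup time (in ms) passed to benchmarking (deprecated).
rep (int) – repetition time (in ms) passed to benchmarking (deprecated).
do_bench (lambda fn, quantiles) – a benchmarking function used to measure the running time of each config.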
cache_results (bool) – whether to cache autotuning timings to disk. Defaults to False.
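A minimal sketch of a prune_configs_by dict for the kernel in the example at the top of this page. The pruning rule and cost formula are hypothetical illustrations, not recommendations. The functions only read config.kwargs, so they work with triton.Config objects as well as with simple stand-ins. The exact arguments passed to perf_model may vary between Triton versions; accepting **kwargs keeps it tolerant of extras such as num_warps.

```python
from typing import Any, Dict, List


def early_config_prune(configs: List[Any], named_args: Dict[str, Any], **kwargs: Any) -> List[Any]:
    # Hypothetical rule: skip configs whose BLOCK_SIZE is larger than the input.
    x_size = named_args.get("x_size", kwargs.get("x_size"))
    kept = [c for c in configs if c.kwargs["BLOCK_SIZE"] <= x_size]
    # At least one config must be returned, so fall back to the smallest block.
    return kept or [min(configs, key=lambda c: c.kwargs["BLOCK_SIZE"])]


def perf_model(**kwargs: Any) -> float:
    # Toy estimate (hypothetical): number of programs launched; lower counts as faster.
    x_size, block = kwargs["x_size"], kwargs["BLOCK_SIZE"]
    return float(-(-x_size // block))  # ceiling division


prune_configs_by = {
    "early_config_prune": early_config_prune,
    "perf_model": perf_model,
    "top_k": 1,  # benchmark only the config with the lowest estimate
}
```

The dict would then be passed as triton.autotune(configs=..., key=['x_size'], prune_configs_by=prune_configs_by).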
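A custom do_bench receives the function to time and the quantiles to report. Triton's own helper, triton.testing.do_bench, accepts warmup, rep and quantiles keyword arguments, so wrapping it with functools.partial is one plausible way to change the timing budget now that warmup and rep are deprecated here. The sketch below is a hypothetical host-clock alternative (wallclock_do_bench is not part of Triton). It assumes results in milliseconds, one per requested quantile, and needs a sync callable such as torch.cuda.synchronize for asynchronous GPU launches.

```python
import statistics
import time
from typing import Callable, List, Optional, Sequence, Union


def wallclock_do_bench(
    fn: Callable[[], None],
    quantiles: Optional[Sequence[float]] = None,
    n_warmup: int = 5,
    n_rep: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> Union[float, List[float]]:
    """Time fn with time.perf_counter; returns milliseconds (hypothetical helper)."""
    for _ in range(n_warmup):
        fn()
    if sync is not None:
        sync()
    times_ms = []
    for _ in range(n_rep):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous device work before stopping the clock
        times_ms.append((time.perf_counter() - start) * 1e3)
    if quantiles is None:
        return statistics.median(times_ms)
    times_ms.sort()
    # Nearest-rank quantiles, one value per requested quantile.
    return [times_ms[min(int(q * len(times_ms)), len(times_ms) - 1)] for q in quantiles]
```

It could be passed as do_bench=functools.partial(wallclock_do_bench, sync=torch.cuda.synchronize). Synchronizing after every launch adds host overhead, so this is coarser than Triton's built-in benchmarking.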