author | Changpeng Fang <changpeng.fang@amd.com> | 2022-09-20 17:25:52 -0700 |
---|---|---|
committer | Changpeng Fang <changpeng.fang@amd.com> | 2022-09-20 17:25:52 -0700 |
commit | 3ae4c3589ec7336d363fc1779c4a99360164c8f4 (patch) | |
tree | 3f9890c66762823d3bbc6ae991232f5a1ecc5154 /clang/lib/Serialization/ModuleManager.cpp | |
parent | 2d3b54feb25b36f8c76333718adb17e47d9953ca (diff) | |
AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true
Summary:
Under code object version 5, ockl_get_local_size returns the value computed by the expression:
workgroup_id < hidden_block_count ? hidden_group_size : hidden_remainder
For functions with the attribute uniform-work-group-size=true, we can evaluate workgroup_id < hidden_block_count
as true, so ockl_get_local_size returns hidden_group_size.
With uniform-work-group-size=true, this change also sets all remainders to zero, and if
reqd_work_group_size is present, it also sets the work-group size to the required value from the metadata.
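The folding described above can be illustrated with a small host-side model. This is only a sketch: the HiddenArgs struct, its field widths, and the two function names are hypothetical stand-ins for one dimension of the hidden kernel arguments, not the actual device library or backend code.

```cpp
#include <cstdint>

// Hypothetical one-dimensional model of the hidden kernel arguments named in
// the summary. Field widths are an assumption made for this illustration.
struct HiddenArgs {
    std::uint32_t block_count; // number of full-sized work-groups
    std::uint16_t group_size;  // size of a full work-group
    std::uint16_t remainder;   // size of the trailing partial work-group
};

// General form: the select expression quoted in the summary.
std::uint16_t local_size(const HiddenArgs &args, std::uint32_t workgroup_id) {
    return workgroup_id < args.block_count ? args.group_size : args.remainder;
}

// With uniform-work-group-size=true every work-group is full-sized, so the
// comparison is treated as true and the select folds to group_size.
std::uint16_t local_size_uniform(const HiddenArgs &args) {
    return args.group_size;
}
```

For example, a grid of 100 work-items with a work-group size of 32 would be modeled as HiddenArgs{3, 32, 4}: work-groups 0 to 2 see a local size of 32 and work-group 3 sees 4. A uniform grid of 96 work-items would be HiddenArgs{3, 32, 0}, where every work-group id is below block_count and both functions agree.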
Reviewers:
arsenm and bcahoon
Differential Revision:
https://reviews.llvm.org/D131276
Diffstat (limited to 'clang/lib/Serialization/ModuleManager.cpp')
0 files changed, 0 insertions, 0 deletions