author: Mahesh-Attarde <145317060+mahesh-attarde@users.noreply.github.com> 2024-10-30 01:17:25 -0700
committer: GitHub <noreply@github.com> 2024-10-30 16:17:25 +0800
commit: e61a7dc256bd530a0b9551e2732e5b5b77e2cd1e
tree: 1f7941f0b7f6706cb67ea508b7fd925a32c0ffb8 /clang/lib/CodeGen/CodeGenModule.cpp
parent: f3584222682bd64daa89cbfe41c071c6bfc2347a
[X86][AVX512] Use comx for compare (#113567)
We added the AVX10.2 COMEF ISA to LLVM, but it does not optimize correctly in the scenario shown below.

Summary

Input
```
define i1 @oeq(float %x, float %y) {
    %1 = fcmp oeq float %x, %y
    ret i1 %1
}

define i1 @une(float %x, float %y) {
    %1 = fcmp une float %x, %y
    ret i1 %1
}

define i1 @ogt(float %x, float %y) {
    %1 = fcmp ogt float %x, %y
    ret i1 %1
}

// Prior to AVX10.2, default code generation
oeq:                                    # @oeq
        cmpeqss xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
une:                                    # @une
        cmpneqss xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
ogt:                                    # @ogt
        ucomiss xmm0, xmm1
        seta    al
        ret
```

This patch removes `cmpeqss` and `cmpneqss`. For the complete transform, check the unit test.

Continuing on what PR https://github.com/llvm/llvm-project/pull/113098 added

Earlier, legalization and combine expanded the `setcc oeq:ch` node into `and`, `setcc eq`, and `setcc o`. Following suggestions from the community, the new internal transform is:
```
Optimized type-legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
  t2: f16,ch = CopyFromReg t0, Register:f16 %0
  t4: f16,ch = CopyFromReg t0, Register:f16 %1
  t14: i8 = setcc t2, t4, setoeq:ch
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t14
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1

Optimized legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 12 nodes:
  t0: ch,glue = EntryToken
  t2: f16,ch = CopyFromReg t0, Register:f16 %0
  t4: f16,ch = CopyFromReg t0, Register:f16 %1
  t15: i32 = X86ISD::UCOMX t2, t4
  t17: i8 = X86ISD::SETCC TargetConstant:i8<4>, t15
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t17
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1
```

The earlier transform is described here:
https://github.com/llvm/llvm-project/pull/113098#discussion_r1810307663

---------

Co-authored-by: mattarde <mattarde@intel.com>
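For readers less familiar with the IR predicates, the following is a minimal C++ sketch of source code that would typically lower to the three `fcmp` forms in the input above. It is not part of the patch or its unit test; the function names simply mirror the IR, and `quiet_nan` is a hypothetical helper added for illustration. It assumes default floating-point semantics (no `-ffast-math`).

```cpp
#include <limits>

// Hypothetical helper (not from the patch): a quiet NaN for exercising
// the ordered/unordered distinction.
inline float quiet_nan() { return std::numeric_limits<float>::quiet_NaN(); }

// `==` on floats is an ordered-equal compare (fcmp oeq):
// false whenever either operand is NaN.
bool oeq(float x, float y) { return x == y; }

// `!=` on floats is an unordered-not-equal compare (fcmp une):
// true whenever either operand is NaN.
bool une(float x, float y) { return x != y; }

// `>` on floats is an ordered-greater-than compare (fcmp ogt):
// false whenever either operand is NaN.
bool ogt(float x, float y) { return x > y; }
```

The NaN behavior is why `oeq` and `une` need more than a single flag test after `ucomiss`, whereas `ogt` already maps to `ucomiss` plus `seta`.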
Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.cpp')
0 files changed, 0 insertions, 0 deletions