path: root/llvm/lib/CodeGen/PrologEpilogInserter.cpp
author Matt Arsenault <Matthew.Arsenault@amd.com> 2023-06-30 22:27:31 -0400
committer Matt Arsenault <Matthew.Arsenault@amd.com> 2023-07-05 16:53:01 -0400
commit 9c82dc6a6ba1f3d75b5547680e0a8532684879c9
tree b080783f868816b99e44c8d404b744819283b39f /llvm/lib/CodeGen/PrologEpilogInserter.cpp
parent 59c311c5d4a04a6a4f8c4abf140a63af1079e34c
AMDGPU: Always use v_rcp_f16 and v_rsq_f16
These inherited the fast-math checks from f32, but the manual suggests the f16 instructions are accurate enough for unconditional use. Correctly rounded means an error of at most 0.5 ulp, whereas the manual says "0.51ulp". I've been a bit nervous about changing this because the OpenCL conformance test does not cover half. A brute-force comparison over all values produces results identical to a reference host implementation.
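The brute-force comparison mentioned above could be sketched roughly as follows. This is not the test used for this commit. It is a minimal Python illustration that assumes the reference reciprocal is computed in binary64 and then rounded to binary16 with round-to-nearest-even. The names `reference_rcp_bits` and `count_mismatches` are hypothetical, and `candidate` stands in for whatever implementation is under test (for example, results captured from hardware).

```python
import math
import struct

POS_INF_BITS = 0x7C00
NEG_INF_BITS = 0xFC00


def half_from_bits(bits):
    """Decode a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bits_from_float(value):
    """Round a Python float (binary64) to binary16 and return its bit pattern."""
    try:
        return struct.unpack("<H", struct.pack("<e", value))[0]
    except (OverflowError, struct.error):
        # struct refuses finite values that round past the largest f16.
        # Under round-to-nearest these become infinity of the same sign.
        return NEG_INF_BITS if value < 0 else POS_INF_BITS


def reference_rcp_bits(bits):
    """Host reference for 1/x on f16 bit patterns; returns None for NaN inputs."""
    x = half_from_bits(bits)
    if math.isnan(x):
        return None  # NaN payloads are not compared in this sketch
    if x == 0.0:
        # 1/(+0) = +inf and 1/(-0) = -inf; Python would raise on the division.
        return NEG_INF_BITS if math.copysign(1.0, x) < 0 else POS_INF_BITS
    # Assumption: binary64 carries enough precision that rounding the
    # quotient a second time, to binary16, gives the correctly rounded result.
    return bits_from_float(1.0 / x)


def count_mismatches(candidate):
    """Count inputs where candidate(bits) differs from the host reference."""
    mismatches = 0
    for bits in range(1 << 16):  # every f16 bit pattern
        expected = reference_rcp_bits(bits)
        if expected is None:
            continue
        if candidate(bits) != expected:
            mismatches += 1
    return mismatches
```

A result of zero from `count_mismatches` corresponds to the "identical values" outcome described in the message. The sketch keeps denormal inputs and outputs as they are, which may not match the denormal mode of a given target.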
Diffstat (limited to 'llvm/lib/CodeGen/PrologEpilogInserter.cpp')
0 files changed, 0 insertions, 0 deletions