author | Matt Arsenault <Matthew.Arsenault@amd.com> | 2023-06-30 22:27:31 -0400 |
---|---|---|
committer | Matt Arsenault <Matthew.Arsenault@amd.com> | 2023-07-05 16:53:01 -0400 |
commit | 9c82dc6a6ba1f3d75b5547680e0a8532684879c9 (patch) | |
tree | b080783f868816b99e44c8d404b744819283b39f /llvm/lib/CodeGen/PrologEpilogInserter.cpp | |
parent | 59c311c5d4a04a6a4f8c4abf140a63af1079e34c (diff) | |
AMDGPU: Always use v_rcp_f16 and v_rsq_f16
These inherited the fast math checks from f32, but the manual suggests
these should be accurate enough for unconditional use. The definition
of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've
been a bit nervous about changing this as the OpenCL conformance test
does not cover half. A brute-force comparison against a reference host
implementation produces identical values for all inputs.
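The brute-force check described above is cheap because a half-precision input is one of only 2^16 bit patterns. What follows is a minimal host-side sketch that assumes Python with NumPy and has no GPU access. `candidate_rcp` is a hypothetical stand-in that divides in float32. A real harness would replace it with results read back from `v_rcp_f16`. The reference computes 1/x in float64 and then rounds to float16. For division, that second rounding cannot change the float16 result, because float64 carries far more than twice float16's 11 significand bits.

```python
import numpy as np


def all_half_values() -> np.ndarray:
    """Return every float16 value by reinterpreting all 2**16 bit patterns."""
    bits = np.arange(1 << 16, dtype=np.uint32).astype(np.uint16)
    return bits.view(np.float16)


def reference_rcp(x: np.ndarray) -> np.ndarray:
    """Correctly rounded 1/x in float16.

    The quotient is rounded first to float64 and then to float16. For
    division this double rounding cannot change the float16 result, since
    float64 has far more than twice float16's 11 significand bits.
    """
    with np.errstate(all="ignore"):  # 1/0 -> inf and overflow are expected
        return (1.0 / x.astype(np.float64)).astype(np.float16)


def candidate_rcp(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the implementation under test.

    This divides in float32 on the host. A real harness would instead feed
    the same inputs to the GPU and read back the v_rcp_f16 results.
    """
    with np.errstate(all="ignore"):
        return (np.float32(1.0) / x.astype(np.float32)).astype(np.float16)


def count_mismatches(candidate, reference) -> int:
    """Count float16 inputs where candidate and reference disagree.

    Results are compared bit for bit, so +0/-0 and +inf/-inf must match
    exactly. Any two NaNs are treated as equal regardless of payload.
    """
    x = all_half_values()
    got = candidate(x)
    want = reference(x)
    same_bits = got.view(np.uint16) == want.view(np.uint16)
    both_nan = np.isnan(got) & np.isnan(want)
    return int(np.count_nonzero(~(same_bits | both_nan)))


if __name__ == "__main__":
    print("mismatches:", count_mismatches(candidate_rcp, reference_rcp))
```

The comparison works on bit patterns rather than `==`. Plain float equality would accept -0 for +0 and would reject every NaN. Extending the sketch to a reciprocal square root needs a more careful reference. A plain float64 `1.0 / np.sqrt(...)` rounds twice in float64, so it does not strictly guarantee a correctly rounded float16 result near rounding midpoints.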