AMDGPU: Correctly expand f64 sqrt intrinsic

rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead.

The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply.

Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant.

The libraries have another fast version of sqrt which will be handled separately.

I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.
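
For illustration, the scale-by-ldexp idea and the 0-or-inf argument check can be sketched on the host in plain C++. This is a minimal sketch, not the backend's expansion: the function name scaled_sqrt_sketch is hypothetical, the threshold 2^-767 and the exponents 256 and -128 are assumptions chosen so that scaling the input up by 2^256 is undone by scaling the result down by 2^-128, and std::sqrt stands in for whatever instruction sequence the target actually emits.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical host-side sketch of "scale, sqrt, unscale".
// All constants below are assumptions for illustration only.
double scaled_sqrt_sketch(double x) {
  // Inputs below this threshold are scaled up before the sqrt.
  const double kSmall = std::ldexp(1.0, -767);
  const bool needsScale = x < kSmall;

  // ldexp takes a small integer exponent, which is the point made in the
  // commit message: a 32-bit integer is cheaper to materialize than a
  // 64-bit double scale factor.
  const double scaled = std::ldexp(x, needsScale ? 256 : 0);

  // Stand-in for the real approximation-and-refinement sequence.
  double r = std::sqrt(scaled);

  // sqrt(x * 2^256) == sqrt(x) * 2^128, so undo with 2^-128.
  r = std::ldexp(r, needsScale ? -128 : 0);

  // The 0-or-inf argument check: pass +/-0 and +inf through unchanged.
  const bool zeroOrPosInf =
      (x == 0.0) || (x == std::numeric_limits<double>::infinity());
  return zeroOrPosInf ? x : r;
}
```

With std::sqrt as the stand-in the final check is redundant, since std::sqrt already returns the right value for zero and infinity. It is kept only to show where the check sits; an expansion built from an approximate reciprocal square root would need it.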
author: Matt Arsenault <Matthew.Arsenault@amd.com> 2022-11-20 08:40:25 -0800
committer: Matt Arsenault <Matthew.Arsenault@amd.com> 2023-07-25 07:54:11 -0400
commit: e3fd8f83a801b1918508c7c0a71cc31bc95ad4d2 (patch)
tree: aa3422ab947d7251f1720ceb1a68e651a054cdb9 /llvm/include
parent: 47b3ada432f8afee9723a4b3d27b3efbef34dedf (diff)
1 files changed, 7 insertions, 0 deletions
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
index a1ff764..5341b57 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@@ -1181,6 +1181,13 @@ public:
                                 const SrcOp &Op0, const SrcOp &Op1,
                                 std::optional<unsigned> Flags = std::nullopt);
 
+  /// Build and insert a \p Res = G_IS_FPCLASS \p Src, \p Mask
+  MachineInstrBuilder buildIsFPClass(const DstOp &Res, const SrcOp &Src,
+                                     unsigned Mask) {
+    return buildInstr(TargetOpcode::G_IS_FPCLASS, {Res},
+                      {Src, SrcOp(static_cast<int64_t>(Mask))});
+  }
+
   /// Build and insert a \p Res = G_SELECT \p Tst, \p Op0, \p Op1
   ///
   /// \pre setBasicBlock or setMI must have been called.