| author | Alex MacLean <amaclean@nvidia.com> | 2024-04-23 08:56:39 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-04-23 08:56:39 -0700 |
| commit | df608051234c256f1dc2c89f30afd034706c2c2e (patch) | |
| tree | 11e5882cfcf1a7881dda4b31664c24403157df6d /llvm/lib/CodeGen/PatchableFunction.cpp | |
| parent | f426be195a08874686d01783bbc490295bf4afb2 (diff) | |
[NVPTX] Improve support for rsqrt.approx (#89417)

Complete support for rsqrt.approx by adding rsqrt.approx.f64 ([PTX ISA
9.7.3.17. Floating Point Instructions:
rsqrt.approx.ftz.f64](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-rsqrt-approx-ftz-f64)).
Additionally, add support for folding `sqrt` into `rsqrt`, with an
optional flag to disable the fold.
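
For illustration only, the source-level shape that a `sqrt`-into-`rsqrt` fold targets is a reciprocal of a square root. The sketch below is plain host C++ and is not code from this commit; the function name `reciprocal_sqrt` is hypothetical. It shows the two-operation form (a square root followed by a division) that a backend could, under suitable conditions, lower to a single reciprocal-square-root instruction.

```cpp
#include <cmath>

// Hypothetical helper, not part of the commit: the unfused form of a
// reciprocal square root. It performs two operations, a square root and
// then a division. A fold of the kind the commit message describes would
// replace such a pair with one rsqrt operation.
double reciprocal_sqrt(double x) {
  double root = std::sqrt(x);  // first operation: square root
  return 1.0 / root;           // second operation: reciprocal
}
```

Because `rsqrt.approx` is an approximate instruction, a folded result may differ slightly from the sqrt-then-divide result computed here, which is one plausible reason for making the fold optional.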
Diffstat (limited to 'llvm/lib/CodeGen/PatchableFunction.cpp')
0 files changed, 0 insertions, 0 deletions
