| author | peterbell10 <peterbell10@openai.com> | 2025-01-16 14:53:24 +0000 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-01-16 14:53:24 +0000 |
| commit | 5e5fd0e6fc50cc1198750308c11433a5b3acfd0f (patch) | |
| tree | 916b8b41a46a02df791ad5dc1bc8109b8274b719 /lldb/source/Plugins/ScriptInterpreter/Python/lldb-python.h | |
| parent | 9033e0c2d22c9f247eccea50ae8c975eb3468ac1 (diff) | |
[NVPTX] Select bfloat16 add/mul/sub as fma on SM80 (#121065)
SM80 has a native fma instruction for bfloat16 but no native add/mul/sub.
Currently these ops are promoted to f32, but we can avoid the promotion by
writing them in terms of fma:
```
FADD(a, b) -> FMA(a, 1.0, b)
FMUL(a, b) -> FMA(a, b, -0.0)
FSUB(a, b) -> FMA(b, -1.0, a)
```
Unfortunately, there is no `fma.ftz` for bfloat16, so when ftz is enabled
we still fall back to promotion.
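
The choice of `-0.0` rather than `+0.0` as the addend in the FMUL rewrite matters for signed zeros: under round-to-nearest, `(-0.0) + (+0.0)` is `+0.0`, so a `+0.0` addend would flip the sign of a negative-zero product. The following is a minimal Python sketch of the three identities, not the NVPTX lowering itself. The helper names (`fma`, `fadd`, `fmul`, `fsub`, `same_bits`) are hypothetical, and `fma` is an unfused `a * b + c` on Python floats, which is assumed to be adequate here only because the products involved are exactly representable.

```python
import math


def fma(a: float, b: float, c: float) -> float:
    """Stand-in for a fused multiply-add.

    Assumption: this is an unfused a * b + c on doubles. It agrees with a
    true fma whenever the product a * b is exactly representable, which
    holds for multiplication by 1.0 or -1.0 and for the values used below.
    """
    return a * b + c


def fadd(a: float, b: float) -> float:
    # FADD(a, b) -> FMA(a, 1.0, b)
    return fma(a, 1.0, b)


def fmul(a: float, b: float) -> float:
    # FMUL(a, b) -> FMA(a, b, -0.0)
    return fma(a, b, -0.0)


def fsub(a: float, b: float) -> float:
    # FSUB(a, b) -> FMA(b, -1.0, a)
    return fma(b, -1.0, a)


def same_bits(x: float, y: float) -> bool:
    """Equality that also distinguishes +0.0 from -0.0 (NaN is not handled)."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)


# Ordinary values: the rewrites match the plain operators.
assert same_bits(fadd(1.5, 2.25), 1.5 + 2.25)
assert same_bits(fmul(1.5, 2.25), 1.5 * 2.25)
assert same_bits(fsub(1.5, 2.25), 1.5 - 2.25)

# Signed zeros: -1.0 * 0.0 is -0.0, and the -0.0 addend keeps that sign.
assert same_bits(fmul(-1.0, 0.0), -1.0 * 0.0)

# With a +0.0 addend the sign of the zero result would be lost.
assert not same_bits(fma(-1.0, 0.0, 0.0), -1.0 * 0.0)
```

The sketch covers only the algebraic identities and the sign of zero. It does not model bfloat16 rounding or flush-to-zero behavior.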
