author | Umang Yadav <29876643+umangyadav@users.noreply.github.com> | 2025-06-09 14:13:31 -0400 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-06-09 13:13:31 -0500 |
commit | 7f08503a3bf3acdd2a58ac712d5e95682ce583dd (patch) | |
tree | b8de92b9e0a13a68a7a215a20c609f8f7912e14d /llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | |
parent | 5d6218d311854a0b5d48ae19636f6abe1e67fc69 (diff) | |
Introduce `arith.scaling_extf` and `arith.scaling_truncf` (#141965)
This PR adds the `arith.scaling_truncf` and `arith.scaling_extf` operations,
which support block quantization following the OCP MXFP spec:
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
The OCP MXFP spec comes with a reference implementation:
https://github.com/microsoft/microxcaling/tree/main
The most relevant piece of the reference code is the `_quantize_mx` method:
https://github.com/microsoft/microxcaling/blob/7bc41952de394f5cc5e782baf132e7c7542eb4e4/mx/mx_ops.py#L173
Both `arith.scaling_truncf` and `arith.scaling_extf` are designed to be
elementwise operations. See their descriptions in `ArithOps.td` for more
details.
Internally,
`arith.scaling_truncf` computes
`arith.truncf(arith.divf(input, 2^scale))`. `scale` must already have the
necessary broadcasting, clamping, normalization, and NaN propagation
applied before calling `arith.scaling_truncf`.
`arith.scaling_extf` computes `arith.mulf(2^scale, input)` after performing
the necessary data type conversions.
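The arithmetic described above can be sketched in plain Python. This is only
an illustration of the two formulas, not the lowering itself: the function
names mirror the ops, but the list-based signatures, the integer `scales`
exponents, and the use of f32 as the narrow target type (via the hypothetical
helper `round_to_f32`) are all assumptions made for the example.

```python
import math
import struct


def round_to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value.

    Stand-in for arith.truncf; the real ops target much narrower
    float types, f32 is used here only because the stdlib can express it.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def scaling_truncf(values, scales):
    """Elementwise truncf(input / 2^scale).

    Assumes `scales` is already broadcast to the shape of `values`
    and already clamped/normalized by the caller.
    """
    return [round_to_f32(v / math.ldexp(1.0, s)) for v, s in zip(values, scales)]


def scaling_extf(values, scales):
    """Elementwise 2^scale * input, computed in the wider type."""
    return [math.ldexp(1.0, s) * v for v, s in zip(values, scales)]


# Round trip on values that are exactly representable at every step.
narrow = scaling_truncf([6.0, -3.0, 0.5], [1, 1, 1])
wide = scaling_extf(narrow, [1, 1, 1])
```

Because the scale is a power of two, dividing and multiplying by it only
shifts the exponent, so the round trip is exact whenever the scaled value
fits in the narrow type.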
CC: @krzysz00 @dhernandez0 @bjacob @pashu123 @MaheshRavishankar
@tgymnich
---------
Co-authored-by: Prashant Kumar <pk5561@gmail.com>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>