diff options
| author | Sandeep Dasgupta <sdasgup@google.com> | 2025-03-23 05:37:55 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-03-23 07:37:55 -0500 |
| commit | 81d7eef13453f21303acfba773d0903b27ad754b (patch) | |
| tree | 85514f02cb28e96483cddca5c907b01b4065c61f /lldb/packages/Python/lldbsuite/test/gdbclientutils.py | |
| parent | 7bda9caa4981bf8c378f2c721e4e1b172b0e906c (diff) | |
| download | llvm-81d7eef13453f21303acfba773d0903b27ad754b.zip llvm-81d7eef13453f21303acfba773d0903b27ad754b.tar.gz llvm-81d7eef13453f21303acfba773d0903b27ad754b.tar.bz2 | |
Sub-channel quantized type implementation (#120172)
This is an implementation for [RFC: Supporting Sub-Channel Quantization
in
MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694).
In order to make the review process easier, the PR has been divided into
the following commit labels:
1. **Add implementation for sub-channel type:** Includes the class
design for `UniformQuantizedSubChannelType`, printer/parser and bytecode
read/write support. The existing types (per-tensor and per-axis) are
unaltered.
2. **Add implementation for sub-channel type:** Lowering of
`quant.qcast` and `quant.dcast` operations to Linalg operations.
3. **Adding C/Python Apis:** We first define he C-APIs and build the
Python-APIs on top of those.
4. **Add pass to normalize generic ....:** This pass normalizes
sub-channel quantized types to per-tensor per-axis types, if possible.
A design note:
- **Explicitly storing the `quantized_dimensions`, even when they can be
derived for ranked tensor.**
While it's possible to infer quantized dimensions from the static shape
of the scales (or zero-points) tensor for ranked
data tensors
([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3)
for background), there are cases where this can lead to ambiguity and
issues with round-tripping.
```
Consider the example: tensor<2x4x!quant.uniform<i8:f32:{0:2, 0:2}, {{s00:z00, s01:z01}}>>
```
The shape of the scales tensor is [1, 2], which might suggest that only
axis 1 is quantized. While this inference is technically correct, as the
block size for axis 0 is a degenerate case (equal to the dimension
size), it can cause problems with round-tripping. Therefore, even for
ranked tensors, we are explicitly storing the quantized dimensions.
Suggestions welcome!
PS: I understand that the upcoming holidays may impact your schedule, so
please take your time with the review. There's no rush.
Diffstat (limited to 'lldb/packages/Python/lldbsuite/test/gdbclientutils.py')
0 files changed, 0 insertions, 0 deletions
