riscv-gnu-toolchain/llvm.git

author	Sandeep Dasgupta <sdasgup@google.com>	2025-03-23 05:37:55 -0700
committer	GitHub <noreply@github.com>	2025-03-23 07:37:55 -0500
commit	81d7eef13453f21303acfba773d0903b27ad754b (patch)
tree	85514f02cb28e96483cddca5c907b01b4065c61f /lldb/packages/Python/lldbsuite/test/gdbclientutils.py
parent	7bda9caa4981bf8c378f2c721e4e1b172b0e906c (diff)

Sub-channel quantized type implementation (#120172)

This is an implementation for [RFC: Supporting Sub-Channel Quantization in MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694). In order to make the review process easier, the PR has been divided into the following commit labels:

1. **Add implementation for sub-channel type:** Includes the class design for `UniformQuantizedSubChannelType`, printer/parser and bytecode read/write support. The existing types (per-tensor and per-axis) are unaltered.
2. **Add implementation for sub-channel type:** Lowering of `quant.qcast` and `quant.dcast` operations to Linalg operations.
3. **Adding C/Python Apis:** We first define the C-APIs and build the Python-APIs on top of those.
4. **Add pass to normalize generic ....:** This pass normalizes sub-channel quantized types to per-tensor or per-axis types, if possible.

A design note:

- **Explicitly storing the `quantized_dimensions`, even when they can be derived for ranked tensors.** While it's possible to infer quantized dimensions from the static shape of the scales (or zero-points) tensor for ranked data tensors ([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3) for background), there are cases where this can lead to ambiguity and issues with round-tripping.

  Consider the example:

  ```
  tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{s00:z00, s01:z01}}>>
  ```

  The shape of the scales tensor is [1, 2], which might suggest that only axis 1 is quantized. While this inference is technically correct, as the block size for axis 0 is a degenerate case (equal to the dimension size), it can cause problems with round-tripping: a type written with two quantized dimensions would be printed back with only one. Therefore, even for ranked tensors, we are explicitly storing the quantized dimensions.

Suggestions welcome!

PS: I understand that the upcoming holidays may impact your schedule, so please take your time with the review. There's no rush.
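The round-tripping concern in the design note can be illustrated with a small plain-Python sketch. It does not use the MLIR Python bindings. The helper names `scales_shape` and `infer_block_sizes` are hypothetical, and the sketch assumes the example intends a block size of 2 on axis 0 and 2 on axis 1 of a `2x4` tensor, with any unlisted axis treated as a single block spanning the whole dimension.

```python
def scales_shape(tensor_shape, block_sizes):
    """Shape of the scales (or zero-points) tensor.

    block_sizes maps a quantized axis to its block size. Assumption: an axis
    that is not listed is one block spanning the whole dimension.
    """
    shape = []
    for axis, dim in enumerate(tensor_shape):
        block = block_sizes.get(axis, dim)
        if dim % block != 0:
            raise ValueError(f"axis {axis}: block size {block} does not divide {dim}")
        shape.append(dim // block)
    return shape


def infer_block_sizes(tensor_shape, scales_shape_):
    """Naive inference from shapes alone.

    Only axes where the scales tensor has more than one entry are reported
    as quantized, so degenerate axes (a single block) are dropped.
    """
    return {
        axis: dim // n
        for axis, (dim, n) in enumerate(zip(tensor_shape, scales_shape_))
        if n != 1
    }


tensor_shape = [2, 4]
explicit = {0: 2, 1: 2}  # quantized dimensions as written in the type

scales = scales_shape(tensor_shape, explicit)
inferred = infer_block_sizes(tensor_shape, scales)

print("scales shape:", scales)
print("explicit block sizes:", explicit)
print("inferred block sizes:", inferred)
```

Both the explicit and the inferred mappings describe the same scales layout, but they are not the same mapping: the degenerate axis 0 is lost by inference, which is why the textual form would not round-trip unless the quantized dimensions are stored explicitly.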

Diffstat (limited to 'lldb/packages/Python/lldbsuite/test/gdbclientutils.py')

0 files changed, 0 insertions, 0 deletions

