vllm.model_executor.layers.quantization.turboquant.quantizer
TurboQuant quantizer utilities.
The serving path uses generate_wht_signs() to build the sign buffers for the WHT rotation. Triton kernels handle all quantization, packing, and dequantization on the GPU.
generate_wht_signs
Generate deterministic random ±1 signs for WHT rotation.
Used with the Walsh-Hadamard Transform (WHT) to randomize the rotation per layer. The seed derivation matches the one used for QR: each layer uses seed + layer_idx * stride.
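The per-layer seeding described above can be illustrated with a small standard-library sketch. It is not the vLLM implementation: the names `sketch_wht_signs`, `fwht`, and `rotate`, the stride value, and the order of operations (sign flip first, then transform) are all assumptions made for illustration, and the real code works on GPU tensors rather than Python lists.

```python
import math
import random

# Assumption: the actual stride constant is not stated in this page.
HYPOTHETICAL_STRIDE = 1_000_003


def sketch_wht_signs(dim, seed, layer_idx, stride=HYPOTHETICAL_STRIDE):
    """Return `dim` values, each +1.0 or -1.0, reproducible for a given layer."""
    # Per-layer seed: same base seed, offset by layer_idx * stride.
    rng = random.Random(seed + layer_idx * stride)
    return [1.0 if rng.getrandbits(1) else -1.0 for _ in range(dim)]


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform its own inverse
    return [v * scale for v in out]


def rotate(vector, signs):
    """Flip signs element-wise, then apply the transform (assumed order)."""
    return fwht([v * s for v, s in zip(vector, signs)])
```

Because the scaled transform is orthogonal and the signs are ±1, the rotation preserves vector norms, and it can be undone by applying `fwht` again and multiplying by the same signs. Deriving the seed from `layer_idx` means every layer gets a different but reproducible sign pattern without storing any random state.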