vllm.v1.attention.ops

Modules:

| Name | Description |
| --- | --- |
| common | |
| dcp_alltoall | DCP All-to-All communication backend for attention. |
| flashmla | |
| merge_attn_states | |
| rocm_aiter_mla_sparse | |
| triton_decode_attention | Memory-efficient attention for decoding. |
| triton_prefill_attention | Memory-efficient attention for prefill. |
| triton_reshape_and_cache_flash | |
| triton_turboquant_decode | Triton fused TurboQuant decode attention. |
| triton_turboquant_store | Fused Triton kernels for TurboQuant KV store. |
| triton_unified_attention | |
| vit_attn_wrappers | Ops for ViT attention, compatible with torch.compile. |
| xpu_mla_sparse | |
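Several of these modules (e.g. merge_attn_states, triton_unified_attention) rely on combining attention outputs computed over separate key/value chunks. The standard technique is a log-sum-exp (LSE) weighted merge of the partial outputs. The sketch below is a standalone NumPy illustration of that math, not vLLM's kernel API; all function names and shapes here are illustrative assumptions.

```python
import numpy as np

def naive_attention(q, k, v):
    # Single-head attention over one KV chunk; also returns the
    # per-query log-sum-exp (LSE) of the scores, needed for merging.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    lse = np.logaddexp.reduce(scores, axis=-1)
    probs = np.exp(scores - lse[:, None])
    return probs @ v, lse

def merge_attn_states(o1, lse1, o2, lse2):
    # Combine partial attention outputs from two disjoint KV chunks.
    # Each chunk's output is reweighted by its share of the combined
    # softmax normalizer, recovered from the LSE values.
    lse = np.logaddexp(lse1, lse2)
    w1 = np.exp(lse1 - lse)[:, None]
    w2 = np.exp(lse2 - lse)[:, None]
    return w1 * o1 + w2 * o2, lse

# Splitting the KV cache in two and merging reproduces full attention.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))

o_full, lse_full = naive_attention(q, k, v)
o_a, lse_a = naive_attention(q, k[:10], v[:10])
o_b, lse_b = naive_attention(q, k[10:], v[10:])
o_merged, lse_merged = merge_attn_states(o_a, lse_a, o_b, lse_b)
```

Because the merge is exact (not an approximation), chunked decode and prefill kernels can process the KV cache in arbitrary splits and still produce bitwise-close results to single-pass attention.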