vllm.compilation.codegen ¶
Code generation for split_gm stitching graph execution.
Generates a plain Python function that replaces the FX GraphModule's interpreter-based execution of the stitching graph, eliminating `nn.Module.__call__` overhead and `__getattr__` dispatch.
_node_ref ¶
Convert an FX node argument to a source code reference recursively.
Source code in vllm/compilation/codegen.py
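The recursion described above can be illustrated with a minimal sketch. This is not the code in `vllm/compilation/codegen.py`: `FakeNode` and `node_ref_sketch` are hypothetical names, and `FakeNode` stands in for `torch.fx.Node` so the example runs without PyTorch. The assumption is that a node maps to the local variable named after it, containers are rebuilt element by element, and anything else is emitted with `repr()`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for torch.fx.Node; only the name is used."""
    name: str


def node_ref_sketch(arg: Any) -> str:
    """Return Python source text that refers to `arg` (illustrative only)."""
    if isinstance(arg, FakeNode):
        # A node becomes the local variable that holds its result.
        return arg.name
    if isinstance(arg, tuple):
        inner = ", ".join(node_ref_sketch(a) for a in arg)
        # A one-element tuple needs a trailing comma to stay a tuple.
        return f"({inner},)" if len(arg) == 1 else f"({inner})"
    if isinstance(arg, list):
        return "[" + ", ".join(node_ref_sketch(a) for a in arg) + "]"
    if isinstance(arg, dict):
        items = ", ".join(
            f"{node_ref_sketch(k)}: {node_ref_sketch(v)}" for k, v in arg.items()
        )
        return "{" + items + "}"
    # Constants (ints, strings, None, ...) are emitted via repr().
    return repr(arg)


ref = node_ref_sketch((FakeNode("x"), [FakeNode("submod_0"), 1], {"k": None}))
print(ref)
```

Nested containers are handled by the same function calling itself, which is what makes the conversion recursive.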
compile_execution_fn ¶
compile_execution_fn(
    code: str,
    submod_callables: dict[str, Callable[..., Any]],
    submod_names: list[str],
) -> Callable[..., Any]
Compile execution code and bind submodule callables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `str` | Python source from `generate_execution_code()`. | required |
| `submod_callables` | `dict[str, Callable[..., Any]]` | Mapping of submodule names to their callables. | required |
| `submod_names` | `list[str]` | Ordered list of submodule names matching the indices used in the generated code. | required |
Returns:
| Type | Description |
|---|---|
| `Callable[..., Any]` | A callable that executes the stitching logic. |
Source code in vllm/compilation/codegen.py
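One way to picture the compile-and-bind step is the sketch below. It is an assumption-laden illustration, not the library's implementation: `compile_execution_fn_sketch` is a hypothetical name, the generated entry point is assumed to be called `execution_fn`, and the submodule list is assumed to be exposed to the generated code under the global name `vllm_submods`. Only the built-ins `compile()` and `exec()` are used.

```python
from typing import Any, Callable


def compile_execution_fn_sketch(
    code: str,
    submod_callables: dict[str, Callable[..., Any]],
    submod_names: list[str],
) -> Callable[..., Any]:
    # Order the callables so that list index i matches submod_names[i].
    submods = [submod_callables[name] for name in submod_names]
    # The generated code sees the list as a global (assumed name).
    namespace: dict[str, Any] = {"vllm_submods": submods}
    # Compile once, then run the function definition inside the namespace.
    exec(compile(code, "<generated_stitching>", "exec"), namespace)
    # "execution_fn" is an assumed name for the generated entry point.
    return namespace["execution_fn"]


# Hand-written stand-in for generated stitching code.
example_code = """
def execution_fn(x):
    a = vllm_submods[0](x)
    b = vllm_submods[1](a)
    return b
"""

fn = compile_execution_fn_sketch(
    example_code,
    {"submod_1": lambda v: v * 10, "submod_0": lambda v: v + 1},
    ["submod_0", "submod_1"],
)
result = fn(2)
print(result)
```

Because the callables are placed in a list in `submod_names` order, the generated function reaches each submodule by integer index rather than by name.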
generate_execution_code ¶
generate_execution_code(
    split_gm: GraphModule,
) -> tuple[str, list[str]]
Generate Python source code from a split_gm's stitching graph.
Walks split_gm.graph.nodes and produces a function that calls submodules via a vllm_submods list, avoiding FX GraphModule overhead and dict lookup cost.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `split_gm` | `GraphModule` | The split graph module produced by `split_graph()`. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[str, list[str]]` | A tuple of `(code, submod_names)`, where `code` is the Python source and `submod_names` is the ordered list of submodule target names corresponding to the list indices used in the generated code. |
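The node walk can be sketched roughly as follows. Everything here is illustrative: `SketchNode` is a hypothetical stand-in for `torch.fx.Node` (so the example runs without PyTorch), `generate_code_sketch` and the emitted function name `execution_fn` are assumptions, and only the `placeholder`, `call_module`, and `output` ops are handled. The real function takes a `GraphModule` and may treat other node kinds and nested arguments differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SketchNode:
    """Hypothetical stand-in for torch.fx.Node."""
    op: str  # "placeholder", "call_module", or "output"
    name: str
    target: str = ""
    args: tuple[Any, ...] = ()


def _ref(arg: Any) -> str:
    # Nodes are referenced by variable name, constants by repr().
    return arg.name if isinstance(arg, SketchNode) else repr(arg)


def generate_code_sketch(nodes: list[SketchNode]) -> tuple[str, list[str]]:
    params: list[str] = []
    body: list[str] = []
    submod_names: list[str] = []
    for node in nodes:
        if node.op == "placeholder":
            # Graph inputs become function parameters.
            params.append(node.name)
        elif node.op == "call_module":
            # Each distinct submodule target gets one list index.
            if node.target not in submod_names:
                submod_names.append(node.target)
            idx = submod_names.index(node.target)
            call_args = ", ".join(_ref(a) for a in node.args)
            body.append(f"    {node.name} = vllm_submods[{idx}]({call_args})")
        elif node.op == "output":
            body.append(f"    return {_ref(node.args[0])}")
        else:
            raise NotImplementedError(f"op {node.op!r} is not handled here")
    header = f"def execution_fn({', '.join(params)}):"
    return "\n".join([header, *body]) + "\n", submod_names


x = SketchNode("placeholder", "x")
s0 = SketchNode("call_module", "submod_0", "submod_0", (x,))
s1 = SketchNode("call_module", "submod_1", "submod_1", (s0, 2))
out = SketchNode("output", "output", args=(s1,))
code, names = generate_code_sketch([x, s0, s1, out])
print(code)
```

The returned `names` list records which target each index refers to, so a caller can build the matching list of callables in the same order.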