Expand description
SIMD-optimized microkernels for tropical GEMM.
This module provides architecture-specific SIMD implementations of the microkernel, which is the innermost loop of the BLIS-style GEMM algorithm.
§Supported Architectures
| Architecture | Instruction Set | Register Width | Supported Types |
|---|---|---|---|
| x86_64 | AVX-512 | 512-bit | f32, f64 |
| x86_64 | AVX2 | 256-bit | f32, f64 |
| x86_64 | SSE4.1 | 128-bit | f32, f64 |
| aarch64 | NEON | 128-bit | f32 |
| Any | Portable | Scalar | All types |
§Runtime Dispatch
At runtime, tropical_gemm_dispatch selects the best kernel:
ⓘ
// Automatically uses AVX2 on supported CPUs
tropical_gemm_dispatch::<MaxPlus<f32>>(...);The dispatch mechanism:
simd_level()detects CPU features at runtimeKernelDispatchtrait routes to the appropriate implementation- Falls back to portable kernel if no SIMD available
§Microkernel Design
For tropical MaxPlus f32 with AVX2 (8-wide vectors):
// MR×NR = 8×8 output tile
for k in 0..KC:
a_vec = load_8xf32(packed_a) // 8 elements from A column
for j in 0..8:
b_scalar = broadcast(packed_b[j]) // 1 element from B row
prod = a_vec + b_scalar // tropical multiply
c[j] = max(c[j], prod) // tropical accumulate§Module Contents
detect: CPU feature detection (SimdLevel)dispatch: Runtime kernel selection (KernelDispatch)kernels: Architecture-specific microkernel implementations
Re-exports§
pub use dispatch::tropical_gemm_dispatch;pub use dispatch::KernelDispatch;pub use kernels::*;
Modules§
Enums§
- Simd
Level - CPU feature detection for runtime SIMD dispatch. Available SIMD instruction sets.
Functions§
- simd_
level - Get the detected SIMD level (cached).