Module simd

SIMD-optimized microkernels for tropical GEMM.

This module provides architecture-specific SIMD implementations of the microkernel, which is the innermost loop of the BLIS-style GEMM algorithm.
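For orientation, the operation the microkernels accelerate can be written as a plain triple loop. The sketch below is a reference max-plus product only, not this crate's API: the function name `maxplus_gemm_naive` and the row-major slice layout are assumptions made for illustration.

```rust
/// Reference max-plus product: c[i][j] = max over p of (a[i][p] + b[p][j]).
/// Hypothetical helper for illustration; matrices are row-major slices.
fn maxplus_gemm_naive(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    // The additive identity of the max-plus semiring is negative infinity.
    let mut c = vec![f32::NEG_INFINITY; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                // Tropical multiply is `+`; tropical add is `max`.
                c[i * n + j] = c[i * n + j].max(a_ip + b[p * n + j]);
            }
        }
    }
    c
}
```

A BLIS-style implementation computes the same result, but blocks the loops for cache reuse and hands small fixed-size tiles to a microkernel.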

§Supported Architectures

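| Architecture | Instruction Set | Register Width | Supported Types |
|--------------|-----------------|----------------|-----------------|
| x86_64       | AVX-512         | 512-bit        | f32, f64        |
| x86_64       | AVX2            | 256-bit        | f32, f64        |
| x86_64       | SSE4.1          | 128-bit        | f32, f64        |
| aarch64      | NEON            | 128-bit        | f32             |
| Any          | Portable        | Scalar         | All types       |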

§Runtime Dispatch

At runtime, tropical_gemm_dispatch selects the best kernel:

// Automatically uses the best SIMD kernel the CPU supports (e.g. AVX2)
tropical_gemm_dispatch::<MaxPlus<f32>>(...);

The dispatch mechanism:

  1. simd_level() detects CPU features at runtime
  2. The KernelDispatch trait routes to the matching implementation
  3. Dispatch falls back to the portable kernel if no SIMD support is detected
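The three steps above can be sketched with standard-library feature detection. This is a minimal illustration, not the crate's implementation: `Level`, `detect`, `level`, and `kernel_name` are hypothetical stand-ins for `SimdLevel`, `simd_level()`, and the `KernelDispatch` routing, whose real definitions are not shown on this page.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the crate's `SimdLevel` enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Portable,
    Sse41,
    Neon,
    Avx2,
    Avx512,
}

/// Step 1: probe CPU features at runtime, widest instruction set first.
fn detect() -> Level {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return Level::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Level::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse4.1") {
            return Level::Sse41;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Level::Neon;
        }
    }
    // Step 3: no SIMD support detected, use the portable kernel.
    Level::Portable
}

/// Cached wrapper: detection runs once, later calls reuse the result.
fn level() -> Level {
    static LEVEL: OnceLock<Level> = OnceLock::new();
    *LEVEL.get_or_init(detect)
}

/// Step 2: route a detected level to a kernel (names are illustrative only).
fn kernel_name(level: Level) -> &'static str {
    match level {
        Level::Avx512 => "avx512",
        Level::Avx2 => "avx2",
        Level::Sse41 => "sse4.1",
        Level::Neon => "neon",
        Level::Portable => "portable",
    }
}
```

Caching in a `OnceLock` keeps the feature probe off the hot path, which matters when dispatch happens on every GEMM call.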

§Microkernel Design

For tropical MaxPlus f32 with AVX2 (8-wide vectors):

// MR×NR = 8×8 output tile
for k in 0..KC:
    a_vec = load_8xf32(packed_a)     // 8 elements from A column
    for j in 0..8:
        b_scalar = broadcast(packed_b[j])  // 1 element from B row
        prod = a_vec + b_scalar            // tropical multiply
        c[j] = max(c[j], prod)             // tropical accumulate
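The pseudocode above maps to the following scalar Rust sketch, in which the inner `i` loop plays the role of the 8 vector lanes. The function name and the packing layout (for each `k`, 8 contiguous values of A followed in a separate buffer by 8 contiguous values of B) are assumptions for illustration; the crate's actual kernels and packing routines may differ.

```rust
const MR: usize = 8; // rows of the output tile (one vector of lanes)
const NR: usize = 8; // columns of the output tile

/// Hypothetical portable 8x8 max-plus microkernel.
/// `c[j]` holds column `j` of the tile; `c[j][i]` is lane `i` of that column.
fn maxplus_microkernel_8x8(
    kc: usize,
    packed_a: &[f32],
    packed_b: &[f32],
    c: &mut [[f32; MR]; NR],
) {
    assert!(packed_a.len() >= kc * MR);
    assert!(packed_b.len() >= kc * NR);
    for k in 0..kc {
        // 8 elements from column k of the packed A panel.
        let a_vec = &packed_a[k * MR..(k + 1) * MR];
        // 8 elements from row k of the packed B panel.
        let b_row = &packed_b[k * NR..(k + 1) * NR];
        for j in 0..NR {
            let b_scalar = b_row[j]; // broadcast to every lane
            for i in 0..MR {
                let prod = a_vec[i] + b_scalar; // tropical multiply
                c[j][i] = c[j][i].max(prod); // tropical accumulate
            }
        }
    }
}
```

Initializing every entry of `c` to `f32::NEG_INFINITY` before the first call gives a plain product; passing an existing tile accumulates into it. In an AVX2 kernel the `i` loop would collapse into one 256-bit add and one 256-bit max per column.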

§Module Contents

Re-exports§

pub use dispatch::tropical_gemm_dispatch;
pub use dispatch::KernelDispatch;
pub use kernels::*;

Modules§

dispatch
kernels
SIMD microkernel implementations.

Enums§

SimdLevel
Available SIMD instruction sets, used for CPU feature detection and runtime SIMD dispatch.

Functions§

simd_level
Get the detected SIMD level (cached).