Introduction

omeinsum-rs is a Rust library for efficient tensor network contractions supporting both standard and tropical (semiring) algebras. It provides a unified interface for einsum operations with automatic contraction order optimization.

What is Einsum?

Einstein summation (einsum) is a compact notation for expressing tensor operations. Instead of writing explicit loops, you specify index labels:

C[i,k] = Σ_j A[i,j] × B[j,k]    # Matrix multiplication

In einsum notation: ij,jk->ik

What are Tropical Algebras?

Tropical algebras replace standard arithmetic with alternative operations:

Algebra    Addition (⊕)   Multiplication (⊗)   Use Case
Standard   +              ×                    Normal arithmetic
MaxPlus    max            +                    Longest path, Viterbi
MinPlus    min            +                    Shortest path
MaxMul     max            ×                    Max probability
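
To make the table concrete, here is a 2×2 MaxPlus matrix product worked out in plain Rust (no library calls): the only change from ordinary matrix multiplication is that + becomes max and × becomes +.

fn main() {
    // MaxPlus "matrix multiplication": C[i][k] = max_j (A[i][j] + B[j][k])
    let a = [[1.0_f64, 2.0], [3.0, 4.0]];
    let b = [[1.0_f64, 2.0], [3.0, 4.0]];
    let mut c = [[f64::NEG_INFINITY; 2]; 2]; // tropical zero is -∞
    for i in 0..2 {
        for k in 0..2 {
            for j in 0..2 {
                c[i][k] = c[i][k].max(a[i][j] + b[j][k]); // ⊕ = max, ⊗ = +
            }
        }
    }
    assert_eq!(c, [[5.0, 6.0], [7.0, 8.0]]);
}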

Key Features

  • Multiple Algebras: Standard arithmetic, MaxPlus, MinPlus, MaxMul
  • Contraction Optimization: Uses omeco for optimal contraction order
  • Backpropagation Support: Argmax tracking for tropical gradient computation
  • Flexible Tensors: Stride-based views with zero-copy permute/reshape
  • Backend Abstraction: CPU now, GPU planned

Example

#![allow(unused)]
fn main() {
use omeinsum::{einsum, Tensor, Cpu};
use omeinsum::algebra::MaxPlus;

// Create tensors
let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let b = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

// Tropical matrix multiplication: C[i,k] = max_j (A[i,j] + B[j,k])
let c = einsum::<MaxPlus<f32>, _, _>(&[&a, &b], &[&[0, 1], &[1, 2]], &[0, 2]);
}

Relationship to OMEinsum.jl

This library is inspired by OMEinsum.jl, bringing its powerful tensor contraction capabilities to Rust with support for tropical algebras from tropical-gemm.

Getting Started

This chapter covers installation and basic usage of omeinsum-rs.

Prerequisites

  • Rust 1.70 or later
  • Cargo package manager

Quick Example

#![allow(unused)]
fn main() {
use omeinsum::{einsum, Tensor, Cpu};
use omeinsum::algebra::Standard;

// Matrix multiplication: C = A × B
let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let b = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

let c = einsum::<Standard<f32>, _, _>(
    &[&a, &b],
    &[&[0, 1], &[1, 2]],  // A[i,j], B[j,k]
    &[0, 2],               // -> C[i,k]
);

assert_eq!(c.to_vec(), vec![7.0, 10.0, 15.0, 22.0]);
}

Continue to Installation for detailed setup instructions.

Installation

From crates.io

Add to your Cargo.toml:

[dependencies]
omeinsum = "0.1"

From Git

For the latest development version:

[dependencies]
omeinsum = { git = "https://github.com/TensorBFS/omeinsum-rs" }

Features

Feature            Default   Description
parallel           Yes       Enable parallel execution with rayon
tropical-kernels   Yes       Use optimized tropical-gemm kernels
cuda               No        Enable CUDA GPU support

Minimal Build

For a minimal build without optional dependencies:

[dependencies]
omeinsum = { version = "0.1", default-features = false }
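
You can also combine default-features = false with an explicit feature list, for example keeping only the optimized tropical kernels (feature names taken from the table above):

[dependencies]
omeinsum = { version = "0.1", default-features = false, features = ["tropical-kernels"] }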

With CUDA

[dependencies]
omeinsum = { version = "0.1", features = ["cuda"] }

Verification

Verify the installation:

use omeinsum::{Tensor, Cpu};

fn main() {
    let t = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0], &[3]);
    println!("omeinsum installed successfully!");
    println!("Tensor shape: {:?}", t.shape());
}

Quick Start

Basic Tensor Operations

Creating Tensors

#![allow(unused)]
fn main() {
use omeinsum::{Tensor, Cpu};

// From data with shape
let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3]);

// Check properties
assert_eq!(a.shape(), &[2, 3]);
assert_eq!(a.ndim(), 2);
assert_eq!(a.numel(), 6);
}

Tensor Views

#![allow(unused)]
fn main() {
// Zero-copy transpose
let a_t = a.permute(&[1, 0]);
assert_eq!(a_t.shape(), &[3, 2]);

// Reshape (zero-copy when contiguous)
let a_flat = a.reshape(&[6]);
assert_eq!(a_flat.shape(), &[6]);
}

Einsum Operations

Matrix Multiplication

#![allow(unused)]
fn main() {
use omeinsum::{einsum, Tensor, Cpu};
use omeinsum::algebra::Standard;

let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let b = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

// C[i,k] = Σ_j A[i,j] × B[j,k]
let c = einsum::<Standard<f32>, _, _>(
    &[&a, &b],
    &[&[0, 1], &[1, 2]],
    &[0, 2],
);
}

Tropical Operations

#![allow(unused)]
fn main() {
use omeinsum::algebra::MaxPlus;

// C[i,k] = max_j (A[i,j] + B[j,k])
let c = einsum::<MaxPlus<f32>, _, _>(
    &[&a, &b],
    &[&[0, 1], &[1, 2]],
    &[0, 2],
);
}

Using the Einsum Builder

For more control over contraction:

#![allow(unused)]
fn main() {
use omeinsum::{Einsum, Tensor, Cpu};
use omeinsum::algebra::Standard;
use std::collections::HashMap;

let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let b = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

let sizes: HashMap<usize, usize> = [(0, 2), (1, 2), (2, 2)].into();

let mut ein = Einsum::new(
    vec![vec![0, 1], vec![1, 2]],  // ij, jk
    vec![0, 2],                     // -> ik
    sizes,
);

// Optimize contraction order
ein.optimize_greedy();

// Execute
let c = ein.execute::<Standard<f32>, f32, Cpu>(&[&a, &b]);
}

Next Steps

Continue to Algebra Types for the supported semirings, the Tensor API for tensor operations, and the Einsum API for contraction details.

Algebra Types

omeinsum-rs supports multiple algebraic structures (semirings) for tensor operations.

Semiring Abstraction

A semiring has two operations:

  • Addition (⊕): Associative, commutative, with identity (zero)
  • Multiplication (⊗): Associative, with identity (one)

The Algebra trait extends Semiring with backpropagation support.
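
For intuition, the semiring identities can be checked with plain floating-point arithmetic. Taking MaxPlus as an example, the additive identity (tropical zero) is -∞ and the multiplicative identity (tropical one) is 0:

fn main() {
    let x = 3.5_f64;
    // Tropical zero (-∞) is the identity for ⊕ = max
    assert_eq!(f64::NEG_INFINITY.max(x), x);
    // Tropical one (0) is the identity for ⊗ = +
    assert_eq!(0.0 + x, x);
    // ⊗ distributes over ⊕: a + max(b, c) == max(a + b, a + c)
    let (a, b, c) = (1.0_f64, 2.0, 5.0);
    assert_eq!(a + b.max(c), (a + b).max(a + c));
}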

Available Algebras

Standard Arithmetic

#![allow(unused)]
fn main() {
use omeinsum::algebra::Standard;

// Standard: ⊕ = +, ⊗ = ×
// C[i,j] = Σ_k A[i,k] × B[k,j]
let c = einsum::<Standard<f32>, _, _>(&[&a, &b], &[&[0, 1], &[1, 2]], &[0, 2]);
}

MaxPlus (Tropical)

#![allow(unused)]
fn main() {
use omeinsum::algebra::MaxPlus;

// MaxPlus: ⊕ = max, ⊗ = +
// C[i,j] = max_k (A[i,k] + B[k,j])
// Use case: Longest path, Viterbi algorithm
let c = einsum::<MaxPlus<f32>, _, _>(&[&a, &b], &[&[0, 1], &[1, 2]], &[0, 2]);
}

MinPlus

#![allow(unused)]
fn main() {
use omeinsum::algebra::MinPlus;

// MinPlus: ⊕ = min, ⊗ = +
// C[i,j] = min_k (A[i,k] + B[k,j])
// Use case: Shortest path (Dijkstra, Floyd-Warshall)
let c = einsum::<MinPlus<f32>, _, _>(&[&a, &b], &[&[0, 1], &[1, 2]], &[0, 2]);
}

MaxMul

#![allow(unused)]
fn main() {
use omeinsum::algebra::MaxMul;

// MaxMul: ⊕ = max, ⊗ = ×
// C[i,j] = max_k (A[i,k] × B[k,j])
// Use case: Maximum probability paths
let c = einsum::<MaxMul<f32>, _, _>(&[&a, &b], &[&[0, 1], &[1, 2]], &[0, 2]);
}

Summary Table

Algebra       ⊕     ⊗   Zero   One   Use Case
Standard<T>   +     ×   0      1     Normal arithmetic
MaxPlus<T>    max   +   -∞     0     Longest path
MinPlus<T>    min   +   +∞     0     Shortest path
MaxMul<T>     max   ×   0      1     Max probability

Implementing Custom Algebras

You can implement the Semiring and Algebra traits for custom algebras:

#![allow(unused)]
fn main() {
use omeinsum::algebra::{Semiring, Algebra, Scalar};

#[derive(Copy, Clone)]
pub struct MyAlgebra<T>(T);

impl<T: Scalar> Semiring for MyAlgebra<T> {
    type Scalar = T;

    fn zero() -> Self { /* ... */ }
    fn one() -> Self { /* ... */ }
    fn add(self, rhs: Self) -> Self { /* ... */ }
    fn mul(self, rhs: Self) -> Self { /* ... */ }
    fn from_scalar(s: T) -> Self { /* ... */ }
    fn to_scalar(self) -> T { /* ... */ }
    fn is_zero(&self) -> bool { /* ... */ }
}
}
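
As a concrete, hypothetical sketch of filling in those method bodies, here is a MinMax semiring (⊕ = min with identity +∞, ⊗ = max with identity -∞), useful for bottleneck-path style problems. The extra PartialOrd bound is an added assumption, the trait surface is taken from the skeleton above, and gradient support would additionally require implementing Algebra:

#![allow(unused)]
fn main() {
use omeinsum::algebra::{Semiring, Scalar};

// Hypothetical MinMax semiring: ⊕ = min, ⊗ = max.
#[derive(Copy, Clone)]
pub struct MinMax<T>(T);

impl<T: Scalar + PartialOrd> Semiring for MinMax<T> {
    type Scalar = T;

    fn zero() -> Self { MinMax(T::infinity()) }      // identity for ⊕ = min
    fn one() -> Self { MinMax(T::neg_infinity()) }   // identity for ⊗ = max
    fn add(self, rhs: Self) -> Self {                // ⊕ = min
        if self.0 < rhs.0 { self } else { rhs }
    }
    fn mul(self, rhs: Self) -> Self {                // ⊗ = max
        if self.0 > rhs.0 { self } else { rhs }
    }
    fn from_scalar(s: T) -> Self { MinMax(s) }
    fn to_scalar(self) -> T { self.0 }
    fn is_zero(&self) -> bool { !(self.0 < T::infinity()) }
}
}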

Tensor API

The Tensor<T, B> type provides a flexible, stride-based tensor implementation.

Creating Tensors

From Data

#![allow(unused)]
fn main() {
use omeinsum::{Tensor, Cpu};

// Create from slice with shape
let t = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3]);

// Row-major layout:
// [[1, 2, 3],
//  [4, 5, 6]]
}

Zeros and Ones

#![allow(unused)]
fn main() {
let zeros = Tensor::<f32, Cpu>::zeros(&[3, 4]);
let ones = Tensor::<f32, Cpu>::ones(&[3, 4]);
}

Properties

#![allow(unused)]
fn main() {
let t = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3]);

t.shape()    // &[2, 3]
t.strides()  // &[3, 1] for row-major
t.ndim()     // 2
t.numel()    // 6
}

Views and Transformations

Permute (Transpose)

Zero-copy axis reordering:

#![allow(unused)]
fn main() {
let t = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3]);

// Transpose: swap axes 0 and 1
let t_t = t.permute(&[1, 0]);
assert_eq!(t_t.shape(), &[3, 2]);

// 3D example: (batch, height, width) -> (batch, width, height)
let img = Tensor::<f32, Cpu>::zeros(&[10, 32, 28]);
let img_t = img.permute(&[0, 2, 1]);
assert_eq!(img_t.shape(), &[10, 28, 32]);
}

Reshape

#![allow(unused)]
fn main() {
let t = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3]);

// Flatten
let flat = t.reshape(&[6]);

// Reshape to different dimensions
let reshaped = t.reshape(&[3, 2]);
}

Contiguous

Convert non-contiguous views to contiguous storage:

#![allow(unused)]
fn main() {
let t = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3]);
let t_t = t.permute(&[1, 0]);  // Non-contiguous after transpose

let t_contig = t_t.contiguous();  // Copy to contiguous memory
assert!(t_contig.is_contiguous());
}

Matrix Operations

GEMM (General Matrix Multiplication)

#![allow(unused)]
fn main() {
use omeinsum::algebra::{Standard, MaxPlus};

let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let b = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

// Standard matrix multiplication
let c = a.gemm::<Standard<f32>>(&b);

// Tropical matrix multiplication
let c_trop = a.gemm::<MaxPlus<f32>>(&b);
}

Binary Contraction

General tensor contraction:

#![allow(unused)]
fn main() {
// A[i,j,k] × B[j,k,l] -> C[i,l]
let a = Tensor::<f32, Cpu>::zeros(&[2, 3, 4]);
let b = Tensor::<f32, Cpu>::zeros(&[3, 4, 5]);

let c = a.contract_binary::<Standard<f32>>(
    &b,
    &[0, 1, 2],  // A's indices: i, j, k
    &[1, 2, 3],  // B's indices: j, k, l
    &[0, 3],     // Output: i, l
);

assert_eq!(c.shape(), &[2, 5]);
}

Data Access

#![allow(unused)]
fn main() {
let t = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

// Convert to Vec
let data = t.to_vec();  // [1.0, 2.0, 3.0, 4.0]
}

Einsum API

The einsum API provides a high-level interface for tensor network contractions.

Quick Einsum

The simplest way to perform einsum:

#![allow(unused)]
fn main() {
use omeinsum::{einsum, Tensor, Cpu};
use omeinsum::algebra::Standard;

let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let b = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

// Matrix multiplication: ij,jk->ik
let c = einsum::<Standard<f32>, _, _>(
    &[&a, &b],           // Input tensors
    &[&[0, 1], &[1, 2]], // Index labels: A[0,1], B[1,2]
    &[0, 2],             // Output labels: C[0,2]
);
}

Index Labels

Indices are represented as usize values. Matching indices indicate contraction:

Operation         Inputs               Labels               Output
Matrix multiply   A[m,k], B[k,n]       [[0,1], [1,2]]       [0,2] → C[m,n]
Batch matmul      A[b,m,k], B[b,k,n]   [[0,1,2], [0,2,3]]   [0,1,3] → C[b,m,n]
Outer product     A[m], B[n]           [[0], [1]]           [0,1] → C[m,n]
Trace             A[n,n]               [[0,0]]              [] → scalar
Sum               A[m,n]               [[0,1]]              [] → scalar
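
For example, the outer-product row of the table translates directly into code (a sketch using the same einsum entry point shown above):

#![allow(unused)]
fn main() {
use omeinsum::{einsum, Tensor, Cpu};
use omeinsum::algebra::Standard;

// Outer product: C[m,n] = A[m] × B[n] (no shared labels, so nothing is contracted)
let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0], &[2]);
let b = Tensor::<f32, Cpu>::from_data(&[10.0, 20.0, 30.0], &[3]);

let c = einsum::<Standard<f32>, _, _>(&[&a, &b], &[&[0], &[1]], &[0, 1]);
assert_eq!(c.shape(), &[2, 3]);
}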

Einsum Struct

For more control, use the Einsum struct directly:

#![allow(unused)]
fn main() {
use omeinsum::{Einsum, Tensor, Cpu};
use omeinsum::algebra::Standard;
use std::collections::HashMap;

// Define size dictionary
let sizes: HashMap<usize, usize> = [
    (0, 10),  // i: 10
    (1, 20),  // j: 20
    (2, 30),  // k: 30
].into();

// Create einsum specification
let mut ein = Einsum::new(
    vec![vec![0, 1], vec![1, 2]],  // A[i,j], B[j,k]
    vec![0, 2],                     // -> C[i,k]
    sizes,
);

// Check the einsum code
let code = ein.code();
println!("Einsum: {:?}", code);
}

Contraction Optimization

Greedy Algorithm

Fast O(n²) algorithm, good for most cases:

#![allow(unused)]
fn main() {
let mut ein = Einsum::new(/* ... */);
ein.optimize_greedy();

assert!(ein.is_optimized());
}

Simulated Annealing

Slower but finds better orderings for complex networks:

#![allow(unused)]
fn main() {
let mut ein = Einsum::new(/* ... */);
ein.optimize_treesa();
}

Inspect Contraction Tree

#![allow(unused)]
fn main() {
if let Some(tree) = ein.contraction_tree() {
    println!("Contraction tree: {:?}", tree);
}
}

Chain Contraction Example

Contracting a chain of matrices:

#![allow(unused)]
fn main() {
use omeinsum::{Einsum, Tensor, Cpu};
use omeinsum::algebra::Standard;
use std::collections::HashMap;

// A[i,j] × B[j,k] × C[k,l] → D[i,l]
let a = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let b = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
let c = Tensor::<f32, Cpu>::from_data(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);

let sizes: HashMap<usize, usize> = [
    (0, 2), (1, 2), (2, 2), (3, 2)
].into();

let mut ein = Einsum::new(
    vec![vec![0, 1], vec![1, 2], vec![2, 3]],
    vec![0, 3],
    sizes,
);

ein.optimize_greedy();
let d = ein.execute::<Standard<f32>, f32, Cpu>(&[&a, &b, &c]);

assert_eq!(d.shape(), &[2, 2]);
}

Einsum with Gradients

For backpropagation support:

#![allow(unused)]
fn main() {
use omeinsum::einsum_with_grad;
use omeinsum::algebra::MaxPlus;

let (result, gradient) = einsum_with_grad::<MaxPlus<f32>, _, _>(
    &[&a, &b],
    &[&[0, 1], &[1, 2]],
    &[0, 2],
);

// gradient can be used for backpropagation
// (full backward pass implementation in progress)
}

Contraction Optimization

Finding the optimal contraction order is critical for tensor network performance.

The Problem

Consider contracting tensors A, B, C, D with different contraction orders:

((A × B) × C) × D  vs  (A × B) × (C × D)  vs  A × ((B × C) × D)

Different orders have vastly different computational costs. For large networks, the difference can be exponential.
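
A small worked example with illustrative sizes: take A[i,j], B[j,k], C[k,l] with i = 1000, j = 2, k = 1000, l = 2.

((A × B) × C):  A×B costs i × j × k = 2,000,000 ops and builds a 1000×1000 intermediate;
                (AB)×C costs i × k × l = 2,000,000 ops — about 4,000,000 ops total.

(A × (B × C)):  B×C costs j × k × l = 4,000 ops and builds a 2×2 intermediate;
                A×(BC) costs i × j × l = 4,000 ops — about 8,000 ops total.

Both orders produce the same result, but the second is roughly 500× cheaper, and the gap grows with network size.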

Optimization Algorithms

omeinsum-rs uses omeco for contraction order optimization.

Greedy Method

The greedy algorithm iteratively contracts the pair with minimum cost:

#![allow(unused)]
fn main() {
use omeinsum::Einsum;

let mut ein = Einsum::new(ixs, iy, sizes);
ein.optimize_greedy();
}

  • Complexity: O(n²), where n is the number of tensors
  • Quality: Good for most practical cases
  • Speed: Fast

Tree Simulated Annealing (TreeSA)

TreeSA uses simulated annealing to search for better contraction trees:

#![allow(unused)]
fn main() {
let mut ein = Einsum::new(ixs, iy, sizes);
ein.optimize_treesa();
}

  • Complexity: O(iterations × n)
  • Quality: Often finds optimal or near-optimal solutions
  • Speed: Slower, but worthwhile for large networks

When to Optimize

Network Size           Recommendation
2-3 tensors            No optimization needed
4-10 tensors           Greedy is usually sufficient
10+ tensors            Consider TreeSA
Performance-critical   Always optimize, benchmark both

Inspecting Results

#![allow(unused)]
fn main() {
let mut ein = Einsum::new(ixs, iy, sizes);
ein.optimize_greedy();

// Check if optimized
if ein.is_optimized() {
    // Get the contraction tree
    if let Some(tree) = ein.contraction_tree() {
        println!("Optimized tree: {:?}", tree);
    }
}
}

Cost Model

The optimization minimizes total FLOP count, considering:

  • Tensor dimensions from size dictionary
  • Intermediate tensor sizes
  • Number of operations per contraction

No Optimization

For simple cases, you can skip optimization:

#![allow(unused)]
fn main() {
// Without optimization: contracts left-to-right
let ein = Einsum::new(ixs, iy, sizes);
let result = ein.execute::<Standard<f32>, f32, Cpu>(&tensors);
}

This uses simple pairwise contraction from left to right, which may be suboptimal for complex networks.

Showcase Examples

This chapter demonstrates practical applications of einsum with gradients across three different algebras: real numbers, complex numbers, and tropical numbers.

Each example shows a real-world use case where differentiation through tensor networks provides meaningful results.

Example            Algebra               Application                  Gradient Meaning
Bayesian Network   Standard<f64>         Probabilistic inference      Marginal probability
Tensor Train       Standard<Complex64>   Quantum simulation           Energy optimization direction
MPS Ground State   Standard<f64>         Quantum many-body            Variational optimization
Independent Set    MaxPlus<f64>          Combinatorial optimization   Optimal vertex selection

Bayesian Network Marginals

Key insight: Differentiation = Marginalization

Problem

Given a chain-structured Bayesian network with 3 binary variables X₀ - X₁ - X₂, compute:

  1. The partition function Z (sum over all configurations)
  2. The marginal probability P(X₁ = 1)

Mathematical Setup

Vertex potentials (unnormalized probabilities):

φ₀ = [1, 2]   → P(X₀=1) ∝ 2
φ₁ = [1, 3]   → P(X₁=1) ∝ 3
φ₂ = [1, 1]   → uniform

Edge potentials (encourage agreement):

ψ = [[2, 1],
     [1, 2]]

Partition function as einsum:

Z = Σ_{x₀,x₁,x₂} φ₀(x₀) × ψ₀₁(x₀,x₁) × φ₁(x₁) × ψ₁₂(x₁,x₂) × φ₂(x₂)
  = einsum("i,ij,j,jk,k->", φ₀, ψ₀₁, φ₁, ψ₁₂, φ₂)

The Gradient-Marginal Connection

The beautiful insight from probabilistic graphical models:

∂Z/∂θᵥ = Σ_{configurations where xᵥ=1} (product of all other factors)
       = Z × P(xᵥ = 1) / θᵥ

Therefore, writing θᵥ = φᵥ(1) for the vertex potential of xᵥ = 1:

P(xᵥ = 1) = (θᵥ/Z) × ∂Z/∂θᵥ = ∂log(Z)/∂log(θᵥ)

Differentiation through the tensor network gives marginal probabilities!

Manual Verification

All 8 configurations:

X₀  X₁  X₂   φ₀  ψ₀₁  φ₁  ψ₁₂  φ₂   Product
0   0   0    1   2    1   2    1    4
0   0   1    1   2    1   1    1    2
0   1   0    1   1    3   1    1    3
0   1   1    1   1    3   2    1    6
1   0   0    2   1    1   2    1    4
1   0   1    2   1    1   1    1    2
1   1   0    2   2    3   1    1    12
1   1   1    2   2    3   2    1    24

Results:

  • Z = 57
  • P(X₁=1) = (3+6+12+24)/57 = 45/57 ≈ 0.789

Code

#![allow(unused)]
fn main() {
use omeinsum::{einsum, einsum_with_grad, Standard, Tensor, Cpu};

// Vertex potentials
let phi0 = Tensor::<f64, Cpu>::from_data(&[1.0, 2.0], &[2]);
let phi1 = Tensor::<f64, Cpu>::from_data(&[1.0, 3.0], &[2]);
let phi2 = Tensor::<f64, Cpu>::from_data(&[1.0, 1.0], &[2]);

// Edge potentials (column-major)
let psi = Tensor::<f64, Cpu>::from_data(&[2.0, 1.0, 1.0, 2.0], &[2, 2]);

// Contract step by step
let t1 = einsum::<Standard<f64>, _, _>(&[&phi0, &psi], &[&[0], &[0, 1]], &[1]);
// ... continue contracting to compute Z

// Use einsum_with_grad to get gradients
let (result, grad_fn) = einsum_with_grad::<Standard<f64>, _, _>(
    &[&phi0, &psi], &[&[0], &[0, 1]], &[1]
);
let grads = grad_fn.backward::<Standard<f64>>(&grad_output, &[&phi0, &psi]);
}
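
As a sketch of the full contraction (assuming the einsum entry point accepts more than two inputs, as its signature suggests), the whole chain can be contracted to the scalar Z = 57 in one call:

#![allow(unused)]
fn main() {
use omeinsum::{einsum, Standard, Tensor, Cpu};

let phi0 = Tensor::<f64, Cpu>::from_data(&[1.0, 2.0], &[2]);
let phi1 = Tensor::<f64, Cpu>::from_data(&[1.0, 3.0], &[2]);
let phi2 = Tensor::<f64, Cpu>::from_data(&[1.0, 1.0], &[2]);
let psi  = Tensor::<f64, Cpu>::from_data(&[2.0, 1.0, 1.0, 2.0], &[2, 2]);

// Z = Σ_{x₀,x₁,x₂} φ₀(x₀) ψ(x₀,x₁) φ₁(x₁) ψ(x₁,x₂) φ₂(x₂)   — "i,ij,j,jk,k->"
let z = einsum::<Standard<f64>, _, _>(
    &[&phi0, &psi, &phi1, &psi, &phi2],
    &[&[0], &[0, 1], &[1], &[1, 2], &[2]],
    &[],
);
assert_eq!(z.to_vec(), vec![57.0]);
}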

Tensor Train (Quantum States)

Key insight: Gradients enable variational optimization of quantum states

Problem

Represent a quantum state using a Matrix Product State (MPS) and compute contractions with complex numbers. Gradients ∂E/∂A give the optimization direction for finding ground states.

Mathematical Setup

An MPS represents a quantum state as:

|ψ⟩ = Σ_{s₁,s₂,...} A¹[s₁] · A²[s₂] · ... |s₁s₂...⟩

Where each Aⁱ[sᵢ] is a complex matrix.

Example: Two-Site Contraction

A1 = [[1+i,  0  ],      A2 = [[2,  i ],
      [0,   1-i]]            [-i, 3 ]]

Contraction: result[s1,s2] = Σ_b A1[s1,b] × A2[b,s2]

Manual calculation:

result[0,0] = (1+i)×2 + 0×(-i) = 2+2i
result[0,1] = (1+i)×i + 0×3 = -1+i
result[1,0] = 0×2 + (1-i)×(-i) = -1-i
result[1,1] = 0×i + (1-i)×3 = 3-3i

Norm: ⟨ψ|ψ⟩ = |2+2i|² + |-1+i|² + |-1-i|² + |3-3i|² = 8+2+2+18 = 30

Code

#![allow(unused)]
fn main() {
use num_complex::Complex64 as C64;
use omeinsum::{einsum, einsum_with_grad, Standard, Tensor, Cpu};

let a1 = Tensor::<C64, Cpu>::from_data(&[
    C64::new(1.0, 1.0),   // 1+i
    C64::new(0.0, 0.0),   // 0
    C64::new(0.0, 0.0),   // 0
    C64::new(1.0, -1.0),  // 1-i
], &[2, 2]);

let a2 = Tensor::<C64, Cpu>::from_data(&[
    C64::new(2.0, 0.0),   // 2
    C64::new(0.0, 1.0),   // i
    C64::new(0.0, -1.0),  // -i
    C64::new(3.0, 0.0),   // 3
], &[2, 2]);

// Contract: result[s1,s2] = Σ_b A1[s1,b] × A2[b,s2]
let result = einsum::<Standard<C64>, _, _>(
    &[&a1, &a2],
    &[&[0, 1], &[1, 2]],  // contract over index 1
    &[0, 2]
);

// Compute gradients for optimization
let (result, grad_fn) = einsum_with_grad::<Standard<C64>, _, _>(
    &[&a1, &a2], &[&[0, 1], &[1, 2]], &[0, 2]
);
}

Application: Variational Ground State

For a Heisenberg spin chain, the energy expectation ⟨ψ|H|ψ⟩ can be computed via tensor network contraction. The gradient ∂E/∂Aⁱ tells us how to update each tensor to lower the energy, converging to the ground state.


MPS Heisenberg Ground State

Key insight: Autodiff gradients enable variational optimization of quantum many-body states

Problem

Find the ground state of a 5-site Heisenberg spin chain using the Matrix Product State (MPS) variational ansatz. Compare with exact diagonalization to verify convergence.

The Heisenberg Hamiltonian

The antiferromagnetic Heisenberg model:

H = Σᵢ (Sˣᵢ Sˣᵢ₊₁ + Sʸᵢ Sʸᵢ₊₁ + Sᶻᵢ Sᶻᵢ₊₁)

For 5 sites with open boundary conditions, the exact ground state energy is E₀ ≈ -1.928.

MPS Ansatz

The MPS represents the quantum state as:

|ψ⟩ = Σ_{s₁...s₅} A¹[s₁] A²[s₂] A³[s₃] A⁴[s₄] A⁵[s₅] |s₁...s₅⟩

Where each Aⁱ is a tensor with:

  • Physical index sᵢ ∈ {0, 1} (spin up/down)
  • Bond indices with dimension χ (bond dimension)

With bond dimension χ=4, the MPS can accurately represent the ground state.

Variational Optimization

The energy functional:

E[A] = ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩

Gradient descent: Update each tensor A using ∂E/∂A computed via einsum autodiff.

Results

Method                     Energy    Relative Error
Exact diagonalization      -1.9279   —
MPS (χ=4, 80 iterations)   -1.9205   0.38%

The MPS optimization converges to within 0.4% of the exact ground state energy, demonstrating that einsum autodiff correctly computes gradients for quantum many-body optimization.

Code Outline

#![allow(unused)]
fn main() {
use omeinsum::{einsum, einsum_with_grad, Standard, Tensor, Cpu};

// Initialize MPS tensors with bond dimension χ=4
let chi = 4;
let mut a1 = init_tensor(1, 2, chi);   // [1, 2, χ]
let mut a2 = init_tensor(chi, 2, chi); // [χ, 2, χ]
// ... a3, a4, a5

// Contract MPS to get state vector |ψ⟩
fn contract_mps(a1, a2, a3, a4, a5) -> Vec<f64> {
    // Contract bond indices: A1·A2·A3·A4·A5
    // Returns 2^5 = 32 dimensional state vector
}

// Compute energy E = ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩
fn compute_energy(tensors, hamiltonian) -> f64;

// Gradient descent loop
for iter in 0..80 {
    // Compute gradients via finite differences or autodiff
    let grads = compute_gradients(&tensors, &hamiltonian);

    // Update: A ← A - lr * ∂E/∂A
    for (a, g) in tensors.iter_mut().zip(grads.iter()) {
        *a -= learning_rate * g;
    }

    // Normalize to prevent blow-up
    normalize_mps(&mut tensors);
}
}

Einsum Gradient Verification

The gradients for MPS tensor contractions can be verified against finite differences:

#![allow(unused)]
fn main() {
// Contract two adjacent MPS tensors
let (result, grad_fn) = einsum_with_grad::<Standard<f64>, _, _>(
    &[&a2, &a3],
    &[&[0, 1, 2], &[2, 3, 4]],  // b1,s2,b2 × b2,s3,b3
    &[0, 1, 3, 4],               // → b1,s2,s3,b3
);

let grads = grad_fn.backward::<Standard<f64>>(&grad_output, &[&a2, &a3]);

// Verify: autodiff gradient matches finite difference with error < 1e-10
}

Maximum Weight Independent Set

Key insight: Tropical gradients give optimal vertex selection

Problem

Find the maximum weight independent set on a pentagon graph. An independent set contains no adjacent vertices.

Graph

      0 (w=3)
     / \
    4   1
   (2) (5)
    |   |
    3---2
   (4) (1)

Edges: (0,1), (1,2), (2,3), (3,4), (4,0)

Tropical Tensor Network

Vertex tensor for vertex v with weight wᵥ:

W[s] = [0, wᵥ]  where s ∈ {0, 1}
  • s=0: vertex not selected, contributes 0 (tropical multiplicative identity)
  • s=1: vertex selected, contributes wᵥ

Edge tensor enforcing independence constraint:

B[sᵤ, sᵥ] = [[0,   0 ],
             [0,  -∞ ]]
  • B[1,1] = -∞ forbids selecting both endpoints (tropical zero)

Tropical contraction (MaxPlus: ⊕=max, ⊗=+):

result = max over all valid configurations of Σ(selected weights)

Gradient = Selection Mask

From tropical autodiff theory:

∂(max_weight)/∂(wᵥ) = 1 if vertex v is in optimal set
                    = 0 otherwise

The tropical gradient directly reveals the optimal selection!

Manual Verification

All independent sets of the pentagon:

Set     Weight
{0}     3
{1}     5
{2}     1
{3}     4
{4}     2
{0,2}   4
{0,3}   7
{1,3}   9  ← maximum
{1,4}   7
{2,4}   3

Optimal: {1, 3} with weight 9

Code

#![allow(unused)]
fn main() {
use omeinsum::{einsum, MaxPlus, Tensor, Cpu};

// Vertex tensors: W[s] = [0, weight]
let w0 = Tensor::<f64, Cpu>::from_data(&[0.0, 3.0], &[2]);
let w1 = Tensor::<f64, Cpu>::from_data(&[0.0, 5.0], &[2]);

// Edge constraint: B[1,1] = -∞ forbids both selected
let neg_inf = f64::NEG_INFINITY;
let edge = Tensor::<f64, Cpu>::from_data(&[0.0, 0.0, 0.0, neg_inf], &[2, 2]);

// Contract two vertices with edge constraint
// max_{s0,s1} (W0[s0] + B[s0,s1] + W1[s1])
let t0e = einsum::<MaxPlus<f64>, _, _>(&[&w0, &edge], &[&[0], &[0, 1]], &[1]);
let result = einsum::<MaxPlus<f64>, _, _>(&[&t0e, &w1], &[&[0], &[0]], &[]);

// Result: 5.0 (select vertex 1 only, since selecting both gives -∞)
}
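
A sketch of the full pentagon network (assuming, as above, that einsum accepts an arbitrary number of inputs): five vertex tensors labelled 0-4 plus one edge tensor per pentagon edge, contracted to a scalar that should equal the optimal weight 9:

#![allow(unused)]
fn main() {
use omeinsum::{einsum, MaxPlus, Tensor, Cpu};

let neg_inf = f64::NEG_INFINITY;
let edge = Tensor::<f64, Cpu>::from_data(&[0.0, 0.0, 0.0, neg_inf], &[2, 2]);

// Vertex weights 3, 5, 1, 4, 2 for vertices 0..4: W[s] = [0, w]
let ws: Vec<_> = [3.0, 5.0, 1.0, 4.0, 2.0]
    .iter()
    .map(|&w| Tensor::<f64, Cpu>::from_data(&[0.0, w], &[2]))
    .collect();

// max over all independent sets of the total selected weight
let result = einsum::<MaxPlus<f64>, _, _>(
    &[&ws[0], &ws[1], &ws[2], &ws[3], &ws[4],
      &edge, &edge, &edge, &edge, &edge],
    &[&[0], &[1], &[2], &[3], &[4],
      &[0, 1], &[1, 2], &[2, 3], &[3, 4], &[4, 0]],  // pentagon edges
    &[],
);
assert_eq!(result.to_vec(), vec![9.0]);  // {1, 3} with weight 5 + 4 = 9
}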

Summary

Algebra              Operation                      Gradient Meaning
Standard (real)      Σ (sum), × (multiply)          Sensitivity / marginal probability
Standard (complex)   Σ, × with complex arithmetic   Optimization direction
MaxPlus (tropical)   max, +                         Binary selection mask (argmax routing)
MinPlus (tropical)   min, +                         Binary selection mask (argmin routing)

These examples demonstrate that einsum with automatic differentiation is a powerful tool for:

  • Probabilistic inference (belief propagation)
  • Quantum simulation (variational ground state methods)
  • Combinatorial optimization (finding optimal configurations)

Architecture

This chapter describes the internal architecture of omeinsum-rs.

Module Structure

omeinsum/
├── algebra/          # Semiring and algebra traits
│   ├── mod.rs
│   ├── semiring.rs   # Semiring, Algebra, Scalar traits
│   ├── standard.rs   # Standard<T> implementation
│   └── tropical.rs   # MaxPlus, MinPlus, MaxMul
├── backend/          # Execution backends
│   ├── mod.rs
│   ├── traits.rs     # Backend, Storage traits
│   └── cpu.rs        # CPU backend implementation
├── tensor/           # Tensor type
│   ├── mod.rs        # Tensor<T, B> definition
│   ├── view.rs       # View operations (permute, reshape)
│   └── ops.rs        # GEMM, contract_binary
├── einsum/           # Einsum engine
│   ├── mod.rs        # einsum(), einsum_with_grad()
│   ├── engine.rs     # Einsum struct, optimization
│   └── builder.rs    # EinBuilder (planned)
└── lib.rs            # Public API exports

Core Abstractions

Scalar Trait

Base trait for numeric types:

#![allow(unused)]
fn main() {
pub trait Scalar: Copy + Clone + Default + Send + Sync + 'static {
    fn neg_infinity() -> Self;
    fn infinity() -> Self;
}
}

Semiring Trait

Defines the algebraic structure:

#![allow(unused)]
fn main() {
pub trait Semiring: Copy + Clone + Send + Sync + 'static {
    type Scalar: Scalar;
    fn zero() -> Self;
    fn one() -> Self;
    fn add(self, rhs: Self) -> Self;
    fn mul(self, rhs: Self) -> Self;
    fn from_scalar(s: Self::Scalar) -> Self;
    fn to_scalar(self) -> Self::Scalar;
}
}

Algebra Trait

Extends Semiring with backpropagation support:

#![allow(unused)]
fn main() {
pub trait Algebra: Semiring {
    type Index: Copy + Clone + Default + Send + Sync;

    fn add_with_argmax(self, self_idx: Self::Index, rhs: Self, rhs_idx: Self::Index)
        -> (Self, Self::Index);

    fn add_backward(self, rhs: Self, grad_out: Self::Scalar, winner_idx: Option<Self::Index>)
        -> (Self::Scalar, Self::Scalar);

    fn mul_backward(self, rhs: Self, grad_out: Self::Scalar)
        -> (Self::Scalar, Self::Scalar);

    fn needs_argmax() -> bool;
}
}

Backend Trait

Abstracts execution hardware:

#![allow(unused)]
fn main() {
pub trait Backend: Clone + Send + Sync + 'static {
    type Storage<T: Scalar>: Storage<T>;

    fn gemm<A: Algebra>(&self, a: &Self::Storage<A::Scalar>, m: usize, k: usize,
                        b: &Self::Storage<A::Scalar>, n: usize) -> Self::Storage<A::Scalar>;

    fn gemm_with_argmax<A: Algebra<Index = u32>>(&self, ...)
        -> (Self::Storage<A::Scalar>, Self::Storage<u32>);
}
}

Tensor Implementation

Stride-Based Storage

Tensors use stride-based views for efficient transformations:

#![allow(unused)]
fn main() {
pub struct Tensor<T: Scalar, B: Backend> {
    storage: Arc<B::Storage<T>>,  // Shared storage
    shape: Vec<usize>,
    strides: Vec<usize>,
    offset: usize,
    backend: B,
}
}

Zero-Copy Operations

  • Permute: Reorders strides, no data copy
  • Reshape: Updates shape if contiguous, otherwise copies
  • Contiguous: Copies if non-contiguous

Contraction Strategy

Binary contraction uses reshape-to-GEMM:

  1. Classify indices: batch, left-only, right-only, contracted
  2. Permute tensors to [batch, left/right, contracted]
  3. Reshape to 2D matrices
  4. Execute GEMM
  5. Reshape and permute to output

This leverages optimized GEMM implementations for all algebras.
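
For the A[i,j,k] × B[j,k,l] → C[i,l] contraction from the Tensor API chapter (shapes [2,3,4] and [3,4,5]), this strategy would play out roughly as follows:

  1. Classify: no batch indices, left-only = {i}, right-only = {l}, contracted = {j, k}
  2. Permute A to [i, j, k] and B to [j, k, l] (already in that order here)
  3. Reshape A to a 2×12 matrix and B to a 12×5 matrix (12 = 3 × 4)
  4. Run GEMM: (2×12) · (12×5) → 2×5
  5. Reshape/permute the 2×5 result to C[i,l]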

Optimization Integration

Uses omeco for contraction order:

#![allow(unused)]
fn main() {
use omeco::{EinCode, GreedyMethod, TreeSA, optimize_code};

let code = EinCode::new(ixs, iy);
let tree = optimize_code(&code, &size_dict, &GreedyMethod::new(0.0, 0.0));
}

The resulting NestedEinsum tree is executed recursively.

Backpropagation

omeinsum-rs supports gradient computation for both standard and tropical algebras.

Standard Backpropagation

For standard arithmetic, gradients follow the chain rule:

C = A × B
∂L/∂A = ∂L/∂C × B^T
∂L/∂B = A^T × ∂L/∂C

Tropical Backpropagation

Tropical algebras use argmax tracking for gradient routing.

The Challenge

In tropical algebra:

C[i,j] = max_k (A[i,k] + B[k,j])

The gradient only flows through the winning path (the k that achieved the max).

Argmax Tracking

During forward pass, we track which index “won”:

#![allow(unused)]
fn main() {
let (c, argmax) = a.gemm_with_argmax::<MaxPlus<f32>>(&b);
// argmax[i,j] = the k that maximized A[i,k] + B[k,j]
}

Backward Pass

Gradients are routed using the argmax:

#![allow(unused)]
fn main() {
// For each output element [i,j]:
// k* = argmax[i,j]
// ∂L/∂A[i,k*] += ∂L/∂C[i,j]
// ∂L/∂B[k*,j] += ∂L/∂C[i,j]
}
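
A tiny worked example of this routing under MaxPlus:

A = [[1, 5]],   B = [[2],
                     [0]]

C[0,0] = max(1 + 2, 5 + 0) = 5,   argmax[0,0] = 1

∂L/∂A[0,1] += ∂L/∂C[0,0]      ∂L/∂B[1,0] += ∂L/∂C[0,0]
∂L/∂A[0,0] = 0                ∂L/∂B[0,0] = 0

The gradient flows only along the winning path (k = 1); the losing entries receive zero.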

API Usage

With Argmax Tracking

#![allow(unused)]
fn main() {
use omeinsum::algebra::MaxPlus;

// GEMM with argmax
let (c, argmax) = a.gemm_with_argmax::<MaxPlus<f32>>(&b);

// Contract with argmax
let (c, argmax) = a.contract_binary_with_argmax::<MaxPlus<f32>>(
    &b, &[0, 1], &[1, 2], &[0, 2]
);
}

Einsum with Gradients

#![allow(unused)]
fn main() {
use omeinsum::einsum_with_grad;

let (result, gradient) = einsum_with_grad::<MaxPlus<f32>, _, _>(
    &[&a, &b],
    &[&[0, 1], &[1, 2]],
    &[0, 2],
);

// Use gradient.backward() for gradient computation
// (Implementation in progress)
}

Implementation Status

Feature                Status
Forward pass           Complete
Argmax tracking        Complete
GEMM backward          Implemented
Full einsum backward   In progress

Tie-Breaking

When multiple indices achieve the same maximum, the implementation uses a deterministic tie-breaking rule (first winning index). This ensures reproducible gradients.

References

  • Zhang et al., “Tropical Geometry of Deep Neural Networks” (2018)
  • tropical-gemm gradient implementation

Performance Guide

Tips for getting the best performance from omeinsum-rs.

Contraction Order

The most important optimization is contraction order:

#![allow(unused)]
fn main() {
// Always optimize for networks with 3+ tensors
let mut ein = Einsum::new(ixs, iy, sizes);
ein.optimize_greedy();  // or optimize_treesa() for large networks
}

Bad contraction order can be exponentially slower.

Memory Layout

Keep Tensors Contiguous

Non-contiguous tensors require copies before GEMM:

#![allow(unused)]
fn main() {
// After permute, tensor may be non-contiguous
let t_permuted = t.permute(&[1, 0]);

// Make contiguous if you'll use it multiple times
let t_contig = t_permuted.contiguous();
}

Avoid Unnecessary Copies

#![allow(unused)]
fn main() {
// Good: zero-copy view
let view = t.permute(&[1, 0]);

// Avoid: unnecessary explicit copy
let bad = t.permute(&[1, 0]).contiguous();  // Only if needed
}

Parallelization

Enable the parallel feature (default):

[dependencies]
omeinsum = "0.1"  # parallel enabled by default

Disable for single-threaded workloads:

[dependencies]
omeinsum = { version = "0.1", default-features = false }

Data Types

Use f32 When Possible

f32 is typically faster than f64 due to:

  • Lower memory bandwidth requirements
  • Better SIMD utilization

#![allow(unused)]
fn main() {
// Prefer f32
let t = Tensor::<f32, Cpu>::from_data(&data, &shape);

// Use f64 only when precision is critical
let t = Tensor::<f64, Cpu>::from_data(&data, &shape);
}

Benchmarking

Use release mode for benchmarks:

cargo run --release --example basic_einsum

Profile with:

cargo build --release
perf record ./target/release/examples/basic_einsum
perf report

Common Pitfalls

1. Forgetting to Optimize

#![allow(unused)]
fn main() {
// Bad: no optimization
let ein = Einsum::new(ixs, iy, sizes);
let result = ein.execute::<A, T, B>(&tensors);

// Good: with optimization
let mut ein = Einsum::new(ixs, iy, sizes);
ein.optimize_greedy();
let result = ein.execute::<A, T, B>(&tensors);
}

2. Redundant Contiguous Calls

#![allow(unused)]
fn main() {
// Bad: unnecessary copy
let c = a.contiguous().gemm::<Standard<f32>>(&b.contiguous());

// Good: gemm handles this internally
let c = a.gemm::<Standard<f32>>(&b);
}

3. Debug Mode

Debug builds are ~10-50x slower:

# Bad: debug mode
cargo run --example benchmark

# Good: release mode
cargo run --release --example benchmark

Future Optimizations

Planned performance improvements:

  • CUDA backend for GPU acceleration
  • Optimized tropical-gemm kernel integration
  • Batched GEMM support
  • Cache-aware blocking

API Reference

Full API documentation is available at:

https://tensorbfs.github.io/omeinsum-rs/api/omeinsum/

Quick Reference

Main Types

Type           Description
Tensor<T, B>   N-dimensional tensor with backend B
Einsum<L>      Einsum specification and executor
Cpu            CPU backend

Algebra Types

Type          Addition   Multiplication
Standard<T>   +          ×
MaxPlus<T>    max        +
MinPlus<T>    min        +
MaxMul<T>     max        ×

Key Functions

#![allow(unused)]
fn main() {
// Quick einsum
fn einsum<A, T, B>(tensors: &[&Tensor<T, B>], ixs: &[&[usize]], iy: &[usize]) -> Tensor<T, B>

// Einsum with gradient support
fn einsum_with_grad<A, T, B>(...) -> (Tensor<T, B>, EinsumGradient<T, B>)
}

Tensor Methods

#![allow(unused)]
fn main() {
impl<T, B> Tensor<T, B> {
    // Creation
    fn from_data(data: &[T], shape: &[usize]) -> Self
    fn zeros(shape: &[usize]) -> Self
    fn ones(shape: &[usize]) -> Self

    // Properties
    fn shape(&self) -> &[usize]
    fn strides(&self) -> &[usize]
    fn ndim(&self) -> usize
    fn numel(&self) -> usize
    fn is_contiguous(&self) -> bool

    // Transformations
    fn permute(&self, order: &[usize]) -> Self
    fn reshape(&self, new_shape: &[usize]) -> Self
    fn contiguous(&self) -> Self

    // Operations
    fn gemm<A: Algebra>(&self, other: &Self) -> Self
    fn contract_binary<A>(&self, other: &Self, ia: &[usize], ib: &[usize], iy: &[usize]) -> Self

    // Data
    fn to_vec(&self) -> Vec<T>
}
}

Einsum Methods

#![allow(unused)]
fn main() {
impl<L> Einsum<L> {
    fn new(ixs: Vec<Vec<L>>, iy: Vec<L>, size_dict: HashMap<L, usize>) -> Self
    fn code(&self) -> EinCode<L>
    fn optimize_greedy(&mut self) -> &mut Self
    fn optimize_treesa(&mut self) -> &mut Self
    fn is_optimized(&self) -> bool
    fn contraction_tree(&self) -> Option<&NestedEinsum<L>>
}

impl Einsum<usize> {
    fn execute<A, T, B>(&self, tensors: &[&Tensor<T, B>]) -> Tensor<T, B>
    fn execute_with_argmax<A, T, B>(&self, tensors: &[&Tensor<T, B>])
        -> (Tensor<T, B>, Vec<Tensor<u32, B>>)
}
}

Building Documentation Locally

make docs-build   # Rust API docs
make docs-serve   # Serve at localhost:8000

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Initial release
  • Tensor<T, B> type with stride-based views
  • Algebra traits: Semiring, Algebra, Scalar
  • Algebra implementations: Standard, MaxPlus, MinPlus, MaxMul
  • Backend trait with Cpu implementation
  • Einsum struct with omeco optimization integration
  • einsum() and einsum_with_grad() functions
  • Greedy and TreeSA contraction order optimization
  • Argmax tracking for tropical backpropagation
  • mdBook documentation
  • CI/CD with GitHub Actions

Changed

  • N/A

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • N/A

Security

  • N/A

[0.1.0] - TBD

Initial public release.