Performance Tips

Optimize contraction orders

Let us use a problem instance from the "Promedus" dataset of the UAI 2014 competition as an example.

using TensorInference
problem = problem_from_artifact("uai2014", "MAR", "Promedus", 11)
model, evidence = read_model(problem), read_evidence(problem);

Next, we select the tensor network contraction order optimizer.

optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100)
TreeSA{Int64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}, Any}(20, 0.1:0.3:100.0, 1, 5, 1.0, 0.2, :greedy, 0, Any[], GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}(OMEinsumContractionOrders.MinSpaceOut(), 1))

Here, we choose the local search based TreeSA algorithm, which often finds the smallest time/space complexity and supports slicing. One can type ?TreeSA in a Julia REPL for more information about how to configure the hyper-parameters of the TreeSA method, while the detailed algorithm explanation is in arXiv: 2108.05665. Alternative tensor network contraction order optimizers include

tn = TensorNetworkModel(model; optimizer, evidence);

The returned object tn contains a field code that specifies the tensor network with optimized contraction order. To check the contraction complexity, please type

contraction_complexity(tn)
Time complexity: 2^22.113790484293556
Space complexity: 2^18.0
Read-write complexity: 2^20.75872827391175

The returned object contains log2 values of the number of multiplications, the number elements in the largest tensor during contraction and the number of read-write operations to tensor elements.

probability(tn)
exp(-19.322038772705973) * fill(1.0)

Using the slicing technique to reduce the memory cost

For large scale applications, it is also possible to slice over certain degrees of freedom to reduce the space complexity, i.e. loop and accumulate over certain degrees of freedom so that one can have a smaller tensor network inside the loop due to the removal of these degrees of freedom. In the TreeSA optimizer, one can set nslices to a value larger than zero to turn on this feature. As a comparison we slice over 5 degrees of freedom, which can reduce the space complexity by at most 5. In this application, the slicing achieves the largest possible space complexity reduction 5, while the time and read-write complexity are only increased by less than 1, i.e. the peak memory usage is reduced by a factor $32$, while the (theoretical) computing time is increased by at a factor $< 2$.

optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100, nslices=5)
tn = TensorNetworkModel(model; optimizer, evidence);
contraction_complexity(tn)
Time complexity: 2^20.91783823187383
Space complexity: 2^9.0
Read-write complexity: 2^20.104230170626643

Faster Tropical tensor contraction to speed up MAP and MMAP

One can enjoy the BLAS level speed provided by TropicalGEMM by importing the package with

using TropicalGEMM

The benchmark in the TropicalGEMM repo shows this performance is close to the theoretical optimal value. Its implementation on GPU is under development in Github repo CuTropicalGEMM.jl as a part of Open Source Promotion Plan summer program.

Working with GPUs

To upload the computation to GPU, you just add using CUDA before calling the solve function, and set the keyword argument usecuda to true.

julia> using CUDA
[ Info: OMEinsum loaded the CUDA module successfully

julia> marginals(tn; usecuda = true);

Functions support usecuda keyword argument includes

Benchmarks

For performance metrics, see the Performance evaluation section.


This page was generated using Literate.jl.