Performance Tips

Optimize contraction orders

Let us use a problem instance from the "Promedus" dataset of the UAI 2014 competition as an example.

using TensorInference
problem = problem_from_artifact("uai2014", "MAR", "Promedus", 11)
model, evidence = read_model(problem), read_evidence(problem);

Next, we select the tensor network contraction order optimizer.

optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100)
TreeSA{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, OMEinsumContractionOrders.TreeDecomp}(0.1:0.3:100.0, 1, 5, :greedy, ScoreFunction(1.0, 1.0, 0.0, 20.0), OMEinsumContractionOrders.TreeDecomp())

Here, we choose the local search based TreeSA algorithm, which often finds the smallest time/space complexity and supports slicing. One can type ?TreeSA in a Julia REPL for more information about how to configure the hyper-parameters of the TreeSA method, while the detailed algorithm explanation is in arXiv: 2108.05665. Alternative tensor network contraction order optimizers include

tn = TensorNetworkModel(model; optimizer, evidence);

The returned object tn contains a field code that specifies the tensor network with optimized contraction order. To check the contraction complexity, please type

contraction_complexity(tn)
Time complexity: 2^19.85878237771716
Space complexity: 2^13.0
Read-write complexity: 2^17.262360993942238

The returned object contains log2 values of the number of multiplications, the number elements in the largest tensor during contraction and the number of read-write operations to tensor elements.

probability(tn)
exp(-19.32203877270598) * fill(1.0)

Using the slicing technique to reduce the memory cost

For large scale applications, it is also possible to slice over certain degrees of freedom to reduce the space complexity, i.e. loop and accumulate over certain degrees of freedom so that one can have a smaller tensor network inside the loop due to the removal of these degrees of freedom. One can use the slicer keyword argument to reduce the space complexity by slicing over certain degrees of freedom. In the following example, we use the TreeSASlicer to reduce the space complexity to sc_target=10. In this application, the slicing achieves the largest possible space complexity reduction 5, while the time and read-write complexity are only increased by less than 1, i.e. the peak memory usage is reduced by a factor $32$, while the (theoretical) computing time is increased by at a factor $< 2$.

optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100)
tn = TensorNetworkModel(model; optimizer, evidence, slicer=TreeSASlicer(score=ScoreFunction(sc_target=10)));
contraction_complexity(tn)
Time complexity: 2^22.08908835496928
Space complexity: 2^10.0
Read-write complexity: 2^20.52279605612031

Faster Tropical tensor contraction to speed up MAP and MMAP

One can enjoy the BLAS level speed provided by TropicalGEMM by importing the package with

using TropicalGEMM

The benchmark in the TropicalGEMM repo shows this performance is close to the theoretical optimal value. Its implementation on GPU is under development in Github repo CuTropicalGEMM.jl as a part of Open Source Promotion Plan summer program.

Working with GPUs

To upload the computation to GPU, you just add using CUDA before calling the solve function, and set the keyword argument usecuda to true.

julia> using CUDA
[ Info: OMEinsum loaded the CUDA module successfully

julia> marginals(tn; usecuda = true);

Functions support usecuda keyword argument includes

Benchmarks

For performance metrics, see the Performance evaluation section.


This page was generated using Literate.jl.