Performance Tips

Optimize contraction orders

Let us use a problem instance from the "Promedus" dataset of the UAI 2014 competition as an example.

using TensorInference
problem = problem_from_artifact("uai2014", "MAR", "Promedus", 11)
model, evidence = read_model(problem), read_evidence(problem);

Next, we select the tensor network contraction order optimizer.

optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100)

TreeSA{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, OMEinsumContractionOrders.TreeDecomp}(0.1:0.3:100.0, 1, 5, :greedy, ScoreFunction(1.0, 1.0, 0.0, 20.0), OMEinsumContractionOrders.TreeDecomp())

Here, we choose the local search based TreeSA algorithm, which often finds the smallest time/space complexity and supports slicing. One can type ?TreeSA in a Julia REPL for more information about how to configure the hyper-parameters of the TreeSA method, while the detailed algorithm explanation is in arXiv: 2108.05665. Alternative tensor network contraction order optimizers include

GreedyMethod (default, fastest in searching speed but worst in contraction complexity)
KaHyParBipartite
SABipartite

tn = TensorNetworkModel(model; optimizer, evidence);

The returned object tn contains a field code that specifies the tensor network with optimized contraction order. To check the contraction complexity, please type

contraction_complexity(tn)

Time complexity: 2^19.007005359915194
Space complexity: 2^13.0
Read-write complexity: 2^16.850638586017286

The returned object contains log2 values of the number of multiplications, the number elements in the largest tensor during contraction and the number of read-write operations to tensor elements.

probability(tn)

exp(-19.322038772705984) * fill(1.0)

Using the slicing technique to reduce the memory cost

For large scale applications, it is also possible to slice over certain degrees of freedom to reduce the space complexity, i.e. loop and accumulate over certain degrees of freedom so that one can have a smaller tensor network inside the loop due to the removal of these degrees of freedom. One can use the slicer keyword argument to reduce the space complexity by slicing over certain degrees of freedom. In the following example, we use the TreeSASlicer to reduce the space complexity to sc_target=10. In this application, the slicing achieves the largest possible space complexity reduction 5, while the time and read-write complexity are only increased by less than 1, i.e. the peak memory usage is reduced by a factor $32$, while the (theoretical) computing time is increased by at a factor $< 2$.

optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100)
tn = TensorNetworkModel(model; optimizer, evidence, slicer=TreeSASlicer(score=ScoreFunction(sc_target=10)));
contraction_complexity(tn)

Time complexity: 2^20.287712379549454
Space complexity: 2^10.0
Read-write complexity: 2^18.671789823915915

Faster Tropical tensor contraction to speed up MAP and MMAP

One can enjoy the BLAS level speed provided by TropicalGEMM by importing the package with

using TropicalGEMM

The benchmark in the TropicalGEMM repo shows this performance is close to the theoretical optimal value. Its implementation on GPU is under development in Github repo CuTropicalGEMM.jl as a part of Open Source Promotion Plan summer program.

Working with GPUs

To upload the computation to GPU, you just add using CUDA before calling the solve function, and set the keyword argument usecuda to true.

julia> using CUDA
[ Info: OMEinsum loaded the CUDA module successfully

julia> marginals(tn; usecuda = true);

Functions support usecuda keyword argument includes

Benchmarks

For performance metrics, see the Performance evaluation section.

This page was generated using Literate.jl.