Performance Tips
Optimize contraction orders
Let us use a problem instance from the "Promedus" dataset of the UAI 2014 competition as an example.
using TensorInference
problem = problem_from_artifact("uai2014", "MAR", "Promedus", 11)
model, evidence = read_model(problem), read_evidence(problem);
Next, we select the tensor network contraction order optimizer.
optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100)
TreeSA{Int64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}, Any}(20, 0.1:0.3:100.0, 1, 5, 1.0, 0.2, :greedy, 0, Any[], GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}(OMEinsumContractionOrders.MinSpaceOut(), 1))
Here, we choose the local search based TreeSA
algorithm, which often finds the smallest time/space complexity and supports slicing. One can type ?TreeSA
in a Julia REPL for more information about how to configure the hyper-parameters of the TreeSA
method, while the detailed algorithm explanation is in arXiv: 2108.05665. Alternative tensor network contraction order optimizers include
GreedyMethod
(default, fastest in searching speed but worst in contraction complexity)KaHyParBipartite
SABipartite
tn = TensorNetworkModel(model; optimizer, evidence);
The returned object tn
contains a field code
that specifies the tensor network with optimized contraction order. To check the contraction complexity, please type
contraction_complexity(tn)
Time complexity: 2^22.113790484293556
Space complexity: 2^18.0
Read-write complexity: 2^20.75872827391175
The returned object contains log2 values of the number of multiplications, the number elements in the largest tensor during contraction and the number of read-write operations to tensor elements.
probability(tn)
exp(-19.322038772705973) * fill(1.0)
Using the slicing technique to reduce the memory cost
For large scale applications, it is also possible to slice over certain degrees of freedom to reduce the space complexity, i.e. loop and accumulate over certain degrees of freedom so that one can have a smaller tensor network inside the loop due to the removal of these degrees of freedom. In the TreeSA
optimizer, one can set nslices
to a value larger than zero to turn on this feature. As a comparison we slice over 5 degrees of freedom, which can reduce the space complexity by at most 5. In this application, the slicing achieves the largest possible space complexity reduction 5, while the time and read-write complexity are only increased by less than 1, i.e. the peak memory usage is reduced by a factor $32$, while the (theoretical) computing time is increased by at a factor $< 2$.
optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.3:100, nslices=5)
tn = TensorNetworkModel(model; optimizer, evidence);
contraction_complexity(tn)
Time complexity: 2^20.91783823187383
Space complexity: 2^9.0
Read-write complexity: 2^20.104230170626643
Faster Tropical tensor contraction to speed up MAP and MMAP
One can enjoy the BLAS level speed provided by TropicalGEMM
by importing the package with
using TropicalGEMM
The benchmark in the TropicalGEMM
repo shows this performance is close to the theoretical optimal value. Its implementation on GPU is under development in Github repo CuTropicalGEMM.jl
as a part of Open Source Promotion Plan summer program.
Working with GPUs
To upload the computation to GPU, you just add using CUDA
before calling the solve
function, and set the keyword argument usecuda
to true
.
julia> using CUDA
[ Info: OMEinsum loaded the CUDA module successfully
julia> marginals(tn; usecuda = true);
Functions support usecuda
keyword argument includes
Benchmarks
For performance metrics, see the Performance evaluation section.
This page was generated using Literate.jl.