improve Simpleupdate performances#361
Conversation
Yue-Zhengyuan
left a comment
There was a problem hiding this comment.
Thank you for improving SU! Left some initial comments.
| (d, r, c), = _nn_bondrev(sites..., (Nr, Nc)) | ||
| alg.bipartite && r > 1 && continue | ||
| ϵ′ = _su_iter!(state2, gate, env2, sites, alg) | ||
| ϵ′ = _su_iter_gate!(state2, gate, env2, sites[1], sites[2], alg) |
There was a problem hiding this comment.
Here I wanted _su_iter! to be able to handle both AbstractTensorMap{E, S, 2, 2} and length-2 MPO gates, though this is mainly for test purposes to ensure that MPO gates are applied correctly even for fermions. Is it possible to recover this behavior without loss of performance?
There was a problem hiding this comment.
Oh I now see the point of using dispatch. There is a traedoff here between code readibility/efficiency (gate is much faster) and this test feature. Is this something that is currently tested?
[copy from above] In su_iter, we need an if statement to distinguish nearest neighbor from long distance at run time. Therefore my idea was that dispatch does not bring much here, while using different function names makes it explicit and easier to find which function/method is called.
There was a problem hiding this comment.
Is this something that is currently tested?
It is tested with Hubbard model in test/timeevol/cluster_projectors.jl.
gate is much faster
The MPO way can still be optimized. When applying the gate MPO on the first and the last cluster sites, we can also use the trick of reduced bond tensors, which has not been done yet. Even for longer-range (e.g. NNN) bonds, this can still be useful. Separating out two unitary tensors at the two ends of an OBC-MPO will not affect finding the Vidal gauge.
We can try this out in a follow-up PR if you prefer not to do too much here.
There was a problem hiding this comment.
Ok I will revert to dispatch. Keeping this conversation, closing the other relative to this one.
|
There is another recent change that further slowed down SU for 2nd neighbor interaction. After introducing |
lkdvos
left a comment
There was a problem hiding this comment.
Some small comments, will try and actually look into this in detail next week :)
| ``` | ||
| """ | ||
| function _qr_bond(A::PT, B::PT; gate_ax::Int = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}} | ||
| function _qr_bond(A::PT, B::PT; gate_ax::Integer = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}} |
There was a problem hiding this comment.
For these functions, both for type stability and human readability, I would really prefer to just use dispatch and write out the two cases manually. _get_biperms is an indirection that doesn't hurt because the compiler decides to inline this, but depending on compiler-internals like this tends to not be great, and honestly I think this function would overall be better suited to simply be two functions with using dispatch and hardcoding the permutations (which could then use @tensor calls for the permutations, which is just more readable in general.
| (d, r, c), = _nn_bondrev(sites..., (Nr, Nc)) | ||
| alg.bipartite && r > 1 && continue | ||
| ϵ′ = _su_iter!(state2, gate, env2, sites, alg) | ||
| ϵ′ = _su_iter_gate!(state2, gate, env2, sites[1], sites[2], alg) |
There was a problem hiding this comment.
Is this something that is currently tested?
It is tested with Hubbard model in test/timeevol/cluster_projectors.jl.
gate is much faster
The MPO way can still be optimized. When applying the gate MPO on the first and the last cluster sites, we can also use the trick of reduced bond tensors, which has not been done yet. Even for longer-range (e.g. NNN) bonds, this can still be useful. Separating out two unitary tensors at the two ends of an OBC-MPO will not affect finding the Vidal gauge.
We can try this out in a follow-up PR if you prefer not to do too much here.
| ``` | ||
| """ | ||
| function _qr_bond(A::PT, B::PT; gate_ax::Int = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}} | ||
| function _qr_bond(A::PT, B::PT; gate_ax::Integer = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}} |
|
@ogauthe Do you mind if I temporarily take over to merge upstream changes and fix tests? |
I just merged master, go ahead if there are other changes you plan for. |
This is a work in progress to improve SimpleUpdate performances.
Context: SimpleUpdate is quite slow, especially for 2nd neighbor interaction. I had cases where it became the bottleneck, above CTMRG. I started working on improving the performances last week. I realized many parts could be ported upstream.
I followed 4 different strategies:
This is a work in progress. It already shows significant speed-up for e.g. finite temperature J1-J2 with SU(2) symmetry. I know it is possible to do much better. I just saw #360 which has overlap for
_get_cluster_truncand I realized it may be relevant to mention my work (objective is not to block any part of #360)