-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathch_LFT_MCMC.tex
More file actions
1062 lines (985 loc) · 50 KB
/
ch_LFT_MCMC.tex
File metadata and controls
1062 lines (985 loc) · 50 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\chapter{LFT: Computer Implementation}\label{ch:MCMC}
As discussed in \secref{sec:haar}, expectation values of
physical observables $X$ in pure $\SU(N_c)$ LGT
are given by functional integrals
\begin{equation}\label{eq:exobs}
\ev{X}=\frac{1}{Z}\int\DD{U}e^{-S(U)}X(U).
\end{equation}
Even though the integral~\eqref{eq:exobs} is well defined on a lattice
because there are
finitely many sites, it is not feasible to evaluate it numerically; even
relatively small lattices have $4\times10^4$ links. The goal of our
simulation is to estimate $\ev{X}$ by randomly generating configurations,
distributed with probability $e^{-S}$,
and on each configuration, making a measurement $X_i$. The average
\begin{equation}\label{eq:arithmeticaverage}
\bar{X}=\frac{1}{\nconf}\sum_{i=1}^{\nconf} X_i
\end{equation}
serves as the estimator.
The strategy of carrying out a statistical computation using random sampling
goes under the name of {\it Monte Carlo}\index{Monte Carlo}
algorithms\footnote{The idea of Monte Carlo algorithms dates back to the 1940s.
Stanislav Ulam wanted to know the probability of winning a game of Solitaire.
The calculation turned out to be too complicated to do by hand, so he wanted to
estimate this by playing repeated games, then calculating
$$
{\rm win\ chance}\approx\frac{\rm number\ of\ wins}{\rm number\ of\ games}.
$$
Of course this is tremendously tedious; it is much more appropriate for a
computer. Other scientists working with him at Los Alamos such as John von
Neumann and Nicholas Metropolis are early pioneers of this method.}.
The strategy of drawing these random configurations is to use a
Markov chain, and hence the backbone of a lattice calculation is
often referred to as {\it Markov chain Monte Carlo} (MCMC).
Further details on MCMC methods can be found in, for instance,
Ref.~\cite{berg_markov_2004,gattringer_quantum_2010}.
\section{Markov chain Monte Carlo}\label{sec:MCMCintro}
To generate our configurations, we start from some arbitrary configuration
$C_0$ and construct a stochastic sequence of configurations.
Configuration $C_i$ is generated based on\index{update}
configuration $C_{i-1}$, which we call an {\it update} or {\it Markov
step}
The result is a {\it Markov chain}
\begin{equation}
C_0\to C_1\to C_2\to...
\end{equation}
of configurations.
\index{MCMC}
{\it Markov chain Monte Carlo} (MCMC) is characterized by the probability
$W^{CC'}\equiv\pr{C'|C}$, the probability to jump to configuration
$C'$ given that the system started in configuration $C$.
The MCMC {\it transition matrix}\index{transition matrix}
\begin{equation}
W\equiv\Big(W^{CC'}\Big)
\end{equation}
is constructed to bring the system to {\it equilibrium}.
In equilibrium, the chain should have no sinks or sources of probability,
which means that the probability of jumping into a configuration $C'$
should be the same as jumping out of $C'$. This property
is called {\it balance}\index{balance}
\begin{equation}\label{eq:balance}
\sum\limits_CW^{CC'}\pr{C}=
\sum\limits_CW^{C'C}\pr{C'},
\end{equation}
with the LHS representing the total probability to end up in $C'$ and
the RHS representing the probability to transition out of $C'$.
If $W$ satisfies
\begin{enumerate}
\item {\it ergodicity}\footnote{Sometimes authors call this
{\it relaxed} ergodicity, to be distinguished from {\it strong}
ergodicity, where $n=1$. To my knowledge, all algorithms we consider
in this book satisfy only relaxed ergodicity, so I do not bother
with the distinction.},\index{ergodic} i.e.
\begin{equation}\label{eq:ergodicity}
\pr{C}>0\;\;\text{and}\;\;\pr{C'}>0\;\;\Rightarrow\;\;
\exists\;n\in\N\;\;\text{s.t.}\;\;\big(W^n\big)^{CC'}>0,
\end{equation}
which means that every possible configuration is accessible from
any other configuration in a finite number\footnote{In practice,
it's important that $n$ is not too large, otherwise some configurations
will be sampled poorly. This is why e.g.
topological freezing\index{topological!freezing}, discussed
in \secref{sec:toplat}, is a problem.} of Markov steps;
\item {\it normalization},\index{normalization} i.e.
\begin{equation}
\sum\limits_{C'}W^{CC'}=1,
\end{equation}
which means that starting from configuration $C$ you have
to move somewhere;
\item and balance,
\end{enumerate}
then the Markov process is guaranteed to bring the ensemble to the
target distribution $\pr{C}$. Indeed, using normalization,
one finds from \equatref{eq:balance}
\begin{equation}\label{eq:eigenProb}
\sum\limits_CW^{CC'}\pr{C}=\pr{C'},
\end{equation}
which shows that $\pr{C}$ is a fixed point of
the Markov chain.\footnote{Above, we called $W^{CC'}$ a transition matrix.
Equation~\eqref{eq:eigenProb} shows that we can think of the equilibrium
distribution as an eigenvector of $W$ with eigenvalue 1. Each component
of this vector thus yields the probability of a particular configuration.}
\index{Metropolis step}
Here and the following two sections,
we omit the Lorentz index and space-time point from link variables to
avoid clutter. We use $U$ to indicate the to-be-updated link,
$U^\sqcup$ to indicate the staple matrix attached to $U$, and
$U'$ to indicate a trial link. We will use the Boltzmann
distribution $\pr{C}\propto e^{-S_C}$.
One trivial way to satisfy the balance condition~\eqref{eq:balance} is
to find an update that satisfies it term-by-term. For such an update,
\begin{equation}\label{eq:detailedbalance}\index{detailed balance}
W^{CC'}\pr{C}=W^{C'C}\pr{C'}.
\end{equation}
This property is known as {\it detailed balance}.
One of the most well-known Monte Carlo updates that satisfies detailed
balance is the {\it Metropolis algorithm}~\cite{metropolis_equation_1953}.
In the Metropolis algorithm, a trial configuration $C'$ is selected
with some probability distribution $\prt{C'|C}$. Then $C'$
is accepted with likelihood
\begin{equation}\label{eq:metupdate}
\pr{C\to C'}=\min\left[1,\frac{\prt{C|C'}e^{-S_{C'}}}
{\prt{C'|C}e^{-S_C}}\right],
\end{equation}
where $S_C$ is the action corresponding to $C$.
If $C'$ is rejected, the unchanged configuration is counted
in the Markov chain. The total probability
to transition from $C$ to $C'$ using this strategy is
\begin{equation}\label{eq:MCMCtransition}
W^{CC'}=\prt{C'|C}\pr{C\to C'},
\end{equation}
and using this fact, one
can show that this update satisfies detailed balance.
The Metropolis algorithm is in some sense the most standard Markov chain
algorithm used in lattice calculations, with the acceptance step
\equatref{eq:metupdate} being the most crucial part, since
it guarantees your configuration is drawn with the correct probability.
A Markov algorithm that concludes with the acceptance step, is
said to be an\index{exact algorithm} {\it exact algorithm}.\footnote{One
could imagine update steps that may not be guaranteed to draw configurations
from the desired distribution. This can be corrected by simply adding
the acceptance step at the end, and the algorithm is still exact.
One may neglect this acceptance step at the end, if one has reason to
believe the configuration is still drawn appropriately, but in such
instances the algorithm is not exact anymore. We emphasize that the point
of calling an algorithm ``exact" is that it samples the target distribution,
which is guaranteed if it fulfills properties 1-3. Having the Metropolis
step is one such way to do this.}
In general, one could choose a trial configuration uniformly at random
from $\SU(N_c)$. On the other hand, as discussed above, if the trial
configuration increases the action substantially, then by
\equatref{eq:metupdate}, this trial is exponentially unlikely to
be accepted. If we are constantly suggesting such configurations,
we will make little movement in configuration space.
What makes an update efficient then is to suggest configurations that
have reasonable acceptance probabilities to begin with. This
strategy is often called {\it importance sampling}\index{importance sampling}.
All the algorithms we will consider in this book are constructed to be
efficient in the sense that they have reasonable likelihood to be accepted.
\section{General code structure}\label{sec:generalCodeStructure}
Now that we have introduced MCMC, we can sketch the
general code structure of an LFT simulation.
Typically, simulations use {\it local updates}, which
means we update the links one at a time. This is usually done in a systematic order,
because there is some computational advantage compared to updating in a
random order~\cite{berg_markov_2004}.
An updating {\it sweep}\index{sweep} updates every link on the lattice once.
An MCMC simulation of LFT then broadly consists of these essential steps:
\begin{enumerate}
\item {\it Initialization}: The first thing to do is get everything ready for
the simulation. This includes initializing the random number generator,
and setting up an initial configuration.
\item {\it Equilibration} or {\it thermalization}:
\index{equilibration}\index{thermalization}
To avoid over-sampling rare configurations,
one must perform many sweeps to bring the system to its equilibrium
distribution. The structure of this section looks like
\begin{verbatim}
do from n=1 to n=nequi
call MCMC update
end do
\end{verbatim}
\item {\it Configuration generation}: One we are in the equilibrium
distribution, we want to generate configurations on which we can
perform measurements. To help reduce correlations between
measurements, multiple updating sweeps are performed in between.
This section is structured as
\begin{verbatim}
do from n=1 to n=nconf
do from n=1 to n=ndiscarded
call MCMC update
end do
save configuration
end do
\end{verbatim}
\item {\it Measurements}: Now that we have a good sample of configuration
space, we are ready to perform measurements there. Some measurements
can be take {\it in situ}, i.e. they are incredibly cheap and can be
calculated the moment the configuration is saved. Measurements of this
type include simple link products like the Polyakov loop. More expensive
observables may require inverting a Dirac matrix thousands of times,
gauge fixing, or smoothing. For such observables it is better to
have separate code that runs on the saved configurations. This
code is structured as
\begin{verbatim}
do from n=1 to n=nconf
take measurement
end do
\end{verbatim}
\end{enumerate}
The details of the above steps, especially the MCMC update, depend on what
physics are being simulated. In the following, we discuss some strategies for
generating new gauge fields from old ones.
\subsection{Update: Heat bath}\index{heat bath}
The {\it heat bath} (HB) algorithm is a popular choice for the simulation
of pure $\SU(N_c)$ systems. Here, we construct our trial probability distribution
to draw from the Boltzmann distribution to begin with. Thus no
Metropolis step is required\footnote{The Kennedy-Pendleton algorithm has a
accept/reject step on one of the random variables to ensure that
links are drawn from the correct distribution. This is not
a Metropolis step, since it does not directly involve the action difference.}.
As mentioned at the beginning of \secref{sec:generalCodeStructure},
update sweeps proceed one link at a time.
For the $\SU(N_c)$ HB algorithm, the trial link distribution for that link is
\begin{equation}\label{eq:PTdist}
\dd{\prt{U'}}\propto \dd U'\exp\left(\frac{\beta}{N_c}\,\tr\,
U'U^\sqcup\right).
\end{equation}
This construction also satisfies detailed balance.
Numerically, the trick is to map \equatref{eq:PTdist} to a set
of real, random variables, which have to be specially
generated to reproduce the desired distribution.
A common strategy for doing this for the gauge group $\SU(2)$,
often referred to as the {\it Kennedy-Pendleton}\index{heat
bath!Kennedy-Pendleton} algorithm, is explained in an elementary way
in Ref.~\cite{gattringer_quantum_2010}; the original papers are
Refs.~\cite{fabricius_heat_1984,kennedy_improved_1985}.
When $N_c>2$, this strategy is extended by leveraging the fact
that $\SU(2)\leq\SU(N_c)$; in general, an $\SU(N_c)$ matrix
can be broken down into $N_c(N_c-1)/2$
$\SU(2)$ submatrices~\cite{cabibbo_new_1982}. One generates each
$\SU(2)$ submatrix using the above HB, then reconstructs
the $\SU(N_c)$ matrix from those, which is sometimes called
the {\it Cabibbo-Marinari} method\index{Cabibbo-Marinari}.
\subsection{Update: Over-relaxation}\index{over-relaxation}
A useful update for $\SU(N_c)$ is the
{\it over-relaxation}\index{over-relaxation} (OR) update.
Adler introduced OR algorithms \cite{adler_over-relaxation_1981} and
they were further developed by Creutz \cite{creutz_overrelaxation_1987}
and others. The idea of the OR algorithm is to speed up relaxation
by generating a group element ``far away" from $U$ without
destroying equilibrium, which for $\SU(2)$ is achieved by keeping that action
constant.
This update is not random in any way, but rather serves to move you around
in configuration space without impacting the acceptance rate too much.
More precisely let $U\in\SU(2)$ and suppose we have some method of
choosing another link variable $U_0$ that maximizes
the action for this staple. We assume that this method of selection has no
dependence on $U$. Pick some element $V\in\SU(2)$ such that $U=VU_0$; viewed
in this way, $U$ is ``on one side of $U_0$," and the element
``on the other side" is $U'=V^{-1}U_0$. Note that
\begin{equation}
V=U U_0^{-1},
\end{equation}
which implies
\begin{equation}
U' = U_0 U^{-1} U_0.
\end{equation}
This manner of constructing a new link variable $U'$, which generates
a group element ``far away" from $U$, is what we mean by over-relaxation.
In principle an OR update should be more efficient than a Monte
Carlo update. This is because we chose the new link variable to be two
group elements away from the old one, thrusting us further
along configuration space. However unlike Metropolis updates, OR updates
only sample the subspace of constant action, and are therefore not ergodic.
Hence to ensure an approach to equilibrium, they must be supplemented with,
for instance, HB updates.
For $\SU(2)$, the update is given by
\begin{equation}\label{eq:ORupdate}
U\to U'=\frac{1}{\det U^\sqcup}\left(U^\sqcup UU^\sqcup\right)^\dagger.
\end{equation}
\begin{proposition}{}{OR}\label{prp:OR}
This update does not change the $\SU(2)$ Wilson action.
\begin{proof}
Since $\det(kA)=k^n\det A$ for any constant $k$ and $n\times n$ matrix $A$,
one can show that the sum of two $\SU(2)$ matrices is proportional
to an $\SU(2)$ matrix. Hence we can write
$$
U^\sqcup=u^\sqcup\sqrt{\det U^\sqcup}
$$
where $u^\sqcup\in\SU(2)$. After updating, the local contribution
to the Wilson action becomes
\begin{equation*}
\tr U'U^\sqcup =\frac{1}{\det U^\sqcup}\tr
\left(U^\sqcup UU^\sqcup\right)^\dagger
=\tr U^\sqcup U\left(u^\sqcup\right)^\dagger u^\sqcup
=\tr U^\sqcup U,
\end{equation*}
which is what it was originally.
\end{proof}
\end{proposition}
This simple
behavior is special to $\U(1)$ and $\SU(2)$ LGT.
For $N_c>2$, one can carry out OR updates on the $\SU(2)$ subspaces of the
$\SU(N_c)$ matrix, i.e. using the method of Cabibbo and Marinari.
This in general does not keep the action constant, but it
does change the action only very little. Hence it is a good choice to supplement
HB updates for $\SU(N_c)$ gauge fields in general.
I close by noting that pairing HB with OR sweeps is the only way I have ever
seen pure $\SU(N_c)$ configurations generated in modern code.
\subsection{Update: Over-relaxation gauge fixing}\label{sec:ORgaugefix}
\index{gauge!fixing}
In general expectation values of gauge-dependent quantities are zero.
Nevertheless there are rare occasions where it is useful to calculate
a quantity in a particular gauge, which can then be related to some
other physical quantity. In this section we cover an OR
algorithm that fixes gauge configurations to the {\it Coulomb} or
{\it Landau} gauges.\index{gauge!Coulomb}\index{gauge!Landau}
In the continuum such gauges are given by the condition
\begin{equation}
\partial_\mu A_\mu=0,
\end{equation}
where $\mu$ runs over only spatial directions for the Coulomb gauge, and
over all four directions for the Lorentz gauge. On the lattice, these
{\it gauge conditions}\index{gauge!condition} are discretized as
\begin{equation}
\Delta(x)\equiv A_\mu(x)-A_\mu(x-a\hat\mu) =0.
\end{equation}
\begin{proposition}{}{ORgf}
A gauge satisfying $\Delta(x)=0$ is achieved by extremizing
$$
F\equiv-\Re\tr\sum_{x,\mu} g(x) U_\mu(x) g^\dagger(x+a\hat{\mu})
$$
\begin{proof}
Any gauge matrix can be written
$$
g=e^{is^aT^a},
$$
where $s^a\in\R$ and $T^a$ are the generators of the Lie group. We want
to find the special $g_0$ that extremizes $F$; this is equivalent to finding
the $s_0$ that extremizes it. We will look in the vicinity of such a
gauge, i.e. look at gauges
$$e^{i\tau^a T^a}, \qquad\tau\equiv s-s_0$$
with $\tau$ small. Then
\begin{equation*}\begin{aligned}
F&=-\Re\tr\sum_{x,\mu}&&
e^{i\tau^a(x)T^a}\,e^{iaA^b_\mu(x)T^b}\,e^{-i\tau^c(x+a\hat\mu)T^c}\\
&\approx &&\big(\id+i\tau^a(x)T^a\big)
\big(\id+iaA_\mu^b(x)T^b\big)
\big(\id-i\tau^c(x+a\hat\mu)T^c\big)\\
&\approx && \big(\id+a\tau^c(x+a\hat\mu)A_\mu^b(x)T^bT^c
-a\tau^a(x)A_\mu^b(x)T^aT^b\big),
\end{aligned}\end{equation*}
where we have neglected $\order{a^2, a\tau, \tau^2}$ terms and use that
the $T^a$ are traceless. At the extremum, derivatives with respect to $\tau$
will vanish. Exchanging our dummy group summation indices and shifting
the first term by one site (which is allowed because of
periodic BCs) we find
\begin{equation*}\begin{aligned}
0=\pdv{F}{\tau^d}
&=-\Re\tr\sum_{x,\mu}&&a\pdv{\tau^d}
\left[A_\mu^b(x-a\hat\mu)\tau^c(x)-A_\mu^b(x)\tau^c(x)\right]T^bT^c\\
&= &&a\left[A_\mu^b(x-a\hat\mu)-A_\mu^b(x)\right]T^bT^d.
\end{aligned}\end{equation*}
In order for the above to vanish, the term in brackets has to vanish,
which completes the proof.
\end{proof}
\end{proposition}
The strategy from here is to minimize $F$. As you can clearly see from
its definition, this can be accomplished through the replacement
\begin{equation}\label{eq:gfORupdate}
U\to U'=gU_\mu g^\dagger,
\end{equation}
which is reminiscent of the OR update~\eqref{eq:ORupdate}. Hence we employ
again OR, i.e. we find the $g$ minimizing $F$ and make the
replacement~\eqref{eq:gfORupdate}. In a similar strategy as
\propref{prp:OR}, we look at the local update to $F$,
\begin{equation}
\Re\tr g(x)\sum_\mu\left[U_\mu(x)g^\dagger(x+a\hat\mu)
+U_\mu^\dagger(x-a\hat\mu)g^\dagger(x-a\hat\mu)\right]
\equiv\Re\tr g(x)K(x).
\end{equation}
As before, $K(x)=\sqrt{\det K(x)}\hat K(x)$, where $\hat K(x)\in\SU(2)$.
To extremize $F$, we search for the $g$ rotating everything under the
trace to $\id$, which is clearly
\begin{equation}
g(x)=K^\dagger(x)/\sqrt{\det K(x)}.
\end{equation}
We will never achieve the condition $\Delta(x)=0$ on the computer exactly.
A measure of how close we are to satisfying the gauge condition is
\begin{equation}
\theta\equiv\frac{1}{N_c N_s^3}\sum_x\tr\Delta(x)\Delta^\dagger(x).
\end{equation}
This combination is constructed so that $\theta\in\R$.
For more information
about using OR to gauge fix, see Ref.~\cite{mandula_efficient_1990}.
For information about how to implement this on multiple GPUs, see
e.g. Ref.~\cite{schrock_coulomb_2013}.
\subsection{Update: Molecular dynamics}\label{sec:MD}\index{molecular dynamics}
We now discuss one of the most popular updating strategies on the market,
especially when carrying out simulations with dynamical fermions.
For this updating strategy, we leverage an isomorphism between
lattice field theory and a classical statistical mechanical
ensemble. We will use this
correspondence to find a clever way of suggesting a gauge
configuration, and then ultimately accept or reject it via
a standard Metropolis step.
To start let us imagine a theory with a scalar field $\phi$ on
the lattice. Then expectation values are computed like
\begin{equation}
\ev{X}=Z^{-1}\int\DD\phi e^{-S[\phi]}X[\phi],~~~~Z\equiv\int\DD\phi
e^{-S[\phi]}.
\end{equation}
Now let
\begin{equation}
H\equiv\frac{1}{2}\sum_{x\in\LAT}P(x)^2+S[\phi],
\end{equation}
where $P$ is a scalar field we will define in a moment. Since $P$ is some other
field, it follows that
\begin{equation}
\ev{X}=Z^{-1}\int\DD\phi\DD P e^{-H[\phi,P]}X[\phi],
~~~~Z\equiv\int\DD\phi\DD P e^{-H[\phi,P]}.
\end{equation}
Now let these fields evolve in Markov time $\tau$ and define
\begin{equation}
\dv{\phi(x)}{\tau}\equiv P(x).
\end{equation}
Then we have defined a classical system where $P$ is the momentum
conjugate to $\phi$. If we evolve $\phi$ and $P$ in $\tau$ according
to Hamilton's equations of motion, the energy is guaranteed to
be a constant. This procedure would yield 100\% acceptance if we
were able to integrate the equations of motion exactly. In practice
there is round off error, and the Metropolis step serves to protect
the probability distribution.
The above example is for a scalar field, which is a simplification of
Ref.~\cite{callaway_microcanonical_1982},
which does it for $\U(1)$. This was shortly thereafter extended
to fermions~\cite{polonyi_microcanonical_1983}.
This integration is completely deterministic, and hence does not
capture quantum fluctuations. To capture such quantum behavior,
one can intuitively imagine disturbing Hamilton's equations
with a random noise term. This is is an example of a
stochastic differential equation\footnote{A stochastic differential
equation is a differential equation where you include one
or more random forces. A simple example is the Langevin equation
$$
m \dv{v}{t}=-\lambda v+\eta(t)
$$
describing the Brownian motion of a particle of mass $m$
and velocity $v$ in a fluid. Here $\lambda$ is the drag, and
$\eta$ is a Gaussian-distributed noise term representing
random collisions of the particle with other particles in the fluid.
At a fixed value of the noise, such equations can be solved.
It becomes challenging when one wants to average over $\eta$.
}
like the Langevin equation, which can be used to model the inherent
randomness of the underlying quantum field
theory~\cite{parisi_perturbation_1981}.
Inspired by this, we might consider to give the conjugate momentum
random kicks, following the microcanonical path in between.
This strategy is often called a
{\it hybrid}~\cite{duane_hybrid_1985} strategy.
If not for numerical rounding error, this strategy would be expected
to generate the correct
distribution~\cite{duane_theory_1986}.
To correct for numerical error, one can again apply the Metropolis
step as before, which is referred to as
{\it hybrid Monte Carlo}\index{Monte Carlo!hybrid} (HMC).
Applying a series of molecular dynamics steps is usually called
a {\it trajectory}\index{trajectory}, after which comes the
Metropolis step. To fulfill detailed balance\footnote{Remember
we need this in order to ensure that the typical accept/reject
based on the configuration generated by the molecular dynamics
is a valid Metropolis step.}, we need to ensure
that our integration scheme
\begin{enumerate}
\item preserves the integration measure $\DD\phi\DD P$; and
\item is reversible.
\end{enumerate}
A commonly used scheme that fulfills these demands is known as the
{\it leapfrog} algorithm. Under this scheme, $\phi$ integrates
in $N$ steps of Monte Carlo length $\epsilon$, while $P$
starts with a step of length $\epsilon/2$, followed by $N-1$
steps of length $\epsilon$, and a final step of length $\epsilon/2$.
You can find an elementary proof that the leapfrog preserves area
and is reversible in Ref.~\cite{gattringer_quantum_2010}.
There one also finds a proof that the Metropolis fulfills detailed balance.
Let us synthesize what we have discussed to introduce an updating scheme that
represents a system of two indentical fermions with gauge interaction
at zero chemical potential.
This scheme for staggered fermions was first made explicit
in Ref.~\cite{gottlieb_hybrid-molecular-dynamics_1987}.
As discussed in \secref{sec:pseudofermions}, the most efficient way
to treat the determinant of the Dirac matrix is using pseudofermions.
Hence our path integral is\index{pseudofermion}
\begin{equation}\begin{aligned}\label{eq:twoFlavorRHMC}
Z&=\int\DD{U}\left(\det D\right)^2e^{-S_G}\\
&=
\int\DD{\Phi_R}\DD{\Phi_I}\DD{U}
e^{-S_G-\Phi^\dagger\left(D^\dagger D\right)^{-1}\Phi},
\end{aligned}\end{equation}
where we have introduced a three-color, complex scalar field
$\Phi=\Phi_R+i\Phi_I$, the pseudofermion field.
In \secref{sec:pseudofermions} we stressed that the treatment of two identical
fermions was important since it lets us put a $D^\dagger D$ in the
exponential, which makes it a well defined probability distribution.
We also saw that we can draw our pseudofermions according to the
above distribution if we define $\Phi=D^\dagger R$ and draw $R$ from
$\exp\left(-R^\dagger R\right)$. The strategy then goes:
\begin{enumerate}
\item Generate Gaussian $R$.
\item Solve $\Phi=D^\dagger R$ to get $\Phi$.
\item Given your starting configuration $U$ generate conjugate
momenta.
\item Keeping $\Phi$ fixed, use the molecular dynamics strategy
to integrate along a short trajectory, yielding a proposal for $U$.
\item Do a Metropolis accept/reject at the end of the trajectory for $U$.
\end{enumerate}
Let us discuss steps 2-4 in a little more detail; in particular we would like to
understand how the molecular dynamics works for $\SU(3)$. We express our gauge
field as
\begin{equation}
U_\mu=e^{i\omega_\mu^aT^a}=e^{iA_\mu}.
\end{equation}
We will therefore think of our Hamiltonian as a function of the
generalized coordinates $A_\mu$, or equivalently when convenient,
$\omega_\mu^a$. The corresponding conjugate momenta are
\begin{equation}
P_\mu=P^a_\mu T^a.
\end{equation}
The steps taken in the leapfrog change the momentum. With our fictitous
Hamiltonian in mind, what we seek is a force, i.e. a gradient of
the effective action. Looking at \equatref{eq:twoFlavorRHMC} our effective
action is
\begin{equation}
S_{\rm eff}=-S_G-\Phi^\dagger\left(D^\dagger D\right)^{-1}\Phi,
\end{equation}
and the force is
\begin{equation}
f=T^a\nabla^aS_{\rm eff}.
\end{equation}
The leapfrog thus proceeds in steps $\epsilon f$.
Based on the effective action, the force can be divided into the
{\it gauge force} and {\it fermion force}. The fermion force thus
depends on the chosen discretization. For example with a straightforward Wilson
gauge action, whose contribution at $U$ has the form
\begin{equation}
-\frac{\beta}{6}\tr\left(UA+A^\dagger U^\dagger\right),
\end{equation}
where $A$ is the staple sum. Using \equatref{eq:lieDeriv},
the fact that the derivative lies in the Lie algebra,
which lets you express it as a linear combination of the generators,
and \eqref{eq:SUNnorm}, the gauge force comes out to be
\begin{equation}
f_G=-\frac{i\beta}{12}\left(UA-A^\dagger U^\dagger\right).
\end{equation}
Meanwhile, using \propref{prp:MATwrtSCLinv},
the fermion force can be written
\begin{equation}
f_{\rm ferm}^a=
-T^a\left(\left(D^\dagger D\right)^{-1}\Phi\right)^\dagger
\left(\pdv{D}{\omega^a}D^\dagger+D\pdv{D^\dagger}{\omega^a}\right)
\left(\left(D^\dagger D\right)^{-1}\Phi\right).
\end{equation}
This shows that the inverse of $D^\dagger D$ must be computed
at each step of the molecular dynamics, which is far and away
the most expensive part.
Both here and in Step 2 we need to compute inverses of extremely
large matrices. One of the most efficient ways to approach this,
and indeed the strategy taken by codes such as
QUDA,
Grid, and
$\simulat$, is to use the conjugate gradient.
The difficulty of computing the inverse increases
substantially according to the smallest eigenvalue of $D$,
as mentioned in \secref{sec:numerics}. As a result
one may need to employ an SVD or a
{\it force filter}\index{force!filter}, which artificially
shifts low eigenmodes by an amount $\delta$.
\section{The HISQ force}\label{sec:HISQforce}
The HISQ action, discussed in \secref{sec:stagg} and \secref{sec:HISQ},
is widely used in state-of-the-art lattice calculations.
Here we discuss the computer implementation of HISQ fermions.
Constructing its
force is a bit complicated, as it is a synthesis of many constructs in
those sections along with \secref{sec:MD}. In my view,
implementing HISQ fermions is complicated enough to warrant its own section,
maybe even its own chapter. I also attempt to introduce some jargon that you may
encounter if you open up HISQ codes.
We begin by writing down the action itself in a way that is helpful
for computer implementation. We write
\begin{equation}
S_f=\Phi^\dagger\left(D^\dagger D\right)^{-N_f/4}\Phi
=\bra{\Phi}\left(D^{\dagger} D\right)^{-N_f/4}\ket{\Phi},
\end{equation}
where $N_f$ is the number of fermions and $\ket\Phi$ is the pseudofermion field.
For this presentation of the HISQ force, we follow
Ref.~\cite{Wong:2007uz}, and hence switch to bra-ket notation.
Remember, we chose this pseudofermion rewrite to be able to compute the
determinant of the Dirac matrix efficiently. We want a
$D^\dagger D$ in the exponential since it has positive eigenvalues,
and hence makes for a well defined probability distribution.
The $D^\dagger D$ would correspond to two identical quarks of that flavor, since
by the Matthews-Salam formula
\begin{equation}
\det D^\dagger D = \det D^\dagger \det D
=\int \dd{\bar{\psi}_f}\dd{\psi_f}\dd{\bar{\psi}_g}\dd{\psi_g}
\exp\left(\bar{\psi}_fD^\dagger\psi_f+\bar{\psi}_gD \psi_g\right).
\end{equation}
The trick to get around this, as done in
Ref.~\cite{gottlieb_hybrid-molecular-dynamics_1987}, is to define the
$\Phi$ fields on even sites only. This works because $D^\dagger D$ has no
matrix elements connecting even and odd sites. More precisely, if
$D_0$ is the massless part of the Dirac operator, then
\begin{equation}\begin{aligned}
\left(D^\dagger D\right)_{xy}
&=\left(2m\delta_{xz}+D^\dagger_{0,xz}\right)
\left(2m\delta_{zy}+D_{0,zy}\right)\\
&=4m^2\delta_{xy}+2m\left(D^\dagger_{0,xy}+D_{0,xy}\right)+D^\dagger_{0,xz}D_{0,zy}.
\end{aligned}\end{equation}
The second term in the last line vanishes since $D$ is antihermitian.
The first term is diagonal in the site basis. Similarly $D$ itself always sends
odd to even or vice versa, so $D_0^\dagger D_0$ must preserve parity.
Hence if we write $D^\dagger D$ in a basis organized so even indices come first
followed by odd, we have a block-diagonal matrix with two copies of
$4m^2+D_0^\dagger D_0$. Restricting to even or odd halves the number of degrees
of freedom, returning us once again to our interpretation of having a single
fermion species.
We have the issue, discussed in
\secref{sec:HISQ}, that staggered fermions lead to four degenerate tastes.
The solution offered in that section is to take the fourth root. In practice,
this fourth root is implemented as a rational
approximation~\cite{clark_rhmc_2004,clark_rational_2006},
\begin{equation}
\left(D^\dagger D\right)^{-N_f/4}\approx\alpha_0+\sum_l\frac{\alpha_l}{D^\dagger
D+\beta_l}.
\end{equation}
Using the rational approximation to estimate the fourth root in this hybrid
Monte Carlo scheme is called {\it rational hybrid Monte Carlo}\index{RHMC}
(RHMC).
Next we need to open up the $D$. For the following formulae, we will suppress
the Naik part to clean up the notation. The Naik term and its derivative can be
included in the same way as all the other terms. We write\footnote{It is common practice in
lattice publications to use $M$ rather than $D$ to indicate the Dirac matrix.
In that case, $D$ is sometimes used to indicate the (covariant) derivative part.
I took the notation of Gattringer and Lang~\cite{gattringer_quantum_2010}.}
\begin{equation}\label{eq:DHISQsimple}
D(y|x)=2m\delta_{x,y}
+\sum_\alpha \eta_\alpha(x)\left(\UHISQ_\alpha(x)\delta_{x,y-\hat\alpha}
-\left(\UHISQ_\alpha(x-\hat\alpha)\right)^\dagger\delta_{x,y+\hat\alpha}\right),
\end{equation}
where $\UHISQ$ is given by \equatref{eq:HISQsmear}:
\begin{equation}
\UHISQ=\mathcal{F}^{\rm AsqTad}\mathcal{U}\mathcal{F}^{\rm Fat7}U,
\end{equation}
i.e. we Fat7-smear the thin link, followed
by a reunitarization and then a second-level smear that has some lattice-spacing
corrections built in. This latter smear has the same structure as the AsqTad smear of
\equatref{eq:asqtad}, but different coefficients. If a Naik $\epsilon$ is required,
as is the case for example when you want dynamical charm quarks, this is
included in the AsqTad-like smear. The Fat7 smear also has the same form as the
AsqTad smear, but with still different coefficients, and no Naik $\epsilon$ or
LePage term.
The $\eta_\alpha$ are the staggered phases defined in \equatref{eq:staggeredPhase}.
To carry out molecular dynamics, we need the derivative of $S_f$ w.r.t. MC time,
which translates via the chain rule to a derivative w.r.t. the gauge field
$U$. From \equatref{eq:DHISQsimple}, successive applications of the chain rule
show us we need
\begin{equation}\label{eq:HISQchainrule}
\pdv{S_f}{U}=\pdv{S_f}{X}\pdv{X}{W}\pdv{W}{V}\pdv{V}{U}.
\end{equation}
Here we have adopted the notation of~\cite{bazavov_scaling_2010},
where
\begin{equation}\begin{aligned}
V&\equiv\mathcal{F}^{\rm Fat7}U,\\
W&\equiv\mathcal{U} V\\
X&\equiv\mathcal{F}^{\rm AsqTad} W.
\end{aligned}\end{equation}
We begin working out the derivative of the action. This is
\begin{equation}\begin{aligned}
\pdv{S_f}{U_{x,\mu}} &=
-\sum_l\alpha_l
\bra{\Phi\left(D^\dagger D+\beta_l\right)^{-1}}
\pdv{D^\dagger D}{U_{x,\mu}}
\ket{\left(D^\dagger D+\beta_l\right)^{-1}\Phi}\\
&=-\sum_l\alpha_l\left(
\bra{L^l}\pdv{D_0^\dagger}{U_{x,\mu}}\ket{R^l} +
\bra{R^l}\pdv{D_0^\dagger}{U_{x,\mu}}\ket{L^l}\right),
\end{aligned}\end{equation}
where
\begin{equation}\begin{aligned}
\ket{L^l}&\equiv\left(D^\dagger D+\beta_l\right)^{-1}\ket{\Phi}\\
\ket{R^l}&\equiv D_0\ket{L^l}.
\end{aligned}\end{equation}
Since we originally took $\ket{\Phi}$ to be defined on even sites only,
the pseudofermion fields $\ket{L^l}$ and $\ket{R^l}$ are defined on
even and odd sites, respectively. Expanding the spacetime sum in $D$ and writing
out color indices explicitly, we can rewrite this as
\begin{equation}
\pdv{S_f}{\left[U_{x,\mu}\right]_{ab}}=
\sum_{y,\nu}(-1)^y\eta_{y,\nu}\left(
\pdv{\left[U_{y,\nu}^{\rm HISQ}\right]_{mn}}{\left[U_{x,\mu}\right]_{ab}}
\left[f_{y,\nu}\right]_{mn}+
\pdv{\left[U_{y,\nu}^{\rm HISQ\,\dagger}\right]_{mn}}{\left[U_{x,\mu}\right]_{ab}}
\left[f^\dagger_{y,\nu}\right]_{mn}
\right),
\end{equation}
where $f_{y,\nu}$ is the {\it outer product}\index{outer product}
\begin{equation}\label{eq:HISQouterproduct}
\left[f_{y,\nu}\right]_{mn}=
\begin{cases}
\sum_l\alpha_l\left[R^l_{y+\nu}\right]_n\left[L^{l*}_y\right]_m & {\rm even}~y\\
\sum_l\alpha_l\left[L^l_{y+\nu}\right]_n\left[R^{l*}_y\right]_m & {\rm odd}~y\\
\end{cases}.
\end{equation}
We finish this section by giving a rough outline of how HISQ forces get
implemented in practice. We will once again include the Naik term in the
discussion. The smears can be thought of as a sum over their
different constructs, hence derivatives of the smears are usually written as
sums over the derivatives of those constructs. Note also that the HISQ smear
requires a reunitarization. This reunitarization is often approximated as a
function of the to-be-smeared link, and hence one also needs a derivative of
that function w.r.t. the links. The HISQ action, as implemented by MILC,
takes the coefficients
\begin{equation}\begin{aligned}
&c^{\rm Fat7}_1=\frac{1}{8},~~~
c^{\rm Fat7}_3 =\frac{1}{16},~~~
c^{\rm Fat7}_5 =\frac{1}{64},~~~
c^{\rm Fat7}_7 =\frac{1}{384},~~~
c_{\rm Naik} =-\frac{1}{24}(1+\epsilon),\\
&c^{\rm Asq}_1 =1+\frac{\epsilon}{8},~~~
c^{\rm Asq}_3 =\frac{1}{16},~~~
c^{\rm Asq}_5 =\frac{1}{64},~~~
c^{\rm Asq}_7 =\frac{1}{384},~~~
c^{\rm Asq}_L =-\frac{1}{8}.
\end{aligned}\end{equation}
One then roughly follows these steps:
\begin{enumerate}
\item Compute the gauge fields $V$ and $W$.
\item Prepare the pseudofermion fields $\ket{\Phi}$. As we saw in the last
section, this is randomly generated, drawn from a Gaussian.
\item Compute $\ket{L^l}$ and $\ket{R^l}$ from $\ket{\Phi}$.
This requires having implementations of
the massive and massless Dirac operators, along with some method of inversion.
The {\it shifts}\index{shift} $\beta_l$ enter here.
\item Compute the outer products $f_{y,\nu}$ and $f_{y,3\nu}$. The latter will
contribute only to the Naik derivatives. As indicated by
\equatref{eq:HISQouterproduct}, the {\it residues}\index{residue} $\alpha_l$ enter here.
\item Now we can construct the force. As discussed earlier, this can be
expressed as a sum over derivatives of various constructs. We work through the
chain rule \equatref{eq:HISQchainrule} from left to right:
\begin{verbatim}
force = naikLinkDeriv(f3nu,W,cnaik) # work out dX/dW
force += lepageLinkDeriv(f3nu,clp)
force += c1Asq*fnu
force += linkDeriv<3>(fnu,W,c3Asq)
force += linkDeriv<5>(fnu,W,c5Asq)
force += linkDeriv<7>(fnu,W,c7Asq)
temp = c1Fat7*projU3Deriv(V,force) # work out dW/dV
force += linkDeriv<3>(temp,W,c3Fat7) # work out dV/dU
force += linkDeriv<5>(temp,W,c5Fat7)
force += linkDeriv<7>(temp,W,c7Fat7)
force = U*force # close the loop
\end{verbatim}
\end{enumerate}
We note that when computing the derivative of the $\U(3)$ projection, the force
calculated up to that point is generally contracted with $\partial W/\partial
V$. This prevents having to deal with rank-4 tensors, which is nice in the
context of the code base, as you can store everything in gaugefield objects.
In the above pseudocode, this contracted derivative is saved in the
auxiliary field \texttt{temp}. We remind the reader that \texttt{cnaik} and
\texttt{c1Asq} depend on the Naik $\epsilon$.
\section{Improving performance}
The inversion of the Dirac matrix is generally the bottleneck of LFT
simulations. We generally get better throughput as hardware improves,
but waiting for hardware improvements is not sufficient for completing realistic
simulations in an acceptable amount of time. In \secref{sec:performance} we
already discussed some general methods to get the most out of your hardware.
Here, we will discuss some strategies that are specific to LFT.
\subsection{Using noise vectors}\label{sec:noiseVector}
Here we discuss a strategy to lessen the burden of computing the trace of some
operator, $\tr X$. This can be applied to correlators, but it's also useful for
operators like the number density, which we will show in \secref{sec:QCDtherm}
can be computed like $\tr D^{-1}$.
If we were to compute this particular trace
naively, we would need to compute propagators for $4N_cV$ point sources,
which of course is too expensive.
We will instead estimate the trace using a set of
noisy vectors, sometimes called the {\it Hutchinson
method}~\cite{hutchinson_stochastic_1990}.
In particular, consider $N$ uncorrelated complex random numbers
$\eta_i=\eta_R+i\eta_I$. Let $\eta$ be the vector formed from these numbers, and
define $\eta_R$ and $\eta_I$ similarly. We demand they have unit variance, i.e.
\begin{equation}\label{eq:noisenorm}
\ev{\eta_i\eta_j^*}_\eta = \delta_{ij}.
\end{equation}
This normalization condition can be satisfied using, e.g., Gaussian random
numbers, drawn according to
\begin{equation}
\pr{\eta}=\pi^{-N}\dd{\eta_R}\dd{\eta_I}\exp\left(-\eta^\dagger\eta\right),
\end{equation}
which can be seen by integrating $\eta_i\eta_j$ over the above measure.
The expectation value \equatref{eq:noisenorm} can be approximated in the usual
way, i.e. by drawing $\eta$ from the above distribution $K$ times,
\begin{equation}
\delta_{ij}\approx\frac{1}{K}\sum_k\eta_i^{k}\left(\eta_j^{k}\right)^*,
\end{equation}
which means traces can be estimated as
\begin{equation}
\tr X= X_{ij}\delta_{ij} \approx\frac{1}{K}\sum_k\left(\eta^k\right)^\dagger X\eta^{k}.
\end{equation}
This trick reduces the computational complexity from $\order{144V^2}$ to
$\order{12KV}$. It is sometimes called a {\it noisy
estimator}\index{estimator!noisy} of the trace.
We have explicitly indicated the dependence of the computational burden on
metaparameters such as the number of noise vectors. The burden also increases
with the accuracy of the solve. One thus faces the common dilemma of
trading accuracy for computational effort. We therefore discuss next a trick to
reduce solve times without sacrificing too much accuracy, the {\it truncated
solver method}\index{truncated solver method} (TSM)~\cite{bali_effective_2010}.
The idea is quite simple: We estimate $\tr X$ using two solve tolerances, a fine
solve carried out to high accuracy, say to machine precision, and a sloppy solve
where we terminate the inversion iterations early. Computing $\tr X$ using the
latter solves only would be a biased estimate, as we did not permit the
inversion to run fully to completion. To correct for the bias, we decompose
\begin{equation}\label{eq:TSM}
\tr X\approx \frac{1}{K_{\rm sloppy}}\sum_{k=1}^{K_{\rm sloppy}}
\left(\eta^k\right)^\dagger X_{\rm sloppy}\eta^{k}
+\frac{1}{K_{\rm fine}}\sum_{k=K_{\rm sloppy}+1}^{K_{\rm sloppy}+K_{\rm fine}}
\left(\eta^k\right)^\dagger
\left(X_{\rm fine}-X_{\rm sloppy}\right)\eta^{k},
\end{equation}
where the subscripts ``sloppy" and ``fine" indicate the accuracy to which
$\left(\eta^k\right)^\dagger X\eta^{k}$ was determined. The second term of
\equatref{eq:TSM} removes the bias from the sloppy solve. The idea then is to
choose $K_{\rm sloppy}$ and $K_{\rm fine}$ such that $K_{\rm sloppy}\gg K_{\rm
fine}$ while keeping the estimate for the bias correction reliable.
Sometimes, $K_{\rm fine}=1$ is sufficient.
\subsection{Deflation}\label{sec:deflation}
Consider a matrix $X$. {\it Deflation} refers to the process of constructing a
new matrix $X_{\rm deflated}$ using some subspace of the column space of $X$.
In the context of inverting $D$, one often uses deflation to
isolate the subspace formed by its low eigenmodes, $L$; the remaining subspace
is then $R={\rm range}~D-L$, and the Dirac matrix restricted to this
subspace is the deflated matrix $D_{\rm deflated}$. Since $D_{\rm deflated}$ has
the low eigenmodes excluded, its condition number is much better, and hence it
is easier to invert. The spectrum of $D$ is such that its eigenvalues tend to
increase rapidly, and hence ${\rm dim}~L$ tends to be comparatively quite small.
Finding the eigenvectors of $D_L$ then becomes computationally tractable.
\subsection{Point and wall sources}
In \equatref{eq:mesonCorrFormula} we showed a formula for computing a connected
meson correlator. The inversion of $D$ makes these correlators highly
computationally demanding. Moreover, these correlators can often be quite noisy,
especially for large Euclidean time separations. Here we discuss some commonly
used strategies to ameliorate these difficulties somewhat. Again we follow
Ref.~\cite{gattringer_quantum_2010}, so you can always look there fore more
details. We will assume $m_u=m_d\equiv m_l$, i.e. we consider the isospin
symmetric case, a case that is ubiquitous in modern lattice calculations and
allows for useful simplifications.
Let us start by discussing a general strategy for correlator calculation,
using our connected meson correlator example. First of all, note that while $D$
is generally at worst band diagonal, $D^{-1}$ is not, and it has\footnote{In
\dimens{4}, in the Dirac representation.} $16N_c^2V^2$ components. Some modern
lattice calculations that go to incredibly fine spacings require lattice sizes
like $144^3\times288$. Ignoring gauge fields, this would require about 1 ZB of
memory, which is obviously not possible to store, and even if it were, a direct
numerical inversion of $D$ would clearly be intractable.\footnote{And even if
this were doable, it would definitely be inefficient, as many entries of
$D^{-1}$ will be highly correlated with each other, effectively meaning one
wastes storage on redundant information.}
One way to proceed is to find a strategy to work out $D^{-1}(x,y)$, looping over
all possibilities for $x$ and $y$. This is an example of an {\it all-to-all}
propagator\index{propagator!all-to-all}.\footnote{The term ``all-to-all" can also be used
when propagating between all possible spatial sites of two time-slices. Such a set
up is relevant for spectroscopy.}
For our first easy performance gain, we note that \equatref{eq:mesonCorrFormula}
requires both the forward and backward propagator. One can avoid two inversions by
leveraging the $\gamma_5$-hermiticity of $D^{-1}$. So for instance if
$\Gamma=\gamma_5$ one can rewrite
the fermionic expectation value as
\begin{equation}
\ev{X(x)\bar{X}(y)}_F=-\sum_{\alpha,\beta,a,b}\left|D^{-1}(x,y)_{\alpha\beta
ab}\right|^2.
\end{equation}
Thus for such isospin-symmetric correlators, one needs carry out only one
inversion of $D$.
Inverting large matrices is a common problem encountered in computing contexts,
so it makes sense to transfer what has been learned there to LQCD. What is often
extremely efficient is to, effectively, try to determine $D^{-1}$ one column at
a time. We can isolate a column by hitting a vector with $D^{-1}$, i.e. we can
try to compute $G\equiv D^{-1}S$ for some {\it source vector}\index{source} $S$.
Equivalently, we want to solve the system of equations
\begin{equation}\label{eq:invertSchematic}
DG=S.
\end{equation}
There are many algorithms on the market for solving
\equatref{eq:invertSchematic}. The most popular by far in LQCD contexts are
variants of the conjugate gradient\index{conjugate gradient} method.
Next we discuss different possibilities for sources $S$. One has considerable
freedom in the construction of $S$, so long as one avoids changing the quantum
numbers and symmetries of the wavefunction. Perhaps most
straightforward is the {\it point source},\index{source!point}
\begin{equation}\label{eq:pointSource}
S_{\alpha a}(x)=\delta_{\alpha \alpha'}\delta_{a a'}\delta(x-x').
\end{equation}
Applying the propagator to a point source connects all sites of the lattice to
the site $x$; this is therefore sometimes referred to as a {\it one-to-all}
propagator\index{propagator!one-to-all}. Another possibility is the {\it wall
source}\index{source!wall}
\begin{equation}\label{eq:wallSource}
S_{\alpha a}(\vec{x},t)\propto 1\delta(t-t'),
\end{equation}
which can be viewed as an incoherent sum over all spatial sites on time-slice
$t$. Depending on the interpolator one is interested in, it may be beneficial to
scale this by e.g. a Gaussian.
As mentioned in \secref{sec:interpolators}, one can perform a Fourier transform
to project out a state of definite momentum. This transform amounts to a
weighted average over spatial sites at the sink. The variance of this average should
decrease with increasing spatial volume, which is sometimes referred to as {\it
volume averaging}\index{volume averaging}. Due to translational invariance,
which is inherited from the periodic BCs, one can heuristically see that
thoughtfully chosen wall sources can boost the correlator's signal.
Along the lines of signal boosting, one can implement {\it random
sources}\index{source!random}. This is the same strategy that
was presented in \secref{sec:noiseVector}: One generates an ensemble of noise
vectors $\{\eta^k\}$ satisfying
\begin{equation}
\ev{\eta(x)\otimes\eta^\dagger(y)}_\eta=\delta(x-y)
\end{equation}
and then takes $S(x)=\eta(x)$.
Since propagators decrease exponentially with separation, i.e.