-
Notifications
You must be signed in to change notification settings - Fork 7
Expand file tree
/
Copy pathrandmisc.html
More file actions
1406 lines (1154 loc) · 153 KB
/
randmisc.html
File metadata and controls
1406 lines (1154 loc) · 153 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html><html xmlns:dc="http://purl.org/dc/terms/" xmlns:og="http://ogp.me/ns#" ><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><title>Miscellaneous Observations on Randomization</title><meta name="citation_pdf_url" content="https://peteroupc.github.io/randmisc.pdf"><meta name="citation_url" content="https://peteroupc.github.io/randmisc.html"><meta name="citation_title" content="Miscellaneous Observations on Randomization"><meta name="dc.date" content="2026-04-12"><meta name="citation_date" content="2026/04/12"><meta name="citation_publication_date" content="2026/04/12"><meta name="citation_online_date" content="2026/04/12"><meta name="og:title" content="Miscellaneous Observations on Randomization"><meta name="og:type" content="article"><meta name="og:url" content="https://peteroupc.github.io/randmisc.html"><meta name="og:site_name" content="peteroupc.github.io"><meta name="dc.format" content="text/html"><meta name="dc.language" content="en"><meta name="title" content="Miscellaneous Observations on Randomization"><meta name="dc.title" content="Miscellaneous Observations on Randomization"><meta name="twitter:title" content="Miscellaneous Observations on Randomization"><meta name="dc.creator" content="Peter Occil"/><meta name="author" content="Peter Occil"/><meta name="citation_author" content="Peter Occil"/><meta name="viewport" content="width=device-width"><link rel=stylesheet type="text/css" href="/style.css">
<script type="text/x-mathjax-config"> MathJax.Hub.Config({"HTML-CSS": { availableFonts: ["STIX","TeX"], linebreaks: { automatic:true }, preferredFont: "TeX" },
tex2jax: { displayMath: [ ["$$","$$"], ["\\[", "\\]"] ], inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ], processEscapes: true } });
</script><script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS_HTML-full"></script></head><body> <div class="header">
<nav><p><a href="#navigation">Menu</a> - <a href="#top">Top</a> - <a href="/">Home</a></nav></div>
<div class="mainarea" id="top">
<h1 id="miscellaneous-observations-on-randomization">Miscellaneous Observations on Randomization</h1><p>This version of the document is dated 2026-04-12. <a href='https://github.com/peteroupc/peteroupc.github.io/issues'>Post an issue or comment on this document</a>.</p>
<p><a href="mailto:poccil14@gmail.com"><strong>Peter Occil</strong></a></p>
<p>This page should be read in conjunction with the following articles:</p>
<ul>
<li><a href="https://peteroupc.github.io/randomfunc.html"><strong>Randomization and Sampling Methods</strong></a>.</li>
<li><a href="https://peteroupc.github.io/randomnotes.html"><strong>More Random Sampling Methods</strong></a>.</li>
</ul>
<p><a id="Contents"></a></p>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#Contents"><strong>Contents</strong></a></li>
<li><a href="#About_This_Document"><strong>About This Document</strong></a></li>
<li><a href="#Samplers_for_Certain_Discrete_Distributions"><strong>Samplers for Certain Discrete Distributions</strong></a>
<ul>
<li><a href="#On_a_Binomial_Sampler"><strong>On a Binomial Sampler</strong></a></li>
<li><a href="#On_Geometric_Samplers"><strong>On Geometric Samplers</strong></a>
<ul>
<li><a href="#Bounded_Geometric_Distribution"><strong>Bounded Geometric Distribution</strong></a></li>
<li><a href="#Symmetric_Geometric_Distribution"><strong>Symmetric Geometric Distribution</strong></a></li>
</ul>
</li>
<li><a href="#Weighted_Choice_for_Special_Distributions"><strong>Weighted Choice for Special Distributions</strong></a>
<ul>
<li><a href="#Weighted_Choice_with_Weights_Written_as_an_Integer_and_Fraction"><strong>Weighted Choice with Weights Written as an Integer and Fraction</strong></a></li>
<li><a href="#Distributions_with_nowhere_increasing_or_nowhere_decreasing_weights"><strong>Distributions with nowhere increasing or nowhere decreasing weights</strong></a></li>
<li><a href="#Unimodal_distributions_of_weights"><strong>Unimodal distributions of weights</strong></a></li>
<li><a href="#Weighted_Choice_with_Log_Probabilities"><strong>Weighted Choice with Log Probabilities</strong></a></li>
</ul>
</li>
<li><a href="#Bernoulli_Distribution_for_Cumulative_Distribution_Functions"><strong>Bernoulli Distribution for Cumulative Distribution Functions</strong></a></li>
<li><a href="#Bit_Vectors_with_Random_Bit_Flips"><strong>Bit Vectors with Random Bit Flips</strong></a></li>
<li><a href="#Log_Uniform_Distribution"><strong>Log-Uniform Distribution</strong></a></li>
</ul>
</li>
<li><a href="#Sampling_Unbounded_Monotone_Density_Functions"><strong>Sampling Unbounded Monotone Density Functions</strong></a></li>
<li><a href="#Certain_Families_of_Distributions"><strong>Certain Families of Distributions</strong></a></li>
<li><a href="#Certain_Distributions"><strong>Certain Distributions</strong></a></li>
<li><a href="#Random_Variate_Generation_via_Quantiles"><strong>Random Variate Generation via Quantiles</strong></a></li>
<li><a href="#Batching_Random_Samples_via_Randomness_Extraction"><strong>Batching Random Samples via Randomness Extraction</strong></a></li>
<li><a href="#Sampling_Distributions_Using_Incomplete_Information"><strong>Sampling Distributions Using Incomplete Information</strong></a>
<ul>
<li><a href="#Additional_Algorithms"><strong>Additional Algorithms</strong></a></li>
</ul>
</li>
<li><a href="#Acknowledgments"><strong>Acknowledgments</strong></a></li>
<li><a href="#Notes"><strong>Notes</strong></a></li>
<li><a href="#License"><strong>License</strong></a></li>
</ul>
<p><a id="About_This_Document"></a></p>
<h2 id="about-this-document">About This Document</h2>
<p><strong>This is an open-source document; for an updated version, see the</strong> <a href="https://github.com/peteroupc/peteroupc.github.io/raw/master/randmisc.md"><strong>source code</strong></a> <strong>or its</strong> <a href="https://github.com/peteroupc/peteroupc.github.io/blob/master/randmisc.md"><strong>rendering on GitHub</strong></a><strong>. You can send comments on this document on the</strong> <a href="https://github.com/peteroupc/peteroupc.github.io/issues"><strong>GitHub issues page</strong></a><strong>.</strong></p>
<p>My audience for this article is <strong>computer programmers with mathematics knowledge, but little or no familiarity with calculus</strong>.</p>
<p>I encourage readers to implement any of the algorithms given in this page, and report their implementation experiences. In particular, <a href="https://github.com/peteroupc/peteroupc.github.io/issues/18"><strong>I seek comments on the following aspects</strong></a>:</p>
<ul>
<li>Are the algorithms in this article easy to implement? Is each algorithm written so that someone could write code for that algorithm after reading the article?</li>
<li>Does this article have errors that should be corrected?</li>
<li>Are there ways to make this article more useful to the target audience?</li>
</ul>
<p>Comments on other aspects of this document are welcome.</p>
<p><a id="Samplers_for_Certain_Discrete_Distributions"></a></p>
<h2 id="samplers-for-certain-discrete-distributions">Samplers for Certain Discrete Distributions</h2>
<p>The following are exact samplers for certain <em>discrete distributions</em>, or probability distributions that take on values each mappable to a different integer.</p>
<p><a id="On_a_Binomial_Sampler"></a></p>
<h3 id="on-a-binomial-sampler">On a Binomial Sampler</h3>
<p>The binomial(<em>n</em>, <em>p</em>) distribution models the number of successful trials (“coin flips”) out of <em>n</em> of them, where the trials are independent and have success probability <em>p</em>.</p>
<p>Take the following sampler of a binomial(<em>n</em>, 1/2) distribution, where <em>n</em> is even, which is equivalent to the one that appeared in Bringmann et al. (2014)<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, and adapted to be more programmer-friendly.</p>
<ol>
<li>If <em>n</em> is less than 4, generate <em>n</em> numbers that are each 1 or 0 with equal probability and return their sum. Otherwise, if <em>n</em> is odd<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>, set <em>ret</em> to the result of this algorithm with <em>n</em> = <em>n</em> − 1, then return <em>ret</em> plus a number that is 1 or 0 with equal probability.</li>
<li>Set <em>m</em> to floor(sqrt(<em>n</em>)) + 1.</li>
<li>(First, sample from an envelope of the binomial curve.) Generate numbers that are each 1 or 0 with equal probability until a zero is generated this way. Set <em>k</em> to the number of ones generated this way.</li>
<li>Set <em>s</em> to an integer in [0, <em>m</em>) chosen uniformly at random, then set <em>i</em> to <em>k</em>*<em>m</em> + <em>s</em>.</li>
<li>Generate 1 or 0 with equal probability. If that bit is 0, set <em>ret</em> to (<em>n</em>/2)+<em>i</em>. Otherwise, set <em>ret</em> to (<em>n</em>/2)−<em>i</em>−1.</li>
<li>(Second, accept or reject <em>ret</em>.) If <em>ret</em> < 0 or <em>ret</em> > <em>n</em>, go to step 3.</li>
<li>With probability choose(<em>n</em>, <em>ret</em>)*<em>m</em>*2<sup><em>k</em>−<em>n</em>−2</sup>, return <em>ret</em>. Otherwise, go to step 3. (Here, choose(<em>n</em>, <em>k</em>) is a <em>binomial coefficient</em>, or the number of ways to choose <em>k</em> out of <em>n</em> labeled items.<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>)</li>
</ol>
<p>This algorithm has an acceptance rate of 1/16 regardless of the value of <em>n</em>. However, step 7 will generally require a growing amount of storage and time to exactly calculate the specified probability as <em>n</em> gets larger, notably due to the inherent factorial in the binomial coefficient. The Bringmann paper suggests approximating this factorial via Spouge’s approximation; however, it seems hard to do so without using floating-point arithmetic, which the paper ultimately resorts to. Alternatively, the logarithm of that probability can be calculated, then 0 minus an exponential random variate can be generated and compared with that logarithm to determine whether the step succeeds.</p>
<p>More specifically, step 7 can be changed as follows:</p>
<ul>
<li>(7.) Let <em>p</em> be loggamma(<em>n</em>+1)−loggamma(<em>ret</em>+1)−loggamma((<em>n</em>−<em>ret</em>)+1)+ln(<em>m</em>)+ln(2)*(<em>k</em>−<em>n</em>−2) (where loggamma(<em>x</em>) is the logarithm of the gamma function).</li>
<li>(7a.) Generate an exponential random variate with rate 1 (which is the negative natural logarithm of a uniform(0,1) random variate). Set <em>h</em> to 0 minus that number.</li>
<li>(7b.) If <em>h</em> is greater than <em>p</em>, go to step 3. Otherwise, return <em>ret</em>. (This step can be replaced by calculating lower and upper bounds that converge to <em>p</em>. In that case, go to step 3 if <em>h</em> is greater than the upper bound, or return <em>ret</em> if <em>h</em> is less than the lower bound, or compute better bounds and repeat this step otherwise. See also chapter 4 of (Devroye 1986)<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>.)</li>
</ul>
<p>My implementation of loggamma and the natural logarithm (<a href="https://peteroupc.github.io/betadist.py"><strong>betadist.py</strong></a>) relies on so-called “constructive reals” as well as a fast converging version of Stirling’s formula for the factorial’s natural logarithm (Schumacher 2016)<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>.</p>
<p>Also, according to the Bringmann paper, <em>m</em> can be set such that <em>m</em> is in the interval [sqrt(<em>n</em>), sqrt(<em>n</em>)+3], so I implement step 1 by starting with <em>u</em> = 2<sup>floor((1+<em>β</em>(<em>n</em>))/2)</sup>, then calculating <em>v</em> = floor((<em>u</em>+floor(<em>n</em>/<em>u</em>))/2), <em>w</em> = <em>u</em>, <em>u</em> = <em>v</em> until <em>v</em> ≥ <em>w</em>, then setting <em>m</em> to <em>w</em> + 1. Here, <em>β</em>(<em>n</em>) = ceil(ln(<em>n</em>+1)/ln(2)), or alternatively the minimum number of bits needed to store <em>n</em> (with <em>β</em>(0) = 0).</p>
<blockquote>
<p><strong>Notes:</strong></p>
<ul>
<li>A binomial(<em>n</em>, 1/2) random variate, where <em>n</em> is odd<sup id="fnref:2:1"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>, can be generated as a binomial(<em>n</em>−1, 1/2) random variate plus either 1 or 0 with equal probability.</li>
<li>As pointed out by Farach-Colton and Tsai (2015)<sup id="fnref:6"><a href="#fn:6" class="footnote" rel="footnote" role="doc-noteref">6</a></sup>, a binomial(<em>n</em>, <em>p</em>) random variate, where <em>p</em> is in the interval (0, 1), can be generated using binomial(<em>n</em>, 1/2) numbers using a procedure equivalent to the following:
<ol>
<li>Set <em>k</em> to 0 and <em>ret</em> to 0.</li>
<li>If the binary digit at position <em>k</em> after the point in <em>p</em>’s binary expansion (that is, 0.bbbb… where each b is a zero or one) is 1, add a binomial(<em>n</em>, 1/2) random variate to <em>ret</em> and subtract the same variate from <em>n</em>; otherwise, set <em>n</em> to a binomial(<em>n</em>, 1/2) random variate.</li>
<li>If <em>n</em> is greater than 0, add 1 to <em>k</em> and go to step 2; otherwise, return <em>ret</em>. (Positions start at 0 where 0 is the most significant digit after the point, 1 is the next, etc.)</li>
</ol>
</li>
</ul>
</blockquote>
<p> </p>
<p><a id="On_Geometric_Samplers"></a></p>
<h3 id="on-geometric-samplers">On Geometric Samplers</h3>
<p>As used in Bringmann and Friedrich (2013)<sup id="fnref:7"><a href="#fn:7" class="footnote" rel="footnote" role="doc-noteref">7</a></sup>, a geometric(<em>p</em>) random variate expresses the number of failing trials before the first success, where each trial (“coin flip”) is independent and has success probability <em>p</em>, satisfying 0 < <em>p</em> ≤ 1.</p>
<blockquote>
<p><strong>Note</strong>: The terms “geometric distribution” and “geometric random variate” have conflicting meanings in academic works.</p>
</blockquote>
<p>The following algorithm is equivalent to the geometric(<em>px</em>/<em>py</em>) sampler that appeared in that paper, but adapted to be more programmer-friendly. The algorithm uses the rational number <em>px</em>/<em>py</em>, not an arbitrary real number <em>p</em>; some of the notes in this section indicate how to adapt the algorithm to an arbitrary <em>p</em>.</p>
<ol>
<li>Set <em>pn</em> to <em>px</em>, <em>k</em> to 0, and <em>d</em> to 0.</li>
<li>While <em>pn</em>*2 ≤ <em>py</em>, add 1 to <em>k</em> and multiply <em>pn</em> by 2. (Equivalent to finding the largest <em>k</em> ≥ 0 such that <em>p</em>*2<sup><em>k</em></sup> ≤ 1. For the case when <em>p</em> need not be rational, enough of its binary expansion can be calculated to carry out this step accurately, but in this case any <em>k</em> such that <em>p</em> is greater than 1/(2<sup><em>k</em>+2</sup>) and less than or equal to 1/(2<sup><em>k</em></sup>) will suffice, as the Bringmann paper points out.)</li>
<li>With probability (1−<em>px</em>/<em>py</em>)<sup>2<sup><em>k</em></sup></sup>, add 1 to <em>d</em> and repeat this step. (To simulate this probability, the first subalgorithm below can be used.)</li>
<li>Generate a uniform random integer in [0, 2<sup><em>k</em></sup>), call it <em>m</em>, then with probability (1−<em>px</em>/<em>py</em>)<sup><em>m</em></sup>, return <em>d</em>*2<sup><em>k</em></sup>+<em>m</em>. Otherwise, repeat this step. (The Bringmann paper, though, suggests to simulate this probability by sampling only as many bits of <em>m</em> as needed to do so, rather than just generating <em>m</em> in one go, then using the first subalgorithm on <em>m</em>. However, the implementation, given as the second subalgorithm below, is much more complicated and is not crucial for correctness.)</li>
</ol>
<p>The first subalgorithm returns 1 with probability (1−<em>px</em>/<em>py</em>)<sup><em>n</em></sup>, assuming that <em>n</em>*<em>px</em>/<em>py</em> ≤ 1. It implements the approach from the Bringmann paper by rewriting the probability using the binomial theorem. (More generally, to return 1 with probability (1−<em>p</em>)<sup><em>n</em></sup>, it’s enough to flip a coin that shows heads with probability <em>p</em>, <em>n</em> times or until it shows heads, whichever comes first, and then return either 1 if all the flips showed tails, or 0 otherwise. See also “<a href="https://peteroupc.github.io/bernoulli.html"><strong>Bernoulli Factory Algorithms</strong></a>”.)</p>
<ol>
<li>Set <em>pnum</em>, <em>pden</em>, and <em>j</em> to 1, then set <em>r</em> to 0, then set <em>qnum</em> to <em>px</em>, and <em>qden</em> to <em>py</em>, then set <em>i</em> to 2.</li>
<li>If <em>j</em> is greater than <em>n</em>, go to step 5.</li>
<li>If <em>j</em> is even<sup id="fnref:8"><a href="#fn:8" class="footnote" rel="footnote" role="doc-noteref">8</a></sup>, set <em>pnum</em> to <em>pnum</em>*<em>qden</em> + <em>pden</em>*<em>qnum</em>*choose(<em>n</em>,<em>j</em>). Otherwise, set <em>pnum</em> to <em>pnum</em>*<em>qden</em> − <em>pden</em>*<em>qnum</em>*choose(<em>n</em>,<em>j</em>).</li>
<li>Multiply <em>pden</em> by <em>qden</em>, then multiply <em>qnum</em> by <em>px</em>, then multiply <em>qden</em> by <em>py</em>, then add 1 to <em>j</em>.</li>
<li>If <em>j</em> is less than or equal to 2 and less than or equal to <em>n</em>, go to step 2.</li>
<li>Multiply <em>r</em> by 2, then set <em>r</em> to <em>r</em> plus either 1 or 0 with equal probability.</li>
<li>If <em>r</em> ≤ floor((<em>pnum</em>*<em>i</em>)/<em>pden</em>) − 2, return 1. If <em>r</em> ≥ floor((<em>pnum</em>*<em>i</em>)/<em>pden</em>) + 1, return 0. If neither is the case, multiply <em>i</em> by 2 and go to step 2.</li>
</ol>
<p>The second subalgorithm returns either an integer <em>m</em> that satisfies $0\le m\lt 2^k$, with probability (1−<em>px</em>/<em>py</em>)<sup><em>m</em></sup>, or −1 with the opposite probability. It assumes that $2^k (px/py)\le 1$.</p>
<ol>
<li>Set <em>r</em> and <em>m</em> to 0.</li>
<li>Set <em>b</em> to 0, then while <em>b</em> is less than <em>k</em>:
<ol>
<li>(Sum <em>b</em>+2 summands of the binomial equivalent of the desired probability. First, append an additional bit to <em>m</em>, from most to least significant.) Generate 1 or 0 with equal probability. If that bit is 1, add 2<sup><em>k</em>−<em>b</em></sup> to <em>m</em>.</li>
<li>(Now build up the binomial probability.) Set <em>pnum</em>, <em>pden</em>, and <em>j</em> to 1, then set <em>qnum</em> to <em>px</em>, and <em>qden</em> to <em>py</em>.</li>
<li>If <em>j</em> is greater than <em>m</em> or greater than <em>b</em> + 2, go to the sixth substep.</li>
<li>If <em>j</em> is even<sup id="fnref:8:1"><a href="#fn:8" class="footnote" rel="footnote" role="doc-noteref">8</a></sup>, set <em>pnum</em> to <em>pnum</em>*<em>qden</em> + <em>pden</em>*<em>qnum</em>*choose(<em>m</em>,<em>j</em>). Otherwise, set <em>pnum</em> to <em>pnum</em>*<em>qden</em> − <em>pden</em>*<em>qnum</em>*choose(<em>m</em>,<em>j</em>).</li>
<li>Multiply <em>pden</em> by <em>qden</em>, then multiply <em>qnum</em> by <em>px</em>, then multiply <em>qden</em> by <em>py</em>, then add 1 to <em>j</em>, then go to the third substep.</li>
<li>(Now check the probability.) Multiply <em>r</em> by 2, then set <em>r</em> to <em>r</em> plus either 1 or 0 with equal probability.</li>
<li>If <em>r</em> ≤ floor((<em>pnum</em>*2<sup><em>b</em></sup>)/<em>pden</em>) − 2, add a uniform random integer in [0, 2<sup><em>k</em>*<em>b</em></sup>) to <em>m</em> and return <em>m</em> (and, if requested, the number <em>k</em>−<em>b</em>−1). If <em>r</em> ≥ floor((<em>pnum</em>*2<sup><em>b</em></sup>)/<em>pden</em>) + 1, return −1 (and, if requested, an arbitrary value). If neither is the case, add 1 to <em>b</em>.</li>
</ol>
</li>
<li>Set <em>m</em> to <em>m</em> plus either 1 or 0 with equal probability. (At this point, <em>m</em> is fully sampled.)</li>
<li>Run the first subalgorithm with <em>n</em> = <em>m</em>, except in step 1 of that subalgorithm, set <em>r</em> to the value of <em>r</em> built up by this algorithm, rather than 0, and set <em>i</em> to 2<sup><em>k</em></sup>, rather than 2. If that subalgorithm returns 1, return <em>m</em> (and, if requested, the number −1). Otherwise, return −1 (and, if requested, an arbitrary value).</li>
</ol>
<p><a id="Bounded_Geometric_Distribution"></a></p>
<h4 id="bounded-geometric-distribution">Bounded Geometric Distribution</h4>
<p>As used in the Bringmann paper, a bounded geometric(<em>p</em>, <em>n</em>) random variate is a geometric(<em>p</em>) random variate or <em>n</em> (an integer greater than 0), whichever is less. The following algorithm is equivalent to the algorithm given in that paper, but adapted to be more programmer-friendly.</p>
<ol>
<li>Set <em>pn</em> to <em>px</em>, <em>k</em> to 0, <em>d</em> to 0, and <em>m2</em> to the smallest power of 2 that is greater than <em>n</em> (or equivalently, 2<sup><em>bits</em></sup> where <em>bits</em> is the minimum number of bits needed to store <em>n</em>).</li>
<li>While <em>pn</em>*2 ≤ <em>py</em>, add 1 to <em>k</em> and multiply <em>pn</em> by 2.</li>
<li>With probability (1−<em>px</em>/<em>py</em>)<sup>2<sup><em>k</em></sup></sup>, add 1 to <em>d</em> and then either return <em>n</em> if <em>d</em>*2<sup><em>k</em></sup> is greater than or equal to <em>m2</em>, or repeat this step if less. (To simulate this probability, the first subalgorithm above can be used.)</li>
<li>Generate a uniform random integer in [0, 2<sup><em>k</em></sup>), call it <em>m</em>, then with probability (1−<em>px</em>/<em>py</em>)<sup><em>m</em></sup>, return min(<em>n</em>, <em>d</em>*2<sup><em>k</em></sup>+<em>m</em>). In the Bringmann paper, this step is implemented in a manner equivalent to the following (this alternative implementation, though, is not crucial for correctness):
<ol>
<li>Run the second subalgorithm above, except return two values, rather than one, in the situations given in the subalgorithm. Call these two values <em>m</em> and <em>mbit</em>.</li>
<li>If <em>m</em> < 0, go to the first substep.</li>
<li>If <em>mbit</em> ≥ 0, set <em>m</em> to <em>m</em> plus 2<sup><em>mbit</em></sup> times either 1 or 0 with equal probability, and subtract 1 from <em>mbit</em>. If that bit is 1 or <em>mbit</em> < 0, go to the next substep; otherwise, repeat this substep.</li>
<li>Return <em>n</em> if <em>d</em>*2<sup><em>k</em></sup> is greater than or equal to <em>m2</em>.</li>
<li>Add a uniform random integer in [0, 2<sup><em>mbit</em>+1</sup>) to <em>m</em>, then return min(<em>n</em>, <em>d</em>*2<sup><em>k</em></sup>+<em>m</em>).</li>
</ol>
</li>
</ol>
<p><a id="Symmetric_Geometric_Distribution"></a></p>
<h4 id="symmetric-geometric-distribution">Symmetric Geometric Distribution</h4>
<p>Samples from the symmetric geometric distribution from (Ghosh et al. 2012)<sup id="fnref:9"><a href="#fn:9" class="footnote" rel="footnote" role="doc-noteref">9</a></sup>, with parameter <em>λ</em> (a real number satisfying 0 < <em>λ</em> ≤ 1), in the form of an input coin with unknown probability of heads of <em>λ</em>.</p>
<ol>
<li>Flip the input coin until it returns 1. Set <em>n</em> to the number of times the coin returned 0 this way.</li>
<li>Run a <strong>Bernoulli factory algorithm for 1/(2−<em>λ</em>)</strong>, using the input coin. If the run returns 1, return <em>n</em>. Otherwise, return −1 − <em>n</em>.</li>
</ol>
<p>This is similar to an algorithm mentioned in an appendix in Li (2021)<sup id="fnref:10"><a href="#fn:10" class="footnote" rel="footnote" role="doc-noteref">10</a></sup>, in which the input coin—</p>
<ul>
<li>has <em>λ</em> = 1−exp(−<em>ε</em>), where <em>ε</em> > 0, and</li>
<li>can be built as follows using another input coin: “Run the <strong>ExpMinus</strong> algorithm with parameter <em>ε</em>, then return 1 minus the result.”</li>
</ul>
<p>The algorithm of Li generates a variate from the <em>discrete Laplace distribution</em> with parameter <em>ε</em>, and Canonne et al. (2020)<sup id="fnref:11"><a href="#fn:11" class="footnote" rel="footnote" role="doc-noteref">11</a></sup> likewise gave an exact algorithm for that distribution where <em>ε</em> = <em>s</em>/<em>t</em> is a rational number, where <em>s</em> > 0 and <em>t</em> > 0 are integers, namely an algorithm equivalent to the following:</p>
<ol>
<li>Generate a uniform random integer <em>u</em> that satisfies 0 ≤ <em>u</em> < <em>t</em>.</li>
<li>Run the <strong>ExpMinus</strong> algorithm with parameter <em>u</em>/<em>t</em>. If it returns 0, go to step 1.</li>
<li>Run the <strong>ExpMinus</strong> algorithm with parameter 1, until a run returns 0, then set <em>n</em> to the number of times the algorithm returned 1 this way.</li>
<li>Set <em>y</em> to floor((<em>u</em>+<em>n</em>*<em>t</em>)/<em>s</em>).</li>
<li>Generate 1 or 0 with equal probability. If 0 was generated this way, return <em>y</em>. Otherwise, if <em>y</em> is 0, go to step 1. Otherwise, return −<em>y</em>.</li>
</ol>
<p><a id="Weighted_Choice_for_Special_Distributions"></a></p>
<h3 id="weighted-choice-for-special-distributions">Weighted Choice for Special Distributions</h3>
<p>The following are algorithms to sample items whose “weights” (which are related to the probability of sampling each item) are given in a special way. They supplement the section “<a href="https://peteroupc.github.io/randomfunc.html#Weighted_Choice"><strong>Weighted Choice</strong></a>” in my article “Randomization and Sampling Methods”.</p>
<p><a id="Weighted_Choice_with_Weights_Written_as_an_Integer_and_Fraction"></a></p>
<h4 id="weighted-choice-with-weights-written-as-an-integer-and-fraction">Weighted Choice with Weights Written as an Integer and Fraction</h4>
<p>Suppose there is a list called <em>weights</em>. This is a list of <em>n</em> weights, with labels starting at 0 and ending at <em>n</em>−1.</p>
<p>Each weight—</p>
<ol>
<li>can store an integer part <em>m</em> and have <em>ν</em> represent a “coin” that implements an algorithm that returns 1 (or outputs heads) with probability exactly equal to the fractional part <em>ν</em> (<em>m</em> ≥ 0, and 0 ≤ <em>ν</em> ≤ 1), or</li>
<li>can store a <a href="https://peteroupc.github.io/exporand.html"><strong>partially-sampled random number</strong></a> (PSRN), with the integer part equal to <em>m</em> and the fractional part equal to <em>ν</em> (<em>m</em> ≥ 0, and 0 ≤ <em>ν</em> ≤ 1), or</li>
<li>can store a rational number <em>x</em>/<em>y</em>, where <em>x</em>≥0 and <em>y</em>>0 are integers, such that <em>m</em> = floor(<em>x</em>/<em>y</em>) and <em>ν</em> = <em>x</em>/<em>y</em>−<em>m</em>.</li>
</ol>
<p>Given this list of weights, the following algorithm chooses an integer in [0, <em>n</em>) with probability proportional to its weight.</p>
<ol>
<li>Create an empty list, then for each weight starting with weight 0, append the weight’s integer part (<em>m</em>) plus 1 to that list. For example, if the weights are PSRNs written as [2.22…,0.001…,1.3…], in that order, the list will be [3, 1, 2], corresponding to integers 0, 1, and 2, in that order. Call the list just created the <em>rounded weights list</em>.</li>
<li>Choose an integer <em>i</em> with probability proportional to the weights in the rounded weights list. This can be done, for example, by taking the result of <strong>WeightedChoice</strong>(<em>list</em>), where <em>list</em> is the rounded weights list and <strong>WeightedChoice</strong> is given in “<a href="https://peteroupc.github.io/randomfunc.html#Weighted_Choice_With_Replacement"><strong>Randomization and Sampling Methods</strong></a>”. Let <em>w</em> be the original weight for integer <em>i</em>, and let <em>rw</em> be the rounded weight for integer <em>i</em> in the rounded weights list.</li>
<li>
<p>Generate <em>j</em>, a uniform random integer that is 0 or greater and less than <em>rw</em>. If <em>j</em> is less than <em>rw</em>−1, return <em>i</em>. Otherwise:</p>
<ul>
<li>If <em>w</em> is written as in case 1, earlier, flip the “coin” represented by <em>ν</em> (the weight’s fractional part). If it returns 1, return <em>i</em>. Otherwise, go to step 2.</li>
<li>If <em>w</em> is written as in case 2, run <strong>SampleGeometricBag</strong> on the PSRN. If the result is 1, return <em>i</em>. Otherwise, go to step 2.</li>
<li>If <em>w</em> is written as in case 3, let <em>r</em> = rem(<em>x</em>, <em>y</em>) = <em>x</em>−floor(<em>x</em>/<em>y</em>)*<em>y</em>, then with probability <em>r</em>/<em>y</em>, return <em>i</em>. (For example, generate <em>z</em>, a uniform random integer satisfying 0≤<em>z</em><<em>y</em>, then if <em>z</em><<em>r</em>, return <em>i</em>.) Otherwise, go to step 2.</li>
</ul>
</li>
</ol>
<p><a id="Distributions_with_nowhere_increasing_or_nowhere_decreasing_weights"></a></p>
<h4 id="distributions-with-nowhere-increasing-or-nowhere-decreasing-weights">Distributions with nowhere increasing or nowhere decreasing weights</h4>
<p>An algorithm for sampling an integer in the interval [<em>a</em>, <em>b</em>) with probability proportional to weights listed in <em>nowhere increasing</em> order (example: [10, 3, 2, 1, 1] when <em>a</em> = 0 and <em>b</em> = 5) can be implemented as follows (Chewi et al. 2022)<sup id="fnref:12"><a href="#fn:12" class="footnote" rel="footnote" role="doc-noteref">12</a></sup>. It has a logarithmic time complexity in terms of setup and sampling.</p>
<ul>
<li>Setup: Let <em>w</em>[<em>i</em>] be the weight for integer <em>i</em> (with <em>i</em> starting at <em>a</em>).
<ol>
<li>(Envelope weights.) Build a list <em>q</em> as follows: The first item is <em>w</em>[<em>a</em>], then set <em>j</em> to 1, then while <em>j</em> < <em>b</em>−<em>a</em>, append <em>w</em>[<em>a</em> + <em>j</em>] and multiply <em>j</em> by 2. The list <em>q</em>’s items should be rational numbers that equal the ideal values, if possible, or overestimate them if not.</li>
<li>(Envelope chunk weights.) Build a list <em>r</em> as follows: The first item is <em>q</em>[0], then set <em>j</em> to 1 and <em>m</em> to 1, then while <em>j</em> < <em>b</em>−<em>a</em>, append <em>q</em>[<em>m</em>]*min((<em>b</em>−<em>a</em>) − <em>j</em>, <em>j</em>) and multiply <em>j</em> by 2 and add 1 to <em>m</em>.</li>
<li>(Start and end points of each chunk.) Build a list <em>D</em> as follows: The first item is the list [<em>a</em>, <em>a</em>+1], then set <em>j</em> to 1, then while <em>j</em> < <em>n</em>, append the list [<em>j</em>, <em>j</em> + min((<em>b</em>−<em>a</em>) − <em>j</em>, <em>j</em>)] and multiply <em>j</em> by 2.</li>
</ol>
</li>
<li>Sampling:
<ol>
<li>Choose an integer in [0, <em>s</em>) with probability proportional to the weights in <em>r</em>, where <em>s</em> is the number of items in <em>r</em>. Call the chosen integer <em>k</em>.</li>
<li>Set <em>x</em> to an integer chosen uniformly at random such that <em>x</em> is greater than or equal to <em>D</em>[<em>k</em>][0] and is less than <em>D</em>[<em>k</em>][1].</li>
<li>With probability <em>w</em>[<em>x</em>] / <em>q</em>[<em>k</em>], return <em>x</em>. Otherwise, go to step 1.</li>
</ol>
</li>
</ul>
<p>For <em>nowhere decreasing</em> rather than nowhere increasing weights, the algorithm is as follows instead:</p>
<ul>
<li>Setup: Let <em>w</em>[<em>i</em>] be the weight for integer <em>i</em> (with <em>i</em> starting at <em>a</em>).
<ol>
<li>(Envelope weights.) Build a list <em>q</em> as follows: The first item is <em>w</em>[<em>b</em>−1], then set <em>j</em> to 1, then while <em>j</em> < (<em>b</em>−<em>a</em>), append <em>w</em>[<em>b</em>−1−<em>j</em>] and multiply <em>j</em> by 2. The list <em>q</em>’s items should be rational numbers that equal the ideal values, if possible, or overestimate them if not.</li>
<li>(Envelope chunk weights.) Build a list <em>r</em> as given in step 2 of the previous algorithm’s setup.</li>
<li>(Start and end points of each chunk.) Build a list <em>D</em> as follows: The first item is the list [<em>b</em>−1, <em>b</em>], then set <em>j</em> to 1, then while <em>j</em> < (<em>b</em>−<em>a</em>), append the list [(<em>b</em>−<em>j</em>) − min((<em>b</em>−<em>a</em>) − <em>j</em>, <em>j</em>), <em>b</em>−<em>j</em>] and multiply <em>j</em> by 2.</li>
</ol>
</li>
<li>The sampling is the same as for the previous algorithm.</li>
</ul>
<blockquote>
<p><strong>Notes:</strong></p>
<ol>
<li>The weights can be base-_β_ logarithms, especially since logarithms preserve order, but in this case the algorithm requires changes. In the setup step 2, replace “<em>q</em>[<em>m</em>]*min((<em>b</em>−<em>a</em>)” with “<em>q</em>[<em>m</em>]+ln(min((<em>b</em>−<em>a</em>))/ln(<em>β</em>)” (which is generally inexact unless <em>β</em> is 2); in sampling step 1, use an algorithm that takes base-_β_ logarithms as weights; and replace sampling step 3 with “Generate an exponential random variate with rate ln(<em>β</em>) (that is, the variate is <em>E</em>/ln(<em>β</em>) where <em>E</em> is an exponential random variate with rate 1). If that variate is greater than <em>q</em>[<em>k</em>] minus <em>w</em>[<em>x</em>], return <em>x</em>. Otherwise, go to step 1.”<br />Applying these modifications to this section’s algorithms can introduce numerical errors unless care is taken (see note 2). The same is true for running the unmodified algorithms with weights that are not rational numbers.</li>
<li>If an algorithm will operate on potentially irrational numbers, then to avoid numerical errors, it should store and operate on real numbers in the form of constructive reals or recursive reals (see, for example, Boehm 1987<sup id="fnref:13"><a href="#fn:13" class="footnote" rel="footnote" role="doc-noteref">13</a></sup>, 2020<sup id="fnref:14"><a href="#fn:14" class="footnote" rel="footnote" role="doc-noteref">14</a></sup>), or in the form of partially-sampled random numbers (PSRNs) together with algorithms with <a href="https://peteroupc.github.io/exporand.html#Properties"><strong>desirable properties for PSRN samplers</strong></a>.</li>
</ol>
</blockquote>
<p><a id="Unimodal_distributions_of_weights"></a></p>
<h4 id="unimodal-distributions-of-weights">Unimodal distributions of weights</h4>
<p>The following is an algorithm for sampling an integer in the interval [<em>a</em>, <em>b</em>) with probability proportional to a <em>unimodal distribution</em> of weights (that is, nowhere decreasing on the left and nowhere increasing on the right) (Chewi et al. 2022)<sup id="fnref:12:1"><a href="#fn:12" class="footnote" rel="footnote" role="doc-noteref">12</a></sup>. It assumes the mode (the point with the highest weight) is known. An example is [1, 3, 9, 4, 4] when <em>a</em> = 0 and <em>b</em> = 5, and the <em>mode</em> is 2, which corresponds to the weight 9. It has a logarithmic time complexity in terms of setup and sampling.</p>
<ul>
<li>Setup:
<ol>
<li>Find the point with the highest weight, such as via binary search. Call this point <em>mode</em>.</li>
<li>Run the setup for <em>nowhere decreasing</em> weights on the interval [<em>a</em>, <em>mode</em>), then run the setup for <em>nowhere increasing</em> weights on the interval [<em>mode</em>, <em>b</em>). Both setups are described in the previous section. Then, concatenate the two <em>q</em> lists into one, the two <em>r</em> lists into one, and the two <em>D</em> lists into one.</li>
</ol>
</li>
<li>The sampling is the same as for the algorithms in the previous section.</li>
</ul>
<p><a id="Weighted_Choice_with_Log_Probabilities"></a></p>
<h4 id="weighted-choice-with-log-probabilities">Weighted Choice with Log Probabilities</h4>
<p>Huijben et al. (2022)<sup id="fnref:15"><a href="#fn:15" class="footnote" rel="footnote" role="doc-noteref">15</a></sup> reviews the Gumbel max trick and Gumbel softmax distributions.</p>
<blockquote>
<p><strong>Note</strong>: Because these algorithms involve adding one real number to another and calculating <code>exp</code> of a real number, they can introduce numerical errors unless care is taken (see note 2 in “Distributions with nowhere increasing or nowhere decreasing weights”, earlier).</p>
</blockquote>
<p><strong>Weighted choice with the Gumbel max trick.</strong> Let <em>C</em>>0 be an unknown number. Then, given—</p>
<ul>
<li>a vector of the form [<em>p</em><sub>0</sub>, <em>p</em><sub>1</sub>, …, <em>p</em><sub><em>n</em></sub>], where <em>p</em><sub><em>i</em></sub> is a so-called “unnormalized log probability” of the form ln(<em>x</em>)+<em>C</em> (where <em>C</em> is a constant and <em>x</em> is the probability of getting <em>i</em>),</li>
</ul>
<p>an integer in the closed interval [0, <em>n</em>] can be sampled as follows:</p>
<ol>
<li>(“Gumbel”.) For each <em>p</em><sub><em>i</em></sub>, generate a “Gumbel variate” <em>G</em>, then set <em>q</em><sub><em>i</em></sub> to <em>p</em><sub><em>i</em></sub>+<em>G</em>. (A so-called “Gumbel variate” is distributed as −ln(−ln(<em>U</em>)), where <em>U</em> is a uniform random variate greater than 0 and less than 1.<sup id="fnref:16"><a href="#fn:16" class="footnote" rel="footnote" role="doc-noteref">16</a></sup>)</li>
<li>(“Max”.) Return the integer <em>i</em> corresponding to the highest <em>q</em><sub><em>i</em></sub> value.</li>
</ol>
<blockquote>
<p><strong>Note:</strong> “Gumbel top <em>k</em> sampling” samples <em>k</em> items according to their “unnormalized log probabilities” (see Fig. 7 of Huijben et al. (2022)<sup id="fnref:15:1"><a href="#fn:15" class="footnote" rel="footnote" role="doc-noteref">15</a></sup>); this sampling works by doing step 1, then choosing the <em>k</em> integers corresponding to the <em>k</em> highest <em>q</em><sub><em>i</em></sub> values. With this sampling, though, the probability of getting <em>i</em> (if the plain Gumbel max trick were used) is not necessarily the probability that <em>i</em> is included in the <em>k</em>-item sample (Tillé 2023)<sup id="fnref:17"><a href="#fn:17" class="footnote" rel="footnote" role="doc-noteref">17</a></sup>.</p>
</blockquote>
<p><strong>Weighted choice with the Gumbel softmax trick.</strong> Given a vector described earlier as well as a “temperature” parameter <em>λ</em> > 0, a “continuous relaxation” or “concrete distribution” (which transforms the vector to a new one) can be sampled as follows:</p>
<ol>
<li>(“Gumbel”.) For each <em>p</em><sub><em>i</em></sub>, generate a “Gumbel variate” <em>G</em>, then set <em>q</em><sub><em>i</em></sub> to <em>p</em><sub><em>i</em></sub>+<em>G</em>.</li>
<li>(“Softmax”.) For each <em>q</em><sub><em>i</em></sub>, set it to exp(<em>q</em><sub><em>i</em></sub>/<em>λ</em>).</li>
<li>Set <em>d</em> to the sum of all values of <em>q</em><sub><em>i</em></sub>.</li>
<li>For each <em>q</em><sub><em>i</em></sub>, divide it by <em>d</em>.</li>
</ol>
<p>The algorithm’s result is a vector <em>q</em>, which can be used only once to sample <em>i</em> with probability proportional to <em>q</em><sub><em>i</em></sub> (which is not a “log probability”). (In this case, steps 3 and 4 above can be omitted if that sampling method can work with weights that need not sum to 1.)</p>
<p><a id="Bernoulli_Distribution_for_Cumulative_Distribution_Functions"></a></p>
<h3 id="bernoulli-distribution-for-cumulative-distribution-functions">Bernoulli Distribution for Cumulative Distribution Functions</h3>
<p>Suppose a real number <em>z</em> is given (which might be a partially-sampled random number [PSRN] or a rational number). If a probability distribution—</p>
<ul>
<li>has a probability density function (PDF) (as with the normal or exponential distribution), and</li>
<li>has an arbitrary-precision sampler that returns a PSRN <em>X</em>,</li>
</ul>
<p>then it’s possible to generate 1 with the same probability as the sampler returns an <em>X</em> that is less than or equal to <em>z</em>, as follows:</p>
<ol>
<li>Run the arbitrary-precision sampler to generate <em>X</em>, a uniform PSRN.</li>
<li>Run <strong>RandLess</strong> (if <em>z</em> is a PSRN) or <strong>RandLessThanReal</strong> (if <em>z</em> is a real number) with parameters <em>X</em> and <em>z</em>, in that order, and return the result.</li>
</ol>
<p>Specifically, the probability of returning 1 is the <em>cumulative distribution function</em> (CDF) for the distribution of <em>X</em>.</p>
<blockquote>
<p><strong>Notes:</strong></p>
<ol>
<li>Although step 2 of the algorithm checks whether <em>X</em> is merely less than <em>z</em>, this is still correct; because the distribution of <em>X</em> has a PDF, <em>X</em> is less than <em>z</em> with the same probability as <em>X</em> is less than or equal to <em>z</em>.</li>
<li>All probability distributions have a CDF, not just those with a PDF, but also discrete ones such as Poisson or binomial.</li>
</ol>
</blockquote>
<p><a id="Bit_Vectors_with_Random_Bit_Flips"></a></p>
<h3 id="bit-vectors-with-random-bit-flips">Bit Vectors with Random Bit Flips</h3>
<p>Chakraborty and Vardeman (2021)<sup id="fnref:18"><a href="#fn:18" class="footnote" rel="footnote" role="doc-noteref">18</a></sup> describes distributions of bit vectors with a random number of bit flips. Given three parameters — <em>μ</em> is a <em>p</em>-item vector (list) with only zeros and ones in any combination; <em>p</em> is the size of <em>μ</em>; and <em>α</em> is a spread parameter greater than 0 and less than 1 — do the following to generate such a vector:</p>
<ol>
<li>Generate a random integer <em>c</em> in the interval [0, <em>p</em>] in some way. (<em>c</em> need not be uniformly distributed. This is the number of bit flips.)</li>
<li>Create a <em>p</em>-item list <em>ν</em>, where the first <em>c</em> items are ones and the rest are zeros. <a href="https://peteroupc.github.io/randomfunc.html#Shuffling"><strong>Shuffle</strong></a> the list.</li>
<li>Create a copy of <em>μ</em>, call it <em>M</em>. Then for each <em>i</em> where <em>ν</em>[<em>i</em>] = 1, set <em>M</em>[<em>i</em>] to 1 − <em>M</em>[<em>i</em>]. Then return <em>M</em>.</li>
</ol>
<p>The paper describes two ways to establish the weights for <em>c</em> in step 1 (there are others as well):</p>
<ul>
<li>Generate <em>c</em> with probability proportional to the following weights: [<em>α</em><sup>0</sup>, <em>α</em><sup>1</sup>, …, <em>α</em><sup><em>p</em></sup>]. (Since each weight is 1 or less, this can be implemented as follows, for example. Generate a uniform random integer in [0, <em>p</em>], call it <em>d</em>, then flip a coin that shows heads with probability <em>α</em>, <em>d</em> times, then either return <em>d</em> if <em>d</em> is 0 or all the flips are heads, or repeat this process otherwise.)</li>
<li>Generate <em>c</em> with probability proportional to the following weights: [<em>α</em><sup>0</sup>*choose(<em>p</em>,0), <em>α</em><sup>1</sup>*choose(<em>p</em>,1), …, <em>α</em><sup><em>p</em></sup>*choose(<em>p</em>,<em>p</em>)]. (Since the sum of weights is no more than $2^p$, each weight can be divided by $2^p$ to get weights that are 1 or less, so that this can be implemented as follows, for example. Generate a uniform random integer in [0, <em>p</em>], call it <em>d</em>, then flip a coin that shows heads with probability <em>α</em>, <em>d</em> times, and a coin that shows heads with probability choose(<em>p</em>, <em>d</em>)/2<sup><em>p</em></sup> once, then either return <em>d</em> if all the flips are heads, or repeat this process otherwise. Note that the probability choose(<em>p</em>, <em>d</em>)/2<sup><em>p</em></sup> is simple to simulate for being a rational number.)</li>
</ul>
<p><a id="Log_Uniform_Distribution"></a></p>
<h3 id="log-uniform-distribution">Log-Uniform Distribution</h3>
<p>Samples from the so-called “log uniform distribution” as used by the Abseil programming library. This algorithm takes a maximum <em>mx</em> and a logarithmic base <em>b</em>, and chooses an integer in [0, <em>mx</em>] such that two values are chosen with the same probability if their base-_b_ logarithms are equal in their integer parts (which roughly means that lower numbers occur with an exponentially greater probability). Although this algorithm works, in principle, for every <em>b</em> > 0, Abseil supports only integer bases <em>b</em>.</p>
<ol>
<li>Let <em>L</em> be ceil(ln(<em>mx</em>+1)/ln(<em>b</em>)). Choose a uniform random integer in the closed interval [0, <em>L</em>], call it <em>u</em>.</li>
<li>If <em>u</em> is 0, return 0.</li>
<li>Set <em>st</em> to min(<em>mx</em>, ceil(<em>b</em><sup><em>u</em>−1</sup>)).</li>
<li>Set <em>en</em> to min(<em>mx</em>, ceil(<em>b</em><sup><em>u</em></sup>) − 1).</li>
<li>Choose a uniform random integer in the closed interval [<em>st</em>, <em>en</em>], and return it.</li>
</ol>
<p><a id="Sampling_Unbounded_Monotone_Density_Functions"></a></p>
<h2 id="sampling-unbounded-monotone-density-functions">Sampling Unbounded Monotone Density Functions</h2>
<p>This section shows a preprocessing algorithm to generate a random variate in the closed interval [0, 1] from a distribution whose probability density function (PDF)—</p>
<ul>
<li>is continuous in the interval [0, 1],</li>
<li>is strictly decreasing in [0, 1], and</li>
<li>has an unbounded peak at 0.</li>
</ul>
<p>The trick here is to sample the peak in such a way that the result is either forced to be 0 or forced to belong to the bounded part of the PDF. This algorithm does not require the area under the curve of the PDF in [0, 1] to be 1; in other words, this algorithm works even if the PDF is known up to a normalizing constant. The algorithm is as follows.</p>
<ol>
<li>Set <em>i</em> to 1.</li>
<li>Calculate the cumulative probability of the interval [0, 2<sup>−<em>i</em></sup>] and that of [0, 2<sup>−(<em>i</em> − 1)</sup>], call them <em>p</em> and <em>t</em>, respectively.</li>
<li>With probability <em>p</em>/<em>t</em>, add 1 to <em>i</em> and go to step 2. (Alternatively, if <em>i</em> is equal to or higher than the desired number of fractional bits in the result, return 0 instead of adding 1 and going to step 2.)</li>
<li>At this point, the PDF at [2<sup>−<em>i</em></sup>, 2<sup>−(<em>i</em> − 1)</sup>) is less than or equal to a finite number, so sample a random variate in this interval using any appropriate algorithm, including rejection sampling. Because the PDF is strictly decreasing, the peak of the PDF at this interval is located at 2<sup>−<em>i</em></sup>, so that rejection sampling becomes trivial.</li>
</ol>
<p>It is relatively straightforward to adapt this algorithm for strictly increasing PDFs with the unbounded peak at 1, or to PDFs with a different domain than [0, 1].</p>
<p>This algorithm is similar to the “inversion–rejection” algorithm mentioned in section 4.4 of chapter 7 of Devroye’s <em>Non-Uniform Random Variate Generation</em> (1986)<sup id="fnref:4:1"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>. I was unaware of that algorithm at the time I started writing the text that became this section (Jul. 25, 2020). The difference here is that it assumes the whole distribution has support [0, 1] (“support” is defined later), while the algorithm presented in this article doesn’t make that assumption (for example, the interval [0, 1] can cover only part of the distribution’s support).</p>
<p>By the way, this algorithm arose while trying to devise an algorithm that can generate an integer power of a uniform random variate, with arbitrary precision, without actually calculating that power (a naïve calculation that is merely an approximation and usually introduces bias); for more information, see the article on <a href="https://peteroupc.github.io/exporand.html"><strong>partially-sampled random numbers</strong></a>. Even so, the algorithm I have come up with in this note may be of independent interest.</p>
<p>In the case of powers of a uniform random variate between 0 and 1, call the variate <em>X</em>, namely <em>X</em><sup><em>n</em></sup>, the ratio <em>p</em>/<em>t</em> in this algorithm has a very simple form, namely (1/2)<sup>1/<em>n</em></sup>. Note that this formula is the same regardless of <em>i</em>. (To return 1 with probability (1/2)<sup>1/<em>n</em></sup>, the algorithm for <strong>(<em>a</em>/<em>b</em>)<sup><em>z</em></sup></strong> in “<a href="https://peteroupc.github.io/bernoulli.html"><strong>Bernoulli Factory Algorithms</strong></a>” can be used with <em>a</em>=1, <em>b</em>=2, and <em>z</em>=1/<em>n</em>.) This is found by taking the PDF <em>f</em>(<em>x</em>) = <em>x</em><sup>1/<em>n</em></sup>/(<em>x</em> * <em>n</em>)</sup> and finding the appropriate <em>p</em>/<em>t</em> ratios by integrating <em>f</em> over the two intervals mentioned in step 2 of the algorithm.</p>
<p><a id="Certain_Families_of_Distributions"></a></p>
<h2 id="certain-families-of-distributions">Certain Families of Distributions</h2>
<p>This section is a note on certain families of univariate (one-variable) probability distributions, with emphasis on generating random variates from them. Some of these families are described in Ahmad et al. (2019)<sup id="fnref:19"><a href="#fn:19" class="footnote" rel="footnote" role="doc-noteref">19</a></sup>, Jones (2015)<sup id="fnref:20"><a href="#fn:20" class="footnote" rel="footnote" role="doc-noteref">20</a></sup>.</p>
<p>The following mathematical definitions are used:</p>
<ul>
<li>A probability distribution’s <em>quantile function</em> (also known as <em>inverse cumulative distribution function</em> or <em>inverse CDF</em>) is a nowhere decreasing function that maps uniform random variates greater than 0 and less than 1 to numbers that follow the distribution.</li>
<li>A probability distribution’s <em>support</em> is the set of values the distribution can take on, plus that set’s endpoints. For example, the beta distribution’s support is the closed interval [0, 1], and the normal distribution’s support is the entire real line.</li>
<li>The <em>zero-truncated Poisson</em> distribution: To generate a random variate that follows this distribution (with parameter <em>λ</em> > 0), generate random variates from the <a href="https://peteroupc.github.io/randomfunc.html#Poisson_Distribution"><strong>Poisson distribution</strong></a> with parameter <em>λ</em> until a variate other than 0 is generated this way, then take the last generated variate.</li>
</ul>
<p><strong>G families.</strong> In general, families of the form “X-G” (such as “beta-G” (Eugene et al., 2002)<sup id="fnref:21"><a href="#fn:21" class="footnote" rel="footnote" role="doc-noteref">21</a></sup>) use two distributions, X and G, where—</p>
<ul>
<li>X is a probability distribution whose support is the closed interval [0, 1], and</li>
<li>G is a probability distribution that should have an easy-to-compute quantile function.</li>
</ul>
<p>The following algorithm samples a random variate following a distribution from this kind of family:</p>
<ol>
<li>Generate a random variate that follows the distribution X. (Or generate a uniform <a href="https://peteroupc.github.io/exporand.html"><strong>partially-sampled random number (PSRN)</strong></a> that follows the distribution X.) Call the number <em>x</em>.</li>
<li>Calculate the quantile for G of <em>x</em>, and return that quantile. (If <em>x</em> is a uniform PSRN, see “Random Variate Generation via Quantiles”, later.)</li>
</ol>
<p>Certain special cases of the “X-G” families, such as the following, use a specially designed distribution for X:</p>
<ul>
<li>The <em>exp-G</em> family (Barreto-Souza and Simas 2010/2013)<sup id="fnref:22"><a href="#fn:22" class="footnote" rel="footnote" role="doc-noteref">22</a></sup>, where X is an exponential distribution, truncated to the interval [0, 1], with parameter <em>λ</em> ≥ 0; step 1 is modified to read: “Generate <em>U</em>, a uniform random variate between 0 and 1, then set <em>x</em> to −ln((exp(−<em>λ</em>)−1)*<em>U</em> + 1)/<em>λ</em> if <em>λ</em> != 0, and <em>U</em> otherwise.” (The <em>alpha power</em> or <em>alpha power transformed</em> family (Mahdavi and Kundu 2017)<sup id="fnref:23"><a href="#fn:23" class="footnote" rel="footnote" role="doc-noteref">23</a></sup> uses the same distribution for X, but with <em>λ</em>=−ln(<em>α</em>) where <em>α</em> is in (0, 1]; see also Jones (2018)<sup id="fnref:24"><a href="#fn:24" class="footnote" rel="footnote" role="doc-noteref">24</a></sup>.)</li>
<li>One family uses a shape parameter <em>a</em> > 0; step 1 is modified to read: “Generate <em>u</em>, a uniform random variate between 0 and 1, then set <em>x</em> to <em>u</em><sup>1/<em>a</em></sup>.” This family is mentioned in Lehmann (1953)<sup id="fnref:25"><a href="#fn:25" class="footnote" rel="footnote" role="doc-noteref">25</a></sup>, Durrans (1992)<sup id="fnref:26"><a href="#fn:26" class="footnote" rel="footnote" role="doc-noteref">26</a></sup>, and Mudholkar and Srivastava (1993)<sup id="fnref:27"><a href="#fn:27" class="footnote" rel="footnote" role="doc-noteref">27</a></sup>, which called it <em>exponentiated</em>.</li>
<li>The <em>transmuted-G</em> family (Shaw and Buckley 2007)<sup id="fnref:28"><a href="#fn:28" class="footnote" rel="footnote" role="doc-noteref">28</a></sup>. The family uses a shape parameter <em>η</em> satisfying −1 ≤ <em>η</em> ≤ 1; step 1 is modified to read: “Generate a piecewise linear random variate between 0 and 1 with weight 1−<em>η</em> at 0 and weight 1+<em>η</em> at 1, call the number <em>x</em>. (It can be generated as follows, see also (Devroye 1986, p. 71-72)<sup id="fnref:4:2"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>: With probability min(1−<em>η</em>, 1+<em>η</em>), generate <em>x</em>, a uniform random variate between 0 and 1. Otherwise, generate two uniform random variates between 0 and 1, set <em>x</em> to the higher of the two, then if <em>η</em> is less than 0, set <em>x</em> to 1−<em>x</em>.)”. ((Granzotto et al. 2017)<sup id="fnref:29"><a href="#fn:29" class="footnote" rel="footnote" role="doc-noteref">29</a></sup> mentions the same distribution, but with a parameter <em>λ</em> = <em>η</em> + 1 satisfying 0 ≤ <em>λ</em> ≤ 2.)</li>
<li>A <em>cubic rank transmuted</em> distribution (Granzotto et al. 2017)<sup id="fnref:29:1"><a href="#fn:29" class="footnote" rel="footnote" role="doc-noteref">29</a></sup> uses parameters <em>λ</em><sub>0</sub> and <em>λ</em><sub>1</sub> in the interval [0, 1]; step 1 is modified to read: “Generate three uniform random variates between 0 and 1, then sort them in ascending order. Then, choose 1, 2, or 3 with probability proportional to these weights: [<em>λ</em><sub>0</sub>, <em>λ</em><sub>1</sub>, 3−<em>λ</em><sub>0</sub>−<em>λ</em><sub>1</sub>]. Then set <em>x</em> to the first, second, or third variate if 1, 2, or 3 is chosen this way, respectively.”</li>
<li>Biweight distribution (Al-Khazaleh and Alzoubi 2021)<sup id="fnref:52"><a href="#fn:52" class="footnote" rel="footnote" role="doc-noteref">30</a></sup>: Step 1 is modified to read: “Generate a uniform random variate <em>x</em> in [0, 1], then with probability (1−<em>x</em><sup>2</sup>)<sup>2</sup>, go to the next step. Otherwise, repeat this process.”; or “Create a uniform PSRN <em>x</em> with positive sign and integer part 0, then run <strong>SampleGeometricBag</strong> on that PSRN four times. If the first two results are not both 1 and if the last two results are not both 1, go to the next step; otherwise, repeat this process.”</li>
</ul>
<p><strong>Transformed–transformer family.</strong> In fact, the “X-G” families are a special case of the so-called “transformed–transformer” family of distributions introduced by Alzaatreh et al. (2013)<sup id="fnref:30"><a href="#fn:30" class="footnote" rel="footnote" role="doc-noteref">31</a></sup> that uses two distributions, X and G, where X (the “transformed”) is an arbitrary distribution with a probability density function; G (the “transformer”) is a distribution with an easy-to-compute quantile function; and <em>W</em> is a nowhere decreasing function that, among other conditions, maps a number in the closed interval [0, 1] to a number with the same support as X. The following algorithm samples a random variate from this kind of family:</p>
<ol>
<li>Generate a random variate that follows the distribution X. (Or generate a uniform PSRN that follows X.) Call the number <em>x</em>.</li>
<li>Calculate <em>w</em> = <em>W</em><sup>−1</sup>(<em>x</em>) (where <em>W</em><sup>−1</sup>(.) is the inverse of <em>W</em>), then calculate the quantile for G of <em>w</em> and return that quantile. (If <em>x</em> is a uniform PSRN, see “Random Variate Generation via Quantiles”, later.)</li>
</ol>
<p>The following are special cases of the “transformed–transformer” family:</p>
<ul>
<li>The “T-R{<em>Y</em>}” family (Aljarrah et al., 2014)<sup id="fnref:31"><a href="#fn:31" class="footnote" rel="footnote" role="doc-noteref">32</a></sup>, in which <em>T</em> is an arbitrary distribution with a PDF (X in the algorithm above), <em>R</em> is a distribution with an easy-to-compute quantile function (G in the algorithm above), and <em>W</em> is the quantile function for the distribution <em>Y</em>, whose support must contain the support of <em>T</em> (so that <em>W</em><sup>−1</sup>(<em>x</em>) is the cumulative distribution function for <em>Y</em>, or the probability that a <em>Y</em>-distributed number is <em>x</em> or less).</li>
<li>Several versions of <em>W</em> have been proposed for the case when distribution X’s support is [0, ∞), such as the Rayleigh and gamma distributions. They include:
<ul>
<li><em>W</em>(<em>x</em>) = −ln(1−<em>x</em>) (<em>W</em><sup>−1</sup>(<em>x</em>) = 1−exp(−<em>x</em>)). Suggested in the original paper by Alzaatreh et al.</li>
<li><em>W</em>(<em>x</em>) = <em>x</em>/(1−<em>x</em>) (<em>W</em><sup>−1</sup>(<em>x</em>) = <em>x</em>/(1+<em>x</em>)). Suggested in the original paper by Alzaatreh et al. This choice forms the so-called “odd X G” family, and one example is the “odd log-logistic G” family (Gleaton and Lynch 2006)<sup id="fnref:32"><a href="#fn:32" class="footnote" rel="footnote" role="doc-noteref">33</a></sup>.</li>
</ul>
</li>
</ul>
<blockquote>
<p><strong>Example:</strong> For the “generalized odd gamma-G” family (Hosseini et al. 2018)<sup id="fnref:33"><a href="#fn:33" class="footnote" rel="footnote" role="doc-noteref">34</a></sup>, X is the gamma(<em>α</em>) distribution, <em>W</em><sup>−1</sup>(<em>x</em>) = (<em>x</em>/(1+<em>x</em>))<sup>1/<em>β</em></sup>, G is arbitrary, <em>α</em>>0, and <em>β</em>>0.</p>
</blockquote>
<p>A family very similar to the “transformed–transformer” family uses a <em>decreasing</em> <em>W</em>.</p>
<ul>
<li>When distribution X’s support is [0, ∞), one such <em>W</em> that has been proposed is <em>W</em>(<em>x</em>) = −ln(<em>x</em>) (<em>W</em><sup>−1</sup>(<em>x</em>) = exp(−<em>x</em>); examples include the “Rayleigh-G” family or “Rayleigh–Rayleigh” distribution (Al Noor and Assi 2020)<sup id="fnref:34"><a href="#fn:34" class="footnote" rel="footnote" role="doc-noteref">35</a></sup>, as well as the “generalized gamma-G” family, where “generalized gamma” refers to the Stacy distribution (Boshi et al. 2020)<sup id="fnref:35"><a href="#fn:35" class="footnote" rel="footnote" role="doc-noteref">36</a></sup>).</li>
</ul>
<p><strong>Minimums, maximums, and sums.</strong> Some distributions are described as a minimum, maximum, or sum of <em>N</em> independent random variates distributed as <em>X</em>, where <em>N</em> ≥ 1 is an independent integer distributed as the discrete distribution <em>Y</em>.</p>
<ul>
<li>Tahir and Cordeiro (2016)<sup id="fnref:36"><a href="#fn:36" class="footnote" rel="footnote" role="doc-noteref">37</a></sup> calls a distribution of minimums a <em>compound distribution</em>, and a distribution of maximums a <em>complementary compound distribution</em>.</li>
<li>Pérez-Casany et al. (2016)<sup id="fnref:37"><a href="#fn:37" class="footnote" rel="footnote" role="doc-noteref">38</a></sup> calls a distribution of minimums or of maximums a <em>random-stopped extreme distribution</em>.</li>
<li>Let <em>S</em> be a sum of <em>N</em> variates as described earlier. Then Amponsah et al. (2021)<sup id="fnref:38"><a href="#fn:38" class="footnote" rel="footnote" role="doc-noteref">39</a></sup> describe the distribution of (<em>S</em>, <em>N</em>), a two-variable random variate often called an <em>episode</em>.</li>
<li>A distribution of sums can be called a <em>stopped-sum distribution</em> (Johnson et al. 2005)<sup id="fnref:39"><a href="#fn:39" class="footnote" rel="footnote" role="doc-noteref">40</a></sup>. (In this case, <em>N</em> can be 0 so that $N$ is zero or a positive integer distributed as <em>Y</em>.)</li>
</ul>
<p>A variate following a distribution of minimums or of maximums can be generated as follows (Duarte-López et al. 2021)<sup id="fnref:40"><a href="#fn:40" class="footnote" rel="footnote" role="doc-noteref">41</a></sup>:</p>
<ol>
<li>Generate a uniform random variate between 0 and 1. (Or generate a uniform PSRN with integer part 0, positive sign, and empty fractional part.) Call the number <em>x</em>.</li>
<li>For minimums, calculate the quantile for <em>X</em> of 1−<em>W</em><sup>−1</sup>(<em>x</em>) (where <em>W</em><sup>−1</sup>(.) is the inverse of <em>Y</em>’s probability generating function), and return that quantile.<sup id="fnref:41"><a href="#fn:41" class="footnote" rel="footnote" role="doc-noteref">42</a></sup> (If <em>x</em> is a uniform PSRN, see “Random Variate Generation via Quantiles”, later. <em>Y</em>’s probability generating function is <em>W</em>(<em>z</em>) = <em>a</em>[0]*<em>z</em><sup>0</sup> + <em>a</em>[1]*<em>z</em><sup>1</sup> + …, where 0 < <em>z</em> < 1 and <em>a</em>[<em>i</em>] is the probability that a <em>Y</em>-distributed variate equals <em>i</em>. See example below.)</li>
<li>For maximums, calculate the quantile for <em>X</em> of <em>W</em><sup>−1</sup>(<em>x</em>), and return that quantile.</li>
</ol>
<blockquote>
<p><strong>Examples:</strong></p>
<table>
<thead>
<tr>
<th>This distribution:</th>
<th>Is a distribution of:</th>
<th>Where <em>X</em> is:</th>
<th>And <em>Y</em> is:</th>
</tr>
</thead>
<tbody>
<tr>
<td>Geometric zero-truncated Poisson (Akdoğan et al., 2020)<sup id="fnref:42"><a href="#fn:42" class="footnote" rel="footnote" role="doc-noteref">43</a></sup>.</td>
<td>Maximums.</td>
<td>1 plus the number of failures before the first success, with each success having the same probability.</td>
<td>Zero-truncated Poisson.</td>
</tr>
<tr>
<td>GMDP(<em>α</em>, <em>β</em>, <em>δ</em>, <em>p</em>) (Amponsah et al. 2021)<sup id="fnref:38:1"><a href="#fn:38" class="footnote" rel="footnote" role="doc-noteref">39</a></sup> (<em>α</em>>0, <em>β</em>>0, <em>δ</em>>0, 0<<em>p</em><1).</td>
<td>(<em>S</em>, <em>N</em>) episodes.</td>
<td>Gamma(<em>α</em>) variate divided by <em>β</em>.</td>
<td>Discrete Pareto(<em>δ</em>, <em>p</em>) (see “Certain Distributions”).</td>
</tr>
<tr>
<td>Bivariate gamma geometric(<em>α</em>, <em>β</em>, <em>p</em>) (Barreto-Souza 2012)<sup id="fnref:43"><a href="#fn:43" class="footnote" rel="footnote" role="doc-noteref">44</a></sup> (<em>α</em>>0, <em>β</em>>0, 0<<em>p</em><1).</td>
<td>(<em>S</em>, <em>N</em>) episodes.</td>
<td>Gamma(<em>α</em>) variate divided by <em>β</em>.</td>
<td>1 plus the number of failures before the first success, with each success having probability <em>p</em>.</td>
</tr>
<tr>
<td>Exponential Poisson (Kuş 2007)<sup id="fnref:44"><a href="#fn:44" class="footnote" rel="footnote" role="doc-noteref">45</a></sup>.</td>
<td>Minimums.</td>
<td>Exponential.</td>
<td>Zero-truncated Poisson.</td>
</tr>
<tr>
<td>Poisson exponential (Cancho et al. 2011)<sup id="fnref:45"><a href="#fn:45" class="footnote" rel="footnote" role="doc-noteref">46</a></sup>.</td>
<td>Maximums.</td>
<td>Exponential.</td>
<td>Zero-truncated Poisson.</td>
</tr>
<tr>
<td>Right-truncated Weibull(<em>a</em>, <em>b</em>, <em>c</em>) (Jodrá 2020)<sup id="fnref:46"><a href="#fn:46" class="footnote" rel="footnote" role="doc-noteref">47</a></sup> (<em>a</em>, <em>b</em>, and <em>c</em> are greater than 0).</td>
<td>Minimums.</td>
<td>Power function(<em>b</em>, <em>c</em>).</td>
<td>Zero-truncated Poisson(<em>a</em>*<em>c</em><sup><em>b</em></sup>).</td>
</tr>
</tbody>
</table>
<p><strong>Example:</strong> If <em>Y</em> is zero-truncated Poisson with parameter <em>λ</em>, its probability generating function is $W(z)=\frac{1-\exp(z\lambda)}{1-\exp(\lambda)}$, and solving for <em>x</em> leads to its inverse: $W^{-1}(x)=\ln(1-x+x\times\exp(\lambda))/\lambda$.</p>
<p><strong>Note:</strong> Bivariate exponential geometric (Barreto-Souza 2012)<sup id="fnref:43:1"><a href="#fn:43" class="footnote" rel="footnote" role="doc-noteref">44</a></sup> is a special case of bivariate gamma geometric with <em>α</em>=1.</p>
</blockquote>
<p><strong>Inverse distributions.</strong> An <em>inverse X distribution</em> (or <em>inverted X distribution</em>) is generally the distribution of 1 divided by a random variate distributed as <em>X</em>. For example, an <em>inverse exponential</em> random variate (Keller and Kamath 1982)<sup id="fnref:47"><a href="#fn:47" class="footnote" rel="footnote" role="doc-noteref">48</a></sup> is 1 divided by an exponential random variate with rate 1 (and so is distributed as −1/ln(<em>U</em>) where <em>U</em> is a uniform random variate between 0 and 1) and may be multiplied by a parameter <em>θ</em> > 0.</p>
<p><strong>Weighted distributions.</strong> A <em>weighted X distribution</em> uses a distribution X and a weight function <em>w</em>(<em>x</em>) whose values lie in [0, 1] everywhere in X’s support. The following algorithm samples from a weighted distribution (see also (Devroye 1986, p. 47)<sup id="fnref:4:3"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>):</p>
<ol>
<li>Generate a random variate that follows the distribution X. (Or generate a uniform PSRN that follows X.) Call the number <em>x</em>.</li>
<li>With probability <em>w</em>(<em>x</em>), return <em>x</em>. Otherwise, go to step 1.</li>
</ol>
<p>Some weighted distributions allow any weight function <em>w</em>(<em>x</em>) whose values are nonnegative everywhere in X’s support (Rao 1985)<sup id="fnref:48"><a href="#fn:48" class="footnote" rel="footnote" role="doc-noteref">49</a></sup>. (If <em>w</em>(<em>x</em>) = <em>x</em>, the distribution is often called a <em>length-biased</em> or <em>size-biased distribution</em>; if <em>w</em>(<em>x</em>) = <em>x</em><sup>2</sup>, <em>area-biased</em>.) Their probability density functions (PDFs) are proportional to the original PDFs multiplied by <em>w</em>(<em>x</em>).</p>
<p><strong>Inflated distributions.</strong> To generate an <em>inflated X</em> (also called <em>c-inflated X</em> or <em>c-adjusted X</em>) random variate with parameters <em>c</em> and <em>α</em>, generate—</p>
<ul>
<li><em>c</em> with probability <em>α</em>, and</li>
<li>a random variate distributed as X otherwise.</li>
</ul>
<p>For example, a <em>zero-inflated beta</em> random variate is 0 with probability <em>α</em> and a beta random variate otherwise (the parameter <em>c</em> is 0) (Ospina and Ferrari 2010)<sup id="fnref:49"><a href="#fn:49" class="footnote" rel="footnote" role="doc-noteref">50</a></sup> A zero-and-one inflated X distribution is 1 or 0 with probability <em>α</em> and distributed as X otherwise. For example, to generate a <em>zero-and-one-inflated unit Lindley</em> random variate (with parameters <em>α</em>, <em>θ</em>, and <em>p</em>) (Chakraborty and Bhattacharjee 2021)<sup id="fnref:50"><a href="#fn:50" class="footnote" rel="footnote" role="doc-noteref">51</a></sup>:</p>
<ol>
<li>With probability <em>α</em>, return a number that is 0 with probability <em>p</em> and 1 otherwise.</li>
<li>Generate a unit Lindley(<em>θ</em>) random variate, that is, generate <em>x</em>/(1+<em>x</em>) where <em>x</em> is a <a href="https://peteroupc.github.io/exporand.html#Lindley_Distribution_and_Lindley_Like_Mixtures"><strong>Lindley(<em>θ</em>) random variate</strong></a>.</li>
</ol>
<blockquote>
<p><strong>Note:</strong> A zero-inflated X distribution where X takes on 0 with probability 0 is also called a <em>hurdle distribution</em> (Mullahy 1986)<sup id="fnref:51"><a href="#fn:51" class="footnote" rel="footnote" role="doc-noteref">52</a></sup>.</p>
</blockquote>
<p><strong>Unit distributions.</strong> To generate a <em>unit X</em> random variate (where X is a distribution whose support is the positive real line), generate a random variate distributed as X, call it <em>x</em>, then return exp(−<em>x</em>) or 1 −exp(−<em>x</em>) (also known as “Type I” or “Type II”, respectively). For example, a unit gamma distribution is also known as the <em>Grassia distribution</em> (Grassia 1977)<sup id="fnref:52:1"><a href="#fn:52" class="footnote" rel="footnote" role="doc-noteref">30</a></sup>.</p>
<p><strong>CDF–quantile family.</strong> Given two distributions X and Y (which can be the same), a location parameter <em>μ</em> ≥ 0, and a dispersion parameter <em>σ</em>>0, a variate from this family of distributions can be generated as follows (Smithson and Shou 2019)<sup id="fnref:42:1"><a href="#fn:42" class="footnote" rel="footnote" role="doc-noteref">43</a></sup>:</p>
<ol>
<li>Generate a random variate that follows the distribution X. (Or generate a uniform PSRN that follows X.) Call the number <em>x</em>.</li>
<li>If distribution X’s support is the positive real line, calculate <em>x</em> as ln(<em>x</em>).</li>
<li>Calculate <em>z</em> as <em>μ</em>+<em>σ</em>*<em>x</em>.</li>
<li>If distribution Y’s support is the positive real line, calculate <em>z</em> as exp(<em>z</em>).</li>
<li>Return <em>H</em>(<em>z</em>).</li>
</ol>
<p>In this algorithm:</p>
<ul>
<li>X and Y are distributions that each have support on either the whole real line or the positive real line. However, the book intends X to have an easy-to-compute quantile function.</li>
<li><em>H</em>(<em>z</em>) is Y’s <em>cumulative distribution function</em>, or the probability that a Y-distributed random variate is <em>z</em> or less. The book likewise intends <em>H</em> to be easy to compute.</li>
</ul>
<blockquote>
<p><strong>Note:</strong> An important property for use in statistical estimation is <em>identifiability</em>. A family of distributions is <em>identifiable</em> if it has the property that if two parameter vectors (<em>θ</em><sub>1</sub> and <em>θ</em><sub>2</sub>) determine the same distribution, then <em>θ</em><sub>1</sub> must equal <em>θ</em><sub>2</sub>.</p>
</blockquote>
<p>Although many statistical distributions (especially single-variable ones) tend to be presented in research papers nowadays, they don’t end up in a serious application. Thus, no distribution introduced in 2023 or later will be mentioned in this article unless it is cited in a paper that—</p>
<ul>
<li>is not mainly about new properties of the distribution,</li>
<li>does not itself present a new statistical distribution, and</li>
<li>uses the distribution in a nontrivial application (much more than distribution fitting or a simulation study).</li>
</ul>
<p><a id="Certain_Distributions"></a></p>
<h2 id="certain-distributions">Certain Distributions</h2>
<p>In the following table, <em>U</em> is a uniform random variate between 0 and 1, and all random variates are independently generated.</p>
<table>
<thead>
<tr>
<th>This distribution:</th>
<th>Is distributed as:</th>
<th>And uses these parameters:</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power function(<em>a</em>, <em>c</em>).</td>
<td><em>c</em>*<em>U</em><sup>1/<em>a</em></sup>.</td>
<td><em>a</em> > 0, <em>c</em> > 0.</td>
</tr>
<tr>
<td>Lehmann Weibull(<em>a1</em>, <em>a2</em>, <em>β</em>) (Elgohari and Yousof 2020)<sup id="fnref:53"><a href="#fn:53" class="footnote" rel="footnote" role="doc-noteref">53</a></sup>.</td>
<td>(ln(1/<em>U</em>)/<em>β</em>)<sup>1/<em>a1</em></sup>/<em>a2</em> or (<em>E</em>/<em>β</em>)<sup>1/<em>a1</em></sup>/<em>a2</em></td>
<td><em>a1</em>, <em>a2</em>, <em>β</em> > 0. <em>E</em> is an exponential random variate with rate 1.</td>
</tr>
<tr>
<td>Marshall–Olkin(<em>α</em>) (Marshall and Olkin 1997)<sup id="fnref:54"><a href="#fn:54" class="footnote" rel="footnote" role="doc-noteref">54</a></sup></td>
<td>(1−<em>U</em>)/(<em>U</em>*(<em>α</em>−1) + 1).</td>
<td><em>α</em> in [0, 1].</td>
</tr>
<tr>
<td>Slash (Rogers and Tukey 1972)<sup id="fnref:55"><a href="#fn:55" class="footnote" rel="footnote" role="doc-noteref">55</a></sup></td>
<td><em>N</em>/<em>U</em><sup>1/<em>q</em></sup>, where <em>N</em> is a normal(0, 1) random variate.</td>
<td><em>q</em> > 0.</td>
</tr>
<tr>
<td>Lomax(<em>α</em>).</td>
<td>(1−<em>U</em>)<sup>−1/<em>α</em></sup>−1.</td>
<td><em>α</em> > 0.</td>
</tr>
<tr>
<td>Power Lomax(<em>α</em>, <em>β</em>) (Rady et al. 2016)<sup id="fnref:56"><a href="#fn:56" class="footnote" rel="footnote" role="doc-noteref">56</a></sup>.</td>
<td><em>L</em><sup>1/<em>β</em></sup></td>
<td><em>β</em> > 0; <em>L</em> is Lomax(<em>α</em>).</td>
</tr>
<tr>
<td>Topp–Leone(<em>α</em>).</td>
<td>1−sqrt(1−<em>U</em><sup>1/<em>α</em></sup>).</td>
<td><em>α</em> > 0.</td>
</tr>
<tr>
<td>Bell–Touchard(<em>a</em>, <em>b</em>) (Castellares et al. 2020)<sup id="fnref:57"><a href="#fn:57" class="footnote" rel="footnote" role="doc-noteref">57</a></sup>.</td>
<td>Sum of <em>P</em> zero-truncated Poisson(<em>a</em>) random variates, where <em>P</em> is Poisson with parameter <em>b</em>*exp(<em>a</em>)−<em>b</em>.<sup id="fnref:58"><a href="#fn:58" class="footnote" rel="footnote" role="doc-noteref">58</a></sup></td>
<td><em>a</em>>0, <em>b</em>>0.</td>
</tr>
<tr>
<td>Bell(<em>a</em>) (Castellares et al. 2020)<sup id="fnref:57:1"><a href="#fn:57" class="footnote" rel="footnote" role="doc-noteref">57</a></sup>.</td>
<td>Bell–Touchard(<em>a</em>, 0).</td>
<td><em>a</em>>0.</td>
</tr>
<tr>
<td>Discrete Pareto(<em>δ</em>, <em>p</em>) (Buddana and Kozubowski 2014)<sup id="fnref:59"><a href="#fn:59" class="footnote" rel="footnote" role="doc-noteref">59</a></sup></td>
<td>1 plus the number of failures before the first success, with each success having probability 1−exp(−<em>Z</em>), where <em>Z</em> is a gamma(1/<em>δ</em>) variate times −<em>δ</em>*ln(1−<em>p</em>).</td>
<td><em>δ</em> > 0, and 0<<em>p</em><1.</td>
</tr>
<tr>
<td>Neyman type A(<em>δ</em>, <em>τ</em>) (Batsidis and Lemonte 2021)<sup id="fnref:60"><a href="#fn:60" class="footnote" rel="footnote" role="doc-noteref">60</a></sup></td>
<td>Bell–Touchard(<em>τ</em>, <em>δ</em>*exp(−<em>τ</em>)).</td>
<td><em>δ</em>>0, <em>τ</em>>0.</td>
</tr>
<tr>
<td>Gamma exponential (Kudryavtsev 2019)<sup id="fnref:61"><a href="#fn:61" class="footnote" rel="footnote" role="doc-noteref">61</a></sup>.</td>
<td><em>δ</em>*Gamma(<em>t</em>)<sup>1/<em>ν</em></sup>/Gamma(<em>s</em>)<sup><em>r</em>/<em>ν</em></sup>, where Gamma(<em>x</em>) is a gamma(<em>x</em>) variate.</td>
<td>0 ≤ <em>r</em> < 1; <em>ν</em> ≠ 0; <em>s</em>>0; <em>t</em>>0; <em>δ</em>>0.</td>
</tr>
<tr>
<td>Extended xgamma (Saha et al. 2019)<sup id="fnref:62"><a href="#fn:62" class="footnote" rel="footnote" role="doc-noteref">62</a></sup></td>
<td>Gamma(<em>α</em> + <em>c</em>) variate divided by <em>θ</em>, where <em>c</em> is either 0 with probability <em>θ</em>/(<em>θ</em>+<em>β</em>), or 2 otherwise.</td>
<td><em>θ</em>>0, <em>α</em>>0, <em>β</em> ≥ 0.</td>
</tr>
<tr>
<td>Generalized Pareto(<em>a</em>, <em>b</em>) (McNeil et al. 2010)<sup id="fnref:63"><a href="#fn:63" class="footnote" rel="footnote" role="doc-noteref">63</a></sup></td>
<td><em>a</em>*((1/(1−<em>U</em>))<sup><em>b</em></sup>−1)/<em>b</em>.</td>
<td><em>a</em>>0; <em>b</em>>0.</td>
</tr>
<tr>
<td>Skew symmetric or symmetry-modulated (Azzalini and Capitanio 2003)<sup id="fnref:64"><a href="#fn:64" class="footnote" rel="footnote" role="doc-noteref">64</a></sup>, (Azzalini 2022)<sup id="fnref:65"><a href="#fn:65" class="footnote" rel="footnote" role="doc-noteref">65</a></sup>.</td>
<td><em>Z</em> if <em>T</em> ≤ <em>w</em>(<em>Z</em>), or −<em>Z</em> otherwise.</td>
<td><em>Z</em> follows a symmetric distribution around 0; <em>T</em> follows a symmetric distribution (not necessarily around 0). <em>w</em>(<em>x</em>) satisfies −<em>w</em>(<em>x</em>) = <em>w</em>(−<em>x</em>).</td>
</tr>
<tr>
<td>Skew normal (Azzalini 1985)<sup id="fnref:66"><a href="#fn:66" class="footnote" rel="footnote" role="doc-noteref">66</a></sup>.</td>
<td>Skew symmetric with <em>Z</em> and <em>T</em> both separate Normal(0, 1) variates, and <em>w</em>(<em>x</em>) = <em>x</em>*<em>α</em>.</td>
<td><em>α</em> is a real number.</td>
</tr>
<tr>
<td>Logarithmic skew normal (Gómez-Déniz et al. 2020)<sup id="fnref:67"><a href="#fn:67" class="footnote" rel="footnote" role="doc-noteref">67</a></sup></td>
<td>exp(SNE(<em>λ</em>,<em>λ</em>)*<em>σ</em>+<em>μ</em>).</td>
<td><em>μ</em> and <em>λ</em> are real numbers; <em>σ</em> > 0. SNE is described later.</td>
</tr>
<tr>
<td>Tilted beta binomial (Hahn 2022)<sup id="fnref:68"><a href="#fn:68" class="footnote" rel="footnote" role="doc-noteref">68</a></sup></td>
<td>Binomial(<em>n</em>, Tilted-beta(<em>θ</em>, <em>v</em>, <em>α</em>, <em>β</em>)) variate.</td>
<td>0 ≤ <em>θ</em> ≤ 1; 0 ≤ <em>v</em> ≤ 1; <em>α</em>>0, <em>β</em>>0; $n$ is zero or a positive integer.</td>
</tr>
<tr>
<td>Two-piece distribution (Rubio and Steel 2020)<sup id="fnref:69"><a href="#fn:69" class="footnote" rel="footnote" role="doc-noteref">69</a></sup>.</td>
<td><em>μ</em> − abs(<em>Z</em>)*<em>sigma1</em> with probability <em>sigma1</em>/(<em>sigma1</em>+<em>sigma2</em>), or <em>μ</em> + abs(<em>Z</em>)*<em>sigma2</em> otherwise.</td>
<td><em>μ</em> is a real number; <em>sigma1</em>>0; <em>sigma2</em>>0; <em>Z</em> follows a symmetric distribution around 0.</td>
</tr>
<tr>
<td>Asymmetric generalized Gaussian (Tesei and Regazzoni 1996)<sup id="fnref:70"><a href="#fn:70" class="footnote" rel="footnote" role="doc-noteref">70</a></sup></td>
<td>Two-piece distribution where <em>Z</em> is exponential-power(<em>α</em>).</td>
<td><em>α</em>>0; <em>μ</em> is a real number; <em>sigma1</em>>0; <em>sigma2</em>>0.</td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th>This distribution:</th>
<th>Can be sampled with the following algorithms:</th>
<th>And uses these parameters:</th>
</tr>
</thead>
<tbody>
<tr>
<td>Offset-symmetric Gaussian (Sadeghi and Korki 2021)<sup id="fnref:71"><a href="#fn:71" class="footnote" rel="footnote" role="doc-noteref">71</a></sup></td>
<td>(1) Generate either 1 or 0 with equal probability, call it <em>b</em>; (2) generate <em>Y</em>, a Normal(0, <em>σ</em>) random variate (standard deviation <em>σ</em>), and if <em>Y</em> < <em>m</em>, repeat this step; (3) return (<em>Y</em> − <em>m</em>)*(<em>b</em>*2 − 1).</td>
<td><em>m</em>>0; <em>σ</em>>0.</td>
</tr>
<tr>
<td>Generalized skew normal (SNE(<em>λ</em>,<em>ξ</em>)) (Henze 1986)<sup id="fnref:72"><a href="#fn:72" class="footnote" rel="footnote" role="doc-noteref">72</a></sup></td>
<td><strong>First algorithm:</strong> (1) Generate <em>Y</em> and <em>Z</em>, two Normal(0,1) variates; (2) if <em>Z</em><<em>Y</em>*<em>λ</em>+<em>ξ</em>, return <em>Y</em>; else go to 1. <strong>Second algorithm:</strong> (1) Let <em>il</em>=1/sqrt(1+<em>λ</em><sup>2</sup>); (2) Generate <em>Y</em> and <em>Z</em>, two Normal(0,1) variates; (3) if <em>Y</em>>−<em>ξ</em>*<em>il</em>, return <em>Y</em>*<em>λ</em>*<em>il</em> + <em>Z</em>; else go to 2.</td>
<td><em>λ</em> and <em>ξ</em> are real numbers.</td>
</tr>
<tr>
<td>Generalized geometric (Francis-Staite and White 2022)<sup id="fnref:73"><a href="#fn:73" class="footnote" rel="footnote" role="doc-noteref">73</a></sup></td>
<td>(1) Set <em>ret</em> to 1; (2) with probability <em>ρ</em>(<em>ret</em>), add 1 to <em>ret</em> and repeat this step; otherwise, return <em>ret</em>.</td>
<td>0 ≤ <em>ρ</em>(<em>k</em>) ≤ 1 for each <em>k</em>.</td>
</tr>
<tr>
<td>Generalized Sibuya (Kozubowski and Podgórski 2018)<sup id="fnref:74"><a href="#fn:74" class="footnote" rel="footnote" role="doc-noteref">74</a></sup></td>
<td>(1) Set <em>ret</em> to 1; (2) with probability <em>α</em>/(<em>ν</em>+<em>ret</em>), return <em>ret</em>; otherwise, add 1 to <em>ret</em> and repeat this step.</td>
<td><em>α</em> < <em>ν</em> + 1, and <em>ν</em> ≥ 0.<sup id="fnref:75"><a href="#fn:75" class="footnote" rel="footnote" role="doc-noteref">75</a></sup></td>
</tr>
<tr>
<td>Himanshu (Agarwal and Pandey 2022)<sup id="fnref:76"><a href="#fn:76" class="footnote" rel="footnote" role="doc-noteref">76</a></sup></td>
<td>(1) Set <em>ret</em> to 0; (2) flip coin that shows heads with probability <em>p</em>, <em>n</em> times; (3) if any flip shows 0 (tails), add 1 to <em>ret</em> and go to 2; otherwise, return <em>ret</em>.</td>
<td>0 ≤ <em>p</em> ≤ 1; $n$ is a positive integer.</td>
</tr>
<tr>
<td>Tilted beta (Hahn and López Martín 2005)<sup id="fnref:77"><a href="#fn:77" class="footnote" rel="footnote" role="doc-noteref">77</a></sup></td>
<td>(1) With probability <em>θ</em>, return a beta(<em>α</em>, <em>β</em>) variate; (2) Generate a uniform variate in (0, 1), call it <em>x</em>; (3) Flip coin that returns 1 with probability <em>x</em>, and another that returns 1 with probability <em>v</em>; (4) If both coins return 1 or both return 0, return <em>x</em>; otherwise go to step 2.</td>
<td>0 ≤ <em>θ</em> ≤ 1; 0 ≤ <em>v</em> ≤ 1; <em>α</em>>0; <em>β</em>>0.</td>
</tr>
</tbody>
</table>
<p><a id="Random_Variate_Generation_via_Quantiles"></a></p>
<h2 id="random-variate-generation-via-quantiles">Random Variate Generation via Quantiles</h2>
<p>This note is about generating random variates from a nondiscrete distribution via <em>inverse transform sampling</em>, using uniform <a href="https://peteroupc.github.io/exporand.html"><strong>partially-sampled random numbers (PSRNs)</strong></a>.</p>
<p>In this section:</p>
<ul>
<li>A distribution’s <em>quantile function</em> (also known as <em>inverse cumulative distribution function</em> or <em>inverse CDF</em>) is a nowhere decreasing function that maps uniform random variates greater than 0 and less than 1 to numbers that follow the distribution.</li>
<li>A <em>uniform PSRN</em> is ultimately a number that lies in an interval; it contains a sign, an integer part, and a fractional part made up of digits sampled on demand.</li>
</ul>
<p>Take the following situation:</p>
<ul>
<li>Let <em>f</em>(.) be a function applied to <em>a</em> or <em>b</em> before calculating the quantile.</li>
<li>Let <em>Q</em>(<em>z</em>) be the quantile function for the desired distribution.</li>
<li>Let <em>x</em> be a random variate in the form of a uniform PSRN, so that this PSRN will lie in the interval [<em>a</em>, <em>b</em>]. If <em>f</em>(<em>t</em>) = <em>t</em> (the identity function), the PSRN <em>x</em> must have a positive sign and an integer part of 0, so that the interval [<em>a</em>, <em>b</em>] is either the interval [0, 1] or a closed interval in [0, 1], depending on the PSRN’s fractional part. For example, if the PSRN is 2.147…, then the interval is [2.147, 2.148].</li>
<li>Let <em>β</em> be the digit base of digits in <em>x</em>’s fractional part (such as 2 for binary).</li>
<li>Suppose <em>Q</em>(<em>z</em>) is continuous on the open interval (<em>a</em>, <em>b</em>).</li>
</ul>
<p>Then the following algorithm transforms that number to a random variate for the desired distribution, which comes within the desired error tolerance of <em>ε</em> with probability 1 (see (Devroye and Gravel 2020)<sup id="fnref:78"><a href="#fn:78" class="footnote" rel="footnote" role="doc-noteref">78</a></sup>):</p>
<ol>
<li>Generate additional digits of <em>x</em> uniformly at random—thus shortening the interval [<em>a</em>, <em>b</em>]—until a lower bound of <em>Q</em>(<em>f</em>(<em>a</em>)) and an upper bound of <em>Q</em>(<em>f</em>(<em>b</em>)) differ by no more than 2*<em>ε</em>. Call the two bounds <em>low</em> and <em>high</em>, respectively.</li>
<li>Return <em>low</em>+(<em>high</em>−<em>low</em>)/2.</li>
</ol>
<p>In some cases, it may be possible to calculate the needed digit size in advance.</p>
<p>As one example, if <em>f</em>(<em>t</em>) = <em>t</em> (the identity function) and the quantile function is <em>Lipschitz continuous</em> with Lipschitz constant <em>L</em> or less on the interval [<em>a</em>, <em>b</em>]<sup id="fnref:79"><a href="#fn:79" class="footnote" rel="footnote" role="doc-noteref">79</a></sup>, then the following algorithm generates a quantile with error tolerance <em>ε</em>:</p>
<ol>
<li>Let <em>d</em> be ceil((ln(max(1,<em>L</em>)) − ln(<em>ε</em>)) / ln(<em>β</em>)). For each digit among the first <em>d</em> digits in <em>x</em>’s fractional part, if that digit is unsampled, set it to a digit chosen uniformly at random.</li>
<li>The PSRN <em>x</em> now lies in the interval [<em>a</em>, <em>b</em>]. Calculate lower and upper bounds of <em>Q</em>(<em>a</em>) and <em>Q</em>(<em>b</em>), respectively, that are within <em>ε</em>/2 of the ideal quantiles, call the bounds <em>low</em> and <em>high</em>, respectively.</li>
<li>Return <em>low</em>+(<em>high</em>−<em>low</em>)/2.</li>
</ol>
<p>This algorithm chooses a random interval of size equal to <em>β</em><sup><em>d</em></sup>, and because the quantile function is Lipschitz continuous, the values at the interval’s bounds are guaranteed to vary by no more than 2*<em>ε</em> (actually <em>ε</em>, but the calculation in step 2 adds an additional error of at most <em>ε</em>), which is needed to meet the tolerance <em>ε</em> (see also Devroye and Gravel 2020<sup id="fnref:78:1"><a href="#fn:78" class="footnote" rel="footnote" role="doc-noteref">78</a></sup>).</p>
<p>A similar algorithm can exist even if the quantile function <em>Q</em> is not Lipschitz continuous on the interval [<em>a</em>, <em>b</em>].</p>
<p>Specifically, if—</p>
<ul>
<li><em>f</em>(<em>t</em>) = <em>t</em> (the identity function),</li>
<li><em>Q</em> on the interval [<em>a</em>, <em>b</em>] is continuous and has a minimum and maximum, and</li>
<li><em>Q</em> on [<em>a</em>, <em>b</em>] admits a continuous and strictly increasing function <em>ω</em>(<em>δ</em>) as a <em>modulus of continuity</em>,</li>
</ul>
<p>then <em>d</em> in step 1 above can be calculated as—<br /> max(0, ceil(−ln(<em>ω</em><sup>−1</sup>(<em>ε</em>))/ln(<em>β</em>))),<br />where <em>ω</em><sup>−1</sup>(<em>ε</em>) is the inverse of the modulus of continuity. (Loosely speaking, a modulus of continuity <em>ω</em>(<em>δ</em>) gives the quantile function’s maximum-minus-minimum in a window of size <em>δ</em>, and the inverse modulus <em>ω</em><sup>−1</sup>(<em>ε</em>) finds a window small enough that the quantile function differs by no more than <em>ε</em> in the window.<sup id="fnref:80"><a href="#fn:80" class="footnote" rel="footnote" role="doc-noteref">80</a></sup>).<sup id="fnref:81"><a href="#fn:81" class="footnote" rel="footnote" role="doc-noteref">81</a></sup></p>
<p>For example—</p>
<ul>
<li>if <em>Q</em> is Lipschitz continuous<sup id="fnref:79:1"><a href="#fn:79" class="footnote" rel="footnote" role="doc-noteref">79</a></sup> with Lipschitz constant <em>L</em> or less on [<em>a</em>, <em>b</em>], then the function is no “steeper” than that of <em>ω</em>(<em>δ</em>) = <em>L</em>*<em>δ</em>, so <em>ω</em><sup>−1</sup>(<em>ε</em>) = <em>ε</em>/<em>L</em>, and</li>
<li>if <em>Q</em> is Hölder continuous with Hölder constant <em>M</em> or less and Hölder exponent <em>α</em> on that interval <sup id="fnref:82"><a href="#fn:82" class="footnote" rel="footnote" role="doc-noteref">82</a></sup>, then the function is no “steeper” than that of <em>ω</em>(<em>δ</em>) = <em>M</em>*<em>δ</em><sup><em>α</em></sup>, so <em>ω</em><sup>−1</sup>(<em>ε</em>) = (<em>ε</em>/<em>M</em>)<sup>1/<em>α</em></sup>.</li>
</ul>
<p>The algorithms given earlier in this section have a disadvantage: the desired error tolerance has to be made known to the algorithm in advance. (Indeed, for this reason, the algorithms don’t satisfy <a href="https://peteroupc.github.io/exporand.html#Properties"><strong>desirable properties for PSRN samplers</strong></a>.) To generate a quantile to any error tolerance (even if the tolerance is not known in advance), a rejection sampling approach is needed. For this to work:</p>
<ul>
<li>The target distribution must have a probability density function (PDF), as is the case with the normal and exponential distributions.</li>
<li>That PDF, or a function proportional to it, must be known, must be less than or equal to a finite number, and must be continuous “almost everywhere” (the set of discontinuous points is “zero-volume”, that is, has Lebesgue measure zero) (see also (Devroye and Gravel 2020)<sup id="fnref:78:2"><a href="#fn:78" class="footnote" rel="footnote" role="doc-noteref">78</a></sup>).</li>
</ul>
<p>Here is a sketch of how this rejection sampler might work:</p>
<ol>
<li>After using one of the algorithms given earlier in this section to sample digits of <em>x</em> as needed, let <em>a</em> and <em>b</em> be <em>x</em>’s upper and lower bounds. Calculate lower and upper bounds of the quantiles of <em>f</em>(<em>a</em>) and <em>f</em>(<em>b</em>) (the bounds are [<em>alow</em>, <em>ahigh</em>] and [<em>blow</em>, <em>bhigh</em>] respectively).</li>
<li>Given the target function’s PDF or a function proportional to it, sample a uniform PSRN, <em>y</em>, in the interval [<em>alow</em>, <em>bhigh</em>] using an arbitrary-precision rejection sampler such as Oberhoff’s method (described in an <a href="https://peteroupc.github.io/exporand.html#Oberhoff_s_Exact_Rejection_Sampling_Method"><strong>appendix to the PSRN article</strong></a>).</li>
<li>Accept <em>y</em> (and return it) if it clearly lies in [<em>ahigh</em>, <em>blow</em>]. Reject <em>y</em> (and go to the previous step) if it clearly lies outside [<em>alow</em>, <em>bhigh</em>]. If <em>y</em> clearly lies in [<em>alow</em>, <em>ahigh</em>] or in [<em>blow</em>, <em>bhigh</em>], generate more digits of <em>x</em>, uniformly at random, and go to the first step.</li>
<li>If <em>y</em> doesn’t clearly fall in any of the cases in the previous step, generate more digits of <em>y</em>, uniformly at random, and go to the previous step.</li>
</ol>
<p><a id="Batching_Random_Samples_via_Randomness_Extraction"></a></p>
<h2 id="batching-random-samples-via-randomness-extraction">Batching Random Samples via Randomness Extraction</h2>
<p>Devroye and Gravel (2020)<sup id="fnref:78:3"><a href="#fn:78" class="footnote" rel="footnote" role="doc-noteref">78</a></sup> suggest the following randomness extractor to reduce the number of random bits needed to produce a batch of samples by a sampling algorithm. The extractor works based on the probability that the algorithm consumes <em>X</em> random bits given that it produces a specific output <em>Y</em> (or <em>P</em>(<em>X</em> | <em>Y</em>) for short):</p>
<ol>
<li>Start with the interval [0, 1].</li>
<li>For each pair (<em>X</em>, <em>Y</em>) in the batch, the interval shrinks from below by <em>P</em>(<em>X</em>−1 | <em>Y</em>) and from above by <em>P</em>(<em>X</em> | <em>Y</em>). (For example, if [0.2, 0.8] (range 0.6) shrinks from below by 0.1 and from above by 0.8, the new interval is [0.2+0.1*0.6, 0.2+0.8*0.6] = [0.26, 0.68]. For correctness, though, the interval is not allowed to shrink to a single point, since otherwise step 3 would run forever.)</li>
<li>Extract the bits, starting from the binary point, that the final interval’s lower and upper bound have in common (or 0 bits if the upper bound is 1). (For example, if the final interval is [0.101010…, 0.101110…] in binary, the bits 1, 0, 1 are extracted, since the common bits starting from the point are 101.)</li>
</ol>
<p>After a sampling method produces an output <em>Y</em>, both <em>X</em> (the number of random bits the sampler consumed) and <em>Y</em> (the output) are added to the batch and fed to the extractor, and new bits extracted this way are added to a queue for the sampling method to use to produce future outputs. (Notice that the number of bits extracted by the algorithm above grows as the batch grows, so only the new bits extracted this way are added to the queue this way.)</p>
<p>The issue of finding <em>P</em>(<em>X</em> | <em>Y__X</em> | <em>Y</em>) is now discussed. Generally, if the sampling method implements a random walk on a binary tree that is driven by numbers that each equal 1 or 0 with equal probability and has leaves labeled with one outcome each (Knuth and Yao 1976)<sup id="fnref:83"><a href="#fn:83" class="footnote" rel="footnote" role="doc-noteref">83</a></sup>, <em>P</em>(<em>X</em> | <em>Y</em>) is found as follows (and Claude Gravel clarified to me that this is the intention of the extractor algorithm): Take a weighted count of all leaves labeled <em>Y</em> up to depth <em>X</em> (where the weight for depth <em>z</em> is 1/2<sup><em>z</em></sup>), then divide it by a weighted count of all leaves labeled <em>Y</em> at all depths (for instance, if the tree has two leaves labeled <em>Y</em> at <em>z</em>=2, three at <em>z</em>=3, and three at <em>z</em>=4, and <em>X</em> is 3, then <em>P</em>(<em>X</em> | <em>Y</em>) is (2/2<sup>2</sup>+3/2<sup>3</sup>) / (2/2<sup>2</sup>+3/2<sup>3</sup>+3/2<sup>4</sup>)). In the special case where the tree has at most 1 leaf labeled <em>Y</em> at every depth, this is implemented by finding <em>P</em>(<em>Y</em>), or the probability to give out <em>Y</em>, then chopping <em>P</em>(<em>Y</em>) up to the <em>X</em><sup>th</sup> binary digit after the point and dividing by the original <em>P</em>(<em>Y</em>) (for instance, if <em>X</em> is 4 and P(<em>Y</em>) is 0.101011…, then <em>P</em>(<em>X</em> | <em>Y</em>) is 0.1010 / 0.101011…).</p>
<p>Unfortunately, <em>P</em>(<em>X</em> | <em>Y</em>) is not easy to calculate when the number of values <em>Y</em> can take on is large or even unbounded. In this case, I can suggest the following ad hoc algorithm, which uses a randomness extractor that takes <em>bits</em> as input, such as the von Neumann, Peres, or Zhou–Bruck extractor (see “<a href="https://peteroupc.github.io/randextract.html"><strong>Notes on Randomness Extraction</strong></a>”). The algorithm counts the number of bits it consumes (<em>X</em>) to produce an output, then feeds <em>X</em> to the extractor as follows.</p>
<ol>
<li>Let <em>z</em> be abs(<em>X</em>−<em>lastX</em>), where <em>lastX</em> is either the last value of <em>X</em> fed to this extractor for this batch or 0 if there is no such value.</li>
<li>If <em>z</em> is greater than 0, feed the bits of <em>z</em> from most significant to least significant to a queue of extractor inputs.</li>
<li>Now, when the sampler would generate 1 or 0 with equal probability, it first checks the input queue. As long as 64 bits or more are in the input queue, the sampler dequeues 64 bits from it, runs the extractor on those bits, and adds the extracted bits to an output queue. (The number 64 can instead be any even number greater than 2.) Then, if the output queue is not empty, the sampler dequeues a bit from that queue and uses that bit; otherwise it generates 1 or 0 with equal probability, as usual.</li>
</ol>
<p><a id="Sampling_Distributions_Using_Incomplete_Information"></a></p>
<h2 id="sampling-distributions-using-incomplete-information">Sampling Distributions Using Incomplete Information</h2>
<p>The Bernoulli factory problem (the problem of turning one biased coin into another biased coin; see “<a href="https://peteroupc.github.io/bernoulli.html"><strong>Bernoulli Factory Algorithms</strong></a>”) is a special case of the problem of <strong>sampling a probability distribution with unknown parameters</strong>. This problem can be described as sampling from a new distribution using an endless stream of random variates from an incompletely known distribution. The problem is described in more detail in “<a href="https://peteroupc.github.io/sampling.html"><strong>The Sampling Problem</strong></a>”.</p>
<p>In this section:</p>
<ul>
<li>An <em>oracle</em> is an endless stream of independent random variates of an incompletely known probability distribution. In the Bernoulli factory problem, this oracle is a <em>coin that shows heads or tails where the probability of heads is unknown</em>.</li>
</ul>
<p>The rest of this section deals with oracles that go beyond coins.</p>
<p><strong>Algorithm 1.</strong> Suppose the oracle produces random variates on a closed interval [<em>a</em>, <em>b</em>], with an unknown mean (or expected value or “long-run average”) of <em>μ</em>. The goal is now to produce nonnegative random variates whose mean is <em>f</em>(<em>μ</em>). Unless <em>f</em> is constant, this is possible if and only if—</p>
<ul>
<li><em>f</em> is continuous on the closed interval, and</li>
<li><em>f</em>(<em>μ</em>) is greater than or equal to <em>ε</em>*min((<em>μ</em> − <em>a</em>)<sup><em>n</em></sup>, (<em>b</em> − <em>μ</em>)<sup><em>n</em></sup>) for some integer <em>n</em> and some <em>ε</em> greater than 0 (loosely speaking, <em>f</em> is nonnegative and neither touches 0 in the interior of the interval nor moves away from 0 more slowly than a polynomial)</li>
</ul>
<p>(Jacob and Thiery 2015)<sup id="fnref:84"><a href="#fn:84" class="footnote" rel="footnote" role="doc-noteref">84</a></sup>. (Here, <em>a</em> and <em>b</em> are both rational numbers and may be less than 0.)</p>
<p>In the algorithm below, let <em>K</em> be a rational number greater than the maximum value of <em>f</em> on the closed interval [<em>a</em>, <em>b</em>], and let <em>g</em>(<em>λ</em>) = <em>f</em>(<em>a</em> + (<em>b</em>−<em>a</em>)*<em>λ</em>)/<em>K</em>.</p>
<ol>
<li>Create a <em>λ</em> input coin that does the following: “Take a number from the oracle, call it <em>x</em>. With probability (<em>x</em>−<em>a</em>)/(<em>b</em>−<em>a</em>) (see note later), return 1. Otherwise, return 0.”</li>
<li>Run a Bernoulli factory algorithm for <em>g</em>(<em>λ</em>), using the <em>λ</em> input coin. Then return <em>K</em> times the result.</li>
</ol>
<blockquote>
<p><strong>Note:</strong> The check “With probability (<em>x</em>−<em>a</em>)/(<em>b</em>−<em>a</em>)” is exact if the oracle produces only rational numbers. Otherwise, calculating the probability can lead to numerical errors unless care is taken (see note 2 in “Distributions with nowhere increasing or nowhere decreasing weights”, earlier). With uniform partially-sampled random numbers (PSRNs), the check can be implemented as follows. Let <em>x</em> be a uniform PSRN representing a number generated by the oracle. Set <em>y</em> to <strong>RandUniformFromReal</strong>(<em>b</em>−<em>a</em>), then the check succeeds if <strong>RandLess</strong>(<em>y</em>, <strong>UniformAddRational</strong>(<em>x</em>, −<em>a</em>)) returns 1, and fails otherwise.</p>
<p><strong>Example:</strong> Suppose an oracle produces random variates in the interval [3, 13] with unknown mean <em>μ</em>, and the goal is to use the oracle to produce nonnegative random variates with mean <em>f</em>(<em>μ</em>) = −319/100 + <em>μ</em>*103/50 − <em>μ</em><sup>2</sup>*11/100, which is a polynomial with Bernstein coefficients [2, 9, 5] in the specified interval. Then since 8 is greater than the maximum of <em>f</em> in that interval, <em>g</em>(<em>λ</em>) is a degree-2 polynomial in the interval [0, 1] that has Bernstein coefficients [2/8, 9/8, 5/8]. <em>g</em> can’t be simulated as is, though, but increasing <em>g</em>’s degree to 3 leads to the Bernstein coefficients [1/4, 5/6, 23/24, 5/8], which are all less than 1 so that the following algorithm can be used (see “<a href="https://peteroupc.github.io/bernoulli.html#Certain_Polynomials"><strong>Certain Polynomials</strong></a>”):</p>
<ol>
<li>Set <em>heads</em> to 0.</li>
<li>Generate three random variates from the oracle (which must produce random variates in the interval [3, 13]). For each number <em>x</em>: With probability (<em>x</em>−3)/(10−3), add 1 to <em>heads</em>.</li>
<li>Depending on <em>heads</em>, return 8 (that is, 1 times the upper bound) with the specified probability, or 0 otherwise: <em>heads</em>=0 → probability 1/4; 1 → 5/6; 2 → 23/24; 3 → 5/8.</li>
</ol>
</blockquote>
<p><strong>Algorithm 2.</strong> Suppose the oracle is in the form of a fair die. The number of faces of the die, <em>n</em>, is at least 2 but otherwise unknown. Each face shows a different integer 0 or greater and less than <em>n</em>. The question arises: Which probability distributions based on the number of faces can be sampled with this oracle? This question was studied in the French-language dissertation of R. Duvignau (2015, section 5.2)<sup id="fnref:85"><a href="#fn:85" class="footnote" rel="footnote" role="doc-noteref">85</a></sup>, and the following are four of these distributions.</p>
<p><strong><em>Bernoulli 1/n.</em></strong> It’s trivial to generate a Bernoulli variate that is 1 with probability 1/<em>n</em> and 0 otherwise: just take a number from the oracle and return either 1 if that number is 0, or 0 otherwise. Alternatively, take two numbers from the oracle and return either 1 if both are the same, or 0 otherwise (Duvignau 2015, p. 153)<sup id="fnref:85:1"><a href="#fn:85" class="footnote" rel="footnote" role="doc-noteref">85</a></sup>.</p>
<p><strong><em>Random variate with mean n.</em></strong> Likewise, it’s trivial to generate variates with a mean of <em>n</em>: Do “Bernoulli 1/n” trials as described earlier until a trial returns 0, then return the number of trials done this way. (This is related to the ambiguously defined “geometric” random variates.)</p>
<p><strong><em>Binomial with parameters n and 1/n.</em></strong> Using the oracle, the following algorithm generates a binomial variate of this kind (Duvignau 2015, Algorithm 20)<sup id="fnref:85:2"><a href="#fn:85" class="footnote" rel="footnote" role="doc-noteref">85</a></sup>:</p>
<ol>
<li>Take items from the oracle until the same item is taken twice.</li>
<li>Create a list consisting of the items taken in step 1, except for the last item taken, then shuffle that list.</li>
<li>In the shuffled list, count the number of items that didn’t change position after being shuffled, then return that number.</li>
</ol>
<p><strong><em>Binomial with parameters n and k/n.</em></strong> Duvignau 2015 also includes an algorithm (Algorithm 25) to generate a binomial variate of this kind using the oracle (where <em>k</em> is a known integer such that 0 < <em>k</em> and <em>k</em> ≤ <em>n</em>):</p>
<ol>
<li>Take items from the oracle until <em>k</em> different items were taken this way. Let <em>U</em> be a list of these <em>k</em> items, in the order in which they were first taken.</li>
<li>Create an empty list <em>L</em>.</li>
<li>For each integer <em>i</em> satisfying 0 ≤ <em>i</em> < <em>k</em>:
<ol>
<li>Create an empty list <em>M</em>.</li>
<li>Take an item from the oracle. If the item is in <em>U</em> at a position <strong>less than <em>i</em></strong> (positions start at 0), repeat this substep. Otherwise, if the item is not in <em>M</em>, add it to <em>M</em> and repeat this substep. Otherwise, go to the next substep.</li>
<li>Shuffle the list <em>M</em>, then add to <em>L</em> each item that didn’t change position after being shuffled (if not already present in <em>L</em>).</li>
</ol>
</li>
<li>For each integer <em>i</em> satisfying 0 ≤ <em>i</em> < <em>k</em>:
<ol>
<li>Let <em>P</em> be the item at position <em>i</em> in <em>U</em>.</li>
<li>Take an item from the oracle. If the item is in <em>U</em> at position <strong><em>i</em> or less</strong> (positions start at 0), repeat this substep.</li>
<li>If the last item taken in the previous substep is in <em>U</em> at a position <strong>greater than <em>i</em></strong>, add <em>P</em> to <em>L</em> (if not already present).</li>
</ol>
</li>
<li>Return the number of items in <em>L</em>.</li>
</ol>
<blockquote>