Skip to content

Commit 1e4b0f5

Browse files
committed
add a rmultinom() function to Eidos
1 parent f11de0d commit 1e4b0f5

8 files changed

Lines changed: 154 additions & 0 deletions

EidosScribe/EidosHelpFunctions.rtf

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2187,6 +2187,32 @@ Returns a vector of
21872187
\f2 \
21882188
\pard\pardeftab397\li720\fi-446\ri720\sb180\sa60\partightenfactor0
21892189

2190+
\f1\fs18 \cf2 (integer)rmultinom(integer$\'a0n, integer$\'a0size, numeric\'a0prob)\
2191+
\pard\pardeftab397\li547\ri720\sb60\sa60\partightenfactor0
2192+
2193+
\f3\fs20 \cf2 Returns a matrix of
2194+
\f1\fs18 n
2195+
\f3\fs20
2196+
\f0\b random vectors from the specified multinomial distribution
2197+
\f3\b0 . For each random vector drawn from the multinomial distribution,
2198+
\f1\fs18 size
2199+
\f3\fs20 objects will be put into
2200+
\f7\i k
2201+
\f3\i0 boxes, where
2202+
\f1\fs18 prob
2203+
\f3\fs20 is a vector of probabilities of length
2204+
\f7\i k
2205+
\f3\i0 specifying the probability for each box. If
2206+
\f1\fs18 prob
2207+
\f3\fs20 is not normalized to sum to
2208+
\f1\fs18 1.0
2209+
\f3\fs20 , its entries are treated as weights and normalized appropriately. The draws are returned as a matrix with
2210+
\f7\i k
2211+
\f3\i0 rows (one row per box) and
2212+
\f1\fs18 n
2213+
\f3\fs20 columns (one column per drawn random vector).\
2214+
\pard\pardeftab397\li720\fi-446\ri720\sb180\sa60\partightenfactor0
2215+
21902216
\f1\fs18 \cf2 \expnd0\expndtw0\kerning0
21912217
(float)rmvnorm(integer$\'a0n, numeric\'a0mu, numeric\'a0sigma)\
21922218
\pard\pardeftab720\li547\ri720\sb60\sa60\partightenfactor0

QtSLiM/help/EidosHelpFunctions.html

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -194,6 +194,8 @@
194194
<p class="p5">Returns a vector of <span class="s2">n</span> <b>random draws from a Laplace distribution</b> with shape parameter <span class="s2">b</span>.<span class="Apple-converted-space">  </span>The <span class="s2">b</span> parameter may either be a singleton, specifying a single value to be used for all of the draws, or it may be a vector of length <span class="s2">n</span>, specifying a value for each draw.</p>
195195
<p class="p2">(float)rlnorm(integer$<span class="s1"> </span>n, [numeric meanlog = 0], [numeric sdlog = 1])</p>
196196
<p class="p3">Returns a vector of <span class="s2">n</span> <b>random draws from a lognormal distribution</b> with mean <span class="s2">meanlog</span> and standard deviation <span class="s2">sdlog</span>, specified on the log scale.<span class="Apple-converted-space">  </span>The <span class="s2">meanlog</span> and <span class="s2">sdlog</span> parameters may either be singletons, specifying a single value to be used for all of the draws, or they may be vectors of length <span class="s2">n</span>, specifying a value for each draw.</p>
197+
<p class="p4">(integer)rmultinom(integer$ n, integer$ size, numeric prob)</p>
198+
<p class="p5">Returns a matrix of <span class="s2">n</span> <b>random vectors from the specified multinomial distribution</b>.<span class="Apple-converted-space">  </span>For each random vector drawn from the multinomial distribution, <span class="s2">size</span> objects will be put into <i>k</i> boxes, where <span class="s2">prob</span> is a vector of probabilities of length <i>k</i> specifying the probability for each box.<span class="Apple-converted-space">  </span>If <span class="s2">prob</span> is not normalized to sum to <span class="s2">1.0</span>, its entries are treated as weights and normalized appropriately.<span class="Apple-converted-space">  </span>The draws are returned as a matrix with <i>k</i> rows (one row per box) and <span class="s2">n</span> columns (one column per drawn random vector).</p>
197199
<p class="p4"><span class="s5">(float)rmvnorm(integer$ n, numeric mu, numeric sigma)</span></p>
198200
<p class="p5"><span class="s5">Returns a matrix of </span><span class="s9">n</span><span class="s5"> <b>random draws from a <i>k</i>-dimensional multivariate normal distribution</b> with a length <i>k</i> mean vector </span><span class="s9">mu</span><span class="s5"> and a <i>k</i> × <i>k</i> variance-covariance matrix </span><span class="s9">sigma</span><span class="s5">.<span class="Apple-converted-space">  </span>The </span><span class="s9">mu</span><span class="s5"> and </span><span class="s9">sigma</span><span class="s5"> parameters are used for all </span><span class="s9">n</span><span class="s5"> draws.<span class="Apple-converted-space">  </span>The draws are returned as a matrix with </span><span class="s9">n</span><span class="s5"> rows (one row per draw) and <i>k</i> columns (one column per dimension).<span class="Apple-converted-space">  </span>The number of dimensions <i>k</i> must be at least two; for <i>k</i>=1, use </span><span class="s9">rnorm()</span><span class="s5">.</span></p>
199201
<p class="p5"><span class="s5">Cholesky decomposition of the variance-covariance matrix </span><span class="s9">sigma</span><span class="s5"> is involved as an internal step, and this requires that </span><span class="s9">sigma</span><span class="s5"> be positive-definite; if it is not, an error will result.<span class="Apple-converted-space">  </span>When more than one draw is needed, it is much more efficient to call </span><span class="s9">rmvnorm()</span><span class="s5"> once to generate all of the draws, since the Cholesky decomposition of </span><span class="s9">sigma</span><span class="s5"> can then be done just once.</span></p>

VERSIONS

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ Note that not every commit will be logged here; that is what the Github commit h
66

77

88
development head (in the master branch):
9+
add a rmultinom() function to draw from a multinomial distribution, matching rmultinom() in R
910

1011

1112
version 5.2 (Eidos version 4.2):

eidos/eidos_functions.cpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -185,6 +185,7 @@ const std::vector<EidosFunctionSignature_CSP> &EidosInterpreter::BuiltInFunction
185185
signatures->emplace_back((EidosFunctionSignature *)(new EidosFunctionSignature("rgeom", Eidos_ExecuteFunction_rgeom, kEidosValueMaskInt))->AddInt_S(gEidosStr_n)->AddFloat("p"));
186186
signatures->emplace_back((EidosFunctionSignature *)(new EidosFunctionSignature("rlaplace", Eidos_ExecuteFunction_rlaplace, kEidosValueMaskFloat))->AddInt_S(gEidosStr_n)->AddNumeric_O("b", gStaticEidosValue_Integer1));
187187
signatures->emplace_back((EidosFunctionSignature *)(new EidosFunctionSignature("rlnorm", Eidos_ExecuteFunction_rlnorm, kEidosValueMaskFloat))->AddInt_S(gEidosStr_n)->AddNumeric_O("meanlog", gStaticEidosValue_Integer0)->AddNumeric_O("sdlog", gStaticEidosValue_Integer1));
188+
signatures->emplace_back((EidosFunctionSignature *)(new EidosFunctionSignature("rmultinom", Eidos_ExecuteFunction_rmultinom, kEidosValueMaskInt))->AddInt_S(gEidosStr_n)->AddInt_S(gEidosStr_size)->AddNumeric("prob"));
188189
signatures->emplace_back((EidosFunctionSignature *)(new EidosFunctionSignature("rmvnorm", Eidos_ExecuteFunction_rmvnorm, kEidosValueMaskFloat))->AddInt_S(gEidosStr_n)->AddNumeric("mu")->AddNumeric("sigma"));
189190
signatures->emplace_back((EidosFunctionSignature *)(new EidosFunctionSignature("rnbinom", Eidos_ExecuteFunction_rnbinom, kEidosValueMaskInt))->AddInt_S(gEidosStr_n)->AddNumeric("size")->AddFloat("prob"));
190191
signatures->emplace_back((EidosFunctionSignature *)(new EidosFunctionSignature("rnorm", Eidos_ExecuteFunction_rnorm, kEidosValueMaskFloat))->AddInt_S(gEidosStr_n)->AddNumeric_O("mean", gStaticEidosValue_Integer0)->AddNumeric_O("sd", gStaticEidosValue_Integer1));

eidos/eidos_functions.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,7 @@ EidosValue_SP Eidos_ExecuteFunction_rgamma(const std::vector<EidosValue_SP> &p_a
122122
EidosValue_SP Eidos_ExecuteFunction_rgeom(const std::vector<EidosValue_SP> &p_arguments, EidosInterpreter &p_interpreter);
123123
EidosValue_SP Eidos_ExecuteFunction_rlaplace(const std::vector<EidosValue_SP> &p_arguments, EidosInterpreter &p_interpreter);
124124
EidosValue_SP Eidos_ExecuteFunction_rlnorm(const std::vector<EidosValue_SP> &p_arguments, EidosInterpreter &p_interpreter);
125+
EidosValue_SP Eidos_ExecuteFunction_rmultinom(const std::vector<EidosValue_SP> &p_arguments, EidosInterpreter &p_interpreter);
125126
EidosValue_SP Eidos_ExecuteFunction_rmvnorm(const std::vector<EidosValue_SP> &p_arguments, EidosInterpreter &p_interpreter);
126127
EidosValue_SP Eidos_ExecuteFunction_rnbinom(const std::vector<EidosValue_SP> &p_arguments, EidosInterpreter &p_interpreter);
127128
EidosValue_SP Eidos_ExecuteFunction_rnorm(const std::vector<EidosValue_SP> &p_arguments, EidosInterpreter &p_interpreter);

eidos/eidos_functions_distributions.cpp

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1656,6 +1656,108 @@ EidosValue_SP Eidos_ExecuteFunction_rlnorm(const std::vector<EidosValue_SP> &p_a
16561656
return result_SP;
16571657
}
16581658

1659+
// (integer)rmultinom(integer$ n, integer$ size, numeric prob)
1660+
EidosValue_SP Eidos_ExecuteFunction_rmultinom(const std::vector<EidosValue_SP> &p_arguments, __attribute__((unused)) EidosInterpreter &p_interpreter)
1661+
{
1662+
EidosValue_SP result_SP(nullptr);
1663+
1664+
EidosValue *arg_n = p_arguments[0].get();
1665+
EidosValue *arg_size = p_arguments[1].get();
1666+
EidosValue *arg_prob = p_arguments[2].get();
1667+
1668+
// get and bounds-check the number of draws we will do
1669+
int64_t num_draws = arg_n->IntAtIndex_NOCAST(0, nullptr);
1670+
1671+
if (num_draws < 1)
1672+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires n to be greater than or equal to 1 (" << num_draws << " supplied)." << EidosTerminate(nullptr);
1673+
1674+
// get and check the size of each multinomial draw
1675+
// the upper limit is because the GSL requires size to be cast down to unsigned int
1676+
int64_t size = arg_size->IntAtIndex_NOCAST(0, nullptr);
1677+
1678+
if (size < 0)
1679+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires size to be greater than or equal to 0." << EidosTerminate(nullptr);
1680+
if (size > 1000000000)
1681+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires size to be less than or equal to 1000000000." << EidosTerminate(nullptr);
1682+
1683+
// get and check the number of "boxes", i.e., the number of probabilities supplied
1684+
// the upper limit is arbitrary, but it seems wise to have a limit
1685+
int k = arg_prob->Count();
1686+
1687+
if (k <= 0)
1688+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires prob to be of length 1 or greater." << EidosTerminate(nullptr);
1689+
if (k > 1000000)
1690+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires prob to be of length 1000000 or less." << EidosTerminate(nullptr);
1691+
1692+
// fetch the probability vector into a local static buffer, checking as we go; note the gsl normalizes prob_vec for us
1693+
double prob_sum = 0.0;
1694+
static std::vector<double> prob_vec;
1695+
prob_vec.reserve(k);
1696+
prob_vec.resize(0);
1697+
1698+
for (int prob_index = 0; prob_index < k; ++prob_index)
1699+
{
1700+
double prob = arg_prob->FloatAtIndex_CAST(prob_index, nullptr);
1701+
1702+
if (prob < 0.0)
1703+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires all probabilities in prob to be >= 0.0." << EidosTerminate(nullptr);
1704+
if (!std::isfinite(prob))
1705+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires all probabilities in prob to be finite (not INF or NAN)." << EidosTerminate(nullptr);
1706+
1707+
prob_sum += prob;
1708+
prob_vec.push_back(prob);
1709+
}
1710+
1711+
if (prob_sum == 0.0)
1712+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() requires the sum of prob to be greater than zero; there must be at least one non-zero probability." << EidosTerminate(nullptr);
1713+
if (!std::isfinite(prob_sum))
1714+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_rmultinom): function rmultinom() could not compute the multinomial draws because the sum of prob overflowed to infinity." << EidosTerminate(nullptr);
1715+
1716+
if (k == 1)
1717+
{
1718+
// If k == 1, every object goes into the same single box, so the result is trivial
1719+
EidosValue_Int *int_result = (new (gEidosValuePool->AllocateChunk()) EidosValue_Int())->resize_no_initialize((int)num_draws);
1720+
result_SP = EidosValue_SP(int_result);
1721+
1722+
for (int draw_index = 0; draw_index < num_draws; ++draw_index)
1723+
int_result->set_int_no_check(size, draw_index);
1724+
}
1725+
else if (size == 0)
1726+
{
1727+
// if size == 0, zero objects are drawn in each multinomial trial, so the result is trivial
1728+
EidosValue_Int *int_result = (new (gEidosValuePool->AllocateChunk()) EidosValue_Int())->reserve((int)num_draws * k);
1729+
result_SP = EidosValue_SP(int_result);
1730+
1731+
for (int draw_index = 0; draw_index < num_draws; ++draw_index)
1732+
for (int k_index = 0; k_index < k; ++k_index)
1733+
int_result->push_int_no_check(0);
1734+
}
1735+
else
1736+
{
1737+
// the main case, where we need to call gsl_ran_multinomial() to do the work
1738+
// we put the drawn values into a local static buffer, and then copy them into the Eidos result
1739+
static std::vector<unsigned int> draw_vec;
1740+
draw_vec.resize(k);
1741+
1742+
gsl_rng *rng_gsl = EIDOS_GSL_RNG(omp_get_thread_num());
1743+
EidosValue_Int *int_result = (new (gEidosValuePool->AllocateChunk()) EidosValue_Int())->reserve((int)num_draws * k);
1744+
result_SP = EidosValue_SP(int_result);
1745+
1746+
for (int draw_index = 0; draw_index < num_draws; ++draw_index)
1747+
{
1748+
gsl_ran_multinomial(rng_gsl, k, (unsigned int)size, prob_vec.data(), draw_vec.data());
1749+
1750+
for (int k_index = 0; k_index < k; ++k_index)
1751+
int_result->push_int_no_check(draw_vec[k_index]);
1752+
}
1753+
}
1754+
1755+
int64_t dim_buffer[2] = {k, num_draws};
1756+
result_SP->SetDimensions(2, dim_buffer);
1757+
1758+
return result_SP;
1759+
}
1760+
16591761
// (float)rmvnorm(integer$ n, numeric mu, numeric sigma)
16601762
EidosValue_SP Eidos_ExecuteFunction_rmvnorm(const std::vector<EidosValue_SP> &p_arguments, __attribute__((unused)) EidosInterpreter &p_interpreter)
16611763
{

eidos/eidos_test_builtins.h

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1160,6 +1160,16 @@ if (abs(m - 5) > 0.02) stop('Mismatch in expectation vs. realization of rlnorm()
11601160

11611161
// ***********************************************************************************************
11621162

1163+
setSeed(asInteger(clock() * 100000));
1164+
x = rmultinom(10, 1000000000, c(0.1, 0.5, 0.1, 5.0, 0.25));
1165+
if (!identical(colSums(x), rep(1000000000, 10))) stop('ERROR (rmultinom): (internal error) colSums incorrect');
1166+
r = rowSums(x);
1167+
norm = c(0.1, 0.5, 0.1, 5.0, 0.25) / sum(c(0.1, 0.5, 0.1, 5.0, 0.25));
1168+
expected = 1000000000 * norm * 10;
1169+
if (sum(abs(r - expected)) > 300000) stop('Mismatch in expectation vs. realization of rmultinom() - could be random chance (but very unlikely), rerun test');
1170+
1171+
// ***********************************************************************************************
1172+
11631173
setSeed(asInteger(clock() * 100000));
11641174
x = rmvnorm(100000, c(-1, 5), matrix(c(1, 0.2, 0.2, 2), nrow=2));
11651175
m1 = mean(x[,0]);

eidos/eidos_test_functions_statistics.cpp

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -961,6 +961,17 @@ void _RunFunctionDistributionTests(void)
961961
EidosAssertScriptSuccess("rlnorm(1, NAN, 100);", gStaticEidosValue_FloatNAN);
962962
EidosAssertScriptSuccess("rlnorm(1, 1, NAN);", gStaticEidosValue_FloatNAN);
963963

964+
// rmultinom()
965+
EidosAssertScriptRaise("rmultinom(0, 10, c(0.1, 1.0, 10.0));", 0, "requires n to be");
966+
EidosAssertScriptRaise("rmultinom(5, -1, c(0.1, 1.0, 10.0));", 0, "requires size to be");
967+
EidosAssertScriptRaise("rmultinom(5, 10, float(0));", 0, "requires prob to be");
968+
EidosAssertScriptRaise("rmultinom(5, 10, c(-0.1, 1.0, 10.0));", 0, "requires all probabilities in prob to be");
969+
EidosAssertScriptRaise("rmultinom(5, 10, c(INF, 1.0, 10.0));", 0, "requires all probabilities in prob to be");
970+
EidosAssertScriptRaise("rmultinom(5, 10, c(NAN, 1.0, 10.0));", 0, "requires all probabilities in prob to be");
971+
EidosAssertScriptRaise("rmultinom(5, 10, c(0.0, 0.0, 0.0));", 0, "requires the sum of prob");
972+
EidosAssertScriptSuccess_L("x = rmultinom(5, 10, c(0.1)); identical(x, matrix(rep(10, 5*1), ncol=5));", true);
973+
EidosAssertScriptSuccess_L("x = rmultinom(5, 0, c(0.1, 1.0, 10.0)); identical(x, matrix(rep(0, 5*3), ncol=5));", true);
974+
964975
// rmvnorm()
965976
EidosAssertScriptRaise("rmvnorm(0, c(0,2), matrix(c(10,3), nrow=2));", 0, "requires n to be");
966977
EidosAssertScriptRaise("rmvnorm(5, matrix(c(0,0)), matrix(c(10,3,3,2), nrow=2));", 0, "plain vector of length k");

0 commit comments

Comments
 (0)