bayesian-learning-with-R/BruceCampbell_ST540_HW_7.Rmd at master · NCSU-STATS/bayesian-learning-with-R · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
---
title: "Applied Bayesian Analysis : NCSU ST 540"
subtitle: "Homework 7"
author: "Bruce Campbell"
fontsize: 11pt
output: pdf_document
bibliography: BruceCampbell_ST540_HW_1.bib
---

---
```{r setup, include=FALSE,echo=FALSE}
rm(list = ls())
setwd("c:/e/brucebcampbell-git/bayesian-learning-with-R")
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(dev = 'pdf')
knitr::opts_chunk$set(cache=TRUE)
knitr::opts_chunk$set(tidy=TRUE)
knitr::opts_chunk$set(prompt=FALSE)
knitr::opts_chunk$set(fig.height=10)
knitr::opts_chunk$set(fig.width=8)
knitr::opts_chunk$set(warning=FALSE)
knitr::opts_chunk$set(message=FALSE)
knitr::opts_knit$set(root.dir = ".")
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=34),tidy=TRUE)
library(latex2exp)
library(pander)
library(ggplot2)
library(GGally)
```

In this assignment we perform Bayesian linear regression for the microbiome data on the
course website
```
https://www4.stat.ncsu.edu/~reich/ABA/assignments/homes.RData
```
Let $Y_i$ be the precipitation for observation $i$ and $X_{ij}$ equal one if OTU $j$ is present in sample $i$.

First, extract the 50 OTU with the largest absolute correlation between $X_{ij}$ and $Y_i$. Then fit a Bayesian linear regression model precipitation as the response and with these 50 covariates (and an intercept term) using two priors:

(1) Uninformative normal priors: $\beta_j \sim Normal(0, 100^2)$

(2) Hierarchical normal priors: $\beta_j | \tau \sim Normal(0, \tau^2)$ where $\tau^2 \sim InvGamma(0:01 , 0:01)$

(3) Bayesian LASSO: $\beta_j | \tau^2 \sim DE(0, \tau^2)$ where $\tau^2 \sim InvGamma(0:01, 0:01)$

Compare convergence and the posterior distribution of the regression coeffcients under these three priors. In particular, are the same OTU's significant in all three fits?

###Load data and select 50 most ocrrelated OUT variables.

```{r}
library(rjags)
library(coda)
library(modeest)
load("homes.RData")

X <- OTU!=0
Y <- homes$MeanAnnualPrecipitation

C_xy <- cor(X,Y)

top <- function(x, n){
    tail( order(x), n )
}
#One of the X is all 1's - resulting in an NA for the correlation.
indices <- top(C_xy,51)
#Remove the NA - I'm sure there's a more elegant way...
indices <- indices[1:50]

X <- X[, indices]

predictor.names <- names(OTU)[indices]
predictor.names[51] <- "intercept"

top.corr <- C_xy[indices]

#Y <- scale(Y)
#X <- scale(X)

DEBUG <- FALSE
if(DEBUG)
{
nSamples <- 100
n.chains <- 1
} else
{
nSamples <- 100
n.chains <- 1
}
```

We sample from our model after burn in. Not all of the diagnostic plots are not presented.
See the diagnostic plots in
```https://github.com/brucebcampbell/bayesian-learning-with-R.git```
we assesed convergence by;
- viewing the time sereies for the intercept and each of the predictors. For this we utilized the
coda package.
- ran multiple chains and viewed evaluated the autocorrelation plots.
- calculated the posterior means for the intercept and the $beta_j$
- utilized the mlv funtions in the modeest to calculate the MAP estimated of the posterior
modes
- compared the 95% prediction intervals for the intercepts against the p-values from the logistic
regression maximum likelihood model
- Gelman plots are optionally produced whem the nuymber of MCMC chains is greater than one.

Some of the code is run conditionally through the DEBUG flag. We ran under debug mode and noted that all models convereged.  All but one of the predictors were the same for all the modesl.  Depending on the run OTU_624 was swapped with another pridictor in the uninformative model.

This was an interesting project. I iterated several versions and had to debug working jags models to get things running well.  Also we encountrered a NaN in the correlation due to one of the predictors being all 1's.  If this was not accounted for the run times went up and convergence was bad.  The width of the credible intervals could be investigated. We'll be running a longer simulation as a follow on task for fun.  We set the precision of the model error to be 0.01 and 0.1 and got similar results.

###  Normal Uniformative
It's not specified what the prior variance is for $E[Y_j|X_j]$. We wull assume $Y|\beta \sim N(y \large\cdot \beta, \sigma^2)$ where $\sigma^2 \sim InvGamma(0.1,0.1)$

```{r}
n <- nrow(X)

sigma.beta  <- 100
inv.gamma.param  <- 0.1
p <- ncol(X)

model_string.normal_uniformative <- "model{
  # Likelihood
  for(i in 1:n){
    Y[i]   ~ dnorm(mu[i],inv.var)
    mu[i] <- intercept +inprod(X[i,],beta[])
  }

  # Prior for beta
  for(j in 1:p){
    beta[j] ~ dnorm(0,1/sigma.beta^2)
  }
  intercept ~ dnorm(0,1/sigma.beta^2)

  # Prior for the inverse variance
  inv.var   ~ dgamma(inv.gamma.param, inv.gamma.param)
  sigma     <- 1/sqrt(inv.var)
}"

model.normal_uniformative <- jags.model(textConnection(model_string.normal_uniformative), data = list(Y=Y,X=X,n=n,p=p,sigma.beta=sigma.beta, inv.gamma.param=inv.gamma.param),n.chains = n.chains)## Compiling model graph
update(model.normal_uniformative, nSamples, progress.bar="none"); # Burnin
samp.coeff.normal_uniformative <- coda.samples(model.normal_uniformative, variable.names=c("intercept","beta"),n.iter=2*nSamples)

sum.normal_uniformative <-  summary(samp.coeff.normal_uniformative)
quantiles<-sum.normal_uniformative$quantiles
left.05.quantile.sign  <- sign(quantiles[,1])==-1
right.95.quantile.sign <- sign(quantiles[,5])==1
significant <- xor(left.05.quantile.sign ,right.95.quantile.sign)
beta.significant <- quantiles[significant,]


pander(data.frame(beta.significant), caption = "significant normal uninformative ")

credible.widths <- beta.significant[,5]-beta.significant[,1]

predictor.names.significant <- predictor.names[significant]

pander(data.frame(predictor.names.significant,credible.widths), caption = "credible widths normal uninformative ")


if (DEBUG)
  {
  autocorr.plot(samp.coeff.normal_uniformative)

  plot(samp.coeff.normal_uniformative)

  #Sample again and estimate posterior means and MAP posterior modes.
  samp.coeff.normal_uniformative.jags <- jags.samples(model.normal_uniformative, variable.names = c("intercept","beta"), n.iter = nSamples, progress.bar = "none")
  posterior_means.normal_uniformative <- lapply(samp.coeff.normal_uniformative.jags, apply, 1, "mean")
  pander(posterior_means.normal_uniformative, caption = "posterior means second sample")

  posterior_modes.normal_uniformative <- lapply(samp.coeff.normal_uniformative.jags, apply, 1, "mlv")
  posterior_modes.normal_uniformative

  if(n.chains>1)
  {
  gelman.plot(samp.coeff)
  }
}
```

## Hierarchical Normal Priors
$\beta_j | \tau \sim Normal(0, \tau^2)$ where $\tau^2 \sim InvGamma(0:01 , 0:01)$

```{r}
beta.inv.gamma.param  <- 0.01
variance.inv.gamma.param  <- 0.1
p <- ncol(X)

model_string.normal_hierarchical <- "model{
  # Likelihood
  for(i in 1:n){
    Y[i]   ~ dnorm(mu[i],inv.var)
    mu[i] <- intercept +inprod(X[i,],beta[])
  }

  # Prior for beta
  for(j in 1:p){
    beta[j] ~ dnorm(0,beta.inv.gamma.param)
  }
  intercept ~ dnorm(0,beta.inv.gamma.param)

  # Prior for the inverse variance
  inv.var   ~ dgamma(variance.inv.gamma.param, variance.inv.gamma.param)
  sigma     <- 1/sqrt(inv.var)

  #Beta Prior for the inverse variance
  inv.var.beta   ~ dgamma(beta.inv.gamma.param, beta.inv.gamma.param)
}"

model.normal_hierarchical <- jags.model(textConnection(model_string.normal_hierarchical), data = list(Y=Y,X=X,n=n,p=p,variance.inv.gamma.param=variance.inv.gamma.param,beta.inv.gamma.param=beta.inv.gamma.param),n.chains = n.chains)## Compiling model graph

update(model.normal_hierarchical, nSamples, progress.bar="none"); # Burnin

samp.coeff.normal_hierarchical <-coda.samples(model.normal_hierarchical,variable.names=c("intercept","beta"),n.iter=2*nSamples)

sum.normal_hierarchical <-  summary(samp.coeff.normal_hierarchical)
quantiles<-sum.normal_hierarchical$quantiles
left.05.quantile.sign  <- sign(quantiles[,1])==-1
right.95.quantile.sign <- sign(quantiles[,5])==1
significant <- xor(left.05.quantile.sign ,right.95.quantile.sign)
beta.significant <- quantiles[significant,]

pander(data.frame(beta.significant), caption = "significant normal hierarchical ")

credible.widths <- beta.significant[,5]-beta.significant[,1]

predictor.names.significant <- predictor.names[significant]

pander(data.frame(predictor.names.significant,credible.widths), caption = "credible widths normal hierarchical ")


if (DEBUG)
{
  autocorr.plot(samp.coeff.normal_hierarchical)

  plot(samp.coeff.normal_hierarchical)

  #Sample again and estimate posterior means and MAP posterior modes.
  samp.coeff.normal_hierarchical.jags <- jags.samples(model.normal_hierarchical, variable.names = c("intercept","beta"), n.iter = nSamples, progress.bar = "none")
  posterior_means.normal_hierarchical <- lapply(samp.coeff.normal_hierarchical.jags, apply, 1, "mean")
  pander(posterior_means.normal_hierarchical, caption = "posterior means second sample")
  posterior_modes.normal_hierarchical <- lapply(samp.coeff.normal_hierarchical.jags, apply, 1, "mlv")
  posterior_modes.normal_hierarchical
  if(n.chains>1)
  {
  gelman.plot(samp.coeff)
  }
}
```

## BLASSO

```{r}
beta.inv.gamma.param  <- 0.01
variance.inv.gamma.param  <- 0.1
p <- ncol(X)

model_string.normal_blasso <- "model{
  # Likelihood
  for(i in 1:n){
    Y[i]   ~ dnorm(mu[i],inv.var)
    mu[i] <- intercept +inprod(X[i,],beta[])
  }

  # Prior for beta
  for(j in 1:p){
    beta[j] ~ ddexp(0,beta.inv.gamma.param)
  }
  intercept ~ ddexp(0,beta.inv.gamma.param)

  # Prior for the inverse variance
  inv.var   ~ dgamma(variance.inv.gamma.param, variance.inv.gamma.param)
  sigma     <- 1/sqrt(inv.var)

  #Beta Prior for the inverse variance
  inv.var.beta   ~ dgamma(beta.inv.gamma.param, beta.inv.gamma.param)
}"

model.normal_blasso <- jags.model(textConnection(model_string.normal_blasso), data = list(Y=Y,X=X,n=n,p=p,variance.inv.gamma.param=variance.inv.gamma.param,beta.inv.gamma.param=beta.inv.gamma.param),n.chains = n.chains)## Compiling model graph

update(model.normal_blasso, nSamples, progress.bar="none"); # Burnin

samp.coeff.normal_blasso <- coda.samples(model.normal_blasso,variable.names=c("intercept","beta"),n.iter=2*nSamples)

sum.normal_blasso <-  summary(samp.coeff.normal_blasso)
quantiles<-sum.normal_blasso$quantiles
left.05.quantile.sign  <- sign(quantiles[,1])==-1
right.95.quantile.sign <- sign(quantiles[,5])==1
significant <- xor(left.05.quantile.sign ,right.95.quantile.sign)
beta.significant <- quantiles[significant,]

pander(data.frame(beta.significant), caption = "significant normal BLASSO ")

credible.widths <- beta.significant[,5]-beta.significant[,1]

predictor.names.significant <- predictor.names[significant]

pander(data.frame(predictor.names.significant,credible.widths), caption = "credible widths normal BLASSO ")

if (DEBUG)
{
  autocorr.plot(samp.coeff.normal_blasso)

  plot(samp.coeff.normal_blasso)

  #Sample again and estimate posterior means and MAP posterior modes.
  samp.coeff.normal_blasso.jags <- jags.samples(model.normal_blasso, variable.names = c("intercept","beta"), n.iter = nSamples, progress.bar = "none")
  posterior_means.normal_blasso <- lapply(samp.coeff.normal_blasso.jags, apply, 1, "mean")
  pander(posterior_means.normal_blasso, caption = "posterior means second sample")
  posterior_modes.normal_blasso <- lapply(samp.coeff.normal_blasso.jags, apply, 1, "mlv")
  posterior_modes.normal_blasso
  if(n.chains>1)
  {
  gelman.plot(samp.coeff)
  }
}
```