Simple Bayesian Model (T-test) in R using either WinBUGs or JAGS.

WinBUGs and JAGS (Just Another Gibbs Sampler) are convenient and effective tools for estimating bayesian models for your data. Both use an openBUGs style syntax, although they do have their differences (e.g. censored data, or matrix operations). Even more convenient is that there are packages that allow you to pass BUGs syntax, call the Gibbs sampler, and pull the MCMC chain(s) back all within R.

For me, the choice of whether to use WinBUGS, or JAGS (or openBUGs for that matter) is more a question of what OS you are running. WinBUGs is a Windows application and JAGS runs on OSX. I've never attempted to run these on Linux, so if anyone knows, feel free to provide that in the comments. If you need to obtain WinBUGS or JAGS, click on the hyperlinks to download them.

The data I'm using comes from the 2010 US Census Data. It contains the median age for men and women per "place". In census language, "place" typically refers to cities or towns. Some cleaning and organization had to take place to get the information I wanted. All my code is included at the end.


I'm going to fit a Bayesian model to these data that models the average (aggregated) median age from cities and towns in the United States. Keep in mind this is an example for illustrative purposes. I'm going to model the data in a way that would allow me to conduct a "two sample t-test" using Bayesian methods. Let's first take a look at empirical densities of median age.


 Overall average median age looks to hover around 40. The data may not be normally distributed but we'll assume that parameterization for this example. Viewing distinct distributions for men and women provides the graphic below.
We see some distinction, but it is hard to tell if that is a "significant" difference. Next step is to model this data and find out.

We'll begin using JAGS for this model. To pass the BUGS syntax to JAGS from R, we create a small text file using the following code.

As a brief explanation of the code, if you are not familiar with BUGS syntax, is that I have taken the data vector "MedAge" and parameterized it as a normal distribution with mean paramater "mu" and precision parameter "prec". The precision is the inverse of variance so prec = 1/var. Thus, the standard devision would be sd = 1/sqrt(prec). I set the priors for the parameters after I model the data likelihood.

We need to inform JAGS of the data we are passing to it, as well as, the parameters we would like it to monitor. To do this we use the following code.


JAGS or WinBUGS don't automatically know what "n" or "MedAge" are so we need to store those in R and pass those arguments on to the JAGS function. The "parms" object contains the parameters we want JAGS to monitor as it runs the MCMC iterations. The call to JAGS is as follows,

ageGen.sim<-jags(data=jagdat,inits=NULL,parameters.to.save=parms,model.file='ageGenMod.txt',jags.seed=123,n.iter=11000,n.burnin=1000,n.chains=1,n.thin=1)

It is in this function that you tell it the size of the burnin and MCMC iterations. You can pass initial values to it using the "inits" argument, but it usually is fine to generate these on its own. Once jags starts running, it will display some information about model syntax along with a completion bar that monitors the MCMC process. It will look something like this.


Sometimes it will monitor the burnin separate from the rest of the iterations. To be honest, I don't know why it does or doesn't do that.

There are small difference in how you pass the BUGS code to WinBUGS.

The "bugs" function is almost identical to the "jags" function. The only real difference is that you need to tell it where to find the WinBUGS executable file. Rather than running in R, the bugs function will call WinBUGs, open it, and run the model there. Once the model has completed, then the results will appear in WinBUGs log window, which will contain trace plots and a model summary. Once you close WinBUGS, this information will get passed to R and you can proceed forward in R in the same way you would with JAGS.

If you wish to view trace plots for your MCMC simulations or autocorrelation diagnostics you can do the following.

 Trace plots are called when you plot an MCMC object. Raftery diagnostics near one mean that there is no need for thinning. Once the dependence factors start show larger values (I think about thinning with I > 5, but up to some interpretation) is when you need to consider thinning your iterations to remove autocorrelation.


To test for a difference in means you can difference the posterior distributions for the "mu" parameters (there are two; mu[1] & mu[2]). For example,

diffdist<-ageGen.sim$BUGSoutput$sims.matrix[,2]-ageGen.sim$BUGSoutput$sims.matrix[,3]
mean(diffdist>0)

In the Bayesian sense, there is a "significant" difference in means here. With posterior distributions, you can make probabilistic statements and the credible interval of the difference in means will come from "diffdist".

I hope that was helpful for people learning to do bayesian modeling in R. If you have any specific questions, I can answer them here.

R Code
### Bayesian Modeling ###
library(R2jags)

ageXsex<-read.csv("MedianAgebySex.csv",stringsAsFactors=F)
colnames(ageXsex)<-c("GeoID","GeoID2","Place","Med_Male_Female","Male","Female")
ageXsex<-ageXsex[-grep("--",ageXsex$Place),]


library(magrittr)
state<-strsplit(ageXsex$Place,split=",") %>% lapply(function(x) x[2]) %>% unlist()
ageXsex$state<-gsub("[[:space:]]","",state)
placesToRemove<-names(table(ageXsex$state)[which(table(ageXsex$state)==1)])
ageXsex<-ageXsex[-which(ageXsex$state %in% placesToRemove),]

library(ggplot2)

ageDist<-ggplot(aes(x=Med_Male_Female),data=ageXsex) + labs(title="Aggregate Median Age for Both Men and Women in the United States of America",x="Age in Years") + geom_density(color="green",fill="forestgreen",alpha=0.5)
ageDist
# Age Densities for Men and Women - Create tall data format for ggplot
ageDat<-tidyr::gather(ageXsex[,5:7],"Sex","MedianAge",Male:Female)
ageComp<-ggplot(aes(x=MedianAge,color=Sex,fill=Sex),data=ageDat) + labs(title="Aggregate Median Age for Men and Women in the United States of America",x="Age in Years") + geom_density(alpha=0.5)
ageComp + scale_color_manual(values=c("maroon3","blue")) + scale_fill_manual(values=c("maroon1","deepskyblue2"))

# Propose Bayesian Model

ageGenMod<-"
model{
  for(i in 1:n){
MedAge[i]~dnorm(mu[sex[i]],prec)
}
 #set priors
for(j in 1:2){
  mu[j]~dnorm(40,0.01)
}
prec~dgamma(1,0.01)
}
"
writeLines(ageGenMod,'ageGenMod.txt')

MedAge<-ageDat$MedianAge
sex<-as.factor(ageDat$Sex) %>% as.numeric()
n<-nrow(ageDat)

jagdat<-c('MedAge','sex','n')
parms<-c('mu','prec')

#might have to provide inits.
inits<-function(){
  (list('theta'=rgamma(10,1,3),'a'=1,'b'=3))
}

ageGen.sim<-jags(data=jagdat,inits=NULL,parameters.to.save=parms,model.file='ageGenMod.txt',jags.seed=123,n.iter=11000,n.burnin=1000,n.chains=1,n.thin=1)

samp<-as.mcmc(ageGen.sim)
plot(samp)
raftery.diag(samp)

diffdist<-ageGen.sim$BUGSoutput$sims.matrix[,2]-ageGen.sim$BUGSoutput$sims.matrix[,3]


### WinBUGS code for the same

ageGenMod<-function(){
  for(i in 1:n){
MedAge[i]~dnorm(mu[sex[i]],prec)
}
 #set priors
for(j in 1:2){
  mu[j]~dnorm(40,0.01)
}
prec~dgamma(1,0.01)
}

ageGenFile<-file.path(tempdir(),"ageGenMod.bug")
write.model(ageGenMod,ageGenFile)

ageGen.sim<-bugs(data,inits=NULL,parms,model.file="ageGenMod.bug",bugs.directory="C:/Program Files/WinBUGS14",n.iter=11000,n.burnin=1000,n.chains=1,n.thin=1,debug=T)








Comments

Popular posts from this blog

How to Get Started Playing Super Metroid / Link to the Past Crossover Randomizer.

Two-Step fix for rJava library installation on Mac OS

Structural Machine Learning in R: Predicting Probabilistic Offender Profiles using FBI's NIBRS Data