# Bayesian statistics in R book

If the data are consistent with a hypothesis, my belief in that hypothesis is strengthened. If you peek at your data after every single observation, there is a 49% chance that you will make a Type I error. BayesFactor: Computation of Bayes Factors for Common Designs. Yes, you might try to defend $$p$$-values by saying that it’s the fault of the researcher for not using them properly. The data that you need to give to this function is the contingency table itself (i.e., the crosstab variable above), so you might expect to use a command like `contingencyTableBF(crosstab)`. However, if you try this you’ll get an error message, because the function also needs to know which sampling plan you used (the `sampleType` argument). Much easier to understand, and you can interpret this using the table above. Before reading any further, I urge you to take some time to think about it. And because it assumes the experiment is over, it only considers two possible decisions. The bolded section is just plain wrong. MCMC for a model with temporal pseudoreplication. Installing JAGS on your computer. I hate to bring this up, but some statisticians would object to me using the word “likelihood” here. For example, I would avoid writing this: A Bayesian test of association found a significant result (BF=15.92). So I’m not actually introducing any “new” rules here; I’m just using the same rule in a different way. Obviously, this is a highly simplified story. The material presented here has been used by students of different levels and disciplines, including advanced undergraduates studying Mathematics and Statistics and students in graduate programs in Statistics, Biostatistics, Engineering, Economics, Marketing, Pharmacy, and Psychology. For instance, if we want to identify the best model, we could use the same commands that we used in the last section. The easiest way to do it with this data set is to use the x argument to specify one variable and the y argument to specify the other. You design a study comparing two groups. You’ve got a significant result!
Think Bayes: Bayesian Statistics in Python, an ebook by Allen B. Downey. If I’d chosen a 5:1 Bayes factor instead, the results would look even better for the Bayesian approach. Okay, I just know that some knowledgeable frequentists will read this and start complaining about this section. For example, if we wanted to get an estimate of the mean height of people, we could use our prior knowledge that people are generally between 5 and 6 feet tall … There’s nothing stopping you from including that information, and I’ve done so myself on occasion, but you don’t strictly need it. I listed it way back in Table 9.1, but I didn’t make a big deal out of it at the time and you probably ignored it. In the middle, we have the Bayes factor, which describes the amount of evidence provided by the data: $$\frac{P(h_1 | d)}{P(h_0 | d)} = \frac{P(d|h_1)}{P(d|h_0)} \times \frac{P(h_1)}{P(h_0)}$$ In this case, the alternative is that there is a relationship between species and choice: that is, they are not independent. There is a pdf version of this booklet available at: https://media.readthedocs.org/pdf/ I didn’t bother indicating whether this was “moderate” evidence or “strong” evidence, because the odds themselves tell you! Fortunately, no-one will notice. That’s not surprising, of course: that’s our prior. TensorFlow, on the other hand, is far more recent. However, for the sake of everyone’s sanity, throughout this chapter I’ve decided to rely on one R package to do the work. Specifically, let’s say our data look like this: The Bayesian test with hypergeometric sampling gives us this: The Bayes factor of 8:1 provides modest evidence that the labels were being assigned in a way that correlates gender with colour, but it’s not conclusive. The cake is a lie. A flexible extension of maximum likelihood. The example I used originally is the clin.trial data frame, which looks like this. 1974.
If we do that, we end up with the following table: This table captures all the information about which of the four possibilities are likely. Bayes’ rule cannot stop people from lying, nor can it stop them from rigging an experiment. $$\frac{P(h_1 | d)}{P(h_0 | d)} = \frac{0.75}{0.25} = 3$$ Actually, this equation is worth expanding on. Here are some possibilities: Which would you choose? Within the Bayesian framework, it is perfectly sensible and allowable to refer to “the probability that a hypothesis is true”. In his opinion, if we take $$p<.05$$ to mean there is “a real effect”, then “we shall not often be astray”. However, in this case I’m doing it because I want to use a model with more than one predictor as my example! Because of this, the anovaBF() function reports the output in much the same way. Johnson, Valen E. 2013. Statistical Methods for Research Workers. 2015. Finally, I devoted some space to talking about why I think Bayesian methods are worth using (Section 17.3). Again, I find it useful to frame things the other way around, so I’d refer to this as evidence of about 3 to 1 in favour of an effect of therapy. This seems so obvious to a human, yet it is explicitly forbidden within the orthodox framework. First, we have to go back and save the Bayes factor information to a variable: Let’s say I want to see the best three models. Again, we obtain a $$p$$-value less than 0.05, so we reject the null hypothesis. That might change in the future if Bayesian methods become standard and some task force starts writing up style guides, but in the meantime I would suggest using some common sense. The odds of 0.98 to 1 imply that these two models are fairly evenly matched. Let’s take a look: This looks very similar to the output we obtained from the regressionBF() function, and with good reason. As before, we use the formula argument to indicate what the full regression model looks like, and the data argument to specify the data frame.
You collected some data, the results weren’t conclusive, so now what you want to do is collect more data until the results are conclusive. In one sense, that’s true. By way of comparison, imagine that you had used the following strategy. Bayesian Data Analysis (3rd ed.). You are not allowed to use the data to decide when to terminate the experiment. This book is based on over a dozen years teaching a Bayesian Statistics course. It also incorporates some humour into the bundle, like Bayesian Statistics… And as a consequence, you’ve transformed the decision-making procedure into one that looks more like this: The “basic” theory of null hypothesis testing isn’t built to handle this sort of thing, not in the form I described back in Chapter 11. Most of the examples are simple, and similar to other online sources. It describes how a learner starts out with prior beliefs about the plausibility of different hypotheses, and tells you how those beliefs should be revised in the face of data. Welcome to Applied Statistics with R! All we do is change the subscript: $$P(h_1 | d) = \frac{P(d|h_1) P(h_1)}{P(d)}$$ The result is significant with a sample size of $$N=50$$, so wouldn’t it be wasteful and inefficient to keep collecting data? What this table is telling you is that, after being told that I’m carrying an umbrella, you believe that there’s a 51.4% chance that today will be a rainy day, and a 48.6% chance that it won’t. Edinburgh, UK: Oliver and Boyd. I mean, it sounds like a perfectly reasonable strategy, doesn’t it? You can type ?ttestBF to get more details. I don’t even disagree with them: it’s not at all obvious why a Bayesian ANOVA should reproduce (say) the same set of model comparisons that the Type II testing strategy uses. 1.1 About This Book. This book was originally (and currently) designed for use with STAT 420, Methods of Applied Statistics, at the University of Illinois at Urbana-Champaign. Bayes Bayes Bayes Bayes Bayes. Jeffreys, Harold.
All you have to do is be honest about what you believed before you ran the study, and then report what you learned from doing it. Afterwards, I provide a brief overview of how you can do Bayesian versions of chi-square tests (Section 17.6), $$t$$-tests (Section 17.7), regression (Section 17.8) and ANOVA (Section 17.9). This “conditional probability” is written $$P(d|h)$$, which you can read as “the probability of $$d$$ given $$h$$”. One variant that I find quite useful is `models / max(models)`. By “dividing” the models output by the best model (i.e., max(models)), what R is doing is using the best model (which in this case is drug + therapy) as the denominator, which gives you a pretty good sense of how close the competitors are. It looks like you’re stuck with option 4. But let’s keep things simple, shall we? You might notice that this equation is actually a restatement of the same basic rule I listed at the start of the last section. BIC is one of the Bayesian criteria used for Bayesian model selection, and tends to be one of the most popular criteria. That’s it! So what regressionBF() does is treat the intercept-only model as the null hypothesis, and print out the Bayes factors for all other models when compared against that null. To a frequentist, such statements are nonsense because “the theory is true” is not a repeatable event. What two numbers should we put in the empty cells? I’m not alone in doing this. Ultimately, isn’t that what you want your statistical tests to tell you? For instance, the model that contains the interaction term is almost as good as the model without the interaction, since the Bayes factor is 0.98. The second half of the chapter was a lot more practical, and focused on tools provided by the BayesFactor package.
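Dividing by `max(models)` works because every Bayes factor that regressionBF() or anovaBF() prints is computed against the same null model, so that shared denominator cancels when two of them are divided. The Python sketch below shows only that arithmetic. The model names echo the clin.trial example, but the Bayes factor values are made up for illustration and are not output from the BayesFactor package.

```python
# Hypothetical Bayes factors, each computed against the SAME intercept-only
# null model (made-up values, not real anovaBF() output).
bf_vs_null = {
    "drug": 10.0,
    "therapy": 0.5,
    "drug + therapy": 40.0,
    "drug + therapy + drug:therapy": 39.0,
}

# BF(i vs j) = [P(d|i) / P(d|null)] / [P(d|j) / P(d|null)] = P(d|i) / P(d|j),
# so dividing by the best model's BF compares every model to the best one.
best = max(bf_vs_null, key=bf_vs_null.get)
relative_to_best = {model: bf / bf_vs_null[best] for model, bf in bf_vs_null.items()}

# List the models from strongest to weakest relative to the best model.
for model, bf in sorted(relative_to_best.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<32} {bf:.3f}")
```

The best model always scores exactly 1 against itself; a value such as 0.975 for the interaction model would read as “about as good as the best model”, which is how the 0.98 in the text is interpreted.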
The odds in favour of the null here are only 0.35 to 1. Except when the sampling procedure is fixed by an external constraint, I’m guessing the answer is “most people have done it”. Others will claim that the evidence is ambiguous, and that you should collect more data until you get a clear significant result. In Sections 17.1 through 17.3 I talk about what Bayesian statistics are all about, covering the basic mathematical rules for how it works as well as an explanation for why I think the Bayesian approach is so useful. All the complexity of real-life Bayesian hypothesis testing comes down to how you calculate the likelihood $$P(d|h)$$ when the hypothesis $$h$$ is a complex and vague thing. This book was written as a companion for the course Bayesian Statistics from the Statistics with R specialization available on Coursera. The early chapters present the basic tenets of Bayesian thinking by use of familiar one- and two-parameter inferential problems. So you might write out a little table like this: It’s important to remember that each cell in this table describes your beliefs about what data $$d$$ will be observed, given the truth of a particular hypothesis $$h$$. In real life, how many people do you think have “peeked” at their data before the experiment was finished and adapted their subsequent behaviour after seeing what the data looked like? First, the concept of “statistical significance” is pretty closely tied with $$p$$-values, so it reads slightly strangely. What’s the Bayesian analog of this?
The best model is drug + therapy, so all the other models are being compared to that. As you might expect, the answers would be different again if it were the columns of the contingency table that the experimental design fixed. After introducing the theory, the book covers the analysis of contingency tables, t-tests, ANOVAs and regression. The $$\pm0\%$$ part is not very interesting: essentially, all it’s telling you is that R has calculated an exact Bayes factor, so the uncertainty about the Bayes factor is 0%. In any case, the data are telling us that we have moderate evidence for the alternative hypothesis. There’s a reason why, back in Section 11.5, I repeatedly warned you not to interpret the $$p$$-value as the probability that the null hypothesis is true. In fact, almost every textbook given to undergraduate psychology students presents the opinions of the frequentist statistician as the theory of inferential statistics, the one true way to do things. Second, the “BF=15.92” part will only make sense to people who already understand Bayesian methods, and not everyone does. The BDA_R_demos repository contains some R demos and additional notes for the book Bayesian Data Analysis, 3rd ed. by Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin (BDA3). Bayesian statistics for realistically complicated models. Compared to other intro to statistics books like Bayesian Statistics: The Fun Way, it is more practical because of this constant programming flow that accompanies the theory. Gudmund R. Iversen. $$\underbrace{\frac{P(h_1 | d)}{P(h_0 | d)}}_{\text{posterior odds}} = \underbrace{\frac{P(d|h_1)}{P(d|h_0)}}_{\text{Bayes factor}} \times \underbrace{\frac{P(h_1)}{P(h_0)}}_{\text{prior odds}}$$ Otherwise continue testing. Really bloody annoying, right? That’s not what 95% confidence means to a frequentist statistician. It’s a leap of faith, I know, but let’s run with it, okay? Um.
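The odds form of Bayes’ rule (posterior odds = Bayes factor × prior odds) is simple enough to check by hand. The sketch below is a minimal Python illustration, not part of any R package; the function names are hypothetical. It assumes only two hypotheses, $$h_0$$ and $$h_1$$, so the posterior probability of $$h_1$$ is odds / (1 + odds).

```python
def posterior_odds(bayes_factor: float, prior_odds: float = 1.0) -> float:
    """Odds form of Bayes' rule: posterior odds = Bayes factor * prior odds."""
    return bayes_factor * prior_odds


def posterior_probability_h1(bayes_factor: float, prior_odds: float = 1.0) -> float:
    """Convert posterior odds P(h1|d)/P(h0|d) into P(h1|d), assuming h0 and h1
    are the only two hypotheses under consideration."""
    odds = posterior_odds(bayes_factor, prior_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Equal prior odds: the posterior odds are just the Bayes factor.
    print(posterior_odds(15.92))                             # 15.92
    # Posterior odds of 3 correspond to P(h1|d) = 0.75 and P(h0|d) = 0.25.
    print(posterior_probability_h1(3.0))                     # 0.75
    # A sceptical prior (odds of 1:10 against h1) tempers a BF of 15.92.
    print(round(posterior_odds(15.92, prior_odds=0.1), 3))   # 1.592
```

The default `prior_odds=1.0` mirrors the convention of giving the null and the alternative equal prior weight, in which case the reported Bayes factor can be read directly as posterior odds.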
Using the data from Johnson (2013), we see that if you reject the null at $$p<.05$$, you’ll be correct about 80% of the time. Morey and Rouder (2015) built their Bayesian tests of association using the paper by Gunel and Dickey (1974); the specific test we used assumes that the experiment relied on a joint multinomial sampling plan; and, indeed, the Bayes factor of 15.92 is moderately strong evidence. – Portal. You’re very diligent, so you run a power analysis to work out what your sample size should be, and you run the study. Statistical Rethinking: A Bayesian Course with Examples in R and Stan builds your knowledge of and confidence in making inferences from data. … an error message. Guess what? The sampling plan actually does matter. The alternative model adds the interaction. 48: 19313–7. Specifically, I’m going to use the BayesFactor package written by Jeff Rouder and Rich Morey, which as of this writing is in version 0.9.10. (Version 0.6.1), http://CRAN.R-project.org/package=BayesFactor, http://en.wikipedia.org/wiki/Climate_of_Adelaide, http://www.imdb.com/title/tt0093779/quotes, http://about.abc.net.au/reports-publications/appreciation-survey-summary-report-2013/, http://knowyourmeme.com/memes/the-cake-is-a-lie, http://www.quotationspage.com/quotes/Ambrosius_Macrobius/. You conclude that there is no effect, and try to publish it as a null result. You guess that there might be an effect, and try to publish it as a “borderline significant” result. A theory of statistical inference that is so completely naive about humans that it doesn’t even consider the possibility that the researcher might look at their own data isn’t a theory worth having. However, if you’ve got a lot of possible models in the output, it’s handy to know that you can use the head() function to pick out the best few models. We could probably reject the null with some confidence! Look, I’m not dumb. That seems silly.
However, I have to stop somewhere, and so there’s only one other topic I want to cover: Bayesian ANOVA. If you give up and try a new project every time you find yourself faced with ambiguity, your work will never be published. However, one big practical advantage of the Bayesian approach relative to the orthodox approach is that it also allows you to quantify evidence for the null. The example I gave in the previous section is a pretty extreme situation. He would have marveled at the presentations in the book of many new and strong statistical and computer analyses. Sometimes it’s sensible to do this, even when it’s not the one with the highest Bayes factor. Or, to put it another way, the null hypothesis is that these two variables are independent. Consider the quote above by Sir Ronald Fisher, one of the founders of what has become the orthodox approach to statistics. You might guess that I’m not a complete idiot, and I try to carry umbrellas only on rainy days. In fact, you might have decided to take a quick look on Wikipedia and discovered that Adelaide gets an average of 4.4 days of rain across the 31 days of January. Now take a look at the column sums, and notice that they tell us something that we haven’t explicitly stated yet. To work out that there was a 0.514 probability of “rain”, all I did was take the 0.045 probability of “rain and umbrella” and divide it by the 0.0875 chance of “umbrella”.
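The umbrella calculation can be reproduced in a few lines. This minimal Python sketch assumes the numbers used in the text: a prior of 0.15 for rain, a 0.30 chance that I remember my umbrella on a rainy day, and a 0.05 chance on a dry day. That last value is not stated directly; it is the one implied by the 0.0875 marginal probability of “umbrella”, since 0.0875 minus 0.045 leaves 0.0425, which is 0.05 × 0.85.

```python
# Prior beliefs about January weather in Adelaide (values from the text).
prior = {"rainy": 0.15, "dry": 0.85}

# Likelihood of Dan carrying an umbrella under each hypothesis.
# P(umbrella | dry) = 0.05 is inferred from the 0.0875 marginal in the text.
likelihood_umbrella = {"rainy": 0.30, "dry": 0.05}

# Joint probabilities P(d, h) = P(d | h) * P(h) for every cell of the table.
joint = {}
for h, p_h in prior.items():
    p_u = likelihood_umbrella[h]
    joint[(h, "umbrella")] = p_u * p_h
    joint[(h, "no umbrella")] = (1 - p_u) * p_h

# Marginal probability of the observed data: sum the "umbrella" column.
p_umbrella = sum(joint[(h, "umbrella")] for h in prior)

# Posterior P(h | umbrella) = P(umbrella, h) / P(umbrella).
posterior = {h: joint[(h, "umbrella")] / p_umbrella for h in prior}

for h in prior:
    print(f"P({h} | umbrella) = {posterior[h]:.3f}")
```

Running it prints `P(rainy | umbrella) = 0.514` and `P(dry | umbrella) = 0.486`, matching the 51.4% and 48.6% figures in the text. The four joint probabilities also sum to 1, which is what makes the table a proper probability distribution over data and hypotheses.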
The help documentation for contingencyTableBF() gives this explanation: “the argument priorConcentration indexes the expected deviation from the null hypothesis under the alternative, and corresponds to Gunel and Dickey’s (1974) $$a$$ parameter.” As I write this I’m about halfway through the Gunel and Dickey paper, and I agree that setting $$a=1$$ is a pretty sensible default choice, since it corresponds to an assumption that you have very little a priori knowledge about the contingency table. In some of the later examples, you’ll see that this number is not always 0%. You already know that you’re analysing a contingency table, and you already know that you specified a joint multinomial sampling plan. Finally, if we turn to hypergeometric sampling in which everything is fixed, we get… The 15.9 part is the Bayes factor, and it’s telling you that the odds for the alternative hypothesis against the null are about 16:1. Unfortunately, the theory of null hypothesis testing as I described it in Chapter 11 forbids you from doing this. The reason is that the theory assumes that the experiment is finished and all the data are in. Similarly, we can work out how much belief to place in the alternative hypothesis using essentially the same equation. I wrote it that way deliberately, in order to help make things a little clearer for people who are new to statistics. In the meantime, let’s imagine we have data from the “toy labelling” experiment I described earlier in this section. In any case, note that all the numbers listed above make sense if the Bayes factor is greater than 1 (i.e., the evidence favours the alternative hypothesis). In essence, my point is this: Good laws have their origins in bad morals. I don’t know about you, but in my opinion an evidentiary standard that ensures you’ll be wrong on 20% of your decisions isn’t good enough.
Bayesian methods provide a powerful alternative to the frequentist methods that are ingrained in the standard statistics curriculum. The fact remains that, quite contrary to Fisher’s claim, if you reject at $$p<.05$$ you shall quite often go astray. In essence, the $$p<.05$$ convention is assumed to represent a fairly stringent evidentiary standard. Well, how true is that? They’ll argue it’s borderline significant. To me, it makes a lot more sense to turn the equation “upside down”, and report the amount of evidence in favour of the null. Burlington, MA: Academic Press. If that’s right, then Fisher’s claim is a bit of a stretch. In my experience that’s a pretty typical outcome. Focusing on the most standard statistical models and backed up by real datasets and an all-inclusive R (CRAN) package called bayess, the book provides an operational methodology for conducting Bayesian inference, rather than focusing on its theoretical and philosophical justifications. If I were to follow the same progression that I used when developing the orthodox tests you’d expect to see ANOVA next, but I think it’s a little clearer if we start with regression. To remind you of what that data set looks like, here are the first six observations: Back in Chapter 15 I proposed a theory in which my grumpiness (dan.grump) on any given day is related to the amount of sleep I got the night before (dan.sleep), and possibly to the amount of sleep our baby got (baby.sleep), though probably not to the day on which we took the measurement. In order to cut costs, you start collecting data, but every time a new observation arrives you run a $$t$$-test on your data. Given the difficulties in publishing an “ambiguous” result like $$p=.072$$, option number 3 might seem tempting: give up and do something else. $$P(\mbox{rainy}, \mbox{umbrella}) = P(\mbox{umbrella} | \mbox{rainy}) \times P(\mbox{rainy}) = 0.30 \times 0.15 = 0.045$$ However, notice that there’s no analog of the var.equal argument.
In other words, what we have written down is a proper probability distribution defined over all possible combinations of data and hypothesis. One way to approach this question is to try to convert $$p$$-values to Bayes factors, and see how the two compare. $$P(d,h) = P(d|h) P(h)$$ This distinction matters in some contexts, but it’s not important for our purposes. If we were being a bit more sophisticated, we could extend the example to accommodate the possibility that I’m lying about the umbrella. There’s a reason why almost every textbook on statistics is forced to repeat that warning. In this example, I’m going to pretend that you decided that dan.grump ~ dan.sleep + baby.sleep is the model you think is best. There are three different terms here that you should know. So the command is: So that’s pretty straightforward: it’s exactly what we’ve been doing throughout the book. For the chapek9 data, I implied that we designed the study such that the total sample size $$N$$ was fixed, so we should set sampleType = "jointMulti". What about the design in which the row totals (or column totals) are fixed? “Bayes Factors.” Journal of the American Statistical Association 90: 773–95. It’s because people desperately want that to be the correct interpretation. And the reason why “data peeking” is such a concern is that it’s so tempting, even for honest researchers. As we discussed earlier, the prior tells us that the probability of a rainy day is 15%, and the likelihood tells us that the probability of me remembering my umbrella on a rainy day is 30%. In other words, before being told anything about what actually happened, you think that there is a 4.5% probability that today will be a rainy day and that I will remember an umbrella. Suppose we want to test the main effect of drug. This chapter comes in two parts. Sounds like an absurd claim, right? In the meantime, I thought I should show you the trick for how I do this in practice.
As you can tell, the BayesFactor package is pretty flexible, and it can do Bayesian versions of pretty much everything in this book. If it were up to me, I’d have called the “positive evidence” category “weak evidence”. If you run an experiment and you compute a Bayes factor of 4, it means that the evidence provided by your data corresponds to betting odds of 4:1 in favour of the alternative. If you are a frequentist, the answer is “very wrong”. The (Intercept) term isn’t usually interesting, though it is highly significant. In this kind of data analysis situation, we have a cross-tabulation of one variable against another one, and the goal is to find out if there is some association between these variables. To me, anything in the range 3:1 to 20:1 is “weak” or “modest” evidence at best. You’ve found the regression model with the highest Bayes factor (i.e., dan.grump ~ dan.sleep), and you know that the evidence for that model over the next best alternative (i.e., dan.grump ~ dan.sleep + day) is about 16:1. If you’re a cognitive psychologist, you might want to check out Michael Lee and E.J. Without knowing anything else, you might conclude that the probability of January rain in Adelaide is about 15%, and the probability of a dry day is 85%. Once you’ve made the jump, you no longer have to wrap your head around counterintuitive definitions of $$p$$-values. $$\mbox{BF} = \frac{P(d|h_1)}{P(d|h_0)} = \frac{0.1}{0.2} = 0.5$$ It uses a pretty standard formula and data structure, so the command should look really familiar. I indicated exactly what the effect is (i.e., “a relationship between species and choice”) and how strong the evidence was. That’s because the citation itself includes that information (go check my reference list if you don’t believe me).
But the fact remains that if you want your $$p$$-values to be honest, then you either have to switch to a completely different way of doing hypothesis tests, or you must enforce a strict rule: no peeking. As usual we have a formula argument in which we specify the outcome variable on the left-hand side and the grouping variable on the right. This view is hardly unusual: in my experience, most practitioners express views very similar to Fisher’s. In my opinion, there’s a fairly big problem built into the way most (but not all) orthodox hypothesis tests are constructed. That’s not an unreasonable view to take, but in my view the problem is a little more severe than that. If [$$p$$] is below .02 it is strongly indicated that the [null] hypothesis fails to account for the whole of the facts. I find this hard to understand. To really get the full picture, though, it helps to add the row totals and column totals. Again, let’s not worry about the maths, and instead think about our intuitions. Bayesian statistical methods are based on the idea that one can assert prior probability distributions for parameters of interest. You use your “preferred” model as the formula argument, and then the output will show you the Bayes factors that result when you try to drop predictors from this model: Okay, so now you can see the results a bit more clearly. Unfortunately – in my opinion at least – the current practice in psychology is often misguided, and the reliance on frequentist methods is partly to blame. Now, sure, you know you said that you’d keep running the study out to a sample size of $$N=80$$, but it seems sort of pointless now, right? In this data set, we supposedly sampled 180 beings and measured two things. The BayesFactor package contains a function called anovaBF() that does this for you.
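If you want the verbal labels applied mechanically, the Python sketch below assumes the Bayes factor categories of the interpretation table (1 to 3 negligible, 3 to 20 positive, 20 to 150 strong, above 150 very strong, in the style of Kass and Raftery 1995). It flips any Bayes factor below 1 so that the evidence is reported in favour of the null, as the text suggests. The function name is hypothetical, and how exact boundary values such as 3 or 20 are assigned is my own choice.

```python
def describe_bayes_factor(bf: float) -> str:
    """Describe a Bayes factor (alternative vs null) in words.

    Assumed categories (Kass and Raftery style): 1-3 negligible,
    3-20 positive, 20-150 strong, above 150 very strong.
    Bayes factors below 1 are inverted so the evidence is reported
    in favour of the null.
    """
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf >= 1:
        favoured, odds = "alternative", bf
    else:
        favoured, odds = "null", 1.0 / bf
    if odds < 3:
        label = "negligible"
    elif odds < 20:
        label = "positive"
    elif odds < 150:
        label = "strong"
    else:
        label = "very strong"
    return f"{odds:.2f}:1 in favour of the {favoured} ({label} evidence)"


if __name__ == "__main__":
    print(describe_bayes_factor(15.92))  # the contingency-table example
    print(describe_bayes_factor(0.5))    # BF = 0.1 / 0.2 from the text
```

For example, the Bayes factor of 0.5 becomes “2.00:1 in favour of the null (negligible evidence)”, which is the “upside down” way of reporting it.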
So what we expect to see in our final table is some numbers that preserve the fact that “rain and umbrella” is slightly more plausible than “dry and umbrella”, while still ensuring that the numbers in the table add up. Instead, we tend to talk in terms of the posterior odds ratio. So, what’s the chance that you’ll make it to the end of the experiment and (correctly) conclude that there is no effect? – Ambrosius Macrobius. Good rules for statistical testing have to acknowledge human frailty. It’s such an appealing idea that even trained statisticians fall prey to the mistake of trying to interpret a $$p$$-value this way. Okay, let’s say you’ve settled on a specific regression model. Even the 3:1 standard, which most Bayesians would consider unacceptably lax, is much safer than the $$p<.05$$ rule. Analysts who need to incorporate their work into real-world decisions, as opposed to formal statistical inference for publication, will be especially interested. It has interfaces for many popular data analysis languages including Python, MATLAB, Julia, and Stata. The R interface for Stan is called rstan, and rstanarm is a front-end to rstan that allows regression models to be fit using a standard R regression model interface. So the command I would use is: Again, the Bayes factor is different, with the evidence for the alternative dropping to a mere 9:1. Let’s say that limit kicks in at $$N=1000$$ observations. The command that we need is. When does Dan carry an umbrella? If the $$t$$-test says $$p<.05$$ then you stop the experiment and report a significant result. The joint probability of the hypothesis and the data is written $$P(d,h)$$, and you can calculate it by multiplying the prior $$P(h)$$ by the likelihood $$P(d|h)$$. You are strictly required to follow these rules, otherwise the $$p$$-values you calculate will be nonsense. “Revised Standards for Statistical Evidence.” Proceedings of the National Academy of Sciences, no.
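The cost of peeking is easy to see by simulation. The Python sketch below is a simplified stand-in for the $$t$$-test scenario described in the text: it assumes normally distributed data with a known standard deviation of 1 (so it uses a $$z$$-test rather than a $$t$$-test), a true null (mean 0), and a maximum of 200 observations instead of 1000 to keep the run time short. Because of those simplifications its peeking error rate will not match the 49% figure exactly, but it should land far above the nominal 5%, while the no-peeking rate stays near 5%.

```python
import math
import random
from statistics import NormalDist

# Two-sided critical value for alpha = .05 (about 1.96).
Z_CRIT = NormalDist().inv_cdf(0.975)


def run_experiment(rng: random.Random, max_n: int, peek: bool) -> bool:
    """Simulate one study in which the null is true (mean 0, sd 1).

    With peek=True a z-test is run after every observation and the study
    stops at the first p < .05; with peek=False it is run once, at max_n.
    Returns True if the study ends with a (false) significant result.
    """
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if peek or n == max_n:
            z = total / math.sqrt(n)  # sample mean divided by its standard error 1/sqrt(n)
            if abs(z) > Z_CRIT:
                return True
    return False


def type_i_error_rate(max_n: int, peek: bool, n_sims: int = 2000, seed: int = 1) -> float:
    """Proportion of simulated null studies that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(run_experiment(rng, max_n, peek) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    print("no peeking:", type_i_error_rate(200, peek=False))
    print("peeking:   ", type_i_error_rate(200, peek=True))
```

Raising `max_n` pushes the peeking rate higher still, since every extra look is another chance of a spurious “significant” result.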
The easiest way is to use the regressionBF() function instead of lm(). Or if we look at line 1, we can see that the odds are about $$1.6 \times 10^{34}$$ that a model containing the dan.sleep variable (but no others) is better than the intercept-only model. The main effect of therapy is weaker, and the evidence here is only 2.8:1. 7.1.1 Definition of BIC. How do we do the same thing using Bayesian methods? It’s all so simple that I feel like an idiot even bothering to write these equations down, since all I’m doing is copying Bayes’ rule from the previous section. It’s a reasonable, sensible and rational thing to do. And this formula, folks, is known as Bayes’ rule. You have two possible hypotheses, $$h$$: either it rains today or it does not. To my mind, this write-up is unclear. Bayesian statistics are covered at the end of the book. How should you solve this problem? But until that day arrives, I stand by my claim that default Bayes factor methods are much more robust in the face of data analysis practices as they exist in the real world. So here’s our command: At this point, I hope you can read this output without any difficulty. When you report $$p<.05$$ in your paper, what you’re really saying is $$p<.08$$. To an ideological frequentist, this sentence should be meaningless. That being said, I can talk a little about why I prefer the Bayesian approach. How can that last part be true? The hypergeometric test output reads `(a=1) : 8.294321 ±0%`, with the Bayes factor type listed as `BFcontingencyTable, hypergeometric`. The model formula with the interaction term is `mood.gain ~ drug + therapy + drug:therapy`. Learning statistics with R: A tutorial for psychology students and other beginners. You can choose to report a Bayes factor less than 1, but to be honest I find it confusing. As it happens, I ran the simulations for this scenario too, and the results are shown as the dashed line in Figure 17.1.
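For reference, BIC is defined as $$\mathrm{BIC} = k \ln n - 2 \ln \hat{L}$$, where $$k$$ is the number of estimated parameters, $$n$$ the number of observations and $$\hat{L}$$ the maximised likelihood; lower is better. A well-known large-sample approximation (the Schwarz criterion discussed in Kass and Raftery’s “Bayes Factors” paper) turns a BIC difference into a rough Bayes factor, $$\mathrm{BF}_{10} \approx \exp\left((\mathrm{BIC}_0 - \mathrm{BIC}_1)/2\right)$$. The Python sketch below assumes you already have the maximised log-likelihoods; the example values are made up, and this approximation is not how the BayesFactor package computes its Bayes factors (it uses default priors instead).

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: BIC = k * ln(n) - 2 * ln(L_hat)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def approx_bayes_factor(bic_null: float, bic_alt: float) -> float:
    """Large-sample approximation BF_10 ~ exp((BIC_0 - BIC_1) / 2).

    Values above 1 favour the alternative model (the one with lower BIC).
    """
    return math.exp((bic_null - bic_alt) / 2.0)


if __name__ == "__main__":
    # Made-up maximised log-likelihoods for two nested models fitted to n = 100.
    bic_0 = bic(log_likelihood=-120.0, n_params=1, n_obs=100)
    bic_1 = bic(log_likelihood=-110.0, n_params=2, n_obs=100)
    print(f"BIC_0 = {bic_0:.2f}, BIC_1 = {bic_1:.2f}")
    print(f"approximate BF_10 = {approx_bayes_factor(bic_0, bic_1):.1f}")
```

The $$k \ln n$$ term is what penalises the extra parameter; here the gain of 10 log-likelihood units easily outweighs that penalty, so the approximate Bayes factor favours the larger model.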
Stan (also discussed in Richard’s book) is a statistical programming language famous for its MCMC framework. All the $$p$$-values you calculated in the past and all the $$p$$-values you will calculate in the future. Sounds nice, doesn’t it? Well, keep in mind that if you do, your Type I error rate at $$p<.05$$ just ballooned out to 8%. The trick to understanding this output is to recognise that if we’re interested in working out which of the 3 predictor variables are related to dan.grump, there are actually 8 possible regression models that could be considered. First, let’s remind ourselves of what the data were. Lee, Michael D, and Eric-Jan Wagenmakers. It may certainly be used elsewhere, but any references to “this course” in this book specifically refer to STAT 420. However, sequential analysis methods are constructed in a very different fashion to the “standard” version of null hypothesis testing. When writing up the results, my experience has been that there aren’t quite so many “rules” for how you “should” report Bayesian hypothesis tests. It was and is current practice among psychologists to use frequentist methods. Finally, the evidence against an interaction is very weak, at 1.01:1. Using the ttestBF() function, we can obtain a Bayesian analog of Student’s independent samples $$t$$-test using the following command: Notice that the format of this command is pretty standard. We are going to discuss Bayesian model selection using the Bayesian information criterion, or BIC. Reading the results off this table is sort of counterintuitive, because you have to read off the answers from the “wrong” part of the table. One or two reviewers might even be on your side, but you’ll be fighting an uphill battle to get it through. A First Course in Bayesian Statistical Methods.
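Where does the count of 8 come from? Each of the three candidate predictors is either in or out of the model, giving $$2^3 = 8$$ models, the intercept-only model included. This short Python sketch just enumerates those formulas as strings, using the variable names from the sleep and grumpiness example. It fits nothing; it only makes the count concrete.

```python
from itertools import combinations

# The three candidate predictors for dan.grump.
predictors = ["dan.sleep", "baby.sleep", "day"]

models = []
for k in range(len(predictors) + 1):                 # use 0, 1, 2 or 3 predictors
    for subset in combinations(predictors, k):
        rhs = " + ".join(subset) if subset else "1"  # "1" = intercept only
        models.append(f"dan.grump ~ {rhs}")

for formula in models:
    print(formula)
print(len(models), "models")  # 2 ** 3 = 8
```

With $$p$$ candidate predictors the same loop produces $$2^p$$ formulas, which is why output listing “all possible models” grows quickly and why trimming it to the best few models is handy.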
At this point, all the elements are in place. What happens? None of us are beyond temptation. I did so in order to be charitable to the $$p$$-value. In any case, by convention we like to pretend that we give equal consideration to both the null hypothesis and the alternative, in which case the prior odds equals 1, and the posterior odds becomes the same as the Bayes factor. You don’t have conclusive results, so you decide to collect some more data and re-run the analysis. When the study starts out you follow the rules, refusing to look at the data or run any tests. Well, like every other bloody thing in statistics, there’s a lot of different ways you could do it. The resulting Bayes factor of 15.92 to 1 in favour of the alternative hypothesis indicates that there is moderately strong evidence for the non-independence of species and choice. A guy carrying an umbrella on a summer day in a hot dry city is pretty unusual, and so you really weren’t expecting that. I’ll talk a little about Bayesian versions of the independent samples $$t$$-tests and the paired samples $$t$$-test in this section. In order to estimate the regression model we used the lm() function, like so: The hypothesis tests for each of the terms in the regression model were extracted using the summary() function as shown below: When interpreting the results, each row in this table corresponds to one of the possible predictors. Some reviewers will think that $$p=.072$$ is not really a null result. See also Bayesian Data Analysis course material. Our goal in developing the course was to provide an introduction to Bayesian inference in decision making without requiring calculus, with the book providing more details and background on Bayesian Inference. When a frequentist says the same thing, they’re referring to the same table, but to them “a likelihood function” almost always refers to one of the columns.
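As a quick numerical check of the point that equal prior odds make the posterior odds equal to the Bayes factor, here is a minimal base R sketch. It assumes the Bayes factor is defined as evidence for the alternative over the null and reuses the 15.92 reported for species and choice; the helper names are made up for illustration. A reader with sceptical prior odds (1 to 10 here, an arbitrary assumption) would multiply by those odds instead.

```r
# Sketch: posterior odds = Bayes factor x prior odds.
# Helper names are illustrative, not from any package.
posterior_odds <- function(bf, prior_odds = 1) bf * prior_odds
odds_to_prob   <- function(odds) odds / (1 + odds)

bf <- 15.92                                  # BF for non-independence of species and choice

even   <- posterior_odds(bf)                 # equal prior odds: posterior odds = 15.92
p_even <- odds_to_prob(even)                 # P(alternative | data), about 0.941

sceptic   <- posterior_odds(bf, prior_odds = 1 / 10)  # assumed sceptical prior odds
p_sceptic <- odds_to_prob(sceptic)           # about 0.614
```

The same Bayes factor leads the two readers to different posterior beliefs, which is why reporting the Bayes factor itself lets each reader supply their own prior odds.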
For the purposes of this section, I’ll assume you want Type II tests, because those are the ones I think are most sensible in general. See? To say the same thing using fancy statistical jargon, what I’ve done here is divide the joint probability of the hypothesis and the data $$P(d,h)$$ by the marginal probability of the data $$P(d)$$, and this is what gives us the posterior probability of the hypothesis given that we know the data have been observed. Here we will take the Bayesian perspective. Bayesian methods usually require more evidence before rejecting the null. Back in Chapter 13 I suggested you could analyse this kind of data using the independentSamplesTTest() function in the lsr package. The BayesFactor package contains a function called ttestBF() that is flexible enough to run several different versions of the $$t$$-test. In real life, people don’t run hypothesis tests every time a new observation arrives. The data argument is used to specify the data frame containing the variables. So the relevant comparison is between lines 2 and 1 in the table. A wise man, therefore, proportions his belief to the evidence. Similarly, I didn’t bother to indicate that I ran the “joint multinomial” sampling plan, because I’m assuming that the method section of my write up would make clear how the experiment was designed. part refers to the alternative hypothesis. As with most R commands, the output initially looks suspiciously similar to utter gibberish. The book would also be valuable to the statistical practitioner who wishes to learn more about the R language and Bayesian methodology. Given all of the above, what is the take home message? So let’s begin. The second type of statistical inference problem discussed in this book is the comparison between two means, discussed in some detail in the chapter on $$t$$-tests (Chapter 13). That’s the answer to our problem!
For example, Johnson (2013) presents a pretty compelling case that (for $$t$$-tests at least) the $$p<.05$$ threshold corresponds roughly to a Bayes factor of somewhere between 3:1 and 5:1 in favour of the alternative. So the only part that really matters is this line here: Ignore the r=0.707 part: it refers to a technical detail that we won’t worry about in this chapter. Instead, you should focus on the part that reads 1.754927. Other reviewers will agree it’s a null result, but will claim that even though some null results are publishable, yours isn’t. My bayesian-guru professor from Carnegie Mellon agrees with me on this. For example, if you want to run a Student’s $$t$$-test, you’d use a command like this: Like most of the functions that I wrote for this book, the independentSamplesTTest() function is very wordy. CRC (2013). The Gelman book isn't constrained to R but also uses Stan, a probabilistic programming language similar to BUGS or JAGS. All of them. Ultimately it depends on what you think is right. What Bayes factors should you report? And to be perfectly honest, I can’t answer this question for you. This wouldn’t have been a problem, except for the fact that the way that Bayesians use the word turns out to be quite different to the way frequentists do. In contrast, notice that the Bayesian test doesn’t even reach 2:1 odds in favour of an effect, and would be considered very weak evidence at best. In terms of odds, Bayes’ rule reads:
\[
\begin{array}{ccccc}\displaystyle
\frac{P(h_1 | d)}{P(h_0 | d)} &=& \displaystyle\frac{P(d|h_1)}{P(d|h_0)} &\times& \displaystyle\frac{P(h_1)}{P(h_0)} \\[6pt]
\uparrow && \uparrow && \uparrow \\[6pt]
\mbox{Posterior odds} && \mbox{Bayes factor} && \mbox{Prior odds}
\end{array}
\]
This will get you comfortable with the main theoretical concepts of statistical reasoning while also teaching you to code them using examples in the R programming language. Up to this point I’ve been talking about what Bayesian inference is and why you might consider using it.
The problem is that the word “likelihood” has a very specific meaning in frequentist statistics, and it’s not quite the same as what it means in Bayesian statistics. As with the other examples, I think it’s useful to start with a reminder of how I discussed ANOVA earlier in the book. 17.1 Probabilistic reasoning by rational agents. To an actual human being, this would seem to be the whole point of doing statistics: to determine what is true and what isn’t. “Bayes Factors for Independence in Contingency Tables.” Biometrika, 545–57. However, I haven’t had time to do this yet, nor have I made up my mind about whether it’s really a good idea to do this. Imagine you’re a really super-enthusiastic researcher on a tight budget who didn’t pay any attention to my warnings above. Applied Bayesian Statistics: With R and OpenBUGS Examples (Springer Texts in Statistics 98). Using the equations given above, the Bayes factor here would be: My point is the same one I made at the very beginning of the book in Section 1.1: the reason why we run statistical tests is to protect us from ourselves. Before moving on, it’s worth highlighting the difference between the orthodox test results and the Bayesian one. Honestly, there’s nothing wrong with it. If this is really what you believe about Adelaide rainfall (and now that I’ve told it to you, I’m betting that this really is what you believe) then what I have written here is your prior distribution, written $$P(h)$$: To solve the reasoning problem, you need a theory about my behaviour. Suppose, for instance, the posterior probability of the null hypothesis is 25%, and the posterior probability of the alternative is 75%. More to the point, the other two Bayes factors are both less than 1, indicating that they’re both worse than that model. In the rainy day problem, you are told that I really am carrying an umbrella.
Bayesian Inference is a way of combining information from data with things we think we already know. To write this as an equation: The Theory of Probability. Seems sensible, but unfortunately for you, if you do this all of your $$p$$-values are now incorrect. In other words, what we calculate is this: The rule in question is the one that talks about the probability that two things are true. And in fact you’re right: the city of Adelaide where I live has a Mediterranean climate, very similar to southern California, southern Europe or northern Africa. Specifically, I talked about using the contingencyTableBF() function to do Bayesian analogs of chi-square tests (Section 17.6), the ttestBF() function to do Bayesian $$t$$-tests (Section 17.7), the regressionBF() function to do Bayesian regressions, and finally the anovaBF() function for Bayesian ANOVA. As far as I can tell, Bayesians didn’t originally have any agreed upon name for the likelihood, and so it became common practice for people to use the frequentist terminology. Not just the $$p$$-values that you calculated for this study. Okay, at this point you might be thinking that the real problem is not with orthodox statistics, just the $$p<.05$$ standard. How do we run an equivalent test as a Bayesian? On the right hand side, we have the prior odds, which indicates what you thought before seeing the data. Writing BUGS models. You might be thinking that this is all pretty laborious, and I’ll concede that’s true. Stan, rstan, and rstanarm. Let’s start out with one of the rules of probability theory. The answer is shown as the solid black line in Figure 17.1, and it’s astoundingly bad. So you might have one sentence like this: All analyses were conducted using the BayesFactor package in R, and unless otherwise stated default parameter values were used.
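The damage done by peeking can be checked with a small simulation. The sketch below assumes a two-group design with no true difference, a look after every 10 observations per group (up to 100), and stopping at the first $$p<.05$$ from a Student $$t$$-test. These settings are arbitrary assumptions, not the ones behind Figure 17.1, so the rate it produces will not match the figures quoted in the text; the point is only that it lands well above the nominal 5%.

```r
# Hypothetical sketch of "peeking": under a true null, test after each new
# batch of data and stop as soon as p < .05.
set.seed(1)

peeking_study <- function(max_n = 100, batch = 10, alpha = 0.05) {
  x <- rnorm(max_n)  # group 1: no true difference between groups
  y <- rnorm(max_n)  # group 2
  for (n in seq(batch, max_n, by = batch)) {
    # Using the first n values mimics collecting the data in batches
    p <- t.test(x[1:n], y[1:n], var.equal = TRUE)$p.value
    if (p < alpha) return(TRUE)  # "significant": stop and report
  }
  FALSE                          # never crossed the threshold
}

n_sims <- 1000
false_positives <- replicate(n_sims, peeking_study())
type1_rate <- mean(false_positives)  # proportion of studies that "found" an effect
print(type1_rate)
```

Every one of those "significant" results is a Type I error, because the simulated groups never differ.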
This is because the contingencyTableBF() function needs one other piece of information from you: it needs to know what sampling plan you used to run your experiment. You’re breaking the rules: you’re running tests repeatedly, “peeking” at your data to see if you’ve gotten a significant result, and all bets are off. For example, if I deliberately sampled 87 humans and 93 robots, I would need to indicate that the fixedMargin of the contingency table is the "rows". Prior to running the experiment we have some beliefs $$P(h)$$ about which hypotheses are true. We run an experiment and obtain data $$d$$. Frequentist dogma notwithstanding, a lifetime of experience of teaching undergraduates and of doing data analysis on a daily basis suggests to me that most actual humans think that “the probability that the hypothesis is true” is not only meaningful, it’s the thing we care most about. Andrew Gelman et al. You should take this course if you are familiar with R and with Bayesian statistics at the introductory level, and work with or interpret statistical models and need to incorporate Bayesian methods. This is the Bayes factor: the evidence provided by these data is about 1.8:1 in favour of the alternative. All of them. I spelled out “Bayes factor” rather than truncating it to “BF” because not everyone knows the abbreviation. Using this notation, the table looks like this: The table we laid out in the last section is a very powerful tool for solving the rainy day problem, because it considers all four logical possibilities and states exactly how confident you are in each of them before being given any data. Also, you know for a fact that I am carrying an umbrella, so the column sum on the left must be 1 to correctly describe the fact that $$P(\mbox{umbrella})=1$$. According to the orthodox test, we obtained a significant result, though only barely.
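The rainy day calculation (joint probabilities, the marginal probability of the umbrella, then the posterior) fits in a few lines of base R. This section does not give the individual probabilities, so the ones below are assumptions, chosen so that the overall probability of an umbrella comes out at the 8.75% quoted in this section: $$P(\mbox{rainy})=0.15$$, $$P(\mbox{umbrella}|\mbox{rainy})=0.30$$ and $$P(\mbox{umbrella}|\mbox{dry})=0.05$$.

```r
# Assumed probabilities for the rainy day problem (illustrative values)
prior      <- c(rainy = 0.15, dry = 0.85)   # P(h)
likelihood <- c(rainy = 0.30, dry = 0.05)   # P(umbrella | h)

joint     <- likelihood * prior             # P(umbrella, h) = P(umbrella | h) P(h)
marginal  <- sum(joint)                     # P(umbrella) = 0.0875
posterior <- joint / marginal               # P(h | umbrella), i.e. Bayes' rule

round(posterior, 3)                         # rainy 0.514, dry 0.486
```

Dividing by the marginal is exactly the "column sum must be 1" step: once we know the umbrella is present, the two joint probabilities are rescaled so they add to 1.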
In most situations the intercept only model is one that you don’t really care about at all. You can’t compute a $$p$$-value when you don’t know the decision making procedure that the researcher used. At the bottom we have some technical rubbish, and at the top we have some information about the Bayes factors. In other words, what we want is the Bayes factor corresponding to this comparison: As it happens, we can read the answer to this straight off the table because it corresponds to a comparison between the model in line 2 of the table and the model in line 3: the Bayes factor in this case represents evidence for the null of 0.001 to 1. Mathematically, we say that: Consider the following reasoning problem: I’m carrying an umbrella. And so the reported $$p$$-value remains a lie. (2003), Carlin and Louis (2009), Press (2003), Gill (2008), or Lee (2004). You’ll get published, and you’ll have lied. Finally, in order to test an interaction effect, the null model here is one that contains both main effects but no interaction. In the line above, the text “Null, mu1-mu2 = 0” is just telling you that the null hypothesis is that there are no differences between means. Rich Morey and colleagues had the idea first. 3rd ed. Potentially the most information-efficient method to fit a statistical model. I’m not going to talk about those complexities in this book, but I do want to highlight that although this simple story is true as far as it goes, real life is messier than I’m able to cover in an introductory stats textbook. http://www.imdb.com/title/tt0093779/quotes. It is both concise and timely, and provides a good collection of overviews and reviews of important tools used in Bayesian statistical methods." For the analysis of contingency tables, the BayesFactor package contains a function called contingencyTableBF().
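Because a Bayes factor below 1 is easy to misread, it can help to flip it so the odds are always stated as something to 1. A minimal sketch, assuming the Bayes factor is expressed as alternative over null; the function name is hypothetical. Evidence for the null of 0.001 to 1 is the same as a Bayes factor of 1000 for the alternative, i.e. about 1000 to 1 against the null.

```r
# Sketch: state a Bayes factor (alternative over null) as odds of at least 1:1.
describe_bf <- function(bf) {
  if (bf >= 1) {
    sprintf("%.1f:1 in favour of the alternative", bf)
  } else {
    sprintf("%.1f:1 in favour of the null", 1 / bf)
  }
}

describe_bf(1.754927)   # "1.8:1 in favour of the alternative"
describe_bf(1 / 0.001)  # 0.001:1 for the null becomes "1000.0:1 in favour of the alternative"
describe_bf(0.25)       # "4.0:1 in favour of the null"
```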
I’ve rounded 15.92 to 16, because there’s not really any important difference between 15.92:1 and 16:1. For example, suppose that the likelihood of the data under the null hypothesis $$P(d|h_0)$$ is equal to 0.2, and the corresponding likelihood $$P(d|h_1)$$ under the alternative hypothesis is 0.1. In our reasonings concerning matter of fact, there are all imaginable degrees of assurance, from the highest certainty to the lowest species of moral evidence. Do you want to be an orthodox statistician, relying on sampling distributions and $$p$$-values to guide your decisions? You keep using that word. As I discussed back in Section 16.10, Type II tests for a two-way ANOVA are reasonably straightforward, but if you have forgotten that section it wouldn’t be a bad idea to read it again before continuing. We’ve talked about the idea of “probability as a degree of belief”, and what it implies about how a rational agent should reason about the world. To me, one of the biggest advantages to the Bayesian approach is that it answers the right questions. Even if you happen to arrive at the same decision as the hypothesis test, you aren’t following the decision process it implies, and it’s this failure to follow the process that is causing the problem. Your $$p$$-values are a lie. This is something of a surprising event: according to our table, the probability of me carrying an umbrella is only 8.75%. At some stage I might consider adding a function to the lsr package that would automate this process and construct something like a “Bayesian Type II ANOVA table” from the output of the anovaBF() function. The ideas I’ve presented to you in this book describe inferential statistics from the frequentist perspective. However, there have been some attempts to quantify the standards of evidence that would be considered meaningful in a scientific context.
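Plugging the likelihoods from this example into the definition of the Bayes factor (likelihood under the alternative divided by likelihood under the null) is a one-line calculation. Read literally, a Bayes factor of 0.5 for the alternative is the same evidence as 2 to 1 in favour of the null.

```r
# Likelihoods from the example in the text
lik_h0 <- 0.2             # P(d | h0)
lik_h1 <- 0.1             # P(d | h1)

bf_10 <- lik_h1 / lik_h0  # Bayes factor for the alternative over the null: 0.5
bf_01 <- 1 / bf_10        # the same evidence stated for the null: 2
```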
The Bayesian approach to statistics considers parameters as random variables that are characterised by a prior distribution which is combined with the traditional likelihood to obtain the posterior distribution of the parameter of interest on which the statistical inference is based. If you’re the kind of person who would choose to “collect more data” in real life, it implies that you are not making decisions in accordance with the rules of null hypothesis testing. It’s now time to consider what happens to our beliefs when we are actually given the data. From the perspective of these two possibilities, very little has changed. What’s all this about? The Bayes factor (sometimes abbreviated as BF) has a special place in Bayesian hypothesis testing, because it serves a similar role to the $$p$$-value in orthodox hypothesis testing: it quantifies the strength of evidence provided by the data, and as such it is the Bayes factor that people tend to report when running a Bayesian hypothesis test. Again, you need to specify the sampleType argument, but this time you also need to use the fixedMargin argument to specify whether you fixed the rows or the columns. On the left hand side, we have the posterior odds, which tells you what you believe about the relative plausibility of the null hypothesis and the alternative hypothesis after seeing the data. programs in statistics for which this book would be appropriate. In this case, it’s easy enough to see that the best model is actually the one that contains dan.sleep only (line 1), because it has the largest Bayes factor. MCMC for a model with binomial errors. Think of it like betting. So I should probably tell you what your options are! To see what I mean, here’s the original output: The best model corresponds to row 1 in this table, and the second best model corresponds to row 4. Even in the classical version of ANOVA there are several different “things” that ANOVA might correspond to.
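Since every Bayes factor in the regressionBF() output is computed against the same intercept only model, two models can be compared directly by dividing their Bayes factors: the common denominator cancels. The same ratio idea lies behind testing a single term, such as comparing a model with both main effects against one that also adds the interaction. The numbers below are made up purely for illustration and are not output from the text’s data.

```r
# Hypothetical Bayes factors, each against the intercept only model
bf_vs_intercept <- c(
  "dan.sleep"              = 50,
  "dan.sleep + day"        = 10,
  "dan.sleep + baby.sleep" = 5
)

best <- names(which.max(bf_vs_intercept))  # model with the largest BF

# Dividing BFs that share a denominator compares the models directly:
# the best model against each of the others (against itself it gives 1)
bf_best_vs_others <- bf_vs_intercept[[best]] / bf_vs_intercept
```

With these illustrative values, the dan.sleep only model would be favoured 5:1 over dan.sleep + day and 10:1 over dan.sleep + baby.sleep.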
The cake is a lie. So, what might you believe about whether it will rain today? The $$r$$ value here relates to how big the effect is expected to be according to the alternative. Bayesian statistics for realistically complicated models, Packages in R for carrying out Bayesian analysis, MCMC for a model with temporal pseudoreplication. Nevertheless, many people would happily accept $$p=.043$$ as reasonably strong evidence for an effect. On the other hand, let’s suppose you are a Bayesian. Now consider this … the scientific literature is filled with $$t$$-tests, ANOVAs, regressions and chi-square tests. However, that’s a pretty technical paper.
\[ P(h_0 | d) = \frac{P(d|h_0) P(h_0)}{P(d)} \]
I don’t know which of these hypotheses is true, but I do have some beliefs about which hypotheses are plausible and which are not. This is the new, fully-revised edition to the book Bayesian Core: A Practical Approach to Computational Bayesian Statistics. In the rainy day problem, the data corresponds to the observation that I do or do not have an umbrella. Back in Section 13.5 I discussed the chico data frame in which students’ grades were measured on two tests, and we were interested in finding out whether grades went up from test 1 to test 2. In any case, if you know what you’re looking for, you can look at this table and then report the results of the Bayesian analysis in a way that is pretty closely analogous to how you’d report a regular Type II ANOVA. It’s your call, and your call alone. What should you do? Well, consider the following scenario. This book is based on over a dozen years teaching a Bayesian Statistics course. Nevertheless, the problem tells you that it is true. Or, more helpfully, the odds are about 1000 to 1 against the null. That’s, um, quite a bit bigger than the 5% that it’s supposed to be.
Any time that you aren’t exactly sure about what the truth is, you should use the language of probability theory to say things like “there is an 80% chance that Theory A is true, but a 20% chance that Theory B is true instead”. I absolutely know that if you adopt a sequential analysis perspective you can avoid these errors within the orthodox framework. So the probability that both of these things are true is calculated by multiplying the two: Remember what I said in Section 16.10 about ANOVA being complicated. We shall not often be astray if we draw a conventional line at .05 and consider that [smaller values of $$p$$] indicate a real discrepancy. Unlike frequentist statistics, Bayesian statistics does allow us to talk about the probability that the null hypothesis is true. In real life, the things we actually know how to write down are the priors and the likelihood, so let’s substitute those back into the equation. Orthodox methods cannot tell you that “there is a 95% chance that a real change has occurred”, because this is not the kind of event to which frequentist probabilities may be assigned. That way, anyone reading the paper can multiply the Bayes factor by their own personal prior odds, and they can work out for themselves what the posterior odds would be. Just as we saw with the contingencyTableBF() function, the output is pretty dense. It is a well-written book on elementary Bayesian inference, and the material is easily accessible. Although the bolded passage is the wrong definition of a $$p$$-value, it’s pretty much exactly what a Bayesian means when they say that the posterior probability of the alternative hypothesis is greater than 95%. Book on Bayesian statistics for a "statistician". See Rouder et al.