Cramming Sam's top tips from chapter 3
Problems with NHST
- Many scientists misunderstand NHST. Some commonly misunderstood points about significance testing are:
  - A significant effect is not necessarily an important one.
  - A non-significant result does not mean that the null hypothesis is true.
  - A significant result does not mean that the null hypothesis is false.
- NHST encourages all-or-nothing thinking whereby an effect with a p-value just below 0.05 is perceived as important whereas one with a p-value just above 0.05 is perceived as unimportant.
- NHST is biased by researchers deviating from their initial sampling frame (e.g., by stopping data collection earlier than planned).
- There are lots of ways that scientists can influence the p-value. These are known as researcher degrees of freedom and include selective exclusion of data, fitting different statistical models but reporting only the one with the most favourable results, stopping data collection at a point other than that decided at the study’s conception, and including only control variables that influence the p-value in a favourable way.
- Incentive structures in science that reward publication of significant results also reward the use of researcher degrees of freedom.
- p-hacking refers to practices that lead to the selective reporting of significant p-values, most commonly trying multiple analyses and reporting only the one that yields significant results.
- Hypothesizing after the results are known (HARKing) occurs when scientists present a hypothesis that was made after data analysis as though it were made at the study’s conception.
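The point about stopping data collection at an unplanned point can be made concrete with a small simulation. This sketch is illustrative only: it assumes a one-sample z-test with a known population SD of 1 and data generated under a true null (mean 0), so every significant result is a false positive. The function names and the choice of testing after every 10 observations are hypothetical, not taken from the text.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_sims: int, max_n: int, peek_every: int, seed: int = 1) -> float:
    """Proportion of simulated studies reaching p < .05 when the null is true.

    Data come from N(0, 1), and the z-test assumes the SD (1) is known.
    The test is run after every `peek_every` observations and data
    collection stops at the first p < .05 (optional stopping). With
    peek_every == max_n there is a single planned test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)
            if n % peek_every == 0:
                z = total / math.sqrt(n)  # sample mean divided by its SE, 1/sqrt(n)
                if two_sided_p(z) < 0.05:
                    hits += 1
                    break
    return hits / n_sims


fixed = false_positive_rate(n_sims=2000, max_n=100, peek_every=100)
peeking = false_positive_rate(n_sims=2000, max_n=100, peek_every=10)
print(f"single planned test at n = 100: {fixed:.3f}")
print(f"test every 10, stop at p < .05: {peeking:.3f}")
```

With one planned test the false-positive rate should sit near 5%; testing repeatedly and stopping at the first p < .05 pushes it well above that, even though each individual test uses the usual .05 threshold.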
Effect sizes and meta-analysis
- An effect size is a way of measuring the size of an observed effect, usually relative to the background error.
- Cohen’s d is the difference between two means divided by the standard deviation of the control group, or by a pooled estimate based on the standard deviations of both groups.
- Pearson’s correlation coefficient, r, is a versatile effect size measure that can quantify the strength (and direction) of the relationship between two continuous variables, and can also quantify the difference between groups along a continuous variable. It ranges from −1 (a perfect negative relationship) through 0 (no relationship at all) to +1 (a perfect positive relationship).
- The odds ratio is the ratio of the odds of an event occurring in one category compared to another. An odds ratio of 1 indicates that the odds of a particular outcome are equal in both categories.
- Estimating the size of an effect in the population by combining effect sizes from different studies that test the same hypothesis is called meta-analysis.
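Both versions of Cohen's d described above can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the function name `cohens_d`, the `pooled` flag and the sample scores are illustrative assumptions. The pooled SD here uses the usual weighting by degrees of freedom, sqrt(((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)).

```python
import math
import statistics


def cohens_d(experimental: list[float], control: list[float], pooled: bool = True) -> float:
    """Difference between two group means divided by a standard deviation.

    pooled=False divides by the control group's SD; pooled=True divides by
    the pooled SD of both groups, weighted by their degrees of freedom.
    """
    diff = statistics.mean(experimental) - statistics.mean(control)
    if not pooled:
        return diff / statistics.stdev(control)
    n1, n2 = len(experimental), len(control)
    s1, s2 = statistics.stdev(experimental), statistics.stdev(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return diff / pooled_sd


experimental = [6, 7, 8, 9, 10]  # hypothetical scores
control = [2, 4, 6, 8, 10]
print(round(cohens_d(experimental, control, pooled=False), 2))  # control-group SD
print(round(cohens_d(experimental, control), 2))                # pooled SD
```

Here the means differ by 2. Dividing by the control SD (about 3.16) gives d ≈ 0.63, while the pooled SD of 2.5 gives d = 0.8, so the choice of denominator matters when the groups' spreads differ.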
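The sketch below computes Pearson's r from its definition (the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares) and an odds ratio from a 2 × 2 table of counts. It assumes the table records events and non-events for two categories; the function names and counts are hypothetical. Coding group membership as 0 and 1 and correlating it with the outcome (the point-biserial correlation) is one way r can express a difference between groups.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient computed from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odds_ratio(events_1: int, non_events_1: int, events_2: int, non_events_2: int) -> float:
    """Odds of the event in category 1 divided by the odds in category 2."""
    return (events_1 / non_events_1) / (events_2 / non_events_2)


# Strength and direction of a relationship between two continuous variables
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative relationship

# Group difference: group coded 0/1, outcome continuous
group = [0, 0, 0, 1, 1, 1]
score = [3, 4, 5, 6, 7, 8]
print(round(pearson_r(group, score), 3))

# 30 of 40 in category 1 have the event (odds 3); 15 of 30 in category 2 (odds 1)
print(odds_ratio(30, 10, 15, 15))
```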
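The summary above does not specify a meta-analytic model, so the following shows only one common option: a fixed-effect, inverse-variance weighted average. It assumes every study estimates the same population effect and reports its effect size on a common scale (for example, d) together with its sampling variance. Random-effects models, which let the true effect vary between studies, are a widely used alternative. The function name and the study values are hypothetical.

```python
import math


def fixed_effect_meta(effects: list[float], variances: list[float]) -> tuple[float, tuple[float, float]]:
    """Pool study effect sizes with inverse-variance weights (fixed-effect model).

    More precise studies (smaller sampling variance) receive more weight.
    Returns the pooled estimate and an approximate 95% confidence interval.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)


# Three hypothetical studies testing the same hypothesis (effect sizes as d)
effects = [0.30, 0.55, 0.10]
variances = [0.04, 0.02, 0.08]
estimate, (lower, upper) = fixed_effect_meta(effects, variances)
print(f"pooled d = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```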
Summary of the Bayesian process
1. Define a prior that represents your subjective beliefs about a hypothesis (the prior is a single value) or a parameter (the prior is a distribution of possibilities). The prior can range from completely uninformative, which means that you are prepared to believe pretty much anything, to strongly informative, which means that your initial beliefs are quite narrow and specific.
2. Inspect the relevant data. In our frivolous example, this was observing the behaviour of your crush. In science, the process would be a bit more formal than that.
3. Bayes’ theorem is used to update the prior distribution with the data. The result is a posterior probability, which can be a single value representing your new belief in a hypothesis, or a distribution that represents your beliefs in plausible values of a parameter, after seeing the data.
4. A posterior distribution can be used to obtain a point estimate (perhaps the peak of the distribution) or an interval estimate (a boundary containing a certain percentage, for example 95%, of the posterior distribution) of the parameter in which you were originally interested.
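The four steps can be traced in code for a simple case. This sketch assumes the parameter is a proportion θ (say, the probability of a particular behaviour), a Beta prior and binomial data, and it approximates the posterior on a grid of θ values rather than using any Bayesian library; the function name, its parameters and the example numbers are illustrative. The point estimate is the peak (mode) of the posterior and the interval is its central 95%.

```python
def grid_posterior(successes: int, trials: int, prior_a: float = 1.0,
                   prior_b: float = 1.0, steps: int = 1000):
    """Grid approximation of the posterior for a proportion theta."""
    # Grid midpoints avoid evaluating the densities exactly at 0 or 1
    thetas = [(i + 0.5) / steps for i in range(steps)]
    # Step 1: prior, an unnormalised Beta(prior_a, prior_b) density
    prior = [t ** (prior_a - 1) * (1 - t) ** (prior_b - 1) for t in thetas]
    # Step 2: likelihood of the observed data (binomial kernel)
    likelihood = [t ** successes * (1 - t) ** (trials - successes) for t in thetas]
    # Step 3: Bayes' theorem, posterior proportional to prior x likelihood
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    posterior = [u / total for u in unnorm]
    # Step 4: point estimate (peak) and central 95% interval
    mode = thetas[max(range(steps), key=posterior.__getitem__)]
    cumulative, lower, upper = 0.0, None, None
    for t, p in zip(thetas, posterior):
        cumulative += p
        if lower is None and cumulative >= 0.025:
            lower = t
        if upper is None and cumulative >= 0.975:
            upper = t
    return mode, (lower, upper)


# Uninformative (uniform) prior; 7 of 10 observations show the behaviour
mode, (lower, upper) = grid_posterior(successes=7, trials=10)
print(f"peak = {mode:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Making the prior more informative (for example `prior_a=20, prior_b=20`, which concentrates belief near 0.5) pulls the estimate toward 0.5, showing how the prior and the data jointly shape the posterior. For large numbers of trials the raw powers can underflow, in which case working with log densities is safer.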
- Bayes’ theorem can be used to update your prior belief in a hypothesis based on the observed data.
- The probability of the alternative hypothesis given the data relative to the probability of the null hypothesis given the data is quantified by the posterior odds.
- A Bayes factor is the ratio of the probability of the data given the alternative hypothesis to that for the null hypothesis. A Bayes factor greater than 1 suggests that the observed data are more likely given the alternative hypothesis than given the null. Values less than 1 suggest the opposite. Values between 1 and 3 reflect evidence for the alternative hypothesis that is ‘barely worth mentioning’, values between 3 and 10 reflect evidence that ‘has substance’, and values greater than 10 are ‘strong’ evidence (Jeffreys, 1961).
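To make the Bayes factor and posterior odds concrete, the sketch below compares two hypotheses about a proportion θ: the null H0: θ = 0.5 and an alternative H1 under which every value of θ between 0 and 1 is equally plausible beforehand (a uniform prior). Both hypotheses, the function names and the handling of the label boundaries are assumptions made for illustration. Under the uniform prior, the probability of k successes in n trials integrates to exactly 1/(n + 1). The posterior odds then follow from the odds form of Bayes' theorem: posterior odds = Bayes factor × prior odds.

```python
import math


def bf10_binomial(successes: int, trials: int) -> float:
    """Bayes factor for H1: theta ~ Uniform(0, 1) against H0: theta = 0.5."""
    p_data_h1 = 1.0 / (trials + 1)  # binomial likelihood averaged over the uniform prior
    p_data_h0 = math.comb(trials, successes) * 0.5 ** trials
    return p_data_h1 / p_data_h0


def posterior_odds(bf10: float, prior_odds: float = 1.0) -> float:
    """Posterior odds of H1 versus H0: Bayes factor times prior odds."""
    return bf10 * prior_odds


def jeffreys_label(bf10: float) -> str:
    """Verbal label: 1-3 barely worth mentioning, 3-10 has substance, >10 strong."""
    if bf10 < 1:
        return "favours the null"
    if bf10 < 3:
        return "barely worth mentioning"
    if bf10 < 10:
        return "has substance"
    return "strong"


bf = bf10_binomial(successes=9, trials=10)
print(f"BF10 = {bf:.2f} ({jeffreys_label(bf)})")
odds = posterior_odds(bf, prior_odds=0.25)  # prior belief: H0 four times as likely as H1
print(f"posterior odds = {odds:.2f}, P(H1 | data) = {odds / (1 + odds):.2f}")
```

A sceptical prior (odds of 0.25) tempers the conclusion: the same Bayes factor yields more modest posterior odds than it would with prior odds of 1.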