Lecture 5/6

Cookbook of commonly used statistical tests

  • Comparing two groups
  • Comparing more than two groups

Introduction to Biostatistics

By: Peter Kamerman (view at painblogR)

A quick recap on:
p-values & hypothesis testing

Definition of a p-value

“The probability of observing a result as great as (or greater than) you observed if the null hypothesis is true.”


If the data are unlikely under the null hypothesis (small p-value), then either we observed a low-probability event or the null hypothesis is not true.

…only one of these can be correct.
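
As a concrete illustration of the definition, here is a minimal sketch in R. The scenario is made up for illustration and is not from the lecture: a coin is tossed 20 times and lands heads 15 times, and the null hypothesis is that the coin is fair. The variable names are likewise assumptions.

```r
# Hypothetical example: 15 heads in 20 tosses; null hypothesis: fair coin (p = 0.5)
n_tosses <- 20
heads_observed <- 15

# P(X >= 15) under the null: the probability of a result as great as,
# or greater than, the one observed (one-sided)
p_value <- pbinom(heads_observed - 1, size = n_tosses, prob = 0.5,
                  lower.tail = FALSE)
p_value

# The same one-sided p-value from the built-in exact binomial test
binom.test(heads_observed, n_tosses, p = 0.5,
           alternative = "greater")$p.value
```

Under these assumed numbers the p-value is roughly 0.02: such a result would be fairly unusual for a fair coin, so either a low-probability event occurred or the coin is not fair.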

Hypothesis testing

Jerzy Neyman and Egon Pearson:

  • Works by setting a threshold \( (\alpha) \) that the p-value must fall below for the null hypothesis to be rejected.

  • You state a null hypothesis and an alternative hypothesis and use the threshold p-value as a decision rule.

  • The p-value threshold is chosen to control the rate of false-positive inferences (usually set at \( \alpha = 0.05 \)).

  • You have to abide by the statistical test's 'decision' if you wish to protect against false-positive errors.
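
The way the threshold controls false positives can be sketched with a small simulation. This is an illustrative sketch only: the sample size, number of simulations, and seed are arbitrary assumptions. Both samples are drawn from the same population, so every 'significant' result is a false positive.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
alpha <- 0.05  # decision threshold
n_sims <- 2000 # number of simulated experiments (assumed)

# Both samples come from the same population, so the null hypothesis is true
p_values <- replicate(n_sims, {
  a <- rnorm(10)
  b <- rnorm(10)
  t.test(a, b, var.equal = TRUE)$p.value
})

# Proportion of experiments that would wrongly reject the null hypothesis
false_positive_rate <- mean(p_values < alpha)
false_positive_rate
```

The proportion should land close to \( \alpha \), which is what sticking to the decision rule buys you in the long run.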

Which test?

[Figure: cheat sheet for choosing a statistical test, part 1 (chunk cheatsheet_1)]

Which test?

[Figure: cheat sheet for choosing a statistical test, part 2 (chunk cheatsheet_2)]

Parametric tests

Experimental groups may differ for two reasons:

  1. Real effect of intervention

  2. Random variation between samples drawn from the same population

You must decide whether:

[1] is large enough relative to [2] to conclude
that a treatment had an effect.

Parametric tests

Calculate the ratio of variances

  1. between-group variance \( (\sigma^2_{bet}) \)

  2. within-group variance \( (\sigma^2_{with}) \)

If samples are from the same population,
the variances will be similar, and…

\( \frac{\sigma^2_{bet}}{\sigma^2_{with}} \rightarrow 1 \)

The degrees of freedom (df) determine the critical value that the ratio (the test statistic) must reach for the null hypothesis to be rejected.
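
The variance ratio and its critical value can be sketched as follows. This is a simulation under assumed settings (three groups of ten, standard normal data, an arbitrary seed), not an analysis from the lecture. All groups are drawn from the same population, so the ratio should hover around 1 on average.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
k <- 3       # number of groups (assumed)
n <- 10      # observations per group (assumed)

one_ratio <- function() {
  y <- rnorm(k * n)  # every group comes from the same population
  g <- gl(k, n)      # group labels
  var_between <- n * var(tapply(y, g, mean))  # between-group variance estimate
  var_within <- mean(tapply(y, g, var))       # pooled within-group variance (equal n)
  var_between / var_within
}

ratios <- replicate(5000, one_ratio())
mean(ratios)  # close to 1 when there is no real effect

# The degrees of freedom set the critical value for alpha = 0.05
critical_value <- qf(0.95, df1 = k - 1, df2 = k * (n - 1))
critical_value
mean(ratios > critical_value)  # about 5% of null ratios exceed it
```

With more groups or larger samples the degrees of freedom change, and so does the critical value returned by `qf()`.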

Assumptions for parametric tests

  • The distribution of the data in the population is Gaussian

  • Equal variance across groups
    (the basis on which the test statistic is calculated)

  • The errors are independent
    (the 'error' refers to the difference between each value and its group mean)

  • Data are unmatched (for unpaired data) / matching is effective (for repeated measures data)
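
Some of these assumptions can be checked informally in R. The sketch below uses the built-in `sleep` data only because it appears elsewhere in this lecture. The choice of `shapiro.test()` and `var.test()` is an assumption, one reasonable option among several, and plots are often more informative than these tests with small samples.

```r
data(sleep)

# Split the outcome by group
by_group <- split(sleep$extra, sleep$group)

# Normality within each group (Shapiro-Wilk test)
normality_p <- sapply(by_group, function(x) shapiro.test(x)$p.value)
normality_p

# Equal variance across the two groups (F test of the variance ratio)
equal_var <- var.test(extra ~ group, data = sleep)
equal_var$p.value
```

A large p-value here does not prove that an assumption holds; it only means the data give no strong evidence against it.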

Student's t-test

First have a look at the data

data(sleep)
head(sleep)
  extra group ID
1   0.7     1  1
2  -1.6     1  2
3  -0.2     1  3
4  -1.2     1  4
5  -0.1     1  5
6   3.4     1  6


boxplot(extra~group, data = sleep)

[Figure: boxplot of extra by group for the sleep data (chunk t_test_2)]
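
Alongside the boxplot, a numerical summary per group can help when looking at the data. This is a minimal sketch; the object names `by_group` and `group_summary` are assumptions made for illustration.

```r
data(sleep)

# Split the outcome by group, then summarise each group
by_group <- split(sleep$extra, sleep$group)
group_summary <- data.frame(
  group = names(by_group),
  mean  = sapply(by_group, mean),
  sd    = sapply(by_group, sd),
  n     = sapply(by_group, length)
)
group_summary
```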

Student's t-test

Run t-test

# When you