Two main types of resampling procedures

Bootstrap: used to assess the reliability of an estimate

Permutation test: used to test hypotheses, typically involving two or more groups

Permutation test

Permute: to change the order of a set of values

Entails combining the samples from all groups, shuffling them, and randomly (or exhaustively) reallocating the observations to resamples; the statistic of interest is then calculated for each resample

This is the logical embodiment of the null hypothesis, that the groups do not differ

The null hypothesis is tested by randomly drawing groups (without replacement) from the combined set, and seeing how much they differ from one another

Compare the observed difference with the permuted differences

If the observed difference lies outside most of the permutation distribution, the difference is likely not due to chance
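The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's code: the function names, the made-up group values, the choice of the difference in means as the statistic, and the two-sided p-value are all assumptions.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Random permutation test on the difference in means (two-sided)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)

    # Under the null hypothesis the group labels are arbitrary,
    # so pool every observation together.
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_resamples):
        # Shuffle and split: draws WITHOUT replacement from the pooled data.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Count permuted differences at least as large as the observed one.
        if abs(diff) >= abs(observed):
            extreme += 1

    return observed, extreme / n_resamples

# Hypothetical measurements for two groups (illustrative values only).
a = [10, 11, 12, 13, 14]
b = [1, 2, 3, 4, 5]
observed, p_value = permutation_test(a, b)
print(f"observed difference: {observed}, p-value: {p_value}")
```

A small p-value means the observed difference sits in the tail of the permutation distribution, which is the situation described in the last bullet above.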

Example: web stickiness

See the R Markdown notebook for this example.

Exhaustive and bootstrap permutation tests

Two variants of the permutation test

Exhaustive permutation test

Instead of randomly shuffling and dividing the data, figure out all the possible ways it could be divided

Only practical for relatively small sample sizes

Sometimes called exact tests, due to their statistical property of guaranteeing that the null model will not test as "significant" more than the alpha level of the test
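As a rough sketch of the exhaustive variant, the code below enumerates every way of assigning the pooled observations to the first group with itertools.combinations. The function name, the sample values, and the two-sided comparison are assumptions made for illustration. The number of splits grows as "n choose n_a", which is why this is only practical for small samples.

```python
from itertools import combinations

def exhaustive_permutation_test(group_a, group_b):
    """Enumerate all splits of the pooled data; return (observed, p_value, n_splits)."""
    pooled = list(group_a) + list(group_b)
    n, n_a = len(pooled), len(group_a)
    n_b = n - n_a
    total = sum(pooled)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    extreme = 0
    n_splits = 0
    # Each combination of indices is one possible "group A"; the rest is "group B".
    for idx in combinations(range(n), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    return observed, extreme / n_splits, n_splits

# Hypothetical tiny samples: 6 observations give only 20 possible splits.
observed, p_value, n_splits = exhaustive_permutation_test([10, 11, 12], [1, 2, 3])
print(observed, p_value, n_splits)
```

Because every split is visited exactly once, the resulting p-value carries no Monte Carlo error, unlike the random-shuffling version.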

Bootstrap permutation test

The draws are made with replacement instead of without replacement

The resampling thereby models two sources of randomness: the random assignment of treatment to subject, and the random selection of subjects from a population
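A minimal sketch of the bootstrap variant follows, assuming the same difference-in-means statistic as a plain permutation test. The only change is that each resampled group is drawn from the pooled data with replacement (here via random.Random.choices). The function name and sample values are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def bootstrap_permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Permutation-style test where resamples are drawn WITH replacement."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)

    extreme = 0
    for _ in range(n_resamples):
        # Drawing with replacement: an observation can appear more than once,
        # or in both resampled groups, or not at all.
        resample_a = rng.choices(pooled, k=n_a)
        resample_b = rng.choices(pooled, k=n_b)
        diff = mean(resample_a) - mean(resample_b)
        if abs(diff) >= abs(observed):
            extreme += 1

    return observed, extreme / n_resamples

# Hypothetical measurements (illustrative values only).
observed, p_value = bootstrap_permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
print(f"observed difference: {observed}, p-value: {p_value}")
```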

Permutation tests: the bottom line for data science

Permutation tests are useful heuristic procedures for exploring the role of random variation

Compared to formula-based statistics, permutation tests make fewer assumptions about the data

Data can be numeric or binary

Sample sizes can be the same or different

Normal distribution not needed
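To illustrate these points, here is a hedged sketch of the same shuffling idea applied to binary (0/1) outcomes with unequal group sizes, using the difference in proportions as the statistic. The function name and the counts in the usage example are invented for illustration.

```python
import random

def permutation_test_proportions(successes_a, n_a, successes_b, n_b,
                                 n_resamples=5_000, seed=0):
    """Permutation test on the difference in proportions for binary outcomes."""
    rng = random.Random(seed)
    total_successes = successes_a + successes_b
    # Pool all outcomes: 1 = success, 0 = failure. Group sizes may differ.
    pooled = [1] * total_successes + [0] * (n_a + n_b - total_successes)
    observed = successes_a / n_a - successes_b / n_b

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        s_a = sum(pooled[:n_a])  # successes reallocated to group A
        diff = s_a / n_a - (total_successes - s_a) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    return observed, extreme / n_resamples

# Hypothetical counts: 30 successes out of 100 versus 5 out of 50.
observed, p_value = permutation_test_proportions(30, 100, 5, 50)
print(f"observed difference in proportions: {observed:.3f}, p-value: {p_value}")
```

No distributional formula is involved; the reference distribution comes entirely from reshuffling the pooled outcomes.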