Resampling

Repeatedly sample values from observed data to assess random variability in a statistic

Intelligence Refinery


Two main types of resampling procedures


The bootstrap

Used to assess the reliability of an estimate

Permutation tests

Used to test hypotheses, typically involving two or more groups
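As a rough illustration of the bootstrap idea, the sketch below estimates the standard error of a sample median by resampling with replacement. It is a minimal example using only Python's standard library; the function name `bootstrap_se`, the choice of the median, the 1,000 resamples, and the sample values are all illustrative assumptions, not taken from the source.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, n_resamples=1000, seed=0):
    """Estimate the standard error of `stat` by bootstrap resampling.

    Each resample draws len(data) values *with replacement* from the
    observed data; the spread of the statistic across resamples
    approximates its sampling variability.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    estimates = [
        stat(rng.choices(data, k=len(data)))  # one bootstrap resample
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical sample values, made up for illustration only
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
    print(f"median = {statistics.median(sample):.2f}")
    print(f"bootstrap SE of median ~ {bootstrap_se(sample):.3f}")
```

A small standard error relative to the estimate suggests the estimate is fairly stable under resampling; a large one suggests it is not.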


Permutation test

Permute: to change the order of a set of values

Combine the samples from all groups and shuffle them; then randomly (or exhaustively) reallocate the observations to resamples of the same sizes as the original groups, and calculate the statistic of interest for each resample

Pooling the groups is the logical embodiment of the null hypothesis: that the groups do not differ

The null hypothesis is tested by randomly drawing groups (without replacement) from the combined set and seeing how much they differ from one another

Compare the observed difference with the permuted differences

If the observed difference lies outside most of the permutation distribution, it is unlikely to be due to chance, i.e., the difference is statistically significant
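The procedure above can be sketched in Python for the common case where the statistic of interest is a difference in means. This is a minimal, hypothetical implementation using only the standard library; the function name `permutation_test`, the two-sided comparison, the default of 10,000 shuffles, and the toy data are assumptions, not part of the source.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided random permutation test for a difference in means.

    Returns (observed difference, p-value). Under the null hypothesis the
    group labels are arbitrary, so the observations are pooled, shuffled,
    and re-split into groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels: drawing without replacement
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Fraction of permuted differences at least as extreme as the observed one
    return observed, extreme / n_permutations


if __name__ == "__main__":
    # Toy values, made up for illustration only
    group_a = [10, 11, 12, 13, 14]
    group_b = [0, 1, 2, 3, 4]
    print(permutation_test(group_a, group_b))
```

Some treatments add one to both the numerator and the denominator of the p-value so it can never be exactly zero; that refinement is omitted here for simplicity.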


Example: web stickiness

See here for the R Markdown notebook.


Exhaustive and bootstrap permutation tests

Two variants of the permutation test

Exhaustive permutation test

Instead of randomly shuffling and dividing the data, figure out all the possible ways it could be divided

Only practical for relatively small sample sizes

Sometimes called exact tests, because they guarantee that the null model will not test as "significant" more often than the alpha level of the test

Bootstrap permutation test

The draws are made with replacement

This models both

The random assignment of treatment to subject

The random selection of subjects from a population
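Both variants differ from the random permutation test only in how the resamples are produced. The sketch below assumes a difference in means as the test statistic and uses hypothetical function names: it enumerates every possible split with `itertools.combinations` for the exhaustive test, and draws both groups from the pooled data with replacement via `random.Random.choices` for the bootstrap variant.

```python
import itertools
import math
import random
import statistics


def exhaustive_permutation_test(a, b):
    """Exact test: enumerate every way to split the pooled data into groups
    of sizes len(a) and len(b); return (observed diff, two-sided p-value)."""
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = statistics.mean(a) - statistics.mean(b)
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        diff = statistics.mean(group_a) - statistics.mean(group_b)
        extreme += abs(diff) >= abs(observed)
        total += 1
    # total equals math.comb(len(pooled), n_a), which grows very quickly
    return observed, extreme / total


def bootstrap_permutation_test(a, b, n_resamples=10_000, seed=0):
    """Same test, but each group is drawn from the pooled data *with
    replacement*, modelling random selection of subjects as well as
    random assignment of treatment."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    observed = statistics.mean(a) - statistics.mean(b)
    extreme = 0
    for _ in range(n_resamples):
        group_a = rng.choices(pooled, k=len(a))
        group_b = rng.choices(pooled, k=len(b))
        diff = statistics.mean(group_a) - statistics.mean(group_b)
        extreme += abs(diff) >= abs(observed)
    return observed, extreme / n_resamples


if __name__ == "__main__":
    # Toy values, made up for illustration only
    a, b = [10, 11, 12, 13, 14], [0, 1, 2, 3, 4]
    print(exhaustive_permutation_test(a, b))
    print(bootstrap_permutation_test(a, b))
    # Number of possible splits for 5 + 5 and for 20 + 20 observations
    print(math.comb(10, 5), math.comb(40, 20))
```

The number of splits is `math.comb(n_a + n_b, n_a)`; for two groups of 20 it is already more than 137 billion, which is why the exhaustive version is only practical for small samples.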
