Practical Statistics for Data Scientists
A/B testing
An experiment with two groups to establish which is superior
Intelligence Refinery
Why have a control group?
In a properly designed A/B test:
Subjects are randomized to one of the two treatments
Data on treatments A and B must be collected in such a way that any observed difference between A and B can only be due to one of:
Random chance in assignment of subjects
A true difference between A and B
A single test statistic must be established before conducting the test; otherwise there is room for researcher bias
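The points above can be sketched in code. This is a minimal illustration, not a method taken from these notes: it assumes the pre-specified test statistic is the difference in group means, and it uses a permutation test to gauge how often random assignment alone would produce a difference at least as large as the one observed. The function names (`randomize`, `diff_in_means`, `permutation_p_value`) and all parameters are hypothetical.

```python
import random
from statistics import mean


def randomize(subjects, rng):
    """Randomly split subjects into two groups, A and B (sizes differ by at most one)."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def diff_in_means(a, b):
    """The single test statistic, chosen before any data is examined (an assumption here)."""
    return mean(b) - mean(a)


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate how often random reassignment alone yields a difference
    at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(diff_in_means(perm_a, perm_b)) >= observed:
            hits += 1
    return hits / n_permutations


if __name__ == "__main__":
    rng = random.Random(42)
    # Subject IDs are invented purely for illustration.
    group_a, group_b = randomize(range(40), rng)
    print(len(group_a), len(group_b))
```

A small p-value suggests the observed difference is unlikely to come from random assignment alone, which leaves a true difference between A and B as the remaining explanation.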
Why just A/B? Why not C, D…?
If testing which is the best of multiple possible conditions, use a multi-arm bandit
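To make the idea concrete, here is a rough sketch of one simple multi-arm bandit strategy, epsilon-greedy. The notes do not say which bandit algorithm is meant, so this choice is an assumption, as are the function name `epsilon_greedy`, its parameters, and the simulated conversion rates.

```python
import random


def epsilon_greedy(true_rates, n_trials=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over arms with the given success rates.

    true_rates stands in for the unknown conversion rate of each condition;
    in a real experiment the reward would come from observed user behaviour.
    Returns (pulls, successes) per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    successes = [0] * n_arms
    for _ in range(n_trials):
        # Explore with probability epsilon, or while any arm is still untried.
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed success rate so far.
            arm = max(range(n_arms), key=lambda i: successes[i] / pulls[i])
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
    return pulls, successes


if __name__ == "__main__":
    # Conversion rates for conditions A, B, C are made up for illustration.
    pulls, successes = epsilon_greedy([0.05, 0.10, 0.30])
    print(pulls, successes)
```

Unlike a fixed A/B/C split, this approach shifts traffic toward the condition that currently looks best while still occasionally sampling the others.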