When working with different statistical distributions, we often want to make probabilistic statements based on the distribution. We typically want to know one of four things:
- A random draw of values from a particular distribution.
- The density (pdf) at a particular value.
- The distribution (cdf) at a particular value.
- The quantile value corresponding to a particular probability.

For each probability distribution there are typically four functions available, whose names start with r, d, p, and q:

- r simulates random numbers from the distribution.
- d computes the density: the pdf or pmf.
- p computes the cumulative distribution function (cdf).
- q computes the quantile function (the inverse cdf).

If you're only interested in simulating random numbers, then you will likely only need the r functions and not the others. However, if you intend to simulate from arbitrary probability distributions using something like rejection sampling, then you will need the other functions too.
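As a quick sketch, here are the four functions for the normal distribution (the same naming pattern holds for binom, geom, pois, and so on):

```r
rnorm(3)      # r*: simulate 3 random draws from N(0, 1)
dnorm(0)      # d*: density (pdf) evaluated at 0
pnorm(0)      # p*: cdf, P(Z <= 0) = 0.5
qnorm(0.5)    # q*: quantile, the inverse of pnorm, so this returns 0
```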
Binomial Distribution (✅ or ❌)
In R, the function dbinom provides the pmf of the binomial distribution:
dbinom(x = number of successes, size = number of trials, prob = probability of success) gives:
P(X=x) for X∼Binom(n,p).
Example 🪙 For X the number of heads when three coins are tossed, the pmf is computed with R:
x <- 0:3
dbinom(x, 3, 0.5)
## [1] 0.125 0.375 0.375 0.125
Example 🎲 What is the probability of getting exactly three "2's" in eight rolls?
# P(X = 3), where X ~ Binom(8, 1/6)
# dbinom takes: x    - the number of successes, here landing on a "2" three times,
#               size - the number of trials, here 8 rolls,
#               prob - the probability of success, here landing on a "2" = 1/6
dbinom(3, 8, 1/6)
## [1] 0.1041905
Example 🎲
Suppose 100 dice are thrown. What is the expected number of sixes? What is the probability of observing 10 or fewer sixes?
We assume that the results of the dice are independent and that the probability of rolling a six is 1/6. The random variable X is the number of sixes observed, and
X∼Binom(100,1/6).
Then E[X] = 100 ⋅ 1/6 ≈ 16.67.
That is, we expect 1/6 of the 100 rolls to be a six. The probability of observing 10 or fewer sixes in R:
sum(dbinom(0:10, 100, 1/6))
## [1] 0.04269568
R also provides the function pbinom, which is the cumulative sum of the pmf.
pbinom(x, size, prob)
This gives:
P(X≤x) for X∼Binom(n,p).
We could compute the probability of observing 10 or fewer sixes in 100 rolls as:
# P(X <= 10)
pbinom(10,100,1/6)
## [1] 0.04269568
Example 🗳️
Suppose Alice and Bob are running for office, and 46% of all voters prefer Alice. A poll randomly selects 300 voters and asks their preference. What is the expected number of voters who will report a preference for Alice? What is the probability that the poll results suggest Alice will win?
Let “success” be a preference for Alice, and X be the random variable equal to the number of polled voters who prefer Alice. It is reasonable to assume that
X∼Binom(300,0.46)
as long as our sample of 300 voters is a small portion of the population of all voters. We expect that 0.46⋅300=138 of the 300 voters will report a preference for Alice.
For the poll results to show Alice in the lead, we need (X > 150). To find this, we must compute:
1 − P(X ≤ 150).
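In R, this tail probability is one call to pbinom:

```r
# P(X > 150) = 1 - P(X <= 150) for X ~ Binom(300, 0.46); about 0.074
1 - pbinom(150, 300, 0.46)
```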
There is about a 7.4% chance the poll will show Alice in the lead, despite her imminent defeat.
R provides the function rbinom to simulate binomial random variables. The first argument to rbinom is the number of random values to simulate, and the next arguments are size and prob. Here are 15 simulations of the Alice vs. Bob poll:
rbinom(15, 300, 0.46)
## [1] 132 116 129 139 165 137 138 142 134 140 140 134 134 126 149
In this series of simulated polls, Alice appears to be losing in all except the fifth poll where she was preferred by 165/300= 55% of the selected voters.
We can compute P(X>150) using:
X <- rbinom(10000, 300, 0.46)
mean(X > 150)
## [1] 0.0714
Geometric Distribution ❌❌✅
X∼Geom(p)
The number of failures before the first success occurs.
The functions dgeom, pgeom and rgeom are available for working with a geometric random variable X∼Geom(p):
dgeom(x, p) is the pmf, and gives P(X = x).
pgeom(x, p) is the cdf, and gives P(X ≤ x).
rgeom(N, p) simulates N random values of X.
Example 🎲
A die is tossed until the first 6 occurs. What is the probability that it takes 4 or more tosses?
We define success as a roll of six, and let X be the number of failures before the first success. Then X∼Geom(1/6), a geometric random variable with probability of success 1/6.
We cannot perform the infinite sum with dgeom, but we can come close by summing to a large value of x:
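A sketch of the partial sum; the terms beyond x = 1000 are vanishingly small:

```r
# P(X >= 3): sum the pmf from 3 up to a large cutoff
sum(dgeom(3:1000, 1/6))
## [1] 0.5787037
```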
Rather than summing the pmf, we may use pgeom:
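Since P(X ≥ 3) = 1 − P(X ≤ 2), one line with pgeom suffices:

```r
# P(X >= 3) = 1 - P(X <= 2)
1 - pgeom(2, 1/6)
## [1] 0.5787037
```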
Finally, we can use simulation to approximate the result:
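A simulation sketch: draw many geometric values and count how often at least 3 failures precede the first six (10000 draws is an arbitrary choice):

```r
X <- rgeom(10000, 1/6)
mean(X >= 3)   # approximately 0.579
```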
Example 🏀
Let X be the random variable that counts the number of free throws Steph Curry makes before missing one. We model each free throw as a Bernoulli trial, but we choose "success" to be a missed free throw, so that p = 0.1 and X∼Geom(0.1).
The expected number of “failures” is
E[X] = (1 − p)/p = 0.9/0.1 = 9,
which means we expect Steph to make 9 free throws before missing one. Computing the probability that he makes 20 (or more) in a row,

P(X ≥ 20) = 1 − P(X ≤ 19),

we see that Steph Curry could run off 20 (or more) free throws in a row about 12% of the times he wants to try. (This is not factual, please don't come at me sideways ❤️)
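That 12% figure comes from the geometric cdf; in R:

```r
# P(X >= 20) = 1 - P(X <= 19) = 0.9^20
1 - pgeom(19, 0.1)
## [1] 0.1215767
```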
Normal Distribution: 📊
We can use pnorm() to calculate cumulative probabilities. For example, let’s calculate the probability that our random variable X takes a value less than or equal to -1.
pnorm(-1, 0, 1)
## [1] 0.1586553
We can also calculate the probability that X takes some value within a certain interval. For example, let’s calculate P(-2 <= X <= 1)
pnorm(1, 0, 1) - pnorm(-2, 0, 1)
## [1] 0.8185946
The R function pnorm computes the cdf of the normal distribution, as
pnorm(x) = P(Z ≤ x).
Using this, we can compute the probability that Z lies within 1, 2, and 3 standard deviations of its mean: (It's okay if you have no idea what this all means)
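A sketch of those three computations:

```r
pnorm(1) - pnorm(-1)   # within 1 standard deviation
## [1] 0.6826895
pnorm(2) - pnorm(-2)   # within 2 standard deviations
## [1] 0.9544997
pnorm(3) - pnorm(-3)   # within 3 standard deviations
## [1] 0.9973002
```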
Now let's plot the normal curve in RStudio.
# create a sequence of 100 equally spaced numbers between -4 and 4
temp1 <- seq(-4, 4, length=100)
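One way to finish the plot, continuing with temp1 from above (the axis labels and title here are my own choices):

```r
# evaluate the standard normal density at each point and connect with a line
plot(temp1, dnorm(temp1), type = "l",
     xlab = "z", ylab = "density", main = "The Standard Normal Curve")
```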