STATS 550 SDSU
Final Prep

Last edited 250 days ago by Eddie Coda

Question 1.

A die is tossed until the first 6 occurs. What is the probability that it takes 4 or more tosses? Estimate the probability for this geometric distribution by simulating 1000 random samples. Create a histogram of your simulations and describe the shape of the distribution.
# Step 1: Set up the problem
p_success <- 1/6
p_failure <- 5/6

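# Step 2: Calculate the exact probability using the complementary CDF
# P(X >= 4) = P(the first three tosses are all non-6s) = (5/6)^3 ≈ 0.579
p_4or_more <- p_failure^3
p_4or_more

# Step 3: Simulate 10,000 random samples using the geometric distribution
set.seed(42) # Set the seed for reproducibility
n <- 10000
# rgeom() counts failures before the first success, so add 1 to get the number of tosses
simulations <- rgeom(n, prob = p_success) + 1

# Estimate P(X >= 4) as the proportion of simulated waits of 4 or more tosses
estimated_prob <- mean(simulations >= 4)
estimated_prob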

# Step 4: Create a histogram using ggplot2
library(ggplot2)
ggplot(data.frame(simulations), aes(x = simulations)) +
  geom_histogram(binwidth = 1, color = "black", fill = "skyblue") +
  labs(title = "Waiting for the 6: A Board Game Enthusiast's Journey",
       x = "Number of Rolls to Get the First 6",
       y = "Frequency") +
  theme_minimal()

# Step 5: Describe the distribution

The histogram is unimodal and strongly right-skewed. The tallest bar is at the first roll, and each later bar is about 5/6 the height of the one before it, so the frequencies decay geometrically into a long right tail. Zoe usually gets her first 6 within a handful of rolls: the median wait is 4 rolls, since P(X ≤ 4) = 1 − (5/6)^4 ≈ 0.52. Long waits are still common enough to notice, with 10 or more rolls needed in roughly one game out of five ((5/6)^9 ≈ 0.19).
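As a sanity check on the exact answer, the sketch below computes P(X ≥ 4) three ways in base R. One detail to keep in mind (a property of R's geometric functions, not something stated in the question): `pgeom()` and `dgeom()` count failures before the first success rather than tosses, so "4 or more tosses" corresponds to "more than 2 failures".

```r
# Cross-check of P(X >= 4), where X = number of tosses until the first 6
p <- 1/6

# The first three tosses must all miss
closed_form <- (1 - p)^3

# pgeom() works with failures F = X - 1, so X >= 4 is the same as F > 2
via_pgeom <- pgeom(2, prob = p, lower.tail = FALSE)

# Complement of finishing within 3 tosses (F = 0, 1, or 2)
via_pmf <- 1 - sum(dgeom(0:2, prob = p))

# All three should be about 0.5787 (= 125/216)
round(c(closed_form = closed_form, via_pgeom = via_pgeom, via_pmf = via_pmf), 4)
```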

Question 2.

UFO sightings have been reported to occur at an average rate of five per hour during certain clear nights. What is the probability that a UFO hunter will spot exactly ten UFOs in two hours?
- Run a random sample of this event and simulate it to estimate the probability and compare it to the exact probability.
- Create a histogram and describe the shape of the distribution.
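set.seed(42) # Set the seed for reproducibility
lambda <- 5 * 2 # Rate per hour * number of hours
k <- 10 # Number of UFOs
exact_prob <- dpois(k, lambda)

n_simulations <- 10000
simulated_UFOs <- rpois(n_simulations, lambda)

estimated_prob <- sum(simulated_UFOs == k) / n_simulations

# Compare the exact and estimated probabilities
print(data.frame(Probabilities = c("Exact", "Estimated"),
                 Values = c(exact_prob, estimated_prob)))

# Unit-width bins centered on each integer count (so adjacent counts are not merged)
hist(
  simulated_UFOs,
  main = "Simulated UFO Sightings 🛸",
  xlab = "Number of UFOs Spotted",
  col = "lightblue", border = "black",
  breaks = seq(min(simulated_UFOs) - 0.5, max(simulated_UFOs) + 0.5, by = 1)
)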

The exact probability of spotting exactly ten UFOs in two hours is dpois(10, 10) ≈ 0.125, and the simulated proportion should land close to it. The histogram of simulated UFO sightings 🛸 is unimodal, peaking at 9 to 10 sightings (for a Poisson distribution with integer λ = 10, P(X = 9) = P(X = 10)), which matches the expected count of λ = 10 for the two-hour period. The distribution is slightly right-skewed, with a longer tail toward higher counts. Its standard deviation is √10 ≈ 3.2, so most simulated two-hour sessions yield roughly 4 to 16 sightings, and counts farther from 10 become increasingly rare.
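The exact probability can also be written out from the Poisson pmf, P(X = k) = e^(−λ) λ^k / k!, with λ = 5 × 2 = 10. A minimal base-R check that the hand-written formula and `dpois()` agree, plus a check of the tied mode at 9 and 10:

```r
# Poisson pmf written out by hand vs. R's dpois()
lambda <- 5 * 2   # 5 sightings per hour over 2 hours
k <- 10

by_formula <- exp(-lambda) * lambda^k / factorial(k)
by_dpois   <- dpois(k, lambda)

round(c(by_formula = by_formula, by_dpois = by_dpois), 4)  # both about 0.1251

# Consecutive pmf ratios are P(k) / P(k - 1) = lambda / k, so P(10) / P(9) = 1
dpois(10, lambda) / dpois(9, lambda)
```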

Question 3.

Let W ∼ Uniform(8, 12). Let M be the growth of a mystical tree in centimeters after being exposed to enchanted unicorn droppings, with the growth rate per day being equal to W.
- Use R to simulate W. Simulate the mean and pdf of M and compare to the exact results.
- Create one graph with both the theoretical density and the simulated distribution.
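set.seed(42) # Set the seed for reproducibility
n_simulations <- 10000
simulated_W <- runif(n_simulations, min = 8, max = 12)

# Calculate the exact mean of W, (a + b) / 2:
exact_mean_W <- (8 + 12) / 2

# Estimate the mean of M using the simulated data.
# Treating M as the growth over a single day, M = W, so E[M] = E[W] = 10
estimated_mean_M <- mean(simulated_W)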

# Compare the exact and estimated means:
comparison_table <- data.frame(
Means = c("Exact", "Estimated"),
Values = c(exact_mean_W, estimated_mean_M)
)
print(comparison_table)

# Create a density plot of the simulated data:
plot(density(simulated_W), main = "Mystical Tree Growth Distribution", xlab = "Growth Rate (cm/day)", ylim = c(0, 0.35), col = "blue")

# Overlay the theoretical density of the uniform distribution:
curve(dunif(x, min = 8, max = 12), add = TRUE, col = "red", lwd = 2)

The simulated growth rates spread evenly over 8 to 12 cm per day, as expected for a uniform distribution: every value in the range has the same density. In the interior of the range, the blue kernel density estimate stays close to the theoretical density of 1/(12 − 8) = 0.25 shown in red; it tapers off near the endpoints 8 and 12 only because the smoothing kernel spreads some mass beyond the support. The simulated mean should likewise be close to the exact mean of 10, which confirms that the simulation behaves as expected.
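The comparison can be extended beyond the mean. For W ∼ Uniform(a, b), the exact mean is (a + b)/2 = 10, the exact variance is (b − a)²/12 = 4/3, and the density is 1/(b − a) = 0.25 on [8, 12]. The sketch below keeps the code's one-day reading M = W and uses 100,000 draws (an arbitrary choice). It checks all three quantities, using histogram densities on fixed bins, which avoid the edge smoothing that `density()` introduces.

```r
set.seed(42)
a <- 8; b <- 12
w <- runif(1e5, min = a, max = b)   # one day of growth, so M = W

exact     <- c(mean = (a + b) / 2, var = (b - a)^2 / 12)
simulated <- c(mean = mean(w), var = var(w))
round(rbind(exact, simulated), 3)

# Histogram densities on half-unit bins; each should be close to 1 / (b - a) = 0.25
bin_density <- hist(w, breaks = seq(a, b, by = 0.5), plot = FALSE)$density
round(range(bin_density), 3)
```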

Question 4.

As an adventurer, you've found a legendary key that can open secret passages in an ancient temple. The key, being centuries old, has a 12% chance of breaking permanently each day. You want to calculate the probability that the key remains intact on each day, from day 1 to day 30. You also want to create a plot of this to demonstrate.
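days <- 1:30
prob_breaking <- 0.12
# pgeom() counts intact days before the first break (X), so
# P(intact through day d) = P(X >= d) = P(X > d - 1) = 0.88^d
prob_intact <- pgeom(days - 1, prob_breaking, lower.tail = FALSE)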

# Create a plot of the probabilities:
plot(days, prob_intact, type = "l", main = "Probability of the Legendary Key Remaining Intact", xlab = "Day", ylab = "Probability", col = "darkgreen", lwd = 2)

The graph shows the probability that the legendary key is still intact at the end of each day from day 1 to day 30. Because each day independently multiplies the survival probability by 1 − 0.12 = 0.88, the dark green curve decays geometrically: P(intact after day d) = 0.88^d. The key is more likely broken than intact from day 6 onward (0.88^6 ≈ 0.46), and by day 30 the probability that it survives is only about 0.022.
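Because the daily breaking events are treated as independent with probability 0.12, the survival curve has the closed form 0.88^d. A short base-R check that this closed form matches the `pgeom()` call, and that it locates the day the curve first drops below one half:

```r
days <- 1:30
prob_breaking <- 0.12

closed_form <- (1 - prob_breaking)^days
via_pgeom   <- pgeom(days - 1, prob_breaking, lower.tail = FALSE)

isTRUE(all.equal(closed_form, via_pgeom))   # TRUE

# First day on which the key is more likely broken than intact
which(closed_form < 0.5)[1]                 # 6

# Survival probability at the end of day 30
round(closed_form[30], 3)                   # 0.022
```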

Question 5.

In a thrilling game of "Guess the Jellybeans," there are exactly 100 jellybeans in a jar, with 70 being red and 30 being green. Participants need to draw 5 jellybeans without looking. What is the probability that a participant draws 3 red jellybeans and 2 green jellybeans? Use a simulation to estimate the probability and compare it to the exact probability. Create a histogram of your simulations and describe the shape of the distribution.
Understand the problem: We need the probability of drawing 3 red and 2 green jellybeans when 5 jellybeans are drawn at once from a jar of 100 (70 red and 30 green).
Identify the distribution: This is a hypergeometric problem because we sample without replacement from a finite population (100 jellybeans) and count the successes (red jellybeans) in the sample. Drawing 3 red out of 5 automatically means drawing 2 green.
Calculate the exact probability with the hypergeometric distribution in R:
k_red <- 3 # Number of red jellybeans
k_green <- 2 # Number of green jellybeans
N_red <- 70 # Total red jellybeans
N_green <- 30 # Total green jellybeans
n_draw <- 5 # Number of jellybeans drawn

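# dhyper(x, m, n, k): x red drawn, m red in the jar, n green in the jar, k drawn
exact_prob <- dhyper(k_red, N_red, N_green, n_draw)

# 4 Simulate the situation:
set.seed(42) # Set the seed for reproducibility
n_simulations <- 10000
jellybean_colors <- c(rep("red", 70), rep("green", 30))
# replicate() returns a 5 x n_simulations matrix; each column is one draw of 5 without replacement
simulated_draws <- replicate(n_simulations, sample(jellybean_colors, n_draw))
################START CODE HERE################
num_red_in_draws <- apply(simulated_draws, 2, function(draw) sum(draw == "red"))
#################END CODE HERE#################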

# 5 Estimate the probability from the simulation:
estimated_prob <- sum(num_red_in_draws == k_red) / n_simulations

# 6 Compare the exact and estimated probabilities:
# - Print out the exact and estimated probabilities.
# - Discuss the differences, if any.
comparison_table <- data.frame(
Probabilities = c("Exact", "Estimated"),
Values = c(exact_prob, estimated_prob)
)
print(comparison_table)

# 7: Create a histogram of the simulated data:
# Unit-width bins centered on each integer count (so adjacent counts are not merged)
hist(num_red_in_draws, main="Simulated Jellybean Draws", xlab="Number of Red Jellybeans", col="purple", border="black", breaks=seq(min(num_red_in_draws) - 0.5, max(num_red_in_draws) + 0.5, 1))

The histogram appears as …
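For comparison with the histogram, the exact distribution of the number of red jellybeans in 5 draws can be written out by counting: P(R = k) = C(70, k) C(30, 5 − k) / C(100, 5). The sketch below computes this for k = 0 to 5 with `choose()` and confirms it against `dhyper()`. It also shows two shortcuts that estimate the same probability as the `apply()` approach: `colSums()` on the logical matrix, and sampling the red count directly with `rhyper()`. These alternatives are suggestions, not part of the original exercise.

```r
N_red <- 70; N_green <- 30; n_draw <- 5
k <- 0:n_draw

# Exact pmf by counting: ways to pick k red and (5 - k) green over all 5-bean subsets
by_counting <- choose(N_red, k) * choose(N_green, n_draw - k) / choose(N_red + N_green, n_draw)
by_dhyper   <- dhyper(k, N_red, N_green, n_draw)
round(setNames(by_counting, k), 4)

# Vectorized alternative to apply(): count "red" in each column of the draw matrix
set.seed(42)
jellybean_colors <- c(rep("red", N_red), rep("green", N_green))
draws <- replicate(10000, sample(jellybean_colors, n_draw))
red_counts <- colSums(draws == "red")

# Or skip the jar entirely and sample the red count from the hypergeometric directly
red_counts_direct <- rhyper(10000, N_red, N_green, n_draw)

# Exact P(3 red) vs. the two simulated estimates
round(c(exact   = by_dhyper[k == 3],
        colSums = mean(red_counts == 3),
        rhyper  = mean(red_counts_direct == 3)), 4)
```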

Question 6.

A group of 200 students is taking an online statistics course. On average, 35% of students complete their homework each day. Simulate the number of students who complete their homework on a given day using a binomial distribution. Run 10000 simulations and create a histogram to visualize the distribution.
# Step 1: Set up the problem
n_students <- 200
p_complete <- 0.35

# Step 2: Run simulations
set.seed(42) # Set the seed for reproducibility
n_simulations <- 10000
################START CODE HERE################
students_complete <- rbinom(n_simulations, size = n_students, prob = p_complete)
#################END CODE HERE#################

# Step 3: Create a histogram (unit-width bins centered on each integer count)
hist(students_complete, main = "Online Stats Course: Daily Homework Completion", xlab = "Number of Students Completing Homework", col = "orange", border = "black", breaks = seq(min(students_complete) - 0.5, max(students_complete) + 0.5, by = 1))


The histogram shows the daily number of students (out of 200) who complete their homework. The distribution is symmetric and bell-shaped, as expected for a binomial count with large n: its mean is np = 200 × 0.35 = 70 and its standard deviation is √(np(1 − p)) = √45.5 ≈ 6.7. Most simulated days fall within about two standard deviations of the mean (roughly 57 to 83 completions), and counts farther from 70 become increasingly rare.
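The bell shape can be checked numerically. The sketch below repeats the simulation with the same seed and settings, compares the simulated mean and standard deviation with np and √(np(1 − p)), and reports the share of days within two standard deviations of the mean, which a normal approximation would put near 95%.

```r
set.seed(42)
n_students <- 200
p_complete <- 0.35
students_complete <- rbinom(10000, size = n_students, prob = p_complete)

exact_mean <- n_students * p_complete                          # 70
exact_sd   <- sqrt(n_students * p_complete * (1 - p_complete)) # about 6.75

round(c(exact_mean = exact_mean, sim_mean = mean(students_complete),
        exact_sd = exact_sd, sim_sd = sd(students_complete)), 2)

# Share of simulated days within two standard deviations of the mean
mean(abs(students_complete - exact_mean) <= 2 * exact_sd)
```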

Question 7.

In a fantasy game, a player can find rare gemstones in a cave with a 5% chance of success. The player is allowed to enter the cave 15 times per day. What is the probability that the player will find at least 3 gemstones in a day? Run a simulation to estimate the probability and create a histogram to visualize the distribution.
# Step 1: Set up the problem
n_tries <- 15
p_success <- 0.05

# Step 2: Run simulations
set.seed(42) # Set the seed for reproducibility
n_simulations <- 10000

################START CODE HERE################
gemstones_found <- rbinom(n_simulations, size = n_tries, prob = p_success)
#################END CODE HERE#################

# Step 3: Estimate the probability
prob_at_least_3_gemstones <- sum(gemstones_found >= 3) / n_simulations
prob_at_least_3_gemstones

# Step 4: Create a histogram (unit-width bins centered on each integer count)
hist(gemstones_found, main = "Fantasy Game: Gemstones Found in a Day", xlab = "Number of Gemstones Found", col = "purple", border = "black", breaks = seq(min(gemstones_found) - 0.5, max(gemstones_found) + 0.5, by = 1))


The histogram shows the number of gemstones a player finds per day in the fantasy game. The distribution is right-skewed with its peak at 0: roughly 46% of days yield no gemstones and about 37% yield exactly one. Frequencies drop quickly after that, and only about 3.6% of days (exact value 1 − pbinom(2, 15, 0.05) ≈ 0.036) produce 3 or more gemstones, a rare and fortunate day for the player in their quest for rare treasures!
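The question also asks for the probability itself, and for a Binomial(15, 0.05) count it has an exact value, P(X ≥ 3) = 1 − P(X ≤ 2). A minimal base-R sketch computing it three equivalent ways; the simulated `prob_at_least_3_gemstones` should approximate this value.

```r
n_tries <- 15
p_success <- 0.05

# Complement of finding 0, 1, or 2 gemstones
exact_prob <- 1 - pbinom(2, size = n_tries, prob = p_success)

# Same quantity from the upper tail directly
upper_tail <- pbinom(2, size = n_tries, prob = p_success, lower.tail = FALSE)

# Same quantity by summing the pmf over 3..15
by_sum <- sum(dbinom(3:n_tries, size = n_tries, prob = p_success))

round(c(exact_prob = exact_prob, upper_tail = upper_tail, by_sum = by_sum), 4)  # each about 0.0362
```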

Question 8.

In the world of "Sleepy Scholars," college students observe their classmates during lectures. On average, they notice 5 classmates dozing off during a single lecture. The number of dozing students follows a Poisson distribution with a mean of 5. What is the probability of witnessing at least 8 students dozing off during a single lecture? Run a simulation to estimate the probability and create a histogram to illustrate the distribution.
Understand the problem: We need to find the probability of witnessing at least 8 students dozing off during a lecture, given that the number of dozing students follows a Poisson distribution with a mean of 5.
# Step 1: Calculate the exact probability using Poisson

lambda <- 5 # Mean of the Poisson distribution
k <- 8 # Number of dozing students
exact_prob <- 1 - ppois(k - 1, lambda)

# Step 2: Simulate the situation
set.seed(42) # Set the seed for reproducibility
n_simulations <- 10000
################START CODE HERE################
simulated_dozing_students <- rpois(n_simulations, lambda)
#################END CODE HERE#################

# Step 3: Estimate the probability from the simulation
estimated_prob <- sum(simulated_dozing_students >= k) / n_simulations

# Step 4: Compare the exact and estimated probabilities
comparison_table <- data.frame(
  Probabilities = c("Exact", "Estimated"),
  Values = c(exact_prob, estimated_prob)
)
print(comparison_table)

# Step 5: Create a histogram of the simulated data (unit-width bins centered on each count):
hist(
  simulated_dozing_students,
  main = "Sleepy Scholars 😴",
  xlab = "Number of Dozing Students",
  col = "lightblue", border = "black",
  breaks = seq(min(simulated_dozing_students) - 0.5,
               max(simulated_dozing_students) + 0.5, by = 1)
)

# Step 6: Interpret the results


Based on the simulation, the estimated probability of witnessing at least 8 students dozing off during a lecture is approximately 0.13 (the estimate varies slightly from run to run because of the randomness of the simulation). The exact Poisson probability is 1 − P(X ≤ 7) ≈ 0.133, so the simulation provides a good approximation of the true probability. The histogram peaks at 4 and 5 dozing students (with integer λ = 5 these two counts are equally likely) and has a right tail; roughly three quarters of simulated lectures have between 3 and 7 dozing students.
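To make "close to the exact probability" concrete: a proportion estimated from n independent simulations has Monte Carlo standard error √(p(1 − p)/n). With p ≈ 0.133 and n = 10,000 this is about 0.0034, so an estimate within roughly ±0.007 of the exact value is what one would expect. A minimal sketch; the ±2 standard error band is a common rule of thumb, not part of the original exercise.

```r
lambda <- 5
k <- 8
n_simulations <- 10000

exact_prob <- ppois(k - 1, lambda, lower.tail = FALSE)   # P(X >= 8)

# Monte Carlo standard error of a simulated proportion
mc_se <- sqrt(exact_prob * (1 - exact_prob) / n_simulations)

set.seed(42)
estimated_prob <- mean(rpois(n_simulations, lambda) >= k)

round(c(exact = exact_prob, estimated = estimated_prob, mc_se = mc_se), 4)

# Rough 95% band in which the simulated estimate should usually fall
round(exact_prob + c(-2, 2) * mc_se, 4)
```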

