STATS 550 SDSU

Final Prep

Last edited 461 days ago by Eddie Coda

Question 1.

A die is tossed until the first 6 occurs. What is the probability that it takes 4 or more tosses? Estimate the probability for this geometric distribution by simulating 1000 random samples. Create a histogram of your simulations and describe the shape of the distribution.

# Step 1: Set up the problem

p_success <- 1/6

p_failure <- 5/6

# Step 2: Calculate the probability using the CCDF

# P(X >= 4) means the first three tosses all miss, so it equals p_failure^3

p_4or_more <- p_failure^3

p_4or_more

# Step 3: Simulate 10,000 random samples using the geometric distribution

set.seed(42) # Set the seed for reproducibility

n <- 10000

simulations <- rgeom(n, prob = p_success) + 1

# Step 4: Create a histogram using ggplot2

library(ggplot2)

ggplot(data.frame(simulations), aes(x = simulations)) +

geom_histogram(binwidth = 1, color = "black", fill = "skyblue") +

labs(title = "Waiting for the 6: A Board Game Enthusiast's Journey",

x = "Number of Rolls to Get the First 6",

y = "Frequency") +

theme_minimal()

# Step 5: Describe the distribution

⁠

The histogram is strongly right-skewed rather than bell-shaped: the tallest bar is at 1 roll, and each later bar is about 5/6 the height of the one before it. About half of the simulated games produce the first 6 within 4 rolls, and long waits become steadily rarer as the number of rolls increases.
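The question also asks for a simulated estimate of the probability, which can be compared with the exact value. The following is a minimal sketch, assuming the same seed and sample size as the code above; the names `sim_rolls`, `est_prob`, and `exact_via_pgeom` are illustrative and not part of the original solution.

```r
# Sketch: estimate P(X >= 4) by simulation and compare it with the exact value.
set.seed(42)                            # seed chosen only for reproducibility
n <- 10000                              # assumption: same sample size as above
sim_rolls <- rgeom(n, prob = 1/6) + 1   # rgeom() counts failures, so add 1 for the roll showing the 6

exact_prob <- (5/6)^3                   # the first three rolls must all miss
est_prob   <- mean(sim_rolls >= 4)      # share of simulated games needing 4 or more rolls

# pgeom() gives the same exact value as P(number of failures > 2)
exact_via_pgeom <- pgeom(2, prob = 1/6, lower.tail = FALSE)

print(c(exact = exact_prob, estimated = est_prob))
```

With 10,000 samples the estimate should typically land within about one percentage point of the exact value.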

⁠

Question 2.

UFO sightings have been reported to occur at an average rate of five per hour during certain clear nights. What is the probability that a UFO hunter will spot exactly ten UFOs in two hours?

- Run a random sample of this event and simulate it to estimate the probability and compare it to the exact probability.

- Create a histogram and describe the shape of the distribution.

lambda <- 5 * 2 # Rate per hour * number of hours

k <- 10 # Number of UFOs

exact_prob <- dpois(k, lambda)

n_simulations <- 10000

simulated_UFOs <- rpois(n_simulations, lambda)

estimated_prob <- sum(simulated_UFOs == k) / n_simulations

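hist(

simulated_UFOs,

main = "Simulated UFO Sightings 🛸",

xlab = "Number of UFOs Spotted",

col = "lightblue", border = "black",

# Center each bar on an integer count; breaks placed on the integers would merge the two smallest values into one bin

breaks = seq(min(simulated_UFOs) - 0.5, max(simulated_UFOs) + 0.5, 1))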

⁠

The histogram of simulated UFO sightings 🛸 is unimodal, with its peak at 9 to 10 UFOs, matching the Poisson mean of λ = 10. The distribution is slightly right-skewed, with a longer tail extending toward higher counts. Most simulated two-hour sessions produce between about 7 and 13 sightings (within one standard deviation, √10 ≈ 3.2, of the mean), and counts become rarer further from this range.
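The question asks for the estimate to be compared with the exact probability. A minimal sketch of that comparison follows, assuming a seed of 42 (the original code for this question sets none); `mc_se` is an illustrative name for the Monte Carlo standard error of the estimate.

```r
# Sketch: compare the simulated estimate of P(X = 10) with the exact Poisson value.
set.seed(42)                      # assumption: seed added for reproducibility
lambda <- 5 * 2                   # 5 sightings per hour over 2 hours
k <- 10
n_simulations <- 10000

simulated_UFOs <- rpois(n_simulations, lambda)

exact_prob     <- dpois(k, lambda)
estimated_prob <- mean(simulated_UFOs == k)

# Rough size of the simulation error to expect in the estimate
mc_se <- sqrt(exact_prob * (1 - exact_prob) / n_simulations)

print(data.frame(
  Probabilities = c("Exact", "Estimated", "Monte Carlo SE"),
  Values = c(exact_prob, estimated_prob, mc_se)
))
```

A difference between the two probabilities of a few multiples of `mc_se` or less is consistent with ordinary sampling noise.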

⁠

Question 3.

Let W ∼ Uniform(8, 12). Let M be the growth of a mystical tree in centimeters after being exposed to enchanted unicorn droppings, with the growth rate per day being equal to W.

- Use R to simulate W. Simulate the mean and pdf of M and compare to the exact results.

- Create one graph with both the theoretical density and the simulated distribution.

n_simulations <- 10000

simulated_W <- runif(n_simulations, min = 8, max = 12)

# Calculate the exact mean of W:

exact_mean_W <- (8 + 12) / 2

# Estimate the mean of M using the simulated data:

estimated_mean_M <- mean(simulated_W)

# Compare the exact and estimated means:

comparison_table <- data.frame(

Means = c("Exact", "Estimated"),

Values = c(exact_mean_W, estimated_mean_M)

)

print(comparison_table)

# Create a density plot of the simulated data:

plot(density(simulated_W), main = "Mystical Tree Growth Distribution", xlab = "Growth Rate (cm/day)", ylim = c(0, 0.35), col = "blue")

# Overlay the theoretical density of the uniform distribution:

curve(dunif(x, min = 8, max = 12), add = TRUE, col = "red", lwd = 2)

⁠

The simulated growth rates all fall between 8 and 12 cm per day, and the estimated density is roughly constant across that range, so intervals of equal width within the range are about equally likely. The blue kernel density curve stays close to the theoretical density of 1/(12 − 8) = 0.25 in the interior and only rounds off near 8 and 12, where the smoothing spreads mass past the endpoints. This agrees with what the uniform model predicts.
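The agreement can also be checked numerically. The sketch below is one way to do so, assuming a seed of 42 and half-unit histogram bins (both choices are illustrative); it compares the simulated mean and variance with the exact Uniform(8, 12) values and inspects bin heights on the density scale, which avoids the edge smoothing of a kernel density estimate.

```r
# Sketch: numeric check of the simulated Uniform(8, 12) sample.
set.seed(42)                        # assumption: seed added for reproducibility
n_simulations <- 10000
simulated_W <- runif(n_simulations, min = 8, max = 12)

# Exact moments of Uniform(a, b): mean (a + b) / 2, variance (b - a)^2 / 12
exact_mean <- (8 + 12) / 2
exact_var  <- (12 - 8)^2 / 12

# Histogram heights on the density scale, one bin per 0.5 cm/day, without drawing
bins <- hist(simulated_W, breaks = seq(8, 12, by = 0.5), plot = FALSE)

print(c(exact_mean = exact_mean, sim_mean = mean(simulated_W)))
print(c(exact_var = exact_var, sim_var = var(simulated_W)))
print(round(bins$density, 3))       # each height should sit near 1 / (12 - 8) = 0.25
```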

⁠

Question 4.

As an adventurer, you've found a legendary key that can open secret passages in an ancient temple. The key, being centuries old, has a 12% chance of breaking permanently each day. You want to calculate the probability that the key remains intact on each day, from day 1 to day 30. You also want to create a plot of this to demonstrate.

days <- 1:30

prob_breaking <- 0.12

prob_intact <- pgeom(days - 1, prob_breaking, lower.tail = FALSE)

# Create a plot of the probabilities:

plot(days, prob_intact, type = "l", main = "Probability of the Legendary Key Remaining Intact", xlab = "Day", ylab = "Probability", col = "darkgreen", lwd = 2)

⁠

The graph shows the probability that the legendary key is still intact at the end of each day over a 30-day period. Because the key survives each day with probability 1 − 0.12 = 0.88, the probability of surviving through day d is 0.88^d. The dark green line therefore decays geometrically, from 0.88 on day 1 to about 0.02 on day 30, so the key is very unlikely to last the whole month.
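The `pgeom()` call can be cross-checked against the closed form 0.88^d. This is a small sketch of that check; the names `prob_intact_pgeom` and `prob_intact_power` are illustrative.

```r
# Sketch: the geometric survival probability two ways.
days <- 1:30
prob_breaking <- 0.12

# P(no break in the first d days) via the geometric distribution:
# pgeom(q, p, lower.tail = FALSE) is P(more than q intact days before the break)
prob_intact_pgeom <- pgeom(days - 1, prob_breaking, lower.tail = FALSE)

# The same quantity as a direct product of daily survival probabilities
prob_intact_power <- (1 - prob_breaking)^days

print(all.equal(prob_intact_pgeom, prob_intact_power))
print(round(prob_intact_power[c(1, 10, 30)], 4))
```

Both vectors should agree up to floating-point error, which confirms that `days - 1` is the right argument for `pgeom()` here.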

⁠

Question 5.

In a thrilling game of "Guess the Jellybeans," there are exactly 100 jellybeans in a jar, with 70 being red and 30 being green. Participants need to draw 5 jellybeans without looking. What is the probability that a participant draws 3 red jellybeans and 2 green jellybeans? Use a simulation to estimate the probability and compare it to the exact probability. Create a histogram of your simulations and describe the shape of the distribution.

1. Understand the problem: We need to find the probability of drawing 3 red jellybeans and 2 green jellybeans from a jar with 100 jellybeans (70 red and 30 green) in a single draw of 5 jellybeans.

2. Identify the distribution: This is a hypergeometric distribution problem because we have a finite population (100 jellybeans) and we're drawing without replacement, looking for a specific outcome (3 red and 2 green jellybeans).

3. Calculate the exact probability using the hypergeometric distribution formula in R:

k_red <- 3 # Number of red jellybeans

k_green <- 2 # Number of green jellybeans

N_red <- 70 # Total red jellybeans

N_green <- 30 # Total green jellybeans

n_draw <- 5 # Number of jellybeans drawn

exact_prob <- dhyper(k_red, N_red, N_green, n_draw)
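The `dhyper()` value can be verified by counting combinations directly: the number of ways to pick 3 of the 70 red and 2 of the 30 green, divided by the number of ways to pick any 5 of the 100. A short sketch, with `by_formula` and `by_dhyper` as illustrative names:

```r
# Sketch: hypergeometric probability from binomial coefficients versus dhyper().
k_red   <- 3
N_red   <- 70
N_green <- 30
n_draw  <- 5

by_formula <- choose(N_red, k_red) * choose(N_green, n_draw - k_red) /
  choose(N_red + N_green, n_draw)

# dhyper(x, m, n, k): x successes drawn, m successes and n failures in the jar, k draws
by_dhyper <- dhyper(k_red, N_red, N_green, n_draw)

print(c(formula = by_formula, dhyper = by_dhyper))
```

Both approaches should give approximately 0.316.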

# 4 Simulate the situation:

n_simulations <- 10000

jellybean_colors <- c(rep("red", 70), rep("green", 30))

simulated_draws <- replicate(n_simulations, sample(jellybean_colors, n_draw))

################START CODE HERE################

num_red_in_draws <- ??? # hint: use apply(), function(draw)

#################END CODE HERE#################

# 5 Estimate the probability from the simulation:

estimated_prob <- sum(num_red_in_draws == k_red) / n_simulations

# 6 Compare the exact and estimated probabilities:

# - Print out the exact and estimated probabilities.

# - Discuss the differences, if any.

comparison_table <- data.frame(

Probabilities = c("Exact", "Estimated"),

Values = c(exact_prob, estimated_prob)

)

print(comparison_table)

# 7: Create a histogram of the simulated data:

hist(num_red_in_draws, main="Simulated Jellybean Draws", xlab="Number of Red Jellybeans", col="purple", border="black", breaks=seq(min(num_red_in_draws) - 0.5, max(num_red_in_draws) + 0.5, 1))

⁠

The histogram appears as …
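For reference, here is one possible way to fill in the `???` placeholder for `num_red_in_draws`, following the hint about `apply()`. It is a sketch rather than the intended solution, and it assumes a seed of 42, which the original code for this question does not set.

```r
# Sketch: count red jellybeans in each simulated draw of 5.
set.seed(42)                        # assumption: seed added for reproducibility
n_simulations <- 10000
n_draw <- 5
k_red <- 3
jellybean_colors <- c(rep("red", 70), rep("green", 30))

# replicate() returns a 5 x 10000 character matrix: one column per simulated draw
simulated_draws <- replicate(n_simulations, sample(jellybean_colors, n_draw))

# MARGIN = 2 applies the function to each column, i.e. to each draw
num_red_in_draws <- apply(simulated_draws, 2, function(draw) sum(draw == "red"))

estimated_prob <- mean(num_red_in_draws == k_red)
print(estimated_prob)
print(table(num_red_in_draws) / n_simulations)   # relative frequency of each red count
```

Since `sample()` draws without replacement by default, each column is a valid hypergeometric draw, and `colSums(simulated_draws == "red")` would be an equivalent, faster alternative to `apply()`.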

⁠

Question 6.

A group of 200 students is taking an online statistics course. On average, 35% of students complete their homework each day. Simulate the number of students who complete their homework on a given day using a binomial distribution. Run 10000 simulations and create a histogram to visualize the distribution.

# Step 1: Set up the problem

n_students <- 200

p_complete <- 0.35

# Step 2: Run simulations

set.seed(42) # Set the seed for reproducibility

n_simulations <- 10000

################START CODE HERE################

students_complete <- ??? # hint: use rbinom()

#################END CODE HERE#################

# Step 3: Create a histogram

hist(students_complete, main = "Online Stats Course: Daily Homework Completion", xlab = "Number of Students Completing Homework", col = "orange", border = "black", breaks = seq(min(students_complete) - 0.5, max(students_complete) + 0.5, 1))

⁠

The histogram shows the simulated number of students (out of 200) who complete their homework on a given day. The distribution is approximately symmetric and bell-shaped, as expected for a binomial distribution with a large n and a p that is not close to 0 or 1. The daily counts cluster around the expected value of 200 × 0.35 = 70 students, with a standard deviation of about 6.7, and counts far from 70 occur rarely.
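One possible way to fill in the `???` placeholder with `rbinom()`, together with a check of the simulated mean and standard deviation against the binomial formulas, is sketched below. The names `exact_mean` and `exact_sd` are illustrative additions.

```r
# Sketch: simulate daily homework completion counts and check their moments.
set.seed(42)
n_students <- 200
p_complete <- 0.35
n_simulations <- 10000

# One binomial count per simulated day: 200 students, each completing with probability 0.35
students_complete <- rbinom(n_simulations, size = n_students, prob = p_complete)

exact_mean <- n_students * p_complete                          # n * p
exact_sd   <- sqrt(n_students * p_complete * (1 - p_complete)) # sqrt(n * p * (1 - p))

print(c(exact_mean = exact_mean, sim_mean = mean(students_complete)))
print(c(exact_sd = exact_sd, sim_sd = sd(students_complete)))
```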

⁠

Question 7.

In a fantasy game, a player can find rare gemstones in a cave with a 5% chance of success. The player is allowed to enter the cave 15 times per day. What is the probability that the player will find at least 3 gemstones in a day? Run a simulation to estimate the probability and create a histogram to visualize the distribution.

# Step 1: Set up the problem

n_tries <- 15

p_success <- 0.05

# Step 2: Run simulations

set.seed(42) # Set the seed for reproducibility

n_simulations <- 10000

################START CODE HERE################

gemstones_found <- ??? # hint: use rbinom()

#################END CODE HERE#################

# Step 3: Estimate the probability

prob_at_least_3_gemstones <- sum(gemstones_found >= 3) / n_simulations

# Step 4: Create a histogram

hist(gemstones_found, main = "Fantasy Game: Gemstones Found in a Day", xlab = "Number of Gemstones Found", col = "purple", border = "black", breaks = seq(min(gemstones_found) - 0.5, max(gemstones_found) + 0.5, 1))

⁠

The histogram illustrates the distribution of the number of gemstones found by a player in the fantasy game. The distribution is right-skewed, with a peak at 0 gemstones found. The frequency decreases as the number of gemstones found increases, making it less likely for the player to find more gemstones in a single day. However, there's still a chance of finding 3 or more gemstones, which can be considered a fortunate day for the player in their quest for rare treasures!
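The question asks for a probability, so it helps to put the exact binomial value next to the simulated one. The sketch below shows one possible completion of the `???` placeholder with `rbinom()` and adds an exact calculation with `pbinom()`, which is not in the original code.

```r
# Sketch: P(at least 3 gemstones in 15 tries), exact and simulated.
set.seed(42)
n_tries <- 15
p_success <- 0.05
n_simulations <- 10000

# One binomial count per simulated day
gemstones_found <- rbinom(n_simulations, size = n_tries, prob = p_success)

# Exact: P(X >= 3) = P(X > 2)
exact_prob     <- pbinom(2, size = n_tries, prob = p_success, lower.tail = FALSE)
estimated_prob <- mean(gemstones_found >= 3)

print(c(exact = exact_prob, estimated = estimated_prob))
```

The exact probability is approximately 0.036, so a day with 3 or more gemstones should occur in roughly 1 day out of 28.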

⁠

Question 8.

In the world of "Sleepy Scholars," college students observe their classmates during lectures. On average, they notice 5 classmates dozing off during a single lecture. The number of dozing students follows a Poisson distribution with a mean of 5. What is the probability of witnessing at least 8 students dozing off during a single lecture? Run a simulation to estimate the probability and create a histogram to illustrate the distribution.

Understand the problem: We need to find the probability of witnessing at least 8 students dozing off during a lecture, given that the number of dozing students follows a Poisson distribution with a mean of 5.

# Step 1: Calculate the exact probability using Poisson

lambda <- 5 # Mean of the Poisson distribution

k <- 8 # Number of dozing students

exact_prob <- 1 - ppois(k - 1, lambda)

# Step 3: Simulate the situation

n_simulations <- 10000

################START CODE HERE################

simulated_dozing_students <- ??? # hint: use rpois()

#################END CODE HERE#################

# Step 4: Estimate the probability from the simulation

estimated_prob <- sum(simulated_dozing_students >= k) / n_simulations

# Step 5: Compare the exact and estimated probabilities

comparison_table <- data.frame(

Probabilities = c("Exact", "Estimated"),

Values = c(exact_prob, estimated_prob)

)

print(comparison_table)
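One possible way to fill in the `???` placeholder with `rpois()` is sketched here, assuming a seed of 42 (the original code for this question sets none). It also writes the exact tail probability with `lower.tail = FALSE`, which is equivalent to `1 - ppois(k - 1, lambda)`.

```r
# Sketch: P(at least 8 dozing students), exact and simulated.
set.seed(42)                        # assumption: seed added for reproducibility
lambda <- 5
k <- 8
n_simulations <- 10000

# One Poisson count per simulated lecture
simulated_dozing_students <- rpois(n_simulations, lambda)

exact_prob     <- ppois(k - 1, lambda, lower.tail = FALSE)  # P(X >= 8) = P(X > 7)
estimated_prob <- mean(simulated_dozing_students >= k)

print(c(exact = exact_prob, estimated = estimated_prob))
```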

# Step 6: Create a histogram of the simulated data:

hist(

simulated_dozing_students,

main = "Sleepy Scholars 😴",

xlab = "Number of Dozing Students"

)
