library(tidyverse)
library(ggplot2)
# How many simulations to run?
<- 10000
NumberOfSims <- 500 - 14
NumberofBalls
set.seed(123) # set the seed for the random number generator - this makes sure the results are reproducible when we are debugging
# create blank vectors to store values
<- vector()
vec <- vector()
vec2 <- vector()
vec3 <- vector()
vec4 theme_set(theme_minimal())
Problem
Allan Rossman used to live along a golf course and collected the golf balls that landed in his yard. Most of these golf balls had a number on them. Allan tallied the numbers on the first 500 golf balls that landed in his yard one summer.
Specifically. he collected the following data: 137 golf balls numbered 1 138 golf balls numbered 2 107 golf balls numbered 3 104 golf balls numbered 4 14 “others” (Either with a different number, or no number at all. We will ignore these other balls for the purposes of this question.)
Question: What is the distribution of these numbers? In particular, are the numbers 1, 2, 3, and 4 equally likely?
My solutions
Simulation
# compute maximum frequency for each simlation
for (j in 1:NumberOfSims) {
for (i in 1:NumberofBalls) {
<- sample(1:4, 1)
vec[i]
}<- max(table(vec))
vec2[j]
}
# compute minimum frequency for each simlation
for (j in 1:NumberOfSims) {
for (i in 1:NumberofBalls) {
<- sample(1:4, 1)
vec[i]
}<- min(table(vec))
vec3[j]
}
# compute range of frequency for each simlation
<- vec2 - vec3
range
# compute variance of frequency for each simlation
for (j in 1:NumberOfSims) {
for (i in 1:NumberofBalls) {
<- sample(1:4, 1)
vec[i]
}<- var(table(vec))
vec4[j]
}
<- cbind(vec2, vec3, vec4, range) %>% as.data.frame()
df
colnames(df) <- c("max", "min", "variance", "range")
dim(df)
[1] 10000 4
head(df)
max min variance range
1 136 112 11.66667 24
2 136 117 268.33333 19
3 138 116 43.00000 22
4 133 108 95.00000 25
5 129 116 73.66667 13
6 137 111 107.00000 26
Minimum frequency
# observed vector
<- c(137, 138, 107, 104)
obs
# calculate test statistics
min(obs)
[1] 104
# calculate p-value
<- df %>%
a filter(min > min(obs)) %>%
nrow()
<- 1 - a / NumberOfSims
pvalue pvalue
[1] 0.1481
ggplot(aes(x = min), data = df) +
geom_histogram(binwidth = 2) +
geom_vline(xintercept = min(obs), size = 1.4, color = "#AFAFFF") +
annotate("text", x = min(obs) - 5, y = 1000, label = " test statistics = 104 \n pvalue = 0.1481", size = 4)
Variance
# calculate test statistics
var(obs)
[1] 343
# calculate p-value
<- df %>%
a filter(variance > var(obs)) %>%
nrow()
<- a / NumberOfSims
pvalue pvalue
[1] 0.0336
# draw the graph
ggplot(aes(x = variance), data = df) +
geom_histogram() +
geom_vline(xintercept = var(obs), size = 1.4, color = "#AFAFFF") +
annotate("text", x = var(obs) + 150, y = 600, label = " test statistics = 343 \n pvalue = 0.0336", size = 4)
Range
# calculate test statistics
max(obs) - min(obs)
[1] 34
# calculate p-value
<- df %>%
a filter(range > max(obs) - min(obs)) %>%
nrow()
<- a / NumberOfSims
pvalue pvalue
[1] 0.0742
# draw the graph
ggplot(aes(x = range), data = df) +
geom_histogram() +
geom_vline(xintercept = max(obs) - min(obs), size = 1.4, color = "#AFAFFF") +
annotate("text", x = max(obs) - min(obs) + 8, y = 600, label = " test statistics = 34 \n pvalue = 0.0742", size = 4)
I tried 3 test statistics using simulation-based hypothesis tests. My null hypothesis here is that the numbers 1, 2, 3, and 4 distribute equally. My alternative hypothesis here is that the numbers 1, 2, 3, and 4 do not distribute equally.
Using minimum frequency of ball number among 486 balls as the test statistics, we simulated 10000 times and made a histogram for these 10000 minimum frequency. Our observed test statistics = 104 and our pvalue = 0.1481. Thus, with the significance level of 0.05, we fail to reject the null hypothesis that the numbers 1, 2, 3, and 4 distribute equally likely.
Using variance of the frequency of the numbers 1, 2, 3, and 4 as the test statistics, we simulated 10000 times and made a histogram for these 10000 minimum frequency. Our observed test statistics = 343 and our pvalue = 0.0336. Thus, with the significance level of 0.05, we reject the null hypothesis and conclude that the numbers 1, 2, 3, and 4 do not distribute equally.
Using range of the frequency of the numbers 1, 2, 3, and 4 as the test statistics, we simulated 10000 times and made a histogram for these 10000 minimum frequency. Our observed test statistics = 34 and our pvalue = 0.0742 Thus, with the significance level of 0.05, we fail to reject the null hypothesis that the numbers 1, 2, 3, and 4 distribute equally.
Reuse
Citation
@online{xu2020,
author = {Keren Xu},
editor = {},
title = {Golf {Balls} {Simulation}},
date = {2020-02-02},
url = {https://xukeren.github.io//posts/2020-02-02-golf-balls-simulation},
langid = {en}
}