vignettes/perc_warning_example.Rmd
While the other vignette shows you how to use perccalc appropriately, there are instances where there are simply too few categories to estimate percentiles properly. Imagine estimating the full distribution of percentiles, from 1 to 100, with only three ordered categories: it sounds far-fetched.
Let’s load our packages.
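The chunk that loads the packages is not shown here. Judging only from the functions used further down (`perc_diff`, `%>%`, `as_tibble`, `select`, `rename`, `mutate`), something like the following sketch is probably needed; the exact package list and the `needed`, `installed` and `missing_pkgs` names are assumptions for illustration.

```r
# Packages assumed from the functions used in this vignette (an inference,
# not the original chunk): perccalc for perc_diff()/perc_dist(), dplyr for
# the pipe and data verbs, MASS for the survey data.
needed <- c("perccalc", "dplyr", "MASS")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) == 0) {
  library(perccalc)
  library(dplyr)
} else {
  message("Install these packages first: ", paste(missing_pkgs, collapse = ", "))
}
```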
For example, take the survey data on smoking habits from the MASS package.
smoking_data <-
  MASS::survey %>% # you will need to install the MASS package
  as_tibble() %>%
  select(Sex, Smoke, Pulse) %>%
  rename(
    gender = Sex,
    smoke = Smoke,
    pulse_rate = Pulse
  )
The final result is this dataset:
## # A tibble: 237 x 3
##    gender smoke pulse_rate
##    <fct>  <fct>      <int>
##  1 Male   Never         35
##  2 Female Never         40
##  3 Female Never         48
##  4 Male   Never         48
##  5 Female Never         50
##  6 Female Regul         50
##  7 Male   Regul         54
##  8 Male   Never         55
##  9 Male   Never         56
## 10 Male   Never         59
## # … with 227 more rows
Note that there are only four categories in the smoke variable. Let’s try to estimate the percentile difference.
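Before doing so, the category count can be confirmed directly on the raw column with base R. This is a small sketch that is not part of the original analysis; `smoke_raw` is a name introduced here for illustration.

```r
# Smoke is the raw MASS::survey column that was renamed to `smoke` above.
smoke_raw <- MASS::survey$Smoke

nlevels(smoke_raw)                 # number of distinct categories
is.ordered(smoke_raw)              # whether the factor carries an ordering
table(smoke_raw, useNA = "ifany")  # respondents per category, missing values included
```

The raw factor is unordered, which is presumably why the chunk that follows sets the level order explicitly before estimating anything.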
smoking_data <-
  smoking_data %>%
  mutate(smoke = factor(smoke,
                        levels = c("Never", "Occas", "Regul", "Heavy"),
                        ordered = TRUE))

perc_diff(smoking_data, smoke, pulse_rate)
## Warning in perc_diff_(data_model = data_model, categorical_var =
## categorical_var, : Too few categories in categorical variable to estimate the
## variance-covariance matrix and standard errors. Proceeding without estimated
## standard errors but perhaps you should increase the number of categories
## difference         se
##   390.6092         NA
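The point estimate itself deserves a sanity check. As a rough sketch that is not part of the original vignette, the estimated gap can be compared with the spread of pulse rates actually observed in the data; `estimated_gap` is simply the number copied from the output above, and the other names are illustrative.

```r
# Raw pulse column from MASS::survey (renamed to pulse_rate above).
pulse <- MASS::survey$Pulse

# Spread between the lowest and highest observed pulse rate, ignoring NAs.
observed_range <- range(pulse, na.rm = TRUE)
observed_spread <- diff(observed_range)

# Difference reported by perc_diff() in the output above.
estimated_gap <- 390.6092

observed_range
observed_spread < estimated_gap  # is the estimate larger than the whole observed spread?
```

If the estimated gap exceeds the entire observed spread, the estimate is extrapolating well beyond the data, which fits the theme of this vignette.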
perc_diff returns the estimated coefficient but also warns you that it cannot estimate the standard error, which is returned as NA. The same happens with perc_dist(smoking_data, smoke, pulse_rate); its first six rows are shown below.
## Warning in perc_dist(smoking_data, smoke, pulse_rate): Too few categories in
## categorical variable to estimate the variance-covariance matrix and standard
## errors. Proceeding without estimated standard errors but perhaps you should
## increase the number of categories
## # A tibble: 6 x 2
##   percentile estimate
##        <int>    <dbl>
## 1          1     24.5
## 2          2     48.4
## 3          3     71.7
## 4          4     94.3
## 5          5    116.
## 6          6    138.
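One way to see why both functions give up on standard errors is to count degrees of freedom. The sketch below assumes that the estimator fits a cubic polynomial to one summary point per category; that is an assumption about the internals, not something this vignette states. `cubic_resid_df` is a hypothetical helper and all numbers in it are made up.

```r
# Hypothetical helper: residual degrees of freedom left after fitting a
# cubic polynomial to one (midpoint, mean) point per category.
cubic_resid_df <- function(n_categories) {
  # Evenly spaced percentile midpoints in (0, 1); illustrative values only.
  midpoint <- (seq_len(n_categories) - 0.5) / n_categories
  # Made-up category means; their values do not affect the degrees of freedom.
  mean_outcome <- 70 + 5 * midpoint + sin(10 * midpoint)
  # Intercept plus three polynomial terms: four coefficients in total.
  fit <- lm(mean_outcome ~ midpoint + I(midpoint^2) + I(midpoint^3))
  df.residual(fit)
}

cubic_resid_df(4)  # four categories, as in smoke: returns 0
cubic_resid_df(6)  # six categories: returns 2
```

Under that assumption, four categories supply exactly as many points as the cubic has coefficients, so nothing is left over to estimate residual variance. That is consistent with the NA standard error and with the warning’s advice to increase the number of categories.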