R/perc_dist.R
perc_dist.Rd
Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.
perc_dist(data_model, categorical_var, continuous_var, weights = NULL)
Argument | Description |
---|---|
data_model | A data frame containing at least the categorical and continuous variables from which to estimate the percentiles. |
categorical_var | The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise the function raises an error. |
continuous_var | The bare unquoted name of the continuous variable from which to estimate the percentiles. |
weights | The bare unquoted name of the optional weight variable. If not specified, equal weights are assumed. |
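
Because categorical_var must be an ordered factor, it can help to check the column and set its level order explicitly before calling perc_dist. The sketch below uses only base R; the labels and the object names (labels, alphabetical, explicit) are made up for illustration. It shows one pitfall worth keeping in mind: when no levels are given, factor() sorts the labels as text, so "inc10" lands before "inc2".

```r
# Hypothetical category labels, for illustration only.
labels <- paste0("inc", 1:20)

# Without explicit levels, factor() sorts the labels as text,
# which does not match the intended numeric order.
alphabetical <- factor(labels, ordered = TRUE)
head(levels(alphabetical), 3)

# Passing the levels explicitly keeps the intended order, lowest to highest.
explicit <- factor(labels, levels = labels, ordered = TRUE)
head(levels(explicit), 3)

# is.ordered() is TRUE only for ordered factors.
is.ordered(labels)    # plain character vector
is.ordered(explicit)  # ordered factor
```

Checking is.ordered() up front is a cheap way to catch the error case described in the arguments before it is raised.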
A data frame with one row per percentile, giving the estimated score (estimate) and its standard error (std.error).
perc_dist silently drops missing observations when calculating the linear combination of coefficients.
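
Since rows with missing values are dropped without a message, it may be worth counting them before the call. The following is a minimal base R sketch, not part of perc_dist itself. The data frame and the names df, vars and n_incomplete are hypothetical, and it assumes that a row counts as missing when any of the variables passed to the function is NA.

```r
# Hypothetical data with two incomplete rows, for illustration only.
df <- data.frame(
  score = c(1.2, NA, -0.3, 2.1, 0.7),
  type  = factor(c("inc1", "inc2", NA, "inc3", "inc2"),
                 levels = c("inc1", "inc2", "inc3"), ordered = TRUE),
  wt    = 1
)

# Columns that would be passed to the function (an assumption of this sketch).
vars <- c("type", "score", "wt")

# complete.cases() flags rows with no NA in the selected columns.
n_incomplete <- sum(!complete.cases(df[, vars]))

message(n_incomplete, " of ", nrow(df), " rows have a missing value in: ",
        paste(vars, collapse = ", "))
```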
set.seed(23131)
N <- 1000
K <- 20

toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:20), each = N/K),
                       wt = 1)

# perc_diff(toy_data, type, score)
# type is not an ordered factor!

toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

perc_dist(toy_data, type, score)
#> # A tibble: 100 x 3
#>    percentile estimate std.error
#>         <int>    <dbl>     <dbl>
#>  1          1   0.0116    0.0182
#>  2          2   0.0222    0.0356
#>  3          3   0.0320    0.0522
#>  4          4   0.0408    0.0680
#>  5          5   0.0488    0.0830
#>  6          6   0.0559    0.0973
#>  7          7   0.0622    0.111
#>  8          8   0.0677    0.124
#>  9          9   0.0724    0.136
#> 10         10   0.0764    0.147
#> # … with 90 more rows
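
The estimate and std.error columns can be combined into approximate intervals. The sketch below is one possible post-processing step and not something this help page prescribes: it assumes a normal approximation, and the object name res is hypothetical. The values are the first three rows of the printed output above, typed in by hand so the block runs without the package.

```r
# First three rows of the printed output, copied by hand.
res <- data.frame(
  percentile = 1:3,
  estimate   = c(0.0116, 0.0222, 0.0320),
  std.error  = c(0.0182, 0.0356, 0.0522)
)

# Normal-approximation 95% interval (an assumption of this sketch).
z <- qnorm(0.975)
res$lower <- res$estimate - z * res$std.error
res$upper <- res$estimate + z * res$std.error

res
```

With the real result, the same two assignments can be applied directly to the data frame returned by perc_dist.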