Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.

perc_dist(data_model, categorical_var, continuous_var, weights = NULL)

Arguments

data_model	A data frame with at least the categorical and continuous variables from which to estimate the percentiles
categorical_var	The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise the function raises an error.
continuous_var	The bare unquoted name of the continuous variable from which to estimate the percentiles
weights	The bare unquoted name of the optional weight variable. If not specified, then equal weights are assumed.

Value

A data frame with one row per percentile, containing the estimated score (estimate) and its standard error (std.error) for each percentile.

Details

perc_dist silently drops observations with missing values before calculating the linear combination of coefficients.
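
Because the drop is silent, it can help to count incomplete rows before calling the function. The following is a minimal base R sketch, not part of the package: the data frame df and its columns grp, y and w are hypothetical, and it assumes that a row counts as missing when any of the variables passed to perc_dist is NA.

```r
# Hypothetical data with a few missing values (illustration only)
df <- data.frame(
  grp = factor(c("low", "mid", "high", "mid", NA, "low"),
               levels = c("low", "mid", "high"), ordered = TRUE),
  y = c(1.2, NA, 3.4, 2.2, 0.7, 1.9),
  w = c(1, 1, NA, 1, 1, 1)
)

# TRUE for rows that are complete on every variable of interest
# (assumption: these are the rows that would be kept)
keep <- complete.cases(df[, c("grp", "y", "w")])

# Number of rows that would be lost
n_dropped <- sum(!keep)
n_dropped
#> [1] 3
```

Comparing n_dropped with nrow(df) gives a quick sense of how much of the sample the estimates would actually rest on.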

Examples


set.seed(23131)
N <- 1000
K <- 20

toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:K), each = N/K),
                       wt = 1)


# perc_dist(toy_data, type, score)
# Fails: type is not an ordered factor yet!

toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)

perc_dist(toy_data, type, score)
#> # A tibble: 100 x 3
#>    percentile estimate std.error
#>         <int>    <dbl>     <dbl>
#>  1          1   0.0116    0.0182
#>  2          2   0.0222    0.0356
#>  3          3   0.0320    0.0522
#>  4          4   0.0408    0.0680
#>  5          5   0.0488    0.0830
#>  6          6   0.0559    0.0973
#>  7          7   0.0622    0.111 
#>  8          8   0.0677    0.124 
#>  9          9   0.0724    0.136 
#> 10         10   0.0764    0.147 
#> # … with 90 more rows
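
The weights argument can be exercised in the same way. This is a hedged sketch rather than output from the package: it rebuilds the toy data with non-constant weights (the runif draw is an illustrative assumption), checks that the categorical variable is an ordered factor, and only calls perc_dist if the perccalc package happens to be installed. No output is shown because the estimates depend on the simulated weights.

```r
set.seed(23131)
N <- 1000
K <- 20

# Same structure as the toy data above, but with varying weights
# (the weight distribution is an arbitrary choice for illustration)
toy_data <- data.frame(
  id = 1:N,
  score = rnorm(N, sd = 2),
  type = factor(rep(paste0("inc", 1:K), each = N / K),
                levels = paste0("inc", 1:K), ordered = TRUE),
  wt = runif(N, min = 0.5, max = 1.5)
)

# perc_dist requires an ordered factor, so check before calling
stopifnot(is.ordered(toy_data$type))

# Pass the weight column as a bare unquoted name, as documented
if (requireNamespace("perccalc", quietly = TRUE)) {
  print(perccalc::perc_dist(toy_data, type, score, weights = wt))
}
```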