plug_recipe() specifies the recipe used in the analysis. It accepts a function .f that will be applied to the data; only functions that return a recipe object are allowed. See the recipes package for how to create a recipe.
drop_recipe() removes the recipe function from the tidyflow. Note that it keeps the other preprocessing steps, such as the split and the resample.
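A minimal sketch of drop_recipe(), assuming the tidyflow, recipes and parsnip packages are installed. It removes the recipe while the data and the model stay in the tidyflow, then plugs in a formula so the tidyflow can be fit again. The object names (no_recipe, refit) are only illustrative.

```r
library(tidyflow)
library(recipes)
library(parsnip)

# A tidyflow with data, a recipe and a model
tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_recipe(~ recipe(mpg ~ cyl + wt, data = .)) %>%
  plug_model(set_engine(linear_reg(), "lm"))

# Remove only the recipe; the data and the model are kept
no_recipe <- drop_recipe(tflow)

# A preprocessor is needed again before fitting, here a formula
refit <-
  no_recipe %>%
  plug_formula(mpg ~ cyl + wt) %>%
  fit()
refit
```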
replace_recipe() first removes the recipe function and then adds the new one. Any model that has already been fit with the old recipe will need to be refit.
plug_recipe(x, .f, ..., blueprint = NULL)
drop_recipe(x)
replace_recipe(x, .f, ..., blueprint = NULL)
x: A tidyflow.
.f: A function or a formula with a recipe inside. See the details section.
...: Not used.
blueprint: A hardhat blueprint used for fine tuning the preprocessing. If NULL, hardhat::default_recipe_blueprint() is used.
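As a sketch of the blueprint argument, assuming tidyflow, recipes, parsnip and hardhat are installed: hardhat::default_recipe_blueprint() accepts options such as allow_novel_levels, and the resulting blueprint can be passed to plug_recipe(). The option chosen here is only illustrative; whether it matters depends on the recipe and the data.

```r
library(tidyflow)
library(recipes)
library(parsnip)
library(hardhat)

# Illustrative blueprint: tolerate factor levels at prediction time
# that were not seen during training
bp <- default_recipe_blueprint(allow_novel_levels = TRUE)

tflow_bp <-
  mtcars %>%
  tidyflow() %>%
  plug_recipe(~ recipe(mpg ~ cyl + wt, data = .), blueprint = bp) %>%
  plug_model(set_engine(linear_reg(), "lm"))

fitted_bp <- fit(tflow_bp)
```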
The tidyflow x, with its recipe function added, replaced, or removed.
To fit a tidyflow, exactly one of plug_formula() or plug_recipe() must be specified, not both.
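The rule that a tidyflow takes either a formula or a recipe, but not both, can be checked directly. This sketch assumes tidyflow signals an error either when the second preprocessor is plugged in or, at the latest, when fit() is called; the exact error message is not reproduced here.

```r
library(tidyflow)
library(recipes)
library(parsnip)

tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_recipe(~ recipe(mpg ~ cyl + wt, data = .)) %>%
  plug_model(set_engine(linear_reg(), "lm"))

# Adding a formula on top of an existing recipe is not allowed;
# capture the condition instead of stopping
res <- tryCatch(
  fit(plug_formula(tflow, mpg ~ cyl)),
  error = function(e) e
)
inherits(res, "error")
```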
.f can be either a function or a formula. In either case, it should take a single argument, which is assumed to be the data, and return a recipe built on that argument.
If a function is supplied, it is used as is: its single argument receives the data, and it should return the recipe applied to that argument.
If a formula is supplied, e.g. ~ recipe(mpg ~ cyl, data = .), it is converted to a function whose first argument receives the data; any other arguments are ignored. Inside the formula, the data can be referred to as either . or .x. See the examples section for more details.
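To see what the two placeholder names mean, the sketch below uses rlang::as_function(), which turns a one-sided formula into a function in which both . and .x refer to the first argument. That tidyflow relies on this exact mechanism is an assumption; the point is only that both spellings receive the data.

```r
library(rlang)
library(recipes)

# Both lambdas receive the data as their first argument
with_dot <- as_function(~ recipe(mpg ~ cyl, data = .))
with_x   <- as_function(~ recipe(mpg ~ cyl, data = .x))

rec_dot <- with_dot(mtcars)
rec_x   <- with_x(mtcars)

# Both calls build the same recipe, with the same variable roles
rec_dot$var_info
rec_x$var_info
```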
Because the recipe step inside a tidyflow is not well suited to exploration, we suggest constructing the recipe outside the tidyflow and applying it to the data first, to make sure it works. Once the recipe can be applied without errors, provide the function or formula for the recipe to plug_recipe(). Defining a recipe without testing it on the data can lead to recipe errors that are much easier to fix interactively.
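One way to follow that advice with the recipes package alone is to estimate the recipe with prep() and inspect the processed training data with bake(new_data = NULL) before plugging the function into a tidyflow. This sketch reuses the centering and scaling recipe from the examples.

```r
library(recipes)

recipe_fun <- function(.x) {
  recipe(mpg ~ ., data = .x) %>%
    step_center(all_predictors()) %>%
    step_scale(all_predictors())
}

# Estimate the centering and scaling statistics on the data
prepped <- prep(recipe_fun(mtcars))

# Apply the trained recipe to the training data itself
baked <- bake(prepped, new_data = NULL)

# Predictors should now have mean close to 0 and standard deviation close to 1
predictors <- setdiff(names(baked), "mpg")
round(colMeans(baked[, predictors]), 10)
```

If prep() or bake() fails here, the error is easier to track down than it would be inside fit().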
library(tidyflow)
library(recipes)
library(parsnip)
# Passing a function to `plug_recipe`
recipe_fun <- function(.x) {
  recipe(mpg ~ ., data = .x) %>%
    step_center(all_predictors()) %>%
    step_scale(all_predictors())
}
# Let's make sure that it works with the data first
recipe_fun(mtcars)
#> Recipe
#>
#> Inputs:
#>
#> role #variables
#> outcome 1
#> predictor 10
#>
#> Operations:
#>
#> Centering for all_predictors()
#> Scaling for all_predictors()
# Specify the function to be applied to the data in `plug_recipe`
tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_recipe(recipe_fun) %>%
  plug_model(set_engine(linear_reg(), "lm"))
# Fit the model
fit(tflow)
#> ══ Tidyflow [trained] ══════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#>
#> ══ Results ═════════════════════════════════════════════════════════════════════
#>
#>
#> Fitted model:
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#>
#> ...
#> and 5 more lines.
# Replace the old recipe with a new one specified on the fly as a formula:
tflow %>%
  replace_recipe(~ recipe(mpg ~ cyl, data = .) %>% step_log(cyl, base = 10)) %>%
  fit()
#> ══ Tidyflow [trained] ══════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#>
#> ══ Results ═════════════════════════════════════════════════════════════════════
#>
#>
#> Fitted model:
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#>
#> ...
#> and 3 more lines.
# Note how the function argument can be either `.` or `.x`
tflow %>%
  replace_recipe(~ {
    .x %>%
      recipe(mpg ~ cyl + am) %>%
      step_log(cyl, base = 10) %>%
      step_mutate(am = factor(am)) %>%
      step_dummy(am)
  }) %>%
  fit()
#> ══ Tidyflow [trained] ══════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#>
#> ══ Results ═════════════════════════════════════════════════════════════════════
#>
#>
#> Fitted model:
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#>
#> ...
#> and 3 more lines.