Fit a tidyflow object
# S3 method for tidyflow
fit(x, control = control_tidyflow(), ...)
x: A tidyflow object.
control: A control_tidyflow object with the options for the model. See control_tidyflow for all options and examples.
...: Not currently used.
fit always returns a tidyflow, which holds one of two types of results: a fitted model or a resample object. Which one you get depends on how the tidyflow is specified:
- A tidyflow without a plug_resample or plug_grid specification always returns a model. Since no resampling or grid search takes place, fit returns a tidyflow with a single fitted model. If needed, this model can be extracted with pull_tflow_fit.
- A tidyflow with either a plug_resample, or a combination of plug_resample and plug_grid, always returns a tidyflow with a resample object. The resample object can be extracted with pull_tflow_fit_tuning. If a tuning grid is supplied with plug_grid, the model with the best tuning parameters can be finalized with complete_tflow. See complete_tflow for examples on how to use it.
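The two rules above can be condensed into a small decision function. The sketch below is a hypothetical base-R helper, `fit_result_kind`, written only to illustrate the rules; it is not part of tidyflow. The rules do not describe a grid without a resample, so the sketch simply rejects that case rather than guessing what tidyflow does.

```r
# Hypothetical helper (not part of tidyflow): encodes the two documented
# rules for which kind of result fit() stores inside the returned tidyflow.
fit_result_kind <- function(has_resample, has_grid) {
  if (!has_resample && !has_grid) {
    # No resampling and no grid search: a single fitted model
    return("model")
  }
  if (has_resample) {
    # A resample alone, or a resample combined with a grid
    return("resample")
  }
  # A grid without a resample is not covered by the rules above,
  # so this sketch refuses it instead of guessing.
  stop("a grid without a resample is not covered by this sketch")
}

# Which documented extractor matches each kind of result
extractor <- c(model = "pull_tflow_fit", resample = "pull_tflow_fit_tuning")

fit_result_kind(has_resample = FALSE, has_grid = FALSE)  # "model"
fit_result_kind(has_resample = TRUE,  has_grid = FALSE)  # "resample"
fit_result_kind(has_resample = TRUE,  has_grid = TRUE)   # "resample"
extractor[[fit_result_kind(TRUE, TRUE)]]                 # "pull_tflow_fit_tuning"
```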
Fitting a tidyflow currently involves several steps:
- Check that the tidyflow has at least the minimum components: data, a formula or recipe, and a model.
- Execute each step in the tidyflow in this order:
  1. Apply the split function passed to plug_split (for example, initial_split) and keep only the training data.
  2. Apply the formula or the recipe to the training data. Whenever the user specifies a resample or a grid, the recipe is not applied at this point. Instead, it is passed to either fit_resamples or tune_grid, which apply it themselves.
  3. Apply the resample function to the training data. As described in the previous item, the recipe has not been applied to the training data before this step.
  4. Apply the grid function to the tuning parameters defined in the parsnip model and the recipe or, alternatively, use the arguments defined in plug_grid. This generates a grid of values to explore.
  5. Fit the model, run the resampling, or run the grid search, depending on the specification.
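To make the execution order concrete, here is a minimal base-R sketch that mimics it on mtcars with lm. It is not how tidyflow is implemented: `run_flow_sketch` and `formula_fun` are hypothetical names, a plain v-fold assignment stands in for rsample, and the polynomial degree of `disp` stands in for a tuning parameter so that no extra packages are needed. As in the steps above, when a resample is specified the formula is not applied to the whole training set; it is applied separately inside each fold.

```r
# Hypothetical helper: the formula for a given tuning value. The polynomial
# degree of `disp` is only a stand-in tuning parameter for this sketch.
formula_fun <- function(degree) {
  stats::as.formula(paste0("mpg ~ cyl + poly(disp, ", degree, ")"))
}

# Hypothetical sketch of the execution order; not tidyflow's implementation.
run_flow_sketch <- function(data, formula_fun, outcome = "mpg", prop = 0.75,
                            v = NULL, grid = NULL, seed = 123) {
  set.seed(seed)

  # Split: keep only the training data
  train_rows <- sample(nrow(data), floor(prop * nrow(data)))
  train <- data[train_rows, , drop = FALSE]

  # No resample: apply the formula to the training data and fit one model
  if (is.null(v)) {
    return(list(kind = "model", fit = lm(formula_fun(1), data = train)))
  }

  # Resample: assign each training row to one of `v` folds. The formula is
  # not applied here; it is applied separately inside each fold below.
  folds <- sample(rep(seq_len(v), length.out = nrow(train)))

  # Grid: candidate values to explore (a single default value if none given)
  if (is.null(grid)) grid <- 1

  # Run: for every candidate, fit on the analysis rows of each fold and
  # compute the RMSE on the held-out assessment rows
  mean_rmse <- vapply(grid, function(degree) {
    fold_rmse <- vapply(seq_len(v), function(k) {
      analysis   <- train[folds != k, , drop = FALSE]
      assessment <- train[folds == k, , drop = FALSE]
      fit  <- lm(formula_fun(degree), data = analysis)
      pred <- predict(fit, newdata = assessment)
      sqrt(mean((assessment[[outcome]] - pred)^2))
    }, numeric(1))
    mean(fold_rmse)
  }, numeric(1))

  list(kind = "resample",
       metrics = data.frame(degree = grid, rmse = mean_rmse))
}

single <- run_flow_sketch(mtcars, formula_fun)
single$kind     # "model"

tuned <- run_flow_sketch(mtcars, formula_fun, v = 3, grid = 1:3)
tuned$kind      # "resample"
tuned$metrics   # one mean RMSE per candidate degree
```

Without `v`, the sketch returns a single model; with `v` and a grid, it returns one mean RMSE per candidate value, mirroring the model/resample distinction described earlier.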
This is the sacred order used for execution in tidyflow. One can specify some of these steps and leave out others; tidyflow raises an error if a required component is missing. For example, one can combine data + formula + resample + model and fit. Later, one can add a grid and fit again: the execution order is always the one described above, skipping any steps that are not specified.
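The "fixed order, skip what is missing" rule can be sketched as a planner over a specification. In this hypothetical base-R sketch, a tidyflow is stood in for by a named list of placeholder strings, and `execution_plan` returns the steps that would run; the names, the list structure and the error message are assumptions for illustration, not tidyflow's internals.

```r
# Hypothetical sketch (not tidyflow's code): the executor always walks the
# steps in one fixed order, skipping any step that was not specified.
step_order <- c("split", "preprocessor", "resample", "grid")

execution_plan <- function(spec) {
  # Minimum tidyflow: data, a formula or recipe, and a model
  required <- c("data", "preprocessor", "model")
  missing <- setdiff(required, names(spec))
  if (length(missing) > 0) {
    stop("missing required component(s): ", paste(missing, collapse = ", "))
  }
  # Specified steps in the fixed order; running the model always comes last
  c(intersect(step_order, names(spec)), "run")
}

# data + formula + resample + model (placeholder strings, not real objects)
spec <- list(data = "mtcars", preprocessor = "formula",
             resample = "vfold_cv", model = "lm")
execution_plan(spec)
# c("preprocessor", "resample", "run")

# Adding a grid afterwards: it still runs right after the resample
spec$grid <- "grid_regular"
execution_plan(spec)
# c("preprocessor", "resample", "grid", "run")
```

Because the plan is derived from the fixed `step_order` rather than from the order in which components were added, adding the grid last does not change where it runs.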
if (FALSE) {
library(tidyflow)
library(parsnip)
library(recipes)
library(tune)
library(dials)
library(rsample)
# Fit a simple linear model
model <- set_engine(linear_reg(), "lm")
formula_tidyflow <-
  mtcars %>%
  tidyflow() %>%
  plug_formula(mpg ~ cyl + log(disp)) %>%
  plug_model(model)
# The result is a model since we didn't specify any resample/grid
res <- fit(formula_tidyflow)
# You can extract the model fit if needed
res %>%
  pull_tflow_fit()
# Alternatively, we can add a split specification and
# predict on the training data automatically:
formula_tidyflow <-
  formula_tidyflow %>%
  plug_split(initial_split)
res2 <- fit(formula_tidyflow)
res2 %>%
  predict_training()
# This has the advantage that `predict_training` or `predict_testing` will
# apply the recipe/formula automatically for you:
recipe_tidyflow <-
  formula_tidyflow %>%
  drop_formula() %>%
  plug_recipe(~ recipe(mpg ~ ., .x) %>% step_log(disp))
res3 <- fit(recipe_tidyflow)
res3 %>%
  predict_testing()
# We can accumulate steps and add cross-validation and a tuning grid.
# Fit a regularized regression through a grid search by replacing
# the already defined model:
new_mod <- set_engine(linear_reg(penalty = tune(), mixture = tune()),
                      "glmnet")
tuned_res <-
  recipe_tidyflow %>%
  plug_resample(vfold_cv, v = 2) %>%
  replace_model(new_mod) %>%
  plug_grid(grid_regular, levels = 2) %>%
  fit()
# Since we specified a resample/grid, the result is now a `tidyflow`
# with a resample object
tuned_res
# If needed, we can extract that resample:
tuned_res %>%
  pull_tflow_fit_tuning() %>%
  autoplot()
# When the model tuning is finished, `complete_tflow` can
# finalize the model, picking the best tuning parameters
# for you according to the chosen metric:
tuned_res %>%
  complete_tflow(metric = "rmse") %>%
  predict_training()
# `complete_tflow` is powerful because it has already applied the recipe
# and retrained the model on the entire training data with
# the best tuning parameters from the tuning grid.
# The strength of this approach is that you can replace any step
# and rerun the fit:
bootstrap_res <-
  tuned_res %>%
  replace_resample(bootstraps, times = 2) %>%
  fit()
bootstrap_res %>%
  complete_tflow(metric = "rsq") %>%
  predict_training()
}
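The comments in the examples say that complete_tflow picks the best tuning value for a chosen metric and retrains the model on the entire training data. The base-R sketch below mimics only that idea, under stated assumptions: `cv_metrics`, `finalize_sketch` and `formula_fun` are hypothetical names, the polynomial degree of `disp` stands in for a tuning parameter, and R-squared is computed as the squared correlation between observed and predicted values. It is not tidyflow's or tune's implementation.

```r
# Hypothetical helper: formula for a given tuning value (stand-in parameter)
formula_fun <- function(degree) {
  stats::as.formula(paste0("mpg ~ cyl + poly(disp, ", degree, ")"))
}

# v-fold cross-validation returning mean RMSE and R-squared per candidate
cv_metrics <- function(train, grid, v = 3, seed = 123) {
  set.seed(seed)
  folds <- sample(rep(seq_len(v), length.out = nrow(train)))
  rows <- lapply(grid, function(degree) {
    # One column per fold, rows named "rmse" and "rsq"
    per_fold <- sapply(seq_len(v), function(k) {
      analysis   <- train[folds != k, , drop = FALSE]
      assessment <- train[folds == k, , drop = FALSE]
      fit  <- lm(formula_fun(degree), data = analysis)
      pred <- predict(fit, newdata = assessment)
      c(rmse = sqrt(mean((assessment$mpg - pred)^2)),
        rsq  = cor(assessment$mpg, pred)^2)  # squared correlation
    })
    data.frame(degree = degree,
               rmse = mean(per_fold["rmse", ]),
               rsq  = mean(per_fold["rsq", ]))
  })
  do.call(rbind, rows)
}

# Pick the best candidate and refit once on the whole training set
finalize_sketch <- function(metrics, metric = c("rmse", "rsq"), train) {
  metric <- match.arg(metric)
  # RMSE: lower is better; R-squared: higher is better
  best <- if (metric == "rmse") which.min(metrics$rmse) else which.max(metrics$rsq)
  degree <- metrics$degree[best]
  list(degree = degree, fit = lm(formula_fun(degree), data = train))
}

set.seed(1)
train <- mtcars[sample(nrow(mtcars), 24), ]
metrics <- cv_metrics(train, grid = 1:3)
final_rmse <- finalize_sketch(metrics, metric = "rmse", train = train)
final_rsq  <- finalize_sketch(metrics, metric = "rsq", train = train)
head(predict(final_rmse$fit, newdata = train))  # predictions on training data
```

The direction of "best" depends on the metric, which is why the sketch minimizes RMSE but maximizes R-squared, matching the `metric = "rmse"` and `metric = "rsq"` calls in the examples.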