Fit a tidyflow object
# S3 method for tidyflow
fit(x, control = control_tidyflow(), ...)
x: A tidyflow object.
control: A control_tidyflow object with the options for the model. See control_tidyflow for all options and examples.
...: Not used right now.
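As an illustration of the control argument, the sketch below passes a resampling option through to the fit. It assumes that control_tidyflow accepts a control_resamples argument built with tune::control_resamples(); check control_tidyflow for the exact options before relying on it.

```r
library(tidyflow)
library(parsnip)
library(rsample)
library(tune)

# Assumption: control_tidyflow() has a `control_resamples` argument that
# takes an object created by tune::control_resamples().
ctrl <- control_tidyflow(
  control_resamples = control_resamples(save_pred = TRUE)
)

tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_formula(mpg ~ cyl + disp) %>%
  plug_resample(vfold_cv, v = 2) %>%
  plug_model(set_engine(linear_reg(), "lm"))

# The control object is handed to fit() through `control`
res <- fit(tflow, control = ctrl)
```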
Fitting a tidyflow can return two types of results: a model object or a resample object. Which one you get depends on the specification of the tidyflow. Here are all possible results:
Any tidyflow without a plug_resample or plug_grid specification will always return a tidyflow with a model, since there is no grid search or resampling happening. If needed, this model can be extracted with pull_tflow_fit.
Any tidyflow with either a plug_resample or a combination of plug_resample and plug_grid will always return a tidyflow with a resample object. The resample object can be extracted with pull_tflow_fit_tuning. If a tuning grid is supplied with plug_grid, one can finalize the best model with complete_tflow. See complete_tflow for examples on how to use it.
Fitting a tidyflow currently involves several steps:
1. Check that there is at least a minimum tidyflow: the data, the formula/recipe and the model.
2. Execute each step in the tidyflow in this order:
   a. Apply the split passed by initial_split and extract only the training data.
   b. Apply the formula or the recipe to the training data. Whenever the user specifies a resample or a grid, the recipe is not applied. Instead, the recipe is passed to either fit_resamples or tune_grid, which apply it themselves.
   c. Apply the resample function to the training data. As described in the item above, the recipe is not applied to the training data before this step.
   d. Apply the grid function to the parameters defined in the parsnip model and the recipe. Alternatively, extract the arguments defined in plug_grid. This generates a grid of values to explore.
   e. Run the model/grid search/resample depending on the specification.
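To make the execution order concrete, here is a rough sketch of the same sequence written by hand with rsample, recipes, dials, parsnip and tune. It is not the internal implementation of fit; it only mirrors the order of the steps above. It assumes the glmnet package is installed and builds the grid from explicit dials parameter objects rather than extracting them from the model.

```r
library(rsample)
library(recipes)
library(parsnip)
library(dials)
library(tune)

set.seed(123)

# Split the data and keep only the training portion
split <- initial_split(mtcars)
train <- training(split)

# Define the recipe but do not apply it: tune_grid() applies it per resample
rec <- recipe(mpg ~ ., data = train) %>% step_log(disp)

# Resample the unprocessed training data
folds <- vfold_cv(train, v = 2)

# Build a grid of candidate values for the parameters marked with tune()
mod <- set_engine(linear_reg(penalty = tune(), mixture = tune()), "glmnet")
grid <- grid_regular(penalty(), mixture(), levels = 2)

# Run the grid search over the resamples
tuned <- tune_grid(mod, preprocessor = rec, resamples = folds, grid = grid)
```

A tidyflow bundles these pieces so that the order is enforced for you and any step can be swapped without rewriting the rest.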
This is the fixed order of execution in tidyflow. One can specify some of these steps and exclude others; tidyflow will raise an error if a required step is missing. For example, one can create the combination of data + formula + resample + model and then fit. As a follow-up, one can add a grid, and the execution order will always be the one described above, skipping any steps that are not specified.
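The behaviour with missing steps can be sketched as follows. The example assumes that fitting a tidyflow without a model stops with an error (the exact message is not shown here) and captures it with tryCatch so the script keeps running.

```r
library(tidyflow)
library(parsnip)

# Data and a formula, but no model: not yet a minimum tidyflow
incomplete <-
  mtcars %>%
  tidyflow() %>%
  plug_formula(mpg ~ cyl)

# Capture the expected error instead of stopping the script
outcome <- tryCatch(fit(incomplete), error = function(e) e)
inherits(outcome, "error")

# Adding the missing model makes the same tidyflow fittable
res <-
  incomplete %>%
  plug_model(set_engine(linear_reg(), "lm")) %>%
  fit()
```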
if (FALSE) {
library(tidyflow)
library(parsnip)
library(recipes)
library(tune)
library(dials)
library(rsample)
# Fit a simple linear model
model <- set_engine(linear_reg(), "lm")
formula_tidyflow <-
  mtcars %>%
  tidyflow() %>%
  plug_formula(mpg ~ cyl + log(disp)) %>%
  plug_model(model)
# The result is a model since we didn't specify any resample/grid
res <- fit(formula_tidyflow)
# You can extract the model fit if needed
res %>%
  pull_tflow_fit()
# Alternatively, we can add a split specification and
# predict on the training data automatically:
formula_tidyflow <-
  formula_tidyflow %>%
  plug_split(initial_split)
res2 <- fit(formula_tidyflow)
res2 %>%
  predict_training()
# This has the advantage that `predict_training` or `predict_testing` will
# apply the recipe/formula automatically for you:
recipe_tidyflow <-
  formula_tidyflow %>%
  drop_formula() %>%
  plug_recipe(~ recipe(mpg ~ ., .x) %>% step_log(disp))
res3 <- fit(recipe_tidyflow)
res3 %>%
  predict_testing()
# We can accumulate steps and add a cross-validation and tuning grid.
# Fit a regularized regression through a grid search.
# Do this by updating the already defined model:
new_mod <- set_engine(linear_reg(penalty = tune(), mixture = tune()),
                      "glmnet")
tuned_res <-
  recipe_tidyflow %>%
  plug_resample(vfold_cv, v = 2) %>%
  replace_model(new_mod) %>%
  plug_grid(grid_regular, levels = 2) %>%
  fit()
# Since we specified a resample/grid, the result is now a `tidyflow`
# with a resample object
tuned_res
# If needed, we can extract that resample:
tuned_res %>%
  pull_tflow_fit_tuning() %>%
  autoplot()
# When the model tuning is finished, `complete_tflow` can
# finalize the tidyflow with the best model. It can pick
# the best tuning parameters for you based on a metric.
tuned_res %>%
  complete_tflow(metric = "rmse") %>%
  predict_training()
# `complete_tflow` is powerful because it has already applied the recipe
# and retrained the model on the entire training data with
# the best tuning parameters from the tuning grid.
# The power of this model building is that you can replace any step
# and rerun the fit:
bootstrap_res <-
  tuned_res %>%
  replace_resample(bootstraps, times = 2) %>%
  fit()
bootstrap_res %>%
  complete_tflow(metric = "rsq") %>%
  predict_training()
}