Fit a tidyflow object

# S3 method for tidyflow
fit(x, control = control_tidyflow(), ...)

Arguments

x: A tidyflow object
control: A control_tidyflow object with the options for the model. See control_tidyflow for all options and examples.
...: Not used right now

Value

fit returns a tidyflow holding one of two types of result: a model object or a resample object. Which one you get depends on the specification of the tidyflow.

Here are all possible results:

Any tidyflow without a plug_resample or plug_grid specification will always return a model. Since no grid search or resampling takes place, fit returns a tidyflow with a model. If needed, this model can be extracted with pull_tflow_fit.
Any tidyflow with a plug_resample, either alone or combined with plug_grid, will always return a tidyflow with a resample object. The resample object can be extracted with pull_tflow_fit_tuning. If a tuning grid is supplied with plug_grid, you can finalize the best model with complete_tflow. See complete_tflow for examples of how to use it.

Details

Fitting a tidyflow currently involves several steps:

Check that the tidyflow has at least the minimum components: the data, a formula or recipe, and the model.
Execute each step in the tidyflow in this order:
- Apply the split passed by initial_split and extract only the training data.
- Apply the formula or the recipe to the training data. Whenever the user specifies a resample or a grid, the recipe is not applied at this point. Instead, it is passed to either fit_resamples or tune_grid, which apply it themselves.
- Apply the resample function to the training data. As described in the item above, the recipe has not been applied to the training data before this step.
- Apply the grid function to the parameters defined in the parsnip model and the recipe. Alternatively, extract the arguments defined in plug_grid. This generates a grid of values to explore.
- Run the model/grid search/resample depending on the specification.

Execution in tidyflow always follows this order. You can specify some of these steps and leave out others; tidyflow raises an error if a required step is missing. For example, you can combine data + formula + resample + model and then fit. Afterwards, you can add a grid, and execution will still follow the order described above, skipping any steps that are not specified.
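
As a rough sketch of the incremental workflow just described, the code below builds the data + formula + resample + model combination, fits it, and then adds a grid before fitting again. It assumes the tidyflow package is installed and that glmnet is available for the tuned model; the formula and the values `v = 2` and `levels = 2` are illustrative choices only.

```r
library(tidyflow)  # assumed to be installed
library(parsnip)
library(recipes)
library(tune)
library(dials)
library(rsample)

# Step 1: data + formula + resample + model (no split, no grid)
tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_formula(mpg ~ cyl + disp) %>%
  plug_resample(vfold_cv, v = 2) %>%
  plug_model(set_engine(linear_reg(), "lm"))

# A resample is specified, so the fitted tidyflow holds a resample object
res <- fit(tflow)

# Step 2: add a grid afterwards. A grid needs tunable parameters,
# so the model is replaced with one that marks them with tune()
tuned_mod <- set_engine(linear_reg(penalty = tune(), mixture = tune()),
                        "glmnet")

tuned <-
  tflow %>%
  replace_model(tuned_mod) %>%
  plug_grid(grid_regular, levels = 2) %>%
  fit()
```

In both cases the resample object can be extracted with pull_tflow_fit_tuning.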

Examples

if (FALSE) {
library(parsnip)
library(recipes)
library(tune)
library(dials)
library(rsample)

# Fit a simple linear model
model <- set_engine(linear_reg(), "lm")

formula_tidyflow <-
 mtcars %>%
 tidyflow() %>%
 plug_formula(mpg ~ cyl + log(disp)) %>%
 plug_model(model)

# The result is a model since we didn't specify any resample/grid
res <- fit(formula_tidyflow)

# You can extract the model fit if needed
res %>%
 pull_tflow_fit()

# Alternatively, we can add a split specification and
# predict on the training data automatically:
formula_tidyflow <-
 formula_tidyflow %>%
 plug_split(initial_split)

res2 <- fit(formula_tidyflow)

res2 %>%
 predict_training()

# This has the advantage that `predict_training` or `predict_testing` will
# apply the recipe/formula automatically for you:

recipe_tidyflow <-
 formula_tidyflow %>%
 drop_formula() %>% 
 plug_recipe(~ recipe(mpg ~ ., .x) %>% step_log(disp))

res3 <- fit(recipe_tidyflow)
res3 %>%
 predict_testing()

# We can accumulate steps and add a cross-validation and tuning grid.
# Fit a regularized regression through a grid search.
# Do this by updating the already defined model:
new_mod <- set_engine(linear_reg(penalty = tune(), mixture = tune()),
                      "glmnet")
tuned_res <-
  recipe_tidyflow %>%
  plug_resample(vfold_cv, v = 2) %>% 
  replace_model(new_mod) %>%
  plug_grid(grid_regular, levels = 2) %>%
  fit()

# Since we specified a resample/grid, the result is now a `tidyflow`
# with a resample object
tuned_res

# If needed, we can extract that resample:
tuned_res %>%
 pull_tflow_fit_tuning() %>%
 autoplot()

# When the model tuning is finished, `complete_tflow` can
# finalize the tidyflow with the best model. It picks
# the best model for you based on the chosen metric.

tuned_res %>%
 complete_tflow(metric = "rmse") %>%
 predict_training()

# `complete_tflow` is powerful because it applies the recipe
# and retrains the model on the entire training data with
# the best tuning parameters from the tuning grid.

# The power of this model building is that you can replace any step
# and rerun the fit:
bootstrap_res <-
 tuned_res %>%
 replace_resample(bootstraps, times = 2) %>%
 fit()

bootstrap_res %>%
 complete_tflow(metric = "rsq") %>%
 predict_training()
}