Fit a tidyflow object

# S3 method for tidyflow
fit(x, control = control_tidyflow(), ...)

Arguments

x

A tidyflow object

control

A control_tidyflow object with the options for the model. See control_tidyflow for all options and examples.

...

Not used right now

Value

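A fitted tidyflow holds one of two types of results: a model object or a resample object. Which one you get depends on the specification of the tidyflow:

  • If no resample or grid is specified, the result is a fitted model.

  • If a resample or grid is specified, the result is a resample (or tuning) object.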

Details

Fitting a tidyflow currently involves several steps:

  • Check that the tidyflow has at least the minimum components: data, a formula or recipe, and a model.

  • Execute each step in the tidyflow in this order:

    • Apply the split passed by initial_split and extract only the training data.

    • Apply the formula or the recipe to the training data. Whenever the user specifies a resample or a grid, the recipe is not applied at this point. Instead, the recipe is passed to either fit_resamples or tune_grid, which apply it themselves.

    • Apply the resample function to the training data. As described in the item above, the recipe has not been applied to the training data before this step.

    • Apply the grid function to the parameters defined in the parsnip model and the recipe. Alternatively, extract the arguments defined in plug_grid. This generates a grid of values to explore.

    • Run the model/grid search/resample depending on the specification.

  • This execution order is fixed. You can specify some of these steps and leave out others, and tidyflow raises an error if a required step is missing. For example, you can combine data + formula + resample + model and then fit. If you later add a grid, execution still follows the order described above, skipping any steps that are not specified.
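
The execution order above can be sketched by hand with rsample and parsnip alone. This is only a rough illustration of the simplest case (data + split + formula + model, with no resample or grid). It is not tidyflow's internal code, and the object names (`split`, `train`, `spec`, `manual_fit`) are made up for the example.

```r
library(rsample)
library(parsnip)

set.seed(2341)

# 1. Apply the split and keep only the training data
split <- initial_split(mtcars)
train <- training(split)

# 2. Apply the formula to the training data and
# 3. run the model (this sketch has no resample or grid step)
spec <- set_engine(linear_reg(), "lm")
manual_fit <- fit(spec, mpg ~ cyl + log(disp), data = train)

# The model was fitted on the training rows only
nrow(train) < nrow(mtcars)
```

With a tidyflow, the same steps are declared with plug_split, plug_formula and plug_model, and fit runs them in this order regardless of the order in which they were plugged in.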

Examples

if (FALSE) {
library(tidyflow)
library(parsnip)
library(recipes)
library(tune)
library(dials)
library(rsample)

# Fit a simple linear model
model <- set_engine(linear_reg(), "lm")

formula_tidyflow <-
 mtcars %>%
 tidyflow() %>%
 plug_formula(mpg ~ cyl + log(disp)) %>%
 plug_model(model)

# The result is a model since we didn't specify any resample/grid
res <- fit(formula_tidyflow)

# You can extract the model fit if needed
res %>%
 pull_tflow_fit()

# Alternatively, we can add a split specification and
# predict on the training data automatically:
formula_tidyflow <-
 formula_tidyflow %>%
 plug_split(initial_split)

res2 <- fit(formula_tidyflow)

res2 %>%
 predict_training()

# This has the advantage that `predict_training` or `predict_testing` will
# apply the recipe/formula automatically for you:

recipe_tidyflow <-
 formula_tidyflow %>%
 drop_formula() %>% 
 plug_recipe(~ recipe(mpg ~ ., .x) %>% step_log(disp))

res3 <- fit(recipe_tidyflow)
res3 %>%
 predict_testing()

# We can accumulate steps and add a cross-validation and a tuning grid.
# Fit a regularized regression through a grid search.
# Do this by updating the already defined model
# (the "glmnet" engine needs the glmnet package installed):
new_mod <- set_engine(linear_reg(penalty = tune(), mixture = tune()),
                      "glmnet")
tuned_res <-
  recipe_tidyflow %>%
  plug_resample(vfold_cv, v = 2) %>% 
  replace_model(new_mod) %>%
  plug_grid(grid_regular, levels = 2) %>%
  fit()

# Since we specified a resample/grid, the result is now a `tidyflow`
# with a resample object
tuned_res

# If needed, we can extract that resample:
tuned_res %>%
 pull_tflow_fit_tuning() %>%
 autoplot()

# When the model tuning is finished, `complete_tflow` can
# finalize the tidyflow, picking the best tuning parameters
# for you according to the chosen metric.

tuned_res %>%
 complete_tflow(metric = "rmse") %>%
 predict_training()

# `complete_tflow` is powerful as it already applied the recipe
# and retrained the model on the entire training data with
# the best tuning parameter from the tuning grid.

# The power of this model building is that you can replace any step
# and rerun the fit:
bootstrap_res <-
 tuned_res %>%
 replace_resample(bootstraps, times = 2) %>%
 fit()

bootstrap_res %>%
 complete_tflow(metric = "rsq") %>%
 predict_training()
}