Create a tidyflow

A tidyflow is a container object that aggregates the information required to fit a model and predict from it. This information can include the main dataset, specified through plug_data(); a recipe used in preprocessing, specified through plug_recipe(); and the model specification to fit, specified through plug_model(). A tidyflow also supports more complex workflows through additional steps such as plug_split(), plug_resample(), and plug_grid().

tidyflow(data = NULL, seed = NULL)

Arguments

data: A data frame or tibble used to begin the tidyflow. This is optional as the data can be specified with plug_data().
seed: A seed to be used for reproducibility across the complete workflow. This seed is used for every step, to ensure the same result when doing random splitting, resampling and model fitting.
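
To illustrate why a single seed matters for the random steps, here is a minimal base R sketch. It does not show how tidyflow applies the seed internally; split_rows() is a hypothetical helper written only for this illustration, and the 75% training proportion is an assumption.

```r
# Hypothetical helper, not part of tidyflow: fix the RNG seed (if given)
# and draw a random train/test partition of row indices.
split_rows <- function(n, prop = 0.75, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  train <- sort(sample.int(n, size = floor(prop * n)))
  list(train = train, test = setdiff(seq_len(n), train))
}

first  <- split_rows(nrow(mtcars), seed = 23113)
second <- split_rows(nrow(mtcars), seed = 23113)

# With the same seed, the partition is reproduced exactly.
identical(first, second)
#> [1] TRUE
```

Without a seed, two calls would generally return different partitions, which is the situation the seed argument is meant to avoid.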

Value

A new tidyflow object.
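
Because data is optional, a tidyflow can presumably be created empty and given its data later. The sketch below uses only tidyflow() and plug_data(), and assumes that plug_data() takes the tidyflow as its first argument and the data frame as its second.

```r
library(tidyflow)

# Supply the data when the tidyflow is created ...
with_data <- tidyflow(mtcars, seed = 23113)

# ... or start from an empty tidyflow and attach the data afterwards.
# Assumption: plug_data(<tidyflow>, <data frame>) is the argument order.
empty_flow <- tidyflow(seed = 23113)
plugged    <- plug_data(empty_flow, mtcars)

# Printing either object shows the current specification.
with_data
plugged
```

Both routes should describe the same 32-row dataset in the printout; which one to use is a matter of how the rest of the pipeline is written.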

Examples

library(tidyflow)
library(recipes)
library(rsample)
library(dials)
library(parsnip)
library(tune)

wflow <-
 mtcars %>%
 tidyflow(seed = 23113) %>%
 plug_recipe(~ recipe(mpg ~ cyl, .x) %>% step_log(cyl))

# A tidyflow prints its current specification
# in the order of execution:
wflow
#> ══ Tidyflow ════════════════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model: None

# The minimum tidyflow is: data + recipe/formula + model
wflow <-
 wflow %>%
 plug_model(set_engine(linear_reg(), "lm"))

# The output shows that we have the data, the recipe and the model
wflow
#> ══ Tidyflow ════════════════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm 
#> 

# We can fit that model and get a brief printout of the result:
fit(wflow)
#> ══ Tidyflow [trained] ══════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm 
#> 
#> ══ Results ═════════════════════════════════════════════════════════════════════
#> 
#> 
#> Fitted model:
#> 
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> 
#> Coefficients:
#> 
#> ...
#> and 3 more lines.

# We can add further steps, and the printout will show the
# updated workflow specification:
wflow <-
 wflow %>%
 plug_split(initial_split) %>%
 plug_resample(vfold_cv, v = 2) %>%
 plug_grid(grid_regular)

# The printout shows that the split, resample and grid
# are now set correctly.
wflow
#> ══ Tidyflow ════════════════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: initial_split w/ default args
#> Recipe: available
#> Resample: vfold_cv w/ v = ~2
#> Grid: grid_regular w/ default args
#> Model:
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm 
#>