A tidyflow is a container object that aggregates the information required to fit and predict from a model. This information might be the main dataset, specified through plug_data(), a recipe used in preprocessing, specified through plug_recipe(), or the model specification to fit, specified through plug_model(). It also supports more complex workflows through plug_split(), plug_resample(), and plug_grid().
tidyflow(data = NULL, seed = NULL)
data: A data frame or tibble used to begin the tidyflow. This is optional, as the data can be specified with plug_data().
seed: A seed used for reproducibility across the complete workflow. The seed is used at every step, so that random splitting, resampling, and model fitting give the same result on each run.
A new tidyflow object.
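The two ways of supplying data described above can be sketched as follows. This is a minimal illustration, assuming the tidyflow package is installed; magrittr is loaded only to make the %>% pipe available, and the object names wf_a and wf_b are hypothetical.

```r
library(tidyflow)
library(magrittr)  # loaded only for the %>% pipe

# Option 1: pass the data directly when creating the tidyflow
wf_a <- tidyflow(data = mtcars, seed = 23113)

# Option 2: start with an empty tidyflow and attach the data later
wf_b <- tidyflow(seed = 23113) %>%
  plug_data(mtcars)

# Print both specifications to compare them
wf_a
wf_b
```

Either form should work as a starting point; attaching the data later can be convenient when the same specification is reused with different datasets.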
library(tidyflow)
library(recipes)
library(rsample)
library(dials)
library(parsnip)
library(tune)
wflow <-
  mtcars %>%
  tidyflow(seed = 23113) %>%
  plug_recipe(~ recipe(mpg ~ cyl, .x) %>% step_log(cyl))
# tidyflow gives a printout of the current specification
# in the order of execution:
wflow
#> ══ Tidyflow ════════════════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model: None
# The minimum tidyflow is: data + recipe/formula + model
wflow <-
  wflow %>%
  plug_model(set_engine(linear_reg(), "lm"))
# The output shows that we have the data, the recipe and the model
wflow
#> ══ Tidyflow ════════════════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#>
# We can fit that model and get a brief printout of the result:
fit(wflow)
#> ══ Tidyflow [trained] ══════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: None
#> Recipe: available
#> Resample: None
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#>
#> ══ Results ═════════════════════════════════════════════════════════════════════
#>
#>
#> Fitted model:
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#>
#> ...
#> and 3 more lines.
# We can add further steps, and the printout will show the
# updated workflow specification:
wflow <-
  wflow %>%
  plug_split(initial_split) %>%
  plug_resample(vfold_cv, v = 2) %>%
  plug_grid(grid_regular)
# The printout shows that the split, resample, and grid
# are now set correctly.
wflow
#> ══ Tidyflow ════════════════════════════════════════════════════════════════════
#> Data: 32 rows x 11 columns
#> Split: initial_split w/ default args
#> Recipe: available
#> Resample: vfold_cv w/ v = ~2
#> Grid: grid_regular w/ default args
#> Model:
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#>
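
The examples note that the minimum tidyflow is data + recipe/formula + model, but only show the recipe route. A hedged sketch of the formula route follows, assuming plug_formula() takes a standard model formula in place of the recipe. The object names wflow_formula and fitted_flow are hypothetical, and unlike the recipe example this sketch does not log-transform cyl.

```r
library(tidyflow)
library(parsnip)
library(magrittr)  # loaded only for the %>% pipe

# Minimum tidyflow using a formula instead of a recipe:
# data + formula + model
wflow_formula <-
  mtcars %>%
  tidyflow(seed = 23113) %>%
  plug_formula(mpg ~ cyl) %>%
  plug_model(set_engine(linear_reg(), "lm"))

# Fit the model and print the trained tidyflow
fitted_flow <- fit(wflow_formula)
fitted_flow
```

A formula keeps simple specifications short, while a recipe is the better fit when preprocessing steps such as step_log() are needed.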