Predict from a tidyflow

This is the predict() method for a fitted tidyflow object. It can be applied to new data. If a recipe (or formula) is defined in the tidyflow, its preprocessing steps are applied to the new data before predicting. Alternatively, when a split has been specified with plug_split(), predict_training() and predict_testing() apply any preprocessing and predict on the training and testing data, respectively.

# S3 method for tidyflow
predict(object, new_data, type = NULL, opts = list(), ...)

predict_training(object, type = NULL, opts = list(), ...)

predict_testing(object, type = NULL, opts = list(), ...)

Arguments

object

A tidyflow that has been fitted by fit.tidyflow()

new_data

A data frame containing the new predictors to preprocess and predict on. Usually, this would be extracted from the tidyflow with pull_tflow_testing() or pull_tflow_training(). Note that predict.tidyflow() already applies the recipe or formula automatically, so it is not advised to preprocess new_data before passing it to predict.tidyflow().

type

A single character value or NULL. Possible values are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", or "raw". When NULL, predict() will choose an appropriate value based on the model's mode.

opts

A list of optional arguments to the underlying predict function that will be used when type = "raw". The list should not include options for the model object or the new data being predicted.
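As a hedged sketch of how opts is used, the example below assumes an lm engine fitted through a formula preprocessor (plug_formula()) on all of mtcars, with no split. With type = "raw", the entries of opts are forwarded to the engine's own predict function, here stats::predict.lm(), so interval and level are arguments of predict.lm() rather than of tidyflow. The result is expected to be the unformatted output of predict.lm() (a matrix with fit, lwr and upr columns), not a parsnip-formatted data frame.

```r
library(magrittr)
library(parsnip)
library(tidyflow)

# Assumption for illustration: a formula preprocessor and an lm model,
# fitted on all of mtcars because no split is plugged in.
tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_formula(mpg ~ cyl + disp) %>%
  plug_model(set_engine(linear_reg(), "lm")) %>%
  fit()

# type = "raw" skips parsnip's formatting; the entries of `opts` are
# forwarded to stats::predict.lm(), here requesting 90% prediction intervals.
raw_preds <- predict(
  tflow,
  new_data = head(mtcars),
  type = "raw",
  opts = list(interval = "prediction", level = 0.90)
)

raw_preds
```

Note that the model object and the new data are never placed in opts; tidyflow and parsnip supply those themselves.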

...

Arguments to the underlying model's prediction function cannot be passed here (use opts instead). Some parsnip-related options can be passed, depending on the value of type. Possible arguments are:

level: for types of "conf_int" and "pred_int" this is the parameter for the tail area of the intervals (e.g. confidence level for confidence intervals). Default value is 0.95.
std_error: add the standard error of fit or prediction (on the scale of the linear predictors) for types of "conf_int" and "pred_int". Default value is FALSE.
quantile: the quantile(s) for quantile regression (not implemented yet)
time: the time(s) for hazard probability estimates (not implemented yet)
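A minimal sketch of passing these parsnip options through ..., under the same assumptions as a plain lm model with a formula preprocessor and no split. Unlike opts, level and std_error are given directly to predict(); the output column names .pred_lower, .pred_upper and .std_error are the ones parsnip uses for interval predictions.

```r
library(magrittr)
library(parsnip)
library(tidyflow)

# Assumption for illustration: lm model with a formula preprocessor.
tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_formula(mpg ~ cyl + disp) %>%
  plug_model(set_engine(linear_reg(), "lm")) %>%
  fit()

# `level` and `std_error` are parsnip options passed through `...`:
# 90% confidence intervals plus the standard error of the fit.
ci_preds <- predict(
  tflow,
  new_data = head(mtcars),
  type = "conf_int",
  level = 0.90,
  std_error = TRUE
)

ci_preds
```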

Value

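A data frame of model predictions, with as many rows as new_data has. For predict_training() and predict_testing(), this is the number of rows in the training or testing data, respectively.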
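To illustrate the row-count guarantee, the sketch below (again assuming plug_formula() and an lm engine) splits mtcars with plug_split() and compares the output of predict_testing() with the testing data returned by pull_tflow_testing(). For a regression model, the default type ("numeric") produces a .pred column.

```r
library(magrittr)
library(parsnip)
library(rsample)
library(tidyflow)

# Assumption for illustration: formula preprocessor, lm model, default split.
set.seed(2020)
tflow <-
  mtcars %>%
  tidyflow() %>%
  plug_split(initial_split) %>%
  plug_formula(mpg ~ cyl + disp) %>%
  plug_model(set_engine(linear_reg(), "lm")) %>%
  fit()

test_preds <- predict_testing(tflow)
test_data <- pull_tflow_testing(tflow)

# One row of predictions per row of the testing data.
nrow(test_preds) == nrow(test_data)
```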

Examples

if (FALSE) {
library(tidyflow)
library(parsnip)
library(recipes)
library(rsample)
library(dials)
library(tune)

model <- set_engine(linear_reg(), "lm")

tflow <-
 mtcars %>%
 tidyflow() %>%
 plug_split(initial_split) %>%
 plug_model(model) %>%
 plug_recipe(~ recipe(mpg ~ cyl + disp, .) %>% step_log(disp))

tflow <- fit(tflow)

# This will automatically `bake()` the recipe on `new_data`,
# applying the log step to `disp`, and then predict with the fitted regression.
predict(tflow, new_data = pull_tflow_testing(tflow))

# When a split has been specified through `plug_split`,
# predict_training/predict_testing automatically extract
# the corresponding data and apply the recipe/formula:
predict_testing(tflow)
predict_training(tflow)

# When a grid search has been performed, the user needs to
# finalize the model through complete_tflow and then
# predict/predict_training/predict_testing will work.
res <-
 tflow %>%
 # Adds a grid search over the degrees of freedom of a natural spline on hp
 replace_recipe(~ recipe(mpg ~ ., data = .) %>% step_ns(hp, deg_free = tune())) %>%
 plug_resample(vfold_cv, v = 2) %>% 
 plug_grid(grid_regular, levels = 1) %>%
 fit()

# We can complete the tidyflow by fitting the best model
# based on the RMSE metric and then predict:
res %>%
 complete_tflow(metric = "rmse") %>%
 predict_training()

# In short, to be able to predict, you need to have either a single model
# or a finalized tuning grid with `complete_tflow`.
}