When building a model, preprocessing steps are often specific to that model. Workflows bundle a preprocessor and a parsnip model object together, so that we can prep the data and fit the model with a single call to fit().
Initiate a workflow object with the workflow() function. You can add a preprocessor and model in the initial function call or by using the add_*() functions (below).
wkf <- workflow(preprocessor = NULL, # pass in preprocessor here
spec = NULL) # pass in model here
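As a minimal sketch of the two construction styles, the block below builds the same workflow once through the arguments of workflow() and once through the add_*() functions. The formula, the mtcars column names, and the object names lm_spec, wf_direct, and wf_piped are illustrative assumptions, not objects from this tutorial series.

```r
library(workflows)
library(parsnip)

# A model specification; linear_reg() uses the "lm" engine by default
lm_spec <- linear_reg()

# Supply the preprocessor (here a formula) and the model in one call
wf_direct <- workflow(preprocessor = mpg ~ wt + hp, spec = lm_spec)

# The same workflow assembled step by step
wf_piped <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(lm_spec)

wf_direct
```

Neither workflow is fitted yet; both only record which preprocessor and model to use.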
There are two options for a workflow preprocessor:
The first option is add_formula(), which allows you to pass in a formula. Alternatively, you can specify outcomes and predictors directly with [add_variables()](https://workflows.tidymodels.org/reference/add_variables.html).
wkf1 <- wkf |>
  add_formula(Species ~ .)
# identical to wkf1
# note the use of everything(), which will ignore variables already referenced
wkf |>
add_variables(outcomes = Species, predictors = everything())
## ══ Workflow ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Variables
## Model: None
##
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Outcomes: Species
## Predictors: everything()
The second option is add_recipe(), which is more common. For this example, we will use the iris_recipe defined in the Recipes Tutorial.
iris_wkf <- wkf |>
add_recipe(iris_recipe)
iris_wkf
## ══ Workflow ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: None
##
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## • step_corr()
## • step_normalize()
## • step_rename()
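For readers who do not have iris_recipe at hand, the following sketch attaches a small stand-in recipe to a workflow. The recipe demo_recipe and its single step are assumptions made for illustration; they are not the recipe from the Recipes Tutorial.

```r
library(workflows)
library(recipes)

# A stand-in recipe (hypothetical; not the tutorial's iris_recipe)
demo_recipe <- recipe(Species ~ ., data = iris) |>
  step_normalize(all_numeric_predictors())

# Attach the unprepped recipe; the workflow preps it when fit() is called
demo_wkf <- workflow() |>
  add_recipe(demo_recipe)

demo_wkf
```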
Note that each add_x() function has accompanying remove_x() and update_x() functions to allow for workflow modification.
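A short sketch of modifying a workflow after it has been created, using update_formula() and remove_formula(). The object names and the chosen predictors are illustrative assumptions.

```r
library(workflows)

wf <- workflow() |>
  add_formula(Species ~ .)

# Replace the existing formula with a different one
wf_updated <- wf |>
  update_formula(Species ~ Petal.Length + Petal.Width)

# Drop the formula preprocessor, then attach variables instead
wf_vars <- wf |>
  remove_formula() |>
  add_variables(outcomes = Species, predictors = c(Petal.Length, Petal.Width))

wf_updated
```

Removing the old preprocessor first matters here, because a workflow holds only one preprocessor at a time.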
All workflows must include a parsnip model object. Add an unfitted model object using add_model(). We will use the rf random forest specification defined in the Parsnip Tutorial.
iris_wkf <- iris_wkf |>
add_model(rf)
Note: If you are using a model with a specialized formula, add it using the formula argument of add_model(). You must use add_model() for this; special formulas cannot be specified with the add_formula() function.
# GAMs have special syntax for formulas because of their smoothing functions
gam <- gen_additive_mod() |>
  set_mode("regression") |>
  set_engine("mgcv")
gam_wkf <- workflow() |>
  # notice the lack of special syntax for the preprocessor formula
  # (iris column names are case-sensitive: Sepal.Length, Sepal.Width)
  add_formula(Sepal.Length ~ Species + Sepal.Width) |>
  add_model(gam,
            # now using GAM-specific syntax
            formula = Sepal.Length ~ Species + s(Sepal.Width))
After building a workflow, use fit(), predict(), and augment() just as you would with a parsnip model to train the workflow and generate predictions. Input datasets should be raw, not baked by a recipe.
augment(<workflow>) will bind predictions to the unbaked input data. If a recipe contains steps that alter the number of rows, augment() will error because the input and output datasets won’t have the same length.
fit(<workflow>) does not take a formula, as the workflow object already contains a formula or other preprocessor.
We will fit and predict the workflow using the data_split object defined in the RSample Tutorial.
# fitting the model to the training data
iris_wkf <- iris_wkf |>
# Since the preprocessor is within the workflow, fit() only needs raw data
fit(training(data_split))
# using augment to generate predictions
# notice that augment now binds predictions to the unbaked input data!
iris_preds <- augment(iris_wkf, testing(data_split))
head(iris_preds)
## # A tibble: 6 × 10
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species row .pred_class .pred_setosa .pred_versicolor .pred_virginica
## <dbl> <dbl> <dbl> <dbl> <fct> <int> <fct> <dbl> <dbl> <dbl>
## 1 5.1 3.5 1.4 0.2 setosa 1 setosa 1 0 0
## 2 4.6 3.1 1.5 0.2 setosa 4 setosa 1 0 0
## 3 4.6 3.4 1.4 0.3 setosa 7 setosa 1 0 0
## 4 4.8 3 1.4 0.1 setosa 13 setosa 1 0 0
## 5 5.7 4.4 1.5 0.4 setosa 16 setosa 0.975 0.025 0
## 6 5.4 3.9 1.3 0.4 setosa 17 setosa 1 0 0
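The fit, predict, and augment pattern above depends on objects from the earlier tutorials. As a self-contained sketch of the same pattern, the block below uses a linear regression on mtcars; the row-index split and the names train, test, lm_wkf, and lm_fit are assumptions for illustration and do not reproduce data_split.

```r
library(workflows)
library(parsnip)

# Hypothetical split: first 24 rows for training, the remaining 8 for testing
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

lm_wkf <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(linear_reg())

# fit() takes only the raw training data; the workflow supplies the formula
lm_fit <- fit(lm_wkf, data = train)

# predict() returns a tibble containing only the predictions
preds <- predict(lm_fit, new_data = test)

# augment() returns the raw test data with the prediction column appended
aug <- augment(lm_fit, new_data = test)

head(aug)
```

predict() suits cases where only the predictions are needed, while augment() is convenient when predictions will be compared against the original columns.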
View information about a workflow object by printing it.
iris_wkf
## ══ Workflow [trained] ═══════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## • step_corr()
## • step_normalize()
## • step_rename()
##
## ── Model ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##
## Call:
## randomForest(x = maybe_data_frame(x), y = y, ntree = ~200)
## Type of random forest: classification
## Number of trees: 200
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 0.89%
## Confusion matrix:
## setosa versicolor virginica class.error
## setosa 39 0 0 0.00000000
## versicolor 0 38 0 0.00000000
## virginica 0 1 34 0.02857143
To retrieve specific information from a workflow, use an extraction method. Extraction methods are not just for workflow objects; there are also extractor methods for parsnip models, tune objects, and workflow sets.
A complete list of extractors for workflows can be found in the workflows package documentation.
# extracting the (unprepped) preprocessor
extract_preprocessor(iris_wkf)
## Recipe
##
## Inputs:
##
## role #variables
## ID 1
## outcome 1
## predictor 4
##
## Operations:
##
## Correlation filter on all_numeric_predictors()
## Centering and scaling for all_numeric_predictors()
## Variable renaming for Row
# extracting the fitted parsnip model
extract_fit_parsnip(iris_wkf)
## parsnip model object
##
##
## Call:
## randomForest(x = maybe_data_frame(x), y = y, ntree = ~200)
## Type of random forest: classification
## Number of trees: 200
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 0.89%
## Confusion matrix:
## setosa versicolor virginica class.error
## setosa 39 0 0 0.00000000
## versicolor 0 38 0 0.00000000
## virginica 0 1 34 0.02857143
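Beyond the two extractors shown above, a few others are often useful. The sketch below fits a small linear regression workflow on mtcars (an assumed stand-in for iris_wkf) and pulls out the model specification, the underlying engine fit, and the preprocessed training data.

```r
library(workflows)
library(parsnip)

# A small fitted workflow to extract from (hypothetical stand-in for iris_wkf)
lm_fit <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(linear_reg()) |>
  fit(data = mtcars)

# The parsnip model specification stored in the workflow
spec <- extract_spec_parsnip(lm_fit)

# The underlying engine object, here the result of stats::lm()
engine_fit <- extract_fit_engine(lm_fit)
coef(engine_fit)

# The preprocessed predictors and outcomes used for fitting
mold <- extract_mold(lm_fit)
names(mold)
```

extract_fit_engine() is the one to reach for when a method from the engine package, such as coef() or summary(), needs the native model object.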