Preprocessing steps are often specific to a particular model. The workflows package bundles a preprocessor and a parsnip model specification into a single object, so that we can prep the data and fit the model with a single call to fit().
Create a workflow object with the workflow() function. You can supply a preprocessor and model in the initial function call or add them afterwards with the add_*() functions described below.
wkf <- workflow(preprocessor = NULL, # pass in preprocessor here
                spec = NULL)         # pass in model here
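As a rough illustration of the two construction styles, the sketch below builds the same workflow once through the constructor arguments and once through add_*() calls. The tree_spec object is a hypothetical stand-in for a model specification (it is not defined in this tutorial), and passing a formula directly as the preprocessor assumes a reasonably recent version of workflows.

```r
library(workflows)
library(parsnip)

# Hypothetical stand-in model specification (assumption, not from the tutorial)
tree_spec <- decision_tree() |>
  set_mode("classification") |>
  set_engine("rpart")

# Preprocessor and model supplied directly in the constructor
wkf_direct <- workflow(preprocessor = Species ~ ., spec = tree_spec)

# The same pieces added step by step with add_*() functions
wkf_piped <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(tree_spec)
```

Either style should produce an untrained workflow holding a formula preprocessor and the model specification; which one to use is mostly a matter of taste.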
There are two options for a workflow preprocessor:
add_formula() allows you to pass in a formula, or you can specify outcomes and predictors directly with [add_variables()](https://workflows.tidymodels.org/reference/add_variables.html).
wkf1 <- wkf |>
  add_formula(Species ~ .)
# identical to wkf1
# everything() selects all remaining columns; outcome columns are removed
# before the predictors are selected, so Species is not used as a predictor
wkf |>
  add_variables(outcomes = Species, predictors = everything())
## ══ Workflow ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Variables
## Model: None
##
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Outcomes: Species
## Predictors: everything()
add_recipe() allows you to pass in a recipe. This is the more common option. For this example, we will use the iris_recipe defined in the Recipes Tutorial.
iris_wkf <- wkf |>
  add_recipe(iris_recipe)
iris_wkf
## ══ Workflow ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: None
##
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## • step_corr()
## • step_normalize()
## • step_rename()
Note that each add_x() function has accompanying remove_x() and update_x() functions to allow for workflow modification.
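A minimal sketch of how the update and remove functions might be used on a formula preprocessor is shown here. The object names are illustrative only; the same pattern should apply to the recipe, variables, and model counterparts (for example update_recipe() or remove_model()).

```r
library(workflows)

wf <- workflow() |>
  add_formula(Species ~ .)

# update_formula() swaps the existing formula for a new one
wf_updated <- wf |>
  update_formula(Species ~ Petal.Length + Petal.Width)

# remove_formula() clears the preprocessor so a different kind can be added
wf_vars <- wf |>
  remove_formula() |>
  add_variables(outcomes = Species,
                predictors = c(Petal.Length, Petal.Width))
```

A workflow holds only one preprocessor at a time, which is why the formula is removed before add_variables() is called.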
All workflows must include a parsnip model object. Add an unfitted model object using add_model(). We will use the rf random forest specification defined in the Parsnip Tutorial.
iris_wkf <- iris_wkf |>
  add_model(rf)
Note: If you are using a model with a specialized formula, add it using the formula argument of add_model(). You must use add_model() for this – special formulas cannot be specified with the add_formula() function.
# GAMs have special syntax for formulas because of their smoothing functions
gam <- gen_additive_mod() |>
  set_mode("regression") |>
  set_engine("mgcv")
gam_wkf <- workflow() |>
  # notice the lack of special syntax for the preprocessor formula
  # (column names are case sensitive: Sepal.Length, Sepal.Width)
  add_formula(Sepal.Length ~ Species + Sepal.Width) |>
  add_model(gam,
            # now using GAM specific syntax
            formula = Sepal.Length ~ Species + s(Sepal.Width))
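To show how a workflow with a model-specific formula behaves once it is fit, here is a small sketch that trains a GAM on the built-in iris data. It deliberately uses only numeric predictors (a simplification relative to the example above) and assumes the mgcv package is installed; gam_spec and gam_fit are illustrative names.

```r
library(workflows)
library(parsnip)

gam_spec <- gen_additive_mod() |>
  set_mode("regression") |>
  set_engine("mgcv")

gam_fit <- workflow() |>
  # plain formula: tells the workflow which columns to keep
  add_formula(Sepal.Length ~ Sepal.Width + Petal.Length) |>
  # model formula: adds the mgcv smooth term s()
  add_model(gam_spec,
            formula = Sepal.Length ~ Sepal.Width + s(Petal.Length)) |>
  fit(data = iris)

# predictions come back as a tibble with a .pred column
gam_preds <- predict(gam_fit, head(iris))
```

The preprocessor formula only selects columns, while the formula given to add_model() is the one passed on to the engine.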
After building a workflow, use fit(), predict(), and augment() just as you would with a parsnip model in order to train the workflow and generate predictions. Input datasets should be raw, not baked by a recipe.
augment(<workflow>) will bind predictions to the unbaked input data. If a recipe contains steps that change the number of rows, augment() will error because the input and output datasets won’t have the same length.
fit(<workflow>) does not take a formula, as the workflow object already contains its preprocessor.
We will fit and predict the workflow using the data_split object defined in the RSample Tutorial.
# fitting the model to the training data
iris_wkf <- iris_wkf |>
  # Since the preprocessor is within the workflow, fit() only needs raw data
  fit(training(data_split))
# using augment to generate predictions
# notice that augment now binds predictions to the unbaked input data!
iris_preds <- augment(iris_wkf, testing(data_split))
head(iris_preds)
## # A tibble: 6 × 10
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   row .pred_class .pred_setosa .pred_versicolor .pred_virginica
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <int> <fct>              <dbl>            <dbl>           <dbl>
## 1          5.1         3.5          1.4         0.2 setosa      1 setosa                 1                0               0
## 2          4.6         3.1          1.5         0.2 setosa      4 setosa                 1                0               0
## 3          4.6         3.4          1.4         0.3 setosa      7 setosa                 1                0               0
## 4          4.8           3          1.4         0.1 setosa     13 setosa                 1                0               0
## 5          5.7         4.4          1.5         0.4 setosa     16 setosa             0.975            0.025               0
## 6          5.4         3.9          1.3         0.4 setosa     17 setosa                 1                0               0
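Because the example above relies on objects defined in other tutorials, the following self-contained sketch may help to see the whole fit, predict, and augment cycle in one place. Everything in it is an assumption made for illustration: split, rec, and tree_spec are hypothetical stand-ins for data_split, iris_recipe, and rf, and a decision tree replaces the random forest to keep dependencies small.

```r
library(workflows)
library(parsnip)
library(recipes)
library(rsample)

set.seed(123)
split <- initial_split(iris, prop = 0.75, strata = Species)

# Stand-in recipe: normalize the numeric predictors
rec <- recipe(Species ~ ., data = training(split)) |>
  step_normalize(all_numeric_predictors())

# Stand-in model specification
tree_spec <- decision_tree() |>
  set_mode("classification") |>
  set_engine("rpart")

# fit() preps the recipe on the raw training data, then fits the model
wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(tree_spec) |>
  fit(training(split))

# predict() returns only the predictions;
# augment() binds them to the raw (unbaked) test data
preds <- predict(wf_fit, testing(split))
aug <- augment(wf_fit, testing(split))
```

In the augmented result the Sepal.Length column should still hold the original measurements rather than the normalized values, since the recipe is applied internally and never overwrites the input data.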
View information about a workflow object by printing it.
iris_wkf
## ══ Workflow [trained] ═══════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## • step_corr()
## • step_normalize()
## • step_rename()
##
## ── Model ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##
## Call:
## randomForest(x = maybe_data_frame(x), y = y, ntree = ~200)
## Type of random forest: classification
## Number of trees: 200
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 0.89%
## Confusion matrix:
##            setosa versicolor virginica class.error
## setosa         39          0         0  0.00000000
## versicolor      0         38         0  0.00000000
## virginica       0          1        34  0.02857143
To retrieve specific information from a workflow, use an extraction method. Extraction methods are not just for workflow objects: there are also extractor methods for parsnip models, tune objects, and workflow sets.
A complete list of extractors for workflows can be found in the workflows package reference.
# extracting the (unprepped) preprocessor
extract_preprocessor(iris_wkf)
## Recipe
##
## Inputs:
##
## role #variables
## ID 1
## outcome 1
## predictor 4
##
## Operations:
##
## Correlation filter on all_numeric_predictors()
## Centering and scaling for all_numeric_predictors()
## Variable renaming for Row
# extracting the fitted parsnip model
extract_fit_parsnip(iris_wkf)
## parsnip model object
##
##
## Call:
## randomForest(x = maybe_data_frame(x), y = y, ntree = ~200)
## Type of random forest: classification
## Number of trees: 200
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 0.89%
## Confusion matrix:
##            setosa versicolor virginica class.error
## setosa         39          0         0  0.00000000
## versicolor      0         38         0  0.00000000
## virginica       0          1        34  0.02857143
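Beyond the two extractors shown above, a few others are often useful. The sketch below tries several of them on a small self-contained workflow; rec, tree_spec, and wf_fit are hypothetical stand-ins rather than the objects used in this tutorial, and a decision tree is used in place of the random forest.

```r
library(workflows)
library(parsnip)
library(recipes)

rec <- recipe(Species ~ ., data = iris) |>
  step_normalize(all_numeric_predictors())

tree_spec <- decision_tree() |>
  set_mode("classification") |>
  set_engine("rpart")

wf_fit <- workflow(rec, tree_spec) |>
  fit(iris)

spec_out   <- extract_spec_parsnip(wf_fit) # the unfitted model specification
recipe_out <- extract_recipe(wf_fit)       # the prepped (trained) recipe
fit_out    <- extract_fit_parsnip(wf_fit)  # the fitted parsnip model
engine_out <- extract_fit_engine(wf_fit)   # the underlying engine object (rpart here)
```

extract_fit_engine() is handy when a function from the engine package itself needs the raw model object, while extract_fit_parsnip() keeps the parsnip wrapper.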