Preprocessing steps are often specific to a particular model. Workflows bundle preprocessing and a parsnip model specification together, so that we can prep the data and fit the model with a single call to fit().


Creating a Workflow

Create a workflow object with the workflow() function. You can supply a preprocessor and model in this initial call, or add them afterwards with the add_*() functions described below.

wkf <- workflow(preprocessor = NULL, # pass in preprocessor here 
                spec = NULL) # pass in model here
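As a minimal sketch of the two styles, the snippet below builds the same workflow once by passing everything to workflow() and once with add_*() calls. The formula Species ~ . and the default random forest spec (named rf_spec here) are illustrative assumptions, not objects from the other tutorials.

```r
library(workflows)
library(parsnip)

# Hypothetical model spec used only for this illustration (it is never fit)
rf_spec <- rand_forest(mode = "classification")

# Style 1: preprocessor and model passed directly to workflow()
wf_call <- workflow(preprocessor = Species ~ ., spec = rf_spec)

# Style 2: start from an empty workflow and add each piece
wf_piped <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(rf_spec)
```

Both objects hold the same formula preprocessor and model spec; which style to use is mostly a matter of taste.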


Adding a Preprocessor

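There are two options for a workflow preprocessor:

  • A Formula or Variables: for simple preprocessing, specify the outcome and predictors with add_formula() or add_variables().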

wkf1 <- wkf |>
  add_formula(Species ~ .)

# selects the same outcome and predictors as wkf1
# everything() will not re-select Species, because outcome columns are removed before predictors are selected
wkf |>
  add_variables(outcomes = Species, predictors = everything())
## ══ Workflow ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Variables
## Model: None
## 
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Outcomes: Species
## Predictors: everything()
  • A Recipe: for more complex preprocessing, pass a recipe object to add_recipe(). This is the more common option. For this example, we will use the iris_recipe defined in the Recipes Tutorial.
    • Pass in an unprepped recipe (one that has not been trained with prep()); the workflow preps it for you when you call fit().
iris_wkf <- wkf |>
  add_recipe(iris_recipe)

iris_wkf
## ══ Workflow ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: None
## 
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## • step_corr()
## • step_normalize()
## • step_rename()
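Since iris_recipe is defined in another tutorial, here is a self-contained sketch with a hypothetical stand-in named demo_recipe. It only approximates the original (two of its three steps) and shows that the recipe sits in the workflow unprepped until fit() is called.

```r
library(recipes)
library(workflows)

# Hypothetical stand-in for iris_recipe from the Recipes Tutorial
demo_recipe <- recipe(Species ~ ., data = iris) |>
  step_corr(all_numeric_predictors()) |>
  step_normalize(all_numeric_predictors())

# The recipe is added untrained; the workflow preps it during fit()
demo_wkf <- workflow() |>
  add_recipe(demo_recipe)

demo_wkf
```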

Note that each add_x() function has accompanying remove_x() and update_x() functions to allow for workflow modification.
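A short sketch of these modifiers, using formulas chosen only for illustration. update_formula() swaps the formula in place, while remove_formula() clears the preprocessor slot so a different kind of preprocessor can be added. Adding a recipe while a formula is still present raises an error, which is why the remove step matters.

```r
library(workflows)

base_wkf <- workflow() |>
  add_formula(Species ~ .)

# Replace the existing formula with a narrower one
narrow_wkf <- base_wkf |>
  update_formula(Species ~ Petal.Length + Petal.Width)

# Drop the formula, then use add_variables() as the preprocessor instead
vars_wkf <- narrow_wkf |>
  remove_formula() |>
  add_variables(outcomes = Species, predictors = c(Petal.Length, Petal.Width))
```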


Adding a Model

A workflow must include a parsnip model specification before it can be fit. Add an unfitted model specification using add_model(). We will use the rf random forest specification defined in the Parsnip Tutorial.

iris_wkf <- iris_wkf |>
  add_model(rf)
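The rf object is not defined in this section. The sketch below uses a hypothetical stand-in, rf_demo, assuming the randomForest engine with 200 trees (which is what the printed fit later in this section suggests). It also shows that fitting a workflow with no model fails.

```r
library(parsnip)
library(workflows)

# Hypothetical stand-in for rf from the Parsnip Tutorial (never fit here,
# so the randomForest package does not need to be installed)
rf_demo <- rand_forest(trees = 200) |>
  set_engine("randomForest") |>
  set_mode("classification")

model_wkf <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(rf_demo)

# A workflow without a model cannot be fit; fit() raises an error
no_model <- workflow() |> add_formula(Species ~ .)
fit_attempt <- try(fit(no_model, data = iris), silent = TRUE)
```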

Note: if your model uses a specialized formula, pass it to the formula argument of add_model(). Special formulas must go through add_model(); they cannot be specified with add_formula().

# GAMs have special formula syntax because of their smoothing terms, e.g. s()
# (the mgcv package must be installed)
gam <- gen_additive_mod() |>
  set_mode("regression") |>
  set_engine("mgcv")

gam_wkf <- workflow() |>
  # the preprocessor formula uses no special syntax
  add_formula(Sepal.Length ~ Species + Sepal.Width) |>
  add_model(gam,
            # the model formula uses GAM-specific syntax
            formula = Sepal.Length ~ Species + s(Sepal.Width))
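One variation worth knowing, sketched here under the assumption that you want the preprocessor to pass columns through untouched: add_variables() only selects columns, so Species stays a factor and the special formula in add_model() alone decides how terms enter the GAM. The names gam_spec, gam_vars_wkf, and gam_fit are illustrative.

```r
library(parsnip)
library(workflows)
library(mgcv)  # attached so that s() is available for the model formula

gam_spec <- gen_additive_mod() |>
  set_mode("regression") |>
  set_engine("mgcv")

# Select columns only; no dummy variables or other transformations
gam_vars_wkf <- workflow() |>
  add_variables(outcomes = Sepal.Length,
                predictors = c(Species, Sepal.Width)) |>
  add_model(gam_spec, formula = Sepal.Length ~ Species + s(Sepal.Width))

gam_fit <- fit(gam_vars_wkf, data = iris)
gam_preds <- predict(gam_fit, new_data = iris)
```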


Fitting and Predicting

After building a workflow, use fit(), predict(), and augment() just as you would with a parsnip model to train the workflow and generate predictions. Pass in raw data, not data that has already been baked by a recipe.

We will fit and predict the workflow using the data_split object defined in the RSample Tutorial.

# fitting the model to the training data
iris_wkf <- iris_wkf |>
  # Since the preprocessor is within the workflow, fit() only needs raw data
  fit(training(data_split)) 

# using augment to generate predictions
# notice that augment now binds predictions to the unbaked input data!
iris_preds <- augment(iris_wkf, testing(data_split))
head(iris_preds)
## # A tibble: 6 × 10
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   row .pred_class .pred_setosa .pred_versicolor .pred_virginica
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <int> <fct>              <dbl>            <dbl>           <dbl>
## 1          5.1         3.5          1.4         0.2 setosa      1 setosa             1                0                   0
## 2          4.6         3.1          1.5         0.2 setosa      4 setosa             1                0                   0
## 3          4.6         3.4          1.4         0.3 setosa      7 setosa             1                0                   0
## 4          4.8         3            1.4         0.1 setosa     13 setosa             1                0                   0
## 5          5.7         4.4          1.5         0.4 setosa     16 setosa             0.975            0.025               0
## 6          5.4         3.9          1.3         0.4 setosa     17 setosa             1                0                   0
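For readers without the objects from the other tutorials, here is a self-contained sketch of the same fit/predict/augment cycle. The split (demo_split), the one-step recipe, and the rpart decision tree are all assumptions: the tree is swapped in for the random forest only to keep dependencies light. Note that only raw data is passed in, and augment() returns the original, unnormalized columns alongside the predictions.

```r
library(rsample)
library(recipes)
library(parsnip)
library(workflows)

set.seed(123)
# Hypothetical split standing in for data_split from the RSample Tutorial
demo_split <- initial_split(iris, prop = 0.75, strata = Species)

demo_recipe <- recipe(Species ~ ., data = training(demo_split)) |>
  step_normalize(all_numeric_predictors())

# Decision tree swapped in for the random forest
tree_spec <- decision_tree() |>
  set_engine("rpart") |>
  set_mode("classification")

# fit() receives raw training data; the recipe is prepped inside the workflow
demo_fit <- workflow() |>
  add_recipe(demo_recipe) |>
  add_model(tree_spec) |>
  fit(data = training(demo_split))

class_preds <- predict(demo_fit, new_data = testing(demo_split))
aug_preds <- augment(demo_fit, new_data = testing(demo_split))
```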


Examining Workflows

View a summary of a workflow object by printing it.

iris_wkf
## ══ Workflow [trained] ═══════════════════════════════════════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## • step_corr()
## • step_normalize()
## • step_rename()
## 
## ── Model ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## 
## Call:
##  randomForest(x = maybe_data_frame(x), y = y, ntree = ~200) 
##                Type of random forest: classification
##                      Number of trees: 200
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 0.89%
## Confusion matrix:
##            setosa versicolor virginica class.error
## setosa         39          0         0  0.00000000
## versicolor      0         38         0  0.00000000
## virginica       0          1        34  0.02857143

To retrieve specific information from a workflow, use an extractor function. Extractors are not just for workflow objects; there are also extractor methods for parsnip models, tune objects, and workflow sets.

A complete list of extractors for workflows is given on the extract_preprocessor() help page (run ?extract_preprocessor).

# extracting the (unprepped) preprocessor
extract_preprocessor(iris_wkf)
## Recipe
## 
## Inputs:
## 
##       role #variables
##         ID          1
##    outcome          1
##  predictor          4
## 
## Operations:
## 
## Correlation filter on all_numeric_predictors()
## Centering and scaling for all_numeric_predictors()
## Variable renaming for Row

# extracting the fitted parsnip model
extract_fit_parsnip(iris_wkf)
## parsnip model object
## 
## 
## Call:
##  randomForest(x = maybe_data_frame(x), y = y, ntree = ~200) 
##                Type of random forest: classification
##                      Number of trees: 200
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 0.89%
## Confusion matrix:
##            setosa versicolor virginica class.error
## setosa         39          0         0  0.00000000
## versicolor      0         38         0  0.00000000
## virginica       0          1        34  0.02857143
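A self-contained sketch comparing several extractors on a small fitted workflow. The normalization recipe and rpart tree are assumptions chosen to keep the example light. The key contrast is that extract_preprocessor() returns the recipe as it was added (unprepped), while extract_recipe() returns the trained version, which can be used with bake().

```r
library(recipes)
library(parsnip)
library(workflows)

tree_wkf_fit <- workflow() |>
  add_recipe(recipe(Species ~ ., data = iris) |>
               step_normalize(all_numeric_predictors())) |>
  add_model(decision_tree() |> set_engine("rpart") |> set_mode("classification")) |>
  fit(data = iris)

raw_recipe   <- extract_preprocessor(tree_wkf_fit)  # recipe as added (untrained)
prepped_rec  <- extract_recipe(tree_wkf_fit)        # trained recipe
tree_spec    <- extract_spec_parsnip(tree_wkf_fit)  # unfitted model specification
parsnip_fit  <- extract_fit_parsnip(tree_wkf_fit)   # parsnip model_fit object
engine_fit   <- extract_fit_engine(tree_wkf_fit)    # underlying rpart object

# The trained recipe can preprocess new data on its own
baked <- bake(prepped_rec, new_data = iris)
```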


Further Resources