Skip to content

The stacks package provides a shorthand interface for supplying multiple sets of candidate members in one call to add_candidates() via the workflowsets package. Instead of iteratively calling add_candidates() for each tuning result:

stacks() %>%
  add_candidates(tuning_result_1) %>%
  add_candidates(tuning_result_2) %>%
  add_candidates(tuning_result_3) %>%
  add_candidates(tuning_result_4) %>%
  blend_predictions() %>%

…we can add candidate members “in batch:”

This interface is especially helpful with a large number of candidate members, which is exactly the setting that model stacking excels in.


This example will parallel the “Getting Started” vignette, except that we will use workflowsets to bundle the model workflows that define the candidate members into one workflow set. We will then train each of them using workflowsets::workflow_map() and add the results to a model stack using add_candidates(). If you’re not familiar with the stacks package, refer to that vignette for the big picture!

First, loading needed packages:

In this example, we’ll again make use of the tree_frogs data exported with stacks, giving experimental results on hatching behavior of red-eyed tree frog embryos!

Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimulus from their environment. To do so, they subjected the embryos at varying developmental stages to “predator stimulus” by jiggling the embryos with a blunt probe. Beforehand, though some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ.) Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!

We’ll start out with predicting latency (i.e. time to hatch) based on other attributes. We’ll need to filter out NAs (i.e. cases where the embryo did not hatch) first.


# subset the data
tree_frogs <- tree_frogs %>%
  filter(! %>%
  select(-c(clutch, hatched))

Taking a quick look at the data, it seems like the hatch time is pretty closely related to some of our predictors!


ggplot(tree_frogs) +
  aes(x = age, y = latency, color = treatment) +
  geom_point() +
  labs(x = "Embryo Age (s)", y = "Time to Hatch (s)", col = "Treatment")

A ggplot scatterplot with embryo age in seconds on the x axis, time to hatch on the y axis, and points colored by treatment. The ages range from 350,000 to 500,000 seconds, while the times to hatch range from 0 to 450 seconds. There are two treatments—control and gentamicin—and the time to hatch is generally larger for the gentamicin group. The embryo ages are generally distributed in three clouds, where the older embryos tend to hatch more quickly after stimulus than the younger ones.

Let’s give this a go!

Defining candidate ensemble members

As in the workflow-based setting, defining the candidate ensemble members is undoubtedly the longest part of the ensembling process with stacks. Instead of evaluating our model specifications via resampling for each workflow, though, we combine each of the model specifications with workflow_set(), and then evaluate them in batch with the workflow_map() function.

First, splitting up the training data, generating resamples, and setting some options that will be used by each model definition:

# some setup: resampling and a basic recipe
tree_frogs_split <- initial_split(tree_frogs)
tree_frogs_train <- training(tree_frogs_split)
tree_frogs_test  <- testing(tree_frogs_split)

folds <- rsample::vfold_cv(tree_frogs_train, v = 5)

tree_frogs_rec <- 
  recipe(latency ~ ., data = tree_frogs_train)

metric <- metric_set(rmse)

We also need to use the same control settings as in the workflow-based setting:

ctrl_grid <- control_stack_grid()

We’ll define three different model definitions to try to predict time to hatch—a K-nearest neighbors model (with hyperparameters to tune), a linear model, and a support vector machine model (again, with hyperparameters to tune).

Starting out with K-nearest neighbors, we begin by creating a parsnip model specification:

# create a model specification
knn_spec <-
    mode = "regression", 
    neighbors = tune("k")
  ) %>%

#> K-Nearest Neighbor Model Specification (regression)
#> Main Arguments:
#>   neighbors = tune("k")
#> Computational engine: kknn

Note that, since we are tuning over several possible numbers of neighbors, this model specification defines multiple model configurations. The specific form of those configurations will be determined when specifying the grid search.

From here, we extend the basic recipe defined earlier to fully specify the form of the design matrix for use in a K-nearest neighbors model:

# extend the recipe
knn_rec <-
  tree_frogs_rec %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_impute_mean(all_numeric_predictors()) %>%

#> Recipe
#> Inputs:
#>       role #variables
#>    outcome          1
#>  predictor          4
#> Operations:
#> Dummy variables from all_nominal_predictors()
#> Zero variance filter on all_predictors()
#> Mean imputation for all_numeric_predictors()
#> Centering and scaling for all_numeric_predictors()

Starting with the basic recipe, we convert categorical predictors to dummy variables, remove predictors with only one observation, impute missing values in numeric predictors using the mean, and normalize numeric predictors. Pre-processing instructions for the remaining models are defined similarly.

Now, specifying the linear model, note that we are not optimizing over any hyperparameters.

# create a model specification
lin_reg_spec <-
  linear_reg() %>%

# extend the recipe
lin_reg_rec <-
  tree_frogs_rec %>%
  step_dummy(all_nominal_predictors()) %>%

Finally, putting together the model specification and recipe for the support vector machine:

# create a model specification
svm_spec <- 
    cost = tune("cost"), 
    rbf_sigma = tune("sigma")
  ) %>%
  set_engine("kernlab") %>%

# extend the recipe
svm_rec <-
  tree_frogs_rec %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_impute_mean(all_numeric_predictors()) %>%
  step_corr(all_predictors()) %>%

With each model specification and accompanying recipe now defined, we can combine them via workflow_set:

wf_set <- 
    preproc = list(rec1 = knn_rec, rec2 = lin_reg_rec,     rec3 = svm_rec),
    models =  list(knn = knn_spec, lin_reg = lin_reg_spec, svm = svm_spec),
    cross = FALSE

#> # A workflow set/tibble: 3 × 4
#>   wflow_id     info             option    result    
#>   <chr>        <list>           <list>    <list>    
#> 1 rec1_knn     <tibble [1 × 4]> <opts[0]> <list [0]>
#> 2 rec2_lin_reg <tibble [1 × 4]> <opts[0]> <list [0]>
#> 3 rec3_svm     <tibble [1 × 4]> <opts[0]> <list [0]>

Note that each combination of preprocessor and model specification is assigned a wflow_id that we can use to interface with individual model definitions:

wf_set %>%
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: svm_rbf()
#> ── Preprocessor ──────────────────────────────────────────────────────────
#> 5 Recipe Steps
#> • step_dummy()
#> • step_zv()
#> • step_impute_mean()
#> • step_corr()
#> • step_normalize()
#> ── Model ─────────────────────────────────────────────────────────────────
#> Radial Basis Function Support Vector Machine Model Specification (regression)
#> Main Arguments:
#>   cost = tune("cost")
#>   rbf_sigma = tune("sigma")
#> Computational engine: kernlab

The elements in this workflow set are nearly ready to be evaluated with tune_grid(). Before we do so, though, we need to ensure that we fit each with the appropriate control options, just as we do when evaluating individual workflows on resamples before passing them to stacks.

We can iteratively add the appropriate control settings with the control_stack_grid() shorthand using the option_add() function:

wf_set <-
  wf_set %>%
    control = control_stack_grid(),
    metrics = metric

We can now use the workflow_map() function to map over each model definition, evaluating hyperparameters on the supplied resamples.

wf_set_trained <-
    fn = "tune_grid",
    resamples = folds

#> # A workflow set/tibble: 3 × 4
#>   wflow_id     info             option    result   
#>   <chr>        <list>           <list>    <list>   
#> 1 rec1_knn     <tibble [1 × 4]> <opts[3]> <tune[+]>
#> 2 rec2_lin_reg <tibble [1 × 4]> <opts[3]> <rsmp[+]>
#> 3 rec3_svm     <tibble [1 × 4]> <opts[3]> <tune[+]>

The results section now contains three sets of tuning results. Note that the results corresponding to the linear regression have the subclass resample_results rather than tune_results—this is expected, as there were no hyperparameters to tune for that model specification. workflow_map() knows to fall back to fit_resamples() rather than tune_grid(), in this case!

We can extract tuning results with the extract_workflow_set_result() helper to explore our tuning results:

extract_workflow_set_result(wf_set_trained, id = "rec1_knn") %>%
  collect_metrics(summarize = TRUE)
#> # A tibble: 10 × 7
#>        k .metric .estimator  mean     n std_err .config              
#>    <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
#>  1     1 rmse    standard    81.3     5    4.37 Preprocessor1_Model01
#>  2     3 rmse    standard    69.6     5    3.57 Preprocessor1_Model02
#>  3     5 rmse    standard    63.7     5    3.30 Preprocessor1_Model03
#>  4     6 rmse    standard    62.2     5    3.13 Preprocessor1_Model04
#>  5     7 rmse    standard    61.2     5    2.97 Preprocessor1_Model05
#>  6     9 rmse    standard    59.8     5    2.83 Preprocessor1_Model06
#>  7    10 rmse    standard    59.4     5    2.82 Preprocessor1_Model07
#>  8    11 rmse    standard    58.9     5    2.82 Preprocessor1_Model08
#>  9    13 rmse    standard    58.5     5    2.85 Preprocessor1_Model09
#> 10    15 rmse    standard    58.1     5    2.88 Preprocessor1_Model10

With these three model definitions fully specified and tuned in a workflow set, we are ready to begin stacking these model configurations. (Note that, in most applied settings, one would likely specify many more than a handful candidate members from three model definitions.)

Putting together a stack

Building the stacked ensemble, now, takes even fewer lines than it did with individual workflows:

tree_frogs_model_st <- 
  # initialize the stack
  stacks() %>%
  # add candidate members
  add_candidates(wf_set_trained) %>%
  # determine how to combine their predictions
  blend_predictions() %>%
  # fit the candidates with nonzero stacking coefficients

#> # A tibble: 3 × 3
#>   member           type             weight
#>   <chr>            <chr>             <dbl>
#> 1 rec3_svm_1_06    svm_rbf           0.799
#> 2 rec2_lin_reg_1_1 linear_reg        0.266
#> 3 rec1_knn_1_08    nearest_neighbor  0.208

The results obtained from building a model stack with workflow sets are identical to those that would result from building a model stack with individual workflows.

To make sure that we have the right trade-off between minimizing the number of members and optimizing performance, we can use:


A ggplot line plot. The x axis shows the degree of penalization, ranging from 1e-06 to 1e-01, and the y axis displays the mean of three different metrics. The plots are faceted by metric type, with three facets: number of members, root mean squared error, and R squared. The plots generally show that, as penalization increases, the error decreases. There are very few proposed members in this example, so penalization doesn't drive down the number of members much at all. In this case, then, a larger penalty is acceptable.

If these results were not good enough, blend_predictions() could be called again with different values of penalty. As it is, blend_predictions() picks the penalty parameter with the numerically optimal results. To see the top results:

autoplot(tree_frogs_model_st, type = "weights")

A ggplot bar plot, giving the stacking coefficient on the x axis and member on the y axis. There are three members in this ensemble, where a nearest neighbor is weighted most heavily, followed by a linear regression with a stacking coefficient about half as large, followed by a support vector machine with a very small contribution.

To identify which model configurations were assigned what stacking coefficients, we can make use of the collect_parameters() function:

collect_parameters(tree_frogs_model_st, "rec3_svm")
#> # A tibble: 10 × 4
#>    member            cost    sigma  coef
#>    <chr>            <dbl>    <dbl> <dbl>
#>  1 rec3_svm_1_01  0.185   1.17e- 4 0    
#>  2 rec3_svm_1_02  1.23    2.39e- 7 0    
#>  3 rec3_svm_1_03  0.0195  3.37e- 5 0    
#>  4 rec3_svm_1_04 15.2     4.78e- 8 0    
#>  5 rec3_svm_1_05  2.97    1.30e- 6 0    
#>  6 rec3_svm_1_06  0.134   5.62e- 2 0.799
#>  7 rec3_svm_1_07  0.00163 3.01e- 3 0    
#>  8 rec3_svm_1_08  4.08    2.98e- 1 0    
#>  9 rec3_svm_1_09  0.00429 3.45e-10 0    
#> 10 rec3_svm_1_10  0.0280  1.54e- 9 0

This object is now ready to predict with new data!

tree_frogs_test <- 
  tree_frogs_test %>%
  bind_cols(predict(tree_frogs_model_st, .))

Juxtaposing the predictions with the true data:

ggplot(tree_frogs_test) +
  aes(x = latency, 
      y = .pred) +
  geom_point() + 

A ggplot scatterplot showing observed versus predicted latency values. While there is indeed a positive and roughly linear relationship, there is certainly patterned structure in the residuals.

Looks like our predictions were decent! How do the stacks predictions perform, though, as compared to the members’ predictions? We can use the type = "members" argument to generate predictions from each of the ensemble members.

member_preds <- 
  tree_frogs_test %>%
  select(latency) %>%
  bind_cols(predict(tree_frogs_model_st, tree_frogs_test, members = TRUE))

Now, evaluating the root mean squared error from each model:

map_dfr(member_preds, rmse, truth = latency, data = member_preds) %>%
  mutate(member = colnames(member_preds))
#> # A tibble: 5 × 4
#>   .metric .estimator .estimate member          
#>   <chr>   <chr>          <dbl> <chr>           
#> 1 rmse    standard         0   latency         
#> 2 rmse    standard        54.0 .pred           
#> 3 rmse    standard        56.2 rec1_knn_1_08   
#> 4 rmse    standard        55.5 rec2_lin_reg_1_1
#> 5 rmse    standard        57.4 rec3_svm_1_06

As we can see, the stacked ensemble outperforms each of the member models, though is closely followed by one of its members.

Voila! You’ve now made use of the stacks and workflowsets packages to predict red-eyed tree frog embryo hatching using a stacked ensemble!