Add model definitions to a data stack

add_candidates() collates the assessment set predictions and additional attributes from the supplied model definition (i.e., the set of "candidates") into a data stack.
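The assessment set predictions collected here are out-of-fold predictions: each row's prediction comes from a model that was fitted without that row. Below is a minimal base-R sketch of the idea. It assumes a plain linear model on the built-in mtcars data and uses a hypothetical helper, oof_predict(). stacks does not refit anything itself; it reads these predictions from the saved tune results.

```r
# Hypothetical helper: out-of-fold (assessment set) predictions via k-fold CV.
# Illustrates the concept only; stacks reads these from saved tune results.
oof_predict <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  # Randomly assign each row to one of k folds
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))
  preds <- numeric(nrow(data))
  for (fold in seq_len(k)) {
    assess <- fold_id == fold
    # Fit on the analysis set (all other folds)...
    fit <- lm(formula, data = data[!assess, ])
    # ...and predict only the held-out assessment set
    preds[assess] <- predict(fit, newdata = data[assess, ])
  }
  preds
}

# One candidate's column of assessment set predictions
oof <- oof_predict(mpg ~ wt + hp, mtcars)
```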

Behind the scenes, data stack objects are just tibble::tbl_dfs, where the first column gives the true response values and the remaining columns give the assessment set predictions for each candidate. In the regression setting, each candidate contributes one column. In classification settings, each candidate contributes as many columns as there are levels of the outcome variable.
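To make this layout concrete, here is a toy sketch built with base R data frames instead of tibbles. The column names (latency, cand_a_1, .pred_yes_cand_a_1, and so on) and all values are invented for illustration. They loosely mimic the naming seen in stacks output but are not guaranteed to match it.

```r
# Toy sketch of the data stack layout using base R data frames (stacks uses
# tibbles). Column names and values are invented for illustration only.

# Regression: the true response first, then one column per candidate.
reg_stack <- data.frame(
  latency  = c(22, 31, 18),        # true response values
  cand_a_1 = c(20.5, 29.8, 19.1),  # assessment set predictions, candidate 1
  cand_b_1 = c(23.0, 30.2, 17.6)   # assessment set predictions, candidate 2
)

# Classification with a two-level outcome: each candidate contributes one
# column per outcome level.
class_stack <- data.frame(
  hatched            = factor(c("yes", "no", "yes")),
  .pred_no_cand_a_1  = c(0.19, 0.70, 0.34),
  .pred_yes_cand_a_1 = c(0.81, 0.30, 0.66),
  .pred_no_cand_b_1  = c(0.25, 0.58, 0.42),
  .pred_yes_cand_b_1 = c(0.75, 0.42, 0.58)
)

# Expected width: 1 outcome column + (candidates x columns per candidate)
expected_ncol <- function(n_candidates, n_levels = 1) {
  1 + n_candidates * n_levels
}

ncol(reg_stack) == expected_ncol(2)
ncol(class_stack) == expected_ncol(2, nlevels(class_stack$hatched))
```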

To initialize a data stack, use the stacks() function. Model definitions are then appended to the data stack iteratively, using several calls to add_candidates(). The completed data stack is evaluated with blend_predictions(), which fits a regularized model to the candidates' predictions to determine how they are combined.

Usage

add_candidates(
  data_stack,
  candidates,
  name = deparse(substitute(candidates)),
  ...
)

Arguments

data_stack

A data_stack object.

candidates

A (set of) model definition(s) defining candidate model stack members. Should inherit from tune_results or workflow_set.

tune_results: An object returned by tune::tune_grid(), tune::tune_bayes(), or tune::fit_resamples().
workflow_set: An object returned by workflowsets::workflow_map(). This approach lets you supply multiple sets of candidate members with only one call to add_candidates(). See the "Stacking With Workflow Sets" article on the package website for example code!

In either case, these results must have been fitted with the control settings save_pred = TRUE and save_workflow = TRUE. See the control_stack_grid(), control_stack_bayes(), and control_stack_resamples() documentation for helper functions that set these options.
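A minimal sketch of preparing compatible results, assuming the stacks package (and therefore tune) is installed. Each control_stack_*() helper returns a tune control object with save_pred = TRUE and save_workflow = TRUE. The commented calls use hypothetical placeholders (my_wflow, my_folds, my_wf_set) for your own workflow, resamples, and workflow set.

```r
library(stacks)

# Control objects with save_pred = TRUE and save_workflow = TRUE
ctrl_grid  <- control_stack_grid()       # for tune::tune_grid()
ctrl_bayes <- control_stack_bayes()      # for tune::tune_bayes()
ctrl_res   <- control_stack_resamples()  # for tune::fit_resamples()

# Inspect the two settings that add_candidates() relies on
ctrl_grid$save_pred
ctrl_grid$save_workflow

# Hypothetical usage; `my_wflow`, `my_folds`, and `my_wf_set` are placeholders:
# res <- tune::tune_grid(my_wflow, resamples = my_folds, grid = 5,
#                        control = ctrl_grid)
# wf_res <- workflowsets::workflow_map(my_wf_set, "tune_grid",
#                                      resamples = my_folds, grid = 5,
#                                      control = ctrl_grid)
# stacks() |> add_candidates(res)
```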

name

The label for the model definition. Defaults to the name of the candidates object. Ignored if candidates inherits from workflow_set.
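The default label comes from R's deparse(substitute(candidates)) idiom, which captures the expression passed as the argument rather than its value. Below is a minimal base-R sketch. It uses a hypothetical stand-in, label_candidates(), that reproduces only the default for name, not the rest of add_candidates().

```r
# Hypothetical stand-in reproducing only add_candidates()'s default `name`
label_candidates <- function(data_stack, candidates,
                             name = deparse(substitute(candidates))) {
  name
}

reg_res_lr <- list()  # placeholder object; only its symbol matters here

label_candidates(NULL, reg_res_lr)               # default: "reg_res_lr"
label_candidates(NULL, reg_res_lr, name = "lm")  # explicit: "lm"
NULL |> label_candidates(reg_res_lr)             # native pipe keeps the symbol
```

Because the native pipe rewrites x |> f(y) into f(x, y), the symbol passed as candidates is still captured in piped code like the examples below.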

...

Additional arguments. Currently ignored.

Value

A data_stack object. See stacks() for more details!

Example Data

This package provides some resampling objects and datasets, derived from a study on 1212 red-eyed tree frog embryos, for use in examples and vignettes!

Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect a potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimuli from their environment. To do so, they subjected the embryos at varying developmental stages to "predator stimulus" by jiggling the embryos with a blunt probe. Beforehand, though, some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ). Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!

Note that the data included with the stacks package is not necessarily a representative or unbiased subset of the complete dataset, and is only for demonstrative purposes.

reg_folds and class_folds are rset cross-validation objects from rsample, splitting the training data for the regression and classification models, respectively. tree_frogs_reg_test and tree_frogs_class_test are the analogous testing sets.

reg_res_lr, reg_res_svm, and reg_res_sp contain regression tuning results for a linear regression, support vector machine, and spline model, respectively, fitting latency (i.e., how long the embryos took to hatch in response to the jiggle) in the tree_frogs data, using almost all of the other variables as predictors. Note that the data underlying these models are filtered to include only embryos that hatched in response to the stimulus.

class_res_rf and class_res_nn contain multiclass classification tuning results for a random forest and neural network classification model, respectively, fitting reflex (a measure of ear function) in the data using almost all of the other variables as predictors.

log_res_rf and log_res_nn contain binary classification tuning results for a random forest and neural network classification model, respectively, fitting hatched (whether or not the embryos hatched in response to the stimulus) using almost all of the other variables as predictors.

See ?example_data to learn more about these objects, as well as browse the source code that generated them.

Examples

# see the "Example Data" section above for
# clarification on the objects used in these examples!

# put together a data stack using
# tuning results for regression models
reg_st <-
  stacks() |>
  add_candidates(reg_res_lr) |>
  add_candidates(reg_res_svm) |>
  add_candidates(reg_res_sp)

reg_st
#> # A data stack with 3 model definitions and 16 candidate members:
#> #   reg_res_lr: 1 model configuration
#> #   reg_res_svm: 5 model configurations
#> #   reg_res_sp: 10 model configurations
#> # Outcome: latency (numeric)

# do the same with multinomial classification models
class_st <-
  stacks() |>
  add_candidates(class_res_nn) |>
  add_candidates(class_res_rf)
#> Warning: Predictions from 1 candidate were identical to those from existing
#> candidates and were removed from the data stack.

class_st
#> # A data stack with 2 model definitions and 10.6666666666667 candidate members:
#> #   class_res_nn: 1 model configuration
#> #   class_res_rf: 9.66666666666667 model configurations
#> # Outcome: reflex (factor)

# ...or binomial classification models
log_st <-
  stacks() |>
  add_candidates(log_res_nn) |>
  add_candidates(log_res_rf)

log_st
#> # A data stack with 2 model definitions and 11 candidate members:
#> #   log_res_nn: 1 model configuration
#> #   log_res_rf: 10 model configurations
#> # Outcome: hatched (factor)

# use custom names for each model:
log_st2 <-
  stacks() |>
  add_candidates(log_res_nn, name = "neural_network") |>
  add_candidates(log_res_rf, name = "random_forest")

log_st2
#> # A data stack with 2 model definitions and 11 candidate members:
#> #   neural_network: 1 model configuration
#> #   random_forest: 10 model configurations
#> # Outcome: hatched (factor)

# these objects would likely then be
# passed to blend_predictions():
log_st2 |> blend_predictions()
#> ── A stacked ensemble model ─────────────────────────────────────
#> 
#> Out of 11 possible candidate members, the ensemble retained 3.
#> Penalty: 0.001.
#> Mixture: 1.
#> 
#> The 3 highest weighted member classes are:
#> # A tibble: 3 × 3
#>   member                      type        weight
#>   <chr>                       <chr>        <dbl>
#> 1 .pred_no_neural_network_1_1 mlp         7.39  
#> 2 .pred_no_random_forest_1_05 rand_forest 3.42  
#> 3 .pred_no_random_forest_1_02 rand_forest 0.0281
#> 
#> Members have not yet been fitted with `fit_members()`.
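In the class_st example above, add_candidates() warned that one candidate's predictions were identical to existing ones and removed them. The general idea of dropping prediction columns that exactly duplicate an earlier column can be sketched in base R. The helper drop_duplicate_preds() and the toy data are hypothetical, and this is not the stacks implementation.

```r
# Hypothetical helper: drop prediction columns identical to an earlier column.
# Column 1 is assumed to hold the true outcome and is always kept.
drop_duplicate_preds <- function(stack) {
  preds <- stack[-1]
  # duplicated() on a list flags elements identical to an earlier element
  dup <- duplicated(as.list(preds))
  if (any(dup)) {
    warning(sum(dup), " prediction column(s) duplicated earlier columns ",
            "and were removed.")
  }
  stack[c(TRUE, !dup)]
}

toy_stack <- data.frame(
  y      = c(1.2, 3.4, 2.2),
  cand_1 = c(1.0, 3.1, 2.5),
  cand_2 = c(1.1, 3.3, 2.0),
  cand_3 = c(1.0, 3.1, 2.5)  # same predictions as cand_1
)

deduped <- suppressWarnings(drop_duplicate_preds(toy_stack))
names(deduped)  # "y" "cand_1" "cand_2"
```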
