After evaluating a data stack with blend_predictions(), some number of candidates will have nonzero stacking coefficients. Such candidates are referred to as "members." Since members' predictions will ultimately inform the model stack's predictions, members should be trained on the full training set using fit_members().
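To make the idea of a "member" concrete, here is a minimal base R sketch. The coefficient values and candidate names (cand_a through cand_e) are made up for illustration and do not come from any stacks object; the point is only that candidates whose stacking coefficient is nonzero are the ones kept as members.

```r
# Hypothetical stacking coefficients for five candidates (made-up values).
coefs <- c(cand_a = 0.64, cand_b = 0, cand_c = 0.49, cand_d = 0, cand_e = 0.05)

# Members are the candidates with nonzero stacking coefficients.
members <- names(coefs)[coefs != 0]
members
#> [1] "cand_a" "cand_c" "cand_e"
```

Only these members would need to be refit on the full training set; the zero-weight candidates contribute nothing to the ensemble's predictions.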
Arguments
- model_stack
A model_stack object outputted by blend_predictions().
- ...
Additional arguments. Currently ignored.
Value
A model_stack object with a subclass linear_stack. This fitted model contains the necessary components to predict on new data.
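As a rough illustration of predicting on new data with the returned object, the sketch below fits a small regression stack and calls predict() on the package's testing set. It assumes the stacks package, its example objects (reg_res_lr, reg_res_sp, tree_frogs_reg_test), and the modeling packages those objects rely on are installed; the names reg_fit and reg_preds are chosen here for illustration.

```r
library(stacks)

# Build a small data stack from the example tuning results
# shipped with stacks, then blend and fit the members.
reg_fit <-
  stacks() %>%
  add_candidates(reg_res_lr) %>%
  add_candidates(reg_res_sp) %>%
  blend_predictions() %>%
  fit_members()

# Predict latency for the held-out embryos.
reg_preds <- predict(reg_fit, tree_frogs_reg_test)
head(reg_preds)
```

Calling predict() on a stack that has been blended but not passed through fit_members() is not expected to work, since the members have not yet been trained.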
Details
To fit members in parallel, please create a plan with the future package. See the documentation of future::plan() for examples.
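A minimal sketch of what this could look like, assuming the future package is installed and that two local background R sessions suit your machine. The multisession strategy and the worker count are assumptions made for illustration, not recommendations from the package, and reg_fit_par is a hypothetical name.

```r
library(stacks)
library(future)

# Assumption: two background R sessions on the local machine.
plan(multisession, workers = 2)

# With a plan in place, fit the members as usual.
reg_fit_par <-
  stacks() %>%
  add_candidates(reg_res_lr) %>%
  add_candidates(reg_res_sp) %>%
  blend_predictions() %>%
  fit_members()

# Return to sequential processing once fitting is done.
plan(sequential)
```

With only a few quick-to-fit members, the overhead of starting workers may outweigh any speedup; parallel fitting tends to pay off when members are numerous or slow to train.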
Example Data
This package provides some resampling objects and datasets for use in examples and vignettes derived from a study on 1212 red-eyed tree frog embryos!
Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect a potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimuli from their environment. To do so, they subjected the embryos at varying developmental stages to "predator stimulus" by jiggling the embryos with a blunt probe. Beforehand, though, some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ). Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!
Note that the data included with the stacks package is not necessarily a representative or unbiased subset of the complete dataset, and is only for demonstrative purposes.
reg_folds and class_folds are rset cross-fold validation objects from rsample, splitting the training data into folds for the regression and classification model objects, respectively. tree_frogs_reg_test and tree_frogs_class_test are the analogous testing sets.
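For readers unfamiliar with rset objects, the sketch below builds one with rsample::vfold_cv(). It uses the built-in mtcars data and 5 folds purely for illustration; it is not the code that generated reg_folds or class_folds, and the name folds is hypothetical.

```r
library(rsample)

# Illustrative only: a 5-fold cross-validation object on a built-in dataset.
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# The result is an rset: one row per fold, each holding a data split.
class(folds)
folds
```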
reg_res_lr, reg_res_svm, and reg_res_sp contain regression tuning results for a linear regression, support vector machine, and spline model, respectively, fitting latency (i.e. how long the embryos took to hatch in response to the jiggle) in the tree_frogs data, using most of the other variables as predictors. Note that the data underlying these models is filtered to include data only from embryos that hatched in response to the stimulus.
class_res_rf and class_res_nn contain multiclass classification tuning results for a random forest and neural network classification model, respectively, fitting reflex (a measure of ear function) in the data using most of the other variables as predictors.
log_res_rf and log_res_nn contain binary classification tuning results for a random forest and neural network classification model, respectively, fitting hatched (whether or not the embryos hatched in response to the stimulus) using most of the other variables as predictors.
See ?example_data to learn more about these objects, as well as to browse the source code that generated them.
See also
Other core verbs: add_candidates(), blend_predictions(), stacks()
Examples
# see the "Example Data" section above for
# clarification on the objects used in these examples!
# put together a data stack
reg_st <-
  stacks() %>%
  add_candidates(reg_res_lr) %>%
  add_candidates(reg_res_svm) %>%
  add_candidates(reg_res_sp)
reg_st
#> # A data stack with 3 model definitions and 16 candidate members:
#> # reg_res_lr: 1 model configuration
#> # reg_res_svm: 5 model configurations
#> # reg_res_sp: 10 model configurations
#> # Outcome: latency (numeric)
# evaluate the data stack and fit the member models
reg_st %>%
  blend_predictions() %>%
  fit_members()
#> ── A stacked ensemble model ─────────────────────────────────────
#>
#> Out of 16 possible candidate members, the ensemble retained 3.
#> Penalty: 1e-06.
#> Mixture: 1.
#>
#> The 3 highest weighted members are:
#> # A tibble: 3 × 3
#>   member          type       weight
#>   <chr>           <chr>       <dbl>
#> 1 reg_res_svm_1_3 svm_rbf    0.638
#> 2 reg_res_sp_03_1 linear_reg 0.486
#> 3 reg_res_sp_10_1 linear_reg 0.0482
reg_st
#> # A data stack with 3 model definitions and 16 candidate members:
#> # reg_res_lr: 1 model configuration
#> # reg_res_svm: 5 model configurations
#> # reg_res_sp: 10 model configurations
#> # Outcome: latency (numeric)
# do the same with multinomial classification models
class_st <-
  stacks() %>%
  add_candidates(class_res_nn) %>%
  add_candidates(class_res_rf) %>%
  blend_predictions() %>%
  fit_members()
#> Warning: Predictions from 1 candidate were identical to those from existing
#> candidates and were removed from the data stack.
class_st
#> ── A stacked ensemble model ─────────────────────────────────────
#>
#> Out of 21 possible candidate members, the ensemble retained 5.
#> Penalty: 0.1.
#> Mixture: 1.
#> Across the 3 classes, there are an average of 2.5 coefficients per class.
#>
#> The 5 highest weighted member classes are:
#> # A tibble: 5 × 4
#>   member                       type        weight   class
#>   <chr>                        <chr>       <dbl>    <fct>
#> 1 .pred_full_class_res_nn_1_1  mlp         12.0     full
#> 2 .pred_mid_class_res_rf_1_06  rand_forest 0.670    mid
#> 3 .pred_full_class_res_rf_1_05 rand_forest 0.101    full
#> 4 .pred_full_class_res_rf_1_07 rand_forest 0.00457  full
#> 5 .pred_full_class_res_rf_1_01 rand_forest 0.00219  full
# ...or binomial classification models
log_st <-
  stacks() %>%
  add_candidates(log_res_nn) %>%
  add_candidates(log_res_rf) %>%
  blend_predictions() %>%
  fit_members()
log_st
#> ── A stacked ensemble model ─────────────────────────────────────
#>
#> Out of 11 possible candidate members, the ensemble retained 3.
#> Penalty: 1e-05.
#> Mixture: 1.
#>
#> The 3 highest weighted member classes are:
#> # A tibble: 3 × 3
#>   member                   type        weight
#>   <chr>                    <chr>       <dbl>
#> 1 .pred_no_log_res_nn_1_1  mlp         7.39
#> 2 .pred_no_log_res_rf_1_05 rand_forest 3.43
#> 3 .pred_no_log_res_rf_1_02 rand_forest 0.0834