
do_rreg_multi() runs the regularized multiclass classification pipeline. It splits the data into training and test sets, creates class-balanced case-control groups, performs hyperparameter optimization, and fits the best model. It also plots the ROC curve and an AUC barplot for each class.

Usage

do_rreg_multi(
  olink_data,
  metadata,
  variable = "Disease",
  wide = TRUE,
  strata = TRUE,
  exclude_cols = "Sex",
  ratio = 0.75,
  type = "lasso",
  cor_threshold = 0.9,
  cv_sets = 5,
  grid_size = 10,
  ncores = 4,
  hypopt_vis = TRUE,
  palette = NULL,
  vline = TRUE,
  varimp_yaxis_names = FALSE,
  seed = 123
)

Arguments

olink_data

Olink data.

metadata

Metadata.

variable

The variable to predict. Default is "Disease".

wide

Whether the data is wide format. Default is TRUE.

strata

Whether to stratify the data. Default is TRUE.

exclude_cols

Columns to exclude from the data before the model is tuned. Default is "Sex".

ratio

Proportion of the data assigned to the training set; the remainder forms the test set. Default is 0.75.

type

Type of regularization. Default is "lasso". Other options are "ridge" and "elnet".

cor_threshold

Threshold of absolute correlation values. The minimum number of features is removed so that all remaining pairwise absolute correlations are below this value. Default is 0.9.

cv_sets

Number of cross-validation sets. Default is 5.

grid_size

Size of the hyperparameter optimization grid. Default is 10.

ncores

Number of cores to use for parallel processing. Default is 4.

hypopt_vis

Whether to visualize hyperparameter optimization results. Default is TRUE.

palette

The color palette for the plot. If it is a character, it should be one of the palettes from get_hpa_palettes(). Default is NULL.

vline

Whether to add a vertical line at 50% importance. Default is TRUE.

varimp_yaxis_names

Whether to add y-axis names to the variable importance plot. Default is FALSE.

seed

Seed for reproducibility. Default is 123.
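
To make the roles of ratio, strata, and seed more concrete, here is a minimal base R sketch of a stratified split. It is only an illustration and not the code do_rreg_multi() runs internally. The toy data frame is hypothetical, and it assumes that stratification is done on the outcome column.

```r
# Illustrative only: a stratified 75/25 split in base R.
# `toy` and its columns are hypothetical, not objects from the package.
set.seed(123)
toy <- data.frame(
  DAid    = sprintf("S%02d", 1:40),
  Disease = rep(c("AML", "CLL", "MYEL", "PRC"), each = 10)
)

ratio <- 0.75

# Sample a share `ratio` of the row indices within each class, so that every
# class keeps roughly the same proportion in the training and test sets.
idx_by_class <- split(seq_len(nrow(toy)), toy$Disease)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), size = floor(ratio * length(i)))]
}), use.names = FALSE)

train_set <- toy[train_idx, ]
test_set  <- toy[-train_idx, ]

table(train_set$Disease)
table(test_set$Disease)
```

With very small classes, a per-class split like this can leave a class with no test samples, which is the kind of situation the "Too little data to stratify" warning in the example output points to.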

Value

A list with the following elements:

  • hypopt_res: Hyperparameter optimization results.

  • finalfit_res: Final model fitting results.

  • roc_curve: ROC curve plot.

  • auc: AUC values for each class.

  • auc_barplot: AUC barplot.

  • var_imp_res: Variable importance results.

Details

Rows that contain NAs in Disease are filtered out. If the data contain missing values, KNN imputation is applied. To skip the feature correlation check, set cor_threshold to 1.
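
As a rough illustration of what a correlation threshold does, the base R sketch below removes features greedily until no pair has an absolute correlation at or above the threshold. This is a simplified stand-in written for this page, not the algorithm the function uses; the helper drop_correlated() and the toy data are hypothetical.

```r
# Illustrative only: a simple greedy correlation filter in base R.
# This is NOT the implementation used by do_rreg_multi(); all names here
# (toy, drop_correlated) are hypothetical.
set.seed(123)
n <- 100
a <- rnorm(n)
toy <- data.frame(
  A = a,
  B = a + rnorm(n, sd = 0.05),  # nearly a copy of A
  C = rnorm(n)                  # unrelated to A and B
)

drop_correlated <- function(df, threshold = 0.9) {
  keep <- names(df)
  repeat {
    if (length(keep) < 2) break
    cm <- abs(cor(df[keep], use = "pairwise.complete.obs"))
    diag(cm) <- 0
    # Stop once every remaining pair is below the threshold.
    if (max(cm) < threshold) break
    # Within the most correlated pair, drop the feature with the larger
    # mean absolute correlation to the other remaining features.
    pair  <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    means <- colMeans(cm[, pair, drop = FALSE])
    keep  <- setdiff(keep, names(means)[which.max(means)])
  }
  df[keep]
}

names(drop_correlated(toy, threshold = 0.9))  # one of A or B is dropped
names(drop_correlated(toy, threshold = 1))    # nothing is dropped
```

In this sketch, a threshold of 1 only removes features that are perfectly correlated, which mirrors the advice to set cor_threshold to 1 when no correlation check is wanted.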

Examples

do_rreg_multi(example_data,
              example_metadata,
              wide = FALSE,
              palette = "cancers12",
              cv_sets = 5,
              grid_size = 5,
              ncores = 1)
#> Joining with `by = join_by(DAid)`
#> Warning: Too little data to stratify.
#>  Resampling will be unstratified.
#> Sets are ready. Multiclassification model fitting is starting...
#> Warning: Too little data to stratify.
#>  Resampling will be unstratified.
#> Warning: No event observations were detected in `truth` with event level 'BRC'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'CVX'.
#> Warning: No event observations were detected in `truth` with event level 'ENDC'.
#> Warning: No event observations were detected in `truth` with event level 'LUNGC'.
#> Warning: No event observations were detected in `truth` with event level 'PRC'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CVX'.
#> Warning: No event observations were detected in `truth` with event level 'GLIOM'.
#> Warning: No event observations were detected in `truth` with event level 'LUNGC'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'OVC'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'CVX'.
#> Warning: No event observations were detected in `truth` with event level 'ENDC'.
#> Warning: No event observations were detected in `truth` with event level 'GLIOM'.
#> Warning: No event observations were detected in `truth` with event level 'LUNGC'.
#> Warning: No event observations were detected in `truth` with event level 'LYMPH'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'OVC'.
#> Warning: No event observations were detected in `truth` with event level 'PRC'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'BRC'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'ENDC'.
#> Warning: No event observations were detected in `truth` with event level 'GLIOM'.
#> Warning: No event observations were detected in `truth` with event level 'LYMPH'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'GLIOM'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'LUNGC'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'ENDC'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'OVC'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CVX'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'PRC'.
#> Warning: No event observations were detected in `truth` with event level 'BRC'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'ENDC'.
#> Warning: No event observations were detected in `truth` with event level 'GLIOM'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'OVC'.
#> Warning: No event observations were detected in `truth` with event level 'BRC'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'CVX'.
#> Warning: No event observations were detected in `truth` with event level 'GLIOM'.
#> Warning: No event observations were detected in `truth` with event level 'LUNGC'.
#> Warning: No event observations were detected in `truth` with event level 'LYMPH'.
#> Warning: No event observations were detected in `truth` with event level 'PRC'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'BRC'.
#> Warning: No event observations were detected in `truth` with event level 'CLL'.
#> Warning: No event observations were detected in `truth` with event level 'CRC'.
#> Warning: No event observations were detected in `truth` with event level 'CVX'.
#> Warning: No event observations were detected in `truth` with event level 'LYMPH'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'AML'.
#> Warning: No event observations were detected in `truth` with event level 'ENDC'.
#> Warning: No event observations were detected in `truth` with event level 'GLIOM'.
#> Warning: No event observations were detected in `truth` with event level 'LUNGC'.
#> Warning: No event observations were detected in `truth` with event level 'MYEL'.
#> Warning: No event observations were detected in `truth` with event level 'OVC'.
#> $hypopt_res
#> $hypopt_res$elnet_tune
#> # Tuning results
#> # 5-fold cross-validation using stratification 
#> # A tibble: 5 × 5
#>   splits           id    .metrics         .notes           .predictions       
#>   <list>           <chr> <list>           <list>           <list>             
#> 1 <split [351/88]> Fold1 <tibble [5 × 5]> <tibble [0 × 3]> <tibble [440 × 16]>
#> 2 <split [351/88]> Fold2 <tibble [5 × 5]> <tibble [0 × 3]> <tibble [440 × 16]>
#> 3 <split [351/88]> Fold3 <tibble [5 × 5]> <tibble [0 × 3]> <tibble [440 × 16]>
#> 4 <split [351/88]> Fold4 <tibble [5 × 5]> <tibble [0 × 3]> <tibble [440 × 16]>
#> 5 <split [352/87]> Fold5 <tibble [5 × 5]> <tibble [0 × 3]> <tibble [435 × 16]>
#> 
#> $hypopt_res$elnet_wf
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: multinom_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 4 Recipe Steps
#> 
#> • step_normalize()
#> • step_nzv()
#> • step_corr()
#> • step_impute_knn()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Multinomial Regression Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = tune::tune()
#>   mixture = 1
#> 
#> Computational engine: glmnet 
#> 
#> 
#> $hypopt_res$train_set
#> # A tibble: 439 × 102
#>    DAid    AARSD1   ABL1  ACAA1   ACAN    ACE2  ACOX1    ACP5    ACP6  ACTA2
#>    <chr>    <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>  <dbl>
#>  1 DA00003  NA    NA     NA      0.989 NA       0.330  1.37   NA      NA    
#>  2 DA00004   3.41  3.38   1.69  NA      1.52   NA      0.841   0.582   1.70 
#>  3 DA00005   5.01  5.05   0.128  0.401 -0.933  -0.584  0.0265  1.16    2.73 
#>  4 DA00006   6.83  1.18  -1.74  -0.156  1.53   -0.721  0.620   0.527   0.772
#>  5 DA00007  NA    NA      3.96   0.682  3.14    2.62   1.47    2.25    2.01 
#>  6 DA00008   2.78  0.812 -0.552  0.982 -0.101  -0.304  0.376  -0.826   1.52 
#>  7 DA00010   1.83  1.21  -0.912 -1.04  -0.0918 -0.304  1.69    0.0920  2.04 
#>  8 DA00011   3.48  4.96   3.50  -0.338  4.48    1.26   2.18    1.62    1.79 
#>  9 DA00012   4.31  0.710 -1.44  -0.218 -0.469  -0.361 -0.0714 -1.30    2.86 
#> 10 DA00013   1.31  2.52   1.11   0.997  4.56   -1.35   0.833   2.33    3.57 
#> # ℹ 429 more rows
#> # ℹ 92 more variables: ACTN4 <dbl>, ACY1 <dbl>, ADA <dbl>, ADA2 <dbl>,
#> #   ADAM15 <dbl>, ADAM23 <dbl>, ADAM8 <dbl>, ADAMTS13 <dbl>, ADAMTS15 <dbl>,
#> #   ADAMTS16 <dbl>, ADAMTS8 <dbl>, ADCYAP1R1 <dbl>, ADGRE2 <dbl>, ADGRE5 <dbl>,
#> #   ADGRG1 <dbl>, ADGRG2 <dbl>, ADH4 <dbl>, ADM <dbl>, AGER <dbl>, AGR2 <dbl>,
#> #   AGR3 <dbl>, AGRN <dbl>, AGRP <dbl>, AGXT <dbl>, AHCY <dbl>, AHSP <dbl>,
#> #   AIF1 <dbl>, AIFM1 <dbl>, AK1 <dbl>, AKR1B1 <dbl>, AKR1C4 <dbl>, …
#> 
#> $hypopt_res$test_set
#> # A tibble: 147 × 102
#>    DAid  AARSD1  ABL1  ACAA1    ACAN  ACE2   ACOX1   ACP5    ACP6 ACTA2    ACTN4
#>    <chr>  <dbl> <dbl>  <dbl>   <dbl> <dbl>   <dbl>  <dbl>   <dbl> <dbl>    <dbl>
#>  1 DA00…  3.39  2.76   1.71   0.0333 1.76  -0.919   1.54   2.15   2.81   0.742  
#>  2 DA00…  1.42  1.25  -0.816 -0.459  0.826 -0.902   0.647  1.30   0.798 -0.0659 
#>  3 DA00…  4.39  3.34  -0.452 -0.868  0.395  1.71    1.49  -0.0285 0.200 -0.532  
#>  4 DA00…  3.31  1.90  NA     -0.926  0.408  0.687   1.03   0.612  2.19   0.258  
#>  5 DA00…  1.46  0.832 -2.73  -0.371  2.27   0.0234  0.144  0.826  1.98  -0.280  
#>  6 DA00…  2.62  2.48   0.537 -0.215  1.82   0.290   1.27   1.11   0.206  1.23   
#>  7 DA00…  2.47  2.16  -0.486 NA      0.386 NA       1.38   0.536  1.86   0.00982
#>  8 DA00…  3.62  3.06  -1.34   0.965  1.05   1.53    0.152 -0.124  2.81   0.285  
#>  9 DA00…  4.39  3.31   0.454  0.290  2.68   0.116  -1.32   0.945  2.14  -0.00881
#> 10 DA00…  0.964 2.94   1.55   1.67   2.50   0.164   1.83   1.46   3.03   0.449  
#> # ℹ 137 more rows
#> # ℹ 91 more variables: ACY1 <dbl>, ADA <dbl>, ADA2 <dbl>, ADAM15 <dbl>,
#> #   ADAM23 <dbl>, ADAM8 <dbl>, ADAMTS13 <dbl>, ADAMTS15 <dbl>, ADAMTS16 <dbl>,
#> #   ADAMTS8 <dbl>, ADCYAP1R1 <dbl>, ADGRE2 <dbl>, ADGRE5 <dbl>, ADGRG1 <dbl>,
#> #   ADGRG2 <dbl>, ADH4 <dbl>, ADM <dbl>, AGER <dbl>, AGR2 <dbl>, AGR3 <dbl>,
#> #   AGRN <dbl>, AGRP <dbl>, AGXT <dbl>, AHCY <dbl>, AHSP <dbl>, AIF1 <dbl>,
#> #   AIFM1 <dbl>, AK1 <dbl>, AKR1B1 <dbl>, AKR1C4 <dbl>, AKT1S1 <dbl>, …
#> 
#> $hypopt_res$hypopt_vis

#> 
#> 
#> $finalfit_res
#> $finalfit_res$final
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: multinom_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 4 Recipe Steps
#> 
#> • step_normalize()
#> • step_nzv()
#> • step_corr()
#> • step_impute_knn()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> 
#> Call:  glmnet::glmnet(x = maybe_matrix(x), y = y, family = "multinomial",      alpha = ~1) 
#> 
#>      Df  %Dev   Lambda
#> 1     0  0.00 0.145200
#> 2     1  0.80 0.132300
#> 3     2  1.47 0.120500
#> 4     2  2.01 0.109800
#> 5     3  2.72 0.100100
#> 6     4  3.54 0.091180
#> 7     5  4.47 0.083080
#> 8     8  5.74 0.075700
#> 9     8  7.24 0.068980
#> 10   13  9.04 0.062850
#> 11   16 11.12 0.057270
#> 12   22 13.33 0.052180
#> 13   24 15.48 0.047540
#> 14   33 17.80 0.043320
#> 15   35 20.12 0.039470
#> 16   43 22.49 0.035960
#> 17   48 24.98 0.032770
#> 18   50 27.39 0.029860
#> 19   58 29.74 0.027210
#> 20   61 32.02 0.024790
#> 21   68 34.24 0.022590
#> 22   74 36.53 0.020580
#> 23   81 38.94 0.018750
#> 24   86 41.38 0.017090
#> 25   87 43.81 0.015570
#> 26   87 46.18 0.014190
#> 27   90 48.49 0.012920
#> 28   93 50.75 0.011780
#> 29   95 53.01 0.010730
#> 30   97 55.23 0.009777
#> 31   97 57.37 0.008909
#> 32   99 59.44 0.008117
#> 33   99 61.51 0.007396
#> 34   99 63.55 0.006739
#> 35   99 65.57 0.006140
#> 36   99 67.55 0.005595
#> 37  100 69.52 0.005098
#> 38  100 71.47 0.004645
#> 39  100 73.36 0.004232
#> 40  100 75.18 0.003856
#> 41  100 76.93 0.003514
#> 42  100 78.62 0.003202
#> 43  100 80.22 0.002917
#> 44  100 81.77 0.002658
#> 45  100 83.22 0.002422
#> 46  100 84.59 0.002207
#> 
#> ...
#> and 54 more lines.
#> 
#> $finalfit_res$best
#> # A tibble: 1 × 1
#>   penalty
#>     <dbl>
#> 1 0.00248
#> 
#> $finalfit_res$final_wf
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: multinom_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 4 Recipe Steps
#> 
#> • step_normalize()
#> • step_nzv()
#> • step_corr()
#> • step_impute_knn()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Multinomial Regression Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = 0.0024775372708352
#>   mixture = 1
#> 
#> Computational engine: glmnet 
#> 
#> 
#> 
#> $roc_curve

#> 
#> $auc
#> # A tibble: 12 × 2
#>    Disease   AUC
#>    <chr>   <dbl>
#>  1 AML     0.940
#>  2 BRC     0.557
#>  3 CLL     1    
#>  4 CRC     0.697
#>  5 CVX     0.610
#>  6 ENDC    0.675
#>  7 GLIOM   0.759
#>  8 LUNGC   0.614
#>  9 LYMPH   0.791
#> 10 MYEL    0.905
#> 11 OVC     0.551
#> 12 PRC     0.880
#> 
#> $auc_barplot

#> 
#> $var_imp_res
#> $var_imp_res$features
#> # A tibble: 63 × 4
#>    Variable Importance Sign  Scaled_Importance
#>    <fct>         <dbl> <chr>             <dbl>
#>  1 ANGPT1         5.23 NEG               100  
#>  2 AHCY           4.96 POS                94.8
#>  3 APEX1          4.40 POS                84.2
#>  4 AK1            2.84 NEG                54.3
#>  5 ARTN           2.61 POS                49.8
#>  6 APBB1IP        2.35 POS                45.0
#>  7 ADAM8          2.24 NEG                42.8
#>  8 ADAMTS16       1.98 NEG                37.9
#>  9 ACP5           1.98 NEG                37.8
#> 10 AMIGO2         1.97 NEG                37.6
#> # ℹ 53 more rows
#> 
#> $var_imp_res$var_imp_plot

#> 
#>
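
As a possible follow-up to the example, the per-class AUC values can be ranked to see which classes the model separates well. The sketch below retypes the values printed under $auc into a plain data frame so that it runs on its own; in practice the auc element of the returned list would presumably be used directly. The 0.7 cut-off is an arbitrary choice for illustration.

```r
# AUC values copied from the `$auc` table in the example output above.
auc <- data.frame(
  Disease = c("AML", "BRC", "CLL", "CRC", "CVX", "ENDC",
              "GLIOM", "LUNGC", "LYMPH", "MYEL", "OVC", "PRC"),
  AUC = c(0.940, 0.557, 1, 0.697, 0.610, 0.675,
          0.759, 0.614, 0.791, 0.905, 0.551, 0.880)
)

# Rank classes from highest to lowest AUC.
auc_ranked <- auc[order(auc$AUC, decreasing = TRUE), ]
head(auc_ranked, 3)

# Classes whose AUC falls below an arbitrary, illustrative cut-off of 0.7.
weak <- auc_ranked$Disease[auc_ranked$AUC < 0.7]
weak
```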