qc_summary_metadata() summarizes quality control results for the metadata dataframe. It checks the column types, calculates the percentage of NAs in each column and in each row, and creates summary visualizations for user-selected categorical and numeric variables.

Usage

qc_summary_metadata(
  metadata,
  categorical = "Sex",
  numerical = "Age",
  disease_palette = NULL,
  categ_palette = "sex_hpa",
  report = TRUE
)

Arguments

metadata

The metadata dataframe.

categorical

The categorical variables to summarize. Default is "Sex".

numerical

The numeric variables to summarize. Default is "Age".

disease_palette

The color palette for the different diseases. If it is a character, it should be one of the palettes from get_hpa_palettes(). Default is NULL.

categ_palette

The categorical color palette. If it is a character, it should be one of the palettes from get_hpa_palettes(). Default is "sex_hpa".

report

Whether to print the summary. Default is TRUE.

Value

A list containing the following elements:

  • na_percentage_col: A tibble with the column names and the percentage of NAs in each column.

  • na_percentage_row: A tibble with the DAids and the percentage of NAs in each row.

  • Distribution plots and barplots for the selected variables (for example, distplot_Age and barplot_Sex), as well as the sample counts.
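To make the column-type check and the NA percentages concrete, here is a minimal base R sketch of how such summaries could be computed. It is not the package's implementation: toy_metadata and its values are made up for illustration, and dropping entries with 0% NAs is an assumption based on the printed example output.

```r
# Toy metadata: column names mirror the example dataset, values are made up.
toy_metadata <- data.frame(
  DAid  = c("DA00001", "DA00002", "DA00003", "DA00004"),
  Sex   = c("F", "M", NA, "F"),
  Age   = c(42, NA, 57, 63),
  Grade = c(NA, NA, NA, "2"),
  stringsAsFactors = FALSE
)

# Count how many columns there are of each type (first class of each column).
type_counts <- table(vapply(toy_metadata, function(x) class(x)[1], character(1)))

# Percentage of NAs per column: the mean of a logical NA indicator is the NA fraction.
na_col <- colMeans(is.na(toy_metadata)) * 100
na_percentage_col <- data.frame(
  column = names(na_col),
  na_percentage = unname(na_col),
  stringsAsFactors = FALSE
)
# Assumption: only columns that contain at least one NA are reported.
na_percentage_col <- na_percentage_col[na_percentage_col$na_percentage > 0, ]

# Percentage of NAs per row, keyed by the sample identifier.
na_percentage_row <- data.frame(
  DAid = toy_metadata$DAid,
  na_percentage = rowMeans(is.na(toy_metadata)) * 100,
  stringsAsFactors = FALSE
)
# Assumption: only rows that contain at least one NA are reported.
na_percentage_row <- na_percentage_row[na_percentage_row$na_percentage > 0, ]
```

Note that the per-row percentage here is taken over all columns, including the identifier column, which is one possible convention.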

Examples

qc_res <- qc_summary_metadata(example_metadata, disease_palette = "cancers12")
#> [1] "Summary:"
#> [1] "Note: In case of long output, only the first 10 rows are shown. To see the rest display the object with view()"
#> [1] "Number of samples: 586"
#> [1] "Number of variables: 8"
#> [1] "--------------------------------------"
#> [1] "character : 7"
#> [1] "numeric : 2"
#> [1] "--------------------------------------"
#> [1] "NA percentage in each column:"
#> # A tibble: 1 × 2
#>   column na_percentage
#>   <chr>          <dbl>
#> 1 Grade           91.5
#> [1] "--------------------------------------"
#> [1] "NA percentage in each row:"
#> # A tibble: 536 × 2
#>    DAid    na_percentage
#>    <chr>           <dbl>
#>  1 DA00001          11.1
#>  2 DA00002          11.1
#>  3 DA00003          11.1
#>  4 DA00004          11.1
#>  5 DA00005          11.1
#>  6 DA00006          11.1
#>  7 DA00007          11.1
#>  8 DA00008          11.1
#>  9 DA00009          11.1
#> 10 DA00010          11.1
#> # ℹ 526 more rows
#> [1] "--------------------------------------"
#> Sex contains:
#> # A tibble: 19 × 3
#>    Disease Sex       n
#>    <chr>   <chr> <int>
#>  1 AML     F        23
#>  2 AML     M        27
#>  3 BRC     F        50
#>  4 CLL     F        21
#>  5 CLL     M        27
#>  6 CRC     F        28
#>  7 CRC     M        22
#>  8 CVX     F        50
#>  9 ENDC    F        50
#> 10 GLIOM   F        24
#> 11 GLIOM   M        26
#> 12 LUNGC   F        33
#> 13 LUNGC   M        17
#> 14 LYMPH   F        22
#> 15 LYMPH   M        28
#> 16 MYEL    F        15
#> 17 MYEL    M        23
#> 18 OVC     F        50
#> 19 PRC     M        50

# Metadata distributions
qc_res$barplot_Sex

qc_res$distplot_Age
#> Picking joint bandwidth of 6.06
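
The na_percentage_col and na_percentage_row elements can also be used for follow-up filtering. The sketch below uses a stand-in list (qc_stub) with the same element names so that it runs without the package; the values are copied from the printed output above, and the 50% cutoff (na_threshold) is a hypothetical choice, not a package default.

```r
# Stand-in for the list returned by qc_summary_metadata(); only the two
# NA-percentage elements are mimicked, with a few values from the output above.
qc_stub <- list(
  na_percentage_col = data.frame(
    column = "Grade",
    na_percentage = 91.5,
    stringsAsFactors = FALSE
  ),
  na_percentage_row = data.frame(
    DAid = c("DA00001", "DA00002", "DA00003"),
    na_percentage = c(11.1, 11.1, 11.1),
    stringsAsFactors = FALSE
  )
)

# Hypothetical cutoff: flag anything with more than 50% missing values.
na_threshold <- 50

# Columns whose NA percentage exceeds the cutoff.
col_na <- qc_stub$na_percentage_col
cols_to_review <- col_na$column[col_na$na_percentage > na_threshold]

# Samples whose NA percentage exceeds the cutoff.
row_na <- qc_stub$na_percentage_row
rows_to_review <- row_na$DAid[row_na$na_percentage > na_threshold]
```

With the real return value, qc_res would take the place of qc_stub.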