qc_summary_metadata() summarizes the quality control results of the metadata dataframe.
It checks the column types, calculates the percentage of NAs in each column and row,
and creates summary visualizations for user-selected categorical and numeric variables.
Usage
qc_summary_metadata(
  metadata,
  categorical = "Sex",
  numerical = "Age",
  disease_palette = NULL,
  categ_palette = "sex_hpa",
  report = TRUE
)
Arguments
- metadata
The metadata dataframe.
- categorical
The categorical variables to summarize. Default is "Sex".
- numerical
The numeric variables to summarize. Default is "Age".
- disease_palette
The color palette for the different diseases. If it is a character, it should be one of the palettes from get_hpa_palettes(). Default is NULL.
- categ_palette
The categorical color palette. If it is a character, it should be one of the palettes from get_hpa_palettes(). Default is "sex_hpa".
- report
Whether to print the summary. Default is TRUE.
Value
A list containing the following elements:
- na_percentage_col: A tibble with the column names and the percentage of NAs in each column.
- na_percentage_row: A tibble with the DAids and the percentage of NAs in each row.
- Bar plots and distribution plots for the selected variables (for example, barplot_Sex and distplot_Age), as well as the sample counts.
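As a rough illustration of what the column-type check and the two NA tables contain, the same quantities can be computed on a toy data frame with base R. This is only a sketch and not the package's implementation: toy_metadata and its values are made up, and dropping entries with 0% NAs is an assumption based on the example output, which lists only columns and rows that have missing values.

```r
# Toy metadata (hypothetical values; column names mirror the example dataset).
toy_metadata <- data.frame(
  DAid  = c("S1", "S2", "S3", "S4"),
  Sex   = c("F", "M", NA, "F"),
  Age   = c(42, NA, 57, 61),
  Grade = c(NA, NA, NA, "2"),
  stringsAsFactors = FALSE
)

# Count how many columns there are of each type.
col_types <- table(vapply(toy_metadata, function(x) class(x)[1], character(1)))

# is.na() on a data frame returns a logical matrix, so column and row
# means give the fraction of missing values directly.
na_matrix <- is.na(toy_metadata)

na_percentage_col <- data.frame(
  column = names(toy_metadata),
  na_percentage = unname(colMeans(na_matrix)) * 100
)

na_percentage_row <- data.frame(
  DAid = toy_metadata$DAid,
  na_percentage = unname(rowMeans(na_matrix)) * 100
)

# Assumption: keep only columns and rows with at least one missing value.
na_percentage_col <- na_percentage_col[na_percentage_col$na_percentage > 0, ]
na_percentage_row <- na_percentage_row[na_percentage_row$na_percentage > 0, ]

print(col_types)
print(na_percentage_col)
print(na_percentage_row)
```

Here the row percentage is taken over all columns of the data frame, including the identifier column; whether the package counts the identifier column is not stated in this page.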
Examples
qc_res <- qc_summary_metadata(example_metadata, disease_palette = "cancers12")
#> [1] "Summary:"
#> [1] "Note: In case of long output, only the first 10 rows are shown. To see the rest display the object with view()"
#> [1] "Number of samples: 586"
#> [1] "Number of variables: 8"
#> [1] "--------------------------------------"
#> [1] "character : 7"
#> [1] "numeric : 2"
#> [1] "--------------------------------------"
#> [1] "NA percentage in each column:"
#> # A tibble: 1 × 2
#> column na_percentage
#> <chr> <dbl>
#> 1 Grade 91.5
#> [1] "--------------------------------------"
#> [1] "NA percentage in each row:"
#> # A tibble: 536 × 2
#> DAid na_percentage
#> <chr> <dbl>
#> 1 DA00001 11.1
#> 2 DA00002 11.1
#> 3 DA00003 11.1
#> 4 DA00004 11.1
#> 5 DA00005 11.1
#> 6 DA00006 11.1
#> 7 DA00007 11.1
#> 8 DA00008 11.1
#> 9 DA00009 11.1
#> 10 DA00010 11.1
#> # ℹ 526 more rows
#> [1] "--------------------------------------"
#> Sex contains:
#> # A tibble: 19 × 3
#> Disease Sex n
#> <chr> <chr> <int>
#> 1 AML F 23
#> 2 AML M 27
#> 3 BRC F 50
#> 4 CLL F 21
#> 5 CLL M 27
#> 6 CRC F 28
#> 7 CRC M 22
#> 8 CVX F 50
#> 9 ENDC F 50
#> 10 GLIOM F 24
#> 11 GLIOM M 26
#> 12 LUNGC F 33
#> 13 LUNGC M 17
#> 14 LYMPH F 22
#> 15 LYMPH M 28
#> 16 MYEL F 15
#> 17 MYEL M 23
#> 18 OVC F 50
#> 19 PRC M 50
# Metadata distributions
qc_res$barplot_Sex
qc_res$distplot_Age
#> Picking joint bandwidth of 6.06
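The "Sex contains:" table in the output above is a count of samples per disease and category level. A minimal base R sketch of that kind of count is shown here, assuming a toy data frame (toy_samples, with made-up values); the package itself may compute it differently.

```r
# Toy samples (hypothetical values): one row per sample.
toy_samples <- data.frame(
  Disease = c("AML", "AML", "AML", "BRC", "BRC"),
  Sex     = c("F", "M", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Add a helper column of ones and sum it within each Disease/Sex group.
# Combinations that never occur (here BRC/M) are simply absent from the result.
counts <- aggregate(n ~ Disease + Sex, data = transform(toy_samples, n = 1L), FUN = sum)

# Sort by disease, then sex, to match the layout of the printed table.
counts <- counts[order(counts$Disease, counts$Sex), ]
rownames(counts) <- NULL

print(counts)
```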