clean_metadata()
preprocesses the metadata by filtering out rows based on the specified criteria.
It keeps only the specified columns.
It keeps only the data of the specified cohort.
It removes rows with NAs in the columns given by remove_na_cols (by default, DAid and Disease).
It replaces the specified values with NA.
Arguments
- df_in
The input metadata.
- keep_cols
The columns to keep in the output metadata.
- cohort
The cohort to keep.
- remove_na_cols
The columns to check for NAs; rows with an NA in any of them are removed. Default is c("DAid", "Disease").
- replace_w_na
The values to replace with NA. Default is c("Unknown", "unknown", "none", NA, "na").
Examples
# Unprocessed metadata
example_metadata
#> # A tibble: 586 × 9
#> DAid Sample Disease Stage Grade Sex Age BMI Cohort
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 DA00001 AML_syn_1 AML 2 NA F 42 22.7 UCAN
#> 2 DA00002 AML_syn_2 AML Unknown NA M 69 33.1 UCAN
#> 3 DA00003 AML_syn_3 AML 2 NA F 61 26.2 UCAN
#> 4 DA00004 AML_syn_4 AML Unknown NA M 54 28.1 UCAN
#> 5 DA00005 AML_syn_5 AML 2 NA F 57 21.4 UCAN
#> 6 DA00006 AML_syn_6 AML Unknown NA M 86 33.9 UCAN
#> 7 DA00007 AML_syn_7 AML 1 NA F 85 28.7 UCAN
#> 8 DA00008 AML_syn_8 AML 3 NA F 88 32.6 UCAN
#> 9 DA00009 AML_syn_9 AML Unknown NA M 80 26.1 UCAN
#> 10 DA00010 AML_syn_10 AML 3 NA M 48 33.8 UCAN
#> # ℹ 576 more rows
# Preprocessed metadata
clean_metadata(example_metadata)
#> # A tibble: 586 × 5
#> DAid Disease Sex Age BMI
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 DA00001 AML F 42 22.7
#> 2 DA00002 AML M 69 33.1
#> 3 DA00003 AML F 61 26.2
#> 4 DA00004 AML M 54 28.1
#> 5 DA00005 AML F 57 21.4
#> 6 DA00006 AML M 86 33.9
#> 7 DA00007 AML F 85 28.7
#> 8 DA00008 AML F 88 32.6
#> 9 DA00009 AML M 80 26.1
#> 10 DA00010 AML M 48 33.8
#> # ℹ 576 more rows
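The four steps in the description can be illustrated with a small base R sketch. This is not the package's implementation: `clean_metadata_sketch()` and the `toy` data frame are hypothetical names made up for illustration, and the order of the steps (cohort filter, column selection, NA replacement, then NA row removal) is an assumption, as is the presence of a column named `Cohort`.

```r
# Hypothetical re-creation of the documented steps in base R.
# NOT the package's code; step order and the "Cohort" column name are assumptions.
clean_metadata_sketch <- function(df_in,
                                  keep_cols,
                                  cohort = NULL,
                                  remove_na_cols = c("DAid", "Disease"),
                                  replace_w_na = c("Unknown", "unknown", "none", NA, "na")) {
  df_out <- df_in

  # Keep only rows of the requested cohort (done first, before Cohort may be dropped).
  if (!is.null(cohort)) {
    df_out <- df_out[df_out$Cohort %in% cohort, , drop = FALSE]
  }

  # Keep only the requested columns that actually exist in the data.
  df_out <- df_out[, intersect(keep_cols, names(df_out)), drop = FALSE]

  # Replace placeholder values with NA, column by column.
  for (col in names(df_out)) {
    df_out[[col]][df_out[[col]] %in% replace_w_na] <- NA
  }

  # Drop rows that have an NA in any of the checked columns.
  check_cols <- intersect(remove_na_cols, names(df_out))
  if (length(check_cols) > 0) {
    keep_rows <- stats::complete.cases(df_out[, check_cols, drop = FALSE])
    df_out <- df_out[keep_rows, , drop = FALSE]
  }

  df_out
}

# Hypothetical toy metadata, loosely shaped like example_metadata.
toy <- data.frame(
  DAid    = c("DA1", "DA2", NA, "DA4", "DA5"),
  Disease = c("AML", "Unknown", "AML", "CLL", "CLL"),
  Sex     = c("F", "M", "F", "na", "unknown"),
  Cohort  = c("UCAN", "UCAN", "UCAN", "Other", "UCAN"),
  stringsAsFactors = FALSE
)

cleaned <- clean_metadata_sketch(
  toy,
  keep_cols = c("DAid", "Disease", "Sex"),
  cohort = "UCAN"
)
```

In this sketch, the row with Disease "Unknown" and the row with a missing DAid are dropped, the "Other" cohort row is filtered out, and the "unknown" value in Sex becomes NA without removing its row, because Sex is not listed in remove_na_cols.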