clean_data()
preprocesses the data by filtering out rows based on the specified criteria.
It keeps only the specified columns.
It excludes the data of the specified plates and assays.
It can optionally remove all rows with Assay_Warning != "PASS".
It removes rows with NAs in the DAid and NPX columns.
It replaces the specified values with NA.
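The filtering steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `clean_sketch()` and the `toy` data frame are hypothetical, the plate column is assumed to be `PlateID` (as in the example data), the order of the steps is a guess, and the NA-replacement step is left out.

```r
# Hypothetical stand-in for the filtering steps; not the package's code.
clean_sketch <- function(df, keep_cols,
                         filter_plates = NULL,
                         filter_assays = NULL,
                         filter_assay_warning = FALSE,
                         remove_na_cols = c("DAid", "NPX")) {
  # Exclude rows that belong to the listed plates or assays.
  if (!is.null(filter_plates)) {
    df <- df[!(df$PlateID %in% filter_plates), , drop = FALSE]
  }
  if (!is.null(filter_assays)) {
    df <- df[!(df$Assay %in% filter_assays), , drop = FALSE]
  }
  # Optionally keep only rows whose Assay_Warning is "PASS".
  if (filter_assay_warning) {
    df <- df[df$Assay_Warning %in% "PASS", , drop = FALSE]
  }
  # Drop rows that have an NA in any of the checked columns.
  ok <- stats::complete.cases(df[, remove_na_cols, drop = FALSE])
  df <- df[ok, , drop = FALSE]
  # Keep only the requested columns.
  df[, keep_cols, drop = FALSE]
}

# Made-up data with the same column names as the example data.
toy <- data.frame(
  DAid = c("DA1", "DA2", NA, "DA4", "DA5"),
  Assay = c("A", "B", "A", "A", "C"),
  NPX = c(1.2, 0.5, 2.0, NA, 3.1),
  Assay_Warning = c("PASS", "WARN", "PASS", "PASS", "PASS"),
  PlateID = c("P1", "P1", "P1", "P1", "P2"),
  stringsAsFactors = FALSE
)

cleaned <- clean_sketch(toy,
                        keep_cols = c("DAid", "Assay", "NPX"),
                        filter_plates = "P2",
                        filter_assay_warning = TRUE)
print(cleaned)
```

In this toy run only the first row survives: the others fall on the excluded plate, carry a warning, or have an NA in DAid or NPX.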
Arguments
- df_in
The input dataframe.
- keep_cols
The columns to keep in the output dataframe.
- filter_plates
The plates to exclude.
- filter_assays
The assays to filter out.
- filter_assay_warning
If TRUE, only the rows with Assay_Warning == "PASS" are kept. Default is FALSE.
- remove_na_cols
The columns to check for NAs; rows with NAs in these columns are removed. Default is c("DAid", "NPX").
- replace_w_na
The values to replace with NA. Default is c(0, "0", "", "Unknown", "unknown", "none", NA, "na").
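To illustrate what replacing values with NA could look like, here is a minimal base R sketch. The helper `replace_with_na()` and the `toy` data frame are hypothetical and not part of the package; whether the real function applies the replacement to every column, and how it compares values, is an assumption here. Note that in R, `c(0, "0", ...)` is coerced to a character vector, so the sketch compares values as strings.

```r
# Hypothetical helper: set every cell that matches one of `values` to NA.
replace_with_na <- function(df, values) {
  values <- as.character(values)
  df[] <- lapply(df, function(col) {
    # Compare as strings so that numeric 0 matches "0".
    col[as.character(col) %in% values] <- NA
    col
  })
  df
}

# Made-up metadata-like table; the column names are illustrative only.
toy <- data.frame(
  DAid = c("DA1", "DA2", "DA3"),
  Sex = c("F", "Unknown", ""),
  Age = c(42, 0, 57),
  stringsAsFactors = FALSE
)

out <- replace_with_na(
  toy,
  c(0, "0", "", "Unknown", "unknown", "none", NA, "na")
)
print(out)
```

With the default list, a genuine numeric 0 would also become NA under this sketch, so it may be worth passing a narrower vector when zeros are meaningful.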
Examples
# Unprocessed data
example_data
#> # A tibble: 56,142 × 10
#> DAid Sample OlinkID UniProt Assay Panel NPX Assay_Warning QC_Warning
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 DA00001 AML_syn… OID213… Q9BTE6 AARS… Onco… 3.39 PASS PASS
#> 2 DA00001 AML_syn… OID212… P00519 ABL1 Onco… 2.76 PASS PASS
#> 3 DA00001 AML_syn… OID212… P09110 ACAA1 Onco… 1.71 PASS PASS
#> 4 DA00001 AML_syn… OID201… P16112 ACAN Card… 0.0333 PASS PASS
#> 5 DA00001 AML_syn… OID201… Q9BYF1 ACE2 Card… 1.76 PASS PASS
#> 6 DA00001 AML_syn… OID201… Q15067 ACOX1 Card… -0.919 PASS PASS
#> 7 DA00001 AML_syn… OID203… P13686 ACP5 Card… 1.54 PASS PASS
#> 8 DA00001 AML_syn… OID214… Q9NPH0 ACP6 Onco… 2.15 PASS PASS
#> 9 DA00001 AML_syn… OID200… P62736 ACTA2 Card… 2.81 PASS PASS
#> 10 DA00001 AML_syn… OID204… O43707 ACTN4 Infl… 0.742 PASS PASS
#> # ℹ 56,132 more rows
#> # ℹ 1 more variable: PlateID <chr>
# Preprocessed data
clean_data(example_data, filter_plates = c("Plate1", "Plate2"), filter_assay_warning = TRUE)
#> # A tibble: 55,581 × 3
#> DAid Assay NPX
#> <chr> <chr> <dbl>
#> 1 DA00001 AARSD1 3.39
#> 2 DA00001 ABL1 2.76
#> 3 DA00001 ACAA1 1.71
#> 4 DA00001 ACAN 0.0333
#> 5 DA00001 ACE2 1.76
#> 6 DA00001 ACOX1 -0.919
#> 7 DA00001 ACP5 1.54
#> 8 DA00001 ACP6 2.15
#> 9 DA00001 ACTA2 2.81
#> 10 DA00001 ACTN4 0.742
#> # ℹ 55,571 more rows