
clean_data() preprocesses the data by selecting columns, filtering rows, and replacing placeholder values according to the specified criteria.

  • It keeps only the specified columns.

  • It removes the data of the specified plates and assays.

  • It can remove all samples with Assay_Warning != "PASS".

  • It removes rows with NAs in the specified columns (by default, DAid and NPX).

  • It replaces the specified values with NA.
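
The steps listed above can be sketched in base R as follows. This is an illustration only, not the package's implementation: clean_data_sketch() and the toy data frame are hypothetical, the order of the steps is an assumption, and the sketch treats filter_plates and filter_assays as exclusion lists, following the argument descriptions on this page.

```r
# Hypothetical helper, NOT the package's implementation of clean_data().
clean_data_sketch <- function(df_in,
                              keep_cols = c("DAid", "Assay", "NPX"),
                              filter_plates = NULL,
                              filter_assays = NULL,
                              filter_assay_warning = FALSE,
                              remove_na_cols = c("DAid", "NPX"),
                              replace_w_na = c(0, "0", "", "Unknown", "unknown", "none", NA, "na")) {
  df <- df_in

  # Drop rows from the listed plates and assays (assumes PlateID and Assay columns).
  if (!is.null(filter_plates)) {
    df <- df[!(df$PlateID %in% filter_plates), , drop = FALSE]
  }
  if (!is.null(filter_assays)) {
    df <- df[!(df$Assay %in% filter_assays), , drop = FALSE]
  }

  # Optionally keep only rows whose assay warning is "PASS".
  if (filter_assay_warning) {
    df <- df[!is.na(df$Assay_Warning) & df$Assay_Warning == "PASS", , drop = FALSE]
  }

  # Keep only the requested columns that are present.
  df <- df[, intersect(keep_cols, names(df)), drop = FALSE]

  # Remove rows with NA in any of the checked columns.
  na_cols <- intersect(remove_na_cols, names(df))
  if (length(na_cols) > 0) {
    df <- df[stats::complete.cases(df[, na_cols, drop = FALSE]), , drop = FALSE]
  }

  # Replace the listed values with NA, column by column (assumed to run last).
  df[] <- lapply(df, function(col) {
    col[as.character(col) %in% as.character(replace_w_na)] <- NA
    col
  })
  df
}

# Toy data (made up for illustration).
toy <- data.frame(
  DAid = c("DA1", "DA1", "DA2", NA, "DA3"),
  Assay = c("A", "B", "A", "A", "Unknown"),
  NPX = c(1.5, 0, NA, 2.0, 3.1),
  Assay_Warning = c("PASS", "PASS", "PASS", "PASS", "WARN"),
  PlateID = c("P1", "P1", "P2", "P2", "P3"),
  stringsAsFactors = FALSE
)
out <- clean_data_sketch(toy, filter_plates = "P2", filter_assay_warning = TRUE)
```

Because the sketch replaces values after dropping NA rows, a value such as NPX = 0 ends up as NA in the output rather than being removed. Swapping the last two steps would remove such rows instead; the page does not state which order clean_data() uses.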

Usage

clean_data(
  df_in,
  keep_cols = c("DAid", "Assay", "NPX"),
  filter_plates = NULL,
  filter_assays = NULL,
  filter_assay_warning = FALSE,
  remove_na_cols = c("DAid", "NPX"),
  replace_w_na = c(0, "0", "", "Unknown", "unknown", "none", NA, "na")
)

Arguments

df_in

The input dataframe.

keep_cols

The columns to keep in the output dataframe.

filter_plates

The plates to exclude.

filter_assays

The assays to filter out.

filter_assay_warning

If TRUE, only the rows with Assay_Warning == "PASS" are kept. Default is FALSE.

remove_na_cols

The columns to check for NAs; rows with an NA in any of them are removed. Default is c("DAid", "NPX").

replace_w_na

The values to replace with NA. Default is c(0, "0", "", "Unknown", "unknown", "none", NA, "na").
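
A note on the default above: in R, combining numbers and strings in c() coerces every element to character, so the 0 and "0" entries are the same value. The short snippet below shows this, and shows that %in% still matches a numeric 0 against the vector. Whether clean_data() compares values in exactly this way is not stated on this page, so treat it as an illustration of R's coercion rules rather than of the function's internals.

```r
# Mixing numbers and strings in c() coerces everything to character.
replace_w_na <- c(0, "0", "", "Unknown", "unknown", "none", NA, "na")
value_class <- class(replace_w_na)

# %in% compares via match(), which coerces a numeric vector to character too,
# so a numeric 0 matches the "0" entry while other numbers do not.
npx <- c(0, 1.5, -0.919)
is_placeholder <- npx %in% replace_w_na
```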

Value

The preprocessed dataframe.

Examples

# Unprocessed data
example_data
#> # A tibble: 56,142 × 10
#>    DAid    Sample   OlinkID UniProt Assay Panel     NPX Assay_Warning QC_Warning
#>    <chr>   <chr>    <chr>   <chr>   <chr> <chr>   <dbl> <chr>         <chr>     
#>  1 DA00001 AML_syn… OID213… Q9BTE6  AARS… Onco…  3.39   PASS          PASS      
#>  2 DA00001 AML_syn… OID212… P00519  ABL1  Onco…  2.76   PASS          PASS      
#>  3 DA00001 AML_syn… OID212… P09110  ACAA1 Onco…  1.71   PASS          PASS      
#>  4 DA00001 AML_syn… OID201… P16112  ACAN  Card…  0.0333 PASS          PASS      
#>  5 DA00001 AML_syn… OID201… Q9BYF1  ACE2  Card…  1.76   PASS          PASS      
#>  6 DA00001 AML_syn… OID201… Q15067  ACOX1 Card… -0.919  PASS          PASS      
#>  7 DA00001 AML_syn… OID203… P13686  ACP5  Card…  1.54   PASS          PASS      
#>  8 DA00001 AML_syn… OID214… Q9NPH0  ACP6  Onco…  2.15   PASS          PASS      
#>  9 DA00001 AML_syn… OID200… P62736  ACTA2 Card…  2.81   PASS          PASS      
#> 10 DA00001 AML_syn… OID204… O43707  ACTN4 Infl…  0.742  PASS          PASS      
#> # ℹ 56,132 more rows
#> # ℹ 1 more variable: PlateID <chr>

# Preprocessed data
clean_data(example_data, filter_plates = c("Plate1", "Plate2"), filter_assay_warning = TRUE)
#> # A tibble: 55,581 × 3
#>    DAid    Assay      NPX
#>    <chr>   <chr>    <dbl>
#>  1 DA00001 AARSD1  3.39  
#>  2 DA00001 ABL1    2.76  
#>  3 DA00001 ACAA1   1.71  
#>  4 DA00001 ACAN    0.0333
#>  5 DA00001 ACE2    1.76  
#>  6 DA00001 ACOX1  -0.919 
#>  7 DA00001 ACP5    1.54  
#>  8 DA00001 ACP6    2.15  
#>  9 DA00001 ACTA2   2.81  
#> 10 DA00001 ACTN4   0.742 
#> # ℹ 55,571 more rows