hd_bin_columns()
bins continuous variables and labels them with ranges.
Details
In case of a dataset with many variables, it is recommended to use the function
hd_detect_vartype()
to automatically detect the type of each variable. See examples.
Examples
# Example dataframe
test_data <- data.frame(
age = c(25, 35, 45, 55, 65),
BMI = c(25, 35, 30, 32, 28),
sex = c("M", "F", "M", "F", "M")
)
column_types <- c("continuous", "continuous", "categorical")
hd_bin_columns(test_data, column_types, bins = 3)
#> age BMI sex
#> 1 25-38 25-28 M
#> 2 25-38 32-35 F
#> 3 38-52 28-32 M
#> 4 52-65 32-35 F
#> 5 52-65 25-28 M
# Automatically detect variable types
test_data <- data.frame(Category = c("A", "B", "A", "C", "B", "A"),
Continuous = c(1.1, 2.5, 3.8, 4.0, 5.8, 9))
column_types <- sapply(test_data, hd_detect_vartype)
# The variable to be binned has one significant digit
# So we will also round the bins to one digit
hd_bin_columns(test_data, column_types, bins = 3, round_digits = 1)
#> Category Continuous
#> 1 A 1.1-3.7
#> 2 B 1.1-3.7
#> 3 A 3.7-6.4
#> 4 C 3.7-6.4
#> 5 B 3.7-6.4
#> 6 A 6.4-9