This is a helper function to encode missing entries across groups of categorical variables in a data frame.
Usage
df_explicit_na(
  data,
  omit_columns = NULL,
  char_as_factor = TRUE,
  logical_as_factor = FALSE,
  na_level = "<Missing>"
)
Arguments
- data
(data.frame) data set.
- omit_columns
(character) names of variables from data that should not be modified by this function.
- char_as_factor
(flag) whether to convert character variables in data to factors.
- logical_as_factor
(flag) whether to convert logical variables in data to factors.
- na_level
(string) value used to replace all NA or empty values in columns not listed in omit_columns.
Details
Missing entries are NA values or empty strings; they are replaced with the value given by na_level. If a factor variable includes missing values, the replacement value is appended as the last level.
Likewise, when character or logical variables are converted to factors via the char_as_factor or logical_as_factor options, the replacement value is set as the last level.
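The level-ordering rule described above can be illustrated with a minimal base R sketch. The helper add_missing_level() is hypothetical and is not part of this package; it only mimics the documented behavior (NA and empty strings become na_level, appended as the last level) and should not be read as the actual implementation.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# replace NA / "" with `na_level` and make it the last factor level.
add_missing_level <- function(x, na_level = "<Missing>") {
  values <- as.character(x)
  is_missing <- is.na(values) | values == ""

  # Keep existing factor levels in their original order;
  # for other vectors, use the sorted unique non-missing values.
  old_levels <- if (is.factor(x)) levels(x) else sort(unique(values[!is_missing]))
  old_levels <- setdiff(old_levels, "")

  # Nothing missing: no extra level is added.
  if (!any(is_missing)) {
    return(factor(values, levels = old_levels))
  }

  values[is_missing] <- na_level
  factor(values, levels = c(setdiff(old_levels, na_level), na_level))
}

v <- factor(c("A", NA, NA, NA), levels = c("Z", "A"))
w <- c("A", "B", NA, "C")

levels(add_missing_level(v))
# "Z" "A" "<Missing>"
levels(add_missing_level(w))
# "A" "B" "C" "<Missing>"
```

In this sketch the original level order ("Z" before "A") is preserved and the missing level always comes last, which matches the rule stated in the text.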
See also
sas_na() and explicit_na() for other missing data helper functions.
Examples
my_data <- data.frame(
  u = c(TRUE, FALSE, NA, TRUE),
  v = factor(c("A", NA, NA, NA), levels = c("Z", "A")),
  w = c("A", "B", NA, "C"),
  x = c("D", "E", "F", NA),
  y = c("G", "H", "I", ""),
  z = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)
# Encode missing values in all character or factor columns.
df_explicit_na(my_data)
#>       u         v         w         x         y z
#> 1  TRUE         A         A         D         G 1
#> 2 FALSE <Missing>         B         E         H 2
#> 3    NA <Missing> <Missing>         F         I 3
#> 4  TRUE <Missing>         C <Missing> <Missing> 4
# Also convert logical columns to factor columns.
df_explicit_na(my_data, logical_as_factor = TRUE)
#>           u         v         w         x         y z
#> 1      TRUE         A         A         D         G 1
#> 2     FALSE <Missing>         B         E         H 2
#> 3 <Missing> <Missing> <Missing>         F         I 3
#> 4      TRUE <Missing>         C <Missing> <Missing> 4
# Encode missing values in a subset of columns.
df_explicit_na(my_data, omit_columns = c("x", "y"))
#>       u         v         w    x y z
#> 1  TRUE         A         A    D G 1
#> 2 FALSE <Missing>         B    E H 2
#> 3    NA <Missing> <Missing>    F I 3
#> 4  TRUE <Missing>         C <NA>   4