The S3 generic function s_summary() implements summaries for different
classes of x. It is used as a Statistics Function in combination
with the Analyze Function summarize_vars().
Usage
s_summary(x, na.rm = TRUE, denom, .N_row, .N_col, na_level, .var, ...)
# S3 method for numeric
s_summary(
x,
na.rm = TRUE,
denom,
.N_row,
.N_col,
na_level,
.var,
control = control_summarize_vars(),
...
)
# S3 method for factor
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
na_level = "<Missing>",
...
)
# S3 method for character
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
na_level = "<Missing>",
.var,
verbose = TRUE,
...
)
# S3 method for logical
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
...
)
a_summary(x, ..., .N_row, .N_col, .var)
# S3 method for numeric
a_summary(
x,
na.rm = TRUE,
denom,
.N_row,
.N_col,
na_level,
.var,
control = control_summarize_vars(),
...
)
# S3 method for factor
a_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
na_level = "<Missing>",
...
)
# S3 method for character
a_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
na_level = "<Missing>",
.var,
verbose = TRUE,
...
)
# S3 method for logical
a_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
...
)
summarize_vars(
lyt,
vars,
var_labels = vars,
nested = TRUE,
...,
show_labels = "default",
table_names = vars,
.stats = c("n", "mean_sd", "median", "range", "count_fraction"),
.formats = NULL,
.labels = NULL,
.indent_mods = NULL
)
Arguments
- x
(numeric)
vector of numbers we want to analyze.
- na.rm
(flag)
whether NA values should be removed from x prior to analysis.
- denom
(string)
choice of denominator for proportion. Options are:
  - n: number of values in this row and column intersection.
  - N_row: total number of values in this row across columns.
  - N_col: total number of values in this column across rows.
- .N_row
(count)
row-wise N (row group count) for the group of observations being analyzed (i.e. with no column-based subsetting) that is passed by rtables.
- .N_col
(count)
column-wise N (column count) for the full column that is passed by rtables.
- na_level
(string)
used to replace all NA or empty values in factors with a custom string.
- .var
(string)
single variable name that is passed by rtables when requested by a statistics function.
- ...
arguments passed to s_summary().
- control
(list)
parameters for descriptive statistics details, specified by using the helper function control_summarize_vars(). Some possible parameter options are:
  - conf_level (proportion): confidence level of the interval for mean and median.
  - quantiles (numeric): vector of length two to specify the quantiles.
  - quantile_type (numeric): between 1 and 9 selecting quantile algorithms to be used. See more about type in stats::quantile().
  - test_mean (numeric): value to test against the mean under the null hypothesis when calculating p-value.
- verbose
defaults to TRUE. It prints out warnings and messages. It is mainly used to print out information about factor casting.
- lyt
(layout)
input layout where analyses will be added to.
- vars
(character)
variable names for the primary analysis variable to be iterated over.
- var_labels
character for label.
- nested
boolean. Should this layout instruction be applied within the existing layout structure if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split underneath analyses, which is not allowed.
- show_labels
label visibility: one of "default", "visible" and "hidden".
- table_names
(character)
this can be customized in case the same vars are analyzed multiple times, to avoid warnings from rtables.
- .stats
(character)
statistics to select for the table.
- .formats
(named character or list)
formats for the statistics.
- .labels
(named character)
labels for the statistics (without indent).
- .indent_mods
(named integer)
indent modifiers for the labels.
Value
If x is of class numeric, returns a list with the following named numeric items:
- n
the length() of x.
- sum
the sum() of x.
- mean
the mean() of x.
- sd
the stats::sd() of x.
- se
the standard error of the mean of x, i.e.: (sd(x) / sqrt(length(x))).
- mean_sd
the mean() and stats::sd() of x.
- mean_se
the mean() of x and its standard error (see above).
- mean_ci
the CI for the mean of x (from stat_mean_ci()).
- mean_sei
the SE interval for the mean of x, i.e.: (mean() -/+ stats::sd() / sqrt(length(x))).
- mean_sdi
the SD interval for the mean of x, i.e.: (mean() -/+ stats::sd()).
- mean_pval
the two-sided p-value of the mean of x (from stat_mean_pval()).
- median
the stats::median() of x.
- mad
the median absolute deviation of x, i.e.: (stats::median() of xc, where xc = x - stats::median()).
- median_ci
the CI for the median of x (from stat_median_ci()).
- quantiles
two sample quantiles of x (from stats::quantile()).
- iqr
the stats::IQR() of x.
- range
the range_noinf() of x.
- min
the min() of x.
- max
the max() of x.
- cv
the coefficient of variation of x, i.e.: (stats::sd() / mean() * 100).
- geom_mean
the geometric mean of x, i.e.: (exp(mean(log(x)))).
- geom_cv
the geometric coefficient of variation of x, i.e.: (sqrt(exp(sd(log(x)) ^ 2) - 1) * 100).
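Several of the formulas listed above can be reproduced with base R alone. The following is a minimal sketch, not the package implementation; the variable names (including mad_like) are chosen here for illustration only. For x = c(1, 2), the results agree with the values printed for c(NA_real_, 1, 2) in the Examples section.

```r
# Base-R sketch of a few formulas from the list above (illustrative only).
x <- c(1, 2)

n <- length(x)
se <- stats::sd(x) / sqrt(n)                          # standard error of the mean
cv <- stats::sd(x) / mean(x) * 100                    # coefficient of variation (%)
geom_mean <- exp(mean(log(x)))                        # geometric mean
geom_cv <- sqrt(exp(stats::sd(log(x))^2) - 1) * 100   # geometric CV (%)
mean_sei <- mean(x) + c(-1, 1) * se                   # mean -/+ 1 x SE
mean_sdi <- mean(x) + c(-1, 1) * stats::sd(x)         # mean -/+ 1 x SD

# "mad" exactly as written in the list: the median of the centered values
# x - median(x), without taking absolute values.
mad_like <- stats::median(x - stats::median(x))

round(c(se = se, cv = cv, geom_mean = geom_mean, geom_cv = geom_cv), 4)
```

Note that the geometric statistics require strictly positive values; log() of zero or negative numbers yields -Inf or NaN.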
If x is of class factor or converted from character, returns a list with named numeric items:
- n
the length() of x.
- count
a list with the number of cases for each level of the factor x.
- count_fraction
similar to count but also includes the proportion of cases for each level of the factor x relative to the denominator, or NA if the denominator is zero.
If x is of class logical, returns a list with named numeric items:
- n
the length() of x (possibly after removing NAs).
- count
count of TRUE in x.
- count_fraction
count and proportion of TRUE in x relative to the denominator, or NA if the denominator is zero. Note that NAs in x are never counted and never lead to NA here.
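The effect of the denom argument on count_fraction can be sketched in base R. The helper count_fraction_sketch() below is hypothetical and written only for illustration; it is not part of the package, and it deliberately leaves out NA handling and the zero-denominator case.

```r
# Hypothetical helper (illustration only): count and proportion per level,
# with the denominator chosen the way the `denom` argument describes.
count_fraction_sketch <- function(x,
                                  denom = c("n", "N_row", "N_col"),
                                  .N_row,
                                  .N_col) {
  denom <- match.arg(denom)
  # Only the selected branch is evaluated, so unused counts may stay missing.
  d <- switch(denom, n = length(x), N_row = .N_row, N_col = .N_col)
  counts <- table(x)
  counts <- stats::setNames(as.integer(counts), names(counts))
  lapply(counts, function(k) c(k, k / d))
}

x <- factor(c("a", "a", "b", "c", "a"))
res_n <- count_fraction_sketch(x)                              # denominator 5
res_row <- count_fraction_sketch(x, "N_row", .N_row = 10L)     # denominator 10
res_col <- count_fraction_sketch(x, "N_col", .N_col = 20L)     # denominator 20
res_row$a
```

The counts stay the same across the three calls; only the proportions change with the chosen denominator, as in the "Different denominators" examples.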
Functions
- s_summary(): s_summary is an S3 generic function to produce an object description.
- s_summary(numeric): Method for numeric class. Note that, if x is an empty vector, NA is returned. This is intended, so that rcell content is still returned in rtables when the intersection of a column and a row delimits an empty data selection. Also, when the mean function is applied to an empty vector, NA will be returned instead of NaN, the latter being standard behavior in R.
- s_summary(factor): Method for factor class. Note that, if x is an empty factor, a list is still returned for counts, with one element per factor level. If there are no levels in x, the function fails. If x contains NA, it is expected that NA values have been converted to na_level beforehand with df_explicit_na() or explicit_na().
- s_summary(character): Method for character class. This makes an automatic conversion to factor (with a warning) and then forwards to the method for factors.
- s_summary(logical): Method for logical class.
- a_summary(): S3 generic Formatted Analysis function to produce an object description. It is used as afun in rtables::analyze().
- a_summary(numeric): Formatted Analysis function method for numeric.
- a_summary(factor): Formatted Analysis function method for factor.
- a_summary(character): Formatted Analysis function method for character.
- a_summary(logical): Formatted Analysis function method for logical.
- summarize_vars(): Analyze Function to add a descriptive analyze layer to rtables pipelines. The analysis is applied to a vector and returns the summary in rcells. The ellipsis (...) conveys arguments to s_summary(), for instance na.rm = FALSE if missing data should be accounted for. When factor variables contain NA, it is expected that NA values have been converted to na_level beforehand with df_explicit_na().
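The remark about empty vectors in the numeric method can be illustrated in base R. This is a minimal sketch: mean_or_na() is a hypothetical wrapper written for this illustration and is not a package function.

```r
# Base R returns NaN for the mean of an empty numeric vector.
empty_mean <- mean(numeric())

# Hypothetical wrapper (illustration only) that returns NA instead,
# mirroring the behavior described for the numeric method.
mean_or_na <- function(x) {
  if (length(x) == 0L) NA_real_ else mean(x)
}

mean_or_na(numeric())  # NA rather than NaN
mean_or_na(c(1, 2))    # regular mean for non-empty input
```

Returning NA consistently lets empty row and column intersections be displayed the same way as other missing results.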
Note
Automatic conversion of character to factor does not guarantee that the table
can be generated correctly. In particular, this is very likely to fail for sparse tables.
It is therefore better to always pre-process the dataset so that factors are created
manually from character variables before passing the dataset to rtables::build_table().
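A small base-R sketch of why the manual conversion matters is given here. The data frame and its column names are made up for illustration. With a character column, a sparse subset silently loses categories that are not observed in it, whereas a factor with explicit levels keeps them with a zero count.

```r
# Made-up data for illustration.
dta <- data.frame(
  ARM = c("A", "A", "B"),
  SEX = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Character column: the subset for ARM "B" only knows about "F".
chr_counts <- table(dta$SEX[dta$ARM == "B"])

# Convert once, with explicit levels, before building any table.
dta$SEX <- factor(dta$SEX, levels = c("F", "M"))

# Factor column: the same subset still reports both levels.
fct_counts <- table(dta$SEX[dta$ARM == "B"])

names(chr_counts)
names(fct_counts)
```

Setting the levels explicitly also fixes their order, so it does not depend on which values happen to appear in the data.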
Formatting arguments
These additional formatting arguments can be passed to the layout creating function:
- .stats
(character)
names of the statistics to use
- .indent_mods
(integer)
named vector of indent modifiers for the labels
- .formats
(character or list)
named vector of formats for the statistics
- .labels
(character)
named vector of labels for the statistics (without indent)
Examples
# `s_summary.numeric`
## Basic usage: empty numeric returns NA-filled items.
s_summary(numeric())
#> $n
#> n
#> 0
#>
#> $sum
#> sum
#> NA
#>
#> $mean
#> mean
#> NA
#>
#> $sd
#> sd
#> NA
#>
#> $se
#> se
#> NA
#>
#> $mean_sd
#> mean sd
#> NA NA
#>
#> $mean_se
#> mean se
#> NA NA
#>
#> $mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $median
#> median
#> NA
#>
#> $mad
#> mad
#> NA
#>
#> $median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $quantiles
#> quantile_0.25 quantile_0.75
#> NA NA
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $iqr
#> iqr
#> NA
#>
#> $range
#> min max
#> NA NA
#>
#> $min
#> min
#> NA
#>
#> $max
#> max
#> NA
#>
#> $cv
#> cv
#> NA
#>
#> $geom_mean
#> geom_mean
#> NaN
#>
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $geom_cv
#> geom_cv
#> NA
#>
## Management of NA values.
x <- c(NA_real_, 1)
s_summary(x, na.rm = TRUE)
#> $n
#> n
#> 1
#>
#> $sum
#> sum
#> 1
#>
#> $mean
#> mean
#> 1
#>
#> $sd
#> sd
#> NA
#>
#> $se
#> se
#> NA
#>
#> $mean_sd
#> mean sd
#> 1 NA
#>
#> $mean_se
#> mean se
#> 1 NA
#>
#> $mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $median
#> median
#> 1
#>
#> $mad
#> mad
#> 0
#>
#> $median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $quantiles
#> quantile_0.25 quantile_0.75
#> 1 1
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $iqr
#> iqr
#> 0
#>
#> $range
#> min max
#> 1 1
#>
#> $min
#> min
#> 1
#>
#> $max
#> max
#> 1
#>
#> $cv
#> cv
#> NA
#>
#> $geom_mean
#> geom_mean
#> 1
#>
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $geom_cv
#> geom_cv
#> NA
#>
s_summary(x, na.rm = FALSE)
#> $n
#> n
#> 2
#>
#> $sum
#> sum
#> NA
#>
#> $mean
#> mean
#> NA
#>
#> $sd
#> sd
#> NA
#>
#> $se
#> se
#> NA
#>
#> $mean_sd
#> mean sd
#> NA NA
#>
#> $mean_se
#> mean se
#> NA NA
#>
#> $mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $median
#> median
#> NA
#>
#> $mad
#> mad
#> NA
#>
#> $median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $quantiles
#> quantile_0.25 quantile_0.75
#> NA NA
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $iqr
#> iqr
#> NA
#>
#> $range
#> min max
#> NA NA
#>
#> $min
#> min
#> NA
#>
#> $max
#> max
#> NA
#>
#> $cv
#> cv
#> NA
#>
#> $geom_mean
#> geom_mean
#> NA
#>
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $geom_cv
#> geom_cv
#> NA
#>
x <- c(NA_real_, 1, 2)
s_summary(x, stats = NULL)
#> $n
#> n
#> 2
#>
#> $sum
#> sum
#> 3
#>
#> $mean
#> mean
#> 1.5
#>
#> $sd
#> sd
#> 0.7071068
#>
#> $se
#> se
#> 0.5
#>
#> $mean_sd
#> mean sd
#> 1.5000000 0.7071068
#>
#> $mean_se
#> mean se
#> 1.5 0.5
#>
#> $mean_ci
#> mean_ci_lwr mean_ci_upr
#> -4.853102 7.853102
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $mean_sei
#> mean_sei_lwr mean_sei_upr
#> 1 2
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> 0.7928932 2.2071068
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $mean_pval
#> p_value
#> 0.2048328
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $median
#> median
#> 1.5
#>
#> $mad
#> mad
#> 0
#>
#> $median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $quantiles
#> quantile_0.25 quantile_0.75
#> 1 2
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $iqr
#> iqr
#> 1
#>
#> $range
#> min max
#> 1 2
#>
#> $min
#> min
#> 1
#>
#> $max
#> max
#> 2
#>
#> $cv
#> cv
#> 47.14045
#>
#> $geom_mean
#> geom_mean
#> 1.414214
#>
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> 0.01729978 115.60839614
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $geom_cv
#> geom_cv
#> 52.10922
#>
## Benefits in `rtables` constructions:
require(rtables)
dta_test <- data.frame(
Group = rep(LETTERS[1:3], each = 2),
sub_group = rep(letters[1:2], each = 3),
x = 1:6
)
## The summary obtained with `rtables`:
basic_table() %>%
split_cols_by(var = "Group") %>%
split_rows_by(var = "sub_group") %>%
analyze(vars = "x", afun = s_summary) %>%
build_table(df = dta_test)
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> Warning: number of items to replace is not a multiple of replacement length
#> A B C
#> ———————————————————————————————————————————————————————————————————————————————————————————————————————————————————
#> a
#> n 2 1 0
#> sum 3 3 NA
#> mean 1.5 3 NA
#> sd 0.707106781186548 NA NA
#> se 0.5 NA NA
#> mean_sd 1.5, 0.707106781186548 3, NA NA
#> mean_se 1.5, 0.5 3, NA NA
#> Mean 95% CI -4.85310236808735, 7.85310236808735 NA NA
#> Mean -/+ 1xSE 1, 2 NA NA
#> Mean -/+ 1xSD 0.792893218813452, 2.20710678118655 NA NA
#> Mean p-value (H0: mean = 0) 0.204832764699133 NA NA
#> median 1.5 3 NA
#> mad 0 0 NA
#> Median 95% CI NA NA NA
#> 25% and 75%-ile 1, 2 3, 3 NA
#> iqr 1 0 NA
#> range 1, 2 3, 3 NA
#> min 1 3 NA
#> max 2 3 NA
#> cv 47.1404520791032 NA NA
#> geom_mean 1.41421356237309 3 NA
#> Geometric Mean 95% CI 0.0172997815631007, 115.608396135236 NA NA
#> geom_cv 52.1092246837487 NA NA
#> b
#> n 0 1 2
#> sum NA 4 11
#> mean NA 4 5.5
#> sd NA NA 0.707106781186548
#> se NA NA 0.5
#> mean_sd NA 4, NA 5.5, 0.707106781186548
#> mean_se NA 4, NA 5.5, 0.5
#> Mean 95% CI NA NA -0.853102368087347, 11.8531023680873
#> Mean -/+ 1xSE NA NA 5, 6
#> Mean -/+ 1xSD NA NA 4.79289321881345, 6.20710678118655
#> Mean p-value (H0: mean = 0) NA NA 0.0577158767526089
#> median NA 4 5.5
#> mad NA 0 0
#> Median 95% CI NA NA NA
#> 25% and 75%-ile NA 4, 4 5, 6
#> iqr NA 0 1
#> range NA 4, 4 5, 6
#> min NA 4 5
#> max NA 4 6
#> cv NA NA 12.8564869306645
#> geom_mean NA 4 5.47722557505166
#> Geometric Mean 95% CI NA NA 1.71994304449266, 17.4424380482025
#> geom_cv NA NA 12.945835316564
## By comparison with `lapply`:
X <- split(dta_test, f = with(dta_test, interaction(Group, sub_group)))
lapply(X, function(x) s_summary(x$x))
#> $A.a
#> $A.a$n
#> n
#> 2
#>
#> $A.a$sum
#> sum
#> 3
#>
#> $A.a$mean
#> mean
#> 1.5
#>
#> $A.a$sd
#> sd
#> 0.7071068
#>
#> $A.a$se
#> se
#> 0.5
#>
#> $A.a$mean_sd
#> mean sd
#> 1.5000000 0.7071068
#>
#> $A.a$mean_se
#> mean se
#> 1.5 0.5
#>
#> $A.a$mean_ci
#> mean_ci_lwr mean_ci_upr
#> -4.853102 7.853102
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $A.a$mean_sei
#> mean_sei_lwr mean_sei_upr
#> 1 2
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $A.a$mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> 0.7928932 2.2071068
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $A.a$mean_pval
#> p_value
#> 0.2048328
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $A.a$median
#> median
#> 1.5
#>
#> $A.a$mad
#> mad
#> 0
#>
#> $A.a$median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $A.a$quantiles
#> quantile_0.25 quantile_0.75
#> 1 2
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $A.a$iqr
#> iqr
#> 1
#>
#> $A.a$range
#> min max
#> 1 2
#>
#> $A.a$min
#> min
#> 1
#>
#> $A.a$max
#> max
#> 2
#>
#> $A.a$cv
#> cv
#> 47.14045
#>
#> $A.a$geom_mean
#> geom_mean
#> 1.414214
#>
#> $A.a$geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> 0.01729978 115.60839614
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $A.a$geom_cv
#> geom_cv
#> 52.10922
#>
#>
#> $B.a
#> $B.a$n
#> n
#> 1
#>
#> $B.a$sum
#> sum
#> 3
#>
#> $B.a$mean
#> mean
#> 3
#>
#> $B.a$sd
#> sd
#> NA
#>
#> $B.a$se
#> se
#> NA
#>
#> $B.a$mean_sd
#> mean sd
#> 3 NA
#>
#> $B.a$mean_se
#> mean se
#> 3 NA
#>
#> $B.a$mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $B.a$mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $B.a$mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $B.a$mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $B.a$median
#> median
#> 3
#>
#> $B.a$mad
#> mad
#> 0
#>
#> $B.a$median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $B.a$quantiles
#> quantile_0.25 quantile_0.75
#> 3 3
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $B.a$iqr
#> iqr
#> 0
#>
#> $B.a$range
#> min max
#> 3 3
#>
#> $B.a$min
#> min
#> 3
#>
#> $B.a$max
#> max
#> 3
#>
#> $B.a$cv
#> cv
#> NA
#>
#> $B.a$geom_mean
#> geom_mean
#> 3
#>
#> $B.a$geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $B.a$geom_cv
#> geom_cv
#> NA
#>
#>
#> $C.a
#> $C.a$n
#> n
#> 0
#>
#> $C.a$sum
#> sum
#> NA
#>
#> $C.a$mean
#> mean
#> NA
#>
#> $C.a$sd
#> sd
#> NA
#>
#> $C.a$se
#> se
#> NA
#>
#> $C.a$mean_sd
#> mean sd
#> NA NA
#>
#> $C.a$mean_se
#> mean se
#> NA NA
#>
#> $C.a$mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $C.a$mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $C.a$mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $C.a$mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $C.a$median
#> median
#> NA
#>
#> $C.a$mad
#> mad
#> NA
#>
#> $C.a$median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $C.a$quantiles
#> quantile_0.25 quantile_0.75
#> NA NA
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $C.a$iqr
#> iqr
#> NA
#>
#> $C.a$range
#> min max
#> NA NA
#>
#> $C.a$min
#> min
#> NA
#>
#> $C.a$max
#> max
#> NA
#>
#> $C.a$cv
#> cv
#> NA
#>
#> $C.a$geom_mean
#> geom_mean
#> NaN
#>
#> $C.a$geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $C.a$geom_cv
#> geom_cv
#> NA
#>
#>
#> $A.b
#> $A.b$n
#> n
#> 0
#>
#> $A.b$sum
#> sum
#> NA
#>
#> $A.b$mean
#> mean
#> NA
#>
#> $A.b$sd
#> sd
#> NA
#>
#> $A.b$se
#> se
#> NA
#>
#> $A.b$mean_sd
#> mean sd
#> NA NA
#>
#> $A.b$mean_se
#> mean se
#> NA NA
#>
#> $A.b$mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $A.b$mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $A.b$mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $A.b$mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $A.b$median
#> median
#> NA
#>
#> $A.b$mad
#> mad
#> NA
#>
#> $A.b$median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $A.b$quantiles
#> quantile_0.25 quantile_0.75
#> NA NA
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $A.b$iqr
#> iqr
#> NA
#>
#> $A.b$range
#> min max
#> NA NA
#>
#> $A.b$min
#> min
#> NA
#>
#> $A.b$max
#> max
#> NA
#>
#> $A.b$cv
#> cv
#> NA
#>
#> $A.b$geom_mean
#> geom_mean
#> NaN
#>
#> $A.b$geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $A.b$geom_cv
#> geom_cv
#> NA
#>
#>
#> $B.b
#> $B.b$n
#> n
#> 1
#>
#> $B.b$sum
#> sum
#> 4
#>
#> $B.b$mean
#> mean
#> 4
#>
#> $B.b$sd
#> sd
#> NA
#>
#> $B.b$se
#> se
#> NA
#>
#> $B.b$mean_sd
#> mean sd
#> 4 NA
#>
#> $B.b$mean_se
#> mean se
#> 4 NA
#>
#> $B.b$mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $B.b$mean_sei
#> mean_sei_lwr mean_sei_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $B.b$mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> NA NA
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $B.b$mean_pval
#> p_value
#> NA
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $B.b$median
#> median
#> 4
#>
#> $B.b$mad
#> mad
#> 0
#>
#> $B.b$median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $B.b$quantiles
#> quantile_0.25 quantile_0.75
#> 4 4
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $B.b$iqr
#> iqr
#> 0
#>
#> $B.b$range
#> min max
#> 4 4
#>
#> $B.b$min
#> min
#> 4
#>
#> $B.b$max
#> max
#> 4
#>
#> $B.b$cv
#> cv
#> NA
#>
#> $B.b$geom_mean
#> geom_mean
#> 4
#>
#> $B.b$geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> NA NA
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $B.b$geom_cv
#> geom_cv
#> NA
#>
#>
#> $C.b
#> $C.b$n
#> n
#> 2
#>
#> $C.b$sum
#> sum
#> 11
#>
#> $C.b$mean
#> mean
#> 5.5
#>
#> $C.b$sd
#> sd
#> 0.7071068
#>
#> $C.b$se
#> se
#> 0.5
#>
#> $C.b$mean_sd
#> mean sd
#> 5.5000000 0.7071068
#>
#> $C.b$mean_se
#> mean se
#> 5.5 0.5
#>
#> $C.b$mean_ci
#> mean_ci_lwr mean_ci_upr
#> -0.8531024 11.8531024
#> attr(,"label")
#> [1] "Mean 95% CI"
#>
#> $C.b$mean_sei
#> mean_sei_lwr mean_sei_upr
#> 5 6
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#>
#> $C.b$mean_sdi
#> mean_sdi_lwr mean_sdi_upr
#> 4.792893 6.207107
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#>
#> $C.b$mean_pval
#> p_value
#> 0.05771588
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#>
#> $C.b$median
#> median
#> 5.5
#>
#> $C.b$mad
#> mad
#> 0
#>
#> $C.b$median_ci
#> median_ci_lwr median_ci_upr
#> NA NA
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#>
#> $C.b$quantiles
#> quantile_0.25 quantile_0.75
#> 5 6
#> attr(,"label")
#> [1] "25% and 75%-ile"
#>
#> $C.b$iqr
#> iqr
#> 1
#>
#> $C.b$range
#> min max
#> 5 6
#>
#> $C.b$min
#> min
#> 5
#>
#> $C.b$max
#> max
#> 6
#>
#> $C.b$cv
#> cv
#> 12.85649
#>
#> $C.b$geom_mean
#> geom_mean
#> 5.477226
#>
#> $C.b$geom_mean_ci
#> mean_ci_lwr mean_ci_upr
#> 1.719943 17.442438
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#>
#> $C.b$geom_cv
#> geom_cv
#> 12.94584
#>
#>
# `s_summary.factor`
## Basic usage:
s_summary(factor(c("a", "a", "b", "c", "a")))
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#>
#> $count_fraction$b
#> [1] 1.0 0.2
#>
#> $count_fraction$c
#> [1] 1.0 0.2
#>
#>
#> $n_blq
#> [1] 0
#>
# Empty factor returns zero counts for each level.
s_summary(factor(levels = c("a", "b", "c")))
#> $n
#> [1] 0
#>
#> $count
#> $count$a
#> [1] 0
#>
#> $count$b
#> [1] 0
#>
#> $count$c
#> [1] 0
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 0 0
#>
#> $count_fraction$b
#> [1] 0 0
#>
#> $count_fraction$c
#> [1] 0 0
#>
#>
#> $n_blq
#> [1] 0
#>
## Management of NA values.
x <- factor(c(NA, "Female"))
x <- explicit_na(x)
s_summary(x, na.rm = TRUE)
#> $n
#> [1] 1
#>
#> $count
#> $count$Female
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$Female
#> [1] 1 1
#>
#>
#> $n_blq
#> [1] 0
#>
s_summary(x, na.rm = FALSE)
#> $n
#> [1] 2
#>
#> $count
#> $count$Female
#> [1] 1
#>
#> $count$`<Missing>`
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$Female
#> [1] 1.0 0.5
#>
#> $count_fraction$`<Missing>`
#> [1] 1.0 0.5
#>
#>
#> $n_blq
#> [1] 0
#>
## Different denominators.
x <- factor(c("a", "a", "b", "c", "a"))
s_summary(x, denom = "N_row", .N_row = 10L)
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.3
#>
#> $count_fraction$b
#> [1] 1.0 0.1
#>
#> $count_fraction$c
#> [1] 1.0 0.1
#>
#>
#> $n_blq
#> [1] 0
#>
s_summary(x, denom = "N_col", .N_col = 20L)
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.00 0.15
#>
#> $count_fraction$b
#> [1] 1.00 0.05
#>
#> $count_fraction$c
#> [1] 1.00 0.05
#>
#>
#> $n_blq
#> [1] 0
#>
# `s_summary.character`
## Basic usage:
s_summary(c("a", "a", "b", "c", "a"), .var = "x", verbose = FALSE)
#> $n
#> [1] 5
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#>
#> $count_fraction$b
#> [1] 1.0 0.2
#>
#> $count_fraction$c
#> [1] 1.0 0.2
#>
#>
#> $n_blq
#> [1] 0
#>
s_summary(c("a", "a", "b", "c", "a", ""), .var = "x", na.rm = FALSE, verbose = FALSE)
#> $n
#> [1] 6
#>
#> $count
#> $count$a
#> [1] 3
#>
#> $count$b
#> [1] 1
#>
#> $count$c
#> [1] 1
#>
#> $count$`<Missing>`
#> [1] 1
#>
#>
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.5
#>
#> $count_fraction$b
#> [1] 1.0000000 0.1666667
#>
#> $count_fraction$c
#> [1] 1.0000000 0.1666667
#>
#> $count_fraction$`<Missing>`
#> [1] 1.0000000 0.1666667
#>
#>
#> $n_blq
#> [1] 0
#>
# `s_summary.logical`
## Basic usage:
s_summary(c(TRUE, FALSE, TRUE, TRUE))
#> $n
#> [1] 4
#>
#> $count
#> [1] 3
#>
#> $count_fraction
#> [1] 3.00 0.75
#>
#> $n_blq
#> [1] 0
#>
## Management of NA values.
x <- c(NA, TRUE, FALSE)
s_summary(x, na.rm = TRUE)
#> $n
#> [1] 2
#>
#> $count
#> [1] 1
#>
#> $count_fraction
#> [1] 1.0 0.5
#>
#> $n_blq
#> [1] 0
#>
s_summary(x, na.rm = FALSE)
#> $n
#> [1] 3
#>
#> $count
#> [1] 1
#>
#> $count_fraction
#> [1] 1.0000000 0.3333333
#>
#> $n_blq
#> [1] 0
#>
## Different denominators.
x <- c(TRUE, FALSE, TRUE, TRUE)
s_summary(x, denom = "N_row", .N_row = 10L)
#> $n
#> [1] 4
#>
#> $count
#> [1] 3
#>
#> $count_fraction
#> [1] 3.0 0.3
#>
#> $n_blq
#> [1] 0
#>
s_summary(x, denom = "N_col", .N_col = 20L)
#> $n
#> [1] 4
#>
#> $count
#> [1] 3
#>
#> $count_fraction
#> [1] 3.00 0.15
#>
#> $n_blq
#> [1] 0
#>
# `a_summary.numeric`
a_summary(rnorm(10), .N_col = 10, .N_row = 20, .var = "bla")
#> RowsVerticalSection (in_rows) object print method:
#> ----------------------------
#> row_name formatted_cell indent_mod row_label
#> 1 n 10 0 n
#> 2 sum 1.1 0 Sum
#> 3 mean 0.1 0 Mean
#> 4 sd 1.0 0 SD
#> 5 se 0.3 0 SE
#> 6 mean_sd 0.1 (1.0) 0 Mean (SD)
#> 7 mean_se 0.1 (0.3) 0 Mean (SE)
#> 8 mean_ci (-0.63, 0.86) 0 Mean 95% CI
#> 9 mean_sei (-0.22, 0.44) 0 Mean -/+ 1xSE
#> 10 mean_sdi (-0.93, 1.16) 0 Mean -/+ 1xSD
#> 11 mean_pval 0.74 0 Mean p-value (H0: mean = 0)
#> 12 median 0.2 0 Median
#> 13 mad 0.0 0 Median Absolute Deviation
#> 14 median_ci (-0.62, 1.12) 0 Median 95% CI
#> 15 quantiles -0.3 - 0.7 0 25% and 75%-ile
#> 16 iqr 1.0 0 IQR
#> 17 range -2.2 - 1.5 0 Min - Max
#> 18 min -2.2 0 Minimum
#> 19 max 1.5 0 Maximum
#> 20 cv 918.5 0 CV (%)
#> 21 geom_mean NA 0 Geometric Mean
#> 22 geom_mean_ci NA 0 Geometric Mean 95% CI
#> 23 geom_cv NA 0 CV % Geometric Mean
# `a_summary.factor`
# We need to ungroup `count` and `count_fraction` first so that the rtables formatting
# functions can be applied correctly.
afun <- make_afun(
getS3method("a_summary", "factor"),
.ungroup_stats = c("count", "count_fraction")
)
afun(factor(c("a", "a", "b", "c", "a")), .N_row = 10, .N_col = 10)
#> RowsVerticalSection (in_rows) object print method:
#> ----------------------------
#> row_name formatted_cell indent_mod row_label
#> 1 n 5 0 n
#> 2 a 3 0 a
#> 3 b 1 0 b
#> 4 c 1 0 c
#> 5 a 3 (60%) 0 a
#> 6 b 1 (20%) 0 b
#> 7 c 1 (20%) 0 c
#> 8 n_blq 0 0 n_blq
# `a_summary.character`
afun <- make_afun(
getS3method("a_summary", "character"),
.ungroup_stats = c("count", "count_fraction")
)
afun(c("A", "B", "A", "C"), .var = "x", .N_col = 10, .N_row = 10, verbose = FALSE)
#> RowsVerticalSection (in_rows) object print method:
#> ----------------------------
#> row_name formatted_cell indent_mod row_label
#> 1 n 4 0 n
#> 2 A 2 0 A
#> 3 B 1 0 B
#> 4 C 1 0 C
#> 5 A 2 (50%) 0 A
#> 6 B 1 (25%) 0 B
#> 7 C 1 (25%) 0 C
#> 8 n_blq 0 0 n_blq
# `a_summary.logical`
afun <- make_afun(
getS3method("a_summary", "logical")
)
afun(c(TRUE, FALSE, FALSE, TRUE, TRUE), .N_row = 10, .N_col = 10)
#> RowsVerticalSection (in_rows) object print method:
#> ----------------------------
#> row_name formatted_cell indent_mod row_label
#> 1 n 5 0 n
#> 2 count 3 0 count
#> 3 count_fraction 3 (60%) 0 count_fraction
#> 4 n_blq 0 0 n_blq
## Fabricated dataset.
dta_test <- data.frame(
USUBJID = rep(1:6, each = 3),
PARAMCD = rep("lab", 6 * 3),
AVISIT = rep(paste0("V", 1:3), 6),
ARM = rep(LETTERS[1:3], rep(6, 3)),
AVAL = c(9:1, rep(NA, 9))
)
# `summarize_vars()` in `rtables` pipelines
## Default output within a `rtables` pipeline.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
summarize_vars(vars = "AVAL")
build_table(l, df = dta_test)
#> A B C
#> ————————————————————————————————————————
#> V1
#> n 2 1 0
#> Mean (SD) 7.5 (2.1) 3.0 (NA) NA
#> Median 7.5 3.0 NA
#> Min - Max 6.0 - 9.0 3.0 - 3.0 NA
#> V2
#> n 2 1 0
#> Mean (SD) 6.5 (2.1) 2.0 (NA) NA
#> Median 6.5 2.0 NA
#> Min - Max 5.0 - 8.0 2.0 - 2.0 NA
#> V3
#> n 2 1 0
#> Mean (SD) 5.5 (2.1) 1.0 (NA) NA
#> Median 5.5 1.0 NA
#> Min - Max 4.0 - 7.0 1.0 - 1.0 NA
## Select and format statistics output.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
summarize_vars(
vars = "AVAL",
.stats = c("n", "mean_sd", "quantiles"),
.formats = c("mean_sd" = "xx.x, xx.x"),
.labels = c(n = "n", mean_sd = "Mean, SD", quantiles = c("Q1 - Q3"))
)
results <- build_table(l, df = dta_test)
as_html(results)
#> <div class="rtables-all-parts-block rtables-container">
#> <table class="table table-condensed table-hover">
#> <tr>
#> <th style="white-space:pre;"></th>
#> <th class="text-center">A</th>
#> <th class="text-center">B</th>
#> <th class="text-center">C</th>
#> </tr>
#> <tr>
#> <td class="text-left">V1</td>
#> <td class="text-center"></td>
#> <td class="text-center"></td>
#> <td class="text-center"></td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">n</td>
#> <td class="text-center">2</td>
#> <td class="text-center">1</td>
#> <td class="text-center">0</td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">Mean, SD</td>
#> <td class="text-center">7.5, 2.1</td>
#> <td class="text-center">3.0, NA</td>
#> <td class="text-center">NA</td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">Q1 - Q3</td>
#> <td class="text-center">6.0 - 9.0</td>
#> <td class="text-center">3.0 - 3.0</td>
#> <td class="text-center">NA</td>
#> </tr>
#> <tr>
#> <td class="text-left">V2</td>
#> <td class="text-center"></td>
#> <td class="text-center"></td>
#> <td class="text-center"></td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">n</td>
#> <td class="text-center">2</td>
#> <td class="text-center">1</td>
#> <td class="text-center">0</td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">Mean, SD</td>
#> <td class="text-center">6.5, 2.1</td>
#> <td class="text-center">2.0, NA</td>
#> <td class="text-center">NA</td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">Q1 - Q3</td>
#> <td class="text-center">5.0 - 8.0</td>
#> <td class="text-center">2.0 - 2.0</td>
#> <td class="text-center">NA</td>
#> </tr>
#> <tr>
#> <td class="text-left">V3</td>
#> <td class="text-center"></td>
#> <td class="text-center"></td>
#> <td class="text-center"></td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">n</td>
#> <td class="text-center">2</td>
#> <td class="text-center">1</td>
#> <td class="text-center">0</td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">Mean, SD</td>
#> <td class="text-center">5.5, 2.1</td>
#> <td class="text-center">1.0, NA</td>
#> <td class="text-center">NA</td>
#> </tr>
#> <tr>
#> <td class="text-left" style="padding-left: 3ch">Q1 - Q3</td>
#> <td class="text-center">4.0 - 7.0</td>
#> <td class="text-center">1.0 - 1.0</td>
#> <td class="text-center">NA</td>
#> </tr>
#> <caption style="caption-side:top;"><div class="rtables-titles-block rtables-container">
#> <div class="rtables-main-titles-block rtables-container">
#> <p class="rtables-main-title"></p>
#> </div>
#> <div class="rtables-subtitles-block rtables-container"></div>
#> </div>
#> </caption>
#> </table>
#> <div class="rtables-footers-block rtables-container"></div>
#> </div>
## Use arguments interpreted by `s_summary`.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
summarize_vars(vars = "AVAL", na.rm = FALSE)
results <- build_table(l, df = dta_test)
## Handle `NA` levels first when summarizing factors.
dta_test$AVISIT <- NA_character_
dta_test <- df_explicit_na(dta_test)
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
summarize_vars(vars = "AVISIT", na.rm = FALSE)
results <- build_table(l, df = dta_test)
if (FALSE) {
Viewer(results)
}