Defaults for statistical method names and their associated formats & labels
Source: R/utils_default_stats_formats_labels.R
default_stats_formats_labels.Rd
Utility functions to get valid statistic methods for different method groups (.stats) and their associated formats (.formats) and labels (.labels). This utility is used across tern, but some of its working principles can be seen in analyze_vars(). See notes to understand why this is experimental.
Usage
get_stats(
  method_groups = "analyze_vars_numeric",
  stats_in = NULL,
  add_pval = FALSE
)

get_formats_from_stats(stats, formats_in = NULL)

get_labels_from_stats(stats, labels_in = NULL)

tern_default_formats

tern_default_labels

summary_formats(type = "numeric", include_pval = FALSE)

summary_labels(type = "numeric", include_pval = FALSE)

summary_custom(
  type = "numeric",
  include_pval = FALSE,
  stats_custom = NULL,
  formats_custom = NULL,
  labels_custom = NULL,
  indent_mods_custom = NULL
)
Format
tern_default_formats is a list of available formats, named after their relevant statistic.
tern_default_labels is a character vector of available labels, named after their relevant statistic.
Arguments
- method_groups
  (character) indicates the group of statistical methods that we need the defaults from. A character vector can be used to collect more than one group of statistical methods.
- stats_in
  (character) desired stats to be picked out from the selected method group.
- add_pval
  (flag) should "pval" or "pval_counts" (if method_groups contains "analyze_vars_counts") be added to the statistical methods?
- stats
  (character) statistical methods to get default formats or labels for.
- formats_in
  (named vector) inserted formats to replace defaults. It can be a character vector from formatters::list_valid_format_labels() or a custom format function.
- labels_in
  (named vector) inserted labels to replace defaults.
- type
  (string) is it going to be "numeric" or "counts"?
- include_pval
  (flag) deprecated parameter. Same as add_pval.
- stats_custom
  (named vector of character) vector of statistics to include if not the defaults. This argument overrides include_pval and other custom value arguments such that only settings for these statistics will be returned.
- formats_custom
  (named vector of character) vector of custom statistics formats to use in place of the defaults defined in summary_formats(). Names should be a subset of the statistics defined in stats_custom (or default statistics if this is NULL).
- labels_custom
  (named vector of character) vector of custom statistics labels to use in place of the defaults defined in summary_labels(). Names should be a subset of the statistics defined in stats_custom (or default statistics if this is NULL).
- indent_mods_custom
  (integer or named vector of integer) vector of custom indentation modifiers for statistics to use instead of the default of 0L for all statistics. Names should be a subset of the statistics defined in stats_custom (or default statistics if this is NULL). Alternatively, the same indentation modifier can be applied to all statistics by setting indent_mods_custom to a single integer value.
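The way formats_in and labels_in replace defaults by name can be illustrated with a small base R sketch. The helper override_defaults() and the default_formats list below are hypothetical and exist only for illustration; this is not tern's implementation, just one plausible reading of "inserted values replace defaults".

```r
# Hypothetical helper, not part of tern: overlay user-supplied named
# entries on top of a list of defaults, for the requested statistics only.
override_defaults <- function(stats, defaults, custom = NULL) {
  out <- defaults[stats]   # keep only the requested statistics, in order
  names(out) <- stats      # statistics without a default get a NULL entry
  for (nm in intersect(names(custom), stats)) {
    out[[nm]] <- custom[[nm]]  # a custom entry wins over the default
  }
  out
}

# Illustrative defaults (an assumption, not the content of tern_default_formats)
default_formats <- list(n = "xx.", mean = "xx.x", sd = "xx.x")

res <- override_defaults(
  stats = c("n", "sd"),
  defaults = default_formats,
  custom = list(sd = "xx.xx")
)
print(res)
```

Entries in custom whose names are not among the requested statistics are ignored in this sketch, which mirrors the requirement that names be a subset of the selected statistics.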
Value
- get_stats() returns a character vector with all default statistical methods.
- get_formats_from_stats() returns a named list of formats, each being either a value from formatters::list_valid_format_labels() or a custom function (e.g. formatting_functions).
- get_labels_from_stats() returns a named character vector of default labels (if present, otherwise NULL).
- summary_formats() returns a named vector of default statistic formats for the given data type.
- summary_labels() returns a named vector of default statistic labels for the given data type.
- summary_custom() returns a list of 4 named elements: stats, formats, labels, and indent_mods.
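The indent_mods element follows the rule described for indent_mods_custom: a default of 0L per statistic, a named vector to change selected statistics, or a single integer applied to all. A minimal base R sketch of that rule is given here; expand_indent_mods() is a hypothetical name and is not a tern function.

```r
# Hypothetical helper, not part of tern: expand an indent_mods_custom-style
# argument into one integer per statistic.
expand_indent_mods <- function(stats, indent_mods_custom = NULL) {
  out <- stats::setNames(rep(0L, length(stats)), stats)  # default: 0L everywhere
  if (is.null(indent_mods_custom)) {
    return(out)
  }
  if (is.null(names(indent_mods_custom)) && length(indent_mods_custom) == 1L) {
    out[] <- as.integer(indent_mods_custom)  # single value applies to all
  } else {
    keep <- intersect(names(indent_mods_custom), stats)
    out[keep] <- as.integer(indent_mods_custom[keep])  # named values by statistic
  }
  out
}

all_three <- expand_indent_mods(c("n", "mean", "sd"), 3L)
only_sd <- expand_indent_mods(c("n", "mean", "sd"), c(sd = 1L))
print(all_three)
print(only_sd)
```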
Details
Current choices for type are counts and numeric for analyze_vars(); they affect get_stats().
Functions
- get_stats(): Get default statistical methods for different groups of methods.
- get_formats_from_stats(): Get formats from vector of statistical methods. If a format is not present, NULL is returned.
- get_labels_from_stats(): Get labels from vector of statistical methods.
- tern_default_formats: Named list of default formats for tern.
- tern_default_labels: character vector that contains default labels for tern.
- summary_formats(): Quick function to retrieve default formats for summary statistics: analyze_vars() and analyze_vars_in_cols() principally.
- summary_labels(): Quick function to retrieve default labels for summary statistics. Returns labels of descriptive statistics which are understood by rtables. Similar to summary_formats().
- summary_custom(): Function to configure settings for default or custom summary statistics for a given data type. In addition to selecting a custom subset of statistics, the user can also set custom formats, labels, and indent modifiers for any of these statistics.
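Since the deprecation warning shown in the Examples points users of summary_custom() to get_stats(), get_formats_from_stats(), and get_labels_from_stats(), the following sketch shows one way the same four elements might be assembled by hand. The helper method_group_for() is hypothetical, and the mapping from type to an "analyze_vars_" method group is an assumption based on the group names used on this page. The tern calls run only if the package is installed.

```r
# Hypothetical helper, not part of tern: map a data type to the
# analyze_vars() method group name used in the examples on this page.
method_group_for <- function(type = c("numeric", "counts")) {
  type <- match.arg(type)
  paste0("analyze_vars_", type)
}

group <- method_group_for("counts")

# Only call into tern when it is available.
if (requireNamespace("tern", quietly = TRUE)) {
  stats <- tern::get_stats(group, add_pval = TRUE)
  settings <- list(
    stats = stats,
    formats = tern::get_formats_from_stats(stats),
    labels = tern::get_labels_from_stats(stats),
    indent_mods = stats::setNames(rep(0L, length(stats)), stats)
  )
  str(settings, max.level = 1)
}
```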
Note
These defaults are experimental because we use the names of functions to retrieve the default statistics. This should be generalized in groups of methods according to more reasonable groupings.
Formats in tern and rtables can be functions that take in the table cell value and return a string. This is well documented in vignette("custom_appearance", package = "rtables").
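As a minimal sketch of such a format function, the example below takes a cell value holding a count and a proportion and returns a string. The name format_count_pct() is hypothetical and the function is a simplified illustration in the spirit of the count_fraction formats printed in the Examples, not a copy of them.

```r
# Hypothetical format function: takes a cell value c(count, proportion)
# and returns a single string such as "12 (25.0%)".
format_count_pct <- function(x, ...) {
  if (any(is.na(x))) {
    return("NA")
  }
  if (x[1] == 0) {
    return("0")
  }
  sprintf("%d (%.1f%%)", as.integer(x[1]), x[2] * 100)
}

print(format_count_pct(c(12, 0.25)))
print(format_count_pct(c(0, 0)))
```

A function of this shape could then be supplied by name through formats_in, for example for the "count_fraction" statistic.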
Examples
# analyze_vars is numeric
num_stats <- get_stats("analyze_vars_numeric") # also the default
# Other type
cnt_stats <- get_stats("analyze_vars_counts")
# Weirdly taking the pval from count_occurrences
only_pval <- get_stats("count_occurrences", add_pval = TRUE, stats_in = "pval")
# All count_occurrences
all_cnt_occ <- get_stats("count_occurrences")
# Multiple
get_stats(c("count_occurrences", "analyze_vars_counts"))
#> [1] "count"                   "count_fraction_fixed_dp"
#> [3] "fraction"                "n"
#> [5] "count_fraction"          "n_blq"
# Default formats
get_formats_from_stats(num_stats)
#> $n
#> [1] "xx."
#>
#> $sum
#> [1] "xx.x"
#>
#> $mean
#> [1] "xx.x"
#>
#> $sd
#> [1] "xx.x"
#>
#> $se
#> [1] "xx.x"
#>
#> $mean_sd
#> [1] "xx.x (xx.x)"
#>
#> $mean_se
#> [1] "xx.x (xx.x)"
#>
#> $mean_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $mean_sei
#> [1] "(xx.xx, xx.xx)"
#>
#> $mean_sdi
#> [1] "(xx.xx, xx.xx)"
#>
#> $mean_pval
#> [1] "xx.xx"
#>
#> $median
#> [1] "xx.x"
#>
#> $mad
#> [1] "xx.x"
#>
#> $median_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $quantiles
#> [1] "xx.x - xx.x"
#>
#> $iqr
#> [1] "xx.x"
#>
#> $range
#> [1] "xx.x - xx.x"
#>
#> $min
#> [1] "xx.x"
#>
#> $max
#> [1] "xx.x"
#>
#> $median_range
#> [1] "xx.x (xx.x - xx.x)"
#>
#> $cv
#> [1] "xx.x"
#>
#> $geom_mean
#> [1] "xx.x"
#>
#> $geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $geom_cv
#> [1] "xx.x"
#>
get_formats_from_stats(cnt_stats)
#> $n
#> [1] "xx."
#>
#> $count
#> [1] "xx."
#>
#> $count_fraction
#> function(x, ...) {
#> attr(x, "label") <- NULL
#>
#> if (any(is.na(x))) {
#> return("NA")
#> }
#>
#> checkmate::assert_vector(x)
#> checkmate::assert_integerish(x[1])
#> assert_proportion_value(x[2], include_boundaries = TRUE)
#>
#> result <- if (x[1] == 0) {
#> "0"
#> } else {
#> paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#> }
#>
#> return(result)
#> }
#> <environment: namespace:tern>
#>
#> $n_blq
#> [1] "xx."
#>
get_formats_from_stats(only_pval)
#> $pval
#> [1] "x.xxxx | (<0.0001)"
#>
get_formats_from_stats(all_cnt_occ)
#> $count
#> [1] "xx."
#>
#> $count_fraction_fixed_dp
#> function(x, ...) {
#> attr(x, "label") <- NULL
#>
#> if (any(is.na(x))) {
#> return("NA")
#> }
#>
#> checkmate::assert_vector(x)
#> checkmate::assert_integerish(x[1])
#> assert_proportion_value(x[2], include_boundaries = TRUE)
#>
#> result <- if (x[1] == 0) {
#> "0"
#> } else if (x[2] == 1) {
#> sprintf("%d (100%%)", x[1])
#> } else {
#> sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#> }
#>
#> return(result)
#> }
#> <environment: namespace:tern>
#>
#> $fraction
#> function(x, ...) {
#> attr(x, "label") <- NULL
#> checkmate::assert_vector(x)
#> checkmate::assert_count(x["num"])
#> checkmate::assert_count(x["denom"])
#>
#> result <- if (x["num"] == 0) {
#> paste0(x["num"], "/", x["denom"])
#> } else {
#> paste0(
#> x["num"], "/", x["denom"],
#> " (", sprintf("%.1f", round(x["num"] / x["denom"] * 100, 1)), "%)"
#> )
#> }
#> return(result)
#> }
#> <environment: namespace:tern>
#>
# Adding custom formats
get_formats_from_stats(all_cnt_occ, formats_in = c("fraction" = c("xx")))
#> $count
#> [1] "xx."
#>
#> $count_fraction_fixed_dp
#> function(x, ...) {
#> attr(x, "label") <- NULL
#>
#> if (any(is.na(x))) {
#> return("NA")
#> }
#>
#> checkmate::assert_vector(x)
#> checkmate::assert_integerish(x[1])
#> assert_proportion_value(x[2], include_boundaries = TRUE)
#>
#> result <- if (x[1] == 0) {
#> "0"
#> } else if (x[2] == 1) {
#> sprintf("%d (100%%)", x[1])
#> } else {
#> sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#> }
#>
#> return(result)
#> }
#> <environment: namespace:tern>
#>
#> $fraction
#> [1] "xx"
#>
get_formats_from_stats(all_cnt_occ, formats_in = list("fraction" = c("xx.xx", "xx")))
#> $count
#> [1] "xx."
#>
#> $count_fraction_fixed_dp
#> function(x, ...) {
#> attr(x, "label") <- NULL
#>
#> if (any(is.na(x))) {
#> return("NA")
#> }
#>
#> checkmate::assert_vector(x)
#> checkmate::assert_integerish(x[1])
#> assert_proportion_value(x[2], include_boundaries = TRUE)
#>
#> result <- if (x[1] == 0) {
#> "0"
#> } else if (x[2] == 1) {
#> sprintf("%d (100%%)", x[1])
#> } else {
#> sprintf("%d (%.1f%%)", x[1], x[2] * 100)
#> }
#>
#> return(result)
#> }
#> <environment: namespace:tern>
#>
#> $fraction
#> [1] "xx.xx" "xx"
#>
# Default labels
get_labels_from_stats(num_stats)
#>                             n                           sum
#>                           "n"                         "Sum"
#>                          mean                            sd
#>                        "Mean"                          "SD"
#>                            se                       mean_sd
#>                          "SE"                   "Mean (SD)"
#>                       mean_se                       mean_ci
#>                   "Mean (SE)"                 "Mean 95% CI"
#>                      mean_sei                      mean_sdi
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD"
#>                     mean_pval                        median
#> "Mean p-value (H0: mean = 0)"                      "Median"
#>                           mad                     median_ci
#>   "Median Absolute Deviation"               "Median 95% CI"
#>                     quantiles                           iqr
#>             "25% and 75%-ile"                         "IQR"
#>                         range                           min
#>                   "Min - Max"                     "Minimum"
#>                           max                  median_range
#>                     "Maximum"          "Median (Min - Max)"
#>                            cv                     geom_mean
#>                      "CV (%)"              "Geometric Mean"
#>                  geom_mean_ci                       geom_cv
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean"
get_labels_from_stats(cnt_stats)
#>                n            count   count_fraction            n_blq
#>              "n"          "count" "count_fraction"          "n_blq"
get_labels_from_stats(only_pval)
#>               pval
#> "p-value (t-test)"
get_labels_from_stats(all_cnt_occ)
#>                   count count_fraction_fixed_dp                fraction
#>                 "count"                      ""                      ""
# Adding custom labels
get_labels_from_stats(all_cnt_occ, labels_in = c("fraction" = "Fraction"))
#>                   count count_fraction_fixed_dp                fraction
#>                 "count"                      ""              "Fraction"
get_labels_from_stats(all_cnt_occ, labels_in = list("fraction" = c("Some more fractions")))
#> $count
#> [1] "count"
#>
#> $count_fraction_fixed_dp
#> [1] ""
#>
#> $fraction
#> [1] "Some more fractions"
#>
summary_formats()
#> $n
#> [1] "xx."
#>
#> $sum
#> [1] "xx.x"
#>
#> $mean
#> [1] "xx.x"
#>
#> $sd
#> [1] "xx.x"
#>
#> $se
#> [1] "xx.x"
#>
#> $mean_sd
#> [1] "xx.x (xx.x)"
#>
#> $mean_se
#> [1] "xx.x (xx.x)"
#>
#> $mean_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $mean_sei
#> [1] "(xx.xx, xx.xx)"
#>
#> $mean_sdi
#> [1] "(xx.xx, xx.xx)"
#>
#> $mean_pval
#> [1] "xx.xx"
#>
#> $median
#> [1] "xx.x"
#>
#> $mad
#> [1] "xx.x"
#>
#> $median_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $quantiles
#> [1] "xx.x - xx.x"
#>
#> $iqr
#> [1] "xx.x"
#>
#> $range
#> [1] "xx.x - xx.x"
#>
#> $min
#> [1] "xx.x"
#>
#> $max
#> [1] "xx.x"
#>
#> $median_range
#> [1] "xx.x (xx.x - xx.x)"
#>
#> $cv
#> [1] "xx.x"
#>
#> $geom_mean
#> [1] "xx.x"
#>
#> $geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $geom_cv
#> [1] "xx.x"
#>
summary_formats(type = "counts", include_pval = TRUE)
#> $n
#> [1] "xx."
#>
#> $count
#> [1] "xx."
#>
#> $count_fraction
#> function(x, ...) {
#> attr(x, "label") <- NULL
#>
#> if (any(is.na(x))) {
#> return("NA")
#> }
#>
#> checkmate::assert_vector(x)
#> checkmate::assert_integerish(x[1])
#> assert_proportion_value(x[2], include_boundaries = TRUE)
#>
#> result <- if (x[1] == 0) {
#> "0"
#> } else {
#> paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#> }
#>
#> return(result)
#> }
#> <environment: namespace:tern>
#>
#> $n_blq
#> [1] "xx."
#>
#> $pval_counts
#> [1] "x.xxxx | (<0.0001)"
#>
summary_labels()
#>                             n                           sum
#>                           "n"                         "Sum"
#>                          mean                            sd
#>                        "Mean"                          "SD"
#>                            se                       mean_sd
#>                          "SE"                   "Mean (SD)"
#>                       mean_se                       mean_ci
#>                   "Mean (SE)"                 "Mean 95% CI"
#>                      mean_sei                      mean_sdi
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD"
#>                     mean_pval                        median
#> "Mean p-value (H0: mean = 0)"                      "Median"
#>                           mad                     median_ci
#>   "Median Absolute Deviation"               "Median 95% CI"
#>                     quantiles                           iqr
#>             "25% and 75%-ile"                         "IQR"
#>                         range                           min
#>                   "Min - Max"                     "Minimum"
#>                           max                  median_range
#>                     "Maximum"          "Median (Min - Max)"
#>                            cv                     geom_mean
#>                      "CV (%)"              "Geometric Mean"
#>                  geom_mean_ci                       geom_cv
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean"
summary_labels(type = "counts", include_pval = TRUE)
#>                            n                        count
#>                          "n"                      "count"
#>               count_fraction                        n_blq
#>             "count_fraction"                      "n_blq"
#>                  pval_counts
#> "p-value (chi-squared test)"
summary_custom()
#> Warning: `summary_custom()` was deprecated in tern 0.9.0.9001.
#> ℹ Please use `get_stats`, `get_formats_from_stats`, and `get_labels_from_stats`
#> directly instead.
#> $stats
#>  [1] "n"            "sum"          "mean"         "sd"           "se"
#>  [6] "mean_sd"      "mean_se"      "mean_ci"      "mean_sei"     "mean_sdi"
#> [11] "mean_pval"    "median"       "mad"          "median_ci"    "quantiles"
#> [16] "iqr"          "range"        "min"          "max"          "median_range"
#> [21] "cv"           "geom_mean"    "geom_mean_ci" "geom_cv"
#>
#> $formats
#> $formats$n
#> [1] "xx."
#>
#> $formats$sum
#> [1] "xx.x"
#>
#> $formats$mean
#> [1] "xx.x"
#>
#> $formats$sd
#> [1] "xx.x"
#>
#> $formats$se
#> [1] "xx.x"
#>
#> $formats$mean_sd
#> [1] "xx.x (xx.x)"
#>
#> $formats$mean_se
#> [1] "xx.x (xx.x)"
#>
#> $formats$mean_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $formats$mean_sei
#> [1] "(xx.xx, xx.xx)"
#>
#> $formats$mean_sdi
#> [1] "(xx.xx, xx.xx)"
#>
#> $formats$mean_pval
#> [1] "xx.xx"
#>
#> $formats$median
#> [1] "xx.x"
#>
#> $formats$mad
#> [1] "xx.x"
#>
#> $formats$median_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $formats$quantiles
#> [1] "xx.x - xx.x"
#>
#> $formats$iqr
#> [1] "xx.x"
#>
#> $formats$range
#> [1] "xx.x - xx.x"
#>
#> $formats$min
#> [1] "xx.x"
#>
#> $formats$max
#> [1] "xx.x"
#>
#> $formats$median_range
#> [1] "xx.x (xx.x - xx.x)"
#>
#> $formats$cv
#> [1] "xx.x"
#>
#> $formats$geom_mean
#> [1] "xx.x"
#>
#> $formats$geom_mean_ci
#> [1] "(xx.xx, xx.xx)"
#>
#> $formats$geom_cv
#> [1] "xx.x"
#>
#>
#> $labels
#>                             n                           sum
#>                           "n"                         "Sum"
#>                          mean                            sd
#>                        "Mean"                          "SD"
#>                            se                       mean_sd
#>                          "SE"                   "Mean (SD)"
#>                       mean_se                       mean_ci
#>                   "Mean (SE)"                 "Mean 95% CI"
#>                      mean_sei                      mean_sdi
#>               "Mean -/+ 1xSE"               "Mean -/+ 1xSD"
#>                     mean_pval                        median
#> "Mean p-value (H0: mean = 0)"                      "Median"
#>                           mad                     median_ci
#>   "Median Absolute Deviation"               "Median 95% CI"
#>                     quantiles                           iqr
#>             "25% and 75%-ile"                         "IQR"
#>                         range                           min
#>                   "Min - Max"                     "Minimum"
#>                           max                  median_range
#>                     "Maximum"          "Median (Min - Max)"
#>                            cv                     geom_mean
#>                      "CV (%)"              "Geometric Mean"
#>                  geom_mean_ci                       geom_cv
#>       "Geometric Mean 95% CI"         "CV % Geometric Mean"
#>
#> $indent_mods
#>            n          sum         mean           sd           se      mean_sd
#>            0            0            0            0            0            0
#>      mean_se      mean_ci     mean_sei     mean_sdi    mean_pval       median
#>            0            0            0            0            0            0
#>          mad    median_ci    quantiles          iqr        range          min
#>            0            0            0            0            0            0
#>          max median_range           cv    geom_mean geom_mean_ci      geom_cv
#>            0            0            0            0            0            0
#>
summary_custom(type = "counts", include_pval = TRUE)
#> $stats
#> [1] "n"              "count"          "count_fraction" "n_blq"
#> [5] "pval_counts"
#>
#> $formats
#> $formats$n
#> [1] "xx."
#>
#> $formats$count
#> [1] "xx."
#>
#> $formats$count_fraction
#> function(x, ...) {
#> attr(x, "label") <- NULL
#>
#> if (any(is.na(x))) {
#> return("NA")
#> }
#>
#> checkmate::assert_vector(x)
#> checkmate::assert_integerish(x[1])
#> assert_proportion_value(x[2], include_boundaries = TRUE)
#>
#> result <- if (x[1] == 0) {
#> "0"
#> } else {
#> paste0(x[1], " (", round(x[2] * 100, 1), "%)")
#> }
#>
#> return(result)
#> }
#> <environment: namespace:tern>
#>
#> $formats$n_blq
#> [1] "xx."
#>
#> $formats$pval_counts
#> [1] "x.xxxx | (<0.0001)"
#>
#>
#> $labels
#>                            n                        count
#>                          "n"                      "count"
#>               count_fraction                        n_blq
#>             "count_fraction"                      "n_blq"
#>                  pval_counts
#> "p-value (chi-squared test)"
#>
#> $indent_mods
#>              n          count count_fraction          n_blq    pval_counts
#>              0              0              0              0              0
#>
summary_custom(
  include_pval = TRUE, stats_custom = c("n", "mean", "sd", "pval"),
  labels_custom = c(sd = "Std. Dev."), indent_mods_custom = 3L
)
#> $stats
#> [1] "n"    "mean" "sd"   "pval"
#>
#> $formats
#> $formats$n
#> [1] "xx."
#>
#> $formats$mean
#> [1] "xx.x"
#>
#> $formats$sd
#> [1] "xx.x"
#>
#> $formats$pval
#> [1] "x.xxxx | (<0.0001)"
#>
#>
#> $labels
#>                  n               mean                 sd               pval
#>                "n"             "Mean"        "Std. Dev." "p-value (t-test)"
#>
#> $indent_mods
#>    n mean   sd pval
#>    3    3    3    3
#>