Compare Variables Between Groups

Compares a variable against a reference group. The comparison method depends on the class of x.

Usage

s_compare(x, .ref_group, .in_ref_col, ...)

# S3 method for numeric
s_compare(x, .ref_group, .in_ref_col, ...)

# S3 method for factor
s_compare(x, .ref_group, .in_ref_col, denom = "n", na.rm = TRUE, ...)

# S3 method for character
s_compare(
  x,
  .ref_group,
  .in_ref_col,
  denom = "n",
  na.rm = TRUE,
  .var,
  verbose = TRUE,
  ...
)

# S3 method for logical
s_compare(x, .ref_group, .in_ref_col, na.rm = TRUE, denom = "n", ...)

a_compare(
  x,
  .N_col,
  .N_row,
  .var = NULL,
  .df_row = NULL,
  .ref_group = NULL,
  .in_ref_col = FALSE,
  ...
)

compare_vars(
  lyt,
  vars,
  var_labels = vars,
  nested = TRUE,
  ...,
  na.rm = TRUE,
  na_level = NA_character_,
  show_labels = "default",
  table_names = vars,
  section_div = NA_character_,
  .stats = c("n", "mean_sd", "count_fraction", "pval"),
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL
)

Arguments

x: (numeric)
vector of numbers we want to analyze.
.ref_group: (data.frame or vector)
the data corresponding to the reference group.
.in_ref_col: (logical)
TRUE when working with the reference level, FALSE otherwise.
...: arguments passed to s_compare().
denom: (string)
choice of denominator for factor proportions. Can only be n (the number of values in this row and column intersection).
na.rm: (flag)
whether NA values should be removed from x prior to analysis.
.var: (string)
single variable name that is passed by rtables when requested by a statistics function.
verbose: (logical)
Whether warnings and messages should be printed. Mainly used to print out information about factor casting. Defaults to TRUE.
.N_col: (integer)
column-wise N (column count) for the full column being analyzed that is typically passed by rtables.
.N_row: (integer)
row-wise N (row group count) for the group of observations being analyzed (i.e. with no column-based subsetting) that is typically passed by rtables.
.df_row: (data.frame)
data frame across all of the columns for the given row split.
lyt: (layout)
input layout to which analyses will be added.
vars: (character)
variable names for the primary analysis variable to be iterated over.
var_labels: (character)
labels for the variables in vars. Defaults to the variable names.
nested: (flag)
whether this layout instruction should be applied within the existing layout structure if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split underneath analyses, which is not allowed.
na_level: (string)
string used to replace all NA or empty values in the output.
show_labels: (string)
label visibility: one of "default", "visible" and "hidden".
table_names: (character)
custom table names. Useful when the same vars are analyzed multiple times, to avoid warnings from rtables.
section_div: (string)
string which should be repeated as a section divider after each group defined by this split instruction, or NA_character_ (the default) for no section divider.
.stats: (character)
statistics to select for the table.
.formats: (named character or list)
formats for the statistics.
.labels: (named character)
labels for the statistics (without indent).
.indent_mods: (named vector of integer)
indent modifiers for the labels. Each element of the vector should be a name-value pair with name corresponding to a statistic specified in .stats and value the indentation for that statistic's row label.

Value

s_compare() returns output of s_summary() and comparisons versus the reference group in the form of p-values.

a_compare() returns the corresponding list with formatted rtables::CellValue().

compare_vars() returns a layout object suitable for passing to further layouting functions, or to rtables::build_table(). Adding this function to an rtables layout will add formatted rows containing the statistics from s_compare() to the table layout.

Functions

s_compare(): S3 generic function to produce a comparison summary.
s_compare(numeric): Method for numeric class. This uses the standard t-test to calculate the p-value.
s_compare(factor): Method for factor class. This uses the chi-squared test to calculate the p-value.
s_compare(character): Method for character class. This makes an automatic conversion to factor (with a warning) and then forwards to the method for factors.
s_compare(logical): Method for logical class. A chi-squared test is used. If missing values are not removed, then they are counted as FALSE.
a_compare(): Formatted analysis function which is used as afun in compare_vars().
compare_vars(): Layout-creating function which can take statistics function arguments and additional format arguments. This function is a wrapper for rtables::analyze().

Note

For factor variables, denom can only be n: the purpose is to compare proportions between columns, so a row-based proportion would not make sense. A proportion based on N_col would be difficult because counts are used for the chi-squared test statistic; missing values should therefore be accounted for as explicit factor levels.
If factor variables contain NA, these NA values are excluded by default. To include NA values, set na.rm = FALSE, and missing values will be displayed as an NA level. Alternatively, an explicit factor level can be defined for NA values during pre-processing via df_explicit_na(). Note that the default na_level ("<Missing>") will also be excluded when na.rm is set to TRUE.
For character variables, automatic conversion to factor does not guarantee that the table will be generated correctly. In particular, this is very likely to fail for sparse tables. It is therefore always better to convert character variables to factors manually during pre-processing.
For compare_vars(), the column split must define a reference group via ref_group so that the comparison is well defined.
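
The points above about manual factor conversion and explicit missing levels can be illustrated in base R. This is a sketch under stated assumptions, not tern's code: the input vectors are made up, add_missing is a hypothetical helper, and the p-value is assumed to come from stats::chisq.test() applied to the two count vectors stacked row-wise.

```r
# Made-up character input with missing values.
x_chr <- c("a", "a", "b", "c", "a", NA)
ref_chr <- c("a", "b", "c", NA)

# Manual conversion with explicit, shared levels so that both groups
# are tabulated over the same categories, even if one group is sparse.
lvls <- c("a", "b", "c")
x <- factor(x_chr, levels = lvls)
ref <- factor(ref_chr, levels = lvls)

# Analogue of na.rm = TRUE: table() drops NA, so only a, b, c are counted.
tab_rm <- rbind(table(x), table(ref))
p_rm <- suppressWarnings(stats::chisq.test(tab_rm))$p.value

# Analogue of na.rm = FALSE: turn NA into an explicit "<Missing>" level.
# Hypothetical helper, not part of tern.
add_missing <- function(f) {
  f <- addNA(f)
  levels(f)[is.na(levels(f))] <- "<Missing>"
  f
}
tab_keep <- rbind(table(add_missing(x)), table(add_missing(ref)))
p_keep <- suppressWarnings(stats::chisq.test(tab_keep))$p.value
```

With the missing values dropped, the counts are the same as in the basic factor example on this page, so p_rm should agree with the p-value printed there. Keeping them adds a fourth column to the count table and changes the test.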

a_compare() has been deprecated in favor of a_summary() with argument compare set to TRUE.

Examples

# `s_compare.numeric`

## Usual case where both this and the reference group vector have more than 1 value.
s_compare(rnorm(10, 5, 1), .ref_group = rnorm(5, -5, 1), .in_ref_col = FALSE)
#> $n
#>  n 
#> 10 
#> 
#> $sum
#>      sum 
#> 47.84726 
#> 
#> $mean
#>     mean 
#> 4.784726 
#> 
#> $sd
#>       sd 
#> 1.067784 
#> 
#> $se
#>        se 
#> 0.3376629 
#> 
#> $mean_sd
#>     mean       sd 
#> 4.784726 1.067784 
#> 
#> $mean_se
#>      mean        se 
#> 4.7847265 0.3376629 
#> 
#> $mean_ci
#> mean_ci_lwr mean_ci_upr 
#>    4.020880    5.548573 
#> attr(,"label")
#> [1] "Mean 95% CI"
#> 
#> $mean_sei
#> mean_sei_lwr mean_sei_upr 
#>     4.447064     5.122389 
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#> 
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr 
#>     3.716943     5.852510 
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#> 
#> $mean_pval
#>     p_value 
#> 1.84749e-07 
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#> 
#> $median
#>  median 
#> 5.02841 
#> 
#> $mad
#>           mad 
#> -4.440892e-16 
#> 
#> $median_ci
#> median_ci_lwr median_ci_upr 
#>      3.732971      5.888323 
#> attr(,"conf_level")
#> [1] 0.9785156
#> attr(,"label")
#> [1] "Median 95% CI"
#> 
#> $quantiles
#> quantile_0.25 quantile_0.75 
#>      3.763748      5.634264 
#> attr(,"label")
#> [1] "25% and 75%-ile"
#> 
#> $iqr
#>      iqr 
#> 1.870516 
#> 
#> $range
#>      min      max 
#> 2.730063 6.037027 
#> 
#> $min
#>      min 
#> 2.730063 
#> 
#> $max
#>      max 
#> 6.037027 
#> 
#> $median_range
#>   median      min      max 
#> 5.028410 2.730063 6.037027 
#> attr(,"label")
#> [1] "Median (Min - Max)"
#> 
#> $cv
#>       cv 
#> 22.31651 
#> 
#> $geom_mean
#> geom_mean 
#>  4.661413 
#> 
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr 
#>    3.895615    5.577753 
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#> 
#> $geom_cv
#>  geom_cv 
#> 25.48777 
#> 
#> $pval
#> [1] 1.494493e-10
#> 

## If one group has no more than 1 value, then the p-value is not calculated.
s_compare(rnorm(10, 5, 1), .ref_group = 1, .in_ref_col = FALSE)
#> $n
#>  n 
#> 10 
#> 
#> $sum
#>      sum 
#> 49.16744 
#> 
#> $mean
#>     mean 
#> 4.916744 
#> 
#> $sd
#>        sd 
#> 0.5701682 
#> 
#> $se
#>       se 
#> 0.180303 
#> 
#> $mean_sd
#>      mean        sd 
#> 4.9167444 0.5701682 
#> 
#> $mean_se
#>     mean       se 
#> 4.916744 0.180303 
#> 
#> $mean_ci
#> mean_ci_lwr mean_ci_upr 
#>    4.508871    5.324618 
#> attr(,"label")
#> [1] "Mean 95% CI"
#> 
#> $mean_sei
#> mean_sei_lwr mean_sei_upr 
#>     4.736441     5.097047 
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#> 
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr 
#>     4.346576     5.486913 
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#> 
#> $mean_pval
#>      p_value 
#> 5.813307e-10 
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#> 
#> $median
#>   median 
#> 5.007262 
#> 
#> $mad
#>          mad 
#> 4.440892e-16 
#> 
#> $median_ci
#> median_ci_lwr median_ci_upr 
#>      4.117696      5.588811 
#> attr(,"conf_level")
#> [1] 0.9785156
#> attr(,"label")
#> [1] "Median 95% CI"
#> 
#> $quantiles
#> quantile_0.25 quantile_0.75 
#>      4.527450      5.339526 
#> attr(,"label")
#> [1] "25% and 75%-ile"
#> 
#> $iqr
#>       iqr 
#> 0.8120757 
#> 
#> $range
#>      min      max 
#> 4.033687 5.609946 
#> 
#> $min
#>      min 
#> 4.033687 
#> 
#> $max
#>      max 
#> 5.609946 
#> 
#> $median_range
#>   median      min      max 
#> 5.007262 4.033687 5.609946 
#> attr(,"label")
#> [1] "Median (Min - Max)"
#> 
#> $cv
#>       cv 
#> 11.59646 
#> 
#> $geom_mean
#> geom_mean 
#>  4.886037 
#> 
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr 
#>    4.487165    5.320365 
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#> 
#> $geom_cv
#>  geom_cv 
#> 11.94691 
#> 
#> $pval
#> character(0)
#> 

## Empty numeric does not fail, it returns NA-filled items and no p-value.
s_compare(numeric(), .ref_group = numeric(), .in_ref_col = FALSE)
#> $n
#> n 
#> 0 
#> 
#> $sum
#> sum 
#>  NA 
#> 
#> $mean
#> mean 
#>   NA 
#> 
#> $sd
#> sd 
#> NA 
#> 
#> $se
#> se 
#> NA 
#> 
#> $mean_sd
#> mean   sd 
#>   NA   NA 
#> 
#> $mean_se
#> mean   se 
#>   NA   NA 
#> 
#> $mean_ci
#> mean_ci_lwr mean_ci_upr 
#>          NA          NA 
#> attr(,"label")
#> [1] "Mean 95% CI"
#> 
#> $mean_sei
#> mean_sei_lwr mean_sei_upr 
#>           NA           NA 
#> attr(,"label")
#> [1] "Mean -/+ 1xSE"
#> 
#> $mean_sdi
#> mean_sdi_lwr mean_sdi_upr 
#>           NA           NA 
#> attr(,"label")
#> [1] "Mean -/+ 1xSD"
#> 
#> $mean_pval
#> p_value 
#>      NA 
#> attr(,"label")
#> [1] "Mean p-value (H0: mean = 0)"
#> 
#> $median
#> median 
#>     NA 
#> 
#> $mad
#> mad 
#>  NA 
#> 
#> $median_ci
#> median_ci_lwr median_ci_upr 
#>            NA            NA 
#> attr(,"conf_level")
#> [1] NA
#> attr(,"label")
#> [1] "Median 95% CI"
#> 
#> $quantiles
#> quantile_0.25 quantile_0.75 
#>            NA            NA 
#> attr(,"label")
#> [1] "25% and 75%-ile"
#> 
#> $iqr
#> iqr 
#>  NA 
#> 
#> $range
#> min max 
#>  NA  NA 
#> 
#> $min
#> min 
#>  NA 
#> 
#> $max
#> max 
#>  NA 
#> 
#> $median_range
#> median    min    max 
#>     NA     NA     NA 
#> attr(,"label")
#> [1] "Median (Min - Max)"
#> 
#> $cv
#> cv 
#> NA 
#> 
#> $geom_mean
#> geom_mean 
#>       NaN 
#> 
#> $geom_mean_ci
#> mean_ci_lwr mean_ci_upr 
#>          NA          NA 
#> attr(,"label")
#> [1] "Geometric Mean 95% CI"
#> 
#> $geom_cv
#> geom_cv 
#>      NA 
#> 
#> $pval
#> character(0)
#> 

# `s_compare.factor`

## Basic usage:
x <- factor(c("a", "a", "b", "c", "a"))
y <- factor(c("a", "b", "c"))
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE)
#> $n
#> [1] 5
#> 
#> $count
#> $count$a
#> [1] 3
#> 
#> $count$b
#> [1] 1
#> 
#> $count$c
#> [1] 1
#> 
#> 
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#> 
#> $count_fraction$b
#> [1] 1.0 0.2
#> 
#> $count_fraction$c
#> [1] 1.0 0.2
#> 
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.7659283
#> 

## Management of NA values.
x <- explicit_na(factor(c("a", "a", "b", "c", "a", NA, NA)))
y <- explicit_na(factor(c("a", "b", "c", NA)))
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE, na.rm = TRUE)
#> $n
#> [1] 5
#> 
#> $count
#> $count$a
#> [1] 3
#> 
#> $count$b
#> [1] 1
#> 
#> $count$c
#> [1] 1
#> 
#> 
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#> 
#> $count_fraction$b
#> [1] 1.0 0.2
#> 
#> $count_fraction$c
#> [1] 1.0 0.2
#> 
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.7659283
#> 
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE, na.rm = FALSE)
#> $n
#> [1] 7
#> 
#> $count
#> $count$a
#> [1] 3
#> 
#> $count$b
#> [1] 1
#> 
#> $count$c
#> [1] 1
#> 
#> $count$`<Missing>`
#> [1] 2
#> 
#> 
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0000000 0.4285714
#> 
#> $count_fraction$b
#> [1] 1.0000000 0.1428571
#> 
#> $count_fraction$c
#> [1] 1.0000000 0.1428571
#> 
#> $count_fraction$`<Missing>`
#> [1] 2.0000000 0.2857143
#> 
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.9063036
#> 

# `s_compare.character`

## Basic usage:
x <- c("a", "a", "b", "c", "a")
y <- c("a", "b", "c")
s_compare(x, .ref_group = y, .in_ref_col = FALSE, .var = "x", verbose = FALSE)
#> $n
#> [1] 5
#> 
#> $count
#> $count$a
#> [1] 3
#> 
#> $count$b
#> [1] 1
#> 
#> $count$c
#> [1] 1
#> 
#> 
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#> 
#> $count_fraction$b
#> [1] 1.0 0.2
#> 
#> $count_fraction$c
#> [1] 1.0 0.2
#> 
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.7659283
#> 

## Note that missing values handling can make a large difference:
x <- c("a", "a", "b", "c", "a", NA)
y <- c("a", "b", "c", rep(NA, 20))
s_compare(x,
  .ref_group = y, .in_ref_col = FALSE,
  .var = "x", verbose = FALSE
)
#> $n
#> [1] 5
#> 
#> $count
#> $count$a
#> [1] 3
#> 
#> $count$b
#> [1] 1
#> 
#> $count$c
#> [1] 1
#> 
#> 
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.6
#> 
#> $count_fraction$b
#> [1] 1.0 0.2
#> 
#> $count_fraction$c
#> [1] 1.0 0.2
#> 
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.7659283
#> 
s_compare(x,
  .ref_group = y, .in_ref_col = FALSE, .var = "x",
  na.rm = FALSE, verbose = FALSE
)
#> $n
#> [1] 6
#> 
#> $count
#> $count$a
#> [1] 3
#> 
#> $count$b
#> [1] 1
#> 
#> $count$c
#> [1] 1
#> 
#> $count$`<Missing>`
#> [1] 1
#> 
#> 
#> $count_fraction
#> $count_fraction$a
#> [1] 3.0 0.5
#> 
#> $count_fraction$b
#> [1] 1.0000000 0.1666667
#> 
#> $count_fraction$c
#> [1] 1.0000000 0.1666667
#> 
#> $count_fraction$`<Missing>`
#> [1] 1.0000000 0.1666667
#> 
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.005768471
#> 

# `s_compare.logical`

## Basic usage:
x <- c(TRUE, FALSE, TRUE, TRUE)
y <- c(FALSE, FALSE, TRUE)
s_compare(x, .ref_group = y, .in_ref_col = FALSE)
#> $n
#> [1] 4
#> 
#> $count
#> [1] 3
#> 
#> $count_fraction
#> [1] 3.00 0.75
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.2702894
#> 

## Management of NA values.
x <- c(NA, TRUE, FALSE)
y <- c(NA, NA, NA, NA, FALSE)
s_compare(x, .ref_group = y, .in_ref_col = FALSE, na.rm = TRUE)
#> $n
#> [1] 2
#> 
#> $count
#> [1] 1
#> 
#> $count_fraction
#> [1] 1.0 0.5
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.3864762
#> 
s_compare(x, .ref_group = y, .in_ref_col = FALSE, na.rm = FALSE)
#> $n
#> [1] 3
#> 
#> $count
#> [1] 1
#> 
#> $count_fraction
#> [1] 1.0000000 0.3333333
#> 
#> $n_blq
#> [1] 0
#> 
#> $pval
#> [1] 0.1675463
#> 

# `a_compare` deprecated - use `a_summary()` instead
a_compare(rnorm(10, 5, 1), .ref_group = rnorm(20, -5, 1), .stats = c("n", "pval"))
#> Warning: `a_compare()` was deprecated in tern 0.8.3.
#> ℹ Please use a_summary() with argument `compare` set to TRUE instead.
#> RowsVerticalSection (in_rows) object print method:
#> ----------------------------
#>           row_name formatted_cell indent_mod        row_label
#> 1                n             10          0                n
#> 2 p-value (t-test)        <0.0001          0 p-value (t-test)

# `compare_vars()` in `rtables` pipelines

## Default output within an `rtables` pipeline.
lyt <- basic_table() %>%
  split_cols_by("ARMCD", ref_group = "ARM B") %>%
  compare_vars(c("AGE", "SEX"))
build_table(lyt, tern_ex_adsl)
#>                                  ARM B        ARM A        ARM C   
#> ———————————————————————————————————————————————————————————————————
#> AGE                                                                
#>   n                                73           69           58    
#>   Mean (SD)                    35.8 (7.1)   34.1 (6.8)   36.1 (7.4)
#>   p-value (t-test)                            0.1446       0.8212  
#> SEX                                                                
#>   n                                73           69           58    
#>   F                            40 (54.8%)   38 (55.1%)   32 (55.2%)
#>   M                            33 (45.2%)   31 (44.9%)   26 (44.8%)
#>   p-value (chi-squared test)                  1.0000       1.0000  

## Select and format statistics output.
lyt <- basic_table() %>%
  split_cols_by("ARMCD", ref_group = "ARM C") %>%
  compare_vars(
    vars = "AGE",
    .stats = c("mean_sd", "pval"),
    .formats = c(mean_sd = "xx.x, xx.x"),
    .labels = c(mean_sd = "Mean, SD")
  )
build_table(lyt, df = tern_ex_adsl)
#>                      ARM C       ARM A       ARM B  
#> ————————————————————————————————————————————————————
#> Mean, SD           36.1, 7.4   34.1, 6.8   35.8, 7.1
#> p-value (t-test)                0.1176      0.8212
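
The handling of missing values in the `s_compare.logical` examples above can be checked in base R. This is a sketch, not tern's implementation: pval_logical_sketch is a hypothetical helper, and it assumes a chi-squared test on the 2x2 table of TRUE/FALSE counts without continuity correction, which is consistent with the p-values printed in those examples.

```r
# Hypothetical helper, not part of tern: chi-squared p-value comparing
# the TRUE/FALSE counts of x with those of the reference group.
pval_logical_sketch <- function(x, ref, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
    ref <- ref[!is.na(ref)]
  } else {
    # Missing values that are not removed are counted as FALSE.
    x[is.na(x)] <- FALSE
    ref[is.na(ref)] <- FALSE
  }
  lvls <- c(TRUE, FALSE)
  tab <- rbind(
    table(factor(x, levels = lvls)),
    table(factor(ref, levels = lvls))
  )
  # Assumption: no continuity correction.
  suppressWarnings(stats::chisq.test(tab, correct = FALSE))$p.value
}

# Same inputs as the "Management of NA values" logical example.
x <- c(NA, TRUE, FALSE)
y <- c(NA, NA, NA, NA, FALSE)

p_rm <- pval_logical_sketch(x, y, na.rm = TRUE)
p_keep <- pval_logical_sketch(x, y, na.rm = FALSE)
```

Because the reference group here is almost entirely missing, counting NA as FALSE changes the count table considerably, which is why the two p-values differ.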
