Package 'neatStats' reference manual

Title:	Neat and Painless Statistical Reporting
Description:	User-friendly, clear and simple statistics, primarily for publication in psychological science. The main functions are wrappers for other packages, but there are various additions as well. Every relevant step from data aggregation to reportable printed statistics is covered for basic experimental designs.
Authors:	Gáspár Lukács [aut, cre] , Bennett Kleinberg [ctb] , Johnny van Doorn [ctb]
Maintainer:	Gáspár Lukács <[email protected]>
License:	BSD_2_clause + file LICENSE
Version:	1.13.3
Built:	2025-02-24 04:04:46 UTC
Source:	https://github.com/gasparl/neatstats

Aggregation, descriptives

Description

Returns aggregated values per group for given variable. Serves as argument in the table_neat function.

Usage

aggr_neat(
  dat,
  values,
  method = mean,
  group_by = NULL,
  filt = NULL,
  sep = "_",
  prefix = NULL,
  new_name = NULL,
  round_to = 2
)
aggr_neat(
  dat,
  values,
  method = mean,
  group_by = NULL,
  filt = NULL,
  sep = "_",
  prefix = NULL,
  new_name = NULL,
  round_to = 2
)

Arguments

`dat`	A data frame (or a `data.table`, or the name of either as string).
`values`	The vector of numbers from which the statistics are to be calculated, or the name of the column in the `dat` data frame, that contains the vector.
`method`	Function of string. If function, uses the `values` to calculate the returned value for the given function (e.g. means, as per default, using the `mean` function). Such a function may return a vector of results as well; see Examples. If string, one of two internal functions will be used. If the string end with `"+sd"`, e.g., `"mean+sd"`, the function preceding the `"+"` sign will be calculated along with the standard deviation, displayed in a single column, rounded as set in the `round_to` argument. (This is primarily for use in the `table_neat` function for summary tables.) If the string does not end with `"+sd"`, a ratio for the occurrences of given elements will be calculated. Multiple elements can by given as a vector of strings. The number of occurrences of these elements will be the numerator (dividend), while the entire column length (i.e., number of all elements) will be the denominator (divisor). For example, if a column contains elements `"correct"`, `"incorrect"`, `"tooslow"`, the ratio of `"correct"` to all other elements (i.e., including elements `"correct"`, `"incorrect"`, and `"tooslow"`) can be written simply as `method = "correct"`. The complementary ratio, of `"incorrect"` and `"tooslow"`, can be written as `method = "incorrect, tooslow"`. (Hint: filter to get ratios of subgroups, e.g. to include only `"correct"` and `"incorrect"` elements, and calculate their ratio; see below.)
`group_by`	String, or vector of strings: the name(s) of the column(s) in the `dat` data frame, containing the vector(s) of factors by which the statistics are grouped.
`filt`	An expression to filter, by column values, the entire `dat` data frame before performing the aggregation. The expression should use column names alone; see Examples.
`sep`	String (underscore `"_"` by default) for separating group names (and prefix, if given).
`prefix`	`NULL` (default) or string. String specifies a prefix for each group type under the `group` column.
`new_name`	`NULL` (default) or string. String specifies new name for the variable to be used as column title. If `NULL`, the name will be `"aggr_value"` (or, if used with `table_neat`, the input variable name is used).
`round_to`	Number of digits after the decimal point to round to, when using `"+sd"` in `method`.

Value

A data.table with the statistics per group, with a single column ("aggr_group") indicating the grouping.

Examples

data("mtcars") # load base R example dataset

# overall means and SDs for wt (Weight)
aggr_neat(mtcars, wt)

# rename column
aggr_neat(mtcars, wt, new_name = 'weight')

# grouped by cyl (Number of cylinders)
aggr_neat(mtcars, wt, group_by = 'cyl')

# grouped by cyl and gear
aggr_neat(mtcars, wt, group_by = c('cyl', 'gear'))

# prefix for group names
aggr_neat(mtcars, wt, group_by = 'cyl', prefix = 'cyl')

# filter to only have cyl larger than  4
aggr_neat(mtcars, wt, group_by = 'cyl', filt = cyl > 4)

# filter to only have hp (Gross horsepower) smaller than  200
aggr_neat(mtcars, wt, group_by = 'cyl', filt = hp < 200)

# combine two filters above, and add prefix
aggr_neat(
    mtcars,
    wt,
    group_by = 'cyl',
    filt = (hp < 200 & cyl > 4),
    prefix = 'filtered'
)

# add SD (and round output numbers to 2)
aggr_neat(mtcars,
          wt,
          group_by = 'cyl',
          method = 'mean+sd',
          round_to = 2)

# now medians instead of means
aggr_neat(mtcars, wt, group_by = 'cyl', method = median)

# with SD
aggr_neat(mtcars,
          wt,
          group_by = 'cyl',
          method = 'median+sd',
          round_to = 1)

# overall ratio of gear 4 (Number of gears)
aggr_neat(mtcars, gear, method = '4')

# overall ratio of gear 4 and 5
aggr_neat(mtcars, gear, method = '4, 5')

# same ratio calculated per each cyl
aggr_neat(mtcars, gear, group_by = 'cyl', method = '4, 5')

# per each cyl and per vs (engine type)
aggr_neat(mtcars,
          gear,
          group_by = c('cyl', 'vs'),
          method = '4, 5')

# ratio of gear 3 per gear 3 and 5
aggr_neat(
    mtcars,
    gear,
    group_by = 'cyl',
    method = '3',
    filt = gear %in% c(3, 5)
)

data("mtcars") # load base R example dataset

# overall means and SDs for wt (Weight)
aggr_neat(mtcars, wt)

# rename column
aggr_neat(mtcars, wt, new_name = 'weight')

# grouped by cyl (Number of cylinders)
aggr_neat(mtcars, wt, group_by = 'cyl')

# grouped by cyl and gear
aggr_neat(mtcars, wt, group_by = c('cyl', 'gear'))

# prefix for group names
aggr_neat(mtcars, wt, group_by = 'cyl', prefix = 'cyl')

# filter to only have cyl larger than  4
aggr_neat(mtcars, wt, group_by = 'cyl', filt = cyl > 4)

# filter to only have hp (Gross horsepower) smaller than  200
aggr_neat(mtcars, wt, group_by = 'cyl', filt = hp < 200)

# combine two filters above, and add prefix
aggr_neat(
    mtcars,
    wt,
    group_by = 'cyl',
    filt = (hp < 200 & cyl > 4),
    prefix = 'filtered'
)

# add SD (and round output numbers to 2)
aggr_neat(mtcars,
          wt,
          group_by = 'cyl',
          method = 'mean+sd',
          round_to = 2)

# now medians instead of means
aggr_neat(mtcars, wt, group_by = 'cyl', method = median)

# with SD
aggr_neat(mtcars,
          wt,
          group_by = 'cyl',
          method = 'median+sd',
          round_to = 1)

# overall ratio of gear 4 (Number of gears)
aggr_neat(mtcars, gear, method = '4')

# overall ratio of gear 4 and 5
aggr_neat(mtcars, gear, method = '4, 5')

# same ratio calculated per each cyl
aggr_neat(mtcars, gear, group_by = 'cyl', method = '4, 5')

# per each cyl and per vs (engine type)
aggr_neat(mtcars,
          gear,
          group_by = c('cyl', 'vs'),
          method = '4, 5')

# ratio of gear 3 per gear 3 and 5
aggr_neat(
    mtcars,
    gear,
    group_by = 'cyl',
    method = '3',
    filt = gear %in% c(3, 5)
)

Comparison of Multiple Means: ANOVA

Description

Analysis of variance (ANOVA) F-test results with appropriate Welch's and epsilon corrections where applicable (unless specified otherwise), including partial eta squared effect sizes with confidence intervals (CIs), generalized eta squared, and inclusion Bayes factor based on matched models (BFs).

Usage

anova_neat(
  data_per_subject,
  values,
  within_ids = NULL,
  between_vars = NULL,
  ci = 0.9,
  norm_tests = "none",
  norm_plots = FALSE,
  var_tests = FALSE,
  bf_added = FALSE,
  bf_sample = 10000,
  test_title = "--- neat ANOVA ---",
  welch = TRUE,
  e_correction = NULL,
  type = 2,
  white.adjust = FALSE,
  hush = FALSE,
  plot_means = FALSE,
  ...
)
anova_neat(
  data_per_subject,
  values,
  within_ids = NULL,
  between_vars = NULL,
  ci = 0.9,
  norm_tests = "none",
  norm_plots = FALSE,
  var_tests = FALSE,
  bf_added = FALSE,
  bf_sample = 10000,
  test_title = "--- neat ANOVA ---",
  welch = TRUE,
  e_correction = NULL,
  type = 2,
  white.adjust = FALSE,
  hush = FALSE,
  plot_means = FALSE,
  ...
)

Arguments

`data_per_subject`	Data frame. Should contain all values (measurements/observations) in a single row per each subject.
`values`	Vector of strings; column name(s) in the `data_per_subject` data frame. Each column should contain a single dependent variable: thus, to test repeated (within-subject) measurements, each specified column should contain one measurement.
`within_ids`	`NULL` (default), string, or named list. In case of no within-subject factors, leave as `NULL`. In case of a single within subject factor, a single string may be given to optionally provide custom name for the within-subject factor (note: this is a programming variable name, so it should not contain spaces, etc.); otherwise (if left `NULL`) this one within-subject factor will always just be named `"within_factor"`. In case of multiple within-subject factors, each factor must be specified as a named list element, each with a vector of strings that distinguish the levels within that factors. The column names given as `values` should always contain one (and only one) of these strings within each within-subject factor, and thus they will be assigned the appropriate level. For example, `values = 'rt_s1_neg, rt_s1_pos, rt_s2_neg, rt_s2_pos'` could have `within_ids = list( session = c('s1', 's2'), valence = c('pos', 'neg')`. (Note: the strings for distinguishing must be unambiguous. E.g., for values `apple_a` and `apple_b`, do not set levels `c('a','b')`, because `'a'` is also found in `apple_b`. In this case, you could choose levels `c('_a','_b')` to make sure the values are correctly distinguished.) See also Examples.
`between_vars`	`NULL` (default; in case of no between-subject factors) or vector of strings; column name(s) in the `data_per_subject` data frame. Each column should contain a single between-subject independent variable (representing between-subject factors).
`ci`	Numeric; confidence level for returned CIs. (Default: `.9`; Lakens, 2014; Steiger, 2004.)
`norm_tests`	Normality tests for the pooled ANOVA residuals (`"none"` by default, giving no tests). Any or all of the following character input is accepted (as a single string or a character vector; case-insensitive): `"W"` (Shapiro-Wilk), `"K2"` (D'Agostino), `"A2"` (Anderson-Darling), `"JB"` (Jarque-Bera); see `norm_tests`. The option `"all"` (or `TRUE`) selects all four previous tests at the same time.
`norm_plots`	If `TRUE`, displays density, histogram, and Q-Q plots (and scatter plots for paired tests) for the pooled residuals.
`var_tests`	Logical, `FALSE` by default. If `TRUE` (and there are any between-subject factors), runs variance equality tests via `var_tests` for all combinations of the between-subject factors within each level of within-subject factor combinations; see Details.
`bf_added`	Logical. If `TRUE` (default), inclusion Bayes factor is calculated and displayed. (Note: with multiple factors and/or larger dataset, the calculation can take considerable time.)
`bf_sample`	Number of samples used to estimate Bayes factor (`10000` by default).
`test_title`	String, `"--- neat ANOVA ---"` by default. Simply displayed in printing preceding the statistics.
`welch`	If `TRUE` (default), calculates Welch's ANOVA via `stats::oneway.test` in case of a single factor (one-way) between-subject design. If `FALSE`, calculates via `ez::ezANOVA` in such cases too (i.e., same as in case of every other design).
`e_correction`	`NULL` (default) or one of the following strings: `'gg'`, `'hf'`, or `'none'`. If set to `'gg'`, Greenhouse-Geisser correction is applied in case of repeated measures (regardless of violation of sphericity). If set to `'hf'`, Huynh-Feldt correction is applied. If set to `'none'`, no correction is applied. If `NULL`, Greenhouse-Geisser correction is applied when Mauchly's sphericity test is significant and the Greenhouse-Geisser epsilon is not larger than `.75`, while Huynh-Feldt correction is applied when Mauchly's sphericity test is significant and the Greenhouse-Geisser epsilon is larger than `.75` (see Girden, 1992).
`type`	Sum of squares type specified as a number: `1`, `2`, or `3`. Set to `2` by default (which is generally recommended, see e.g. Navarro, 2019, Chapter 16).
`white.adjust`	If not `FALSE` (default) uses a heteroscedasticity-corrected coefficient covariance matrix; the various values of the argument specify different corrections (`"hc0"`, `"hc1"`, `"hc2"`, `"hc3"`, or `"hc4"`). See the documentation for `car::hccm` for details. If set to `TRUE` then the `"hc3"` correction is selected.
`hush`	Logical. If `TRUE`, prevents printing any details to console.
`plot_means`	Logical (`FALSE` by default). If `TRUE`, creates plots of means by factor, by passing data and factor information to `plot_neat`.
`...`	Any additional arguments will be passed on to `plot_neat`.

Details

The Bayes factor (BF) is always calculated with the default rscaleFixed of 0.5 ("medium") and rscaleRandom of 1 ("nuisance"). BF supporting null hypothesis is denoted as BF01, while that supporting alternative hypothesis is denoted as BF10. When the BF is smaller than 1 (i.e., supports null hypothesis), the reciprocal is calculated (hence, BF10 = BF, but BF01 = 1/BF). When the BF is greater than or equal to 10000, scientific (exponential) form is reported for readability. (The original full BF number is available in the returned named vector as bf.)

Mauchly's sphericity test is returned for repeated measures with more than two levels. If Mauchly's test is significant, epsilon correction may be applied (see the e_correction parameter).

Variance equality tests (if var_tests is TRUE): Brown-Forsythe and Fligner-Killeen tests are performed for each within-subject factor level (or, in case of several factors, each combination) via var_tests. Note that variance testing is generally not recommended (see e.g., Zimmerman, 2004). In case of a one-way between-subject ANOVA, Welch-corrected results are reported by default, which corrects for unequal variances. In case of multiple between-subject factors, the white.adjust parameter can be set to TRUE (or "hc3") to apply "hc3" correction for unequal variances. In case of a mixed ANOVA (both between-subject and within-subject factors), if the tests are significant and the data is unbalanced (unequal group sample sizes), you should either consider the results respectively or choose a different test.

In case of multiple values (column names) that match identical levels for all factors, the given columns will be merged into a single column taking the mean value of all these columns. (This is to simplify "dropping" a within-subject factor and retesting the remaining factors.) Explicit warning messages are shown in each such case.

Value

Prints ANOVA statistics (including, for each model, F-test with partial eta squared and its CI, generalized eta squared, and BF, as specified via the corresponding parameters) in APA style. Furthermore, when assigned, returns all calculated information details for each effect (as stat_list), normality and variance tests (if any), etc.

Note

The F-tests are calculated via ez::ezANOVA, including Mauchly's sphericity test. (But Welch's ANOVA is calculated in case of one-way between-subject designs via stats::oneway.test, unless the welch parameter is set to FALSE.)

Confidence intervals are calculated, using the F value, via MBESS::conf.limits.ncf, converting noncentrality parameters to partial eta squared as ncp/(ncp+ df_nom+df_denom+1) (Smithson, 2003).

Generalized eta squared is to facilitate potential subsequent meta-analytical comparisons (see Lakens, 2013).

The inclusion Bayes factor based on matched models is calculated via bayestestR::bayesfactor_inclusion, (with match_models = TRUE, and using an BayesFactor::anovaBF object for models input).

References

Girden, E. (1992). ANOVA: Repeated measures. Newbury Park, CA: Sage.

Kelley, K. (2007). Methods for the behavioral, educational, and social sciences: An R package. Behavior Research Methods, 39(4), 979-984. doi:10.3758/BF03192993

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00863

Lakens, D. (2014). Calculating confidence intervals for Cohen's d and eta-squared using SPSS, R, and Stata [Blog post]. Retrieved from http://daniellakens.blogspot.com/2014/06/calculating-confidence-intervals-for.html

Mathot. S. (2017). Bayes like a Baws: Interpreting Bayesian Repeated Measures in JASP [Blog post]. Retrieved from https://www.cogsci.nl/blog/interpreting-bayesian-repeated-measures-in-jasp

McDonald, J. H. 2015. Handbook of Biological Statistics (3rd ed.). Sparky House Publishing, Baltimore, Maryland. Retrieved from http://www.biostathandbook.com

Moder, K. (2010). Alternatives to F-test in one way ANOVA in case of heterogeneity of variances (a simulation study). Psychological Test and Assessment Modeling, 52(4), 343-353.

Navarro, D. (2019). Learning Statistics with R: A Tutorial for Psychology Students and Other Beginners (Version 0.6.1). Retrieved from https://learningstatisticswithr.com/

Smithson, M. (2003). Confidence intervals. Thousand Oaks, Calif: Sage Publications.

Steiger, J. H. (2004). Beyond the F test: effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164-182. doi:10.1037/1082-989X.9.2.164

Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57(1), 173–181. https://doi.org/10.1348/000711004849222

Examples

# assign random data in a data frame for illustration
# (note that the 'subject' is only for illustration; since each row contains the
# data of a single subject, no additional subject id is needed)
dat_1 = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    grouping1 = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
    grouping2 = c(1, 2, 1, 2, 2, 1, 1, 1, 2, 1),
    value_1_a = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    value_2_a = c(-14.1, 58.5, -25.5, 42.2, -13, 4.4, 55.5, -28.5, 25.6, -37.1),
    value_1_b = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    value_2_b = c(8.024, -14.162, 3.1, -2.1, -1.5, 0.91, 11.53, 18.37, 0.3, -0.59),
    value_1_c = c(27.4,-17.6,-32.7, 0.4, 37.2, 1.7, 18.2, 8.9, 1.9, 0.4),
    value_2_c = c(7.7, -0.8, 2.2, 14.1, 22.1, -47.7, -4.8, 8.6, 6.2, 18.2)
)
head(dat_1) # see what we have

# For example, numbers '1' and '2' in the variable names of the values can
# denote sessions in an experiment, such as '_1' for first session, and '_2 for
# second session'. The letters '_a', '_b', '_c' could denote three different
# types of techniques used within each session, to be compared to each other.
# See further below for a more verbose but more meaningful example data.

# get the between-subject effect of 'grouping1'
anova_neat(dat_1, values = 'value_1_a', between_vars = 'grouping1')

# main effects of 'grouping1', 'grouping2', and their interactions
anova_neat(dat_1,
           values = 'value_1_a',
           between_vars = c('grouping1', 'grouping2'))

# repeated measures:
# get the within-subject effect for 'value_1_a' vs. 'value_1_b'
anova_neat(dat_1, values = c('value_1_a', 'value_1_b'))

# same, but give the factor a custom variable name, and omit BF for speed
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b'),
    within_ids = 'a_vs_b',
    bf_added = FALSE
)
# or
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b'),
    within_ids = 'letters',
    bf_added = FALSE
)

# within-subject effect for 'value_1_a' vs. 'value_1_b' vs. 'value_1_c'
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    bf_added = FALSE
)

# within-subject main effect for 'value_1_a' vs. 'value_1_b' vs. 'value_1_c',
# between-subject main effect 'grouping1', and the interaction of these two main
# effects
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = 'grouping1',
    bf_added = FALSE
)

# within-subject 'number' main effect for variables with number '1' vs. number
# '2' ('value_1_a' and 'value_1_b' vs. 'value_2_a' and 'value_2_b'), 'letter'
# main effect for variables with final letterr 'a' vs. final letter 'b'
# ('value_1_a' and 'value_2_a' vs. 'value_1_b' and 'value_2_b'), and the
# 'letter' x 'number' interaction
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    bf_added = FALSE
)

# same as above, but now including between-subject main effect 'grouping2' and
# its interactions
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2',
    bf_added = FALSE
)

# same as above, but now creating a plot of means
# y_title passed add an example title (label) for the Y axis
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2',
    bf_added = FALSE,
    plot_means = TRUE,
    y_title = 'Example Y Title'
)

# same as above, but collapsing means over the removed "numbers" factor
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b')
    ),
    between_vars = 'grouping2',
    bf_added = FALSE,
    plot_means = TRUE,
    y_title = 'Example Y Title'
)

# In real datasets, these could of course be more meaningful. For example, let's
# say participants rated the attractiveness of pictures with low or high levels
# of frightening and low or high levels of disgusting qualities. So there are
# four types of ratings:
# 'low disgusting, low frightening' pictures
# 'low disgusting, high frightening' pictures
# 'high disgusting, low frightening' pictures
# 'high disgusting, high frightening' pictures

# this could be meaningfully assigned e.g. as below
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    rating_fright_low_disgust_low = c(36.2,45.2,41,24.6,30.5,28.2,40.9,45.1,31,16.9),
    rating_fright_high_disgust_low = c(-14.1,58.5,-25.5,42.2,-13,4.4,55.5,-28.5,25.6,-37.1),
    rating_fright_low_disgust_high = c(83,71,111,70,92,75,110,111,110,85),
    rating_fright_high_disgust_high = c(8.024,-14.162,3.1,-2.1,-1.5,0.91,11.53,18.37,0.3,-0.59)
)
head(pic_ratings) # see what we have

# the same logic applies as for the examples above, but now the
# within-subject differences can be more meaningfully specified, e.g.
# 'disgust_low' vs. 'disgust_high' for levels of disgustingness, while
# 'fright_low' vs. 'fright_high' for levels of frighteningness
anova_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    ),
    bf_added = FALSE
)

# the results are the same as for the analogous test for the 'dat_1' data, only
# with different names

# now let's say the ratings were done in two separate groups
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    group_id = c(1, 2, 1, 2, 2, 1, 1, 1, 2, 1),
    rating_fright_low_disgust_low = c(36.2,45.2,41,24.6,30.5,28.2,40.9,45.1,31,16.9),
    rating_fright_high_disgust_low = c(-14.1,58.5,-25.5,42.2,-13,4.4,55.5,-28.5,25.6,-37.1),
    rating_fright_low_disgust_high = c(83,71,111,70,92,75,110,111,110,85),
    rating_fright_high_disgust_high = c(8.024,-14.162,3.1,-2.1,-1.5,0.91,11.53,18.37,0.3,-0.59)
)

# now test the effect and interactions of 'group_id'
anova_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    ),
    between_vars = 'group_id',
    bf_added = FALSE
)

# again, same results as with 'dat_1' (using 'grouping2' as group_id)

# assign random data in a data frame for illustration
# (note that the 'subject' is only for illustration; since each row contains the
# data of a single subject, no additional subject id is needed)
dat_1 = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    grouping1 = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
    grouping2 = c(1, 2, 1, 2, 2, 1, 1, 1, 2, 1),
    value_1_a = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    value_2_a = c(-14.1, 58.5, -25.5, 42.2, -13, 4.4, 55.5, -28.5, 25.6, -37.1),
    value_1_b = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    value_2_b = c(8.024, -14.162, 3.1, -2.1, -1.5, 0.91, 11.53, 18.37, 0.3, -0.59),
    value_1_c = c(27.4,-17.6,-32.7, 0.4, 37.2, 1.7, 18.2, 8.9, 1.9, 0.4),
    value_2_c = c(7.7, -0.8, 2.2, 14.1, 22.1, -47.7, -4.8, 8.6, 6.2, 18.2)
)
head(dat_1) # see what we have

# For example, numbers '1' and '2' in the variable names of the values can
# denote sessions in an experiment, such as '_1' for first session, and '_2 for
# second session'. The letters '_a', '_b', '_c' could denote three different
# types of techniques used within each session, to be compared to each other.
# See further below for a more verbose but more meaningful example data.

# get the between-subject effect of 'grouping1'
anova_neat(dat_1, values = 'value_1_a', between_vars = 'grouping1')

# main effects of 'grouping1', 'grouping2', and their interactions
anova_neat(dat_1,
           values = 'value_1_a',
           between_vars = c('grouping1', 'grouping2'))

# repeated measures:
# get the within-subject effect for 'value_1_a' vs. 'value_1_b'
anova_neat(dat_1, values = c('value_1_a', 'value_1_b'))

# same, but give the factor a custom variable name, and omit BF for speed
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b'),
    within_ids = 'a_vs_b',
    bf_added = FALSE
)
# or
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b'),
    within_ids = 'letters',
    bf_added = FALSE
)

# within-subject effect for 'value_1_a' vs. 'value_1_b' vs. 'value_1_c'
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    bf_added = FALSE
)

# within-subject main effect for 'value_1_a' vs. 'value_1_b' vs. 'value_1_c',
# between-subject main effect 'grouping1', and the interaction of these two main
# effects
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = 'grouping1',
    bf_added = FALSE
)

# within-subject 'number' main effect for variables with number '1' vs. number
# '2' ('value_1_a' and 'value_1_b' vs. 'value_2_a' and 'value_2_b'), 'letter'
# main effect for variables with final letterr 'a' vs. final letter 'b'
# ('value_1_a' and 'value_2_a' vs. 'value_1_b' and 'value_2_b'), and the
# 'letter' x 'number' interaction
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    bf_added = FALSE
)

# same as above, but now including between-subject main effect 'grouping2' and
# its interactions
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2',
    bf_added = FALSE
)

# same as above, but now creating a plot of means
# y_title passed add an example title (label) for the Y axis
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2',
    bf_added = FALSE,
    plot_means = TRUE,
    y_title = 'Example Y Title'
)

# same as above, but collapsing means over the removed "numbers" factor
anova_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b')
    ),
    between_vars = 'grouping2',
    bf_added = FALSE,
    plot_means = TRUE,
    y_title = 'Example Y Title'
)

# In real datasets, these could of course be more meaningful. For example, let's
# say participants rated the attractiveness of pictures with low or high levels
# of frightening and low or high levels of disgusting qualities. So there are
# four types of ratings:
# 'low disgusting, low frightening' pictures
# 'low disgusting, high frightening' pictures
# 'high disgusting, low frightening' pictures
# 'high disgusting, high frightening' pictures

# this could be meaningfully assigned e.g. as below
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    rating_fright_low_disgust_low = c(36.2,45.2,41,24.6,30.5,28.2,40.9,45.1,31,16.9),
    rating_fright_high_disgust_low = c(-14.1,58.5,-25.5,42.2,-13,4.4,55.5,-28.5,25.6,-37.1),
    rating_fright_low_disgust_high = c(83,71,111,70,92,75,110,111,110,85),
    rating_fright_high_disgust_high = c(8.024,-14.162,3.1,-2.1,-1.5,0.91,11.53,18.37,0.3,-0.59)
)
head(pic_ratings) # see what we have

# the same logic applies as for the examples above, but now the
# within-subject differences can be more meaningfully specified, e.g.
# 'disgust_low' vs. 'disgust_high' for levels of disgustingness, while
# 'fright_low' vs. 'fright_high' for levels of frighteningness
anova_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    ),
    bf_added = FALSE
)

# the results are the same as for the analogous test for the 'dat_1' data, only
# with different names

# now let's say the ratings were done in two separate groups
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    group_id = c(1, 2, 1, 2, 2, 1, 1, 1, 2, 1),
    rating_fright_low_disgust_low = c(36.2,45.2,41,24.6,30.5,28.2,40.9,45.1,31,16.9),
    rating_fright_high_disgust_low = c(-14.1,58.5,-25.5,42.2,-13,4.4,55.5,-28.5,25.6,-37.1),
    rating_fright_low_disgust_high = c(83,71,111,70,92,75,110,111,110,85),
    rating_fright_high_disgust_high = c(8.024,-14.162,3.1,-2.1,-1.5,0.91,11.53,18.37,0.3,-0.59)
)

# now test the effect and interactions of 'group_id'
anova_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    ),
    between_vars = 'group_id',
    bf_added = FALSE
)

# again, same results as with 'dat_1' (using 'grouping2' as group_id)

CI from p value

Description

Calculates approximate confidence interval (CI) for any given difference, based on the difference value and p value, according to Altman & Bland (2011).

Usage

ci_from_p(diff, p_value, ci = 0.95)
ci_from_p(diff, p_value, ci = 0.95)

Arguments

`diff`	Difference (number) around which the CI is to be calculated.
`p_value`	The p value for CI calculation.
`ci`	Numeric; confidence level for the returned CIs (`.95` per default).

Value

CI limits as named numeric vector with two elements.

Note

Check the Altman & Bland (2011) paper for details! (The calculation for proportions is not implemented here.)

References

Altman, D. G., & Bland, J. M. (2011). How to obtain the confidence interval from a P value. Bmj, 343(d2090). doi:10.1136/bmj.d2090

Examples


# Example 1
# calculate proportion difference test
proptest_stat = prop.test(x = c(49, 40), n = c(50, 50))

# calculate proportion difference
my_diff = 49/50-40/50

# calculate approximate CI
ci_from_p(my_diff, proptest_stat$p.value)

# returned CI should be very similar to the actual CI
proptest_stat$conf.int


# Example 2
# generate random data
v1 = stats::rnorm(190, 40, 60)
v2 = stats::rnorm(170, 50, 45)

# calculate t-test
ttest_stat = stats::t.test(v1, v2)

# calculate mean difference
my_diff = mean(v1) - mean(v2)

# calculate approximate CI
ci_from_p(my_diff, ttest_stat$p.value)

# returned CI should be similar to the actual CI
ttest_stat$conf.int

# Example 1
# calculate proportion difference test
proptest_stat = prop.test(x = c(49, 40), n = c(50, 50))

# calculate proportion difference
my_diff = 49/50-40/50

# calculate approximate CI
ci_from_p(my_diff, proptest_stat$p.value)

# returned CI should be very similar to the actual CI
proptest_stat$conf.int


# Example 2
# generate random data
v1 = stats::rnorm(190, 40, 60)
v2 = stats::rnorm(170, 50, 45)

# calculate t-test
ttest_stat = stats::t.test(v1, v2)

# calculate mean difference
my_diff = mean(v1) - mean(v2)

# calculate approximate CI
ci_from_p(my_diff, ttest_stat$p.value)

# returned CI should be similar to the actual CI
ttest_stat$conf.int

Correlation Statistics

Description

Pearson correlation results including confidence interval (CI) and correlation Bayes factor (BF). For non-parametric version, Spearman's rank correlation results along with corresponding rank-based BFs (as per van Doorn et al., 2020).

Usage

corr_neat(
  var1,
  var2,
  nonparametric = FALSE,
  ci = 0.95,
  bf_added = FALSE,
  direction = NULL,
  round_r = 3,
  for_table = FALSE,
  sb_correction = FALSE,
  hush = FALSE
)
corr_neat(
  var1,
  var2,
  nonparametric = FALSE,
  ci = 0.95,
  bf_added = FALSE,
  direction = NULL,
  round_r = 3,
  for_table = FALSE,
  sb_correction = FALSE,
  hush = FALSE
)

Arguments

`var1`	Numeric vector; numbers of the first variable.
`var2`	Numeric vector; numbers of the second variable.
`nonparametric`	Logical (`FALSE` by default). If `TRUE`, uses nonparametric tests (Spearman's rank correlation, including BFs; see Details).
`ci`	Numeric; confidence level for the returned CI, as implemented in `cor.test`.
`bf_added`	Logical. If `TRUE` (default), Bayes factor is calculated and displayed.
`direction`	`NULL` or string; optionally specifies one-sided test: either "negative" (negative correlation expected) or "positive" (positive correlation expected). (Short forms also work, e.g. "p", "pos", "neg", etc.) If `NULL` (default), the test is two-sided.
`round_r`	Number `to round` to the correlation and its CI.
`for_table`	Logical. If `TRUE`, omits the confidence level display from the printed text.
`sb_correction`	Logical. If `TRUE`, applies Spearman-Brown correction (`2 * r / (1+r)`) to the correlation (including CI).
`hush`	Logical. If `TRUE`, prevents printing any details to console.

Details

The Bayes factor (BF) is calculated with the default r-scale of 1/3 for parametric test, and with the default r-scale of 1 for nonparametric test. BF supporting null hypothesis is denoted as BF01, while that supporting alternative hypothesis is denoted as BF10. When the BF is smaller than 1 (i.e., supports null hypothesis), the reciprocal is calculated (hence, BF10 = BF, but BF01 = 1/BF). When the BF is greater than or equal to 10000, scientific (exponential) form is reported for readability. (The original full BF number is available in the returned named vector as bf.)#'

Value

Prints correlation statistics (including CI and BF) in APA style. Furthermore, when assigned, returns a named vector with the following elements: r (Pearson correlation), p (p value), bf (Bayes factor).

Note

The correlation and CI is calculated via stats::cor.test.

The parametric Bayes factor is calculated via BayesFactor::correlationBF. The nonparametric (rank-based) Bayes factor is a contribution by Johnny van Doorn; the original source code is available via https://osf.io/gny35/.

References

Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 1904-1920, 3(3), 296-322. doi:10.1111/j.2044-8295.1910.tb00207.x

Eisinga, R., Grotenhuis, M. te, & Pelzer, B. (2013). The reliability of a two-item scale: Pearson, Cronbach, or Spearman-Brown? International Journal of Public Health, 58(4), 637-642. doi:10.1007/s00038-012-0416-3

Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 1904-1920, 3(3), 271-295. doi:10.1111/j.2044-8295.1910.tb00206.x

van Doorn, J., Ly, A., Marsman, M., & Wagenmakers, E.-J. (2020). Bayesian rank-based hypothesis testing for the rank sum test, the signed rank test, and Spearman’s rho. Journal of Applied Statistics, 1–23. doi:10.1080/02664763.2019.1709053

Examples

# assign two variables
v1 = c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 )
v2 = c(4, -2, 23, 13, 32, 16, 3, 29, 37, -4, 65 )

corr_neat(v1, v2) # prints statistics

# one-sided, and omitting the "95% CI" part
corr_neat(v1, v2, direction = 'pos', for_table = TRUE)

# print statistics and assign main results
results = corr_neat(v1, v2, direction = 'pos')

results['p'] # get precise p value
# assign two variables
v1 = c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 )
v2 = c(4, -2, 23, 13, 32, 16, 3, 29, 37, -4, 65 )

corr_neat(v1, v2) # prints statistics

# one-sided, and omitting the "95% CI" part
corr_neat(v1, v2, direction = 'pos', for_table = TRUE)

# print statistics and assign main results
results = corr_neat(v1, v2, direction = 'pos')

results['p'] # get precise p value

Demographics

Description

Prints participant count, age mean and SD, and gender ratio, from given dataset.

Usage

dems_neat(
  data_per_subject,
  group_by = NULL,
  gender_col = NULL,
  age_col = NULL,
  male = "male",
  female = "female",
  percent = FALSE,
  round_perc = 0,
  show_fem = NULL,
  age_range = FALSE,
  age_min = NULL,
  age_max = NULL
)
dems_neat(
  data_per_subject,
  group_by = NULL,
  gender_col = NULL,
  age_col = NULL,
  male = "male",
  female = "female",
  percent = FALSE,
  round_perc = 0,
  show_fem = NULL,
  age_range = FALSE,
  age_min = NULL,
  age_max = NULL
)

Arguments

`data_per_subject`	Data frame from which demographics are to be calculated. Should contain columns named as "`age`" and as "`gender`" (or, alternatively, "`sex`"). Alternatively, these columns can be specified via the `gender_col` and `age_col` parameters. The `age` column must contain only numbers or `NA`, while `gender` column must contain only `1` (= male) or `2` (= female), either as numbers or as strings, or `NA`. Alternatively, different gender coding can be set via the parameters `male` and `female` (but `1`/`2` will be checked for first in any case).
`group_by`	Optionally the name(s) of column(s) from the data frame provided as `data_per_subject` to group by.
`gender_col`	Optionally the name of column from the data frame that contains the gender (sex) information.
`age_col`	Optionally the name of column from the data frame that contains the age information.
`male`	Alternative code for male: by default, it is the string `"male"`. Whatever string is given, its abbreviations will also be accepted (e.g. `"m"`). (Lettercases do not matter, e.g. `Male` or `MALE` are both evaluated same as `male`.)
`female`	Alternative code for female: by default, it is the string `"female"`. Whatever string is given, its abbreviations will also be accepted (e.g. `"fem"`). (Lettercases do not matter.)
`percent`	Logical. If `TRUE`, gender ratios (and the "unknown" ratios based on `NA` values) are presented as percentage. If `FALSE`, they are presented as counts (i.e., numbers of subjects).
`round_perc`	Number `to round` to, when using percentages.
`show_fem`	Logical or `NULL`. If `TRUE`, the numbers of both male and female are displayed. If `FALSE`, only the number of males is displayed. If `NULL` (default), only the number of males is displayed when there are no unknown cases, but both numbers are displayed when there are any unknown cases.
`age_range`	Logical, `FALSE` by default. If `TRUE`, also displays age range per group (minimum and maximum ages).
`age_min`	If numeric given, removes all ages below (exclusive!) the given number before any age calculation.#'
`age_max`	If numeric given, removes all ages above (exclusive!) the given number before any age calculation.

Details

If gender_col and/or age_col are not specified, the function will first look for columns named precisely "age" and as "gender". If either is not found, the function looks for the same names but with any lettercase (e.g. "AGE" or "Gender"). If still no "gender" column is found, the function looks for "sex" column in the same manner. If no column is found for either, all related values will be counted as "unknown" (NA).

If NA values are found in either the age or gender column, the ratio (or count) of unknown cases will be displayed everywhere. Otherwise it will simply not be displayed anywhere.

Examples

# below is an illustrative example dataset
# (the "subject" and "measure_x" columns are not used in the function)
dat = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    conditions = c('x', 'y', 'x', 'y', 'y', 'x', 'x', 'x', 'y', 'x'),
    gender = c(2, 2, 1, 2, 1, 2, 2, 2, 1, 1),
    age = c(6, 7, 8.5, 6, 5, 16.5, 17, 16, 45.8, 77),
    measure_x = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    stringsAsFactors = TRUE
)

# print demographics (age and gender) per "conditions":
dems_neat(dat, group_by = 'conditions')

# replace unlikely ages with NAs
dems_neat(dat,
          group_by = 'conditions',
          age_min = 8,
          age_max = 50)

# remove only high values, and display age ranges
dems_neat(dat,
          group_by = 'conditions',
          age_max = 45,
          age_range = TRUE)

# another dataset, with some missing values
dat = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    conditions = c('x', 'y', 'x', 'y', 'y', 'x', 'x', 'x', 'y', 'x'),
    gender = c(2, 2, NA, NA, 1, 1, 1, 2, NA, NA),
    age = c(6, 7, 8.5, 6, 5, 16, NA, 16, 45, 77),
    measure_x = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    stringsAsFactors = TRUE
)
# again print demographics per "conditions":
dems_neat(dat, group_by = 'conditions')

# another dataset, with no "age"/"gender" columns
dat = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    conditions = c('x', 'y', 'x', 'y', 'y', 'x', 'x', 'x', 'y', 'x'),
    geschlecht = c(2, 2, NA, NA, 1, 1, 1, 2, NA, NA),
    alter = c(6, 7, 8.5, 6, 5, 16, NA, 16, 45, 77),
    measure_y = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    stringsAsFactors = TRUE
)

# the following will return "unknowns"
dems_neat(dat, group_by = 'conditions')

# gender column specified
dems_neat(dat, group_by = 'conditions', gender_col = 'geschlecht')

# both columns specified
dems_neat(dat,
          group_by = 'conditions',
          age_col = 'alter',
          gender_col = 'geschlecht')

# below is an illustrative example dataset
# (the "subject" and "measure_x" columns are not used in the function)
dat = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    conditions = c('x', 'y', 'x', 'y', 'y', 'x', 'x', 'x', 'y', 'x'),
    gender = c(2, 2, 1, 2, 1, 2, 2, 2, 1, 1),
    age = c(6, 7, 8.5, 6, 5, 16.5, 17, 16, 45.8, 77),
    measure_x = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    stringsAsFactors = TRUE
)

# print demographics (age and gender) per "conditions":
dems_neat(dat, group_by = 'conditions')

# replace unlikely ages with NAs
dems_neat(dat,
          group_by = 'conditions',
          age_min = 8,
          age_max = 50)

# remove only high values, and display age ranges
dems_neat(dat,
          group_by = 'conditions',
          age_max = 45,
          age_range = TRUE)

# another dataset, with some missing values
dat = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    conditions = c('x', 'y', 'x', 'y', 'y', 'x', 'x', 'x', 'y', 'x'),
    gender = c(2, 2, NA, NA, 1, 1, 1, 2, NA, NA),
    age = c(6, 7, 8.5, 6, 5, 16, NA, 16, 45, 77),
    measure_x = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    stringsAsFactors = TRUE
)
# again print demographics per "conditions":
dems_neat(dat, group_by = 'conditions')

# another dataset, with no "age"/"gender" columns
dat = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    conditions = c('x', 'y', 'x', 'y', 'y', 'x', 'x', 'x', 'y', 'x'),
    geschlecht = c(2, 2, NA, NA, 1, 1, 1, 2, NA, NA),
    alter = c(6, 7, 8.5, 6, 5, 16, NA, 16, 45, 77),
    measure_y = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    stringsAsFactors = TRUE
)

# the following will return "unknowns"
dems_neat(dat, group_by = 'conditions')

# gender column specified
dems_neat(dat, group_by = 'conditions', gender_col = 'geschlecht')

# both columns specified
dems_neat(dat,
          group_by = 'conditions',
          age_col = 'alter',
          gender_col = 'geschlecht')

Enumerate

Description

Aids enumeration and merging (via rbind_loop) in loops: adds numbers to a vector input, and indicates loop start for rbind_loop.

Usage

enum(items, hush = FALSE, enumerate = TRUE)
enum(items, hush = FALSE, enumerate = TRUE)

Arguments

`items`	The items to be enumerated in the loop.
`hush`	Logical. If `TRUE` (default), prints "Loop started." when executed.
`enumerate`	Logical. If `TRUE` (default), adds numbering to the input vector (in item pairs, see Examples). If `FALSE`, returns the original input.

Value

Vector with numbers added (if so set).

Examples


my_vector = c('aa', 'bb', 'cxyz', 'last')

for (item in enum(my_vector)) {
    print(item)
}

# just to show what enum() returns
enum(my_vector)

my_vector = c('aa', 'bb', 'cxyz', 'last')

for (item in enum(my_vector)) {
    print(item)
}

# just to show what enum() returns
enum(my_vector)

Exclusion

Description

Filters dataset by rows (normally: subjects, observations) and prints the numbers of excluded rows and remaining rows. Returns the filtered dataset and (optionally) also the excluded rows.

Usage

excl_neat(
  dat,
  filt,
  excluded = FALSE,
  group_by = NULL,
  sort_by = "exclusion",
  hush = FALSE
)
excl_neat(
  dat,
  filt,
  excluded = FALSE,
  group_by = NULL,
  sort_by = "exclusion",
  hush = FALSE
)

Arguments

`dat`	Data frame to be filtered.
`filt`	An expression to use for filtering, by column values, the `dat` data frame. (Only the rows for which the filter expression is `TRUE` will be kept.)
`excluded`	Logical; `FALSE` by default. If `TRUE`, the function returns not only the filtered data frame, but also a data frame containing the excluded rows. The returned object in this case will be a list with two elements: (1) the filtered data frame named `filtered`, and (2) the data frame with excluded rows named `excluded` (see Examples).
`group_by`	String, or vector of strings: the name(s) of the column(s) in the `dat` data frame, containing the vector(s) of factors by which the printed counts are grouped.
`sort_by`	String; specifies whether the printed counts should be sorted by exclusion (default; `"exclusion"` or its short forms, e.g. `"excl"`), or by the factors given for `group_by` (for this, give any other string, e.g. `"conditions"`). If `NULL` (default).
`hush`	Logical. If `TRUE`, prevents printing counts to console.

Value

A data frame with the rows for which the filt expression is TRUE, or, optionally, a list with this data frame plus a data frame with the excluded rows. At the same time, prints, by default, the count of remaining and excluded rows.

Examples


data("mtcars") # load base R example dataset

# filter mtcars for mpg > 20
excl_neat(mtcars, mpg > 20)

# assign the same
mtcars_filtered = excl_neat(mtcars, mpg > 20)
# (mtcars_filtered now contains the filtered subset)

# return and assign excluded rows too
mtcars_filtered_plus_excluded = excl_neat(mtcars, mpg > 20, excluded = TRUE)

# print filtered data frame
print(mtcars_filtered_plus_excluded$filtered)

# print data frame with excluded rows
print(mtcars_filtered_plus_excluded$excluded)

# group printed count by cyl
excl_neat(mtcars, mpg > 20, group_by = 'cyl')

# sort output by grouping
excl_neat(mtcars, mpg > 20, group_by = 'cyl', sort_by = 'group')

# group by cyl amd carb
excl_neat(mtcars, mpg > 15, group_by = c('cyl', 'carb'))

# longer filter expression
excl_neat(mtcars, mpg > 15 & gear == 4, group_by = 'cyl',)

data("mtcars") # load base R example dataset

# filter mtcars for mpg > 20
excl_neat(mtcars, mpg > 20)

# assign the same
mtcars_filtered = excl_neat(mtcars, mpg > 20)
# (mtcars_filtered now contains the filtered subset)

# return and assign excluded rows too
mtcars_filtered_plus_excluded = excl_neat(mtcars, mpg > 20, excluded = TRUE)

# print filtered data frame
print(mtcars_filtered_plus_excluded$filtered)

# print data frame with excluded rows
print(mtcars_filtered_plus_excluded$excluded)

# group printed count by cyl
excl_neat(mtcars, mpg > 20, group_by = 'cyl')

# sort output by grouping
excl_neat(mtcars, mpg > 20, group_by = 'cyl', sort_by = 'group')

# group by cyl amd carb
excl_neat(mtcars, mpg > 15, group_by = c('cyl', 'carb'))

# longer filter expression
excl_neat(mtcars, mpg > 15 & gear == 4, group_by = 'cyl',)

Confidence Interval of Mean

Description

Calculates confidence interval of a vector of numbers.

Usage

mean_ci(x, distance_only = TRUE, ci = 0.95)
mean_ci(x, distance_only = TRUE, ci = 0.95)

Arguments

`x`	Numeric vector.
`distance_only`	Logical. If `TRUE` (default), the function returns only the distance between the mean and either confidence interval limit. Otherwise returns the confidence interval (i.e., both limits).
`ci`	Numeric; confidence level for returned CI.

Value

Distance of limit or confidence interval (as named vector).

Examples

myvec = c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 )
mean_ci( myvec, FALSE )
mean_ci( myvec, FALSE, ci = .80 )
mean_ci( myvec, ci = .80 )

myvec = c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 )
mean_ci( myvec, FALSE )
mean_ci( myvec, FALSE, ci = .80 )
mean_ci( myvec, ci = .80 )

Monitor Screen Unit Conversion

Description

Given a specific monitor object, converts specified screen units to other specified units. The possible units to convert from and to: "cm" (centimeters), "pix" (pixels), or "deg" (degrees of visual angle).

Usage

mon_conv(mon_obj, value, from, to)
mon_conv(mon_obj, value, from, to)

Arguments

`mon_obj`	Monitor object, as assigned with `mon_neat`.
`value`	Number; value (magnitude) of the given unit to convert from. (Can be vector as well.)
`from`	String; unit ("cm", "pix", or "deg") to convert from.
`to`	String; unit ("cm", "pix", or "deg") to convert to.

Value

Number (magnitude) in the given output (to) unit.

Examples

# assign monitor with 50 cm distance, screen width 52 cm and 1920 pixels
my_mon = mon_neat(distance = 50, mon_width_cm = 52, mon_width_pixel = 1920)

# convert 30.4 pixels to degrees of visual angle, for the specified monitor
mon_conv(my_mon, 30.4, 'pix', 'deg') # returns 0.9434492 (degrees)

# convert 0.94 degrees of visual angle to pixels
mon_conv(my_mon, 0.94, 'deg', 'pix') # returns 30.28885 (pixels)

# convert 10 degrees of visual angle to cm
mon_conv(my_mon, 10, from = 'deg', to = 'cm')

# convert 8.748866 cm to pixels
mon_conv(my_mon, 8.748866, from = 'cm', to = 'pix')
# assign monitor with 50 cm distance, screen width 52 cm and 1920 pixels
my_mon = mon_neat(distance = 50, mon_width_cm = 52, mon_width_pixel = 1920)

# convert 30.4 pixels to degrees of visual angle, for the specified monitor
mon_conv(my_mon, 30.4, 'pix', 'deg') # returns 0.9434492 (degrees)

# convert 0.94 degrees of visual angle to pixels
mon_conv(my_mon, 0.94, 'deg', 'pix') # returns 30.28885 (pixels)

# convert 10 degrees of visual angle to cm
mon_conv(my_mon, 10, from = 'deg', to = 'cm')

# convert 8.748866 cm to pixels
mon_conv(my_mon, 8.748866, from = 'cm', to = 'pix')

Monitor Object

Description

Assigns a monitor object, storing distance and width parameters.

Usage

mon_neat(distance, mon_width_cm, mon_width_pixel)
mon_neat(distance, mon_width_cm, mon_width_pixel)

Arguments

`distance`	Viewing distance in cm (from eyes to screen).
`mon_width_cm`	Monitor screen width in cm.
`mon_width_pixel`	Monitor screen width in pixels.

Value

A monitor object with the specified parameters, to be used in the mon_conv function.

Examples

# assign monitor with 57 cm viewing distance, screen width 52 cm and 1920 pixels
my_mon = mon_neat(distance = 57, mon_width_cm = 52, mon_width_pixel = 1920)
# assign monitor with 57 cm viewing distance, screen width 52 cm and 1920 pixels
my_mon = mon_neat(distance = 57, mon_width_cm = 52, mon_width_pixel = 1920)

Normality Tests and Plots

Description

Performs normality tests and creates related plots (histogram, density, Q-Q). This is primarily a subfunction of t_neat and anova_neat, but here it is available separately for other potential purposes.

Usage

norm_tests(
  var1,
  var2 = NULL,
  pair = FALSE,
  norm_tests = "all",
  alpha = 0.05,
  plots = FALSE,
  aspect_ratio = 1,
  hush = FALSE
)
norm_tests(
  var1,
  var2 = NULL,
  pair = FALSE,
  norm_tests = "all",
  alpha = 0.05,
  plots = FALSE,
  aspect_ratio = 1,
  hush = FALSE
)

Arguments

`var1`	Numeric vector; numbers of any given variable.
`var2`	Optional numeric vector (or `NULL`); numbers of a second variable.
`pair`	Logical; only matters if `var2` is not null. In that case, if `TRUE` each normality test is performed for the difference values between the two variables in case of paired samples, or, if `FALSE`, separately for each of the two variables for unpaired samples.
`norm_tests`	Normality tests. Any or all of the following character input is accepted (as a single string or a character vector; case-insensitive): `"W"` (Shapiro-Wilk), `"K2"` (D'Agostino), `"A2"` (Anderson-Darling), `"JB"` (Jarque-Bera); see Notes. The option `"all"` (default value) selects all four previous tests at the same time.
`alpha`	Numeric (`.05` by default), alpha level: if any p value if below this alpha level, the function returns `TRUE`, otherwise `FALSE`.
`plots`	Logical: if `TRUE` adds histogram, density, and Q-Q plots. (Note: in case of paired samples, Q-Q plots are plotted on a separate figure. In RStudio, press on "Previous plot" under "Plots" to see these Q-Q plots.)
`aspect_ratio`	Aspect ratio of the plots: `1` (`1`/`1`) by default. (Set to `NULL` for dynamic aspect ratio.)
`hush`	Logical. If `TRUE`, prevents printing any details to console.

Value

Prints normality tests, and displays plots if so specified. Returns TRUE if any of the specified tests has p value below the specified alpha, otherwise returns FALSE.

Note

Normality tests are all calculated via fBasics::NormalityTests, selected based on the recommendation of Lakens (2015), quoting Yap and Sim (2011, p. 2153): "If the distribution is symmetric with low kurtosis values (i.e. symmetric short-tailed distribution), then the D'Agostino and Shapiro-Wilkes tests have good power. For symmetric distribution with high sample kurtosis (symmetric long-tailed), the researcher can use the JB, Shapiro-Wilkes, or Anderson-Darling test." See urlhttps://github.com/Lakens/perfect-t-test for more details.

References

Lakens, D. (2015). The perfect t-test (version 1.0.0). Retrieved from https://github.com/Lakens/perfect-t-test. doi:10.5281/zenodo.17603

Yap, B. W., & Sim, C. H. (2011). Comparisons of various types of normality tests. Journal of Statistical Computation and Simulation, 81(12), 2141–2155. doi:10.1080/00949655.2010.520163

Examples


norm_tests(stats::rnorm(100))
# should be normal...

norm_tests(stats::rnorm(100))
# should be normal...

Script Path

Description

Gives, in RStudio, the path to the script file in which it is executed.

Usage

path_neat(subdir = "")
path_neat(subdir = "")

Arguments

subdir

String, optional. Subdirectory relative to the script's path.

Value

Script file's path as string. If subdir is given, it is appended to the original path.

Examples


# assuming the given script is at path "C:/script_folder/"
path_neat('') # returns "C:/script_folder/"
path_neat('my_subdir/misc/') # returns "C:/script_folder/my_subdir/misc/"

# Note: the returned string can be used as argument for base::setwd()
# e.g. setwd( path_neat() ) # sets working directory to the script's path

# assuming the given script is at path "C:/script_folder/"
path_neat('') # returns "C:/script_folder/"
path_neat('my_subdir/misc/') # returns "C:/script_folder/my_subdir/misc/"

# Note: the returned string can be used as argument for base::setwd()
# e.g. setwd( path_neat() ) # sets working directory to the script's path

Cursory Summaries and Plots per Group

Description

Cursory summaries and plots per group.

Usage

peek_neat(
  dat,
  values,
  group_by = NULL,
  filt = NULL,
  sep = ", ",
  collapse = NULL,
  f_print = FALSE,
  f_plot = NULL,
  iqr_times = 3,
  round_to = 4,
  group_n = TRUE,
  ...
)
peek_neat(
  dat,
  values,
  group_by = NULL,
  filt = NULL,
  sep = ", ",
  collapse = NULL,
  f_print = FALSE,
  f_plot = NULL,
  iqr_times = 3,
  round_to = 4,
  group_n = TRUE,
  ...
)

Arguments

`dat`	Data frame (or name of data frame as string).
`values`	String, or vector of strings: the name(s) of the column(s) in the `dat` data frame, containing the vector(s) of values.
`group_by`	String, or vector of strings: the name(s) of the column(s) in the `dat` data frame, containing the vector(s) of factors by which the statistics are grouped.
`filt`	An expression to filter, by column values, the entire `dat` data frame before performing the aggregation. The expression should use column names alone; see Examples.
`sep`	String (comma by default) for separating group names.
`collapse`	Decides how to handle multiple columns of `values`. If `NULL` (default), displays each column of values as separate groups. Alternatively, any function can be given using which the columns are collapsed into a single column. For example, if `mean` is given for this parameter, a single column will be calculated based on the means of all given values columns. (`NA` is always ignored.)
`f_print`	Printing function; see details.
`f_plot`	Plotting function; see details. (Provide string to skip plotting.)
`iqr_times`	The multiplication of IQR to calculate Tukey's fences, when using default `f_print` function (set to `TRUE`); see Details. The default is `3` to spot Tukey's "far outliers" (e.g. by comparing the fences with min and max values). (Note that the usual fences, e.g. for box plots, use `1.5`).
`round_to`	Number of `significant fractional digits to round to`, when using default `f_print` function (set to `TRUE`).
`group_n`	Logical. If `TRUE`, adds sample sizes (`n`) per group to plots when using default `f_plot` (`NULL`).
`...`	Any arguments to be passed to the `f_plot` function.

Details

If set to TRUE, prints to console the following data (per group): mean; 95 median; quantile_1st and quantile_3rd (first and third quantiles); "Tukey's fences" as fence_low and fence_upp; minimum and maximum values (min, max); number of NAs (na). Tukey's fences are the upper and lower limits with distances of X times the IQR from the actual IQR, where X is specified via the iqr_times parameter. Returns (invisibly) the same values, unrounded, via a data frame. If alternative f_print is given, prints whatever value is returned from the given function (and attempts, if possible, to create a data frame).

Creates and displays box plot(s) (per group) by default, along with overlayed violin plot (densities proportionate to sample sizes). If alternative f_plot is given, the first argument will be the values per group, and all plots will be arranged into a single plot and displayed together. To skip plotting, just give any character as argument (e.g. "none" or just "").

Value

Data frame with the printed values (if possible).

Examples


data("mtcars") # load base R example dataset

# overall info for wt (Weight)
peek_neat(mtcars, 'wt', f_print = TRUE)
#'
# now groupped by cyl (Number of cylinders)
peek_neat(mtcars, 'wt', group_by = 'cyl')

# grouped by cyl and gear
peek_neat(mtcars,
          'wt',
          group_by = c('cyl', 'gear'),
          f_print = TRUE)

# filter to only have cyl larger than  4
peek_neat(mtcars, 'wt', group_by = 'cyl', filt = cyl > 4)

# without plots
peek_neat(mtcars,
          'wt',
          group_by = 'cyl',
          f_plot = "",
          f_print = TRUE)

# with histogramms etc, using plot_neat
peek_neat(mtcars, 'wt', group_by = 'cyl', f_plot = plot_neat)

# with Q-Q plots, via ggpubr
peek_neat(mtcars,
          'wt',
          group_by = 'cyl',
          f_plot = ggpubr::ggqqplot)

# skewness and kurtosis data via psych
## Not run: 
info_df = peek_neat(
    mtcars,
    'wt',
    group_by = 'cyl',
    f_print = psych::describe,
    f_plot = ""
)
info_df # contains all data returns by psych::describe

## End(Not run)

data("mtcars") # load base R example dataset

# overall info for wt (Weight)
peek_neat(mtcars, 'wt', f_print = TRUE)
#'
# now groupped by cyl (Number of cylinders)
peek_neat(mtcars, 'wt', group_by = 'cyl')

# grouped by cyl and gear
peek_neat(mtcars,
          'wt',
          group_by = c('cyl', 'gear'),
          f_print = TRUE)

# filter to only have cyl larger than  4
peek_neat(mtcars, 'wt', group_by = 'cyl', filt = cyl > 4)

# without plots
peek_neat(mtcars,
          'wt',
          group_by = 'cyl',
          f_plot = "",
          f_print = TRUE)

# with histogramms etc, using plot_neat
peek_neat(mtcars, 'wt', group_by = 'cyl', f_plot = plot_neat)

# with Q-Q plots, via ggpubr
peek_neat(mtcars,
          'wt',
          group_by = 'cyl',
          f_plot = ggpubr::ggqqplot)

# skewness and kurtosis data via psych
## Not run: 
info_df = peek_neat(
    mtcars,
    'wt',
    group_by = 'cyl',
    f_print = psych::describe,
    f_plot = ""
)
info_df # contains all data returns by psych::describe

## End(Not run)

Plots of Means and of Dispersion

Description

Primarily for line and bar plots for factorial designs. Otherwise (if no data_per_subject is given) descriptive dispersion plots (histogram, density, or box plots) for a continuous variable. (For the latter, only the parameters values, parts, part_colors, and binwidth are used, the rest are ignored.)

Usage

plot_neat(
  data_per_subject = NULL,
  values = NULL,
  within_ids = NULL,
  between_vars = NULL,
  factor_names = NULL,
  value_names = NULL,
  y_title = NULL,
  reverse = FALSE,
  panels = NULL,
  type = "line",
  dodge = NULL,
  bar_colors = "viridis",
  line_colors = "viridis",
  row_number = 1,
  method = mean,
  eb_method = neatStats::mean_ci,
  numerics = FALSE,
  hush = FALSE,
  parts = c("h", "d", "n", "b"),
  part_colors = NULL,
  binwidth = NULL
)
plot_neat(
  data_per_subject = NULL,
  values = NULL,
  within_ids = NULL,
  between_vars = NULL,
  factor_names = NULL,
  value_names = NULL,
  y_title = NULL,
  reverse = FALSE,
  panels = NULL,
  type = "line",
  dodge = NULL,
  bar_colors = "viridis",
  line_colors = "viridis",
  row_number = 1,
  method = mean,
  eb_method = neatStats::mean_ci,
  numerics = FALSE,
  hush = FALSE,
  parts = c("h", "d", "n", "b"),
  part_colors = NULL,
  binwidth = NULL
)

Arguments

`data_per_subject`	Data frame containing all values (measurements/observations for a factorial design) in a single row per each subject. Otherwise, if no data frame is given (default: `NULL`), histogram, density, or box plots will be returned for a continuous variable (numeric vector).
`values`	For plots of means (factorial designs): vector of strings; column name(s) in the `data_per_subject` data frame. Each column should contain a single dependent variable: thus, to plot repeated (within-subject) measurements, each specified column should contain one measurement. For descriptive dispersion plots (if `data_per_subject` is `NULL`), a numeric vector is expected.
`within_ids`	`NULL` (default), string, or named list. In case of no within-subject factors, leave as `NULL`. In case of a single within subject factor, a single string may be given to optionally provide custom name for the within-subject factor (note: this is a programming variable name, so it should not contain spaces, etc.); otherwise (if left `NULL`) this one within-subject factor will always just be named `"within_factor"`. In case of multiple within-subject factors, each factor must be specified as a named list element, each with a vector of strings that distinguish the levels within that factors. The column names given as `values` should always contain one (and only one) of these strings within each within-subject factor, and thus they will be assigned the appropriate level. For example, `values = 'rt_s1_neg, rt_s1_pos, rt_s2_neg, rt_s2_pos'` could have `within_ids = list( session = c('s1', 's2'), valence = c('pos', 'neg')`. (Note: the strings for distinguishing must be unambiguous. E.g., for values `apple_a` and `apple_b`, do not set levels `c('a','b')`, because `'a'` is also found in `apple_b`. In this case, you could choose levels `c('_a','_b')` to make sure the values are correctly distinguished.) See also Examples.
`between_vars`	`NULL` (default; in case of no between-subject factors) or vector of strings; column name(s) in the `data_per_subject` data frame. Each column should contain a single between-subject independent variable (representing between-subject factors).
`factor_names`	`NULL` or named vector. In a named vector, factor names (either within or between) can be given a different name for display, in a dictionary style, using original factor name as the name of a vector element, and the element's value (as string) for the new name. For example, to change a factor named `"condition"` to `"High vs. low arousal"`, the vector may be given (in this case with a single element) as `factor_names = c(condition = "High vs. low arousal")`.
`value_names`	`NULL` or named vector. Same as `factor_names`, but regarding the factor values. For example, to change values `"high_a"` and `"low_a"` to `"High"` and `"Low"` for display, the vector may be given as `value_names = c(high_a = "High", low_a = "Low")`.
`y_title`	`NULL` (default) or string. Optionally given title for the `y` axis.
`reverse`	Logical (default: `FALSE`). If `TRUE`, reverses the default grouping of variables within the figure, or within each panel, in case of multiple panels. (The default grouping is decided automatically by given factor order, but always starting, when applicable, with within-subject factors: first factor is split to adjacent bars, or vertically aligned dots in case of line plot.)
`panels`	`NULL` or string. Optionally gives the factor name by which the plot is to be split into different panels, in case of three factors. (By default, the third given factor is used.)
`type`	Strong: `"line"` (default) or `"bar"`. The former gives line plot, the latter gives bar plot.
`dodge`	Number. Specifies the amount by which the adjacent bars or dots '`dodge`' each other (i.e., are displaced compared to each other). (Default is `0.1` for `line` plots, and `0.9` for `bar` plots.)
`bar_colors`	Vector of strings, specifying colors from which all colors for any number of differing adjacent bars are interpolated. (If the number of given colors equal the number of different bars, the precise colors will correspond to each bar.) The default `'viridis'` gives a color gradient based on `viridis`. (In case of a single factor, the first given colors is taken.)
`line_colors`	Vector of strings, specifying colors from which all colors for any number of differing vertically aligned dots and corresponding lines are interpolated. The default `'viridis'` gives a color gradient based on `viridis`. (In case of a single factor, the first given colors is taken.)
`row_number`	Number. In case of multiple panels, the number of rows in which the panels should be arranged. For example, with the default `row_number = 1`, all panels will be displayed in one vertically aligned row.
`method`	A function (default: `mean`) for the calculation of the main statistics (bar or dot heights).
`eb_method`	A function (default: `mean_ci` for 95 the calculation of the error bar size (as a single value used for both directions of the error bar). If set to `NULL`, no error bar is displayed.#'
`numerics`	If `FALSE` (default), returns `ggplot` object. If set to `TRUE`, returns only the numeric aggregated data per grouping factors, as specified by `method` and `eb_method` functions. If set to any string (e.g. `"both"`), returns the numeric aggregated data and at the same time `draws` the plot.
`hush`	Logical. If `TRUE`, prevents printing aggregated values.
`parts`	For dispersion plots only (if no `data_per_subject` is given). A vector of characters that specify which types of overlayed types to plot: `"h"` for histogram, `"d"` for density, `"n"` normally distributed density (using the mean and standard deviation of the given variable), `"b"` for boxplot. (All are included by default: `parts = c("h", "d", "n", "b")`).
`part_colors`	For dispersion plots only (if no `data_per_subject` is given). A named that can specify and thereby override default colors and alpha (transparency) of each plot type. Colors can be given by adding "c" to the plot type letter, e.g. `c(hc = "blue")` for blue histogram. Alpha can be given by adding "a" to the plot type letter, e.g. `c(ha = 0)` for completely transparent histogram. Any number may be given: e.g. a dark red transparent histogram with green boxplot would be `part_colors = c(hc = "#cc0000", ha = 0.1, bc = "green")`.
`binwidth`	For dispersion plots only (if no `data_per_subject` is given). Binwidth for histograms. If `NULL` (default), Freedman–Diaconis rule is used if it produces at least 10 bins – otherwise 1bandwidth is calculated for 10 bins.

Value

By default, a ggplot plot object. (This object may be further modified or adjusted via regular ggplot methods.) If so set (numerics), aggregated values as specified by the methods.

Note

More than three factors is not allowed: it would make little sense and it would be difficult to clearly depict in a simple figure. (However, you can build an appropriate graph using ggplot directly; but you can also just divide the data to produce several three-factor plots, after which you can use e.g. ggpubr's ggarrange to easily collate the plots.)

Examples


# assign random data in a data frame for illustration
# (note that the 'subject' is only for illustration; since each row contains the
# data of a single subject, no additional subject id is needed)
dat_1 = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
    grouping1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
    grouping2 = c(1, 2, 1, 2, 2, 1,2, 1,2,1, 1, 1, 2, 1),
    value_1_a = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1,
                  31, 16.9, 40.1, 42.1, 41, 12.9),
    value_2_a = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5,
                  25.6,-37.1, 55.1,-38.5, 28.6,-34.1),
    value_1_b = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85,
                  132, 121, 151, 95),
    value_2_b = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53,
                  18.37, 0.3,-0.59, 12.53, 13.37, 2.3,-3),
    value_1_c = c(27.4, -17.6, -32.7, 0.4, 37.2, 1.7, 18.2, 8.9,
                  1.9, 0.4, 2.7, 14.2, 3.9, 4.9),
    value_2_c = c(7.7,-0.8, 2.2, 14.1, 22.1,-47.7,-4.8, 8.6,
                  6.2, 18.2,-6.8, 5.6, 7.2, 13.2)
)
head(dat_1) # see what we have

# plot for factors 'grouping1', 'grouping2'
plot_neat(
    data_per_subject = dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2')
)

# same as above, but with bars and renamed factors
plot_neat(
    data_per_subject = dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender')
)

# same, but with different (lighter) gray scale bars
plot_neat(
    dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender'),
    bar_colors = c('#555555', '#BBBBBB')
)

# same, but with red and blue bars
plot_neat(
    dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender'),
    bar_colors = c('red', 'blue') # equals c('#FF0000', '#0000FF')
)

# within-subject factor for 'value_1_a' vs. 'value_1_b' vs. 'value_1_c'
# (automatically named 'within_factor'), between-subject factor 'grouping1'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2')
)

# same, but panelled by 'within_factor'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor'
)

# same, but SE for error bars instead of (default) SD
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    eb_method = se
)

# same, but 95% CI for error bars instead of SE
# (arguably more meaningful than SEs)
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    eb_method = mean_ci
)

# same, but using medians and Median Absolute Deviations
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    method = stats::median,
    eb_method = stats::mad
)

# within-subject factor 'number' for variables with number '1' vs. number '2'
# ('value_1_a' and 'value_1_b' vs. 'value_2_a' and 'value_2_b'), factor 'letter'
# for variables with final letter 'a' vs. final letter 'b' ('value_1_a' and
# 'value_2_a' vs. 'value_1_b' and 'value_2_b')
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    )
)

# same as above, but now including between-subject factor 'grouping2'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2'
)

# same as above, but renaming factors and values for display
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2',
    factor_names = c(numbers = 'session (first vs. second)'),
    value_names = c(
        '_1' = 'first',
        '_2' = 'second',
        '1' = 'group 1',
        '2' = 'group 2'
    )
)

# In real datasets, these could of course be more meaningful. For example, let's
# say participants rated the attractiveness of pictures with low or high levels
# of frightening and low or high levels of disgusting qualities. So there are
# four types of ratings:
# 'low disgusting, low frightening' pictures
# 'low disgusting, high frightening' pictures
# 'high disgusting, low frightening' pictures
# 'high disgusting, high frightening' pictures

# this could be meaningfully assigned e.g. as below
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    rating_fright_low_disgust_low = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    rating_fright_high_disgust_low = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5, 25.6,-37.1),
    rating_fright_low_disgust_high = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    rating_fright_high_disgust_high = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53, 18.37, 0.3,-0.59)
)
head(pic_ratings) # see what we have

# the same logic applies as for the examples above, but now the
# within-subject differences can be more meaningfully specified, e.g.
# 'disgust_low' vs. 'disgust_high' for levels of disgustingness, while
# 'fright_low' vs. 'fright_high' for levels of frighteningness
plot_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    )
)

# now let's say the ratings were done in two separate groups
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    group_id = c(1, 2, 1, 2, 2, 1, 1, 1, 2, 1),
    rating_fright_low_disgust_low = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    rating_fright_high_disgust_low = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5, 25.6,-37.1),
    rating_fright_low_disgust_high = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    rating_fright_high_disgust_high = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53, 18.37, 0.3,-0.59)
)

# now include the 'group_id' factor in the plot
plot_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    ),
    between_vars = 'group_id'
)


## DISPERSION PLOTS

plot_neat(values = rnorm(100))

# with smaller binwidth (hence more bins)
plot_neat(values = rnorm(100), binwidth = 0.2)

# without normal distribution line
plot_neat(values = rnorm(100), parts = c('h', 'd', 'b'))

# without histrogram
plot_neat(values = rnorm(100), parts = c('d', 'n', 'b'))

# blue density, fully opaque histogram
plot_neat(values = rnorm(100),
         part_colors = c(dc = 'blue', ha = 1))

# assign random data in a data frame for illustration
# (note that the 'subject' is only for illustration; since each row contains the
# data of a single subject, no additional subject id is needed)
dat_1 = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
    grouping1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
    grouping2 = c(1, 2, 1, 2, 2, 1,2, 1,2,1, 1, 1, 2, 1),
    value_1_a = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1,
                  31, 16.9, 40.1, 42.1, 41, 12.9),
    value_2_a = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5,
                  25.6,-37.1, 55.1,-38.5, 28.6,-34.1),
    value_1_b = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85,
                  132, 121, 151, 95),
    value_2_b = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53,
                  18.37, 0.3,-0.59, 12.53, 13.37, 2.3,-3),
    value_1_c = c(27.4, -17.6, -32.7, 0.4, 37.2, 1.7, 18.2, 8.9,
                  1.9, 0.4, 2.7, 14.2, 3.9, 4.9),
    value_2_c = c(7.7,-0.8, 2.2, 14.1, 22.1,-47.7,-4.8, 8.6,
                  6.2, 18.2,-6.8, 5.6, 7.2, 13.2)
)
head(dat_1) # see what we have

# plot for factors 'grouping1', 'grouping2'
plot_neat(
    data_per_subject = dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2')
)

# same as above, but with bars and renamed factors
plot_neat(
    data_per_subject = dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender')
)

# same, but with different (lighter) gray scale bars
plot_neat(
    dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender'),
    bar_colors = c('#555555', '#BBBBBB')
)

# same, but with red and blue bars
plot_neat(
    dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender'),
    bar_colors = c('red', 'blue') # equals c('#FF0000', '#0000FF')
)

# within-subject factor for 'value_1_a' vs. 'value_1_b' vs. 'value_1_c'
# (automatically named 'within_factor'), between-subject factor 'grouping1'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2')
)

# same, but panelled by 'within_factor'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor'
)

# same, but SE for error bars instead of (default) SD
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    eb_method = se
)

# same, but 95% CI for error bars instead of SE
# (arguably more meaningful than SEs)
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    eb_method = mean_ci
)

# same, but using medians and Median Absolute Deviations
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    method = stats::median,
    eb_method = stats::mad
)

# within-subject factor 'number' for variables with number '1' vs. number '2'
# ('value_1_a' and 'value_1_b' vs. 'value_2_a' and 'value_2_b'), factor 'letter'
# for variables with final letter 'a' vs. final letter 'b' ('value_1_a' and
# 'value_2_a' vs. 'value_1_b' and 'value_2_b')
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    )
)

# same as above, but now including between-subject factor 'grouping2'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2'
)

# same as above, but renaming factors and values for display
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2',
    factor_names = c(numbers = 'session (first vs. second)'),
    value_names = c(
        '_1' = 'first',
        '_2' = 'second',
        '1' = 'group 1',
        '2' = 'group 2'
    )
)

# In real datasets, these could of course be more meaningful. For example, let's
# say participants rated the attractiveness of pictures with low or high levels
# of frightening and low or high levels of disgusting qualities. So there are
# four types of ratings:
# 'low disgusting, low frightening' pictures
# 'low disgusting, high frightening' pictures
# 'high disgusting, low frightening' pictures
# 'high disgusting, high frightening' pictures

# this could be meaningfully assigned e.g. as below
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    rating_fright_low_disgust_low = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    rating_fright_high_disgust_low = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5, 25.6,-37.1),
    rating_fright_low_disgust_high = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    rating_fright_high_disgust_high = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53, 18.37, 0.3,-0.59)
)
head(pic_ratings) # see what we have

# the same logic applies as for the examples above, but now the
# within-subject differences can be more meaningfully specified, e.g.
# 'disgust_low' vs. 'disgust_high' for levels of disgustingness, while
# 'fright_low' vs. 'fright_high' for levels of frighteningness
plot_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    )
)

# now let's say the ratings were done in two separate groups
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    group_id = c(1, 2, 1, 2, 2, 1, 1, 1, 2, 1),
    rating_fright_low_disgust_low = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    rating_fright_high_disgust_low = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5, 25.6,-37.1),
    rating_fright_low_disgust_high = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    rating_fright_high_disgust_high = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53, 18.37, 0.3,-0.59)
)

# now include the 'group_id' factor in the plot
plot_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    ),
    between_vars = 'group_id'
)


## DISPERSION PLOTS

plot_neat(values = rnorm(100))

# with smaller binwidth (hence more bins)
plot_neat(values = rnorm(100), binwidth = 0.2)

# without normal distribution line
plot_neat(values = rnorm(100), parts = c('h', 'd', 'b'))

# without histrogram
plot_neat(values = rnorm(100), parts = c('d', 'n', 'b'))

# blue density, fully opaque histogram
plot_neat(values = rnorm(100),
         part_colors = c(dc = 'blue', ha = 1))

Difference of Two Proportions

Description

Comparison of paired and unpaired proportions. For unpaired: Pearson's chi-squared test or unconditional exact test, including confidence interval (CI) for the proportion difference, and corresponding independent multinomial contingency table Bayes factor (BF). (Cohen's h and its CI are also calculated.) For paired tests, classical (asymptotic) McNemar test (optionally with mid-P as well), including confidence interval (CI) for the proportion difference.

Usage

props_neat(
  var1 = NULL,
  var2 = NULL,
  case1 = NULL,
  case2 = NULL,
  control1 = NULL,
  control2 = NULL,
  prop1 = NULL,
  prop2 = NULL,
  n1 = NULL,
  n2 = NULL,
  pair = FALSE,
  greater = NULL,
  ci = NULL,
  bf_added = FALSE,
  round_to = 3,
  exact = FALSE,
  inverse = FALSE,
  yates = FALSE,
  midp = FALSE,
  h_added = FALSE,
  for_table = FALSE,
  hush = FALSE
)
props_neat(
  var1 = NULL,
  var2 = NULL,
  case1 = NULL,
  case2 = NULL,
  control1 = NULL,
  control2 = NULL,
  prop1 = NULL,
  prop2 = NULL,
  n1 = NULL,
  n2 = NULL,
  pair = FALSE,
  greater = NULL,
  ci = NULL,
  bf_added = FALSE,
  round_to = 3,
  exact = FALSE,
  inverse = FALSE,
  yates = FALSE,
  midp = FALSE,
  h_added = FALSE,
  for_table = FALSE,
  hush = FALSE
)

Arguments

`var1`	First variable containing classifications, in 'group 1', for the first proportion (see Examples). If given (strictly necessary for paired proportions), proportions will be defined using `var1` and `var2` (see Details). To distinguish classification ('cases' and 'controls'; e.g. positive outcomes vs. negative outcomes), any two specific characters (or numbers) can be used. However, more than two different elements (apart from `NA`s) will cause error.
`var2`	Second variable containing classifications in, 'group 2', for the second proportion, analogously to `var1`.
`case1`	Number of 'cases' (as opposed to 'controls'; e.g. positive outcomes vs. negative outcomes) in 'group 1'. As counterpart, either control numbers or sample sizes needs to be given (see Details).
`case2`	Number of 'cases' in 'group 2'.
`control1`	Number of 'controls' in 'group 1'. As counterpart, case numbers need to be given (see Details).
`control2`	Number of 'controls' in 'group 2'.
`prop1`	Proportion in 'group 1'. As counterpart, sample sizes need to be given (see Details).
`prop2`	Proportion in 'group 2'.
`n1`	Number; sample size of 'group 1'.
`n2`	Number; sample size of 'group 2'.
`pair`	Logical. Set `TRUE` for paired proportions (McNemar, mid-P), or `FALSE` (default) for unpaired (chi squared, or unconditional exact test). Note: paired data must be given in `var1` and `var2`.
`greater`	`NULL` or string (or number); optionally specifies one-sided exact test: either "1" (`case1/n1` proportion expected to be greater than `case2/n2` proportion) or "2" (`case2/n2` proportion expected to be greater than `case1/n1` proportion). If `NULL` (default), the test is two-sided.
`ci`	Numeric; confidence level for the returned CIs (proportion difference and Cohen's h).
`bf_added`	Logical. If `TRUE`, Bayes factor is calculated and displayed. (Always two-sided!)
`round_to`	Number `to round` to the proportion statistics (difference and CIs).
`exact`	Logical, `FALSE` by default. If `TRUE`, `unconditional exact test` is calculated and displayed, otherwise the default Pearson's `chi-squared test`.
`inverse`	Logical, `FALSE` by default. When `var1` and `var2` are given to calculate proportion from, by default the factors' frequency determines which are 'cases' and which are 'controls' (so that the latter are more frequent). If the `inverse` argument is `TRUE`, it reverses the default proportion direction.
`yates`	Logical, `FALSE` by default. If `TRUE`, Yates' continuity correction is applied to the chi-squared (unpaired) or the McNemar (paired) test. Some authors advise this correction for certain specific cases (e.g., small sample), but evidence does not seem to support this (Pembury Smith & Ruxton, 2020).
`midp`	Logical, `FALSE` by default. If `TRUE`, displays an additional 'mid-P' p value (using the formula by Pembury Smith & Ruxton, 2020) for McNemar's test (Fagerland et al., 2013). This provides better control for Type I error (less false positive findings) than the classical McNemar test, while it is also probably not much less robust (Pembury Smith & Ruxton, 2020).
`h_added`	Logical. If `TRUE`, Cohen's h and its CI are calculated and displayed. (`FALSE` by default.)
`for_table`	Logical. If `TRUE`, omits the confidence level display from the printed text.
`hush`	Logical. If `TRUE`, prevents printing any details to console.

Details

The proportion for the two groups can be given using any of the following combinations (a) two vectors (var1 and var2), (b) cases and controls, (c) cases and sample sizes, or (d) proportions and sample sizes. Whenever multiple combinations are specified, only the first parameters (as given in the function and in the previous sentence) will be taken into account.

The Bayes factor (BF), in case of unpaired samples, is always calculated with the default r-scale of 0.707. BF supporting null hypothesis is denoted as BF01, while that supporting alternative hypothesis is denoted as BF10. When the BF is smaller than 1 (i.e., supports null hypothesis), the reciprocal is calculated (hence, BF10 = BF, but BF01 = 1/BF). When the BF is greater than or equal to 10000, scientific (exponential) form is reported for readability. (The original full BF number is available in the returned named vector as bf.)

Value

Prints exact test statistics (including proportion difference with CI, and BF) in APA style. Furthermore, when assigned, returns a named vector with the following elements: z (Z), p (p value), prop_diff (raw proportion difference), h (Cohen's h), bf (Bayes factor).

Note

Barnard's unconditional exact test is calculated via Exact::exact.test ("z-pooled").

The CI for the proportion difference in case of the exact test is calculated based on the p value, as described by Altman and Bland (2011). In case of extremely large or extremely small p values, this can be biased and misleading.

The Bayes factor is calculated via BayesFactor::contingencyTableBF, with sampleType = "indepMulti", as appropriate when both sample sizes (n1 and n2) are known in advance (as it normally happens). (For details, see contingencyTableBF, or e.g. 'Chapter 17 Bayesian statistics' in Navarro, 2019.)

References

Altman, D. G., & Bland, J. M. (2011). How to obtain the confidence interval from a P value. Bmj, 343(d2090). doi:10.1136/bmj.d2090

Barnard, G. A. (1947). Significance tests for 2x2 tables. Biometrika, 34(1/2), 123-138. doi:10.1093/biomet/34.1-2.123

Fagerland, M. W., Lydersen, S., & Laake, P. (2013). The McNemar test for binary matched-pairs data: Mid-p and asymptotic are better than exact conditional. BMC Medical Research Methodology, 13(1), 91. doi:10.1186/1471-2288-13-91

Lydersen, S., Fagerland, M. W., & Laake, P. (2009). Recommended tests for association in 2x2 tables. Statistics in medicine, 28(7), 1159-1175. doi:10.1002/sim.3531

Navarro, D. (2019). Learning statistics with R. https://learningstatisticswithr.com/

Pembury Smith, M. Q. R., & Ruxton, G. D. (2020). Effective use of the McNemar test. Behavioral Ecology and Sociobiology, 74(11), 133. doi:10.1007/s00265-020-02916-y

Suissa, S., & Shuster, J. J. (1985). Exact unconditional sample sizes for the 2 times 2 binomial trial. Journal of the Royal Statistical Society: Series A (General), 148(4), 317-327. doi:10.2307/2981892

Examples

# example data
set.seed(1)
outcomes_A = sample(c(rep('x', 490), rep('y', 10)))
outcomes_B = sample(c(rep('x', 400), rep('y', 100)))

# paired proportion test (McNemar)
props_neat(var1 = outcomes_A,
           var2 = outcomes_B,
           pair = TRUE)

# unpaired chi test for the same data (two independent samples assumed)
# Yates correction applied
# cf. http://www.sthda.com/english/wiki/two-proportions-z-test-in-r
props_neat(
    var1 = outcomes_A,
    var2 = outcomes_B,
    pair = FALSE,
    yates = TRUE
)

# above data given differently for unpaired test
# (no Yates corrrection)
props_neat(
    case1 = 490,
    case2 = 400,
    control1 = 10,
    control2 = 100
)

# again differently
props_neat(
    case1 = 490,
    case2 = 400,
    n1 = 500,
    n2 = 500
)

# other example data
outcomes_A2 = c(rep(1, 707), rep(0, 212),  rep(1, 256), rep(0, 144))
outcomes_B2 = c(rep(1, 707), rep(0, 212),  rep(0, 256), rep(1, 144))

# paired test
# cf. https://www.medcalc.org/manual/mcnemartest2.php
props_neat(var1 = outcomes_A2,
           var2 = outcomes_B2,
           pair = TRUE)

# show reverse proportions (otherwise the same)
props_neat(
    var1 = outcomes_A2,
    var2 = outcomes_B2,
    pair = TRUE,
    inverse = TRUE
)


# two different sample sizes
out_chi = props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170
)

# exact test
out_exact = props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170,
    exact = TRUE
)

# the two p values are just tiny bit different
print(out_chi) # p 0.00638942
print(out_exact) # p 0.006481884

# one-sided test
props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170,
    greater = '2'
)

# example data
set.seed(1)
outcomes_A = sample(c(rep('x', 490), rep('y', 10)))
outcomes_B = sample(c(rep('x', 400), rep('y', 100)))

# paired proportion test (McNemar)
props_neat(var1 = outcomes_A,
           var2 = outcomes_B,
           pair = TRUE)

# unpaired chi test for the same data (two independent samples assumed)
# Yates correction applied
# cf. http://www.sthda.com/english/wiki/two-proportions-z-test-in-r
props_neat(
    var1 = outcomes_A,
    var2 = outcomes_B,
    pair = FALSE,
    yates = TRUE
)

# above data given differently for unpaired test
# (no Yates corrrection)
props_neat(
    case1 = 490,
    case2 = 400,
    control1 = 10,
    control2 = 100
)

# again differently
props_neat(
    case1 = 490,
    case2 = 400,
    n1 = 500,
    n2 = 500
)

# other example data
outcomes_A2 = c(rep(1, 707), rep(0, 212),  rep(1, 256), rep(0, 144))
outcomes_B2 = c(rep(1, 707), rep(0, 212),  rep(0, 256), rep(1, 144))

# paired test
# cf. https://www.medcalc.org/manual/mcnemartest2.php
props_neat(var1 = outcomes_A2,
           var2 = outcomes_B2,
           pair = TRUE)

# show reverse proportions (otherwise the same)
props_neat(
    var1 = outcomes_A2,
    var2 = outcomes_B2,
    pair = TRUE,
    inverse = TRUE
)


# two different sample sizes
out_chi = props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170
)

# exact test
out_exact = props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170,
    exact = TRUE
)

# the two p values are just tiny bit different
print(out_chi) # p 0.00638942
print(out_exact) # p 0.006481884

# one-sided test
props_neat(
    case1 = 40,
    case2 = 70,
    n1 = 150,
    n2 = 170,
    greater = '2'
)

Merge by Columns in Loops

Description

Merges rows by columns in a loop using the enum function. On first iteration, indicated by enum, initiates a new data.table with the data to merge as first row. On all following iterations, adds data to merge as subsequent rows (using data.table::rbindlist).

Usage

rbind_loop(merged, ..., hush = FALSE)
rbind_loop(merged, ..., hush = FALSE)

Arguments

`merged`	The name of the `data.table` for the merged data (without quotes).
`...`	Any number of data to be merged. Each argument must be one of the following: a `data.frame` (or `data.table`) with either single row or two column; a named vector or a named list (with single elements); or a single value with parameter name (e.g. date = 1989 or id = "jdoe"). Data with two columns will be transposed using first column as column names and second column as corresponding values. See Details, Examples.
`hush`	Logical. If `TRUE` (default), prints message when the data frame for merging is initiated.

Details

In each call, all data passed to the function (via ...) will be merged into a single row, and that single row will be added to the "merged" data table.

See an extensive example via https://github.com/gasparl/neatstats.

Examples


my_vector = c('aa', 'bb', 'cxyz', 'last')
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    rbind_loop(
        merged_data, # data frame name for merging
        item = elem[2],
        number = elem[1],
        whatever = paste0('number (', elem[1], ')')
    )
}
# merged_data now contains all merged rows
print(merged_data)
# item number   whatever
# 1   aa      1 number (1)
# 2   bb      2 number (2)
# 3 cxyz      3 number (3)
# 4 last      4 number (4)


# example with other data types
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    dframe1 = data.frame(item = elem[2],
                         number = elem[1])
    print(elem[1])
    asnum = as.numeric(elem[1])
    dframe2 = data.frame(
        my_cols = c('index', 'squared', 'multiple'),
        my_vals = c(elem[1], asnum ** 2, asnum * 10)
    )
    my_list = list(ls_item = elem[2], ls_num = elem[1])
    my_vec = c(v_item = elem[2], v_num = elem[1])
    rbind_loop(
        merged_data,
        dframe1, # data frame with single row
        dframe2, # data frame with two columns
        my_list, # named list
        my_vec, # named vector
        single_val = elem[2], # single element
        constant = "whatever" # other single element
    )
}

# again merged_data contains all merged rows
# (previous content, if any, were removed)
print(merged_data)


# example with differring columns
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    dframe = data.frame(item = elem[2],
                        number = elem[1])
    asnum = as.numeric(elem[1])
    if (asnum %% 2 == 0) {
        dframe$sqr = asnum ** 2
    }
    rbind_loop(merged_data,
               dframe)
}

# merged_data contains all new merged rows
# with NAs where sqr was not added
print(merged_data)

# example with data.table added
library('data.table')
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    dframe = data.frame(item = elem[2],
                        number = elem[1])
    asnum = as.numeric(elem[1])
    dtable = data.table(item2 = paste('DT', elem[2]),
                        number2 = asnum + 9)
    if (asnum %% 2 == 0) {
        dframe$sqr = asnum ** 2
    }
    rbind_loop(merged_data,
               dframe,
               dtable)
}

print(merged_data)

# an extensive example to show how to collect and aggregate raw data is
# available via the README file at the repository:
# https://github.com/gasparl/neatstats

my_vector = c('aa', 'bb', 'cxyz', 'last')
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    rbind_loop(
        merged_data, # data frame name for merging
        item = elem[2],
        number = elem[1],
        whatever = paste0('number (', elem[1], ')')
    )
}
# merged_data now contains all merged rows
print(merged_data)
# item number   whatever
# 1   aa      1 number (1)
# 2   bb      2 number (2)
# 3 cxyz      3 number (3)
# 4 last      4 number (4)


# example with other data types
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    dframe1 = data.frame(item = elem[2],
                         number = elem[1])
    print(elem[1])
    asnum = as.numeric(elem[1])
    dframe2 = data.frame(
        my_cols = c('index', 'squared', 'multiple'),
        my_vals = c(elem[1], asnum ** 2, asnum * 10)
    )
    my_list = list(ls_item = elem[2], ls_num = elem[1])
    my_vec = c(v_item = elem[2], v_num = elem[1])
    rbind_loop(
        merged_data,
        dframe1, # data frame with single row
        dframe2, # data frame with two columns
        my_list, # named list
        my_vec, # named vector
        single_val = elem[2], # single element
        constant = "whatever" # other single element
    )
}

# again merged_data contains all merged rows
# (previous content, if any, were removed)
print(merged_data)


# example with differring columns
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    dframe = data.frame(item = elem[2],
                        number = elem[1])
    asnum = as.numeric(elem[1])
    if (asnum %% 2 == 0) {
        dframe$sqr = asnum ** 2
    }
    rbind_loop(merged_data,
               dframe)
}

# merged_data contains all new merged rows
# with NAs where sqr was not added
print(merged_data)

# example with data.table added
library('data.table')
for (elem in enum(my_vector)) {
    cat(elem, fill = TRUE)
    dframe = data.frame(item = elem[2],
                        number = elem[1])
    asnum = as.numeric(elem[1])
    dtable = data.table(item2 = paste('DT', elem[2]),
                        number2 = asnum + 9)
    if (asnum %% 2 == 0) {
        dframe$sqr = asnum ** 2
    }
    rbind_loop(merged_data,
               dframe,
               dtable)
}

print(merged_data)

# an extensive example to show how to collect and aggregate raw data is
# available via the README file at the repository:
# https://github.com/gasparl/neatstats

Read and Merge Files from Directory

Description

Reads data files from any given directory as data frames and merges them into a single data frame (using data.table::rbindlist).

Usage

read_dir(
  pattern = "*[.]",
  path = ".",
  reader_function = data.table::fread,
  ...,
  subdirs = FALSE,
  filt = NULL,
  hush = FALSE
)
read_dir(
  pattern = "*[.]",
  path = ".",
  reader_function = data.table::fread,
  ...,
  subdirs = FALSE,
  filt = NULL,
  hush = FALSE
)

Arguments

`pattern`	Regular expression ("regex"; as string or `NULL`) for selecting files (passed to the `list.files` function). The default `NULL` means that all files at the specified path will be read in. To select, for example, a specific extension like ".txt", the pattern can be given as `"\.txt$"` (for CSV files, `"\.csv$"`, etc.). Files ending with e.g. "group2.txt" can be specified as `"group2\.txt$"`. Files starting with "exp3" can be specified as `"^exp3"`. Files starting with "exp3" AND ending with ".txt" extension can be specified as `"^exp3.*\.txt$"`. To read in a single file, specify the full filename (e.g. `"exp3_subject46_group2.txt"`). (See `?regex` for more details.)
`path`	Path to the directory from which the files should be selected and read. The default `"."` means the current working directory (as returned by `getwd()`). Either specify correct working directory in advance (see `setwd`, `path_neat`), or otherwise enter relative or full paths (e.g. `"C:/research"` or `"/home/projects"`, etc.).
`reader_function`	A function to be used for reading the files, `data.table::fread` by default.
`...`	Any arguments to be passed on to the chosen `reader_function`.
`subdirs`	Logical (`FALSE` by default). If `TRUE`, searches files in subdirectories as well (relative to the given `path`).
`filt`	An expression to filter, by column values, each data file after it is read and before it is merged with the other data. (The expression should use column names alone; see Examples.)
`hush`	Logical. If `FALSE` (default), prints lists all data file names as they are being read (along with related warnings).

Note

This function is very similar to the readbulk::read_bulk function. One important difference however is the data.table use, which greatly speeds up the process. Another important difference is the possibility of file selection based on any regex pattern. Furthermore, this function allows pre-filtering by file (see filt). Data files could include significant amount of unnecessary data, and filtering prevents these to be merged.

Examples



# first, set current working directory
# e.g. to script's path with setwd(path_neat())

# read all text files in currect working directory
merged_df = read_dir("\\.txt$")
# merged_df now has all data

# to use utils::read.table for reading (slower than fread)
# (with some advisable options passed to it)
merged_df = read_dir(
    '\\.txt$',
    reader_function = read.table,
    header = TRUE,
    fill = TRUE,
    quote = "\"",
    stringsAsFactors = FALSE
)


# first, set current working directory
# e.g. to script's path with setwd(path_neat())

# read all text files in currect working directory
merged_df = read_dir("\\.txt$")
# merged_df now has all data

# to use utils::read.table for reading (slower than fread)
# (with some advisable options passed to it)
merged_df = read_dir(
    '\\.txt$',
    reader_function = read.table,
    header = TRUE,
    fill = TRUE,
    quote = "\"",
    stringsAsFactors = FALSE
)

Neat rounding

Description

Rounds a given number to given number of digits after the decimal point, returning it as string, with trailing zeros when applicable.

Usage

ro(num, round_to = 2, leading_zero = TRUE, signi = FALSE)
ro(num, round_to = 2, leading_zero = TRUE, signi = FALSE)

Arguments

`num`	Number to be rounded.
`round_to`	Number of fractional digits (i.e., digits after the decimal point), to round to.
`leading_zero`	Logical, `TRUE` by default. If `FALSE`, omits leading zero (e.g. returns ".17" instead of "0.17").
`signi`	Logical, `FALSE` by default. If `TRUE`, rounds to a fractional digit that allows at least the first `N` non-zero digits displayed in all numbers, where `N` is specified by the `round_to` parameter. (See `base:formatC`)

Value

Number as string: num rounded to round_to digits, with trailing zeros when applicable.

Examples

ro( 1.2345 ) # returns "1.23"

ro( 0.12345, 1 ) # returns "0.1"

ro( 12.3, 4 ) # returns "12.3000"

# examples with vectors
to_round = c(1000, 100, 0.1, 0.01, 0.001, 0.0001)
ro(to_round)
ro(to_round, 3)
ro(to_round, 3, leading_zero = FALSE)
ro(to_round, 3, signi = TRUE)
to_round2 = c(1230.000, 100, 0.012, 0.01, 0.123, 0.012340)
ro(to_round2, 3)
ro(to_round2, 3, signi = TRUE)
ro(to_round2, 2, signi = TRUE)
ro(to_round2, 1, signi = TRUE)
ro(to_round2, 9, signi = TRUE)


ro( 1.2345 ) # returns "1.23"

ro( 0.12345, 1 ) # returns "0.1"

ro( 12.3, 4 ) # returns "12.3000"

# examples with vectors
to_round = c(1000, 100, 0.1, 0.01, 0.001, 0.0001)
ro(to_round)
ro(to_round, 3)
ro(to_round, 3, leading_zero = FALSE)
ro(to_round, 3, signi = TRUE)
to_round2 = c(1230.000, 100, 0.012, 0.01, 0.123, 0.012340)
ro(to_round2, 3)
ro(to_round2, 3, signi = TRUE)
ro(to_round2, 2, signi = TRUE)
ro(to_round2, 1, signi = TRUE)
ro(to_round2, 9, signi = TRUE)

Difference of Two Areas Under the Curves

Description

Comparison of two areas under the receiver operating characteristic curves (AUCs) and plotting any number of ROC curves.

Usage

roc_neat(
  roc1,
  roc2 = NULL,
  pair = FALSE,
  greater = NULL,
  ci = NULL,
  hush = FALSE,
  plot_rocs = FALSE,
  roc_labels = "",
  cutoff_auto = TRUE,
  cutoff_custom = NULL
)
roc_neat(
  roc1,
  roc2 = NULL,
  pair = FALSE,
  greater = NULL,
  ci = NULL,
  hush = FALSE,
  plot_rocs = FALSE,
  roc_labels = "",
  cutoff_auto = TRUE,
  cutoff_custom = NULL
)

Arguments

`roc1`	Receiver operating characteristic (ROC) `object`, or, for plotting only, a `list` including any number of such ROC objects.
`roc2`	Receiver operating characteristic (ROC) `object`, or, for plotting only, leave it as `NULL` (default) and provide list for the first parameter (`roc1`).
`pair`	Logical. If `TRUE`, the test is conducted for paired samples. Otherwise (default) for independent samples.
`greater`	`NULL` or string (or number); optionally specifies one-sided test: either "1" (`roc1` AUC expected to be greater than `roc2` AUC) or "2" (`roc2` AUC expected to be greater than `roc2` AUC). If `NULL` (default), the test is two-sided.
`ci`	Numeric; confidence level for the returned CIs (raw difference).
`hush`	Logical. If `TRUE`, prevents printing any details to console (and plotting).
`plot_rocs`	Logical. If `TRUE`, plots and returns ROC curves.
`roc_labels`	Optional character vector to provide legend label texts (in the order of the provided ROC objects) for the ROC plot.
`cutoff_auto`	Logical. If `TRUE` (default), optimal cutoffs on the ROC plots are displayed.
`cutoff_custom`	Custom cutoff to be indicated on the plot can be given here in a `list`. The list index must exactly correspond to the index of the list index of the AUC (given in `roc1`) for which the given cutoff is intended.

Value

Prints DeLong's test results for the comparison of the two given AUCs in APA style, as well as corresponding CI for the AUC difference. Furthermore, when assigned, returns a list with stat (D value), p (p value), and, when plot is added, ROC plot.

Note

The main test statistics are calculated via pROC::roc.test as DeLong's test (for both paired and unpaired). The roc_neat function merely prints it in APA style. The CI is calculated based on the p value, as described by Altman and Bland (2011).

The ROC object may be calculated via t_neat, or directly with pROC::roc.

References

Altman, D. G., & Bland, J. M. (2011). How to obtain the confidence interval from a P value. Bmj, 343(d2090). doi:10.1136/bmj.d2090

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837-845. doi:10.2307/2531595

Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J. C., & Muller, M. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics, 12(1), 77. doi:10.1186/1471-2105-12-77

Examples


# calculate first AUC (from v1 and v2)
v1 = c(191, 115, 129, 43, 523,-4, 34, 28, 33,-1, 54)
v2 = c(4,-2, 23, 13, 32, 16, 3, 29, 37,-4, 65)
results1 = t_neat(v1, v2, auc_added = TRUE)

# calculate second AUC (from v3 and v4)
v3 = c(14.1, 58.5, 25.5, 42.2, 13, 4.4, 55.5, 28.5, 25.6, 37.1)
v4 = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9)
results2 = t_neat(v3, v4, auc_added = TRUE)

# one-sided comparison of the two AUCs
roc_neat(results1$roc_obj, results2$roc_obj, greater = "1")


# create a list of randomlz generated AUCs
set.seed(1)
aucs_list = list()
for (i in 1:4) {
    aucs_list[[i]] = t_neat(rnorm(50, (i-1)),
                            rnorm(50),
                            auc_added = TRUE,
                            hush = TRUE)$roc_obj
}
# depict AUCs (recognized as list)
roc_neat(aucs_list)


# with custom cutoffs depicted
roc_neat(aucs_list,
         cutoff_custom = list(0.2),
         cutoff_auto = FALSE)
roc_neat(aucs_list,
         cutoff_custom = list(.1, c(-.5, 0), NULL, c(.7, 1.6)),
         cutoff_auto = FALSE)
roc_neat(aucs_list,
         cutoff_custom = list(.6, NULL, NULL, 1.1))


# calculate first AUC (from v1 and v2)
v1 = c(191, 115, 129, 43, 523,-4, 34, 28, 33,-1, 54)
v2 = c(4,-2, 23, 13, 32, 16, 3, 29, 37,-4, 65)
results1 = t_neat(v1, v2, auc_added = TRUE)

# calculate second AUC (from v3 and v4)
v3 = c(14.1, 58.5, 25.5, 42.2, 13, 4.4, 55.5, 28.5, 25.6, 37.1)
v4 = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9)
results2 = t_neat(v3, v4, auc_added = TRUE)

# one-sided comparison of the two AUCs
roc_neat(results1$roc_obj, results2$roc_obj, greater = "1")


# create a list of randomlz generated AUCs
set.seed(1)
aucs_list = list()
for (i in 1:4) {
    aucs_list[[i]] = t_neat(rnorm(50, (i-1)),
                            rnorm(50),
                            auc_added = TRUE,
                            hush = TRUE)$roc_obj
}
# depict AUCs (recognized as list)
roc_neat(aucs_list)


# with custom cutoffs depicted
roc_neat(aucs_list,
         cutoff_custom = list(0.2),
         cutoff_auto = FALSE)
roc_neat(aucs_list,
         cutoff_custom = list(.1, c(-.5, 0), NULL, c(.7, 1.6)),
         cutoff_auto = FALSE)
roc_neat(aucs_list,
         cutoff_custom = list(.6, NULL, NULL, 1.1))

Confidence Interval of Standard Deviation

Description

Calculates the SD confidence interval of a vector of numbers.

Usage

sd_ci(x, ci = 0.95)
sd_ci(x, ci = 0.95)

Arguments

`x`	Numeric vector.
`ci`	Numeric; confidence level for returned CI.

Value

SD confidence interval (as named vector).

Examples

myvec = c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 )
sd_ci( myvec )
sd_ci( myvec, ci = .80 )

myvec = c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 )
sd_ci( myvec )
sd_ci( myvec, ci = .80 )

Standard Error of Mean

Description

Simply calculates the standard error of a vector of numbers.

Usage

se(x)
se(x)

Arguments

`x`	Numeric vector.

Value

Standard error.

Examples

se( c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 ) )

se( c(11, 15, 19, 43, 53, -4, 34, 8, 33, -1, 54 ) )

Difference of Two Means and Area Under the Curve

Description

Welch's t-test results including Cohen's d with confidence interval (CI), Bayes factor (BF), and area under the receiver operating characteristic curve (AUC). For non-parametric version, Wilcoxon test results (Mann–Whitney U test, aka "Wilcoxon rank-sum test", for independent samples; Wilcoxon signed-rank test for paired samples; including nonparametric "location difference estimate" (see stats::wilcox.test); along with corresponding rank-based BFs as per van Doorn et al., 2020).

Usage

t_neat(
  var1,
  var2,
  pair = FALSE,
  nonparametric = FALSE,
  greater = NULL,
  norm_tests = "latent",
  norm_plots = FALSE,
  ci = NULL,
  bf_added = FALSE,
  bf_rscale = sqrt(0.5),
  bf_sample = 1000,
  auc_added = FALSE,
  cutoff = NULL,
  r_added = TRUE,
  for_table = FALSE,
  test_title = NULL,
  round_descr = 2,
  round_auc = 3,
  auc_greater = "1",
  cv_rep = FALSE,
  cv_fold = 10,
  hush = FALSE,
  plots = FALSE,
  rug_size = 4,
  aspect_ratio = 1,
  y_label = "density estimate",
  x_label = "\nvalues",
  factor_name = NULL,
  var_names = c("1", "2"),
  reverse = FALSE
)
t_neat(
  var1,
  var2,
  pair = FALSE,
  nonparametric = FALSE,
  greater = NULL,
  norm_tests = "latent",
  norm_plots = FALSE,
  ci = NULL,
  bf_added = FALSE,
  bf_rscale = sqrt(0.5),
  bf_sample = 1000,
  auc_added = FALSE,
  cutoff = NULL,
  r_added = TRUE,
  for_table = FALSE,
  test_title = NULL,
  round_descr = 2,
  round_auc = 3,
  auc_greater = "1",
  cv_rep = FALSE,
  cv_fold = 10,
  hush = FALSE,
  plots = FALSE,
  rug_size = 4,
  aspect_ratio = 1,
  y_label = "density estimate",
  x_label = "\nvalues",
  factor_name = NULL,
  var_names = c("1", "2"),
  reverse = FALSE
)

Arguments

`var1`	Numeric vector; numbers of the first variable.
`var2`	Numeric vector; numbers of the second variable.
`pair`	Logical. If `TRUE`, all tests (t, BF, AUC) are conducted for paired samples. If `FALSE` (default) for independent samples.
`nonparametric`	Logical (`FALSE` by default). If `TRUE`, uses nonparametric (rank-based, "Wilcoxon") t-tests (including BFs; see Notes).
`greater`	`NULL` or string (or number); optionally specifies one-sided tests (t and BF): either "1" (`var1` mean expected to be greater than `var2` mean) or "2" (`var2` mean expected to be greater than `var1` mean). If `NULL` (default), the test is two-sided.
`norm_tests`	Normality tests. Any or all of the following character input is accepted (as a single string or a character vector; case-insensitive): `"W"` (Shapiro-Wilk), `"K2"` (D'Agostino), `"A2"` (Anderson-Darling), `"JB"` (Jarque-Bera); see Notes. Two other options are `"all"` (same as `TRUE`; to choose all four previous tests at the same time) or `"latent"` (default value; prints all tests only if `nonparametric` is set to `FALSE` and any of the four tests gives a p value below .05). Each normality test is performed for the difference values between the two variables in case of paired samples, or for each of the two variables for unpaired samples. Set to `"none"` to disable (i.e., not to perform any normality tests).
`norm_plots`	If `TRUE`, displays density, histogram, and Q-Q plots (and scatter plots for paired tests) for each of the two variable (and differences for pairwise observations, in case of paired samples).
`ci`	Numeric; confidence level for returned CIs for Cohen's d and AUC.
`bf_added`	Logical. If `TRUE` (default), Bayes factor is calculated and displayed.
`bf_rscale`	The scale of the prior distribution (`0.707` by default).
`bf_sample`	Number of samples used to estimate Bayes factor (`1000` by default). More samples (e.g. `10000`) take longer time but give more stable BF.
`auc_added`	Logical (`FALSE` by default). If `TRUE`, AUC is calculated and displayed. Includes TPR and TNR, i.e., true positive and true negative rates, i.e. sensitivity and specificity, using an optimal value, i.e. threshold, that provides maximal TPR and TNR. These values may be cross-validated: see `cv_rep`. (Note that what is designated as "positive" or "negative" depends on the scenario: this function always assumes `var1` as positive and `var2` as negative. If your scenario or preference differs, you can simply switch the names or values when reporting the results.)
`cutoff`	Numeric. Custom cutoff value for AUC TPR and TNR, also to be depicted in the plot. In case of multiple given, the first is used for calculations, but all will be depicted in the plot.
`r_added`	Logical. If `TRUE` (default), Pearson correlation is calculated and displayed in case of paired comparison.
`for_table`	Logical. If `TRUE`, omits the confidence level display from the printed text.
`test_title`	`NULL` or string. If not `NULL`, simply displayed in printing preceding the statistics. (Useful e.g. to distinguish several different comparisons inside a `function` or a `for` loop.)
`round_descr`	Number `to round` to the descriptive statistics (means and SDs).
`round_auc`	Number `to round` to the AUC and its CI.
`auc_greater`	String (or number); specifies which variable is expected to have greater values for 'cases' as opposed to 'controls': "1" (default; `var1` expected to be greater for 'cases' than `var2` mean) or "2" (`var2` expected to be greater for 'cases' than `var1`). Not to be confused with one-sided tests; see Details.
`cv_rep`	`FALSE` (default), `TRUE`, or numeric. If `TRUE` or numeric, a cross-validation is performed for the calculation of TPRs and TNRs. Numeric value specifies the number of repetitions, while, if `TRUE`, it defaults to `100` repetitions. In each repetition, the data is divided into `k` random parts ("folds"; see `cv_fold`), and the optimal accuracy is obtained k times from a k-1 training set (`var1` and `var2` truncated to equal length, if needed, in each case within each repetition), and the TPR and TNR are calculated from the remaining test set (different each time).
`cv_fold`	Numeric. The number of folds into which the data is divided for cross-validation (default: 10).
`hush`	Logical. If `TRUE`, prevents printing any details to console.
`plots`	Logical (or `NULL`). If `TRUE`, creates a combined density plot (i.e., `Gaussian kernel density estimates`) from the two variables. Includes dashed vertical lines to indicate means of each of the two variables. If `nonparametric` is set to `TRUE`, medians are calculated for these dashed lines instead of means. When `auc_added` is `TRUE` (and the AUC is at least .5), the best threshold value for classification (maximal differentiation accuracy using Youden's index) is added to the plot as solid vertical line. (In case of multiple best thresholds with identical overall accuracy, all are added.) If `NULL`, same as if `TRUE` except that histogram is added to the background.
`rug_size`	Numeric (`4` by default): size of the rug ticks below the density plot. Set to `0` (zero) to omit rug plotting.
`aspect_ratio`	Aspect ratio of the plots: `1` (`1`/`1`) by default. (Set to `NULL` for dynamic aspect ratio.)
`y_label`	String or `NULL`; the label for the `y` axis. (Default: `"density estimate"`.)
`x_label`	String or `NULL`; the label for the `x` axis. (Default: `"values"`.)
`factor_name`	String or `NULL`; factor legend title. (Default: `NULL`.)
`var_names`	A vector of two strings; the variable names to be displayed in the legend. (Default: `c("1", "2")`.)
`reverse`	Logical. If `TRUE`, reverses the order of variable names displayed in the legend.

Details

The Bayes factor (BF) supporting null hypothesis is denoted as BF01, while that supporting alternative hypothesis is denoted as BF10. When the BF is smaller than 1 (i.e., supports null hypothesis), the reciprocal is calculated (hence, BF10 = BF, but BF01 = 1/BF). When the BF is greater than or equal to 10000, scientific (exponential) form is reported for readability. (The original full BF number is available in the returned named vector as bf.)

For simplicity, Cohen's d is reported for nonparametric tests too: you may however want to consider reporting alternative effect sizes in this case.

The original pROC::auc function, by default, always returns an AUC greater than (or equal to) .5, assuming that the prediction based on values in the expected direction work correctly at least at chance level. This however may be confusing. Consider an example where we measure the heights of persons in a specific small sample and expect that greater height predicts masculine gender. The results are, say, 169, 175, 167, 164 (cm) for one gender, and 176, 182, 179, 165 for the other. If the expectation is correct (the second, greater values are for males), the AUC is .812. However, if in this particular population females are actually taller than males, the AUC is in fact .188. To keep things clear, the t_neat function always makes an assumption about which variable is expected to be greater for correct classification ("1" by default; i.e., var1; to be specified as auc_greater = "2" for var2 to be expected as greater). For this example, if the first (smaller) variables are given as var1 for females, and second (larger), variables are given as var2 for males, we have to specify auc_greater = "2" to indicate the expectation of larger values for males. (Or, easier, just add the expected larger values as var1.)

Value

Prints t-test statistics (including Cohen's d with CI, BF, and AUC, as specified via the corresponding parameters) in APA style. Furthermore, when assigned, returns a list, that contains a named vector 'stats' with the following elements: t (t value), p (p value), d (Cohen's d), bf (Bayes factor), auc (AUC), accuracy (overall accuracy using the optimal classification threshold), and youden (Youden's index: specificity + sensitivity - 1). The latter three are NULL when auc_added is FALSE. When auc_added is TRUE, there are also two or three additional elements of the list. One is 'roc_obj', which is a roc object, to be used e.g. with the roc_neat function. Another is 'best_thresholds', which contains the best threshold value(s) for classification, along with corresponding specificity and sensitivity. The third 'cv_results' contains the results, if any, of the cross-validation of TPRs and TNRs (means per repetition). Finally, if plots is TRUE (or NULL), the plot is displayed as well as returned as a ggplot object, named t_plot.

Note

The Welch's t-test is calculated via stats::t.test.

#'Normality tests are all calculated via fBasics::NormalityTests, selected based on the recommendation of Lakens (2015), quoting Yap and Sim (2011, p. 2153): "If the distribution is symmetric with low kurtosis values (i.e. symmetric short-tailed distribution), then the D'Agostino and Shapiro-Wilkes tests have good power. For symmetric distribution with high sample kurtosis (symmetric long-tailed), the researcher can use the JB, Shapiro-Wilkes, or Anderson-Darling test." See urlhttps://github.com/Lakens/perfect-t-test for more details.

Cohen's d and its confidence interval are calculated, using the t value, via MBESS::ci.smd for independent samples (as standardized mean difference) and via MBESS::ci.sm for paired samples (as standardized mean).

The parametric Bayes factor is calculated via BayesFactor::ttestBF. The nonparametric (rank-based) Bayes factor is a contribution by Johnny van Doorn; the original source code is available via https://osf.io/gny35/.

The correlation and its CI are calculated via stats::cor.test, and is always two-sided, always with 95 percent CI. For more, use corr_neat.

The AUC and its CI are calculated via pROC::auc, and the accuracy at optimal threshold via pROC::coords (x = "best"); both using the object pROC::roc.

References

Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch's t-test instead of Student's t-test. International Review of Social Psychology, 30(1). doi:10.5334/irsp.82

Kelley, K. (2007). Methods for the behavioral, educational, and social sciences: An R package. Behavior Research Methods, 39(4), 979-984. doi:10.3758/BF03192993

Lakens, D. (2015). The perfect t-test (version 1.0.0). Retrieved from https://github.com/Lakens/perfect-t-test. doi:10.5281/zenodo.17603

Yap, B. W., & Sim, C. H. (2011). Comparisons of various types of normality tests. Journal of Statistical Computation and Simulation, 81(12), 2141–2155. doi:10.1080/00949655.2010.520163

Examples

# assign two variables (numeric vectors)
v1 = c(191, 115, 129, 43, 523,-4, 34, 28, 33,-1, 54)
v2 = c(4,-2, 23, 13, 32, 16, 3, 29, 37,-4, 65)

t_neat(v1, v2) # prints results as independent samples
t_neat(v1, v2, pair = TRUE) # as paired samples (r added by default)
t_neat(v1, v2, pair = TRUE, greater = "1") # one-sided
t_neat(v1, v2, pair = TRUE, auc_added = TRUE ) # AUC included

# print results and assign returned list
results = t_neat(v1, v2, pair = TRUE)

results$stats['bf'] # get precise BF value

# assign two variables (numeric vectors)
v1 = c(191, 115, 129, 43, 523,-4, 34, 28, 33,-1, 54)
v2 = c(4,-2, 23, 13, 32, 16, 3, 29, 37,-4, 65)

t_neat(v1, v2) # prints results as independent samples
t_neat(v1, v2, pair = TRUE) # as paired samples (r added by default)
t_neat(v1, v2, pair = TRUE, greater = "1") # one-sided
t_neat(v1, v2, pair = TRUE, auc_added = TRUE ) # AUC included

# print results and assign returned list
results = t_neat(v1, v2, pair = TRUE)

results$stats['bf'] # get precise BF value

Table, descriptives

Description

Creates a neat means (or similar descriptives) and standard deviations table, using aggr_neat functions as arguments. Alternatively, merges and transposes data frames into rows.

Usage

table_neat(
  values_list,
  group_by = NULL,
  group_per = "rows",
  to_clipboard = FALSE,
  method = "mean+sd",
  transpose = FALSE
)
table_neat(
  values_list,
  group_by = NULL,
  group_per = "rows",
  to_clipboard = FALSE,
  method = "mean+sd",
  transpose = FALSE
)

Arguments

`values_list`	Data frames as returned from the `aggr_neat` function: variables from which the statistics for the table are to be calculated. The `group_by`, `method`, and `prefix` parameters are ignored when they are given in the `table_neat` function; see Details and also an extensive example via https://github.com/gasparl/neatstats.
`group_by`	String, or vector of strings: the name(s) of the column(s) in the `dat` given data frame, containing the vector(s) of factors by which the statistics are grouped. (Overwrites `group_by` in `aggr_neat`; see Details.)
`group_per`	String, "rows" or "columns". If set to "columns" (or just "c" or "col", etc.), each column contains statistics for one group. Otherwise (default), each row contains statistics for one group.
`to_clipboard`	Logical. If `TRUE`, the table is copied to the clipboard (default: `FALSE`).
`method`	Function or string; overwrites the `method` argument in `aggr_neat` when used within this function. See `method` in the `aggr_neat` function for details. Default value: `"mean+sd"` (to calculate means and standard deviations table).
`transpose`	Logical (default: `FALSE`) or string. If `TRUE` or string, ignores all other parameters (except `values_list`), but merges all values from given list of data frames (as returned from the `aggr_neat`) and transposes them into a single row, using, by default, the `"aggr_group"` column values for new headers (corresponding to the output of `aggr_neat`; see Examples). However, a string given as argument for the `transpose` parameter can also specify a custom column name.

Details

The values, round_to, and new_name arguments given in the aggr_neat function are always applied. However, the prefix parameter will be overwritten as NULL. If new_name in aggr_neat is NULL, the given input variable names will be used instead of "aggr_value". Furthermore, the group_by or method given in the aggr_neat function are only applied when no arguments are given in the table_neat function for the identical parameters (group_by or medians). If either parameter is given in the table_neat function, all separately given respective argument(s) in the aggr_neat function(s) are ignored.

Value

Returns a data frame with means or medians and SDs per variable and per group.

Examples

data("mtcars") # load base R example dataset

# overall means and SDs table for disp (Displacement) and hp (Gross horsepower)
table_neat(list(aggr_neat(mtcars, disp),
                aggr_neat(mtcars, hp)))

# means and SDs table for mpg (Miles/(US) gallon), wt (Weight), and hp (Gross horsepower)
# grouped by cyl (Number of cylinders)
# each measure rounded to respective optimal number of digits
# wt renamed to weight (for the column title)
table_neat(list(
    aggr_neat(mtcars, mpg, round_to = 1),
    aggr_neat(mtcars, wt, new_name = 'weight', round_to = 2),
    aggr_neat(mtcars, hp, round_to = 0)
),
group_by = 'cyl')

# same as above, but with medians, and with groups per columns
table_neat(
    list(
        aggr_neat(mtcars, mpg, round_to = 1),
        aggr_neat(mtcars, wt, new_name = 'weight', round_to = 2),
        aggr_neat(mtcars, hp, round_to = 0)
    ),
    group_by = 'cyl',
    method = 'median+sd',
    group_per = 'columns'
)

# an extensive example to show how to collect and aggregate raw data is
# available via the README file at the repository:
# https://github.com/gasparl/neatstats

data("mtcars") # load base R example dataset

# overall means and SDs table for disp (Displacement) and hp (Gross horsepower)
table_neat(list(aggr_neat(mtcars, disp),
                aggr_neat(mtcars, hp)))

# means and SDs table for mpg (Miles/(US) gallon), wt (Weight), and hp (Gross horsepower)
# grouped by cyl (Number of cylinders)
# each measure rounded to respective optimal number of digits
# wt renamed to weight (for the column title)
table_neat(list(
    aggr_neat(mtcars, mpg, round_to = 1),
    aggr_neat(mtcars, wt, new_name = 'weight', round_to = 2),
    aggr_neat(mtcars, hp, round_to = 0)
),
group_by = 'cyl')

# same as above, but with medians, and with groups per columns
table_neat(
    list(
        aggr_neat(mtcars, mpg, round_to = 1),
        aggr_neat(mtcars, wt, new_name = 'weight', round_to = 2),
        aggr_neat(mtcars, hp, round_to = 0)
    ),
    group_by = 'cyl',
    method = 'median+sd',
    group_per = 'columns'
)

# an extensive example to show how to collect and aggregate raw data is
# available via the README file at the repository:
# https://github.com/gasparl/neatstats

Variance Equality Tests

Description

Displayed sample sizes and SDs and performs Brown-Forsythe and Fligner-Killeen variance equality tests (tests of homogeneity of variances) per group combinations. This is primarily a subfunction of anova_neat, but here it is available separately for other potential purposes.

Usage

var_tests(xvar, group_by, dat = NULL, hush = FALSE, sep = ", ")
var_tests(xvar, group_by, dat = NULL, hush = FALSE, sep = ", ")

Arguments

`xvar`	Either a numeric vector (numbers of any given variable), or, if `dat` is given, a column name specifying the variable in the given data frame.
`group_by`	Either a vector of factors with which to group the `xvar` values, or, if `dat` is given, one or more column names specifying the columns in the given data frame.
`dat`	Either `NULL` or a data frame from which the respective column names should be selected for `xvar` and `group`.
`hush`	Logical. If `TRUE`, prevents printing any details to console.
`sep`	String (underscore `"_"` by default) for separating group names.

Value

Prints test results.

Note

Brown-Forsythe test (i.e., Levene's test using medians) is calculated via car::leveneTest. Fligner-Killeen test, which may be more robust (i.e., less affected by non-normal distribution), is calculated via stats::fligner.test. (See also Conover et al., 1981, p. 360.)

References

Brown, M. B. & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69, pp. 364-367.

Conover W. J., Johnson M. E., & Johnson M. M. (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23, 351–361.

Fligner, M. A. & Killeen, T. J. (1976). Distribution-free two-sample tests for scale. ‘Journal of the American Statistical Association. 71(353), 210-213.

Fox, J. & Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition, Sage.

Levene, H. (1960). Robust tests for equality of variances. In I. Olkin, H. Hotelling, et al. (eds.). Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford University Press. pp. 278–292.

Examples


data("ToothGrowth") # load base R example dataset

# the statistics of the four functions below should match
var_tests(ToothGrowth$len, ToothGrowth$supp)
var_tests('len', 'supp', ToothGrowth)
car::leveneTest(len ~ supp, data = ToothGrowth)
stats::fligner.test(len ~ supp, ToothGrowth)

# again the results below should match each other
var_tests(ToothGrowth$len,
          interaction(ToothGrowth$supp, ToothGrowth$dose))
var_tests('len', c('supp', 'dose'), ToothGrowth)
car::leveneTest(len ~ supp * as.factor(dose), data = ToothGrowth)
stats::fligner.test(len ~ interaction(supp, dose), ToothGrowth)

data("ToothGrowth") # load base R example dataset

# the statistics of the four functions below should match
var_tests(ToothGrowth$len, ToothGrowth$supp)
var_tests('len', 'supp', ToothGrowth)
car::leveneTest(len ~ supp, data = ToothGrowth)
stats::fligner.test(len ~ supp, ToothGrowth)

# again the results below should match each other
var_tests(ToothGrowth$len,
          interaction(ToothGrowth$supp, ToothGrowth$dose))
var_tests('len', c('supp', 'dose'), ToothGrowth)
car::leveneTest(len ~ supp * as.factor(dose), data = ToothGrowth)
stats::fligner.test(len ~ interaction(supp, dose), ToothGrowth)

Package 'neatStats'

Help Index

Aggregation, descriptives

Description

Usage

Arguments

Value

See Also

Examples

Comparison of Multiple Means: ANOVA

Description

Usage

Arguments

Details

Value

Note

References

See Also

Examples

CI from p value

Description

Usage

Arguments

Value

Note

References

Examples

Correlation Statistics

Description

Usage

Arguments

Details

Value

Note

References

See Also

Examples

Demographics

Description

Usage

Arguments

Details

Examples

Enumerate

Description

Usage

Arguments

Value

See Also

Examples

Exclusion

Description

Usage

Arguments

Value

See Also

Examples

Confidence Interval of Mean

Description

Usage

Arguments

Value

See Also

Examples

Monitor Screen Unit Conversion

Description

Usage

Arguments

Value

See Also

Examples

Monitor Object

Description

Usage

Arguments

Value

See Also

Examples

Normality Tests and Plots

Description