Below, the essence of POSSA
’s power (and type 1
error rate, T1ER) calculation for sequential designs is explained using
the simple example of a t-test. Of course, for t-tests specifically, you
could use other and potentially more user-friendly software – however,
the point here is to understand how the POSSA
works, and,
subsequently, it should be possible for you to extend this example to
virtually any statistical null hypothesis test. Cases of multiple
hypotheses (and recording descriptives) are explained on a separate
page, but these too are straightforward extensions of the present
examples.
This tutorial presupposes the basic conceptual understanding of sequential analyses (and the basics or statistical power and T1ER). There are several related free tutorials online (see e.g. Lakens, 2014).
All parameters mentioned below are described in detail in the full
documentation (see ?sim
and ?pow
).
So let’s say you want to compare the means of two independent
continuous variables with a simple t-test. Your smallest effect size of
interest (SESOI) is 5
units (arbitrary units that could
represent anything – e.g., the height of a plant in cm, or the time of
performing a task in seconds, etc.).
First, write a function that simulates the samples (i.e., observed values) for a single instance of the hypothetical experiment in case the null hypothesis is true (no actual difference between the two population means) as well as in case the alternative hypothesis is true (there is in fact difference).
If the null hypothesis “H0” is true, the two samples have the same
mean, let’s say 0
units; and let’s assume normally
distributed values an SD of 10
units for each sample. For
this, two samples can each be simulated via the function
rnorm(sampleSize, mean = 0, sd = 10)
(where
sampleSize
is a variable to be provided via another
function, as explained later). You can assign this to
sample1
, which represents a baseline variable (e.g., “group
A”), and, separately, to sample2_h0
, which represents a
sample that does not differ from the baseline (“H0” true). If the null
hypothesis “H1” is true, the baseline sample may again have a mean of
0
, but the other sample should have the SESOI,
5
units (again assuming an SD of 10
) – which
can be simulated as rnorm(sampleSize, mean = 5, sd = 10)
.
This can be assigned to sample2_h1
, which represents a
sample that does differ from the baseline (“H1” true). (The baseline
sample has identical properties in case of H0 and H1, so it’s enough to
simulate it only once, and use that same resulting sample for both cases
– for the sake of brevity as well as of computational speed.)
All this is to be put into a function, which takes a
sampleSize
argument and returns the simulated samples as a
named list, as follows.
customSample = function(sampleSize) {
list(
sample1 = rnorm(sampleSize, mean = 0, sd = 10),
sample2_h0 = rnorm(sampleSize, mean = 0, sd = 10),
sample2_h1 = rnorm(sampleSize, mean = 5, sd = 10)
)
}
Note that the customSample
and sampleSize
variables were written with camel case (likeThis), while the
_h0
and _h1
endings used underscores
(like_this). This alternate notation is used in the present tutorial to
help distinguish what names can be freely chosen, and what is
necessarily named so for the POSSA
package: custom names
are all camel case, while underscores indicate notation necessary for
POSSA
. Here specifically, the returned list’s element names
for the sample that varies depending on whether the null (H0) and
alternative (H1) hypothesis is true must be indicated with
_h0
and _h1
endings, respectively, with a
common root (here: sample2_h
). The root name or the other
variable names (customSample
and sampleSize
)
could be named differently, and everything would work just as well.
Now, write a function that performs the test; here, a one-sided t-test (with equal variances assumed – this is just for the example’s outcome’s comparability for other software, but normally Welch’s test should be preferred).
The function’s parameters must be identical to the names of the list
elements returned by the sample function (here: sample1
,
sample2_h0
, and sample2_h1
). The returned
value must be a named vector that contains the derived p values. The
name of each p value must start with p_
, and end with
_h0
for the “H0” outcome and with _h1
for the
“H1” outcome. (Below, it’s simply p_h0
/p_h1
,
but it could just as well be, e.g.,
p_my_t.test_h0
/p_my_t.test_h1
).
customTest = function(sample1, sample2_h0, sample2_h1) {
c(
p_h0 = t.test(sample1, sample2_h0, 'less', var.equal = TRUE)$p.value,
p_h1 = t.test(sample1, sample2_h1, 'less', var.equal = TRUE)$p.value
)
}
(Other information, in particular descriptives such as means or SDs, can also be returned via this function – but this is described in a separate vignette.)
To make sure the two functions are working correctly, as a quick test you can run the following line (with any applicable number argument).
This passes 55
as (arbitrary) sample size to
customSample
, which passes the created samples to
customTest
, which finally returns the named vector with the
p_h0
and p_h1
p values. (Given the
randomization, if you run this line repeatedly, the outcome should
differ each time – however, with large enough numbers, p_h1
is likely to be noticeably smaller than p_h0
.)
Time to load POSSA
.
Now, the POSSA::sim
function can be used with
customSample
for fun_obs
(function for
observation values) and customTest
for
fun_test
(function for statistical testing), and any
desired numbers of observation. (This runs the simulation
45000
times by default, so it takes a while. For quick
testing, set the number to a lower value via the n_iter
parameter, e.g., n_iter = 500
.)
The returned dfPvals
contains all the simulated p
values, and can be directly passed to POSSA::pow
to get the
power for any specified alpha level.
This prints to the console the following.
#> # POSSA pow() results #
#> N(average-total) = 160.0 (if H0 true) or 160.0 (if H1 true)
#> (p) Type I error: .05140; Power: .93327
#> Local alphas: (1) .05000
This is a fixed sample design with 80 observations per each of the
two groups. The function uses an alpha of .05
by default,
this may be modified e.g. to .01
below.
pow(dfPvals, alpha_global = 0.01)
#> # POSSA pow() results #
#> N(average-total) = 160.0 (if H0 true) or 160.0 (if H1 true)
#> (p) Type I error: .00991; Power: .79069
#> Local alphas: (1) .01000
So far, we can easily get the same information from other software,
e.g. pwr::pwr.t.test(n = 80, d = 0.5, alternative = 'greater')
(returns power = 0.93368
) and
pwr::pwr.t.test(n = 80, d = 0.5, sig.level = 0.01, alternative = 'greater')
(returns power = 0.79068
).
However, sequential designs are extremely easy to be expanded from
the above sim
and pow
examples. One just has
to specify multiple instances of observation numbers for the
n_obs
in sim
and specify a local alphas in
pow
, e.g. as below.
dfPvalsSeq = sim(fun_obs = customSample,
n_obs = c(27, 54, 81),
fun_test = customTest)
pow(dfPvalsSeq, alpha_locals = NA)
#> # POSSA pow() results #
#> N(average-total) = 158.6 (if H0 true) or 98.8 (if H1 true)
#> (p) Type I error: .05000; Power: .89922
#> Local alphas: (1) .02288; (2) .02288; (3) .02288
#> Likelihoods of significance if H0 true: (1) .02296; (2) .01631; (3) .01073
#> Likelihoods of significance if H1 true: (1) .42324; (2) .32298; (3) .15300
The alpha_locals = NA
input specifies that all local
alphas are initially NA
, in which case they will all be
calculated to have a single identical number (corresponding to Pocock’s
correction in case of equally spaced looks). Rounding to the
conventional 3 fractional digits, the adjusted constant local alpha
number corresponds to the exact statistics (.023
) provided
in other software,
e.g. gsDesign::gsDesign(k = 3, test.type = 1, alpha = 0.05, sfu = 'Pocock')
.
The console output contains only the most important information, but,
when assigned, the pow
function returns a data frame with
detailed information per each look (and their totals), including the
specified sample sizes, the numbers of iterations, etc. The printed
information includes
p
, the root derived
from p_h0
/p_h1
),The look numbers are always in parenthesis, such that
“(1)
” designates the first look, “(2)
” the
second look, etc.
The console output can also be called as
print(dfPvalsSeq)
. The number of fractional digits to which
the values are to be rounded can be changed via a round_to
argument, such as e.g. print(dfPvalsSeq, round_to = 2)
.
POSSA
helps obtain the local alphas, in sequential
designs, that result in a specified overall (global) T1ER, which may
also be called “global alpha”. This procedure is automatized for most
practical cases, but still users should ideally have a basic idea of how
this works. However, impatient readers (who do not want to have
customized adjustments) can skip to the Specifying initial local
alphas section below.
The essential mechanism is a staircase procedure, where the local
alphas are continually increased when the T1ER is smaller than specified
and decreased when the T1ER is larger then specified, and each direction
change (from decrease to increase or vice versa) moves onto a smaller
step size in the adjustment. For example, the simplest case is when all
local alphas are provided as NA
, as above: in this case,
the initial replacement value (which may be modified via the
adj_init
parameter) is by default the global alpha divided
by the maximum number of looks, as a rough initial estimation.
Subsequently, the given p values are tested with this setting, and a
global T1ER is calculated. If it is larger than specified, the
replacement value is decreased, or vice versa. The amount of decrease is
the first step, by default 0.01
. Then the testing is
repeated, and the change is repeated in the same direction until the
obtained T1ER passes the specified T1ER (e.g., becomes smaller, when
initially larger). Then the replacement value is increased with a
smaller second step size, by default 0.05
. And so on, until
either there are no more steps (as specified) or (more typically) the
obtained T1ER is close enough to the specified one (e.g., by default,
matches it up to 5 fractional digits; may be specified via
alpha_precision
), at which point the procedure stops and
the obtained local alphas (each having the same value) and the
corresponding results are returned and printed.
The adjustment actually happens via a function that can be specified
via the adjust
parameter, but, by default, whenever there
is at least one NA
value among the given
alpha_locals
argument, the function implements the
above-described procedure, and looks as follows.
In this function, adj
means the adjustment value (in
this case a replacement value), prev
the previous vector of
local alphas (obtained in the last adjustment), and orig
the original vector of local alphas as provided for
alpha_locals
. (For the first adjustment, prev
is the same as orig
). Any sort of custom function may be
provided as adjust
, but it cannot contain any parameter
other than these three (adj
, prev
, and
orig
).
By default, if no NA
value is found among the given
alpha_locals
, the adjust
function uses
multiplication to adjust the given values (with adj_init
set to 1
), as follows.
By default, there are altogether 11 step sizes, chosen in a way that
typically results in 5-fractional-digit T1ER match before running
getting to the last step, and the calculation typically takes only a few
seconds. (Step sizes can be modified via the
staircase_steps
parameter.)
Given the above-described default mechanisms, there are, without
modifying the adjustment procedure, two easy ways to provide the
argument for alpha_locals
(for any given p value): (a) a
vector that includes (or consist entirely of) NA
values
that are to be replaced with a constant number that results in the
desired global T1ER, or (b) a vector with all numeric values that are
each to be multiplied with a number in order to obtain a vector of
alphas that again result in the desired global T1ER.
However, you can also simply check the results, without any
adjustment, of any given set of local alphas, by setting
adjust = FALSE
.
Here are some examples, all using the data frame
“dfPvalsSeq
” created above with
POSSA::sim()
.
To use Haybittle–Peto boundary (all interim local alphas
.001
), you can specify .001
for each local
alpha except for the last, which could be NA
to be adjusted
for the given global alpha (here: .025
), as below.
pow(dfPvalsSeq, alpha_locals = c(0.001, 0.001, NA), alpha_global = 0.025)
#> # POSSA pow() results #
#> N(average-total) = 161.8 (if H0 true) or 140.4 (if H1 true)
#> (p) Type I error: .02500; Power: .88411
#> Local alphas: (1) .00100; (2) .00100; (3) .02431
#> Likelihoods of significance if H0 true: (1) .00111; (2) .00067; (3) .02322
#> Likelihoods of significance if H1 true: (1) .09224; (2) .21596; (3) .57591
In this specific scenario, despite noticeably decreased average
sample in case of H1 (140.4
instead of the fixed
162
), the adverse effect of interim stops on T1ER is so
minimal (see, in case of H0, the tiny ratio significant [false positive]
findings at the first and second looks), that the last look’s alpha
(.02431
) hardly differs from the specified global alpha
(.025
).
You can also check what happens without adjustment, using
adjust = FALSE
.
pow(dfPvalsSeq, alpha_locals = c(0.001, 0.001, 0.025), alpha_global = 0.025, adjust = FALSE)
#> # POSSA pow() results #
#> N(average-total) = 161.8 (if H0 true) or 140.4 (if H1 true)
#> (p) Type I error: .02551; Power: .88618
#> Local alphas: (1) .00100; (2) .00100; (3) .02500
#> Likelihoods of significance if H0 true: (1) .00111; (2) .00067; (3) .02373
#> Likelihoods of significance if H1 true: (1) .09224; (2) .21596; (3) .57798
Which leads to the same conclusion from the reverse perspective: the
T1ER hardly exceeds .25
.
Nonetheless, more substantial sample reduction can be achieved with
somewhat more liberal interim local alphas; besides, normally it makes
more sense to increase the local alphas gradually (because, in brief,
given the greater uncertainty of the initial smaller samples, beginning
with lower local alphas and gradually increasing them provides the most
efficient “information gain” for the same eventual T1ER). For this, one
could base the adjustment on one of the popular boundary methods used
for parametric tests, such as the O’Brien-Fleming bounds. These bounds
for parametric tests can be easily calculated via other packages. They
may not apply to nonparametric tests in exactly the same way, but the
principle of exponentially increasing local alphas remains the same. (To
decide what fits your specific scenario best, you could also check the
outcomes of various versions of the Wang-Tsiatis bounds by changing its
Delta parameter; e.g., in gsDesign::gsDesign()
,
set sfu = 'WT'
for Wang-Tsiatis, and vary its Delta via
sfupar
.) Most importantly in any case, the adjustment of
alphas in POSSA
will ensure that the T1ER is at the
specified level. This will only make a difference for other custom tests
and multiple hypotheses, but not for the present t-test example.
For our example of three equally spaced interim looks,
O’Brien-Fleming bounds can be calculated e.g. via
gsDesign::gsDesign(k = 3, test.type = 1, alpha = 0.05, sfu = 'OF')
,
giving the local alphas of 0.0015
, 0.0181
, and
0.0437
. (The same is returned by
rpact::getDesignGroupSequential(typeOfDesign = "OF", informationRates = c(1/3, 2/3, 1), alpha = 0.05)
.)
To merely calculate the T1ER and power for any specific tests, again
use adjust = FALSE
.
pow(dfPvalsSeq, alpha_locals = c(0.0015, 0.0181, 0.0437), alpha_global = 0.05, adjust = FALSE)
#> # POSSA pow() results #
#> N(average-total) = 160.8 (if H0 true) or 118.7 (if H1 true)
#> (p) Type I error: .04951; Power: .93087
#> Local alphas: (1) .00150; (2) .01810; (3) .04370
#> Likelihoods of significance if H0 true: (1) .00162; (2) .01809; (3) .02980
#> Likelihoods of significance if H1 true: (1) .11531; (2) .57211; (3) .24344
(To verify the power via other software, see
e.g. summary(rpact::getSampleSizeMeans(rpact::getDesignGroupSequential(typeOfDesign = "OF", informationRates = c(1/3, 2/3, 1), alpha = 0.05, beta = 1-0.93087), alternative = 0.5))
.)
Since these local alphas were specifically chosen to have a T1ER of
.05
for the t-test comparison used here, they will hardly
change even with adjust = TRUE
. However, to test the
adjustment function, we can set a different global alpha,
e.g. .01
, so that the adjustment will reduce each local
alpha value by multiplication.
pow(dfPvalsSeq, alpha_locals = c(0.0015, 0.0181, 0.0437), alpha_global = 0.025)
#> # POSSA pow() results #
#> N(average-total) = 161.4 (if H0 true) or 126.9 (if H1 true)
#> (p) Type I error: .02498; Power: .87487
#> Local alphas: (1) .00071; (2) .00862; (3) .02080
#> Likelihoods of significance if H0 true: (1) .00089; (2) .00876; (3) .01533
#> Likelihoods of significance if H1 true: (1) .07629; (2) .49747; (3) .30111
(Cf.
summary(rpact::getSampleSizeMeans(rpact::getDesignGroupSequential(typeOfDesign = "asUser", informationRates = c(1/3, 2/3, 1), userAlphaSpending = c(.00071, .0089, .0244), beta = 1-.87487), alternative = 0.5))
.)
Finally, just for quick demonstration, for the same scenario, here is
a function that adjusts all values via additions (constant
increase/decrease of all local alpha values). Also, here the global
alpha is set to be larger (alpha_global = 0.1
), just so as
to make the local alphas increase too in this case.
pow(
dfPvalsSeq,
alpha_locals = c(0.0015, 0.0181, 0.0437),
alpha_global = 0.1,
adjust = function(adj, prev, orig) {
return(orig + adj)
}
)
#> # POSSA pow() results #
#> N(average-total) = 161.4 (if H0 true) or 126.9 (if H1 true)
#> (p) Type I error: .02498; Power: .87487
#> Local alphas: (1) .00071; (2) .00862; (3) .02080
#> Likelihoods of significance if H0 true: (1) .00089; (2) .00876; (3) .01533
#> Likelihoods of significance if H1 true: (1) .07629; (2) .49747; (3) .30111
By default, the interim local alphas will all be zero (i.e., no stopping for significance), and the final local alpha will equal the given global alpha – meaning a simple fixed design (with the maximum specified observation number).
pow(dfPvalsSeq)
#> # POSSA pow() results #
#> N(average-total) = 162.0 (if H0 true) or 162.0 (if H1 true)
#> (p) Type I error: .04860; Power: .93636
#> Local alphas: (1) none; (2) none; (3) .05000
#> Likelihoods of significance if H0 true: (1) 0; (2) 0; (3) .04860
#> Likelihoods of significance if H1 true: (1) 0; (2) 0; (3) .93636
This is to make the setting of futility bounds alone easy
and straightforward. For instance, to stop collection for futility
whenever the p value exceeds .5
(during interim looks), one
can simply add fut_locals = 0.5
.
pow(dfPvalsSeq, fut_locals = 0.5)
#> # POSSA pow() results #
#> N(average-total) = 101.1 (if H0 true) or 158.3 (if H1 true)
#> (p) Type I error: .04516; Power: .91622
#> Local alphas: (1) none; (2) none; (3) .05000
#> Likelihoods of significance if H0 true: (1) 0; (2) 0; (3) .04516
#> Likelihoods of significance if H1 true: (1) 0; (2) 0; (3) .91622
#> Futility bounds: (1) .50000; (2) .50000
#> Likelihoods of exceeding futility alpha if H0 true: (1) .03011; (2) .00176
#> Likelihoods of exceeding futility alpha if H1 true: (1) .03324; (2) .00180
This is equivalent to specifying each futility bound. Since futility
bound is meaningless for the final look, and the present example
(dfPvalsSeq
) contains two interim looks, two values are
required. Hence the
pow(dfPvalsSeq, fut_locals = c(0.5, 0.5))
gives results
equivalent to pow(dfPvalsSeq, fut_locals = 0.5)
(as
above).
Of course, once again, a constant value is not ideal; the futility bound is more effective with values decreasing by each look. The specific choices depends on how much power one is willing to sacrifice in exchange for reduced sample sizes (which, as seen in the outputs, will help primarily in case H0 is true).
pow(dfPvalsSeq, fut_locals = c(0.6, 0.3))
#> # POSSA pow() results #
#> N(average-total) = 100.9 (if H0 true) or 159.3 (if H1 true)
#> (p) Type I error: .04587; Power: .92331
#> Local alphas: (1) none; (2) none; (3) .05000
#> Likelihoods of significance if H0 true: (1) 0; (2) 0; (3) .04587
#> Likelihoods of significance if H1 true: (1) 0; (2) 0; (3) .92331
#> Futility bounds: (1) .60000; (2) .30000
#> Likelihoods of exceeding futility alpha if H0 true: (1) .01549; (2) .01307
#> Likelihoods of exceeding futility alpha if H1 true: (1) .01789; (2) .01336
Importantly, alphas stopping for significance and bounds stopping for futility can be combined in any desired way.
For instance:
pow(
dfPvalsSeq,
alpha_locals = c(0.002, 0.018, 0.044),
fut_locals = c(0.6, 0.3)
)
#> # POSSA pow() results #
#> N(average-total) = 99.7 (if H0 true) or 114.3 (if H1 true)
#> (p) Type I error: .05000; Power: .92229
#> Local alphas: (1) .00211; (2) .01897; (3) .04636
#> Likelihoods of significance if H0 true: (1) .00216; (2) .01864; (3) .02920
#> Likelihoods of significance if H1 true: (1) .13907; (2) .55542; (3) .22780
#> Futility bounds: (1) .60000; (2) .30000
#> Likelihoods of exceeding futility alpha if H0 true: (1) .03304; (2) .37542
#> Likelihoods of exceeding futility alpha if H1 true: (1) .01789; (2) .01336
(Note the robust sample size reduction in case of both H0 and H1, with power hardly changed as compared to the fixed design.)
The alpha_locals
functions same as without futility
bounds; e.g., in this last example it was adjusted via multiplication,
but it could also be again given NA
values to be adjusted
via replacements.
To disable, at any given look, stopping either for significance or for futility, just set the local alpha/bound to zero or to one, respectively. For example, the above example can be modified as below to have no stopping for futility at the first look and no stopping for significance at the second look.
pow(
dfPvalsSeq,
alpha_locals = c(0.002, 0, 0.044),
fut_locals = c(1, 0.3)
)
#> # POSSA pow() results #
#> N(average-total) = 123.8 (if H0 true) or 145.2 (if H1 true)
#> (p) Type I error: .05000; Power: .93416
#> Local alphas: (1) .00235; (2) none; (3) .05167
#> Likelihoods of significance if H0 true: (1) .00247; (2) 0; (3) .04753
#> Likelihoods of significance if H1 true: (1) .14633; (2) 0; (3) .78782
#> Futility bounds: (1) none; (2) .30000
#> Likelihoods of exceeding futility alpha if H0 true: (1) 0; (2) .01867
#> Likelihoods of exceeding futility alpha if H1 true: (1) 0; (2) .01916
(Of course, once again, this example hardly makes sense in a real case; it serves only as a demonstration here.)
Consider the sample creation given in the beginning:
customSample = function(sampleSize) {
list(
sample1 = rnorm(sampleSize, mean = 0, sd = 10),
sample2_h0 = rnorm(sampleSize, mean = 0, sd = 10),
sample2_h1 = rnorm(sampleSize, mean = 5, sd = 10)
)
}
Here, we have two different groups (denoted as “sample1” and
“sample2”), but, since their sample sizes (numbers of observations) are
the same, for convenience the sample size can be passed via a single
parameter (here: sampleSize
). It could also be specified
separately, passing via two parameters. In such a case, the parameter
names must exactly correspond to the sample root names (i.e., the names
without any final 0
/1
that specify “H0” or
“H1”); here, sample1
and sample2_h
.
customSampleTwoParam = function(sample1, sample2_h) {
list(
sample1 = rnorm(sample1, mean = 0, sd = 10),
sample2_h0 = rnorm(sample2_h, mean = 0, sd = 10),
sample2_h1 = rnorm(sample2_h, mean = 5, sd = 10)
)
}
Now, the previous sim
and pow
procedures
can be run just the same, and the results will be identical as in the
first example (with single parameter passed).
dfPvalsSeqTwoParam = sim(fun_obs = customSampleTwoParam,
n_obs = c(27, 54, 81),
fun_test = customTest)
pow(dfPvalsSeqTwoParam, alpha_locals = NA)
Here, given the n_obs = c(27, 54, 81)
with a single
vector, this is automatically assigned to each of the two given sample
parameters (sample1
and sample2_h
). These can
also be assigned separately, and the results will once again be
identical.
dfPvalsSeqTwoParamSeparate = sim(
fun_obs = customSampleTwoParam,
n_obs = list(
sample1 = c(27, 54, 81),
sample2_h = c(27, 54, 81)
),
fun_test = customTest
)
pow(dfPvalsSeqTwoParamSeparate, alpha_locals = NA)
Importantly, this also allows specifying different observation numbers for each of the two samples.
dfPvalsSeqTwoParamSeparate = sim(
fun_obs = customSampleTwoParam,
n_obs = list(
sample1 = c(17, 44, 71),
sample2_h = c(37, 64, 91)
),
fun_test = customTest
)
pow(dfPvalsSeqTwoParamSeparate, alpha_locals = NA)
Now, the outcome is of course different (but would be especially affected in case of unequal variances).
Incidentally: all sequential design examples so far used two interim looks – but there could of course be any other number. For instance, here is a design with a senseless amount of six interim looks (for two differently sized samples), just for the fun of it.
dfPvalsManyLooks = sim(
fun_obs = customSampleTwoParam,
n_obs = list(
sample1 = c(8, 16, 24, 32, 40, 48, 56),
sample2_h = c(10, 20, 30, 40, 50, 60, 70)
),
fun_test = customTest
)
pow(dfPvalsManyLooks, alpha_locals = NA)
#> # POSSA pow() results #
#> N(average-total) = 122.4 (if H0 true) or 78.8 (if H1 true)
#> (p) Type I error: .04991; Power: .78164
#> Local alphas: (1) .01381; (2) .01381; (3) .01381; (4) .01381; (5) .01381; (6) .01381; (7) .01381
#> Likelihoods of significance if H0 true: (1) .01478; (2) .01036; (3) .00682; (4) .00600; (5) .00449; (6) .00378; (7) .00369
#> Likelihoods of significance if H1 true: (1) .10896; (2) .14676; (3) .14107; (4) .12364; (5) .10562; (6) .08716; (7) .06844
In a real experiment, you may not know in advance the exact number of observations at the stopping point. For example, you plan (and preregister) to open an initial 30 slots for participation, after which you would perform the first testing (first interim “look”) – however, some participants will probably have to be excluded (e.g. because of failing attention checks), expected to be only a few, but cannot be known exactly in advance.
Assuming that minor variations in power do not matter, you can simply
calculate with the approximate observation numbers (at the given planned
looks) in advance, and, when the actual observation numbers are known,
the calculation is repeated with these latter numbers to ensure that the
T1ER remains as desired. For example, taking into account some necessary
exclusions, the expected observation numbers with two interim looks are
27
(first look), 54
, and 81
.
This may be calculated as before, using O’Brien-Fleming bounds (as
provided by, e.g.,
gsDesign::gsDesign(test.type = 1, alpha = 0.05, sfu = 'OF', timing = c(27, 54, 81)/81)
,
then adjusted by POSSA
).
dfPvals_planned = sim(fun_obs = customSample,
n_obs = c(27, 54, 81),
fun_test = customTest)
pow(dfPvals_planned, alpha_locals = c(0.0015, 0.0181, 0.0437), alpha_global = 0.05)
#> # POSSA pow() results #
#> N(average-total) = 160.8 (if H0 true) or 118.5 (if H1 true)
#> (p) Type I error: .05000; Power: .93164
#> Local alphas: (1) .00152; (2) .01830; (3) .04419
#> Likelihoods of significance if H0 true: (1) .00164; (2) .01831; (3) .03004
#> Likelihoods of significance if H1 true: (1) .11604; (2) .57309; (3) .24251
Now, let’s say that, due to somewhat more exclusions than expected,
the initial sample contains only 24 valid observations. You can rerun
the simulation with the modified first observation number, and run the
power calculation with initial alphas corrected with
gsDesign::gsDesign(test.type = 1, alpha = 0.05, sfu = 'OF', timing = c(24, 54, 81)/81)
.
dfPvals_first = sim(fun_obs = customSample,
n_obs = c(24, 54, 81),
fun_test = customTest)
pow(dfPvals_first, alpha_locals = c(0.0009, 0.0182, 0.0438), alpha_global = 0.05)
#> # POSSA pow() results #
#> N(average-total) = 161.0 (if H0 true) or 120.6 (if H1 true)
#> (p) Type I error: .05000; Power: .93484
#> Local alphas: (1) .00091; (2) .01850; (3) .04453
#> Likelihoods of significance if H0 true: (1) .00087; (2) .01760; (3) .03153
#> Likelihoods of significance if H1 true: (1) .07056; (2) .61742; (3) .24687
Let’s say that the p value was not below the local alpha
(.02295
), so you move open another 30 slots, and once again
end up with more exclusions than expected, resulting in only
49
valid observations. The adjusted calculation is then as
follows.
gsDesign::gsDesign(test.type = 1, alpha = 0.05, sfu = 'OF', timing = c(24, 49, 81)/81)
# gives new initial local alphas
dfPvals_second = sim(fun_obs = customSample,
n_obs = c(24, 49, 81),
fun_test = customTest)
pow(dfPvals_second, alpha_locals = c(0.0009, 0.0145, 0.0447), alpha_global = 0.05)
#> # POSSA pow() results #
#> N(average-total) = 161.0 (if H0 true) or 119.6 (if H1 true)
#> (p) Type I error: .04996; Power: .93153
#> Local alphas: (1) .00090; (2) .01453; (3) .04479
#> Likelihoods of significance if H0 true: (1) .00091; (2) .01360; (3) .03544
#> Likelihoods of significance if H1 true: (1) .07009; (2) .53733; (3) .32411
(And, if needed, a similar modified recalculation could be done for the last look.)
As can be seen, the power differences are very small. Nonetheless, if it is important to keep the same level of power with high precision (e.g., up to 3 fractional digits), you can plan for (and preregister) to conditionally adjust (increase/decrease) the subsequent interim (if any) and final sample sizes to achieve the same power. (Technically, another option is to conditionally adjust the alpha level, i.e., the T1ER, but that’s rather questionable.)
Go to the next
(and the only other important) POSSA
vignette see how to
record descriptives, easily check varying factors, and test multiple
hypotheses.