This vignette walks through the use of the bdots package for analyzing the bootstrapped differences of time series data. The general workflow will follow three steps:
: During this step, we define the type of curve that will be used to fit our data along with variables to be used in the analysis 2. Curve Refitting : Often, some of the curves returned from the first step have room for improvement. This step allows the user to either quickly attempting refitting a subset of the curves from step one or to manually make adjustments themselves 3. Bootstrap : Having an adequate collection of curves, this function determines the bootstrapped difference, along with computing an adjusted alpha to account for AR1 correlation
This process is represented with three main functions,
bfit -> brefit -> bboot
This package is under active development. The most recent version can
be installed with
devtools::install_github("collinn/bdots").
For our example, we are going to be using eye tracking data from normal hearing individuals and those with cochlear implants using data from the Visual Word Paradigm (VWP).
head(cohort_unrelated)
#> Subject Time DB_cond Fixations LookType Group
#> <num> <int> <int> <num> <fctr> <int>
#> 1: 1 0 50 0.01136364 Cohort 50
#> 2: 1 4 50 0.01136364 Cohort 50
#> 3: 1 8 50 0.01136364 Cohort 50
#> 4: 1 12 50 0.01136364 Cohort 50
#> 5: 1 16 50 0.02272727 Cohort 50
#> 6: 1 20 50 0.02272727 Cohort 50The bfit function will create a curve for each unique
permutation of subject/group variables. Here,
we will let LookType and DB_cond be our
grouping variables, though we may include as many as we wish (or only a
single group assuming that it has multiple values). See
?bfit for argument information.
fit <- bfit(data = cohort_unrelated,
subject = "Subject",
time = "Time",
y = "Fixations",
group = c("DB_cond", "LookType"),
curveFun = doubleGauss(concave = TRUE),
cores = 2)A key thing to note here is the argument for curveType
is passed as a function call with arguments that further specify the
curve. Currently within the bdots package, the available
curves are doubleGauss(concave = TRUE/FALSE),
logistic() (no arguments), and
polynomial(degree = n). While more curves will be added
going forward, users can also specify their own curves, as shown here.
The bfit function returns an object of class
bdotsObj, which inherits from data.table. As
such, this object can be manipulated and explored with standard
data.table syntax. In addition to the subject and the
grouping columns, we also have a fit column, containing the
fit from the gnls package, a value for R2, a
boolean indicating AR1 status, and a final column for
fitCode. The fit code is a numeric quantity representing
the quality of the fit as such:
| fitCode | AR1 | R2 |
|---|---|---|
| 0 | TRUE | R2 > 0.95 |
| 1 | TRUE | 0.8 < R2 < 0.95 |
| 2 | TRUE | R2 < 0.8 |
| 3 | FALSE | R2 > 0.95 |
| 4 | FALSE | 0.8 < R2 < 0.95 |
| 5 | FALSE | R2 < 0.8 |
| 6 | NA | NA |
A fitCode of 6 indicates that a fit was not able to be
made.
In addition to plot and summary functions,
we also have a method to return a matrix of coefficients from the model
fits. Because of the data.table syntax, we can examine
subsets of this object as well
head(coef(fit))
#> mu ht sig1 sig2 base1 base2
#> [1,] 417.6899 0.1986711 145.5628 323.1882 0.01586359 0.03412371
#> [2,] 636.8447 0.2632815 306.2330 214.9787 -0.02154794 0.02858644
#> [3,] 647.5295 0.2547779 496.6745 256.4257 -0.18223561 0.01217570
#> [4,] 734.1526 0.2585742 405.6348 240.2926 -0.05751245 0.03455280
#> [5,] 501.1948 0.2258572 398.7759 158.6752 -0.16159471 0.02529158
#> [6,] 460.7152 0.3067659 382.7322 166.0833 -0.24330879 0.03992168
head(coef(fit[DB_cond == 50, ]))
#> mu ht sig1 sig2 base1 base2
#> [1,] 417.6899 0.1986711 145.5628 323.1882 0.01586359 0.034123710
#> [2,] 647.5295 0.2547779 496.6745 256.4257 -0.18223561 0.012175704
#> [3,] 501.1948 0.2258572 398.7759 158.6752 -0.16159471 0.025291582
#> [4,] 524.7172 0.2479396 287.9935 207.0812 -0.05806979 0.104935655
#> [5,] 549.6930 0.2273107 204.0645 229.6397 -0.01001132 0.028786272
#> [6,] 584.5835 0.1594028 226.3004 420.4861 0.01182188 0.006920171The plots for this object will compare the observed data with the fitted curve. Here is an example of the first four:
plot(fit[1:4, ])
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the bdots package.
#> Please report the issue at <https://github.com/collinn/bdots/issues>.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.Depending on the curve type and the nature of the data, we might find
that a collection of our fits aren’t very good, which may impact the
quality of the bootstrapping step. Using the brefit
function, users have the option to either quickly attempt to
automatically refit specified curves or to manually review each one and
offer alternative starting parameters. The fitCode argument
provides a lower bound for the fit codes to attempt refitting. The
default is fitCode = 1, indicating that we wish to attempt
refitting all curves that did not have fitCode == 0. The
object returned is the same as that returned by bfit.
## Quickly auto-refit (not run)
refit <- brefit(fit, fitCode = 1L, quickRefit = TRUE)
## Manual refit (not run)
refit <- brefit(fit, fitCode = 1L)For whatever reason, there are some data will will not submit nicely
to a curve of the specfied type. One can quickly remove all observations
with a fit code equal to or greater than the one provided in
bdRemove
table(fit$fitCode)
#>
#> 0 1 2
#> 20 14 2
## Remove all failed curve fits
refit <- bdRemove(fit, fitCode = 6L)
table(refit$fitCode)
#>
#> 0 1 2
#> 20 14 2There is an additional option, removePairs which is
TRUE by default. This indicates that if an observation is
removed, all observations for the same subject should also be removed,
regardless of fit. This ensures that all subjects have their
corresponding pairs in the bootstrapping function for the use of the
paired t-test. If the data are not paired, this can be set to
FALSE.
The final step is the bootstrapping process, performed with
bboot. First, let’s examine the set of curves that we have
available from the first step
: Here, we are interested specifically in the difference between two
fitted curves. For our example case here, this may be the difference
between curves for DB_cond == 50 and
DB_cond == 65 nested within either the Cohort
or Unrelated_Cohort LookTypes (but not
both).
: In this case, we are considering the difference of two difference
curves similar to the one found above. For example, we may denote the
difference between DB_cond 50 and
65 within the Corhort group as \(\text{diff}_{\text{Cohort}}\) and the
differences between DB_cond 50 and
65 within Unrelated_Corhort as \(\text{diff}_{\text{UnrelatedCohort}}\). The
difference of difference function will then return an analysis of \(\text{diff}_{\text{Cohort}}\) - \(\text{diff}_{\text{UnrelatedCohort}}\)
We can express the type of curve that we wish to fit with a modified formula syntax. It’s helpful to read as “the difference of LHS between elements of RHS”
For the first type, we have
## Only one grouping variable in dataset, take bootstrapped difference
Outcome ~ Group1(value1, value2)
## More than one grouping variable in difference, must specify unique value
Outcome ~ Group1(value1, value2) + Group2(value3)That is, we might read this as “difference of Outcome for value1 and value2 within Group1.”
With our working example, we would find the difference of
DB_cond == 50 and DB_cond == 65 within
LookType == "Cohort" with
For this second type of curve, we specify an “inner difference” to be
the difference of groups for which we are taking the difference of. The
syntax for this case uses a diffs function in the
formula:
## Difference of difference. Here, outer difference is Group1, inner is Group2
diffs(Outcome, Group2(value3, value4)) ~ Group1(value1, value2)
## Same as above if three or more grouping variables
diffs(Outcome, Group2(value3, value4)) ~ Group1(value1, value2) + Group3(value5)For the example illustrated in (2) above, the difference \(\text{diff}_{50} - \text{diff}_{65}\)
represents our inner difference, each nested within one of the values
for LookType. The “outer difference” is then difference of
these between LookTypes. The syntax here would be
Here, we show a fit for each
boot1 <- bboot(formula = Fixation ~ DB_cond(50, 65) + LookType(Cohort),
bdObj = refit,
Niter = 1000,
alpha = 0.05,
padj = "oleson",
cores = 2)
boot2 <- bboot(formula = diffs(Fixation, LookType(Cohort, Unrelated_Cohort)) ~ DB_cond(50, 65),
bdObj = refit,
Niter = 1000,
alpha = 0.05,
padj = "oleson",
cores = 2)
#> Warning in bboot(formula = diffs(Fixation, LookType(Cohort, Unrelated_Cohort))
#> ~ : Permutation testing does not yet work for difference of difference
#> analysis. Switching to padj='oleson' insteadFor each, we can then produce a model summary, as well as a plot of difference curves
summary(boot1)
#>
#> bdotsBoot Summary
#>
#> Curve Function: doubleGauss
#> Formula: Fixations ~ (Time < mu) * (exp(-1 * (Time - mu)^2/(2 * sig1^2)) * (ht - base1) + base1) + (mu <= Time) * (exp(-1 * (Time - mu)^2/(2 * sig2^2)) * (ht - base2) + base2)
#> Time Range: (0, 2000) [501 points]
#>
#> Difference of difference: FALSE
#> Paired t-test: TRUE
#> Difference: DB_cond -- 50 65
#>
#> Autocorrelation Estimate:
#> FWER adjust method: oleson
#> Alpha: 0.05
#> Adjusted alpha:
#> Significant Intervals:
#> NULL
plot(boot1)