Extracting 3D array using eeguana

The eeguana package (Nicenboim 2018) provides tools for the pre-processing of EEG data. Once the pre-processing is done with eeguana, the 3D matrix/array of the signals should be extracted from the eeguana. This signals are stored in a 3D matrix/array that is the left part of the formula of the brainperm() function to perform the cluster-mass test or TFCE.

The EEG data (Nicenboim 2019) available on the Open Science Framework repository was analyzed using the eeguana package by Nicenboim, Vasishth, and Rösler (2019). We use the same data-set in the following tutorial.

Downloading data from OSF

From the EEG data repository we found the "03_data_summarization.Rmd" file which provides functions to easily download the EEG data. As it is a large dataset (more than 20GB), we only download the pre-processed signals using the ICA. When following the first chunk of the "03_data_summarization.Rmd" file you should have (at least):

  1. A folder data/ containing 5 csv files, 2 tsv files and 2 folders.
  2. The folder data/preproc_ica/ contains 120 RDA files (1 for each subject). Each file contains an eeg_lst object from eeguana with pre-processed EEG data.
  3. The folder data/opensesame_files/ contains 120 csv files (1 for each subject)

R package

In the following analysis, we mainly use functions from the tidyverse (Wickham et al. 2019), from eeguana (Nicenboim 2018), from permuco (Frossard and Renaud 2020b), from permuco4brain (Frossard and Renaud 2020a), from future (Bengtsson 2020) and from igraph (Csardi and Nepusz 2006):

Computing the ERPs

First, we compute ERPs following the design of the experiment and save them in the folder data_erp/. To reduce the number of samples (and the computing time), we delete samples (or time-point) before the event and downsample to 256Hz. Finally, the ERPs are computed by taking the average for each sample, each channel and each experimental condition, which are the interaction between the within participant factors region and cond.

datadir <- "data/preproc_ica/"

lf <- list.files(datadir)

for(i in 1:length(lf)){
  eeg <- readRDS(paste0(datadir,lf[i]))
  erp <- eeg%>%
    summarize_at(channel_names(.), mean, na.rm = TRUE)
  saveRDS(erp, file = paste0("data_erp/",lf[i]))

The previous script should produce 120 files in the folder data_erp/, one for each participant.

The ERP can be combine into 1 eeg_lst object by running:

datadir <- "data_erp/"
lf <- list.files(datadir)

erps <- list()

for(i in 1:length(lf)){
  erps[[i]] <- readRDS(paste0(datadir,lf[i]))%>%
    mutate(subject_nr = as.numeric(strsplit(.recording,"[._]")[[1]][1]))

erps <- do.call("bind",erps)

Now, the erps object contains the ERPs for all participants. It is eeg_lst object which can be manipulated using the functions from eeguana.

Adding the between-participant variables

We download the between-participant variables by combining 4 files. We choose to keep the age, the batch and the occupation as between-participant factors.

df_subj <- bind_rows(read_csv("data/students.csv") %>%
                         mutate(batch =1, student = "yes"),
                       read_csv("data/students_2.csv") %>%
                         mutate(batch =2, student = "yes"),
                       read_csv("data/nonstudents.csv") %>%
                         mutate(batch =1, student = "no"),
                         mutate(batch =2, student = "no"))%>%
  select(subj_order = NR, subject_nr, age = Alter, batch, 
         occupation = student)

The .csv data does not contain the between-participants variables for all participants, so we choose to delete 8 participants from our analysis.

subject_to_rm <- c(21,35,316,111,222,322,98,245)

erps <- 

df_subj <- 

Finally, we merge the between-participants variables to the erps object and also remove the signals from the ocular and reference channels the erps object.

erps <- 
  left_join(df_subj ,by = c("subject_nr"))

chan_to_rm <- c("M1","M2","HEOG","VEOG")
erps <- 

Extracting the 3D matrix

We extract the 3D matrix using the following script. The dimensions are the design \(\times\) the samples \(\times\) the channels.

signal <- 
  mutate(data = map(data,~as.matrix(.x[-1])))%>%
  invoke(abind::abind,.,along = 3)%>%

The design which contains both the between and within variables is simply extracted using the segments_tbl() function. Moreover, we center the covariate as it will make more sense for the test.

design <- 
  mutate(age_c = age - mean(age))

Finally the graph of adjacency is computed using the 32 channels layout data in eeguana and the position_to_graph() function from permuco4brain. Moreover, we can check the adjacency graph using the plot() function from igraph.

layout <- 

graph <- position_to_graph(layout, delta = .75,name = .channel,
                          x = .x, y = .y, z = .z)

You can use igraph::rglplot(graph) to have a 3D representation of the adjacency between channels. Before running the clustermass test, you may want to save the 3 objects: the 3D array containing the signals, the dataframe containing the design and the graph defining the spatial adjacency of the channels.

# save(signal = signal,
#      design = design,
#      graph = graph,file = "signal_design_graph.RData")

Finally, we use permuco4brain for testing the difference between experimental conditions. We have a repeated measures ANCOVA, with 1 covariate (the centered age), 1 between-participants factors (the occupation) and 2 within-participants factors (region and condition). The permutation tests is computed for all samples, all channels while controlling for the family-wise error rate (FWER).

The plan() function from the future package handles multi-cores computing

formula <- signal ~ age_c*occupation*region*cond +Error(.recording/(region*cond))
model <- brainperm(formula = formula, data = design, graph = graph)

By default, the brainperm() function run the cluster-mass test (Maris and Oostenveld 2007). However, the procedure from Troendle (1995) and the TFCE (Smith and Nichols 2009) are also available by specifying the arguments multcomp = "troendle" or multcomp = "tfce".


Bengtsson, Henrik. 2020. A Unifying Framework for Parallel and Distributed Processing in R Using Futures. https://arxiv.org/abs/2008.00553.

Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.org.

Frossard, Jaromil, and Olivier Renaud. 2020a. Permuco4brain: Extension of Permuco on Time-Space Data, Like Full Scalp Eeg. https://github.com/jaromilfrossard/permuco4brain.

———. 2020b. Permuco: Permutation Tests for Regression, (Repeated Measures) Anova/Ancova and Comparison of Signals.

Maris, Eric, and Robert Oostenveld. 2007. “Nonparametric Statistical Testing of EEG- and MEG-Data.” Journal of Neuroscience Methods 164 (1): 177–90. https://doi.org/10.1016/j.jneumeth.2007.03.024.

Nicenboim, Bruno. 2018. eeguana: A Package for Manipulating EEG Data in R. https://doi.org/10.5281/zenodo.2533138.

———. 2019. “EEG Data.” osf.io/ut7xq.

Nicenboim, Bruno, Shravan Vasishth, and Frank Rösler. 2019. “Are Words Pre-Activated Probabilistically During Sentence Comprehension? Evidence from New Data and a Bayesian Random-Effects Meta-Analysis Using Publicly Available Data.”

Smith, S, and T Nichols. 2009. “Threshold-Free Cluster Enhancement: Addressing Problems of Smoothing, Threshold Dependence and Localisation in Cluster Inference.” NeuroImage 44 (1): 83–98. https://doi.org/10.1016/j.neuroimage.2008.03.061.

Troendle, James F. 1995. “A Stepwise Resampling Method of Multiple Hypothesis Testing.” Journal of the American Statistical Association 90 (429): 370–78. https://doi.org/10.1080/01621459.1995.10476522.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.