NeuroDecodeR (NDR) uses two very similar data formats, called raster
format and binned format. For almost all analyses, one starts by saving
the data from each site in raster format. One then converts the data to
binned format using the create_binned_data() function, which stores the
data from all sites in a single data frame, at a coarser temporal
resolution, in a single file. The binned format data is then used in all
subsequent decoding analyses. The requirements for each format are
described below.
Raster format data contains the data at the highest temporal resolution. For raster data, there is a separate file for each site, and each file contains a data frame with that site's data (e.g., for electrophysiology experiments there is a separate file for each single neuron, for EEG experiments there is a separate file for each EEG channel, etc.). The data from each site is kept in a separate file to avoid running out of memory when loading data from many sites at a high temporal resolution.
For raster format data, the number of rows in the data frame corresponds to the number of trials in the experiment. Data in raster format is a data frame that must contain variables (columns) that start with the following prefixes:
labels.XXX
These variables contain labels of which
experimental conditions were shown on a given trial.
time.XXX_YYY
These variables contain the data for a given time bin, where XXX is the
start time of the bin and YYY is the end time. The time interval has the
form [XXX, YYY), so the start time is included in the bin and the end
time is excluded. Thus, for data that is recorded continuously, the
value YYY of one bin equals the value XXX of the next bin (e.g.,
time.100_101, time.101_102, time.102_103, etc.).
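The naming convention for the time variables can be illustrated with a short base R sketch. The column names below are made up for illustration, and the parsing shown is only one possible approach, not the package's own code.

```r
# Hypothetical column names following the time.XXX_YYY convention (1 ms bins).
time_names <- c("time.-2_-1", "time.-1_0", "time.0_1", "time.1_2")

# Strip the prefix, then split the start and end times at the underscore.
bounds <- strsplit(sub("^time\\.", "", time_names), "_", fixed = TRUE)
start_times <- as.numeric(sapply(bounds, `[`, 1))
end_times <- as.numeric(sapply(bounds, `[`, 2))

# For continuously recorded data, each bin ends where the next one begins.
is_contiguous <- all(end_times[-length(end_times)] == start_times[-1])

print(data.frame(start_times, end_times))
print(is_contiguous)
```

Because the intervals are half-open, a sample at time YYY belongs to the next bin rather than the current one, so no sample is counted twice.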
There can also be two additional optional variables in a raster format data frame which are:
site_info.XXX
These variables contain additional metadata about the site. For example,
one could have a variable called site_info.brain_area which indicates
which brain region a given site came from. All rows for a given
site_info.XXX variable typically have the same value.
trial_number
This variable specifies a unique number for each row indicating which
trial a given row of data came from. This is useful for data where all
sites were recorded simultaneously, since it allows one to do the
decoding on actual simultaneously recorded data (e.g., by using the
create_simultaneously_recorded_populations argument of ds_basic()).
The class attribute for data in raster format should be set using
attr(raster_data, "class") <- c("raster_data", "data.frame").
This enables the plot() function to correctly plot data in raster
format.
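As a rough illustration of the requirements above, the following base R sketch builds a small raster format data frame by hand. All values (the brain area, stimulus names, and spike probability) are invented for the example; real raster data would come from an experiment.

```r
set.seed(1)
n_trials <- 6

# Hypothetical toy data: one site, six trials, five 1 ms time bins.
raster_data <- data.frame(
  site_info.brain_area = rep("IT", n_trials),
  labels.stimulus_ID = rep(c("car", "face", "kiwi"), times = 2),
  trial_number = seq_len(n_trials)
)

# Add one column of 0/1 spike indicators per time bin, named time.XXX_YYY.
for (start in 0:4) {
  col_name <- paste0("time.", start, "_", start + 1)
  raster_data[[col_name]] <- rbinom(n_trials, size = 1, prob = 0.2)
}

# Set the class attribute as described above.
attr(raster_data, "class") <- c("raster_data", "data.frame")

print(names(raster_data))
print(class(raster_data))
```

The time columns are added with `[[<-` because column names containing a minus sign (e.g., time.-500_-499) would be altered by data.frame() unless check.names = FALSE is used.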
To test whether data conforms to the requirements of raster format, one
can use the function NeuroDecodeR::test_valid_raster_format().
Below is an example of a raster format data file from the Zhang-Desimone 7 object data set.
raster_dir_name <- file.path(system.file("extdata", package = "NeuroDecodeR"), "Zhang_Desimone_7object_raster_data_small_rda")
full_file_name <- file.path(raster_dir_name, "bp1001spk_01A_raster_data.rda")
# test the file is in valid raster format
NeuroDecodeR::test_valid_raster_format(full_file_name)
# load the data to see the variables in it
load(full_file_name)
head(raster_data[, 1:10])
##   site_info.session_ID site_info.recording_channel site_info.unit
## 1                 1001                           1              A
## 2                 1001                           1              A
## 3                 1001                           1              A
## 4                 1001                           1              A
## 5                 1001                           1              A
## 6                 1001                           1              A
##   labels.combined_ID_position labels.stimulus_position labels.stimulus_ID
## 1                  hand_upper                    upper               hand
## 2               flower_middle                   middle             flower
## 3               guitar_middle                   middle             guitar
## 4                  face_upper                    upper               face
## 5                 kiwi_middle                   middle               kiwi
## 6                 couch_upper                    upper              couch
##   time.-500_-499 time.-499_-498 time.-498_-497 time.-497_-496
## 1              0              0              0              0
## 2              0              0              0              0
## 3              0              0              0              0
## 4              0              0              0              0
## 5              0              0              0              0
## 6              0              0              0              0
Binned format data contains data from multiple sites (e.g., data from
many neurons, EEG channels, etc.). Data that is in binned format is very
similar to data that is in raster format except that it contains
information from multiple sites and usually contains the information at
a coarser temporal resolution. For example, binned data would typically
contain firing rates over some time interval sampled at a lower rate, as
opposed to raster format data that would typically contain individual
spikes sampled at a higher rate. Binned format data is typically created
from raster format data using the function create_binned_data(), which
converts a directory of raster format files into a binned format file
that is used in subsequent decoding analyses.
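The idea behind binning can be sketched in base R. This is not the implementation used by create_binned_data(). It assumes that each binned value is the mean of the raster values inside the window, which is consistent with values such as 0.006666667 (one spike in 150 ms) in the example output further below. The toy data and the variable names bin_width and sampling_interval are chosen for illustration.

```r
# Hypothetical toy raster data: 3 trials, 1 ms bins covering [0, 300).
set.seed(1)
n_trials <- 3
raster_start <- 0:299
spikes <- matrix(rbinom(n_trials * length(raster_start), 1, 0.05),
                 nrow = n_trials)

bin_width <- 150         # width of each bin in ms
sampling_interval <- 50  # offset between successive bin start times in ms

# Start times of all bins that fit entirely inside the recorded window.
bin_starts <- seq(min(raster_start),
                  max(raster_start) + 1 - bin_width,
                  by = sampling_interval)

# Average the raster values that fall inside each [start, start + bin_width).
binned <- sapply(bin_starts, function(start) {
  in_bin <- raster_start >= start & raster_start < start + bin_width
  rowMeans(spikes[, in_bin, drop = FALSE])
})
colnames(binned) <- paste0("time.", bin_starts, "_", bin_starts + bin_width)

print(round(binned, 4))
```

With a bin width larger than the sampling interval, successive bins overlap, as in the time.-500_-350 and time.-450_-300 columns of the example data.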
Binned format data must be in a data frame in which the number of rows corresponds to the total number of trials, across all experimental recording sessions and all sites. The binned format data frame must also contain variables that start with the following prefixes:
siteID
A unique number indicating which site a given row of data corresponds
to. These numbers are typically generated automatically by the
create_binned_data() function.
labels.XXX
These variables contain labels of which experimental conditions occurred
on a given trial. These are typically copied from the raster data when
create_binned_data() is called.
time.XXX_YYY
These variables contain data in a time range from [XXX, YYY). These
values are typically derived from the raster data time.XXX_YYY values
when create_binned_data() is called.
There can also be two additional optional variables in a binned format data frame which are:
site_info.XXX
These variables contain additional metadata about the site. For example,
one could have a variable called site_info.brain_area which indicates
which brain region a given site came from.
trial_number
This variable specifies a unique number for each row indicating which
trial a given row of data came from. This is useful for data where all
sites were recorded simultaneously, since it allows one to do the
decoding on actual simultaneously recorded data (e.g., by using the
create_simultaneously_recorded_populations argument of ds_basic()).
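The required and optional variables listed above can be checked by inspecting column names. The helper below, check_binned_columns(), is hypothetical and written only for illustration. It is not part of NeuroDecodeR and checks far less than the package's own validation.

```r
# Hypothetical helper: report which kinds of variables a data frame contains.
# This is an illustrative sketch, not the package's own validation code.
check_binned_columns <- function(df) {
  col_names <- names(df)
  c(
    siteID = any(startsWith(col_names, "siteID")),
    labels = any(startsWith(col_names, "labels.")),
    time = any(startsWith(col_names, "time.")),
    site_info = any(startsWith(col_names, "site_info.")),  # optional
    trial_number = "trial_number" %in% col_names           # optional
  )
}

# Toy binned data: two sites with two trials each (invented values).
toy_binned <- data.frame(
  siteID = c(1, 1, 2, 2),
  labels.stimulus_ID = c("car", "face", "car", "face"),
  "time.0_150" = c(0.01, 0, 0.02, 0),
  check.names = FALSE
)

found <- check_binned_columns(toy_binned)
print(found)

# Only the first three kinds of variables are required.
required_ok <- all(found[c("siteID", "labels", "time")])
print(required_ok)
```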
To test whether data conforms to the requirements of binned format, one
can use the internal function NeuroDecodeR:::test_valid_binned_format().
Below is an example of a binned format data file from the Zhang-Desimone 7 object data set.
binned_file_name <- system.file("extdata/ZD_150bins_50sampled.Rda", package="NeuroDecodeR")
# test the file is in valid binned format using an internal function
NeuroDecodeR:::test_valid_binned_format(binned_file_name)
# load the data to see the variables in it
load(binned_file_name)
head(binned_data[, 1:10])
##   siteID site_info.session_ID site_info.recording_channel site_info.unit
## 1      1                 1001                           1              A
## 2      1                 1001                           1              A
## 3      1                 1001                           1              A
## 4      1                 1001                           1              A
## 5      1                 1001                           1              A
## 6      1                 1001                           1              A
##   labels.combined_ID_position labels.stimulus_position labels.stimulus_ID
## 1                  hand_upper                    upper               hand
## 2               flower_middle                   middle             flower
## 3               guitar_middle                   middle             guitar
## 4                  face_upper                    upper               face
## 5                 kiwi_middle                   middle               kiwi
## 6                 couch_upper                    upper              couch
##   time.-500_-350 time.-450_-300 time.-400_-250
## 1    0.006666667    0.013333333    0.020000000
## 2    0.000000000    0.006666667    0.006666667
## 3    0.000000000    0.000000000    0.000000000
## 4    0.000000000    0.000000000    0.000000000
## 5    0.000000000    0.000000000    0.000000000
## 6    0.000000000    0.006666667    0.006666667