mvDFA
This R
package provides an implementation of
multivariate extensions of a well-known fractal analysis technique,
Detrended Fluctuations Analysis (DFA; Peng et al., 1995, doi:10.1063/1.166141), for
multivariate time series: multivariate DFA (mvDFA). Several coefficients
are implemented that take into account the correlation structure of the
multivariate time series to varying degrees. These coefficients may be
used to analyze long memory and changes in the dynamic structure that
would by univariate DFA. Therefore, this R
package aims to
extend and complement the original univariate DFA (Peng et al., 1995)
for estimating the scaling properties of nonstationary time series.
This requires the package devtools
. The following code
installs devtools
, if it is not installed.
if(!"devtools" %in% installed.packages()) install.packages("devtools")
devtools::install_github("jpirmer/mvDFA")
Load the package:
Choose a covariance matrix \(\Sigma\)
Sigma <- Sigma <- matrix(.5, 4, 4)
diag(Sigma) <- 1
Sigma[3,4] <- Sigma[4,3] <- -.3
Sigma
#> [,1] [,2] [,3] [,4]
#> [1,] 1.0 0.5 0.5 0.5
#> [2,] 0.5 1.0 0.5 0.5
#> [3,] 0.5 0.5 1.0 -0.3
#> [4,] 0.5 0.5 -0.3 1.0
Simulate from the covariance matrix with approximate univariate Hurst-exponents \(H:=(.2, .5, .7, .9)'\).
set.seed(2023)
X <- simulate_cMTS(N = 10^3, H = c(.2, .5, .7, .9),
Sigma = Sigma, simulation_process = "FGN0",
decomposition = "chol", cor_increments = FALSE)
head(X)
#> X1 X2 X3 X4
#> 1 -1.1803464 -2.2468738 -1.0811575 -1.15036111
#> 2 0.6675909 -0.4575880 1.4244876 -1.18225387
#> 3 -0.3257142 -0.4014033 -0.1599604 -0.08143119
#> 4 -0.6453245 -1.4622078 0.1136179 -1.48819077
#> 5 0.8467582 0.3872702 0.3847809 0.54190586
#> 6 1.4111376 -0.1111638 0.5180076 0.57863195
simulation_process = "FGN0"
uses
longmemo::simFGN0
to generate independent (Fractional)
Gaussian Processes, which are then mixed using the Cholesky
decomposition \(D\) of \(\Sigma := DD'\). Changing this argument
to simulation_process = "FGN.fft"
uses
longmemo::simFGN.fft
(using FFT) to generate independent
(Fractional) Gaussian Processes, which are then mixed using the Cholesky
decomposition \(D\) of \(\Sigma := DD'\) via
decomposition = "chol"
. The differences may be inspected
here:
x1 <- simulate_cMTS(N = 3*10^2, H = c(.5), Sigma = as.matrix(1),
simulation_process = "FGN0",
cor_increments = FALSE)
plot(x1$X1, main = "H = 0.5 and FGN0", type = "l")
x2 <- simulate_cMTS(N = 3*10^2, H = c(.5), Sigma = as.matrix(1),
simulation_process = "FGN.fft",
cor_increments = FALSE)
plot(x2$X1, main = "H = 0.5 and FGN.fft", type = "l")
Note: the true Hurst Exponents might be slightly off, since the mixing (weighted sums) of the time series makes them slightly more normal (due to the central limits theorem). Hence, small Hurst-exponents (\(H<0.5\)) tend to be upwards biased, while larger Hurst-exponents tend to be downward biased (\(H>0.5\)).
Further, the use of Cholesky might influence the generation of data
in a different way than another decomposition of \(\Sigma\) would. An alternative is the
eigen-value (or singular value) decomposition via
decomposition = "eigen"
. Further research is needed!
Finally, we did not correlated the increments but the whole time
series (cor_increments = FALSE
), if instead increments
should be correlated use cor_increments = FALSE
.
The mvDFA
function needs only the time series in long
format as a matrix
or data.frame
object.
Multiple cores
can be used (maximum is
parallel::detectCores()
), which reduces computation time
drastically for longer timer series. The steps
argument
manipulates the number of window sizes, which are separated
logarithmically. Larger numbers of steps result in more precise
estimates of the extended Hurst exponents, but require more computation
time. The degree
option specifies the degree of the
polynomial of the time trend used for detrending.
mvDFA_result <- mvDFA(X = X, steps = 50, cores = 1, degree = 1)
mvDFA_result
#> $Ltot
#> tot
#> 0.61
#>
#> $Lgen
#> gen
#> 0.585
#>
#> $Lfull
#> varX1 varX2 varX3 varX4 cov1 cov2 cov3 cov4 cov5 cov6
#> 0.175 0.410 0.714 0.689 0.061 0.221 0.006 0.407 0.255 0.828
#>
#> $LmeanUni
#> meanUni
#> 0.497
#>
#> $univariate_DFA
#> uniX1 uniX2 uniX3 uniX4
#> 0.175 0.410 0.714 0.689
The slope of the log-log regression of the root-mean square
fluctuations \(RMS\) vs. window size
\(s\) gives an estimate for the long
memory coefficient \(L\) within the
data. The output argument Ltot
corresponds to the total
variance approach, which uses root-mean of the sum of the diagonal
elements of the covariance matrix \(C_{\nu,s}\) of the detrended fluctuation
object \(Y_{\nu,s}\) within each window
to compute the corresponding root mean square fluctuation \(RMS_\text{tot}(\nu,s)\) averaged across all
the subsequences \(\nu\) of length
\(s\). Lgen
corresponds to
the generalized variance approach, which uses root-mean of the
determinant of the covariance matrix \(C_{\nu,s}\) of the detrended fluctuation
object \(Y_{\nu,s}\) within each window
to compute the corresponding root mean square fluctuation \(RMS_\text{tot}(\nu,s)\) averaged across all
the subsequences \(\nu\) of length
\(s\). Further, we output the
corresponding coefficient for all variances and covariances individually
(see Lfull
), averaged across all univariate time series
(see LmeanUni
) and done for each univariate time series
individually (see univariate_DFA
).
Further arguments hidden in the primary output are the corresponding \(R^2\) per analysis approach:
mvDFA_result[6:10]
#> $R2tot
#> R2tot
#> 0.9938666
#>
#> $R2gen
#> R2gen
#> 0.9960478
#>
#> $R2full
#> R2varX1 R2varX2 R2varX3 R2varX4 R2cov1 R2cov2
#> 0.9778272936 0.9901501773 0.9950418981 0.9929890034 0.0551671849 0.7633510227
#> R2cov3 R2cov4 R2cov5 R2cov6
#> 0.0001347133 0.7475334587 0.4258491832 0.9966240683
#>
#> $R2meanUni
#> meanUni
#> 0.9890021
#>
#> $R2univariate_DFA
#> uniX1 uniX2 uniX3 uniX4
#> 0.9778273 0.9901502 0.9950419 0.9929890
Further, the used RMS objects and the used window sizes can be extracted using the following arguments:
mvDFA_result$S
#> [1] 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [20] 26 27 29 31 33 36 39 42 46 49 53 58 62 67 73 79 85 92 99
#> [39] 107 116 125 135 146 157 170 183 198 214 231 249
mvDFA_result$RMS_tot
#> [1] 1.281689 1.368366 1.458884 1.544731 1.611343 1.653864 1.760905
#> [8] 1.799242 1.863493 1.955846 1.981228 2.028004 2.103870 2.209529
#> [15] 2.231147 2.268106 2.311766 2.325178 2.477286 2.445302 2.591545
#> [22] 2.680717 2.744865 2.859671 2.910058 3.070996 3.440538 3.630025
#> [29] 3.621038 3.828981 4.227114 4.019679 4.464728 4.718095 5.184642
#> [36] 5.506916 5.319111 5.555983 6.547079 6.257566 7.312100 7.144616
#> [43] 8.118529 7.770209 8.364011 9.113272 9.063049 9.330909 10.364417
#> [50] 10.994388
mvDFA_result$RMS_gen
#> [1] 0.007590369 0.012494294 0.019681587 0.027052130 0.035353029
#> [6] 0.043959272 0.055966346 0.067520320 0.082665344 0.095836936
#> [11] 0.109724870 0.135821072 0.147451569 0.166060295 0.191202942
#> [16] 0.199565142 0.226332470 0.252164023 0.286231795 0.308713374
#> [21] 0.372355755 0.402438336 0.478707604 0.556214631 0.669916959
#> [26] 0.791285660 0.963593035 1.199463797 1.342740595 1.618126034
#> [31] 1.932335106 2.152975678 2.815872783 3.601071975 4.125751287
#> [36] 5.101225013 5.545780169 6.389909270 8.286718763 9.964057240
#> [41] 11.879997793 13.619662829 19.178717599 17.568449365 21.982872905
#> [46] 23.353103586 23.653585467 30.137254988 38.556737506 43.790424706
head(mvDFA_result$CovRMS_s)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 0.6348464 0.6552272 0.6348307 0.6382515 0.4455082 0.4456798 0.4522994
#> [2,] 0.6617175 0.7051219 0.6891878 0.6799842 0.4424144 0.4626567 0.4588896
#> [3,] 0.6758102 0.7392815 0.7545125 0.7455177 0.4614919 0.4602390 0.4867858
#> [4,] 0.7077878 0.8026725 0.7664519 0.8083928 0.4723159 0.4610989 0.5212718
#> [5,] 0.7221552 0.8368071 0.8222754 0.8357834 0.4918202 0.4820956 0.5300174
#> [6,] 0.7383051 0.8367540 0.8562739 0.8699477 0.4750978 0.4990387 0.5187805
#> [,8] [,9] [,10]
#> [1,] 0.4488068 0.4666323 0.3455585
#> [2,] 0.4778061 0.4824361 0.3977616
#> [3,] 0.5149633 0.5038502 0.4669152
#> [4,] 0.5056467 0.5741427 0.4922668
#> [5,] 0.5594221 0.5787164 0.5130264
#> [6,] 0.5358736 0.5812485 0.5720075
which may be used to compute the Hurst exponents by hand or to display the log-log-plots. We present this for the total and the generalized approach:
# total
df_tot <- data.frame(mvDFA_result[c("S", "RMS_tot")])
reg_tot <- lm(I(log10(RMS_tot)) ~ 1 + I(log10(S)), data = df_tot)
coef(reg_tot)[2]
#> I(log10(S))
#> 0.610202
mvDFA_result$Ltot
#> tot
#> 0.610202
plot(log10(df_tot))
abline(reg_tot)
# generalized
df_gen <- data.frame(mvDFA_result[c("S", "RMS_gen")])
reg_gen <- lm(I(log10(RMS_gen)) ~ 1 + I(log10(S)), data = df_gen)
coef(reg_gen)[2]/4
#> I(log10(S))
#> 0.5847321
mvDFA_result$Lgen
#> gen
#> 0.5847321
plot(log10(df_gen))
abline(reg_gen)
Peng, C. K., Havlin, S., Stanley, H. E., & Goldberger, A. L. (1995). Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time-series. Chaos, 5, 82–87. doi:10.1063/1.166141