The following are simulation models included in the fdaoutlier package. Some of these models were curated from research work related to functional depths and outlier detection for functional data. This document presents the model equations as well as their corresponding functions and parameters in fdaoutlier.
The parameters of the fdaoutlier functions have been set to reasonable default values for ease of use.
This is a typical magnitude model in which outliers are shifted from the ‘normal’ non-outlying observations. The main model is of the form:
\[X_i(t) = \mu t + e_i(t),\] and the contamination model is of the form:
\[X_i(t) = \mu t + qk_i + e_i(t)\] where the symbols correspond to the function parameters described below.
This model can be accessed with the simulation_model1() function in fdaoutlier.
library(fdaoutlier)
# Simulate 100 curves on 50 grid points with an outlier rate of 0.1
dtss <- simulation_model1(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)
The returned object is a list containing a matrix of the data and a vector of the indices of the true outliers:
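One way to confirm this structure is to inspect the object directly. The sketch below assumes the list elements are named data and true_outliers and that curves are stored in the rows of the matrix; adjust the names if the installed version differs.

```r
library(fdaoutlier)

# Same call as above: 100 curves observed on 50 grid points
dtss <- simulation_model1(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

# Assumed element names: `data` (one curve per row) and `true_outliers`
str(dtss)
dim(dtss$data)
dtss$true_outliers
```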
The simulated data can be tuned using the following additional parameters of simulation_model1():
mu: the coefficient \(\mu\) in the main and contamination models controlling the mean function.
q: the shift parameter \(q\) in the contamination model which controls how far the outliers are from the mean function.
kprob: the probability that \(k_i = 1\), i.e., \(P(k_i=1)\) in the contamination model.
cov_alpha: the coefficient \(\alpha\) in the covariance function.
cov_beta: the coefficient \(\beta\) in the covariance function.
cov_nu: the coefficient \(\nu\) in the covariance function.
Additional plotting parameters allow for modifying the plot title (plot_title), the font size of the title (title_cex), toggling on/off the display of the legend (show_legend), the y-axis label (ylabel) and the x-axis label (xlabel).
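To make the link between these parameters and the model equations concrete, here is a minimal base R sketch of the contamination scheme. It is not the package's implementation: the helper name sketch_model1 is hypothetical, and it assumes \(t\) on an equally spaced grid over \([0, 1]\), Gaussian noise with covariance \(\alpha\exp(-\beta|s-t|^{\nu})\), and \(k_i \in \{-1, 1\}\).

```r
# Hypothetical helper, not part of fdaoutlier: simulates the main and
# contamination models under the assumptions stated in the lead-in.
sketch_model1 <- function(n = 100, p = 50, outlier_rate = 0.1,
                          mu = 4, q = 8, kprob = 0.5,
                          cov_alpha = 1, cov_beta = 1, cov_nu = 1) {
  tt <- seq(0, 1, length.out = p)            # assumed grid on [0, 1]
  # Assumed covariance: alpha * exp(-beta * |s - t|^nu)
  sigma <- cov_alpha * exp(-cov_beta * abs(outer(tt, tt, "-"))^cov_nu)
  # Draw n Gaussian noise curves e_i(t) using the Cholesky factor of sigma
  e <- matrix(rnorm(n * p), nrow = n) %*% chol(sigma)
  # Main model: X_i(t) = mu * t + e_i(t)
  x <- matrix(mu * tt, nrow = n, ncol = p, byrow = TRUE) + e
  # Contamination model: add q * k_i to a random subset of the curves
  n_out <- round(n * outlier_rate)
  out_idx <- sort(sample(n, n_out))
  k <- ifelse(runif(n_out) < kprob, 1, -1)   # P(k_i = 1) = kprob
  x[out_idx, ] <- x[out_idx, ] + q * k
  list(data = x, true_outliers = out_idx)
}

set.seed(50)
sim <- sketch_model1()
matplot(t(sim$data), type = "l", lty = 1, col = "grey70")
```

Larger values of q push the contaminated curves further from the bulk, while kprob controls how many of them are shifted upwards rather than downwards.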
This model generates non-persistent magnitude outliers, i.e., the outliers are magnitude outliers for only a portion of the domain of the functional data. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + qk_iI_{T_i \le t\le T_i+l } + e_i(t)\] where the symbols correspond to the function parameters described below.
A call to simulation_model2() generates data from this model:
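A minimal sketch of such a call follows. It assumes simulation_model2() accepts the same n, p, outlier_rate, seed and plot arguments as simulation_model1() and returns a list with data and true_outliers; the values passed for a, b and l are illustrative only.

```r
library(fdaoutlier)

# T_i is drawn from [a, b]; the shift lasts for a window of length l
dtss2 <- simulation_model2(n = 100, p = 50, outlier_rate = .1,
                           a = 0.1, b = 0.8, l = 0.1,
                           seed = 50, plot = FALSE)

# Indices of the curves generated from the contamination model
dtss2$true_outliers
```

Setting plot = TRUE instead should display the simulated curves.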
Additional parameters of simulation_model2() to which arguments can be passed are:
mu: the coefficient \(\mu\) in the main and contamination models controlling the mean function.
q: the shift parameter \(q\) in the contamination model which controls how far the outliers are from the mean function.
kprob: the probability that \(k_i = 1\), i.e., \(P(k_i=1)\) in the contamination model.
a, b: values specifying the interval \([a,b]\) from which \(T_i\) is drawn in the contamination model.
l: the value of \(l\) in the contamination model.
cov_alpha: the coefficient \(\alpha\) in the covariance function.
cov_beta: the coefficient \(\beta\) in the covariance function.
cov_nu: the coefficient \(\nu\) in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
This model generates outliers that are magnitude outliers for a part of the domain. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + qk_iI_{T_i \le t } + e_i(t)\] where the symbols correspond to the function parameters described below.
A call to simulation_model3() generates data from this model:
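As a hedged illustration, the sketch below generates data and highlights the contaminated curves with base graphics. It assumes the common arguments (n, p, outlier_rate, seed, plot) and the data/true_outliers list elements carry over from simulation_model1(); the values of a and b are illustrative.

```r
library(fdaoutlier)

# The shift starts at T_i, drawn from [a, b], and persists to the end of the domain
dtss3 <- simulation_model3(n = 100, p = 50, outlier_rate = .1,
                           a = 0.1, b = 0.9,
                           seed = 50, plot = FALSE)

# Draw all curves, colouring the true outliers in red
is_outlier <- seq_len(nrow(dtss3$data)) %in% dtss3$true_outliers
matplot(t(dtss3$data), type = "l", lty = 1,
        col = ifelse(is_outlier, "red", "grey70"))
```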
Additional parameters of simulation_model3() to which arguments can be passed are:
mu: the coefficient \(\mu\) in the main and contamination models controlling the mean function.
q: the shift parameter \(q\) in the contamination model which controls how far the outliers are from the mean function.
kprob: the probability that \(k_i = 1\), i.e., \(P(k_i=1)\) in the contamination model.
a, b: values specifying the interval \([a,b]\) from which \(T_i\) is drawn in the contamination model.
cov_alpha: the coefficient \(\alpha\) in the covariance function.
cov_beta: the coefficient \(\beta\) in the covariance function.
cov_nu: the coefficient \(\nu\) in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
This model generates outliers defined on the reversed interval of the main model. The main model is of the form: \[X_i(t) = \mu t(1 - t)^m + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu(1 - t)t^m + e_i(t)\] where the symbols correspond to the function parameters described below.
A call to simulation_model4() generates data from this model:
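The sketch below shows one possible call and compares the pointwise mean of the contaminated curves with that of the remaining curves, which is where the reversed mean function shows up. The common arguments and return structure are assumed to match simulation_model1(); the values of mu and m are illustrative.

```r
library(fdaoutlier)

# mu and m are the constants in the two mean functions
dtss4 <- simulation_model4(n = 100, p = 50, outlier_rate = .1,
                           mu = 30, m = 3/2,
                           seed = 50, plot = FALSE)

# Pointwise mean of the outlying curves versus the remaining curves
out <- dtss4$true_outliers
rest <- setdiff(seq_len(nrow(dtss4$data)), out)
mean_outliers <- colMeans(dtss4$data[out, , drop = FALSE])
mean_regular <- colMeans(dtss4$data[rest, , drop = FALSE])
matplot(cbind(mean_regular, mean_outliers), type = "l", lty = 1:2)
```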
Additional parameters of simulation_model4() to which arguments can be passed are:
mu: the coefficient \(\mu\) in the main and contamination models controlling the mean function.
m: the constant \(m\) in the main and contamination models.
cov_alpha: the coefficient \(\alpha\) in the covariance function.
cov_beta: the coefficient \(\beta\) in the covariance function.
cov_nu: the coefficient \(\nu\) in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
This model generates shape outliers with a different covariance structure from that of the main model. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + \tilde{e}_i(t),\] where the symbols correspond to the function parameters described below.
A call to simulation_model5() generates data from this model:
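A minimal sketch of such a call follows, together with a rough per-curve summary of how much each curve changes between neighbouring grid points. The common arguments and return structure are assumed to match simulation_model1(); the covariance values for \(\tilde{e}_i(t)\) are illustrative, and the roughness measure is only a convenience for this example, not something the package provides.

```r
library(fdaoutlier)

# cov_alpha2, cov_beta2 and cov_nu2 set the covariance of the outlying curves
dtss5 <- simulation_model5(n = 100, p = 50, outlier_rate = .1,
                           cov_alpha2 = 5, cov_beta2 = 2, cov_nu2 = 0.5,
                           seed = 50, plot = FALSE)

# Mean absolute increment of each curve along the grid
roughness <- colMeans(abs(diff(t(dtss5$data))))

# Average roughness for regular curves (FALSE) and true outliers (TRUE)
is_outlier <- seq_along(roughness) %in% dtss5$true_outliers
tapply(roughness, is_outlier, mean)
```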
Additional parameters of simulation_model5() to which arguments can be passed are:
mu: the coefficient \(\mu\) in the main and contamination models controlling the mean function.
cov_alpha: the coefficient \(\alpha\) in the covariance function of \(e_i(t)\).
cov_beta: the coefficient \(\beta\) in the covariance function of \(e_i(t)\).
cov_nu: the coefficient \(\nu\) in the covariance function of \(e_i(t)\).
cov_alpha2: the coefficient \(\alpha\) in the covariance function of \(\tilde{e}_i(t)\).
cov_beta2: the coefficient \(\beta\) in the covariance function of \(\tilde{e}_i(t)\).
cov_nu2: the coefficient \(\nu\) in the covariance function of \(\tilde{e}_i(t)\).
The additional plotting parameters listed for simulation_model1() also apply.
This model generates shape outliers that have a different shape for a portion of the domain. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + (-1)^u\cdot q + (-1)^{(1-u)}\left(\frac{1}{\sqrt{r\pi}}\right)\exp{(-z(t-v)^w)} + e_i(t)\] where the symbols correspond to the function parameters described below.
A call to simulation_model6() generates data from this model:
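A minimal sketch of such a call is given below. It assumes the common arguments and return structure match simulation_model1(); the values of q, kprob, a and b are illustrative and are not claimed to be the package defaults.

```r
library(fdaoutlier)

# q is the constant term, kprob is P(u = 1), and v is drawn from [a, b]
dtss6 <- simulation_model6(n = 100, p = 50, outlier_rate = .1,
                           q = 2, kprob = 0.5, a = 0.25, b = 0.75,
                           seed = 50, plot = FALSE)

# Extract the contaminated curves for closer inspection
outlying_curves <- dtss6$data[dtss6$true_outliers, , drop = FALSE]
dim(outlying_curves)
```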
Additional parameters of simulation_model6() to which arguments can be passed are:
mu: the coefficient \(\mu\) in the main and contamination models controlling the mean function.
q: the constant term \(q\) in the contamination model.
kprob: the probability \(P(u = 1)\).
a, b: values specifying the interval \([a,b]\) from which \(v\) in the contamination model is drawn.
pi_coeff: the constant \(r\) in the contamination model.
exp_pow: the constant \(w\) in the contamination model.
exp_coeff: the constant \(z\) in the contamination model.
cov_alpha: the coefficient \(\alpha\) in the covariance function.
cov_beta: the coefficient \(\beta\) in the covariance function.
cov_nu: the coefficient \(\nu\) in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
This model generates pure shape outliers that are periodic. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + k\sin(r\pi(t + \theta)) + e_i(t),\] where the symbols correspond to the function parameters described below.
A call to simulation_model7() generates data from this model:
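The following sketch shows one way to call the function while setting the amplitude and frequency of the sine term explicitly. The common arguments and return structure are assumed to match simulation_model1(); all values are illustrative.

```r
library(fdaoutlier)

# sin_coeff is k, pi_coeff is r, and theta is drawn from [a, b]
dtss7 <- simulation_model7(n = 100, p = 50, outlier_rate = .1,
                           sin_coeff = 2, pi_coeff = 4,
                           a = 0.25, b = 0.75,
                           seed = 50, plot = FALSE)

# Plot only the contaminated curves to see the periodic component
matplot(t(dtss7$data[dtss7$true_outliers, , drop = FALSE]),
        type = "l", lty = 1)
```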
Additional parameters of simulation_model7() to which arguments can be passed are:
mu: the coefficient \(\mu\) in the main and contamination models controlling the mean function.
cov_alpha: the coefficient \(\alpha\) in the covariance function of \(e_i(t)\).
cov_beta: the coefficient \(\beta\) in the covariance function of \(e_i(t)\).
cov_nu: the coefficient \(\nu\) in the covariance function of \(e_i(t)\).
sin_coeff: the coefficient \(k\) in the contamination model.
pi_coeff: the coefficient \(r\) in the contamination model.
a, b: values specifying the interval \([a,b]\) from which \(\theta\) is to be drawn.
The additional plotting parameters listed for simulation_model1() also apply.
This model generates pure shape outliers that are periodic. The main model is of the form: \[X_i(t) = k\sin(r\pi t) + e_i(t),\] with contamination model of the form: \[X_i(t) = k\sin(r\pi t + v) + e_i(t),\] where the symbols correspond to the function parameters described below.
A call to simulation_model8() generates data from this model:
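A minimal sketch of such a call follows, with the phase shift \(v\) of the contaminated curves set through the constant argument. The common arguments and return structure are assumed to match simulation_model1(); all values are illustrative.

```r
library(fdaoutlier)

# sin_coeff is k, pi_coeff is r, and constant is the phase shift v
dtss8 <- simulation_model8(n = 100, p = 50, outlier_rate = .1,
                           sin_coeff = 15, pi_coeff = 2, constant = 0.5,
                           seed = 50, plot = FALSE)

# Realised proportion of contaminated curves in this sample
length(dtss8$true_outliers) / nrow(dtss8$data)
```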
Additional parameters of simulation_model8() to which arguments can be passed are:
cov_alpha: the coefficient \(\alpha\) in the covariance function of \(e_i(t)\).
cov_beta: the coefficient \(\beta\) in the covariance function of \(e_i(t)\).
cov_nu: the coefficient \(\nu\) in the covariance function of \(e_i(t)\).
sin_coeff: the coefficient \(k\) in the main and contamination models.
pi_coeff: the coefficient \(r\) in the main and contamination models.
constant: the value of the constant \(v\) in the contamination model.
The additional plotting parameters listed for simulation_model1() also apply.
This model generates periodic functions with outliers of different amplitude. The main model is of the form: \[X_i(t) = a_{1i}\sin \pi + a_{2i}\cos\pi + e_i(t),\] with contamination model of the form: \[X_i(t) = (b_{1i}\sin\pi + b_{2i}\cos\pi)(1-u_i) + (c_{1i}\sin\pi + c_{2i}\cos\pi)u_i + e_i(t),\] where the symbols correspond to the function parameters described below.
A call to simulation_model9() generates data from this model:
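The sketch below shows one possible call in which the three amplitude intervals are passed explicitly. The common arguments and return structure are assumed to match simulation_model1(); the interval endpoints are illustrative and are not claimed to be the package defaults.

```r
library(fdaoutlier)

# ai, bi and ci are length-2 vectors giving the intervals for the
# amplitude coefficients; kprob is P(u_i = 1)
dtss9 <- simulation_model9(n = 100, p = 50, outlier_rate = .1,
                           kprob = 0.5,
                           ai = c(3, 8), bi = c(1.5, 2.5), ci = c(9, 10.5),
                           seed = 50, plot = FALSE)

# Range of each contaminated curve, a simple view of its amplitude
outlying_curves <- dtss9$data[dtss9$true_outliers, , drop = FALSE]
amplitude_range <- apply(outlying_curves, 1, function(x) diff(range(x)))
amplitude_range
```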
Additional parameters of simulation_model9() to which arguments can be passed are:
kprob: the probability \(P(u_i = 1)\).
ai: a vector of 2 values containing \(a_{1}\) and \(a_{2}\) indicating the interval from which \(a_{1i}\) and \(a_{2i}\) are drawn in the main model.
bi: a vector of 2 values containing \(b_{1}\) and \(b_{2}\) indicating the interval from which \(b_{1i}\) and \(b_{2i}\) are drawn in the contamination model.
ci: a vector of 2 values containing \(c_{1}\) and \(c_{2}\) indicating the interval from which \(c_{1i}\) and \(c_{2i}\) are drawn in the contamination model.
cov_alpha: the coefficient \(\alpha\) in the covariance function of \(e_i(t)\).
cov_beta: the coefficient \(\beta\) in the covariance function of \(e_i(t)\).
cov_nu: the coefficient \(\nu\) in the covariance function of \(e_i(t)\).
The additional plotting parameters listed for simulation_model1() also apply.