Quick start notebook for HePPCAT.jl
Step 1: load the package.
using HePPCAT
Step 2: form data into a Vector
with Matrix
elements - each matrix is a group of samples (columns) having equal noise variance.
100×800 Array{Float64,2}: 3.07391 -0.98157 -0.699786 2.6159 … 0.550283 -1.2851 -1.1412 -4.50313 4.68978 -2.07655 0.665155 0.539257 4.68418 -1.32645 -1.7521 -0.890531 -2.83606 -2.9092 0.0711276 -2.11742 -2.8706 2.86943 -4.41902 -0.077702 0.147607 -0.500605 0.412021 -2.69618 -2.92047 -1.00833 -2.31805 -2.68526 -1.30343 0.0444585 -1.88006 0.0287089 0.0713829 0.903092 -3.61745 … 4.60365 -1.55906 -0.713535 -1.73201 -1.44542 0.788838 1.34847 -2.8036 -1.55413 3.3247 ⋮ ⋱ -3.4713 0.830812 -4.29106 0.154452 2.35659 0.647456 1.36104 6.60686 -1.40712 0.21455 -3.35208 … 2.9285 -0.655483 2.03497 -2.12712 0.608553 -1.71688 -4.31894 4.6775 -0.917588 -5.22581 -2.88758 -5.92654 0.172228 -0.556818 2.56659 -3.76559 -1.66873 -4.67977 -4.4196 -3.76059 0.260593 1.16646 -0.332024 -0.779526 -0.122026 2.45712 2.13976 1.77679 -4.94248 -0.604571 -1.58459
100×200 Array{Float64,2}: 0.203052 -0.00674578 -0.119656 … 0.0856759 -0.151344 -0.0163422 -0.0988053 0.035819 0.159803 -0.00993912 -0.0588247 0.158511 0.152398 -0.0689631 0.0267519 0.233296 0.0522925 -0.058361 0.229074 -0.040401 -0.118963 0.157122 -0.0157249 0.00401438 -0.0746297 0.0808521 0.192419 -0.126884 0.00859645 -0.0592678 0.0907297 -0.0831365 -0.158995 … 0.0872615 0.0208823 -0.169194 0.259222 0.0932636 -0.218621 0.165271 -0.227068 -0.0410862 ⋮ ⋱ 0.077498 -0.0552509 0.0836566 0.0425496 -0.0678956 -0.107604 -0.0532653 0.0727037 -0.0164148 … -0.0887275 -0.131049 0.025677 -0.300898 -0.0414545 0.00292205 -0.0864283 0.00120629 0.090873 -0.141627 -0.0181705 0.0522777 -0.135203 -0.136556 -0.112361 0.283674 0.0192399 -0.158253 -0.0215563 -0.0527849 0.0250238 -0.219649 -0.061693 0.0747251 0.0243852 0.166031 0.124621
x
begin # Example: generating synthetic data with heteroscedastic noise
d = 100 # number of features (ambient dimension)
k = 1 # number of factors (latent dimension)
n = [800,200] # number of samples in each group/block
v = [10,0.01] # noise variance for each group/block
L = length(n) # number of groups/blocks
F = randn(d,k)/sqrt(d) # synthetic factor matrix
X = [F*randn(k,n[l]) + sqrt(v[l])*randn(d,n[l]) for l in 1:L] # synthetic data
end
This code generates an example synthetic dataset with heteroscedastic noise: the n[1]
= 800 samples in X[1]
have noise variance v[1]
= 10.0, and the n[2]
= 200 samples in X[2]
have noise variance v[2]
= 0.01.
Step 3: run the heppcat
method.
100×1 Array{Float64,2}: 0.08545307301599947 -0.04131315824148372 -0.040369724823846105 0.14167175171076196 -0.12541484420891294 0.04727082503427198 0.1559739967622669 ⋮ 0.003190286398210981 0.0016334942077702972 -0.09775005299825644 -0.007695169512153544 0.08897560084927778 -0.10818750504322917
1.24644
1×1 Array{Float64,2}: 1.0
10.0689
0.00997343
xxxxxxxxxx
model = heppcat(X,k,100) # run 100 iterations
For more info, use ?heppcat
to access the docstring.
Step 4: extract factor matrix and noise variance estimates.
100×1 Array{Float64,2}:
0.09540347619500922
-0.04612378197439529
-0.045070492438695216
0.15816842057057826
-0.14001851170110163
0.052775176733916136
0.17413606043592242
⋮
0.0035617725811919786
0.0018237030017226028
-0.10913235212189487
-0.008591217324997152
0.09933617736569156
-0.1207851712957901
xxxxxxxxxx
model.F
10.0689
0.00997343
xxxxxxxxxx
model.v
For more info about the returned datatype use ?HePPCATModel
for the docstring.
Evaluating recovery: check recovery of the factor covariance and noise variances.
xxxxxxxxxx
using LinearAlgebra
0.517880090519894
xxxxxxxxxx
norm(model.F*model.F' - F*F')/norm(F*F') # normalized estimation error
2×2 Array{Float64,2}:
10.0689 10.0
0.00997343 0.01
xxxxxxxxxx
[model.v v]
Compared with homoscedastic PPCA:
100×1 Array{Float64,2}: -0.019637630403928213 -0.022884783218402385 0.10709578399944776 -0.03466448445995213 -0.07889845223748505 0.03648735232319211 -0.04349089258619432 ⋮ -0.037032139405059894 -0.02281947747300657 0.058604357203842855 -0.014297849239765969 0.15139590399091565 0.006202155963090815
6.45015
1×1 Array{Float64,2}: 1.0
8.00306
8.00306
xxxxxxxxxx
ppca = heppcat(X,k,0) # initialization is homoscedastic PPCA
7.769109021285145
xxxxxxxxxx
norm(ppca.F*ppca.F' - F*F')/norm(F*F') # normalized estimation error
2×2 Array{Float64,2}:
8.00306 10.0
8.00306 0.01
xxxxxxxxxx
[ppca.v v]
HePPCAT
accounts for heterogeneous quality among the samples and is generally more robust.