# Agnostic Bayes Ensemble Documentation

## Overview

I have to thank my employer Auticon Berlin for letting me develop this package during my working hours. Agnostic Bayes Ensemble is intended to be a basis technology that will be refined over time. Furthermore, it forms one pillar of an upcoming machine learning framework, which is to consist of three broad branches:

- cleaning and transformation of datasets.
- ensemble algorithms.
- generally applicable meta parameter learning.

There are minimal requirements regarding the installation and usage of this package. Right now, the only prerequisite is a machine with Julia 1.X installed. However, upcoming releases will integrate GPU support in the form of CUDA; from then on, the CUDA development kit will become a prerequisite.

This package has been developed to facilitate increased predictive performance by combining raw base models in an agnostic fashion, i.e. the methods make no assumptions about the underlying raw models. Furthermore, we specifically implemented ensemble algorithms that can deal with arbitrary loss functions and with both regression and classification problems. This holds true for all algorithms except `dirichletPosteriorEstimation`, which is limited to classification problems.

The algorithms `bootstrapPosteriorEstimation`, `bootstrapPosteriorCorEstimation`, `dirichletPosteriorEstimation`, and `TDistPosteriorEstimation` infer an actual posterior distribution.

The algorithms `δOptimizationMSE`, `δOptimizationHinge`, `δOptimizationHingeRegularized`, and `δOptimizationMSERegularized` do not; they infer mixing coefficients that are not required to be true probability distributions.

Hint: In most cases it is advisable to deactivate Hyperthreading for best performance. However, in some rare cases, depending on the (hardware) platform the package runs on, you will get the best performance with Hyperthreading enabled. To be sure, it is best practice to measure the performance with and without Hyperthreading.
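To make such a measurement concrete, here is a minimal sketch, assuming the BenchmarkTools package is installed; it times one of the estimation routines on a synthetic error matrix, and the run can be repeated with Hyperthreading toggled in the BIOS/UEFI to compare:

```julia
using AgnosticBayesEnsemble
using BenchmarkTools

# synthetic stand-in for a real prediction error matrix: 10000 samples, 8 hypotheses
errMat = rand( 10000, 8 );

# time the posterior estimation; rerun after toggling Hyperthreading and compare
@btime bootstrapPosteriorEstimation( $errMat, 64, 1000 );
```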

## Generic methods

Make a prediction given trained mixing coefficients and an input matrix.

`AgnosticBayesEnsemble.predictEnsemble` (Function)

```julia
predictEnsemble( predictions::Matrix{Float64}, weights::Vector{Float64} )
```

Perform Bayesian ensemble prediction.
# Arguments
- `predictions::Matrix{Float64}`: each column is the prediction of one hypothesis.
- `weights::Vector{Float64}`:     mixing weights.
# Return
- `Vector{Float64}`:              prediction y.

```julia
predictEnsemble( predictions::Vector{Matrix{Float64}}, weights::Vector{Float64} )
```

Perform Bayesian ensemble prediction.
# Arguments
- `predictions::Vector{Matrix{Float64}}`: each matrix is the prediction of one hypothesis.
- `weights::Vector{Float64}`:             mixing weights.
# Return
- `Vector{Float64}`:                      prediction y.
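A minimal usage sketch for the matrix-based method, with hypothetical predictions and weights; any trained mixing coefficients, e.g. an estimated posterior, can take the place of `weights`:

```julia
using AgnosticBayesEnsemble

# hypothetical setup: five data points, three hypotheses (one column each)
predictions = rand( 5, 3 );
weights     = [0.5, 0.3, 0.2];   # mixing weights over the hypotheses

# combined ensemble prediction
ŷ = predictEnsemble( predictions, weights );
```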

## List of algorithms

Basic algorithm for computing a true posterior distribution using bootstrap sampling and arbitrary loss functions.

`AgnosticBayesEnsemble.bootstrapPosteriorEstimation` (Function)

```julia
bootstrapPosteriorEstimation( errMat::Matrix{Float64}, samplingBatchSize::Int64, nrRuns::Int64 )
```

Compute posterior p( h* = h | S ).
# Arguments
- `errMat::Matrix{Float64}`:  each column is the prediction error of one hypothesis.
- `samplingBatchSize::Int64`: sample size per main iteration.
- `nrRuns::Int64`:            number of passes over predictions.
# Return
- `Vector{Float64}`:          distribution p( h* = h | S ).
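A minimal sketch of a typical call, assuming binary predictions and squared error as the loss; the returned vector is a probability distribution over the hypotheses:

```julia
using AgnosticBayesEnsemble

# hypothetical binary predictions of four hypotheses and ground truth labels
predMat = rand( 0.0:1.0, 1000, 4 );
t       = rand( 0.0:1.0, 1000 );

# per-hypothesis squared error, one column per hypothesis
errMat  = ( repeat( t, outer=[1, size( predMat, 2 )] ) .- predMat ).^2;

p = bootstrapPosteriorEstimation( errMat, 32, 5000 );
sum( p ) ≈ 1.0   # true: p is a distribution over the hypotheses
```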

Basic algorithm for computing a true posterior distribution using bootstrap sampling and arbitrary loss functions; in-place version returning the result through a parameter.

`AgnosticBayesEnsemble.bootstrapPosteriorEstimation!` (Function)

```julia
bootstrapPosteriorEstimation!( errMat::Matrix{Float64}, samplingBatchSize::Int64, nrRuns::Int64, p::Array{Float64} )
```

Compute posterior p( h* = h | S ).
# Arguments
- `errMat::Matrix{Float64}`:  each column is the prediction error of one hypothesis.
- `samplingBatchSize::Int64`: sample size per main iteration.
- `nrRuns::Int64`:            number of passes over predictions.
- `p::Vector{Float64}`:       resulting posterior p( h* = h | S ).
# Return
- `nothing`:                  nothing.
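A sketch of the in-place variant, useful when the estimation is called repeatedly and the result buffer should be reused; `errMat` is again a hypothetical precomputed error matrix:

```julia
using AgnosticBayesEnsemble

errMat = rand( 1000, 4 );                      # hypothetical error matrix
p      = zeros( Float64, size( errMat, 2 ) );  # preallocated result buffer

bootstrapPosteriorEstimation!( errMat, 32, 5000, p );   # p now holds the posterior
```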

Basic algorithm for computing a true posterior distribution using bootstrap sampling and linear correlation.

`AgnosticBayesEnsemble.bootstrapPosteriorCorEstimation` (Function)

```julia
bootstrapPosteriorCorEstimation( predictions::Matrix{Float64}, t::Vector{Float64}, samplingBatchSize::Int64, nrRuns::Int64 )
```

Compute posterior p( h* = h | S ).
# Arguments
- `predictions::Matrix{Float64}`: each column is the prediction of one hypothesis.
- `t::Vector{Float64}`:           label vector.
- `samplingBatchSize::Int64`:     sample size per main iteration.
- `nrRuns::Int64`:                number of main iterations.
# Return
- `Vector{Float64}`:              posterior p( h* = h | S ).
```julia
bootstrapPosteriorCorEstimation( predictions::Vector{Matrix{Float64}}, T::Matrix{Float64}, samplingFactor::Float64, nrRuns::Int64 )
```

Compute posterior p( h* = h | S ).
# Arguments
- `predictions::Vector{Matrix{Float64}}`: each matrix is the prediction of one hypothesis.
- `T::Matrix{Float64}`:                   label matrix.
- `samplingFactor::Float64`:              sampling factor determining the sample size per main iteration.
- `nrRuns::Int64`:                        number of main iterations.
# Return
- `Vector{Float64}`:                      posterior p( h* = h | S ).
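A minimal sketch of the univariate method, which consumes raw predictions and labels directly instead of a precomputed error matrix; all inputs are hypothetical:

```julia
using AgnosticBayesEnsemble

# hypothetical binary predictions of four hypotheses and ground truth labels
predMat = rand( 0.0:1.0, 1000, 4 );
t       = rand( 0.0:1.0, 1000 );

p = bootstrapPosteriorCorEstimation( predMat, t, 32, 5000 );
```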

Advanced algorithm: probabilistic inference using a Dirichlet prior.

`AgnosticBayesEnsemble.dirichletPosteriorEstimation` (Function)

```julia
dirichletPosteriorEstimation( errMat::Matrix{Float64}, G::Matrix{Float64}, nrRuns::Int64, α_::Float64, sampleSize::Int64 )
```

Compute posterior p( h* = h | S ).
# Arguments
- `errMat::Matrix{Float64}`: each column is the prediction error of one hypothesis.
- `G::Matrix{Float64}`:      transformation matrix G.
- `nrRuns::Int64`:           number of sampling runs.
- `α_::Float64`:             scalar prior parameter.
- `sampleSize::Int64`:       number of samples per run.
# Return
- `Vector{Float64}`:         posterior distribution p( h* = h | S ).
```julia
dirichletPosteriorEstimation( errMat::Matrix{Float64}, nrRuns::Int64, α_::Float64 )
```

Compute posterior p( h* = h | S ).
# Arguments
- `errMat::Matrix{Float64}`: each column is the prediction error of one hypothesis.
- `nrRuns::Int64`:           number of main iterations.
- `α_::Float64`:             scalar prior parameter.
# Return
- `Vector{Float64}`:         posterior p( h* = h | S ).

Advanced algorithm: probabilistic inference using a Dirichlet prior, with improved performance and hardware utilization under certain parameter settings.

`AgnosticBayesEnsemble.dirichletPosteriorEstimationV2` (Function)

```julia
dirichletPosteriorEstimationV2( errMat::Matrix{Float64}, G::Matrix{Float64}, nrRuns::Int64, α_::Float64, sampleSize::Int64 )
```

Compute posterior p( h* = h | S ), alternative version for improved performance.
# Arguments
- `errMat::Matrix{Float64}`: each column is the prediction error of one hypothesis.
- `G::Matrix{Float64}`:      transformation matrix G.
- `nrRuns::Int64`:           number of sampling runs.
- `α_::Float64`:             scalar prior parameter.
- `sampleSize::Int64`:       number of samples per run.
# Return
- `Vector{Float64}`:         posterior distribution p( h* = h | S ).
```julia
dirichletPosteriorEstimationV2( errMat::Matrix{Float64}, nrRuns::Int64, α_::Float64, sampleSize::Int64 )
```

Compute posterior p( h* = h | S ), alternative version for improved performance.
# Arguments
- `errMat::Matrix{Float64}`: each column is the prediction error of one hypothesis.
- `nrRuns::Int64`:           number of sampling runs.
- `α_::Float64`:             scalar prior parameter.
- `sampleSize::Int64`:       number of samples per run.
# Return
- `Vector{Float64}`:         posterior distribution p( h* = h | S ).

Precomputation of the transformation matrix G; it should be computed once if `dirichletPosteriorEstimation` is called several times.

`AgnosticBayesEnsemble.GMatrix` (Function)

```julia
GMatrix( d::Int64 )
```

Compute transformation matrix G.
# Arguments
- `d::Int64`:        number of hypotheses used for prediction.
# Return
- `Matrix{Float64}`: transformation matrix G.
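A sketch of the intended reuse pattern: compute G once for the number of hypotheses and pass it to repeated estimation calls, e.g. while varying the prior parameter α. The error matrix and parameter choices are hypothetical:

```julia
using AgnosticBayesEnsemble

errMat = rand( 0.0:1.0, 1000, 4 );      # hypothetical {0,1}-loss error matrix
G      = GMatrix( size( errMat, 2 ) );  # precompute once for 4 hypotheses

# reuse G across several calls while varying the prior parameter α
p1 = dirichletPosteriorEstimationV2( errMat, G, 10000, 0.5, 32 );
p2 = dirichletPosteriorEstimationV2( errMat, G, 10000, 1.0, 32 );
```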

Advanced algorithm: probabilistic inference using a Dirichlet prior; in-place version returning the result through a parameter.

`AgnosticBayesEnsemble.dirichletPosteriorEstimation!` (Function)

```julia
dirichletPosteriorEstimation!( errMat::Matrix{Float64}, nrRuns::Int64, α_::Float64, p::Vector{Float64} )
```

Compute posterior p( h* = h | S ).
# Arguments
- `errMat::Matrix{Float64}`: each column is the prediction error of one hypothesis.
- `nrRuns::Int64`:           number of passes over predictions.
- `α_::Float64`:             scalar prior parameter.
- `p::Vector{Float64}`:      return value, posterior p( h* = h | S ).
# Return
- `nothing`:                 nothing.

Parameter search for the prior parameter α.

`AgnosticBayesEnsemble.metaParamSearchValidationDirichlet` (Function)

```julia
metaParamSearchValidationDirichlet( Y::Matrix{Float64}, t::Vector{Float64}, nrRuns::Int64, minVal::Float64, maxVal::Float64, nSteps::Int64, holdout::Float64, lossFunc )
```

Compute the best α parameter regarding predictive performance.
# Arguments
- `Y::Matrix{Float64}`: each column is the prediction of one hypothesis.
- `t::Vector{Float64}`: label vector.
- `nrRuns::Int64`:      number of passes over predictions.
- `minVal::Float64`:    minimum value of α.
- `maxVal::Float64`:    maximum value of α.
- `nSteps::Int64`:      number of steps between the minimum and maximum value.
- `holdout::Float64`:   fraction of the data used as holdout set.
- `lossFunc`:           error function handle.
# Return
- `Vector{Float64}, Vector{Float64}`: α sequence and corresponding performance sequence.
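A sketch of a grid search over α, assuming squared error as `lossFunc` and that lower returned performance values are better; all data and parameter choices are hypothetical:

```julia
using AgnosticBayesEnsemble
using Statistics

# hypothetical predictions and labels
Y = rand( 0.0:1.0, 1000, 4 );
t = rand( 0.0:1.0, 1000 );

# mean squared error between ensemble prediction and labels
lossFunc = ( ŷ, labels ) -> mean( ( ŷ .- labels ).^2 );

# scan α over [0.1, 2.0] in 20 steps, holding out 30% of the data
αs, perf = metaParamSearchValidationDirichlet( Y, t, 1000, 0.1, 2.0, 20, 0.3, lossFunc );
αBest    = αs[argmin( perf )];   # assuming lower loss means better performance
```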

Advanced algorithm: probabilistic inference using a T-distribution prior.

`AgnosticBayesEnsemble.TDistPosteriorEstimation` (Function)

```julia
TDistPosteriorEstimation( errMat::Matrix{Float64}, nrRuns::Int64; [κ_0::Float64=1.0] [, v_0::Float64=Float64( size( errMat, 2 ) )] [, α::Float64=0.5] [, β::Float64=0.25] )
```

Compute posterior p( h* = h | S ).
# Arguments
- `errMat::Matrix{Float64}`:                   each column is the prediction error of one hypothesis.
- `nrRuns::Int64`:                             number of main iterations.
- `κ_0::Float64=1.0`:                          regularization parameter.
- `v_0::Float64=Float64( size( errMat, 2 ) )`: regularization parameter.
- `α::Float64=0.5`:                            regularization parameter.
- `β::Float64=0.25`:                           regularization parameter.
# Return
- `Vector{Float64}`:                           posterior p( h* = h | S ).
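A minimal sketch showing the keyword interface; the explicit call merely spells out the documented default values:

```julia
using AgnosticBayesEnsemble

errMat = rand( 1000, 4 );   # hypothetical error matrix

# with default hyperparameters
p = TDistPosteriorEstimation( errMat, 10000 );

# with explicitly chosen regularization parameters (here: the documented defaults)
p2 = TDistPosteriorEstimation( errMat, 10000;
                               κ_0=1.0, v_0=Float64( size( errMat, 2 ) ),
                               α=0.5, β=0.25 );
```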

Advanced algorithm: probabilistic inference using a T-distribution prior; reference implementation.

`AgnosticBayesEnsemble.TDistPosteriorEstimationReference` (Function)

```julia
TDistPosteriorEstimationReference( errMat::Matrix{Float64}, nrRuns::Int64 )
```

Compute posterior p( h* = h | S ).
# Arguments
- `errMat::Matrix{Float64}`: each column is the prediction error of one hypothesis.
- `nrRuns::Int64`:           number of main iterations.
# Return
- `Vector{Float64}`:         posterior p( h* = h | S ).

## Refine tuning algorithms

Given a solution to the ensemble learning problem, this method seeks to further improve it by refinement using unconstrained optimization under the mean squared error (MSE) loss function.

The resulting solutions aren't guaranteed to be valid probability distributions.

`AgnosticBayesEnsemble.directOptimNaiveMSE` (Function)

```julia
directOptimNaiveMSE( predMat::Matrix{Float64}, t::Vector{Float64}, p::Vector{Float64} )
```

Compute a refined solution for mixing parameter p.
# Arguments
- `predMat::Matrix{Float64}`: each column is the prediction of one hypothesis.
- `t::Vector{Float64}`:       label vector.
- `p::Vector{Float64}`:       initial solution for mixing coefficients.
# Return
- `Vector{Float64}`:          improved solution.
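A sketch of the intended workflow: estimate a posterior first, then refine it under MSE; note that the refined vector need not sum to one. Data and parameters are hypothetical:

```julia
using AgnosticBayesEnsemble

# hypothetical predictions, labels, and derived error matrix
predMat = rand( 0.0:1.0, 1000, 4 );
t       = rand( 0.0:1.0, 1000 );
errMat  = ( repeat( t, outer=[1, size( predMat, 2 )] ) .- predMat ).^2;

p0   = bootstrapPosteriorEstimation( errMat, 32, 5000 );   # initial solution
pRef = directOptimNaiveMSE( predMat, t, p0 );              # refined mixing coefficients
```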

Given a solution to the ensemble learning problem, this method seeks to further improve it by refinement using unconstrained optimization under the hinge loss function.

`AgnosticBayesEnsemble.directOptimHinge` (Function)

```julia
directOptimHinge( predMat::Matrix{Float64}, t::Vector{Float64}, p::Vector{Float64} )
```

Compute a refined solution for mixing parameter p.
# Arguments
- `predMat::Matrix{Float64}`: each column is the prediction of one hypothesis.
- `t::Vector{Float64}`:       label vector.
- `p::Vector{Float64}`:       initial solution for mixing coefficients.
# Return
- `Vector{Float64}`:          improved solution.

## Tutorials

### Low level interface

The interface was designed to be easy to use. All parameters needed by the algorithms in the package are either y1, y2, y3, …, yk, the predictions per raw model, along with the label vector T, or alternatively e1, e2, e3, …, ek, the errors between predicted and true labels, along with the ground truth T. Some of the methods need additional (prior) parameters, but this simple basic structure is consistent across all ensemble methods implemented in this package.

___

### Examples

"""


using AgnosticBayesEnsemble
using DataFrames
using Random
using Statistics
using StaticArrays
using Optim
using MultivariateStats



#== create artificial predictions and ground truth ==#
function distortBinaryPrediction( y::BitArray{1}, distortionFactor::Float64 )
    res          = deepcopy( y );   
    indices      = rand( 1:1:size( y, 1 ), round( Int64, distortionFactor * size( y, 1 ) ) );
    res[indices] = .!y[indices];
    return res;
end

n    = 100000;
y    = Bool.( rand( 0:1,n ) );
yH1  = distortBinaryPrediction( y, 0.20 );
yH2  = distortBinaryPrediction( y, 0.21 );
yH3  = distortBinaryPrediction( y, 0.22 );
yH4  = distortBinaryPrediction( y, 0.23 );
yH5  = distortBinaryPrediction( y, 0.24 );
yH6  = distortBinaryPrediction( y, 0.24 );
yH7  = distortBinaryPrediction( y, 0.26 );
yH8  = distortBinaryPrediction( y, 0.27 );
yH9  = distortBinaryPrediction( y, 0.28 );
yH10 = distortBinaryPrediction( y, 0.29 );
yH11 = distortBinaryPrediction( y, 0.30 );
yH12 = distortBinaryPrediction( y, 0.33 );
yH13 = distortBinaryPrediction( y, 0.34 );
yH14 = distortBinaryPrediction( y, 0.35 );
yH15 = distortBinaryPrediction( y, 0.36 );
yH16 = distortBinaryPrediction( y, 0.37 );
y    = Float64.( y );

#== split generated prediction set into disjoint sets eval and train==#
limit           = round( Int64, 0.7 * size( y, 1 ) );
predictions     = DataFrame( h1=yH1, h2=yH2, h3=yH3, h4=yH4, h5=yH5, h6=yH6, h7=yH7, h8=yH8, h9=yH9, h10=yH10, h11=yH11, h12=yH12, h13=yH13, h14=yH14, h15=yH15, h16=yH16 );
predTraining    = predictions[1:limit,:];
predEval        = predictions[limit+1:end,:];
predMatTraining = Matrix{Float64}( predTraining );
predMatEval     = Matrix{Float64}( predEval );
errMatTraining  = ( repeat( Float64.( y[1:limit] ),outer = [1,size(predictions,2)] ) .- predMatTraining ).^2;
errMatTraining  = convert( Matrix{Float64}, errMatTraining );
sampleSize      = 32
nrRuns          = 10000
α_              = 1.0

#== use bootstrap correlation algorithm to estimate the model posterior  distribution ==#
PBC = bootstrapPosteriorCorEstimation( predMatTraining, y[1:limit], sampleSize, nrRuns );

#== use bootstrap algorithm to estimate the model posterior distribution ==#
pB  = bootstrapPosteriorEstimation( errMatTraining, sampleSize, nrRuns ); 

#== use Dirichletian algorithm to estimate the model posterior distribution ==#
PD  = dirichletPosteriorEstimation( errMatTraining, nrRuns, α_ );

#== use T-Distribution algorithm to estimate the model posterior distribution ==#
PT  = TDistPosteriorEstimation( errMatTraining, nrRuns );

sum( PBC ) + sum( pB ) + sum( PD ) + sum( PT ) ≈ 4.0

# output

true

"""

## Supported problems per algorithm

| algorithm      | univariate classification | multivariate classification | univariate regression | multivariate regression |
|----------------|---------------------------|-----------------------------|-----------------------|-------------------------|
| bootstrap      | yes                       | yes                         | yes                   | yes                     |
| bootstrap cor. | yes                       | no                          | yes                   | no                      |
| dirichletian   | yes, only {0,1}-loss      | yes, only {0,1}-loss        | no                    | no                      |
| t-distribution | yes                       | yes                         | yes                   | yes                     |

___

## Supported problems per fine tuning algorithm

| algorithm                     | univariate classification | multivariate classification | univariate regression | multivariate regression |
|-------------------------------|---------------------------|-----------------------------|-----------------------|-------------------------|
| δOptimizationMSE              | yes                       | no                          | yes                   | no                      |
| δOptimizationHinge            | yes                       | no                          | no                    | no                      |
| δOptimizationHingeRegularized | yes                       | no                          | no                    | no                      |
| δOptimizationMSERegularized   | yes                       | no                          | yes                   | no                      |
