BARN¶

One of the primary ensembles available in BARMPy (currently the only one) is a single hidden layer Neural Network. When you collect several of these NNs, you have Bayesian Additive Regression Networks (BARN).

NN¶

Before building an ensemble, it’s helpful to understand the core component that goes into it. In this case, we use a neural network implemented in sklearn with a few extra methods to ease their use in BARN. Eventually, this class will inherit from an abstract one for general BARM components.

class barmpy.barn.NN(num_nodes=10, weight_donor=None, l=2, lr=0.01, r=42, epochs=20, x_in=None, batch_size=512, solver=None, tol=0.001, reg=0.01, act='logistic', binary=False)[source]¶

Neural Network with single hidden layer implemented with sklearn.

Includes methods to do MCMC transitions and calculations.

accept_donation(donor_num_nodes, donor_weights, donor_intercepts)[source]¶

Replace our weights with those of another NN (passed as weights).

Donor can be different size; if smaller, earlier weights in donee are overwritten.

get_weights()[source]¶: Return sklearn style tuple of (coefs_, intercepts_)

static load(fname)[source]¶: Read a NumPy archive of an NN (such as created by save) into a new NN

log_acceptance(X, Y)[source]¶: Natural log of acceptance probability of transiting from X to Y

log_likelihood(X, Y, std)[source]¶: Log likelihood of NN assuming normally distributed errors.

log_prior(pmf=<bound method rv_discrete.logpmf of <scipy.stats._discrete_distns.poisson_gen object>>)[source]¶: Log prior probability of the NN. Assumes one distribution parameter, self.l.

log_transition(target, q=0.5)[source]¶: Transition probability from self to target

predict(X)[source]¶

save(fname)[source]¶: Save NN to disk as a NumPy archive

train(X, Y)[source]¶: Train network from current position with given data

BARN Class¶

Equipped with a core NN class above, we can train the entire ensemble following our Bayesian procedure.

class barmpy.barn.BARN_base(act='logistic', batch_size=512, burn=None, callbacks={}, dname='default_name', epochs=10, init_neurons=None, l=2, lr=0.01, n_features_in_=None, n_iter=200, ndraw=None, normalize=True, nu=3, num_nets=10, qq=0.9, random_state=42, reg=0.01, silence_warnings=True, solver=None, test_size=0.25, tol=0.001, trans_options=['grow', 'shrink'], trans_probs=[0.4, 0.6], use_tf=False, verbose=False, warm_start=True)[source]¶

Bayesian Additive Regression Networks ensemble.

Specify and train an array of neural nets with Bayesian posterior.

Argument Descriptions:

act - Activation function, ‘logistic’ or ‘relu’, passed to NN
batch_size - batch size for gradient descent solvers, passed to NN
burn - number of MCMC iterations to initially discard for burn-in
callbacks - dictionary of callbacks, keyed by function with values being arguments passed to the callback
dname - dataset name if saving output
epochs - number of neural network training epochs for gradient descent solver
init_neurons - number of neurons to initialize each network in the ensemble to
l - prior distribution (Poisson) expected number of neurons for network in ensemble
lr - learning rate for solvers
n_features_in_ - number of features expected in input data, can be set during setup_nets instead
n_iter - maximum number of MCMC iterations to perform. One iter is a pass through each network in the ensemble
ndraw - number of MCMC draws over which to average models
normalize - flag to normalize input and output before training and prediction
nu - chi-squared parameter for prior on model error, recommend leaving default
num_nets - number of networks in the ensemble
qq - quantile for error prior to compute lambda, recommend leaving default
random_state - random seed for reproducible results
reg - L1L2 regularization weight penalty on NN weights, passed to NN
silence_warnings - flag to turn off convergence warnings from SciPy which may not be helpful
solver - solver preference (‘lbfgs’ or ‘adam’), use None to automatically select based on problem size, passed to NN
test_size - fraction of data to use as held-back testing data if not supplying separate splits to BARN.fit
tol - tolerance used in solver, passed to NN
trans_options - possible state transition options as a list of strings
trans_probs - probability of each state transition option, currently fixed for each option
use_tf - flag to use TensorFlow backend for NN, recommend leaving False unless networks are large
warm_start - flag to allow reusing previously trained ensemble for a given instance (True) or create a fresh ensemble with calling BARN.fit (False), irrelevant if only calling BARN.fit for an instance once

batch_means(num_batch=20, batch_size=None, np_out='val_resid.npy', outfile='var_all.csv', mode='a', burn=None, num=None)[source]¶: Compute batch means variance over computed results.

compute_res(X, Y, i, S=None)[source]¶

Compute the residual for the current iteration, returning total prediction as well as target without contribution from model i.

Optionally use an existing S from previous iteration

create_input_normalizer(X, normalize=False)[source]¶: Setup PCA normalization of input features

create_output_normalizer(Y, normalize=False)[source]¶: Setup PCA normalization of target output

fit(X, Y, Xva=None, Yva=None, Xte=None, Yte=None, n_iter=None)[source]¶

Overall BARN fitting method.

If Xva/Yva not supplied yet such data is requested by self.test_size, then training data is split, using self.test_size fraction as validation. If this validation data is available, it’s used for acceptance. Otherwise, the training data is reused for the probability calculation.

If Xte/Yte not supplied, however, we skip the test data calculation.

get_weights()[source]¶: Obtain weights of the NNs in the ensemble.

static improvement(self, check_every=None, skip_first=0, tol=0)[source]¶

Stop early if performance has not improved for check_every iters.

Allow wiggle room such that if we are within tol % of old best, continue

Skip the first skip_first iters without checking

multidraw()[source]¶

Save multiple draws of the multiple for averaged prediction

Skip the first self.burn iters Otherwise, keep every self.save_every model

phi_viz(outname='phi.png', close=True)[source]¶: Visualize the phi parameter, validation error over time

predict(X)[source]¶

predict_interval(X, alpha=0.05)[source]¶: Predict upper and lower confidence interval at alpha level Aggregate multiple MCMC draws using point and sigma est of each

static rfwsr(self, check_every=None, skip_first=0, t=2, eps=0.01)[source]¶

Relative Fixed Width Stopping Rule

t*sig/sqrt(n) + 1/n <= eps * gbar

Skip the first skip_first iters without checking

sample_sigma()[source]¶: Sample model sigma from posterior distribution (another inverse gamma)

save(outname)[source]¶: Save the entire ensemble of NNs as a Python pickle. Load with pickle too.

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

Xte (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Xte parameter in fit.
Xva (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Xva parameter in fit.
Yte (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Yte parameter in fit.
Yva (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Yva parameter in fit.
n_iter (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_iter parameter in fit.

Returns:

self – The updated object.

Return type:

object

setup_nets(n_features_in_=None)[source]¶: Intermediate method to initialize networks in ensemble

static stable_dist(self, check_every=None, skip_first=0, tol=None)[source]¶

Stop early if posterior distribution of neuron distribution is stable

Tolerance is on Wassersten metric (aka Earth Mover Distance)

Skip the first skip_first iters without checking

static trans_enough(self, check_every=None, skip_first=0, ntrans=None)[source]¶

Stop early if fewer than ntrans transitions

Skip the first skip_first iters without checking

update_target(X, Y_old)[source]¶

viz(outname='results.png', extra_slots=0, close=True, initial=False, do_viz=True)[source]¶

Visualize results of BARN analysis.

Shows:

Plot of Y_test vs Y_test_pred_initial (optional)
Plot of Y_test vs Y_test_pred_final

Requires existing self.Xte and self.Yte (usually set by self.fit)

Both barmpy.barn.BARN (for regression) and barmpy.barn.BARN_bin (for binary classification) inherit from barmpy.barn.BARN_base.

Example¶

Let’s walk through a minimal example training an ensemble with BARN. Start by generating some data (or reading in some of your own).

import numpy as np
X = np.random.random([100,2])
# make an ordinary linear relationship
Y = X[:,0] + 2*X[:,1] + np.random.random(100)/10

Now we’ll initialize a BARN setup with 3 NN’s. We’ll use the default

from barmpy.barn import BARN, NN
model = BARN(num_nets=3, dname='example', epochs=100)

Actually running the model is straightforward, but you can tweak the MCMC parameters to your liking. After the specified number of MCMC iterations, your model is ready for pointwise inference by averaging over multiple draws of the MCMC (default 10% post burn-in).

model.fit(X,Y)
Yhat = model.predict(X)
print((Y-Yhat)**2/np.std(Y)) # relative error

If you like, you can use the multiple MCMC draws produced during the modeling to estimate a confidence interval around this pointwise prediction. This uses the pointwise prediction and sigma estimate from each one of the draws to produce a 1-alpha bound centered at the pointwise prediction.

lower, upper = model.predict_interval(X, alpha=0.05)

Custom Callback Example¶

BARMPy also support custom model callbacks. Callbacks are a way to run a routine in between MCMC iterations. This is typically done to either log information or check for an early stopping condition. We provide several callbacks in the library itself, though you can supply your own function as well. We recommend barmpy.barn.BARN_base.improvement to check for early stopping with validation data (note such data will also be used for MCMC acceptance, but not NN training, if provided). The set of all provided callbacks are:

barmpy.barn.BARN_base.improvment - Check if validation error has stopped improving, indicating model has started to overfit and training should stop
barmpy.barn.BARN_base.rfwsr - Relative fixed-width stopping rule to check if MCMC estimate has converged
barmpy.barn.BARN_base.stable_dist - Check if distribution of neuron counts is stable, indicating stationary distribution reached
barmpy.barn.BARN_base.trans_enough - Check if enough transitions were accepted to justify additional MCMC iterations toward stationary posterior

To use a callback, we need to add it to a dictionary and pass that to the callbacks argument of barmpy.barn.BARN (or barmpy.barn.BARN_bin, as appropriate). The key to the dictionary should be the Python function or method itself, while the values are additional arguments provided to that function. Each iteration, we will call each callback function, passing the ensemble itself as the first argument (to enable access to its internals).

Here is a small snippet showing how to use the stable_dist callback:

callbacks = {barmpy.barn.BARN.stable_dist:
                {'check_every':1,
                'skip_first':4}}
model = BARN(num_nets=10,
            callbacks=callbacks,
            n_iter=n_iter)

CV Tuning Example¶

BARN is implemented as an sklearn class, meaning we can use standard sklearn methods like GridSearchCV to tune the hyperparameters for the best possible result. This does take considerably more processing power to test the various parameter configurations, so be mindful when considering the number of possible hyperparameter values.

Much like BART, we apply cross-validated hyperparameter tuning to set the priors (i.e. the expected number of neurons in a network, l). But as with BART, we do not seek an exact match, only something that generally agrees with the data. Below is a short series of examples using various sklearn approaches.

from sklearn import datasets
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from barmpy.barn import BARN
db = datasets.load_diabetes()
scoring = 'neg_root_mean_squared_error'

# exhaustive grid search
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
          random_state=0,
          warm_start=True,
          solver='lbfgs')
## declare parameters to exhaust over
parameters = {'l': (1,2,3)}
barncv = GridSearchCV(bmodel, parameters,
                refit=True, verbose=4,
                scoring=scoring)
barncv.fit(db.data, db.target)
print(barncv.best_params_)

# randomized search with distributions
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import poisson
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
          random_state=0,
          warm_start=True,
          solver='lbfgs')
## declare parameters and distributions
parameters = {'l': poisson(mu=2)}
barncv = RandomizedSearchCV(bmodel, parameters,
                refit=True, verbose=4,
                scoring=scoring, n_iter=3)
barncv.fit(db.data, db.target)
print(barncv.best_params_)

In particular, note the need to set the scoring = ‘neg_root_mean_squared_error’, which is what we recommend for default regression problems. You can find more scoring options at the sklearn.model_selection page.

Also, when using a method like RandomizedSearchCV, be careful to supply appropriate distributions. Here, l takes discrete values, so we specify a discrete Poisson probability distribution to sample from. Note, however, that this distribution is not the distribution BARN uses for internal MCMC transitions. This distribution is only for CV sampling the prior parameters.

Visualization Example¶

Though BARN is implemented as an sklearn regression class and you can use it with any compatible visualization library, we also have some built-in methods. The first is model.viz. After training, this creates a plot of predicted vs target values, both for initial BARN (i.e. before training) and final results.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from barmpy.barn import BARN, NN
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

db = datasets.load_diabetes()
Xtr, Xte, Ytr, Yte = train_test_split(db.data, db.target, test_size=0.2, random_state=0)

# rescale inputs with PCA (and output normalized)
Xtr_o = np.copy(Xtr)
Xte_o = np.copy(Xte)
scale_x = PCA(n_components=Xtr.shape[1], whiten=False)
scale_x.fit(Xtr)
Xtr = scale_x.transform(Xtr_o)
Xte = scale_x.transform(Xte_o)
Ytr_o = np.copy(Ytr)
Yte_o = np.copy(Yte)
scale_y = StandardScaler() # no need to PCA
scale_y.fit(Ytr.reshape((-1,1)))
Ytr = scale_y.transform(Ytr_o.reshape((-1,1))).reshape(-1)
Yte = scale_y.transform(Yte_o.reshape((-1,1))).reshape(-1)

model = BARN(num_nets=10, dname='example',
   l=1,
   act='logistic',
   epochs=100,
   n_iter=100)
model.fit(Xtr, Ytr, Xte=Xte, Yte=Yte)
model.viz(outname='viz_test.png', initial=True)

Left scatterplot shows initial BARN results. Prediction vs Target values look like flat horizontal line. Training R2 is 0.07 while test RMSE is 0.88. On the right is a similar scatterplot showing trained BARN results. The points start to track the goal 1-1 correspondence. Training R2 is 0.56 while test RMSE is 0.76, an improvement.

We also have tool to view the validation error progression over the MCMC iterations. This can be helpful to assess convergence. Reusing the above model, it looks like this particular BARN model isn’t converging well.

model.phi_viz(outname='', close=False)

Line plot showing error of each MCMC iteration ensemble. Error bounces up and down, not clearly converging.

Coming Soon¶

Tweaking MCMC parameters

BARN¶

NN¶

BARN Class¶

Example¶

Custom Callback Example¶

CV Tuning Example¶

Visualization Example¶

Coming Soon¶

Navigation

Related Topics