BARN

The primary ensemble component available in BARMPy (currently the only one) is a single-hidden-layer neural network. Collecting several of these NNs into an additive ensemble gives Bayesian Additive Regression Networks (BARN).
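
Conceptually, the ensemble's prediction is just the sum of its component networks' predictions. A minimal sketch of the idea (the library handles this internally; barn_predict is purely illustrative):

def barn_predict(ensemble, X):
    # ensemble: a list of single-hidden-layer NNs with sklearn-style predict
    # the additive model sums each network's contribution
    return sum(nn.predict(X) for nn in ensemble)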

NN

Before building an ensemble, it’s helpful to understand the core component that goes into it. In this case, we use a neural network implemented in sklearn with a few extra methods to ease its use in BARN. Eventually, this class will inherit from an abstract one for general BARM components.

class barmpy.barn.NN(num_nodes=10, weight_donor=None, l=2, lr=0.01, r=42, epochs=20, x_in=None, batch_size=512, solver=None, tol=0.001, reg=0.01, act='logistic', binary=False)[source]

Neural Network with single hidden layer implemented with sklearn.

Includes methods to do MCMC transitions and calculations.

accept_donation(donor_num_nodes, donor_weights, donor_intercepts)[source]

Replace our weights with those of another NN (passed as weights).

The donor may be a different size; if it is smaller, only the earlier weights in the donee are overwritten.

get_weights()[source]

Return sklearn style tuple of (coefs_, intercepts_)

static load(fname)[source]

Read a NumPy archive of an NN (such as created by save) into a new NN

log_acceptance(X, Y)[source]

Natural log of the acceptance probability of transitioning from X to Y

log_likelihood(X, Y, std)[source]

Log likelihood of NN assuming normally distributed errors.

log_prior(pmf=scipy.stats.poisson.logpmf)[source]

Log prior probability of the NN. Assumes one distribution parameter, self.l.

log_transition(target, q=0.5)[source]

Log of the transition probability from self to target

predict(X)[source]

Predict output values for input X with the trained network.

save(fname)[source]

Save NN to disk as a NumPy archive

train(X, Y)[source]

Train network from current position with given data
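
As a quick illustration, here is a minimal sketch of using the NN class directly with the constructor arguments and methods listed above (file names and outputs are illustrative only):

import numpy as np
from barmpy.barn import NN

X = np.random.random((100, 2))
Y = X[:, 0] + 2*X[:, 1]

net = NN(num_nodes=5, epochs=50, r=0)     # single hidden layer with 5 neurons
net.train(X, Y)                           # train from the current position
Yhat = net.predict(X)

net.save('nn_example.npz')                # save as a NumPy archive
net2 = NN.load('nn_example.npz')          # restore into a new NN
coefs, intercepts = net2.get_weights()    # sklearn-style (coefs_, intercepts_)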

BARN Class

Equipped with the core NN class above, we can train the entire ensemble following our Bayesian procedure.

class barmpy.barn.BARN_base(act='logistic', batch_size=512, callbacks={}, dname='default_name', epochs=10, init_neurons=None, l=2, lr=0.01, n_features_in_=None, n_iter=200, nu=3, num_nets=10, qq=0.9, random_state=42, reg=0.01, silence_warnings=True, solver=None, test_size=0.25, tol=0.001, trans_options=['grow', 'shrink'], trans_probs=[0.4, 0.6], use_tf=False, warm_start=True)[source]

Bayesian Additive Regression Networks ensemble.

Specify and train an array of neural nets with Bayesian posterior.

Argument Descriptions:

  • act - Activation function, ‘logistic’ or ‘relu’, passed to NN

  • batch_size - batch size for gradient descent solvers, passed to NN

  • callbacks - dictionary of callbacks, keyed by function with values being arguments passed to the callback

  • dname - dataset name if saving output

  • epochs - number of neural network training epochs for gradient descent solver

  • init_neurons - initial number of neurons for each network in the ensemble

  • l - expected number of neurons per network in the ensemble, used as the Poisson prior distribution parameter

  • lr - learning rate for solvers

  • n_features_in_ - number of features expected in input data, can be set during setup_nets instead

  • n_iter - maximum number of MCMC iterations to perform. One iter is a pass through each network in the ensemble

  • nu - chi-squared parameter for prior on model error, recommend leaving default

  • num_nets - number of networks in the ensemble

  • qq - quantile for error prior to compute lambda, recommend leaving default

  • random_state - random seed for reproducible results

  • reg - L1L2 regularization weight penalty on NN weights, passed to NN

  • silence_warnings - flag to turn off convergence warnings from SciPy which may not be helpful

  • solver - solver preference (‘lbfgs’ or ‘adam’), use None to automatically select based on problem size, passed to NN

  • test_size - fraction of data to use as held-back testing data if not supplying separate splits to BARN.fit

  • tol - tolerance used in solver, passed to NN

  • trans_options - possible state transition options as a list of strings

  • trans_probs - probability of each state transition option, currently fixed for each option

  • use_tf - flag to use TensorFlow backend for NN, recommend leaving False unless networks are large

  • warm_start - flag to reuse the previously trained ensemble for a given instance (True) or to create a fresh ensemble on each call to BARN.fit (False); irrelevant if BARN.fit is called only once for an instance

batch_means(num_batch=20, batch_size=None, np_out='val_resid.npy', outfile='var_all.csv', mode='a', burn=None, num=None)[source]

Compute batch means variance over computed results.

compute_res(X, Y, i, S=None)[source]

Compute the residual for the current iteration, returning total prediction as well as target without contribution from model i.

Optionally use an existing S from previous iteration

fit(X, Y, Xva=None, Yva=None, Xte=None, Yte=None, n_iter=None)[source]

Overall BARN fitting method.

If Xva/Yva are not supplied but such data is requested via self.test_size, the training data is split, with a self.test_size fraction held out as validation. When validation data is available, it is used for the MCMC acceptance calculation; otherwise, the training data is reused for that probability.

If Xte/Yte are not supplied, the test data calculation is skipped.
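
For example, a minimal sketch passing an explicit validation split to fit (synthetic data; the validation data is used for MCMC acceptance, not NN training):

import numpy as np
from sklearn.model_selection import train_test_split
from barmpy.barn import BARN

X = np.random.random((200, 2))
Y = X[:, 0] + 2*X[:, 1]
# hold out explicit validation data instead of relying on test_size
Xtr, Xva, Ytr, Yva = train_test_split(X, Y, test_size=0.25, random_state=0)

model = BARN(num_nets=3, n_iter=50)
model.fit(Xtr, Ytr, Xva=Xva, Yva=Yva)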

get_weights()[source]

Obtain weights of the NNs in the ensemble.

static improvement(self, check_every=None, skip_first=0, tol=0)[source]

Stop early if performance has not improved for check_every iters.

Allows some wiggle room: if the current error is within tol % of the old best, training continues.

Skip the first skip_first iters without checking

phi_viz(outname='phi.png', close=True)[source]

Visualize the phi parameter (validation error) over time

predict(X)[source]

Predict output values for input X using the current ensemble.

static rfwsr(self, check_every=None, skip_first=0, t=2, eps=0.01)[source]

Relative Fixed Width Stopping Rule

t*sig/sqrt(n) + 1/n <= eps * gbar

Skip the first skip_first iters without checking
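
As a rough sketch of the check above (not the library's internal implementation; the definitions of sig and gbar here, taken as the standard deviation and mean of the chain of errors, are assumptions):

import numpy as np

def rfwsr_check(errors, t=2, eps=0.01):
    # errors: per-iteration error estimates collected from the MCMC chain
    n = len(errors)
    gbar = np.mean(errors)          # running mean of the chain (assumed)
    sig = np.std(errors, ddof=1)    # chain standard deviation (assumed)
    return t*sig/np.sqrt(n) + 1.0/n <= eps*gbar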

sample_sigma()[source]

Sample model sigma from posterior distribution (another inverse gamma)

save(outname)[source]

Save the entire ensemble of NNs as a Python pickle. Load with pickle too.

setup_nets(n_features_in_=None)[source]

Intermediate method to initialize networks in ensemble

static stable_dist(self, check_every=None, skip_first=0, tol=None)[source]

Stop early if the posterior distribution of neuron counts is stable

Tolerance is on the Wasserstein metric (aka Earth Mover's Distance)

Skip the first skip_first iters without checking

static trans_enough(self, check_every=None, skip_first=0, ntrans=None)[source]

Stop early if fewer than ntrans transitions were accepted

Skip the first skip_first iters without checking

update_target(X, Y_old)[source]

viz(outname='results.png', extra_slots=0, close=True, initial=False, do_viz=True)[source]

Visualize results of BARN analysis.

Shows:
  1. Plot of Y_test vs Y_test_pred_initial (optional)

  2. Plot of Y_test vs Y_test_pred_final

Requires existing self.Xte and self.Yte (usually set by self.fit)

Both barmpy.barn.BARN (for regression) and barmpy.barn.BARN_bin (for binary classification) inherit from barmpy.barn.BARN_base.
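
For binary classification, a minimal sketch with BARN_bin might look like the following (it shares the BARN_base interface above; how its predictions should be interpreted, labels versus scores, is not specified here):

import numpy as np
from barmpy.barn import BARN_bin

# toy binary classification data
X = np.random.random((200, 2))
Y = (X[:, 0] + X[:, 1] > 1).astype(int)

clf = BARN_bin(num_nets=3, epochs=50, n_iter=50)
clf.fit(X, Y)
preds = clf.predict(X)   # output interpretation depends on BARN_bin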

Example

Let’s walk through a minimal example training an ensemble with BARN. Start by generating some data (or reading in some of your own).

import numpy as np
X = np.random.random([100,2])
# make an ordinary linear relationship
Y = X[:,0] + 2*X[:,1] + np.random.random(100)/10

Now we’ll initialize a BARN setup with 3 NNs, using the default settings for the remaining parameters.

from barmpy.barn import BARN, NN
model = BARN(num_nets=3, dname='example', epochs=100)

Actually running the model is straightforward, but you can tweak the MCMC parameters to your liking. After the specified number of MCMC iterations, your model is ready for pointwise inference by using the last ensemble in the chain.

model.fit(X,Y)
Yhat = model.predict(X)
print(np.mean((Y - Yhat)**2)/np.var(Y)) # relative mean squared error

Custom Callback Example

BARMPy also supports custom model callbacks. Callbacks are a way to run a routine between MCMC iterations, typically to log information or check for an early stopping condition. We provide several callbacks in the library itself, though you can supply your own function as well (a sketch appears after the snippet below). We recommend barmpy.barn.BARN_base.improvement to check for early stopping with validation data (note such data will also be used for MCMC acceptance, but not NN training, if provided). The provided callbacks are:

  • barmpy.barn.BARN_base.improvement - Check if validation error has stopped improving, indicating the model has started to overfit and training should stop

  • barmpy.barn.BARN_base.rfwsr - Relative fixed-width stopping rule to check if MCMC estimate has converged

  • barmpy.barn.BARN_base.stable_dist - Check if distribution of neuron counts is stable, indicating stationary distribution reached

  • barmpy.barn.BARN_base.trans_enough - Check if enough transitions were accepted to justify additional MCMC iterations toward stationary posterior

To use a callback, we need to add it to a dictionary and pass that to the callbacks argument of barmpy.barn.BARN (or barmpy.barn.BARN_bin, as appropriate). Each key of the dictionary should be the Python function or method itself, and its value is a dictionary of additional arguments passed to that callback. Each iteration, every callback is called with the ensemble itself as the first argument (to enable access to its internals).

Here is a small snippet showing how to use the stable_dist callback:

from barmpy.barn import BARN

n_iter = 200  # maximum number of MCMC iterations
callbacks = {BARN.stable_dist:
                {'check_every': 1,
                 'skip_first': 4}}
model = BARN(num_nets=10,
             callbacks=callbacks,
             n_iter=n_iter)
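
You can also pass your own function. A minimal sketch of a custom logging callback, assuming only that BARN calls it with the ensemble as the first argument and the dictionary values as keyword arguments (the RMSE logging is purely illustrative):

import numpy as np
from barmpy.barn import BARN

X = np.random.random((100, 2))
Y = X[:, 0] + 2*X[:, 1]

# hypothetical custom callback: log the ensemble's current RMSE each iteration
def log_rmse(model, X=None, Y=None):
    rmse = np.sqrt(np.mean((Y - model.predict(X))**2))
    print(f'current RMSE: {rmse:.4f}')

model = BARN(num_nets=3, n_iter=20,
             callbacks={log_rmse: {'X': X, 'Y': Y}})
model.fit(X, Y)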

CV Tuning Example

BARN is implemented as an sklearn class, meaning we can use standard sklearn methods like GridSearchCV to tune the hyperparameters for the best possible result. This does take considerably more processing power to test the various parameter configurations, so be mindful when considering the number of possible hyperparameter values.

Much like BART, we apply cross-validated hyperparameter tuning to set the priors (i.e. the expected number of neurons in a network, l). But as with BART, we do not seek an exact match, only something that generally agrees with the data. Below is a short series of examples using various sklearn approaches.

from sklearn import datasets
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from barmpy.barn import BARN
db = datasets.load_diabetes()
scoring = 'neg_root_mean_squared_error'

# exhaustive grid search
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
          random_state=0,
          warm_start=True,
          solver='lbfgs')
## declare parameters to exhaust over
parameters = {'l': (1,2,3)}
barncv = GridSearchCV(bmodel, parameters,
                refit=True, verbose=4,
                scoring=scoring)
barncv.fit(db.data, db.target)
print(barncv.best_params_)

# randomized search with distributions
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import poisson
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
          random_state=0,
          warm_start=True,
          solver='lbfgs')
## declare parameters and distributions
parameters = {'l': poisson(mu=2)}
barncv = RandomizedSearchCV(bmodel, parameters,
                refit=True, verbose=4,
                scoring=scoring, n_iter=3)
barncv.fit(db.data, db.target)
print(barncv.best_params_)

In particular, note the need to set scoring='neg_root_mean_squared_error', which is what we recommend for typical regression problems. You can find more scoring options in the sklearn.model_selection documentation.

Also, when using a method like RandomizedSearchCV, be careful to supply appropriate distributions. Here, l takes discrete values, so we specify a discrete Poisson probability distribution to sample from. Note, however, that this distribution is not the distribution BARN uses for internal MCMC transitions. This distribution is only for CV sampling the prior parameters.

Visualization Example

Though BARN is implemented as an sklearn regression class and you can use it with any compatible visualization library, we also have some built-in methods. The first is model.viz. After training, this creates a plot of predicted vs target values, both for initial BARN (i.e. before training) and final results.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from barmpy.barn import BARN, NN
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

db = datasets.load_diabetes()
Xtr, Xte, Ytr, Yte = train_test_split(db.data, db.target, test_size=0.2, random_state=0)

# rescale inputs with PCA (and output normalized)
Xtr_o = np.copy(Xtr)
Xte_o = np.copy(Xte)
scale_x = PCA(n_components=Xtr.shape[1], whiten=False)
scale_x.fit(Xtr)
Xtr = scale_x.transform(Xtr_o)
Xte = scale_x.transform(Xte_o)
Ytr_o = np.copy(Ytr)
Yte_o = np.copy(Yte)
scale_y = StandardScaler() # no need to PCA
scale_y.fit(Ytr.reshape((-1,1)))
Ytr = scale_y.transform(Ytr_o.reshape((-1,1))).reshape(-1)
Yte = scale_y.transform(Yte_o.reshape((-1,1))).reshape(-1)

model = BARN(num_nets=10, dname='example',
   l=1,
   act='logistic',
   epochs=100,
   n_iter=100)
model.fit(Xtr, Ytr, Xte=Xte, Yte=Yte)
model.viz(outname='viz_test.png', initial=True)
[Figure: two scatterplots of predicted vs. target values. Left: initial BARN results, where predictions form a flat horizontal line (training R² 0.07, test RMSE 0.88). Right: trained BARN results, where points begin to track the 1:1 line (training R² 0.56, test RMSE 0.76, an improvement).]

We also have a tool to view the validation error progression over the MCMC iterations. This can be helpful for assessing convergence. Reusing the above model, it looks like this particular BARN model isn’t converging well.

model.phi_viz(outname='', close=False)
[Figure: line plot of the ensemble error at each MCMC iteration; the error bounces up and down, not clearly converging.]

Coming Soon

  • Tweaking MCMC parameters