BARN
The primary ensemble component available in BARMPy (currently the only one) is a neural network with a single hidden layer. Collect several of these NNs into an ensemble and you have Bayesian Additive Regression Networks (BARN).
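To make the "additive" part concrete, here is a minimal sketch in plain Python. It assumes the ensemble output is the pointwise sum of the individual network outputs, and that each network is trained against the target with the other networks' contributions removed. The function names and numbers are hypothetical and are not part of BARMPy.

```python
def total_prediction(per_model_preds):
    """Sum the per-model predictions point by point (the additive output)."""
    return [sum(column) for column in zip(*per_model_preds)]


def residual_for(i, y, per_model_preds):
    """Target with every model's contribution removed except model i's."""
    total = total_prediction(per_model_preds)
    # (total - model i) is what the *other* models predict together
    return [yj - (tj - pij)
            for yj, tj, pij in zip(y, total, per_model_preds[i])]


# three hypothetical models predicting on four data points
preds = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 1.0, 0.0, 1.0],
]
y = [2.0, 3.0, 4.0, 5.0]

total = total_prediction(preds)       # combined ensemble output
resid_0 = residual_for(0, y, preds)   # what model 0 would be fit against
print(total, resid_0)
```

Fitting each network to its own residual in turn is the usual backfitting pattern for additive models; the Bayesian acceptance step that BARN adds on top is not shown here.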
NN
Before building an ensemble, it’s helpful to understand the core component that goes into it. In this case, we use a neural network implemented in sklearn, with a few extra methods to ease its use in BARN. Eventually, this class will inherit from an abstract one for general BARM components.
- class barmpy.barn.NN(num_nodes=10, weight_donor=None, l=2, lr=0.01, r=42, epochs=20, x_in=None, batch_size=512, solver=None, tol=0.001, reg=0.01, act='logistic', binary=False)
Neural Network with single hidden layer implemented with sklearn.
Includes methods to do MCMC transitions and calculations.
- accept_donation(donor_num_nodes, donor_weights, donor_intercepts)
Replace our weights with those of another NN (passed as weights).
The donor can be a different size; if it is smaller, the earlier weights in the donee are overwritten.
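The donation idea can be illustrated with a small sketch. It does not reproduce the library's implementation: each hidden neuron is modelled here as a plain dict, and the names accept_donation_sketch and neuron are hypothetical. Truncating a larger donor to the donee's size is also an assumption, since the description above only covers the smaller-donor case.

```python
def accept_donation_sketch(donee, donor):
    """Return a copy of `donee` whose leading neurons come from `donor`.

    Each network is a list of hidden neurons; each neuron is a dict holding
    its input weights, bias, and output weight.
    """
    k = min(len(donee), len(donor))   # neurons both networks have
    # copy the donor's first k neurons (assumption: a larger donor is truncated)
    copied = [dict(n, w_in=list(n["w_in"])) for n in donor[:k]]
    # keep the donee's remaining neurons untouched
    kept = [dict(n, w_in=list(n["w_in"])) for n in donee[k:]]
    return copied + kept


def neuron(w_in, bias, w_out):
    """Build one hidden neuron record."""
    return {"w_in": list(w_in), "bias": bias, "w_out": w_out}


donee = [neuron([0.1, 0.2], 0.0, 1.0),
         neuron([0.3, 0.4], 0.1, -1.0),
         neuron([0.5, 0.6], 0.2, 0.5)]
donor = [neuron([9.0, 8.0], 7.0, 6.0),
         neuron([5.0, 4.0], 3.0, 2.0)]

updated = accept_donation_sketch(donee, donor)
print(len(updated), updated[0]["w_in"], updated[2]["w_in"])
```

With a two-neuron donor and a three-neuron donee, the first two neurons are overwritten and the third keeps its original weights.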
BARN Class
Equipped with a core NN class above, we can train the entire ensemble following our Bayesian procedure.
- class barmpy.barn.BARN_base(act='logistic', batch_size=512, callbacks={}, dname='default_name', epochs=10, init_neurons=None, l=2, lr=0.01, n_features_in_=None, n_iter=200, nu=3, num_nets=10, qq=0.9, random_state=42, reg=0.01, silence_warnings=True, solver=None, test_size=0.25, tol=0.001, trans_options=['grow', 'shrink'], trans_probs=[0.4, 0.6], use_tf=False, warm_start=True)
Bayesian Additive Regression Networks ensemble.
Specify and train an array of neural nets with Bayesian posterior.
Argument Descriptions:
act - Activation function, ‘logistic’ or ‘relu’, passed to NN
batch_size - batch size for gradient descent solvers, passed to NN
callbacks - dictionary of callbacks, keyed by function with values being arguments passed to the callback
dname - dataset name if saving output
epochs - number of neural network training epochs for gradient descent solver
init_neurons - number of neurons to initialize each network in the ensemble to
l - prior distribution (Poisson) expected number of neurons for network in ensemble
lr - learning rate for solvers
n_features_in_ - number of features expected in input data, can be set during setup_nets instead
n_iter - maximum number of MCMC iterations to perform. One iter is a pass through each network in the ensemble
nu - chi-squared parameter for prior on model error, recommend leaving default
num_nets - number of networks in the ensemble
qq - quantile for error prior to compute lambda, recommend leaving default
random_state - random seed for reproducible results
reg - L1L2 regularization weight penalty on NN weights, passed to NN
silence_warnings - flag to turn off convergence warnings from SciPy which may not be helpful
solver - solver preference (‘lbfgs’ or ‘adam’), use None to automatically select based on problem size, passed to NN
test_size - fraction of data to use as held-back testing data if not supplying separate splits to BARN.fit
tol - tolerance used in solver, passed to NN
trans_options - possible state transition options as a list of strings
trans_probs - probability of each state transition option, currently fixed for each option
use_tf - flag to use TensorFlow backend for NN, recommend leaving False unless networks are large
warm_start - flag to allow reusing the previously trained ensemble for a given instance (True) or to create a fresh ensemble on each call to BARN.fit (False); irrelevant if BARN.fit is only called once per instance
- batch_means(num_batch=20, batch_size=None, np_out='val_resid.npy', outfile='var_all.csv', mode='a', burn=None, num=None)
Compute batch means variance over computed results.
- compute_res(X, Y, i, S=None)
Compute the residual for the current iteration, returning total prediction as well as target without contribution from model i.
Optionally use an existing S from previous iteration
- fit(X, Y, Xva=None, Yva=None, Xte=None, Yte=None, n_iter=None)
Overall BARN fitting method.
If Xva/Yva are not supplied but self.test_size requests held-out data, the training data is split, with a self.test_size fraction held out as validation. When validation data is available, it’s used for acceptance. Otherwise, the training data is reused for the probability calculation.
If Xte/Yte are not supplied, the test data calculation is skipped.
- static improvement(self, check_every=None, skip_first=0, tol=0)
Stop early if performance has not improved for check_every iters.
Allow wiggle room such that if we are within tol % of old best, continue
Skip the first skip_first iters without checking
- phi_viz(outname='phi.png', close=True)
Visualize the phi parameter, validation error over time
- static rfwsr(self, check_every=None, skip_first=0, t=2, eps=0.01)
Relative Fixed Width Stopping Rule
t*sig/sqrt(n) + 1/n <= eps * gbar
Skip the first skip_first iters without checking
- static stable_dist(self, check_every=None, skip_first=0, tol=None)
Stop early if the posterior distribution of neuron counts is stable
Tolerance is on the Wasserstein metric (aka Earth Mover’s Distance)
Skip the first skip_first iters without checking
Both barmpy.barn.BARN (for regression) and barmpy.barn.BARN_bin (for binary classification) inherit from barmpy.barn.BARN_base.
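The relative fixed-width stopping rule documented above, t*sig/sqrt(n) + 1/n <= eps * gbar, can be read as: stop once the uncertainty in the running mean is small relative to the mean itself. The sketch below evaluates that inequality on a list of monitored values. It is an illustration only: the function name rfwsr_met is hypothetical, and sig is taken as the plain sample standard deviation, which ignores the autocorrelation of an MCMC chain (a batch means estimate would be the more careful choice).

```python
import math
import statistics


def rfwsr_met(values, t=2, eps=0.01):
    """Check t*sig/sqrt(n) + 1/n <= eps * gbar for a sequence of values."""
    n = len(values)
    if n < 2:
        return False                      # not enough samples for a std dev
    gbar = statistics.fmean(values)       # running mean of the monitored value
    sig = statistics.stdev(values)        # assumption: naive sample std dev
    return t * sig / math.sqrt(n) + 1 / n <= eps * gbar


# a hypothetical chain that hovers tightly around 1.0
chain = [1.0 + 0.001 * (-1) ** k for k in range(400)]

print(rfwsr_met(chain[:50]))   # too few samples: the 1/n term alone exceeds eps * gbar
print(rfwsr_met(chain))        # enough samples for the rule to be satisfied
```

Note that the 1/n term puts a floor on the chain length: with eps=0.01 and a mean near 1, at least 100 samples are needed before the rule can pass at all.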
Example
Let’s walk through a minimal example training an ensemble with BARN. Start by generating some data (or reading in some of your own).
import numpy as np
X = np.random.random([100,2])
# make an ordinary linear relationship
Y = X[:,0] + 2*X[:,1] + np.random.random(100)/10
Now we’ll initialize a BARN setup with 3 NNs. We’ll use the default values for most other settings.
from barmpy.barn import BARN, NN
model = BARN(num_nets=3, dname='example', epochs=100)
Actually running the model is straightforward, but you can tweak the MCMC parameters to your liking. After the specified number of MCMC iterations, your model is ready for pointwise inference by using the last ensemble in the chain.
model.fit(X,Y)
Yhat = model.predict(X)
print((Y-Yhat)**2/np.std(Y)) # pointwise squared error, scaled by the standard deviation of Y
Custom Callback Example
BARMPy also supports custom model callbacks. Callbacks are a way to run a routine in between MCMC iterations. This is typically done to either log information or check for an early stopping condition. We provide several callbacks in the library itself, though you can supply your own function as well. We recommend barmpy.barn.BARN_base.improvement to check for early stopping with validation data (note such data will also be used for MCMC acceptance, but not NN training, if provided). The set of all provided callbacks is:
barmpy.barn.BARN_base.improvement - Check if validation error has stopped improving, indicating model has started to overfit and training should stop
barmpy.barn.BARN_base.rfwsr - Relative fixed-width stopping rule to check if MCMC estimate has converged
barmpy.barn.BARN_base.stable_dist - Check if distribution of neuron counts is stable, indicating stationary distribution reached
barmpy.barn.BARN_base.trans_enough - Check if enough transitions were accepted to justify additional MCMC iterations toward stationary posterior
To use a callback, we need to add it to a dictionary and pass that to the callbacks argument of barmpy.barn.BARN (or barmpy.barn.BARN_bin, as appropriate). The key to the dictionary should be the Python function or method itself, while the values are additional arguments provided to that function. Each iteration, we will call each callback function, passing the ensemble itself as the first argument (to enable access to its internals).
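The dictionary-based dispatch described above can be sketched without BARMPy at all. In the toy below, ToyEnsemble is a hypothetical stand-in for a BARN model and no_recent_improvement is a hypothetical custom callback. Treating a True return value as "stop early" is an assumption made for this sketch; check the library source for the convention BARN actually uses.

```python
class ToyEnsemble:
    """Stand-in for a BARN model; it only tracks a validation-error history."""

    def __init__(self, callbacks=None):
        self.callbacks = callbacks or {}
        self.val_errors = []

    def run(self, errors):
        """Pretend each entry of `errors` is the result of one MCMC iteration."""
        for err in errors:
            self.val_errors.append(err)
            for func, kwargs in self.callbacks.items():
                # the ensemble itself is passed as the first argument
                if func(self, **kwargs):
                    return len(self.val_errors)   # assumed: True means stop
        return len(self.val_errors)


def no_recent_improvement(model, patience=2):
    """True once the best error seen is at least `patience` iterations old."""
    errs = model.val_errors
    if len(errs) <= patience:
        return False
    best_index = errs.index(min(errs))
    return len(errs) - 1 - best_index >= patience


# key: the function itself; value: extra keyword arguments for it
toy = ToyEnsemble(callbacks={no_recent_improvement: {"patience": 2}})
iters_run = toy.run([5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 3.8])
print(iters_run)
```

Because the ensemble is handed to the callback, the callback can inspect any internal state it needs, which is why the first positional argument is reserved for it.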
Here is a small snippet showing how to use the stable_dist callback:
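from barmpy.barn import BARN
n_iter = 100  # maximum number of MCMC iterations (illustrative value)
# key is the callback itself; value holds its extra keyword arguments
callbacks = {BARN.stable_dist:
             {'check_every':1,
              'skip_first':4}}
model = BARN(num_nets=10,
             callbacks=callbacks,
             n_iter=n_iter)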
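For intuition about what a tolerance on the Wasserstein metric means for neuron counts, the sketch below computes the 1-D Wasserstein (earth mover's) distance between two samples of integer counts. How the library windows the chain and compares distributions is not described here, so comparing an "earlier" and a "later" sample is an assumption, and wasserstein_counts is a hypothetical helper.

```python
from collections import Counter


def wasserstein_counts(sample_a, sample_b):
    """Wasserstein-1 distance between two samples of integer neuron counts."""
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    freq_a, freq_b = Counter(sample_a), Counter(sample_b)
    cdf_a = cdf_b = 0.0
    dist = 0.0
    # for integer support, W1 is the summed gap between the two CDFs,
    # with each step from k to k + 1 having unit width
    for k in range(lo, hi):
        cdf_a += freq_a[k] / len(sample_a)
        cdf_b += freq_b[k] / len(sample_b)
        dist += abs(cdf_a - cdf_b)
    return dist


earlier = [1, 2, 2, 3]   # hypothetical neuron counts from earlier iterations
later = [2, 3, 3, 4]     # the same shape shifted up by one neuron

print(wasserstein_counts(earlier, later))
print(wasserstein_counts(earlier, earlier))
```

Shifting every count by one neuron gives a distance of one, so a tolerance can be read roughly as "the average network size moved by less than this many neurons".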
CV Tuning Example
BARN is implemented as an sklearn class, meaning we can use standard sklearn methods like GridSearchCV to tune the hyperparameters for the best possible result. This does take considerably more processing power to test the various parameter configurations, so be mindful when considering the number of possible hyperparameter values.
Much like BART, we apply cross-validated hyperparameter tuning to set the priors (i.e. the expected number of neurons in a network, l). But as with BART, we do not seek an exact match, only something that generally agrees with the data. Below is a short series of examples using various sklearn approaches.
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from barmpy.barn import BARN
db = datasets.load_diabetes()
scoring = 'neg_root_mean_squared_error'
# exhaustive grid search
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
random_state=0,
warm_start=True,
solver='lbfgs')
## declare parameters to exhaust over
parameters = {'l': (1,2,3)}
barncv = GridSearchCV(bmodel, parameters,
refit=True, verbose=4,
scoring=scoring)
barncv.fit(db.data, db.target)
print(barncv.best_params_)
# randomized search with distributions
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import poisson
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
random_state=0,
warm_start=True,
solver='lbfgs')
## declare parameters and distributions
parameters = {'l': poisson(mu=2)}
barncv = RandomizedSearchCV(bmodel, parameters,
refit=True, verbose=4,
scoring=scoring, n_iter=3)
barncv.fit(db.data, db.target)
print(barncv.best_params_)
In particular, note that we set scoring = 'neg_root_mean_squared_error', which is what we recommend for default regression problems. You can find more scoring options on the sklearn.model_selection page.
Also, when using a method like RandomizedSearchCV, be careful to supply appropriate distributions. Here, l takes discrete values, so we specify a discrete Poisson probability distribution to sample from. Note, however, that this distribution is not the distribution BARN uses for internal MCMC transitions. This distribution is only for CV sampling the prior parameters.
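To see which values of l a Poisson sampling distribution with mean 2 tends to propose, the probability mass function can be evaluated directly. This is a standalone sketch using only the standard library; poisson_pmf is a hypothetical helper, not a BARMPy or SciPy function.

```python
import math


def poisson_pmf(k, mu):
    """Probability that a Poisson(mu) variable equals k."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# candidate values of l and how often a Poisson(mu=2) draw proposes them
probs = {k: poisson_pmf(k, 2) for k in range(7)}
for k, p in probs.items():
    print(f"l = {k}: {p:.3f}")
```

Most of the mass sits on l = 1 to 3, but about 13.5% of draws are l = 0. Whether a zero prior mean is meaningful for BARN is not stated here, so it may be worth shifting the distribution (SciPy's discrete distributions accept a loc argument for this) or checking how the library handles that value.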
Visualization Example
Though BARN is implemented as an sklearn regression class and you can use it with any compatible visualization library, we also have some built-in methods. The first is model.viz. After training, this creates a plot of predicted vs target values, both for initial BARN (i.e. before training) and final results.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from barmpy.barn import BARN, NN
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
db = datasets.load_diabetes()
Xtr, Xte, Ytr, Yte = train_test_split(db.data, db.target, test_size=0.2, random_state=0)
# rescale inputs with PCA (and output normalized)
Xtr_o = np.copy(Xtr)
Xte_o = np.copy(Xte)
scale_x = PCA(n_components=Xtr.shape[1], whiten=False)
scale_x.fit(Xtr)
Xtr = scale_x.transform(Xtr_o)
Xte = scale_x.transform(Xte_o)
Ytr_o = np.copy(Ytr)
Yte_o = np.copy(Yte)
scale_y = StandardScaler() # no need to PCA
scale_y.fit(Ytr.reshape((-1,1)))
Ytr = scale_y.transform(Ytr_o.reshape((-1,1))).reshape(-1)
Yte = scale_y.transform(Yte_o.reshape((-1,1))).reshape(-1)
model = BARN(num_nets=10, dname='example',
l=1,
act='logistic',
epochs=100,
n_iter=100)
model.fit(Xtr, Ytr, Xte=Xte, Yte=Yte)
model.viz(outname='viz_test.png', initial=True)
We also have a tool to view the validation error progression over the MCMC iterations. This can be helpful to assess convergence. Reusing the above model, it looks like this particular BARN model isn’t converging well.
model.phi_viz(outname='', close=False)
Coming Soon
Tweaking MCMC parameters