BARN¶
The primary ensemble component available in BARMPy (currently the only one) is a single-hidden-layer neural network. When you collect several of these NNs, you have Bayesian Additive Regression Networks (BARN).
NN¶
Before building an ensemble, it’s helpful to understand the core component that goes into it. In this case, we use a neural network implemented in sklearn, with a few extra methods to ease its use in BARN. Eventually, this class will inherit from an abstract one for general BARM components.
- class barmpy.barn.NN(num_nodes=10, weight_donor=None, l=2, lr=0.01, r=42, epochs=20, x_in=None, batch_size=512, solver=None, tol=0.001, reg=0.01, act='logistic', binary=False)[source]¶
Neural Network with single hidden layer implemented with sklearn.
Includes methods to do MCMC transitions and calculations.
- accept_donation(donor_num_nodes, donor_weights, donor_intercepts)[source]¶
Replace our weights with those of another NN (passed as weights).
Donor can be different size; if smaller, earlier weights in donee are overwritten.
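For illustration, here is a minimal, hedged sketch of a donation. It is not taken from the library’s own examples; it assumes the donee’s underlying sklearn network is already set up and that donor weights follow sklearn’s list-of-arrays MLP layout.
import numpy as np
from barmpy.barn import NN

# A previously configured donor network (assume it was trained elsewhere).
donor = NN(num_nodes=5, x_in=2)

# Documented constructor hook: let a new NN take weights from the donor.
donee = NN(num_nodes=10, weight_donor=donor, x_in=2)

# Direct call with explicit arrays; shapes follow sklearn's MLP layout
# ([input->hidden, hidden->output]) and are purely illustrative.
donor_weights = [np.random.random((2, 5)), np.random.random((5, 1))]
donor_intercepts = [np.random.random(5), np.random.random(1)]
donee.accept_donation(5, donor_weights, donor_intercepts)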
BARN Class¶
Equipped with a core NN class above, we can train the entire ensemble following our Bayesian procedure.
- class barmpy.barn.BARN_base(act='logistic', batch_size=512, burn=None, callbacks={}, dname='default_name', epochs=10, init_neurons=None, l=2, lr=0.01, n_features_in_=None, n_iter=200, ndraw=None, normalize=True, nu=3, num_nets=10, qq=0.9, random_state=42, reg=0.01, silence_warnings=True, solver=None, test_size=0.25, tol=0.001, trans_options=['grow', 'shrink'], trans_probs=[0.4, 0.6], use_tf=False, verbose=False, warm_start=True)[source]¶
Bayesian Additive Regression Networks ensemble.
Specify and train an array of neural nets with Bayesian posterior.
Argument Descriptions:
act - Activation function, ‘logistic’ or ‘relu’, passed to NN
batch_size - batch size for gradient descent solvers, passed to NN
burn - number of MCMC iterations to initially discard for burn-in
callbacks - dictionary of callbacks, keyed by function with values being arguments passed to the callback
dname - dataset name if saving output
epochs - number of neural network training epochs for gradient descent solver
init_neurons - number of neurons to initialize each network in the ensemble to
l - expected number of neurons per network in the ensemble (mean of the Poisson prior)
lr - learning rate for solvers
n_features_in_ - number of features expected in input data, can be set during setup_nets instead
n_iter - maximum number of MCMC iterations to perform. One iter is a pass through each network in the ensemble
ndraw - number of MCMC draws over which to average models
normalize - flag to normalize input and output before training and prediction
nu - chi-squared parameter for prior on model error, recommend leaving default
num_nets - number of networks in the ensemble
qq - quantile for error prior to compute lambda, recommend leaving default
random_state - random seed for reproducible results
reg - L1L2 regularization weight penalty on NN weights, passed to NN
silence_warnings - flag to turn off convergence warnings from SciPy which may not be helpful
solver - solver preference (‘lbfgs’ or ‘adam’), use None to automatically select based on problem size, passed to NN
test_size - fraction of data to use as held-back testing data if not supplying separate splits to BARN.fit
tol - tolerance used in solver, passed to NN
trans_options - possible state transition options as a list of strings
trans_probs - probability of each state transition option, currently fixed for each option
use_tf - flag to use TensorFlow backend for NN, recommend leaving False unless networks are large
warm_start - flag to reuse a previously trained ensemble for a given instance (True) or create a fresh ensemble on each call to BARN.fit (False); irrelevant if BARN.fit is only called once for an instance
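To make these arguments concrete, here is an illustrative construction of an ensemble; the specific values are arbitrary, not recommendations.
from barmpy.barn import BARN

model = BARN(
    num_nets=10,       # networks in the ensemble
    l=2,               # Poisson prior mean on neurons per network
    act='relu',        # hidden-layer activation passed to each NN
    epochs=50,         # gradient-descent epochs per NN fit
    n_iter=200,        # MCMC iterations over the whole ensemble
    random_state=0,    # reproducible results
)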
- batch_means(num_batch=20, batch_size=None, np_out='val_resid.npy', outfile='var_all.csv', mode='a', burn=None, num=None)[source]¶
Compute batch means variance over computed results.
- compute_res(X, Y, i, S=None)[source]¶
Compute the residual for the current iteration, returning total prediction as well as target without contribution from model i.
Optionally use an existing S from previous iteration
- fit(X, Y, Xva=None, Yva=None, Xte=None, Yte=None, n_iter=None)[source]¶
Overall BARN fitting method.
If Xva/Yva are not supplied but validation data is requested via self.test_size, the training data is split, reserving a self.test_size fraction as validation. When validation data is available, it is used for MCMC acceptance; otherwise, the training data is reused for the probability calculation.
If Xte/Yte are not supplied, however, the test data calculation is skipped.
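For example, a short sketch of supplying an explicit validation split might look like the following; the data here is synthetic and only illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from barmpy.barn import BARN

X = np.random.random((200, 3))
Y = X[:,0] - 2*X[:,1] + 0.5*X[:,2]

# Hold out validation data explicitly; per the description above, Xva/Yva
# are used for MCMC acceptance while the individual NNs train on Xtr/Ytr.
Xtr, Xva, Ytr, Yva = train_test_split(X, Y, test_size=0.25, random_state=0)
model = BARN(num_nets=3, n_iter=50)
model.fit(Xtr, Ytr, Xva=Xva, Yva=Yva)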
- static improvement(self, check_every=None, skip_first=0, tol=0)[source]¶
Stop early if performance has not improved for check_every iters.
Allow wiggle room: if we are within tol% of the old best, continue.
Skip the first skip_first iters without checking.
- multidraw()[source]¶
Save multiple draws of the model for averaged prediction.
Skip the first self.burn iters; otherwise, keep every self.save_every-th model.
- phi_viz(outname='phi.png', close=True)[source]¶
Visualize the phi parameter (validation error) over time.
- predict_interval(X, alpha=0.05)[source]¶
Predict upper and lower confidence interval bounds at the alpha level. Aggregates multiple MCMC draws using the point and sigma estimates of each.
- static rfwsr(self, check_every=None, skip_first=0, t=2, eps=0.01)[source]¶
Relative Fixed Width Stopping Rule
t*sig/sqrt(n) + 1/n <= eps * gbar
Skip the first skip_first iters without checking
- set_fit_request(*, Xte: bool | None | str = '$UNCHANGED$', Xva: bool | None | str = '$UNCHANGED$', Yte: bool | None | str = '$UNCHANGED$', Yva: bool | None | str = '$UNCHANGED$', n_iter: bool | None | str = '$UNCHANGED$') → BARN_base¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: this method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
Xte (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the Xte parameter in fit.
Xva (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the Xva parameter in fit.
Yte (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the Yte parameter in fit.
Yva (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the Yva parameter in fit.
n_iter (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the n_iter parameter in fit.
- Returns:
self – The updated object.
- Return type:
BARN_base
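As a hedged illustration of when this matters, the sketch below enables metadata routing globally and requests that any enclosing meta-estimator forward the validation arrays to fit; it requires scikit-learn 1.3 or newer, and the particular settings are only indicative.
import sklearn
from barmpy.barn import BARN

# Metadata routing is opt-in (sklearn >= 1.3).
sklearn.set_config(enable_metadata_routing=True)

# Ask an enclosing meta-estimator (e.g. a Pipeline or CV search) to forward
# Xva/Yva through to BARN.fit whenever the caller provides them.
model = BARN(num_nets=3).set_fit_request(Xva=True, Yva=True)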
- static stable_dist(self, check_every=None, skip_first=0, tol=None)[source]¶
Stop early if the posterior distribution of neuron counts is stable.
Tolerance is on the Wasserstein metric (aka Earth Mover's Distance).
Skip the first skip_first iters without checking.
Both barmpy.barn.BARN (for regression) and barmpy.barn.BARN_bin (for binary classification) inherit from barmpy.barn.BARN_base.
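For a binary target, a hedged sketch of the classification variant is shown below; it assumes BARN_bin accepts the same constructor and fit arguments as BARN, since both inherit from BARN_base.
import numpy as np
from barmpy.barn import BARN_bin

# Toy binary classification data.
X = np.random.random((200, 2))
Y = (X[:,0] + X[:,1] > 1).astype(int)

model = BARN_bin(num_nets=3, n_iter=50, dname='example_bin')
model.fit(X, Y)
Yhat = model.predict(X)  # predictions for the binary target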
Example¶
Let’s walk through a minimal example training an ensemble with BARN. Start by generating some data (or reading in some of your own).
import numpy as np
X = np.random.random([100,2])
# make an ordinary linear relationship
Y = X[:,0] + 2*X[:,1] + np.random.random(100)/10
Now we’ll initialize a BARN setup with 3 NNs, using the defaults for most other parameters.
from barmpy.barn import BARN, NN
model = BARN(num_nets=3, dname='example', epochs=100)
Actually running the model is straightforward, but you can tweak the MCMC parameters to your liking. After the specified number of MCMC iterations, your model is ready for pointwise inference by averaging over multiple draws of the MCMC (default 10% post burn-in).
model.fit(X,Y)
Yhat = model.predict(X)
print((Y-Yhat)**2/np.std(Y)) # relative error
If you like, you can use the multiple MCMC draws produced during the modeling to estimate a confidence interval around this pointwise prediction. This uses the pointwise prediction and sigma estimate from each one of the draws to produce a 1-alpha bound centered at the pointwise prediction.
lower, upper = model.predict_interval(X, alpha=0.05)
Custom Callback Example¶
BARMPy also supports custom model callbacks. Callbacks are a way to run a routine in between MCMC iterations. This is typically done to either log information or check for an early stopping condition. We provide several callbacks in the library itself, though you can supply your own function as well. We recommend barmpy.barn.BARN_base.improvement to check for early stopping with validation data (note such data will also be used for MCMC acceptance, but not NN training, if provided). The provided callbacks are:
barmpy.barn.BARN_base.improvement - Check if validation error has stopped improving, indicating the model has started to overfit and training should stop
barmpy.barn.BARN_base.rfwsr - Relative fixed-width stopping rule to check if MCMC estimate has converged
barmpy.barn.BARN_base.stable_dist - Check if distribution of neuron counts is stable, indicating stationary distribution reached
barmpy.barn.BARN_base.trans_enough - Check if enough transitions were accepted to justify additional MCMC iterations toward stationary posterior
To use a callback, we need to add it to a dictionary and pass that to the callbacks argument of barmpy.barn.BARN (or barmpy.barn.BARN_bin, as appropriate). The key to the dictionary should be the Python function or method itself, while the values are additional arguments provided to that function. Each iteration, we will call each callback function, passing the ensemble itself as the first argument (to enable access to its internals).
Here is a small snippet showing how to use the stable_dist callback:
from barmpy.barn import BARN

n_iter = 100  # total MCMC iterations
callbacks = {BARN.stable_dist:
                {'check_every': 1,
                 'skip_first': 4}}
model = BARN(num_nets=10,
             callbacks=callbacks,
             n_iter=n_iter)
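As noted above, you can also supply your own function. The sketch below defines a hypothetical logging callback; it relies only on the documented behavior that the ensemble is passed as the first argument and the dictionary values are passed as extra keyword arguments, and it does not show how a custom callback would signal early stopping.
from barmpy.barn import BARN

def log_progress(model, prefix='BARN'):
    # Hypothetical user callback: `model` is the ensemble itself, passed
    # automatically each MCMC iteration; `prefix` comes from the dictionary.
    print(f'{prefix}: finished an MCMC iteration')

callbacks = {log_progress: {'prefix': 'demo'}}
model = BARN(num_nets=3, callbacks=callbacks, n_iter=20)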
CV Tuning Example¶
BARN is implemented as an sklearn class, meaning we can use standard sklearn methods like GridSearchCV to tune the hyperparameters for the best possible result. This does take considerably more processing power to test the various parameter configurations, so be mindful when considering the number of possible hyperparameter values.
Much like BART, we apply cross-validated hyperparameter tuning to set the priors (i.e. the expected number of neurons in a network, l). But as with BART, we do not seek an exact match, only something that generally agrees with the data. Below is a short series of examples using various sklearn approaches.
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from barmpy.barn import BARN
db = datasets.load_diabetes()
scoring = 'neg_root_mean_squared_error'
# exhaustive grid search
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
random_state=0,
warm_start=True,
solver='lbfgs')
## declare parameters to exhaust over
parameters = {'l': (1,2,3)}
barncv = GridSearchCV(bmodel, parameters,
refit=True, verbose=4,
scoring=scoring)
barncv.fit(db.data, db.target)
print(barncv.best_params_)
# randomized search with distributions
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import poisson
## first make prototype with fixed parameters
bmodel = BARN(num_nets=10,
random_state=0,
warm_start=True,
solver='lbfgs')
## declare parameters and distributions
parameters = {'l': poisson(mu=2)}
barncv = RandomizedSearchCV(bmodel, parameters,
refit=True, verbose=4,
scoring=scoring, n_iter=3)
barncv.fit(db.data, db.target)
print(barncv.best_params_)
In particular, note the need to set scoring='neg_root_mean_squared_error', which is what we recommend for typical regression problems. You can find more scoring options on the sklearn.model_selection page.
Also, when using a method like RandomizedSearchCV, be careful to supply appropriate distributions. Here, l takes discrete values, so we specify a discrete Poisson probability distribution to sample from. Note, however, that this distribution is not the distribution BARN uses for internal MCMC transitions. This distribution is only for CV sampling the prior parameters.
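If you want to tune several discrete hyperparameters at once, a hedged sketch might look like the following; the randint bounds are illustrative, not recommendations.
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
from barmpy.barn import BARN

# Discrete distributions for discrete hyperparameters.
parameters = {'l': randint(1, 5),          # prior mean neurons per network
              'num_nets': randint(5, 20)}  # ensemble size
barncv = RandomizedSearchCV(BARN(random_state=0, solver='lbfgs'),
                            parameters, n_iter=5, refit=True,
                            scoring='neg_root_mean_squared_error')
barncv.fit(db.data, db.target)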
Visualization Example¶
Though BARN is implemented as an sklearn regression class and you can use it with any compatible visualization library, we also have some built-in methods. The first is model.viz. After training, this creates a plot of predicted vs. target values, both for the initial BARN (i.e. before training) and the final results.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from barmpy.barn import BARN, NN
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
db = datasets.load_diabetes()
Xtr, Xte, Ytr, Yte = train_test_split(db.data, db.target, test_size=0.2, random_state=0)
# rescale inputs with PCA (and output normalized)
Xtr_o = np.copy(Xtr)
Xte_o = np.copy(Xte)
scale_x = PCA(n_components=Xtr.shape[1], whiten=False)
scale_x.fit(Xtr)
Xtr = scale_x.transform(Xtr_o)
Xte = scale_x.transform(Xte_o)
Ytr_o = np.copy(Ytr)
Yte_o = np.copy(Yte)
scale_y = StandardScaler() # no need to PCA
scale_y.fit(Ytr.reshape((-1,1)))
Ytr = scale_y.transform(Ytr_o.reshape((-1,1))).reshape(-1)
Yte = scale_y.transform(Yte_o.reshape((-1,1))).reshape(-1)
model = BARN(num_nets=10, dname='example',
l=1,
act='logistic',
epochs=100,
n_iter=100)
model.fit(Xtr, Ytr, Xte=Xte, Yte=Yte)
model.viz(outname='viz_test.png', initial=True)
We also have a tool, phi_viz, to view the validation error progression over the MCMC iterations. This can be helpful for assessing convergence. Reusing the above model, it looks like this particular BARN model isn’t converging well.
model.phi_viz(outname='', close=False)
Coming Soon¶
Tweaking MCMC parameters