Machine Learning: Blending Ensemble in Python

Blending is an ensemble machine learning algorithm. It is a colloquial name for stacked generalization, or a stacking ensemble, where instead of fitting the meta-model on out-of-fold predictions made by the base models, it is fit on predictions made on an independent holdout dataset.

Blending was used to describe stacking models that combined many hundreds of models in the $1,000,000 Netflix Prize machine learning competition, and as such, blending remains a popular technique and name for stacking in competitive machine learning circles such as Kaggle. In this tutorial, you will discover how to develop and evaluate a blending ensemble in Python. After completing this tutorial, you will know:

Blending ensembles are a type of stacking where the meta-model is fit using predictions on a holdout validation dataset instead of out-of-fold predictions made during k-fold cross-validation.
How to develop a blending ensemble, including functions for training the model and making predictions on new data.
How to evaluate blending ensembles for classification and regression predictive modeling problems.

Tutorial overview

This tutorial is divided into four parts; they are:

Blending ensemble.
Developing a blending ensemble.
Blending ensemble for classification.
Blending ensemble for regression.

Blending ensemble

Blending is an ensemble machine learning technique that uses a machine learning model to learn how to best combine predictions from multiple ensemble member models.

As such, blending is the same as stacked generalization, known as stacking. The terms blending and stacking are often used interchangeably in the same paper or model description.

Many machine learning practitioners have had success using stacking and related techniques to boost prediction accuracy beyond the level obtained by any of the individual models. In some contexts, stacking is also referred to as blending, and we will use the terms interchangeably here.

Feature-Weighted Linear Stacking, 2009.

A stacking model architecture involves two or more base models, often referred to as level-0 models, and a meta-model that combines the predictions of the base models, referred to as a level-1 model. The meta-model is trained on the predictions made by the base models on out-of-sample data.

Level-0 models (base models): models fit on the training data whose predictions are compiled.
Level-1 model (meta-model): a model that learns how to best combine the predictions of the base models.

However, blending has specific connotations for how to construct a stacking ensemble model. Blending may suggest developing a stacking ensemble where the base models are machine learning models of any type, and the meta-model is a linear model that "blends" the predictions of the base models. For example, a linear regression model when predicting a numerical value, or a logistic regression model when predicting a class label, would calculate a weighted sum of the predictions made by the base models and would be considered a blending of predictions.

Blending ensemble: use a linear model, such as linear regression or logistic regression, as the meta-model in a stacking ensemble.
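Written out, the "blend" that a linear meta-model computes is a learned weighted sum of the base-model predictions. The notation below is ours rather than the source's: K is the number of base models, \hat{y}_k is the prediction of base model k for one example, and w_0, ..., w_K are the intercept and weights the meta-model learns. With linear regression as the meta-model:

$$\hat{y}_{\text{blend}} = w_0 + \sum_{k=1}^{K} w_k \, \hat{y}_k$$

With logistic regression as the meta-model for a binary class label, the same weighted sum is passed through the logistic (sigmoid) function \sigma:

$$P(y = 1 \mid \hat{y}_1, \dots, \hat{y}_K) = \sigma\left(w_0 + \sum_{k=1}^{K} w_k \, \hat{y}_k\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$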

Blending was the term commonly used for stacking ensembles during the Netflix Prize in 2009. The competition involved teams seeking predictive models that performed better than Netflix's own algorithm, and a $1,000,000 prize was awarded to the team that achieved a 10 percent improvement in performance.

Our final solution (RMSE = 0.8643²) is a linear blend of over 100 results. […] Throughout the description of the methods, we highlight the specific predictors that participated in the final blended solution.

The BellKor 2008 Solution to the Netflix Prize, 2008.

As such, blending is a colloquial term for ensemble learning with a stacking-type model architecture. It is rarely, if ever, used in textbooks or academic papers, other than those related to competitive machine learning. Most commonly, blending is used to describe the specific application of stacking where the meta-model is trained on the predictions made by the base models on a holdout validation dataset. In this context, stacking is reserved for a meta-model trained on out-of-fold predictions made during a cross-validation procedure.

Blending: a stacking-type ensemble where the meta-model is trained on predictions made on a holdout dataset.
Stacking: a stacking-type ensemble where the meta-model is trained on out-of-fold predictions made during k-fold cross-validation.
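The practical difference between the two definitions is which rows the meta-model's training data comes from. The sketch below builds both kinds of meta-features on a small synthetic dataset. It assumes scikit-learn's cross_val_predict() for the stacking side, and the two base models are arbitrary choices for illustration. Blending yields one row per holdout example, while stacking yields an out-of-fold prediction for every training example.

```python
from numpy import column_stack
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=1)]

# Blending: fit each base model on one split, predict on a separate holdout split.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33, random_state=1)
blend_meta_X = column_stack([m.fit(X_train, y_train).predict(X_val) for m in models])

# Stacking: out-of-fold predictions for every row via 5-fold cross-validation
# (cross_val_predict clones each model, so earlier fitting does not leak in).
stack_meta_X = column_stack([cross_val_predict(m, X, y, cv=5) for m in models])

print('blending meta-features:', blend_meta_X.shape)
print('stacking meta-features:', stack_meta_X.shape)
```

The trade-off this illustrates: blending is simpler and fits each base model once, but the meta-model sees fewer rows; stacking uses all rows at the cost of fitting every base model k times.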

This distinction is common in the Kaggle machine learning competition community.

Blend is a word introduced by the Netflix winners. It is very close to stacked generalization, but a bit simpler, with less risk of an information leak. […] With blending, instead of creating out-of-fold predictions for the training set, you create a small holdout set of, say, 10% of the training set. The stacker model then trains on this holdout set only.

Kaggle Ensemble Guide, MLWave, 2015.

We will use this latter definition of blending. Next, let's look at how we can implement it.

Developing a blending ensemble

The scikit-learn library does not natively support blending at the time of writing, but we can implement it ourselves using scikit-learn models. First, we need to create a number of base models. These can be any models we like for a regression or classification problem. We can define a get_models() function that returns a list of models, where each model is defined as a tuple with a name and the configured classifier or regressor object. For example, for a classification problem we might use logistic regression, kNN, a decision tree, SVM, and naive Bayes.

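from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression()))
    models.append(('knn', KNeighborsClassifier()))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('svm', SVC(probability=True)))
    models.append(('bayes', GaussianNB()))
    return models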

Next, we need to fit the blending model. Recall that the base models are fit on the training dataset, and the meta-model is fit on the predictions made by each base model on a holdout dataset.

First, we can enumerate the list of models and fit each in turn on the training dataset. Also in this loop, we can use the fit model to make predictions on the holdout (validation) dataset and store the predictions for later.

...
# fit all models on the training set and predict on hold out set
meta_X = list()
for name, model in models:
    # fit in training set
    model.fit(X_train, y_train)
    # predict on hold out set
    yhat = model.predict(X_val)
    # reshape predictions into a matrix with one column
    yhat = yhat.reshape(len(yhat), 1)
    # store predictions as input for blending
    meta_X.append(yhat)

We now have meta_X, which represents the input data that can be used to train the meta-model. Each column or feature represents the output of one base model, and each row represents one sample from the holdout dataset. We can use the hstack() function to ensure that this dataset is a 2D NumPy array, as expected by a machine learning model.

...
# create 2d array from predictions, each set is an input feature
meta_X = hstack(meta_X)
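The reshape in the loop matters because hstack() joins 1D arrays end to end rather than side by side. The small sketch below uses made-up predictions from three hypothetical base models on four rows to show the difference, and that NumPy's column_stack() performs the reshape and the join in one step.

```python
from numpy import array, column_stack, hstack

# Made-up predictions from three hypothetical base models on four holdout rows.
preds = [array([0, 1, 1, 0]), array([0, 1, 0, 0]), array([1, 1, 1, 0])]

# Reshape each 1D vector into a column, then join the columns side by side.
meta_X = hstack([p.reshape(len(p), 1) for p in preds])
print(meta_X.shape)  # (4, 3): one row per sample, one column per model

# Without the reshape, hstack() concatenates the vectors into one long 1D array.
print(hstack(preds).shape)  # (12,)

# column_stack() treats 1D inputs as columns, giving the same 2D result.
print((meta_X == column_stack(preds)).all())  # True
```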

Next, we can train our meta-model. This can be any machine learning model we like, such as logistic regression for a classification problem.

...
# define blending model
blender = LogisticRegression()
# fit on predictions from base models
blender.fit(meta_X, y_val)

We can tie all of this together into a function named fit_ensemble() that trains the blending model using a training dataset and a holdout validation dataset.

# fit the blending ensemble
def fit_ensemble(models, X_train, X_val, y_train, y_val):
    # fit all models on the training set and predict on hold out set
    meta_X = list()
    for name, model in models:
        # fit in training set
        model.fit(X_train, y_train)
        # predict on hold out set
        yhat = model.predict(X_val)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store predictions as input for blending
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # define blending model
    blender = LogisticRegression()
    # fit on predictions from base models
    blender.fit(meta_X, y_val)
    return blender

The next step is to use the blending ensemble to make predictions on new data. This is a two-step process. The first step is to use each base model to make a prediction. These predictions are then gathered together and used as input to the blending model to make the final prediction.

We can use the same loop as we did when training the model: collect the predictions of each base model into a new dataset, stack the predictions together, and call predict() on the blender model with this meta-level dataset. The predict_ensemble() function below implements this. Given the list of fit base models, the fit blender, and a dataset (such as a test dataset or new data), it returns a set of predictions for that dataset.

# make a prediction with the blending ensemble
def predict_ensemble(models, blender, X_test):
    # make predictions with base models
    meta_X = list()
    for name, model in models:
        # predict with base model
        yhat = model.predict(X_test)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store prediction
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # predict
    return blender.predict(meta_X)

We now have all of the elements required to implement a blending ensemble for classification or regression predictive modeling problems.
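If you prefer a fit/predict object over two free functions, the same logic can be wrapped in a small class. The sketch below is one way to do it under stated assumptions: SimpleBlender is a hypothetical name (it is not part of scikit-learn), it performs the train/validation split itself, and it uses crisp label predictions as meta-features, like the hard-voting version above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


class SimpleBlender:
    """Hypothetical wrapper around the fit_ensemble()/predict_ensemble() logic."""

    def __init__(self, base_models, meta_model, val_size=0.33, random_state=1):
        self.base_models = base_models  # list of (name, estimator) tuples
        self.meta_model = meta_model
        self.val_size = val_size
        self.random_state = random_state

    def _meta_features(self, X):
        # One column per fitted base model: its class-label prediction for each row.
        return np.column_stack([model.predict(X) for _, model in self.fitted_base_])

    def fit(self, X, y):
        # Hold out part of the data for training the meta-model.
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=self.val_size, random_state=self.random_state)
        # Fit fresh copies of the base models on the training split only.
        self.fitted_base_ = [(name, clone(model).fit(X_train, y_train))
                             for name, model in self.base_models]
        # Fit the meta-model on the base models' holdout predictions.
        self.blender_ = clone(self.meta_model).fit(self._meta_features(X_val), y_val)
        return self

    def predict(self, X):
        return self.blender_.predict(self._meta_features(X))


# Usage on a small synthetic problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=7)
X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
base_models = [('lr', LogisticRegression()), ('knn', KNeighborsClassifier()), ('bayes', GaussianNB())]
ensemble = SimpleBlender(base_models, LogisticRegression()).fit(X_fit, y_fit)
yhat = ensemble.predict(X_test)
print('Accuracy: %.3f' % (yhat == y_test).mean())
```

Using clone() leaves the estimators passed in unfitted, which differs from the tutorial's functions, which fit the models in place.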

Blending ensemble for classification

In this section, we will look at using blending for a classification task. First, we can use the make_classification() function to create a synthetic binary classification problem with 10,000 examples and 20 input features. The complete example is listed below.

# test classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# summarize the dataset
print(X.shape, y.shape)

Running the example creates a dataset and summarizes the inputs and outputs.

(10000, 20) (10000,)

Next, we need to split the dataset into training and test sets, and then split the training set into a subset used to train the base models and a subset used to train the meta-model. In this case, we will use a 50-50 split for the training and test sets, then a 67-33 split for the training and validation sets.

...
# split dataset into train and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# split training set into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.33, random_state=1)
# summarize data split
print('Train: %s, Val: %s, Test: %s' % (X_train.shape, X_val.shape, X_test.shape))
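As a quick check of the split arithmetic, the sketch below applies the same two train_test_split() calls to a stand-in array of 10,000 row indices (used only so the sizes are easy to read). Half the rows go to the test set, and 33% of the remaining 5,000 (1,650 rows, or 16.5% of the whole dataset) go to the validation set, leaving 3,350 rows for the base models.

```python
from numpy import arange
from sklearn.model_selection import train_test_split

# Stand-in for the 10,000 rows of the dataset: just their indices.
idx = arange(10000)
train_full, test = train_test_split(idx, test_size=0.5, random_state=1)
train, val = train_test_split(train_full, test_size=0.33, random_state=1)
print(len(train), len(val), len(test))

# The three subsets are disjoint, so no row is used twice.
print(set(train).isdisjoint(val), set(train_full).isdisjoint(test))
```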

We can then use the get_models() function from the previous section to create the classification models used in the ensemble. The fit_ensemble() function can then be called to fit the blending ensemble on these datasets, and the predict_ensemble() function can be used to make predictions on the holdout test dataset.

...
# create the base models
models = get_models()
# train the blending ensemble
blender = fit_ensemble(models, X_train, X_val, y_train, y_val)
# make predictions on test set
yhat = predict_ensemble(models, blender, X_test)

Finally, we can evaluate the performance of the blend model by reporting the classification accuracy on the test dataset.

...
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Blending Accuracy: %.3f' % (score*100))

The complete example of evaluating a blending ensemble on the synthetic binary classification problem is listed below.

# blending ensemble for classification using hard voting
from numpy import hstack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    return X, y

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression()))
    models.append(('knn', KNeighborsClassifier()))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('svm', SVC()))
    models.append(('bayes', GaussianNB()))
    return models

# fit the blending ensemble
def fit_ensemble(models, X_train, X_val, y_train, y_val):
    # fit all models on the training set and predict on hold out set
    meta_X = list()
    for name, model in models:
        # fit in training set
        model.fit(X_train, y_train)
        # predict on hold out set
        yhat = model.predict(X_val)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store predictions as input for blending
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # define blending model
    blender = LogisticRegression()
    # fit on predictions from base models
    blender.fit(meta_X, y_val)
    return blender

# make a prediction with the blending ensemble
def predict_ensemble(models, blender, X_test):
    # make predictions with base models
    meta_X = list()
    for name, model in models:
        # predict with base model
        yhat = model.predict(X_test)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store prediction
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # predict
    return blender.predict(meta_X)

# define dataset
X, y = get_dataset()
# split dataset into train and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# split training set into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.33, random_state=1)
# summarize data split
print('Train: %s, Val: %s, Test: %s' % (X_train.shape, X_val.shape, X_test.shape))
# create the base models
models = get_models()
# train the blending ensemble
blender = fit_ensemble(models, X_train, X_val, y_train, y_val)
# make predictions on test set
yhat = predict_ensemble(models, blender, X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Blending Accuracy: %.3f' % (score*100))

Running the example first reports the shape of the train, validation, and test datasets, then the accuracy of the ensemble on the test dataset.

Note: Your results may vary due to the stochastic nature of the algorithm or estimation procedure, or differences in numerical precision. Consider running the example several times and comparing the average result.

In this case, we can see that the blending ensemble achieved a classification accuracy of about 97.900%.

Train: (3350, 20), Val: (1650, 20), Test: (5000, 20)
Blending Accuracy: 97.900

In the previous example, crisp class label predictions were combined using the blending model. This is a type of hard voting. An alternative is to have each model predict class probabilities and use the meta-model to blend the probabilities. This is a type of soft voting and can sometimes result in better performance. First, we must configure the models to return probabilities. Of the base models used here, only the SVM needs a change: SVC must be created with probability=True.

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression()))
    models.append(('knn', KNeighborsClassifier()))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('svm', SVC(probability=True)))
    models.append(('bayes', GaussianNB()))
    return models

Next, we must change the base models to predict probabilities instead of crisp class labels. This can be achieved by calling predict_proba() instead of predict() inside fit_ensemble() after fitting the base models.

...
# fit all models on the training set and predict on hold out set
meta_X = list()
for name, model in models:
    # fit in training set
    model.fit(X_train, y_train)
    # predict on hold out set
    yhat = model.predict_proba(X_val)
    # store predictions as input for blending
    meta_X.append(yhat)

This means that the meta-dataset used to train the meta-model will have n columns per classifier, where n is the number of classes in the prediction problem; in our case, there are two classes. Because predict_proba() already returns a 2D array, the reshape step is no longer needed. We also need to change the predictions made by the base models when using the blending model to make predictions on new data.

...
# make predictions with base models
meta_X = list()
for name, model in models:
    # predict with base model
    yhat = model.predict_proba(X_test)
    # store prediction
    meta_X.append(yhat)
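To see the meta-dataset shape described above, the sketch below uses two of the base models on a smaller synthetic dataset (a reduced setup chosen only for speed). It also shows one optional variation that the tutorial does not use: because each model's two probability columns sum to 1 in a binary problem, keeping only the positive-class column gives the meta-model the same information with half the features.

```python
from numpy import allclose, hstack
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33, random_state=1)

# predict_proba() returns shape (n_rows, n_classes), so no reshape is needed.
probas = [m.fit(X_train, y_train).predict_proba(X_val) for m in (LogisticRegression(), GaussianNB())]
meta_X = hstack(probas)
print(meta_X.shape)  # (number of validation rows, 2 models * 2 classes)

# Each model's two columns sum to 1 for every row.
print(all(allclose(p.sum(axis=1), 1.0) for p in probas))

# Optional variation: keep only the positive-class column per model.
meta_X_compact = hstack([p[:, 1:] for p in probas])
print(meta_X_compact.shape)
```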

The complete example of using blending on predicted class probabilities for the synthetic binary classification problem is listed below.


# blending ensemble for classification using soft voting
from numpy import hstack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    return X, y

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression()))
    models.append(('knn', KNeighborsClassifier()))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('svm', SVC(probability=True)))
    models.append(('bayes', GaussianNB()))
    return models

# fit the blending ensemble
def fit_ensemble(models, X_train, X_val, y_train, y_val):
    # fit all models on the training set and predict on hold out set
    meta_X = list()
    for name, model in models:
        # fit in training set
        model.fit(X_train, y_train)
        # predict on hold out set
        yhat = model.predict_proba(X_val)
        # store predictions as input for blending
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # define blending model
    blender = LogisticRegression()
    # fit on predictions from base models
    blender.fit(meta_X, y_val)
    return blender

# make a prediction with the blending ensemble
def predict_ensemble(models, blender, X_test):
    # make predictions with base models
    meta_X = list()
    for name, model in models:
        # predict with base model
        yhat = model.predict_proba(X_test)
        # store prediction
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # predict
    return blender.predict(meta_X)

# define dataset
X, y = get_dataset()
# split dataset into train and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# split training set into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.33, random_state=1)
# summarize data split
print('Train: %s, Val: %s, Test: %s' % (X_train.shape, X_val.shape, X_test.shape))
# create the base models
models = get_models()
# train the blending ensemble
blender = fit_ensemble(models, X_train, X_val, y_train, y_val)
# make predictions on test set
yhat = predict_ensemble(models, blender, X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Blending Accuracy: %.3f' % (score*100))

Running the example first reports the shape of the train, validation, and test datasets, then the accuracy of the ensemble on the test dataset.

Note: Your results may vary due to the stochastic nature of the algorithm or estimation procedure, or differences in numerical precision. Try the example several times and compare the average result.

In this case, we can see that blending the class probabilities increased the classification accuracy to about 98.240%.

Train: (3350, 20), Val: (1650, 20), Test: (5000, 20)
Blending Accuracy: 98.240

A blending ensemble is only effective if it is able to outperform any single contributing model. We can confirm this by evaluating each of the base models in isolation. Each base model can be fit on the entire training dataset (unlike in the blending ensemble) and evaluated on the test dataset (just like in the blending ensemble). The example below demonstrates this, evaluating each base model in isolation.

# evaluate base models on the entire training dataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    return X, y

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression()))
    models.append(('knn', KNeighborsClassifier()))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('svm', SVC(probability=True)))
    models.append(('bayes', GaussianNB()))
    return models

# define dataset
X, y = get_dataset()
# split dataset into train and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# summarize data split
print('Train: %s, Test: %s' % (X_train_full.shape, X_test.shape))
# create the base models
models = get_models()
# evaluate standalone model
for name, model in models:
    # fit the model on the training dataset
    model.fit(X_train_full, y_train_full)
    # make a prediction on the test dataset
    yhat = model.predict(X_test)
    # evaluate the predictions
    score = accuracy_score(y_test, yhat)
    # report the score
    print('>%s Accuracy: %.3f' % (name, score*100))

Running the example first reports the shape of the full training and test datasets, then the accuracy of each base model on the test dataset.

Note: Your results may differ due to the stochastic nature of the algorithm, or the estimation procedure, or differences in numerical precision. Try the example several times and compare the average result.

In this case, we can see that all models perform worse than the blending ensemble. Interestingly, the SVM comes very close, achieving an accuracy of 98.200% compared to the 98.240% achieved by the blending ensemble.

Train: (5000, 20), Test: (5000, 20)
>lr Accuracy: 87.800
>knn Accuracy: 97.380
>cart Accuracy: 88.200
>svm Accuracy: 98.200
>bayes Accuracy: 87.300

We may choose to use a blending ensemble as our final model. This involves fitting the ensemble on the entire available dataset and using it to make predictions on new examples. Specifically, the full dataset is split into training and validation sets to fit the base models and the meta-model, respectively, and then the ensemble can be used to make predictions. The complete example of making a prediction on new data with a blending ensemble for classification is listed below.

# example of making a prediction with a blending ensemble for classification
from numpy import hstack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    return X, y

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression()))
    models.append(('knn', KNeighborsClassifier()))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('svm', SVC(probability=True)))
    models.append(('bayes', GaussianNB()))
    return models

# fit the blending ensemble
def fit_ensemble(models, X_train, X_val, y_train, y_val):
    # fit all models on the training set and predict on hold out set
    meta_X = list()
    for _, model in models:
        # fit in training set
        model.fit(X_train, y_train)
        # predict on hold out set
        yhat = model.predict_proba(X_val)
        # store predictions as input for blending
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # define blending model
    blender = LogisticRegression()
    # fit on predictions from base models
    blender.fit(meta_X, y_val)
    return blender

# make a prediction with the blending ensemble
def predict_ensemble(models, blender, X_test):
    # make predictions with base models
    meta_X = list()
    for _, model in models:
        # predict with base model
        yhat = model.predict_proba(X_test)
        # store prediction
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # predict
    return blender.predict(meta_X)

# define dataset
X, y = get_dataset()
# split dataset into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33, random_state=1)
# summarize data split
print('Train: %s, Val: %s' % (X_train.shape, X_val.shape))
# create the base models
models = get_models()
# train the blending ensemble
blender = fit_ensemble(models, X_train, X_val, y_train, y_val)
# make a prediction on a new row of data
row = [-0.30335011, 2.68066314, 2.07794281, 1.15253537, -2.0583897, -2.51936601, 0.67513028, -3.20651939, -1.60345385, 3.68820714, 0.05370913, 1.35804433, 0.42011397, 1.4732839, 2.89997622, 1.61119399, 7.72630965, -2.84089477, -1.83977415, 1.34381989]
yhat = predict_ensemble(models, blender, [row])
# summarize prediction (yhat is an array with one element)
print('Predicted Class: %d' % yhat[0])

Running the example fits the blending ensemble on the dataset and then uses it to make a prediction for a new row of data, as we might when using the model in an application.

Train: (6700, 20), Val: (3300, 20)
Predicted Class: 1

Next, let's look at how we might evaluate a blending ensemble for regression.

Blending ensemble for regression

In this section, we will look at using blending for a regression problem. First, we can use the make_regression() function to create a synthetic regression problem with 10,000 examples and 20 input features. The complete example is listed below.

# test regression dataset
from sklearn.datasets import make_regression
# define dataset
X, y = make_regression(n_samples=10000, n_features=20, n_informative=10, noise=0.3, random_state=7)
# summarize the dataset
print(X.shape, y.shape)

Running the example creates a dataset and summarizes the input and output components.

(10000, 20) (10000,)

Next, we can define the list of regression models to use as base models. In this case, we will use linear regression, kNN, decision tree, and SVM models.

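from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LinearRegression()))
    models.append(('knn', KNeighborsRegressor()))
    models.append(('cart', DecisionTreeRegressor()))
    models.append(('svm', SVR()))
    return models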

The fit_ensemble() function used to train the blending ensemble is unchanged, except that the model used for blending must be changed to a regression model. Here, we will use a linear regression model.

...
# define blending model
blender = LinearRegression()
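Because the blender is a linear regression model, its fitted coef_ attribute holds one weight per base model, which shows how the ensemble is blending them. The sketch below is a smaller, self-contained variant of fit_ensemble() with only three of the base models and 2,000 rows (choices made to keep it fast). On this linear synthetic data one would expect most of the weight to fall on the linear regression base model, though the exact numbers will vary.

```python
from numpy import column_stack
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=2000, n_features=20, n_informative=10, noise=0.3, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33, random_state=1)
models = [('lr', LinearRegression()), ('knn', KNeighborsRegressor()), ('cart', DecisionTreeRegressor(random_state=1))]

# Fit each base model on the training split and collect its holdout predictions.
preds = []
for name, model in models:
    model.fit(X_train, y_train)
    preds.append(model.predict(X_val))
meta_X = column_stack(preds)

# The blender learns one weight per base model plus an intercept.
blender = LinearRegression().fit(meta_X, y_val)
for (name, _), weight in zip(models, blender.coef_):
    print('%s weight: %.3f' % (name, weight))
print('intercept: %.3f' % blender.intercept_)
```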

Given that this is a regression problem, we will evaluate the performance of the model using an error metric, in this case the mean absolute error, or MAE for short.

...
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('Blending MAE: %.3f' % score)
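MAE is the average absolute difference between the true and predicted values, MAE = (1/n) * sum(|y_i - yhat_i|), so it is in the same units as the target and lower is better. The sketch below uses four made-up values to check a hand-computed MAE against scikit-learn's mean_absolute_error().

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up true and predicted values for four examples.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Absolute errors are 0.5, 0.5, 0.0, and 1.0, so the mean is 2.0 / 4.
manual = np.mean(np.abs(y_true - y_pred))
print(manual, mean_absolute_error(y_true, y_pred))  # both are 0.5
```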

The complete example of a blending ensemble for a synthetic regression predictive modeling problem is listed below.

# evaluate blending ensemble for regression
from numpy import hstack
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=10000, n_features=20, n_informative=10, noise=0.3, random_state=7)
    return X, y

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LinearRegression()))
    models.append(('knn', KNeighborsRegressor()))
    models.append(('cart', DecisionTreeRegressor()))
    models.append(('svm', SVR()))
    return models

# fit the blending ensemble
def fit_ensemble(models, X_train, X_val, y_train, y_val):
    # fit all models on the training set and predict on hold out set
    meta_X = list()
    for name, model in models:
        # fit in training set
        model.fit(X_train, y_train)
        # predict on hold out set
        yhat = model.predict(X_val)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store predictions as input for blending
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # define blending model
    blender = LinearRegression()
    # fit on predictions from base models
    blender.fit(meta_X, y_val)
    return blender

# make a prediction with the blending ensemble
def predict_ensemble(models, blender, X_test):
    # make predictions with base models
    meta_X = list()
    for name, model in models:
        # predict with base model
        yhat = model.predict(X_test)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store prediction
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # predict
    return blender.predict(meta_X)

# define dataset
X, y = get_dataset()
# split dataset into train and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# split training set into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.33, random_state=1)
# summarize data split
print('Train: %s, Val: %s, Test: %s' % (X_train.shape, X_val.shape, X_test.shape))
# create the base models
models = get_models()
# train the blending ensemble
blender = fit_ensemble(models, X_train, X_val, y_train, y_val)
# make predictions on test set
yhat = predict_ensemble(models, blender, X_test)
# evaluate predictions
score = mean_absolute_error(y_test, yhat)
print('Blending MAE: %.3f' % score)

Running the example first reports the shape of the train, validation, and test datasets, then the MAE of the ensemble on the test dataset.

Note: Your results may vary due to the stochastic nature of the algorithm or estimation procedure, or differences in numerical precision. Try the example several times and compare the average result.

In this case, we can see that the blending ensemble achieved an MAE of about 0.237 on the test dataset.

Train: (3350, 20), Val: (1650, 20), Test: (5000, 20)
Blending MAE: 0.237

As with classification, the blending ensemble is only useful if it performs better than any of the base models that contribute to it.

We can check this by evaluating each base model in isolation, first fitting it on the entire training dataset (unlike in the ensemble) and then making predictions on the test dataset (as in the ensemble). In the example below, each base model is evaluated in isolation on the synthetic regression predictive modeling dataset.

# evaluate base models in isolation on the regression dataset
from numpy import hstack
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=10000, n_features=20, n_informative=10, noise=0.3, random_state=7)
    return X, y

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LinearRegression()))
    models.append(('knn', KNeighborsRegressor()))
    models.append(('cart', DecisionTreeRegressor()))
    models.append(('svm', SVR()))
    return models

# define dataset
X, y = get_dataset()
# split dataset into train and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# summarize data split
print('Train: %s, Test: %s' % (X_train_full.shape, X_test.shape))
# create the base models
models = get_models()
# evaluate standalone model
for name, model in models:
    # fit the model on the training dataset
    model.fit(X_train_full, y_train_full)
    # make a prediction on the test dataset
    yhat = model.predict(X_test)
    # evaluate the predictions
    score = mean_absolute_error(y_test, yhat)
    # report the score
    print('>%s MAE: %.3f' % (name, score))

Running the example first summarizes the training and test sets and then reports the MAE of each base model on the test dataset.

Note: Your results may vary due to the stochastic nature of the algorithm or estimation procedure, or differences in numerical precision. Try the example several times and compare the average result.

In this case, we can see that the linear regression model actually performed slightly better than the mixed ensemble, achieving an MAE of 0.236 compared to 0.237. This may be due to the way the synthetic dataset was constructed.
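One plausible explanation, offered as an assumption rather than a verified diagnosis: make_regression builds the target as a linear combination of the informative features plus Gaussian noise with standard deviation noise=0.3 (the default bias is 0). A well-fitted linear regression is then close to the best possible model, and its MAE should sit near the MAE of the noise itself, 0.3·√(2/π) ≈ 0.239, which leaves the ensemble almost nothing to improve on. The sketch below regenerates the same dataset with coef=True to recover the true coefficients and checks this.

```python
import math
import numpy as np
from sklearn.datasets import make_regression

# same generator settings as get_dataset(), but also return the true coefficients
X, y, coef = make_regression(n_samples=10000, n_features=20, n_informative=10,
                             noise=0.3, random_state=7, coef=True)

# with bias=0.0, whatever the true linear model cannot explain is pure Gaussian noise
residual = y - X @ coef
print('Noise std: %.3f' % residual.std())

# MAE of the true (noise-free) linear model: the practical floor for any model
print('MAE of the true model: %.3f' % np.mean(np.abs(residual)))

# for Gaussian noise N(0, s^2), the expected absolute error is s * sqrt(2 / pi)
print('Theoretical MAE floor: %.3f' % (0.3 * math.sqrt(2 / math.pi)))  # 0.239
```

Both the linear regression (0.236) and the ensemble (0.237) are already at this floor, so the small gap between them is within sampling noise.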

Whatever the cause, in this case we would prefer to use the linear regression model on its own for this task. This situation highlights the importance of checking the performance of the contributing models before adopting the ensemble as the final model.

Train: (5000, 20), Test: (5000, 20)
>lr MAE: 0.236
>knn MAE: 100.169
>cart MAE: 133.744
>svm MAE: 138.195
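A complementary check is to look inside the fitted metamodel. Because the blender is a LinearRegression, its coef_ attribute holds one weight per base model, in the order returned by get_models(). The sketch below repeats the data, models, and splits from the blending example in compact form so it runs on its own, then prints those weights. On this dataset we would expect, though this is an expectation and not a guarantee, the weight on lr to be close to 1 and the others close to 0. That would mean the ensemble is effectively the linear regression model by itself.

```python
from numpy import hstack
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

# same dataset and base models as in the examples above
X, y = make_regression(n_samples=10000, n_features=20, n_informative=10, noise=0.3, random_state=7)
models = [('lr', LinearRegression()), ('knn', KNeighborsRegressor()),
          ('cart', DecisionTreeRegressor()), ('svm', SVR())]

# same splits as the blending example: train/test first, then train/validation
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.33, random_state=1)

# fit each base model on the train split; its validation predictions become one column
meta_X = hstack([model.fit(X_train, y_train).predict(X_val).reshape(-1, 1) for _, model in models])

# the metamodel learns one weight per base model plus an intercept
blender = LinearRegression().fit(meta_X, y_val)
for (name, _), weight in zip(models, blender.coef_):
    print('%s weight: %.3f' % (name, weight))
print('intercept: %.3f' % blender.intercept_)
```

If the weights confirm that one base model dominates, that base model alone is usually the simpler final choice.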

Again, we can use the mixed ensemble as our final regression model. The approach involves splitting the entire dataset into training and validation sets, fitting the base models on the training set and the metamodel on the validation set. The ensemble can then be used to make a prediction for a new row of data. A complete example of making a prediction on new data with a mixed ensemble for regression is listed below.

# example of making a prediction with a blending ensemble for regression
from numpy import hstack
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=10000, n_features=20, n_informative=10, noise=0.3, random_state=7)
    return X, y

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LinearRegression()))
    models.append(('knn', KNeighborsRegressor()))
    models.append(('cart', DecisionTreeRegressor()))
    models.append(('svm', SVR()))
    return models

# fit the blending ensemble
def fit_ensemble(models, X_train, X_val, y_train, y_val):
    # fit all models on the training set and predict on hold out set
    meta_X = list()
    for _, model in models:
        # fit on the training set
        model.fit(X_train, y_train)
        # predict on hold out set
        yhat = model.predict(X_val)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store predictions as input for blending
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # define blending model
    blender = LinearRegression()
    # fit on predictions from base models
    blender.fit(meta_X, y_val)
    return blender

# make a prediction with the blending ensemble
def predict_ensemble(models, blender, X_test):
    # make predictions with base models
    meta_X = list()
    for _, model in models:
        # predict with base model
        yhat = model.predict(X_test)
        # reshape predictions into a matrix with one column
        yhat = yhat.reshape(len(yhat), 1)
        # store prediction
        meta_X.append(yhat)
    # create 2d array from predictions, each set is an input feature
    meta_X = hstack(meta_X)
    # predict
    return blender.predict(meta_X)

# define dataset
X, y = get_dataset()
# split dataset into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33, random_state=1)
# summarize data split
print('Train: %s, Val: %s' % (X_train.shape, X_val.shape))
# create the base models
models = get_models()
# train the blending ensemble
blender = fit_ensemble(models, X_train, X_val, y_train, y_val)
# make a prediction on a new row of data
row = [-0.24038754, 0.55423865, -0.48979221, 1.56074459, -1.16007611, 1.10049103, 1.18385406, -1.57344162, 0.97862519, -0.03166643, 1.77099821, 1.98645499, 0.86780193, 2.01534177, 2.51509494, -1.04609004, -0.19428148, -0.05967386, -2.67168985, 1.07182911]
yhat = predict_ensemble(models, blender, [row])
# summarize prediction
print('Predicted: %.3f' % (yhat[0]))

Running the example fits the ensemble on the dataset and then uses it to make a prediction for a new row of data, as we might when using the model in an application.

Train: (6700, 20), Val: (3300, 20)
Predicted: 359.986
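When the ensemble is used in an application, it can be convenient to keep the base models and the blender together in one object. The class below is a hypothetical wrapper: the name BlendingRegressor and its interface are our own, not part of scikit-learn. It packages the same fit_ensemble/predict_ensemble logic behind fit() and predict() methods and makes the train/validation split internally with the same test_size=0.33 and random_state=1 used above. It is a sketch under those assumptions, not a full scikit-learn estimator (for example, it does not implement get_params()/set_params()).

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


class BlendingRegressor:
    """Hypothetical wrapper: base models fit on a train split, blender fit on a validation split."""

    def __init__(self, models, blender=None, val_size=0.33, random_state=1):
        self.models = models  # list of (name, estimator) pairs, as returned by get_models()
        self.blender = blender if blender is not None else LinearRegression()
        self.val_size = val_size
        self.random_state = random_state

    def _meta_features(self, X):
        # one column of predictions per base model, as in predict_ensemble()
        return np.hstack([m.predict(X).reshape(-1, 1) for _, m in self.fitted_models_])

    def fit(self, X, y):
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=self.val_size, random_state=self.random_state)
        # clone so the caller's unfitted estimators are left untouched
        self.fitted_models_ = [(name, clone(m).fit(X_train, y_train)) for name, m in self.models]
        # the blender only ever sees predictions on rows the base models were not trained on
        self.blender_ = clone(self.blender).fit(self._meta_features(X_val), y_val)
        return self

    def predict(self, X):
        return self.blender_.predict(self._meta_features(np.asarray(X)))


# usage: same data and base models as the example above
X, y = make_regression(n_samples=10000, n_features=20, n_informative=10, noise=0.3, random_state=7)
models = [('lr', LinearRegression()), ('knn', KNeighborsRegressor()),
          ('cart', DecisionTreeRegressor()), ('svm', SVR())]
ensemble = BlendingRegressor(models).fit(X, y)
yhat = ensemble.predict(X[:1])
print('Predicted: %.3f, actual: %.3f' % (yhat[0], y[0]))
```

Keeping everything in one fitted object means a single call such as ensemble.predict([row]) is all the application needs.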

This section contains resources on this topic if you want to dive deeper into it.

Feature-Weighted Linear Stacking, 2009.
The BellKor 2008 Solution to the Netflix Prize, 2008.
Kaggle Ensemble Guide, MLWave, 2015.

Netflix, Wikipedia.
