In which we set up K-fold cross-validation to assess model performance, spend quite a while tweaking our model, use hyperparameter tuning, but then end up not actually improving our model.

Source: https://somewan.design

There are a few things I'd like to accomplish today. First, I'd like to generate some sort of accuracy score using just the training data, so we're going to set up K-fold cross-validation and explore how that works. Secondly, there are some minor tweaks I'd like to make to the variables: encoding some of them differently, and adding a couple of new variables. Finally, we're going to run hyperparameter tuning to see if that increases our performance. Once we've got K-fold CV set up, we'll be able to investigate this improvement in performance without uploading to Kaggle, which will be interesting.

Mobile users: once again, if the below is not rendering well, please try rotating to landscape view.
In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
In [2]:
# Raw string, so that the backslashes in the Windows path are not read as escapes
path = r"C:\Work\Machine Learning experiments\Kaggle\House Price"
os.chdir(path)
In [3]:
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")
In [4]:
features = ['Neighborhood','OverallQual','OverallCond','BldgType','HouseStyle']
Y = train_data['SalePrice']
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
columns = X.columns
ColList = columns.tolist()
missing_cols = set( X_test.columns ) - set( X.columns )
missing_cols2 = set( X.columns ) - set( X_test.columns )
# Add each missing column, with default value equal to 0, to the set that lacks it
for c in missing_cols:
    X[c] = 0
for c in missing_cols2:
    X_test[c] = 0
# Ensure the columns in the train set are in the same order as in the test set
X = X[X_test.columns]
RFmodel = RandomForestRegressor(random_state=1)
RFmodel.fit(X,Y)
Out[4]:
RandomForestRegressor(random_state=1)
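As an aside, the column matching in the cell above can be written more compactly. The following is only a minimal sketch on two made-up toy frames (the values are invented for illustration, only the column names come from the data set), assuming pandas' `DataFrame.align` with an outer join is an acceptable substitute for the two loops.

```python
import pandas as pd

# Toy stand-ins for train_data[features] and test_data[features] (made-up values)
train = pd.DataFrame({"BldgType": ["1Fam", "Duplex", "1Fam"], "OverallQual": [5, 7, 6]})
test = pd.DataFrame({"BldgType": ["1Fam", "Twnhs"], "OverallQual": [4, 8]})

X_demo = pd.get_dummies(train)
X_test_demo = pd.get_dummies(test)

# Outer join on the columns: both frames end up with the union of all dummy
# columns, in the same order, and any column a frame lacked is filled with 0
X_demo, X_test_demo = X_demo.align(X_test_demo, join="outer", axis=1, fill_value=0)

print(list(X_demo.columns) == list(X_test_demo.columns))
```

Either way, the point is the same: the model must see identical columns, in identical order, at fit time and at predict time.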
In [5]:
# Let's see how this model scores using cross validation
scores = cross_val_score(RFmodel, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Out[5]:
[0.7499322339157881, 0.029658128350339423]
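To make it clearer what `cross_val_score` is doing, here is a rough sketch of the same calculation written out by hand. It runs on synthetic data from `make_regression` rather than the house price data (that substitution, and the small forest size, are my assumptions to keep it quick). For a regressor with an integer `cv`, scikit-learn splits the rows into consecutive folds and scores each held-out fold with the model's own `score` method, which is R^2.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the house price data (illustration only)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

manual_scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X_demo):
    # Fit a fresh model on four folds, then score it on the held-out fold
    fold_model = RandomForestRegressor(n_estimators=20, random_state=1)
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[val_idx], y_demo[val_idx]))
manual_scores = np.array(manual_scores)

# The one-liner used in this post
auto_scores = cross_val_score(
    RandomForestRegressor(n_estimators=20, random_state=1), X_demo, y_demo, cv=5
)

print(np.allclose(manual_scores, auto_scores))
```

So the two numbers we print in each cell are the mean and standard deviation of the per-fold R^2 values.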
In [6]:
# Okay, we scored 75% according to the CV measure (for a regressor,
# cross_val_score reports R^2 by default). Let's run the same process on
# the other models we've set up so far, first the version where we drop all
# columns which have a NaN in them
TrainColumns = train_data.columns
frames = [train_data, test_data]
Total_data = pd.concat(frames)
features = []
featuresNotUsed = []
for i in TrainColumns:
    if not(Total_data[i].isnull().values.any()):
        if not(i == 'Id'):
            if not(i == 'SalePrice'):
                features.append(i)
    else:
        featuresNotUsed.append(i)
Y = train_data['SalePrice']
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
columns = X.columns
ColList = columns.tolist()
missing_cols = set( X_test.columns ) - set( X.columns )
missing_cols2 = set( X.columns ) - set( X_test.columns )
# Add each missing column, with default value equal to 0, to the set that lacks it
for c in missing_cols:
    X[c] = 0
for c in missing_cols2:
    X_test[c] = 0
# Ensure the columns in the train set are in the same order as in the test set
X = X[X_test.columns]
RFmodel = RandomForestRegressor(random_state=1)
RFmodel.fit(X,Y)
predictions = RFmodel.predict(X_test)
In [7]:
scores = cross_val_score(RFmodel, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Out[7]:
[0.8408222755620794, 0.048773947402117185]
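The loop that picks out the columns with no missing values can also be done without looping. This is just a sketch on a made-up toy frame (the values are invented, and `toy`, `features_demo` and `features_not_used` are names I've introduced for illustration), using pandas' `notna` and `isna`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the combined train + test frame (made-up values)
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "Alley": [np.nan, "Grvl", np.nan],
    "SalePrice": [208500.0, 181500.0, np.nan],  # missing for the test rows
})

# Columns where every row is populated, excluding the id and the target
complete = toy.columns[toy.notna().all()]
features_demo = [c for c in complete if c not in ("Id", "SalePrice")]

# Columns with at least one missing value
features_not_used = toy.columns[toy.isna().any()].tolist()

print(features_demo, features_not_used)
```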
In [8]:
# Okay our score has improved, we're now at around 84%, let's try the version
# where we fill in all the NaN with 0s
features= TrainColumns[1:79]
for i in features:
    train_data[i] = train_data[i].fillna(0)
    test_data[i] = test_data[i].fillna(0)
Y = train_data['SalePrice']
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
columns = X.columns
ColList = columns.tolist()
missing_cols = set( X_test.columns ) - set( X.columns )
missing_cols2 = set( X.columns ) - set( X_test.columns )
# Add each missing column, with default value equal to 0, to the set that lacks it
for c in missing_cols:
    X[c] = 0
for c in missing_cols2:
    X_test[c] = 0
# Ensure the columns in the train set are in the same order as in the test set
X = X[X_test.columns]
RFmodel = RandomForestRegressor(random_state=1)
RFmodel.fit(X,Y)
Out[8]:
RandomForestRegressor(random_state=1)
In [9]:
scores = cross_val_score(RFmodel, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Out[9]:
[0.858899065803126, 0.05176910554982082]
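Before reading too much into these three scores, it's worth a rough sanity check of each improvement against the spread across folds. The sketch below just does the arithmetic on the (rounded) means and standard deviations printed above. It is a rule of thumb rather than a formal significance test, and the model labels are my own shorthand.

```python
# (mean, std) of the 10-fold CV scores reported above, rounded to 4 decimal places
results = {
    "five hand-picked features": (0.7499, 0.0297),
    "columns without NaN": (0.8408, 0.0488),
    "all columns, NaN filled with 0": (0.8589, 0.0518),
}

names = list(results)
gains = []
for prev, curr in zip(names, names[1:]):
    gain = results[curr][0] - results[prev][0]
    spread = results[curr][1]
    gains.append(gain)
    verdict = "larger" if gain > spread else "smaller"
    print(f"{prev} -> {curr}: gain of {gain:.3f} is {verdict} than the fold std of {spread:.3f}")
```

On these numbers the first step is a clear improvement, while the second step is well inside the fold-to-fold variation.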
In [10]:
# Okay, we've used CV and tested our three models 'internally' rather
# than against the test set, and we've got broadly similar answers.
# Our first model scored about 75%, our second 84%, and our third 86%.
# Interestingly, our Std Dev went up at each step, which matches what
# we would expect: our model has become more accurate, but at the cost of
# becoming more complex and moving towards overfitting.
# Let's run some hyperparameter tuning on our most successful model
#%%Set up space of possible hyperparams
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
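# Number of features to consider at every split
# (note: 'auto' was removed in scikit-learn 1.3; for a regressor it meant
# "use all features", which newer versions spell as 1.0)
max_features = ['auto', 'sqrt']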
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
print(random_grid)
{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt'], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None], 'min_samples_split': [2, 5, 10], 'min_samples_leaf': [1, 2, 4], 'bootstrap': [True, False]}
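It's worth seeing how big this grid actually is compared with the number of combinations a randomized search will try. A quick sketch, rebuilding the same lists with plain Python (the `n_iter_demo` name is mine, set to the 10 candidates used in the first search in this post):

```python
import math

# The same grid as printed above, rebuilt without numpy
grid = {
    "n_estimators": list(range(200, 2001, 200)),
    "max_features": ["auto", "sqrt"],
    "max_depth": list(range(10, 111, 10)) + [None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

# Total number of distinct combinations in the grid
total_combinations = math.prod(len(values) for values in grid.values())

# Number of combinations the randomized search samples
n_iter_demo = 10
print(total_combinations, f"{n_iter_demo / total_combinations:.2%} of the grid is sampled")
```

So a randomized search only ever looks at a small corner of the space, which is the price we pay for it finishing in minutes.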
In [11]:
RFmodel = RandomForestRegressor(random_state=1)
# Random search of parameters, using 3 fold cross validation,
# search across 10 different combinations, and use all available cores
RFmodel_random = RandomizedSearchCV(estimator = RFmodel, param_distributions = random_grid, n_iter = 10, cv = 3, verbose=2, random_state=1, n_jobs = -1)
# Fit the random search model
RFmodel_random.fit(X, Y)
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 2.4min finished
Out[11]:
RandomizedSearchCV(cv=3, estimator=RandomForestRegressor(random_state=1), n_jobs=-1, param_distributions={'bootstrap': [True, False], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None], 'max_features': ['auto', 'sqrt'], 'min_samples_leaf': [1, 2, 4], 'min_samples_split': [2, 5, 10], 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}, random_state=1, verbose=2)
In [13]:
RFmodel_random.best_params_
Out[13]:
{'n_estimators': 1600, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'max_depth': 40, 'bootstrap': False}
In [14]:
scores = cross_val_score(RFmodel_random, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.9min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.9min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.9min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.9min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 2.1min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.9min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 2.0min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 2.0min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.9min finished
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.9min finished
Out[14]:
[0.8686354196314181, 0.039199207651046085]
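The reason the log above repeats ten times is that we passed the whole `RandomizedSearchCV` object to `cross_val_score`, so the search is rerun from scratch inside each of the ten outer folds (often called nested cross-validation). Here is a cut-down sketch of the same pattern on synthetic data, with a tiny grid and small fold counts that I've made up so it runs in seconds.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

# Synthetic stand-in for the house price data (illustration only)
X_demo, y_demo = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=1)

# A deliberately tiny grid so the example is quick
small_grid = {"n_estimators": [5, 10, 20], "max_depth": [3, 5, None]}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=1),
    param_distributions=small_grid,
    n_iter=2,       # candidates tried per search
    cv=2,           # inner folds used to pick the best candidate
    random_state=1,
)

# Outer loop: 3 folds, each of which runs its own complete search
outer_scores = cross_val_score(search, X_demo, y_demo, cv=3)

# Per outer fold: 2 candidates x 2 inner folds, plus 1 refit of the winner
total_fits = 3 * (2 * 2 + 1)
print(len(outer_scores), total_fits)
```

The outer score therefore measures the whole "tune, then fit" procedure, not one fixed set of hyperparameters, and different outer folds may well settle on different parameters.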
In [15]:
# While setting up the previous version I realised that we'd missed out
# one variable: Sale Condition had not been included. Let's quickly
# rerun the model, but including this.
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")
features= TrainColumns[1:80]
for i in features:
    train_data[i] = train_data[i].fillna(0)
    test_data[i] = test_data[i].fillna(0)
Y = train_data['SalePrice']
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
columns = X.columns
ColList = columns.tolist()
missing_cols = set( X_test.columns ) - set( X.columns )
missing_cols2 = set( X.columns ) - set( X_test.columns )
# Add each missing column, with default value equal to 0, to the set that lacks it
for c in missing_cols:
    X[c] = 0
for c in missing_cols2:
    X_test[c] = 0
# Ensure the columns in the train set are in the same order as in the test set
X = X[X_test.columns]
RFmodel = RandomForestRegressor(random_state=1)
RFmodel.fit(X,Y)
Out[15]:
RandomForestRegressor(random_state=1)
In [16]:
scores = cross_val_score(RFmodel, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Out[16]:
[0.859128262549325, 0.0536366209565343]
In [17]:
# Let's try a few tweaks and see how our score changes.
# I've been reading a few online tutorials, and there are a
# few things people have done which seem like good ideas.
# First: some of the categorical variables have an ordering,
# whereas we've encoded them just using one-hot encoding,
# e.g. Exterior Quality comes in the form 'good/average/excellent',
# and so far we've ignored this ordering.
# Taken from the following:
# https://www.kaggle.com/serigne/stacked-regressions-top-4-on-leaderboard
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")
frames = [train_data, test_data]
total_data = pd.concat(frames)
cols = ('FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond',
        'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1',
        'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
        'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallCond',
        'YrSold', 'MoSold')
# process columns, apply LabelEncoder to categorical features
for c in cols:
    lbl = LabelEncoder()
    lbl.fit(list(total_data[c].values))
    total_data[c] = lbl.transform(list(total_data[c].values))
train_data = total_data[0:1460]
test_data = total_data[1460:2919]
features = total_data.columns
features = features.drop('Id')
features = features.drop('SalePrice')
features
Out[17]:
Index(['MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC', 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition'], dtype='object')
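One thing to be aware of with `LabelEncoder`: it numbers the categories in sorted (alphabetical) order, which is not necessarily the quality order. The sketch below shows the difference on a made-up sample of quality codes. The explicit mapping is my assumption about how the five codes (Po, Fa, TA, Gd, Ex) rank from worst to best, and this is an alternative to what the notebook does rather than what it does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A made-up sample of quality codes
quality = pd.Series(["TA", "Gd", "Ex", "Fa", "Po", "Gd"])

# LabelEncoder sorts the categories, so the codes follow the alphabet
encoder = LabelEncoder().fit(quality)
alphabetical = encoder.transform(quality).tolist()
print(list(encoder.classes_), alphabetical)

# An explicit mapping keeps the codes in quality order (assumed ranking)
quality_order = {"Po": 0, "Fa": 1, "TA": 2, "Gd": 3, "Ex": 4}
ordinal = quality.map(quality_order).tolist()
print(ordinal)
```

A tree-based model can still split on alphabetically ordered codes, but it may need more splits to separate good from bad than it would with codes that follow the real ordering.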
In [18]:
pd.options.mode.chained_assignment = None # default='warn'
In [19]:
for i in features:
    train_data[i] = train_data[i].fillna(0)
    test_data[i] = test_data[i].fillna(0)
Y = train_data['SalePrice']
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
columns = X.columns
ColList = columns.tolist()
missing_cols = set( X_test.columns ) - set( X.columns )
missing_cols2 = set( X.columns ) - set( X_test.columns )
# Add each missing column, with default value equal to 0, to the set that lacks it
for c in missing_cols:
    X[c] = 0
for c in missing_cols2:
    X_test[c] = 0
# Ensure the columns in the train set are in the same order as in the test set
X = X[X_test.columns]
RFmodel = RandomForestRegressor(random_state=1)
RFmodel.fit(X,Y)
Out[19]:
RandomForestRegressor(random_state=1)
In [20]:
scores = cross_val_score(RFmodel, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Out[20]:
[0.8602885832464457, 0.05312459024978507]
In [21]:
# Okay, so that change seems to have done very little to improve the model,
# we've increased our score by less than 1%
In [22]:
# Let's try one more thing: let's add another variable corresponding
# to total square footage, note this is also taken from:
# https://www.kaggle.com/serigne/stacked-regressions-top-4-on-leaderboard
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")
In [23]:
features= TrainColumns[1:80]
for i in features:
    train_data[i] = train_data[i].fillna(0)
    test_data[i] = test_data[i].fillna(0)
frames = [train_data, test_data]
total_data = pd.concat(frames)
total_data['TotalSF'] = total_data['TotalBsmtSF'] + total_data['1stFlrSF'] + total_data['2ndFlrSF']
In [24]:
features = total_data.columns
features = features.drop('Id')
features = features.drop('SalePrice')
features
Out[24]:
Index(['MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC', 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition', 'TotalSF'], dtype='object')
In [25]:
cols = ('FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond',
        'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1',
        'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
        'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallCond',
        'YrSold', 'MoSold')
# process columns, apply LabelEncoder to categorical features
for c in cols:
    lbl = LabelEncoder()
    lbl.fit(list(total_data[c].values))
    total_data[c] = lbl.transform(list(total_data[c].values))
train_data = total_data[0:1460]
test_data = total_data[1460:2919]
Y = train_data['SalePrice']
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
columns = X.columns
ColList = columns.tolist()
missing_cols = set( X_test.columns ) - set( X.columns )
missing_cols2 = set( X.columns ) - set( X_test.columns )
# Add each missing column, with default value equal to 0, to the set that lacks it
for c in missing_cols:
    X[c] = 0
for c in missing_cols2:
    X_test[c] = 0
# Ensure the columns in the train set are in the same order as in the test set
X = X[X_test.columns]
RFmodel = RandomForestRegressor(random_state=1)
RFmodel.fit(X,Y)
Out[25]:
RandomForestRegressor(random_state=1)
In [26]:
scores = cross_val_score(RFmodel, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Out[26]:
[0.8667013439741871, 0.04532036910923081]
In [ ]:
# This actually appears to improve our model!
# Let's quickly run hyperparameter tuning on this version
# which will combine the best of all we have done above
In [27]:
#%%Set up space of possible hyperparams
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
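# Number of features to consider at every split
# (note: 'auto' was removed in scikit-learn 1.3; for a regressor it meant
# "use all features", which newer versions spell as 1.0)
max_features = ['auto', 'sqrt']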
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
In [28]:
RFmodel = RandomForestRegressor(random_state=1)
# Random search of parameters, using 3 fold cross validation,
# search across 30 different combinations, and use all available cores
RFmodel_random = RandomizedSearchCV(estimator = RFmodel, param_distributions = random_grid, n_iter = 30, cv = 3, verbose=2, random_state=1, n_jobs = -1)
# Fit the random search model
RFmodel_random.fit(X, Y)
predictions = RFmodel_random.predict(X_test)
Fitting 3 folds for each of 30 candidates, totalling 90 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.6min
[Parallel(n_jobs=-1)]: Done 90 out of 90 | elapsed: 4.6min finished
In [29]:
RFmodel_random.best_params_
Out[29]:
{'n_estimators': 1600, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'max_depth': 40, 'bootstrap': False}
In [30]:
RFmodel = RandomForestRegressor(n_estimators = 1600,
min_samples_split = 2,
min_samples_leaf = 1,
max_features = 'sqrt',
max_depth= 40,
bootstrap= False,
random_state=1)
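Rather than retyping the tuned values by hand, the dictionary of best parameters can be unpacked straight into the constructor. A small sketch, with the dictionary copied from the output above (in the notebook itself it would be `RFmodel_random.best_params_`, and the `tuned_model` name is mine):

```python
from sklearn.ensemble import RandomForestRegressor

# Copied from the best_params_ output above
best_params = {'n_estimators': 1600, 'min_samples_split': 2, 'min_samples_leaf': 1,
               'max_features': 'sqrt', 'max_depth': 40, 'bootstrap': False}

# ** unpacks the dictionary into keyword arguments
tuned_model = RandomForestRegressor(random_state=1, **best_params)
print(tuned_model.get_params()['n_estimators'])
```

This avoids typos creeping in when the search is rerun and the best parameters change.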
In [31]:
scores = cross_val_score(RFmodel, X, Y, cv=10)
scores
[scores.mean(), scores.std()]
Out[31]:
[0.877473773223001, 0.03875787125404092]
In [65]:
scores = cross_val_score(RFmodel_random, X, Y, cv=5)
scores
[scores.mean(), scores.std()]
Fitting 3 folds for each of 30 candidates, totalling 90 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1min
[Parallel(n_jobs=-1)]: Done 90 out of 90 | elapsed: 3.5min finished
Fitting 3 folds for each of 30 candidates, totalling 90 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.2min
[Parallel(n_jobs=-1)]: Done 90 out of 90 | elapsed: 3.3min finished
Fitting 3 folds for each of 30 candidates, totalling 90 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1min
[Parallel(n_jobs=-1)]: Done 90 out of 90 | elapsed: 3.2min finished
Fitting 3 folds for each of 30 candidates, totalling 90 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1min
[Parallel(n_jobs=-1)]: Done 90 out of 90 | elapsed: 3.1min finished
Fitting 3 folds for each of 30 candidates, totalling 90 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 1.1min
[Parallel(n_jobs=-1)]: Done 90 out of 90 | elapsed: 3.1min finished
Out[65]:
[0.8769289170485848, 0.02251117120317736]
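One caveat when comparing these CV scores with the leaderboard: the numbers above are R^2, whereas (as far as I'm aware) this Kaggle competition scores submissions on the root mean squared error between the log of the predicted price and the log of the actual price. The two need not move together. Below is a minimal sketch of that metric in numpy. The function name is mine and the prices are made up.

```python
import numpy as np

def rmse_of_logs(y_true, y_pred):
    """Root mean squared error between log(prediction) and log(actual)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Made-up prices: being 10% out on a cheap house costs the same as
# being 10% out on an expensive one
cheap = rmse_of_logs([100_000], [110_000])
expensive = rmse_of_logs([500_000], [550_000])
print(round(cheap, 4), round(expensive, 4))
```

Because the metric works on ratios, a model that does well on R^2 by nailing the expensive houses can still score poorly here. `cross_val_score` also accepts a `scoring` argument, so one option would be to cross-validate on the log of `SalePrice` with an RMSE-based scorer.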
In [66]:
output = pd.DataFrame({'ID': test_data.Id, 'SalePrice': predictions})
output.to_csv('my_submission - V4 - RF.csv',index=False)
And all that remains is to upload our latest submission and see how it performs. Fingers crossed.

And our score has not actually improved! Hmmmm... I'm slightly stumped here, as I really thought it would improve given we've tidied up our variables and added hyperparameter tuning. There are a few possibilities:
So that's all for today. Slightly disappointing that we didn't improve our score, but we did learn a few new tricks which will hopefully come in useful later on. Tune in next time when we have our final attempt at the House Price Kaggle competition.
Author: I work as a pricing actuary at a reinsurer in London.