We previously introduced a method of deriving large loss claims inflation from a large loss claims bordereau, and we then spent some time understanding how robust the method is, depending on how much data we have and how volatile that data is. In this post we're finally going to play around with making the method more accurate, rather than just poking holes in it. To do this, we are once again going to simulate data with a baked-in inflation rate (set to 5% here), and then vary the metric we use to extract an estimate of the inflation from the data. In particular, we are going to look at using the Nth largest loss by year, varying N from 1 to 20.
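To make the estimator concrete, here is a minimal sketch of it as a standalone function. The name `estimate_inflation` and its input format (a list of per-year loss arrays) are hypothetical, chosen for illustration; the logic follows the approach described above: take the Nth largest loss in each year, regress its log against the year, and convert the slope back to an inflation rate via exp(slope) - 1.

```python
import numpy as np
from scipy.stats import linregress


def estimate_inflation(losses_by_year, n):
    """Estimate annual inflation from the n-th largest loss in each year.

    losses_by_year: list of 1-D arrays of loss amounts, one per year
                    (hypothetical input format, year 0 first).
    n: rank to use, where 1 is the largest loss, 2 the second largest, etc.
    """
    years, log_nth = [], []
    for year, losses in enumerate(losses_by_year):
        ranked = np.sort(np.asarray(losses, dtype=float))[::-1]  # largest first
        if len(ranked) >= n:  # this sketch skips years with fewer than n losses
            years.append(year)
            log_nth.append(np.log(ranked[n - 1]))  # rank n is zero-based index n - 1
    # The slope of log(loss) against year estimates log(1 + inflation)
    slope = linregress(years, log_nth).slope
    return np.exp(slope) - 1


# Usage: five losses per year, all growing at exactly 5% a year
base = np.array([5e6, 3e6, 2e6, 1.5e6, 1.2e6])
data = [base * 1.05**t for t in range(10)]
print(round(estimate_inflation(data, 3), 6))  # 0.05
```

Skipping years with too few losses is a design choice of this sketch; other treatments (for example dropping the whole simulation, or lowering N) are equally possible.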
Photo by Julian Dik. I was recently in Lisbon, so here is a cool photo of the city. It's not really related to the blog post, but to be honest it's hard to think of photos with some link to inflation, so I'm just picking nice photos at this point!
Here is our Python code:
```python
import numpy as np
import pandas as pd
from math import exp, log, sqrt
from scipy.stats import lognorm, poisson, linregress
import matplotlib.pyplot as plt

# Simulation parameters: lognormal severity (mean 1m, CV 1.5), Poisson frequency
Distmean = 1000000.0
DistStdDev = Distmean*1.5
AverageFreq = 100
years = 20
ExposureGrowth = 0.0
Mu = log(Distmean/(sqrt(1+DistStdDev**2/Distmean**2)))
Sigma = sqrt(log(1+DistStdDev**2/Distmean**2))
LLThreshold = 1e6
Inflation = 0.05
s = Sigma
scale = exp(Mu)
MedianTop10Method = []  # slope estimates: one list (one entry per rank) per simulation
AllLnOutput = []        # log large losses by rank, kept for inspection

for sim in range(10000):
    SimOutputLL = []
    for year in range(years):
        # Poisson claim count, lognormal severities, inflated to this year
        FrequencyInc = poisson.rvs(AverageFreq*(1+ExposureGrowth)**year)
        r = lognorm.rvs(s, scale=scale, size=FrequencyInc)
        r = r*(1+Inflation)**year
        r = np.sort(r)[::-1]                     # largest first
        SimOutputLL.append(r[r >= LLThreshold])  # keep large losses only
    # Rows = rank (0 = largest), columns = year; NaN where a year has fewer losses
    SimOutputLL = pd.DataFrame(SimOutputLL).transpose()
    AllLnOutputSim = []
    MedianTop10MethodSim = []
    # Note: iloc is zero-based, so rows 1..20 are the 2nd..21st largest losses;
    # range(0, 20) would select the 1st..20th.
    for iRow in range(1, 21):
        a = np.log(SimOutputLL.iloc[iRow])
        AllLnOutputSim.append(a)
        # Slope of log(loss) against year estimates log(1 + inflation)
        b = linregress(a.index, a).slope
        MedianTop10MethodSim.append(b)
    AllLnOutput.append(AllLnOutputSim)
    MedianTop10Method.append(MedianTop10MethodSim)

AllLnOutputdf = pd.DataFrame(AllLnOutput)
dfMedianTop10Method = pd.DataFrame(MedianTop10Method)

# Convert slopes to inflation rates and summarise across simulations
# (pandas' mean/std skip NaN slopes by default)
OutputMean = []
OutputStdDev = []
for iRow in range(0, 20):
    OutputMean.append(np.mean(np.exp(dfMedianTop10Method[iRow]) - 1))
    OutputStdDev.append(np.std(np.exp(dfMedianTop10Method[iRow]) - 1))
print(OutputMean)
print(OutputStdDev)

plt.plot(OutputStdDev)
plt.xlabel('Nth value to use')
plt.ylabel('Standard Deviation')
plt.title('Standard Deviation vs Nth large loss')
plt.show()
```
```
[0.050145087623911413, 0.050114063936035846, 0.05008171646318982, 0.05004295735703755, 0.050032712657938, 0.050014427471083624, 0.05001184154389972, 0.05002322239170273, 0.05000059530333641, 0.04998014653293243, 0.05002513041036643, 0.049987258053581535, 0.049978122421359926, 0.04999577129950346, 0.04997788849739706, 0.04992582226827509, 0.04988217239699916, 0.049813271987513466, 0.0497039041388662, 0.04951880036014647]
[0.013745415886517658, 0.011764812681484629, 0.010643862238651042, 0.009893413918701141, 0.00929811434851868, 0.008829621751760679, 0.00852287511743378, 0.00827148148117868, 0.008059106267070865, 0.007896990729065285, 0.0077614481203254, 0.007619604123042998, 0.007507459106587108, 0.00741696242472465, 0.007320604159513552, 0.007224126566330418, 0.0071451595403478, 0.0070669619394482614, 0.007013056254295168, 0.0069269062482146026]
```
[Figure: Standard Deviation vs Nth large loss (x-axis: Nth value to use; y-axis: Standard Deviation)]
Let's check this graph against our previous runs. We know that when looking at the 5th largest loss (close to the median of the top 10), we had a standard deviation of around 0.9%. Reading off 5 on the x-axis of the graph above, we can see that this matches. So far so good.
What is the graph telling us? As we increase N, the standard deviation of our estimate decreases. The reduction is initially quite large: going from the largest loss by year to the 5th largest loss reduces the standard deviation by around 35%. It then levels off as we continue; going from the 5th largest loss to the 10th largest loss only gives a reduction of approximately 15%.
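These percentages can be checked directly against the printed standard deviations. The sketch below copies the first ten values of the second printed list; `pct_reduction` is a hypothetical helper introduced just for this check. Because the exact answer depends slightly on which plotted point you read as the "5th", both the 5th and 6th entries are compared with the 1st.

```python
# Standard deviations of the inflation estimate, copied from the second
# printed list above (first ten entries only)
std_dev = [0.013745415886517658, 0.011764812681484629, 0.010643862238651042,
           0.009893413918701141, 0.00929811434851868, 0.008829621751760679,
           0.00852287511743378, 0.00827148148117868, 0.008059106267070865,
           0.007896990729065285]


def pct_reduction(before, after):
    """Percentage reduction in standard deviation going from `before` to `after`."""
    return 100 * (1 - after / before)


# Entries are counted from 1 here, so entry 5 is std_dev[4]
print(f"entry 1 -> entry 5:  {pct_reduction(std_dev[0], std_dev[4]):.1f}%")
print(f"entry 1 -> entry 6:  {pct_reduction(std_dev[0], std_dev[5]):.1f}%")
print(f"entry 5 -> entry 10: {pct_reduction(std_dev[4], std_dev[9]):.1f}%")
```

This gives roughly 32%, 36% and 15% respectively, in line with the approximate figures read off the graph.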
This is good info to know! It means that we can immediately improve our estimate of claims inflation just by using a lower attachment, i.e. a loss further down the ranking. You sometimes see people comparing the 'largest inflated loss' by year as a sense check on their inflation rate, but this chart tells us that it is actually the most volatile metric in this range, and even the 2nd largest loss would be an improvement.
So why do we get a lower standard deviation of estimate when we use a higher value of N? I suspect this is driven by the lower volatility inherent in the metric itself: the 10th largest claim in a dataset should be less volatile than the 9th largest claim, which in turn should be less volatile than the 8th largest, and so on. So as we move down the distribution, we are implicitly reducing the noise in our data, and that makes our estimate more accurate.
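One way to probe this hypothesis, sketched under stated assumptions: simulate many single years of ground-up losses with the same parameters as above and measure the standard deviation of the log of the Nth largest loss for N = 1 to 20. For an ordinary least squares fit on 20 independent yearly points, the slope's standard error is the residual standard deviation divided by sqrt(sum of (t - mean t)^2), which is sqrt(665) for years 0 to 19, so the sketch also turns each standard deviation into a back-of-the-envelope standard error. The seed, simulation count and variable names are arbitrary choices, and the large loss threshold is ignored here (it only matters when a year has fewer than N losses above it).

```python
import numpy as np

# Assumed parameters, mirroring the simulation above
dist_mean = 1_000_000.0
dist_sd = dist_mean * 1.5
avg_freq = 100
n_years = 20
max_n = 20
n_sims = 20_000

sigma = np.sqrt(np.log(1 + dist_sd**2 / dist_mean**2))
mu = np.log(dist_mean) - 0.5 * sigma**2  # same as log(mean / sqrt(1 + CV^2))

rng = np.random.default_rng(seed=42)

# Log of the 1st..20th largest loss in each simulated year. Inflation shifts
# every log loss in a year by the same constant, so it does not change the spread.
log_nth = np.empty((n_sims, max_n))
for i in range(n_sims):
    # Guard against fewer than max_n claims (vanishingly unlikely with mean 100)
    count = max(rng.poisson(avg_freq), max_n)
    losses = rng.lognormal(mean=mu, sigma=sigma, size=count)
    log_nth[i] = np.log(np.sort(losses)[::-1][:max_n])

sd_log = log_nth.std(axis=0)

# OLS slope s.e. = residual s.d. / sqrt(Sxx); for years 0..19, Sxx = 20 * (20**2 - 1) / 12
t = np.arange(n_years)
sxx = np.sum((t - t.mean()) ** 2)
implied_se = sd_log / np.sqrt(sxx)

for n in range(max_n):
    print(f"N={n + 1:2d}  sd(log loss)={sd_log[n]:.3f}  implied slope s.e.={implied_se[n]:.4f}")
```

If the hypothesis holds, sd(log loss) should fall steadily as N increases, and the implied standard errors should be broadly comparable to the simulated standard deviations printed earlier.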
Next time I'm going to examine this phenomenon further. It would be really cool if we could somehow link this improved accuracy with the results we obtained previously on how accuracy improves as the volatility of the ground-up loss distribution reduces.
I work as an actuary and underwriter at a global reinsurer in London.