We previously introduced a method of deriving large loss claims inflation from a large loss bordereau, and we then spent some time understanding how robust the method is depending on how much data we have and how volatile that data is. In this post we're finally going to play around with making the method more accurate, rather than just poking holes in it. To do this, we are once again going to simulate data with a baked-in inflation rate (set to 5% here), and then we are going to vary the metric we use to extract an estimate of the inflation from the data. In particular, we are going to look at using the Nth largest loss by year, where we will vary N from 1 to 20.

Here is our python code:

In [1]:

```python
import numpy as np
import pandas as pd
from math import exp, log, sqrt
from scipy.stats import lognorm, poisson, linregress
import matplotlib.pyplot as plt
```

In [2]:

```python
Distmean = 1000000.0
DistStdDev = Distmean * 1.5
AverageFreq = 100
years = 20
ExposureGrowth = 0.0
Mu = log(Distmean / (sqrt(1 + DistStdDev ** 2 / Distmean ** 2)))
Sigma = sqrt(log(1 + DistStdDev ** 2 / Distmean ** 2))
LLThreshold = 1e6
Inflation = 0.05
s = Sigma
scale = exp(Mu)
MedianTop10Method = []
AllLnOutput = []
```

In [4]:

```python
for sim in range(10000):
    # Simulate one realisation of the large loss bordereau
    SimOutputLL = []
    Frequency = []
    for year in range(years):
        # Poisson claim count, then lognormal severities inflated to the year
        FrequencyInc = poisson.rvs(AverageFreq * (1 + ExposureGrowth) ** year, size=1)
        Frequency.append(FrequencyInc)
        r = lognorm.rvs(s, scale=scale, size=FrequencyInc[0])
        r = np.multiply(r, (1 + Inflation) ** year)
        r = np.sort(r)[::-1]
        r_LLOnly = r[(r >= LLThreshold)]
        SimOutputLL.append(np.transpose(r_LLOnly))
    SimOutputLL = pd.DataFrame(SimOutputLL).transpose()
    AllLnOutputSim = []
    MedianTop10MethodSim = []
    # For each N, regress the log of the Nth largest loss against year;
    # the slope estimates the (log) inflation rate
    for iRow in range(1, 21):
        a = np.log(SimOutputLL.iloc[iRow])
        AllLnOutputSim.append(a)
        b = linregress(a.index, a).slope
        MedianTop10MethodSim.append(b)
    AllLnOutput.append(AllLnOutputSim)
    MedianTop10Method.append(MedianTop10MethodSim)

# Mean and standard deviation of the inflation estimate across simulations
AllLnOutputdf = pd.DataFrame(AllLnOutput)
OutputMean = []
OutputStdDev = []
dfMedianTop10Method = pd.DataFrame(MedianTop10Method)
for iRow in range(0, 20):
    OutputMean.append(np.mean(np.exp(dfMedianTop10Method[iRow]) - 1))
    OutputStdDev.append(np.std(np.exp(dfMedianTop10Method[iRow]) - 1))

print(OutputMean)
print(OutputStdDev)

plt.plot(OutputStdDev)
plt.xlabel('Nth value to use')
plt.ylabel('Standard Deviation')
plt.title('Standard Deviation vs Nth large loss')
```

[0.050145087623911413, 0.050114063936035846, 0.05008171646318982, 0.05004295735703755, 0.050032712657938, 0.050014427471083624, 0.05001184154389972, 0.05002322239170273, 0.05000059530333641, 0.04998014653293243, 0.05002513041036643, 0.049987258053581535, 0.049978122421359926, 0.04999577129950346, 0.04997788849739706, 0.04992582226827509, 0.04988217239699916, 0.049813271987513466, 0.0497039041388662, 0.04951880036014647]
[0.013745415886517658, 0.011764812681484629, 0.010643862238651042, 0.009893413918701141, 0.00929811434851868, 0.008829621751760679, 0.00852287511743378, 0.00827148148117868, 0.008059106267070865, 0.007896990729065285, 0.0077614481203254, 0.007619604123042998, 0.007507459106587108, 0.00741696242472465, 0.007320604159513552, 0.007224126566330418, 0.0071451595403478, 0.0070669619394482614, 0.007013056254295168, 0.0069269062482146026]

Out[4]:

Text(0.5, 1.0, 'Standard Deviation vs Nth large loss')


Let's check this graph against our previous runs. We know that when looking at the 5th largest loss, i.e. the median of the top 10, we had a standard deviation of around 0.9%. We can see in the above graph, when reading off 5 from the x-axis, that this matches. So far so good.

What is the graph telling us? It's saying that as we increase the value of N, we get a decrease in the standard deviation of our estimate. The reduction is initially quite large: moving from the largest loss by year to the 5th largest cuts the standard deviation by roughly a third, but the improvement then levels off. Going from the 5th largest loss to the 10th largest only results in a further reduction of approx. 15%.
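These reductions can be checked directly from the printed `OutputStdDev` values above; a quick sketch using the first, fifth and tenth entries (rounded):

```python
# Standard deviations of the inflation estimate, read off the run above
sd_1st = 0.01375   # largest loss each year
sd_5th = 0.00930   # 5th largest loss each year
sd_10th = 0.00790  # 10th largest loss each year

reduction_1_to_5 = 1 - sd_5th / sd_1st    # roughly a third
reduction_5_to_10 = 1 - sd_10th / sd_5th  # roughly 15%

print(f"{reduction_1_to_5:.0%}, {reduction_5_to_10:.0%}")  # → 32%, 15%
```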

This is good info to know! It means that we can immediately improve our estimate of claims inflation just by using a lower attachment point. You sometimes see people comparing the 'largest inflated loss' by year as a sense check on their inflation rate, but what this chart is telling us is that the largest loss is actually the most volatile metric you could pick out of this range; even the 2nd largest loss would be an improvement.
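The method being varied here can be distilled into a small helper (my own restatement, not part of the original code): for each year take the Nth largest loss, regress its log against year, and exponentiate the slope.

```python
import numpy as np
from scipy.stats import linregress

def inflation_from_nth(losses_by_year, n):
    """Estimate inflation from the trend in the Nth largest loss per year.

    losses_by_year: list of arrays, one per year, of individual losses.
    n: 1 = largest loss, 2 = 2nd largest, etc.
    """
    nth = [np.sort(yr)[::-1][n - 1] for yr in losses_by_year]
    slope = linregress(range(len(nth)), np.log(nth)).slope
    return np.exp(slope) - 1  # annualised inflation estimate

# With deterministic 5% growth the estimate recovers 5% exactly:
losses = [np.array([3.0, 2.0, 1.0]) * 1.05 ** yr for yr in range(10)]
print(round(inflation_from_nth(losses, 2), 4))  # → 0.05
```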

So why do we get a lower standard deviation of the estimate when we use a higher value of N? I suspect this is driven by the lower volatility inherent in the metric itself: the 10th largest claim in a dataset should be less volatile than the 9th largest, which in turn is less volatile than the 8th, and so on. So as we move down the distribution, we are implicitly reducing the noise in our data, and that makes our estimate more accurate.
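This intuition can be checked directly: simulate many years of 100 lognormal losses (illustrative parameters, roughly matching the severity assumptions above) and compare the coefficient of variation of the 1st, 5th and 10th largest draws. The CV falls as we move down the order statistics.

```python
import numpy as np
from scipy.stats import lognorm

# Illustrative lognormal parameters, close to the Mu/Sigma used above
mu, sigma = 13.3, 1.1

# 5,000 simulated "years" of 100 losses each, sorted descending per year
sims = lognorm.rvs(sigma, scale=np.exp(mu), size=(5000, 100), random_state=42)
sims_sorted = np.sort(sims, axis=1)[:, ::-1]

for n in (1, 5, 10):
    nth = sims_sorted[:, n - 1]
    cv = nth.std() / nth.mean()
    print(f"CV of the {n}th largest loss: {cv:.2f}")
```

The largest loss has the heaviest right tail and hence the highest CV; the 10th largest is noticeably more stable.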

Next time I'm going to examine this phenomenon further. It would be really cool if we could somehow link this improved accuracy to the results we obtained previously around the improvement in accuracy related to the reduction in volatility of the ground-up loss distribution.

So far when using the method, we've simply left the mean and standard deviation of our severity distribution at 1m and 1.5m respectively. When we did so, I made the observation that the standard deviation of our estimated inflation was around 1% with 20 years of data, and 0.5% with 30 years of data. These values are however dependent on the choice of severity standard deviation, so don't consider them to be some sort of universal constants.

In order to understand the effect on the sampling error when varying the volatility of our losses, we're going to have to do some more modelling. In the code below, I loop through an array of CVs ranging from 0.5 to 10 in 0.5 increments, which are then used to vary the severity distribution. We then record the standard deviation of our estimate using a (fixed) 20 years of data and provide the output as a graph.
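The mapping from a target CV to lognormal parameters is the same moment-matching used in the setup cell above; a minimal sketch of the loop's shape (function and variable names are my own):

```python
import numpy as np

def lognormal_params(mean, cv):
    """Moment-match a lognormal to a target mean and coefficient of variation."""
    sigma = np.sqrt(np.log(1 + cv ** 2))
    mu = np.log(mean / np.sqrt(1 + cv ** 2))
    return mu, sigma

# CVs from 0.5 to 10 in 0.5 increments, as described above
for cv in np.arange(0.5, 10.5, 0.5):
    mu, sigma = lognormal_params(1e6, cv)
    # ...re-run the 20-year simulation with these severity parameters
    # and record the standard deviation of the inflation estimate...
```

With `cv = 1.5` this reproduces the Mu and Sigma from the original parameter cell, so the loop simply generalises the setup we have been using all along.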