
Backtesting inflation modelling - median of top x losses

29/7/2022

I wrote a quick script to backtest one particular method of deriving claims inflation from loss data. I first came across the method in 'Pricing in General Insurance' by Pietro Parodi [1], but I'm not sure whether the method pre-dates the book.

In order to run the method all we require is a large loss bordereau, which is useful from a data perspective. Unlike many methods which focus on fitting a curve through attritional loss ratios, or looking at ultimate attritional losses per unit of exposure over time, this method can easily produce a *large loss* inflation pick - which is important, as large loss and attritional inflation can often be materially different.
[Image: Willis Building and Lloyd's building. Source: @Colin, https://commons.wikimedia.org/wiki/User:Colin]
The code works by simulating 10 years of individual losses from a Poisson-lognormal model and then applying 5% inflation per annum. We then throw away all losses below the large loss threshold, to put ourselves in the position of only having been supplied with a large loss claims listing. Finally we analyse the change over time of the 'median of the top 10 claims'. We select this slightly funky-looking statistic because it should increase over time in the presence of inflation, but by looking at the median rather than the mean we take out some of the spikiness. Since we hardcoded 5% inflation into the simulated data, we are looking to arrive back at this value when we apply the method to the synthetic data.
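To spell out what the regression in the code below is doing (the notation here is mine rather than the book's): if M_t is the median of the top 10 claims in year t and i is the annual inflation rate, then ignoring sampling noise M_t is roughly M_0 * (1 + i)^t, so

    ln(M_t) = ln(M_0) + t * ln(1 + i)

and a linear regression of ln(M_t) against t gives a slope b, from which the inflation estimate is exp(b) - 1. This corresponds to the 'slope' and 'Exp-1' calculations in the final code cell.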
I've pasted the code below, but jumping to the conclusions, here are a few take-aways:
  • The method does work - note the final answer is 5.04%, and with a few more sims this does appear to approach the original pick of 5%, which is good: the method provides an unbiased estimate.
  • The standard deviation is really high - it's of the order of 50% of the mean (2.6% against 5.0%). Assuming a normal distribution, we'd expect 95% of estimates to sit roughly between 0% and 10%, which is a huge range. In practice even 1% of additional inflation in our modelling can cause a big swing in loss cost, so the method as currently presented, on this particular set-up of simulated loss data, is basically useless.
  • The method is thrown off by changes in the FGU claim count - I haven't shown it below, but if you amend the 'ExposureGrowth' value in the code from 0%, the method no longer provides an unbiased estimate. If the data includes growth, it tends to over-estimate inflation, and vice-versa if the FGU claim count reduces over time. Parodi does mention this in the book and offers a work-around, which I haven't included below but will write up another time.
  • It's non-parametric - I do like the fact that it's a non-parametric method. The other large loss inflation methods I'm aware of all involve assuming some underlying probability distribution for the data (exponential, Pareto, etc.).
  • We can probably improve the method - the method effectively ignores all the data other than a single claim per year (the one the code uses as its proxy for the median of the top 10), so we reduce the entire analysis to the rate of change of just 10 numbers. One obvious extension would be to average across the change in multiple percentiles of the distribution; we could also explore other robust statistics (e.g. Parodi mentions trimmed means?). A rough sketch of the multi-percentile idea is included after the main code below, and I'll set it up properly another time to see if we get an improvement in performance.
In [1]:
# numpy / pandas for array handling; scipy.stats provides the frequency and
# severity distributions plus the regression used to extract the trend
import numpy as np
import pandas as pd
from math import exp, log, sqrt
from scipy.stats import lognorm, poisson, linregress
In [2]:
# Severity assumptions: lognormal with mean 1m and standard deviation 1.5m
Distmean = 1000000.0
DistStdDev = Distmean*1.5

# Frequency and exposure assumptions
AverageFreq = 100
years = 10
ExposureGrowth = 0.0

# Moment-match the target mean/std dev to the lognormal mu and sigma parameters
Mu = log(Distmean/(sqrt(1+DistStdDev**2/Distmean**2)))
Sigma = sqrt(log(1+DistStdDev**2/Distmean**2))

# Large loss reporting threshold and the inflation rate we are trying to recover
LLThreshold = 1e6
Inflation = 0.05

# scipy.stats.lognorm parameterisation: s = sigma, scale = exp(mu)
s = Sigma
scale = exp(Mu)
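For reference, the Mu and Sigma lines above are just the standard lognormal moment-matching formulas: if m and v are the target mean and standard deviation, then

    Sigma^2 = ln(1 + v^2/m^2)
    Mu = ln(m) - Sigma^2/2 = ln( m / sqrt(1 + v^2/m^2) )

so the simulated severities have (up to sampling error) the intended mean of 1m and standard deviation of 1.5m.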
In [3]:
MedianTop10Method = []
AllLnOutput = []

for sim in range(5000):

    SimOutputFGU = []
    SimOutputLL = []
    Frequency = []

    for year in range(years):
        # Simulate the year's FGU claim count, allowing for exposure growth
        FrequencyInc = poisson.rvs(AverageFreq*(1+ExposureGrowth)**year, size=1)
        Frequency.append(FrequencyInc)

        # Simulate ground-up severities, apply inflation and sort largest first
        r = lognorm.rvs(s, scale=scale, size=FrequencyInc[0])
        r = np.multiply(r, (1+Inflation)**year)
        r = np.sort(r)[::-1]

        # Keep only the losses above the large loss threshold
        r_LLOnly = r[(r >= LLThreshold)]
        SimOutputFGU.append(np.transpose(r))
        SimOutputLL.append(np.transpose(r_LLOnly))

    # One column per year, losses sorted descending down each column
    SimOutputFGU = pd.DataFrame(SimOutputFGU).transpose()
    SimOutputLL = pd.DataFrame(SimOutputLL).transpose()

    # Log of row 5 (the 6th largest large loss in each year, the code's proxy
    # for the median of the top 10), regressed against year
    a = np.log(SimOutputLL.iloc[5])
    AllLnOutput.append(a)
    b = linregress(a.index, a).slope
    MedianTop10Method.append(b)

AllLnOutputdf = pd.DataFrame(AllLnOutput)

# Convert each regression slope into an inflation estimate: exp(slope) - 1
dfMedianTop10Method = pd.DataFrame(MedianTop10Method)
dfMedianTop10Method['Exp-1'] = np.exp(dfMedianTop10Method[0]) - 1

# Mean and standard deviation of the estimate across the 5,000 sims
print(np.mean(dfMedianTop10Method['Exp-1']))
print(np.std(dfMedianTop10Method['Exp-1']))
0.050423461401442896
0.02631028930074786
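As a rough sketch of the multi-percentile extension mentioned in the take-aways above (this is my own illustration rather than Parodi's suggested work-around; the function name and the choice of ranks are arbitrary), we could fit the same log-regression at several ranks near the top of the distribution and average the slopes:

def multi_rank_inflation(SimOutputLL, ranks=(3, 5, 7, 10)):
    # SimOutputLL: one column per year, losses sorted descending down each
    # column, as built in the simulation loop above. Each rank must be smaller
    # than the number of large losses in every year, otherwise a NaN will
    # poison the regression.
    slopes = []
    for k in ranks:
        stat = np.log(SimOutputLL.iloc[k])               # (k+1)-th largest loss in each year
        slopes.append(linregress(stat.index, stat).slope)
    return np.exp(np.mean(slopes)) - 1                   # average slope -> inflation estimate

# Example (relies on the imports above and the last simulated SimOutputLL):
# multi_rank_inflation(SimOutputLL)

Averaging the slopes rather than the individual inflation estimates is an arbitrary choice on my part, and whether this actually reduces the standard deviation of the estimate is something to test properly another time.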
[1] Pietro Parodi, Pricing in General Insurance, Chapman and Hall/CRC, ISBN 9781466581449.

