On reflection, I think the Bayesian method is still superior to the actuarial method, but it’s interesting that we can still get a decent answer out of the curve fitting approach.

The book is available free online (link at the end of the post), so I’m just going to paste the full text of the question below rather than rehashing MacKay’s writing:

I’m going to describe the actuarial method by giving a worked example. Let’s generate some sample data so we have something concrete to talk about. I’m using a Jupyter notebook to do this, and all code is written in Python. Note that below I’m having to use the truncated exponential distribution, as we have no information about particles that decay more than 20cm from the origin. If we knew about such particles, but only that they had a value > 20, then this would be right censoring, and we would still use the exponential distribution. Truncation, where we don’t even know about the existence of such particles, is subtly different and requires a different distribution.

In [1]:

```python
from scipy.stats import truncexpon
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline
```

In [2]:

```python
# Generate 500 samples from a truncated exponential with scale = 10
# (the post's 'lambda'), truncated at b * scale = 20
b = 2
scale = 10
r = truncexpon.rvs(b, loc=0, scale=scale, size=500, random_state=42)
```

In [3]:

```python
# Plot a histogram of the random sample
fig, ax = plt.subplots(1, 1)
ax.hist(r, bins=50)
plt.show()
```

In [4]:

```python
# The histogram looks like exponential decay, as expected.
# Let's fix b = 2 and loc = 0, then solve for lambda:
exponfit = truncexpon.fit(r, fb=2, floc=0)
exponfit
```

Out[4]:

(2, 0, 9.780167769576435)

So we managed to derive a pretty close approximation to the true lambda by fitting the truncated exponential (10 vs 9.78). With a high enough sample size, I’m sure our estimate would converge to the true value. So we’ve answered the question, right? Why should we bother with MacKay’s Bayesian method at all?

If we wish to say things probabilistically about lambda (such as: what is the probability that lambda = 11 given the data?), then the actuarial approach gives us no real information to go on. All the actuarial approach does is spit out a single ‘best estimate’ based on maximum likelihood estimation. The Bayesian approach, when supplied with an appropriate prior distribution, actually gives us the full posterior distribution of lambda given the data. So if we wish to say anything more about lambda, we definitely do need to use the Bayesian method.
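To sketch what the Bayesian approach buys us, here is a minimal grid approximation of the posterior for lambda. The flat prior, the grid, and its range are my own illustrative choices here, not MacKay's:

```python
import numpy as np
from scipy.stats import truncexpon

# Simulated decay data truncated at 20, as in the worked example
r = truncexpon.rvs(2, loc=0, scale=10, size=500, random_state=42)

# Grid of candidate lambda (scale) values, with a flat prior over the grid
lambdas = np.linspace(5, 20, 301)

# Log-likelihood of the data for each candidate lambda;
# the truncation point stays fixed at 20, so the shape parameter is 20 / lam
log_lik = np.array([truncexpon.logpdf(r, 20 / lam, loc=0, scale=lam).sum()
                    for lam in lambdas])

# Normalise to a posterior over the grid (the flat prior cancels)
posterior = np.exp(log_lik - log_lik.max())
posterior /= posterior.sum()

lam_map = lambdas[np.argmax(posterior)]  # posterior mode, close to the MLE
```

From `posterior` we can then read off statements like "the probability that lambda lies between 9 and 11 given the data", which a single maximum likelihood point estimate cannot give us.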

I like the fact that the actuarial approach gets you where you need to go without having to use complex frequentist methods (no selecting bins and building histograms, no inventing complex estimators and showing they converge, etc.)

[1] ‘Information Theory, Inference and Learning Algorithms’ - http://www.inference.org.uk/mackay/itila/book.html

It’s a good question; our intrepid reader is definitely on to something here. Let’s talk through the issue and see if we can gain some clarity.

Let’s start with the parts that we definitely know to be true and work outwards from there. Firstly, the Poisson distribution is indeed often used to model claims frequency. It is probably the single most common distribution used for the purpose. Second up - the distribution does assume that events occur independently (more precisely, it assumes that the probability of a given event occurring is independent of other events occurring, and that the rate at which events occur is independent of any occurrence). And third up - this independence property often does not bear out in reality; many real-world situations involve series of dependent events.

Any time this independence assumption does not apply in real life, using a Poisson distribution can introduce issues into the modelling.

**So why still use it?**

You might ask - so why do people still use the Poisson distribution?

There are a few reasons, but I think the biggest two are:

· It’s often simpler to use the Poisson than the alternatives. The Negative Binomial, for example, does not require this type of strict independence; however, unlike the Poisson, it has two parameters and is a little bit fiddly to use in Excel (the Poisson has only one parameter and is fairly straightforward to use in Excel).

· In certain situations it may not affect the answer much anyway. If, for example, you are only offering a single limit of cover, it might be the case that one loss would completely erode your limit anyway, so you are not concerned about the possibility of multiple linked losses occurring.

**When should I not use a Poisson Distribution?**

There are certain situations where you would definitely want to be very careful when using a Poisson for claims frequency. One such example would be when modelling US windstorms. It is widely believed that these events exhibit some form of clustering within a given year; this is borne out empirically, and it also follows from some simple reasoning about the process which generates the windstorms. Empirically, we can see the clustering by just noting a few examples of years with multiple extreme windstorms - 2017 had Harvey, Irma, and Maria; 2005 had Katrina, Rita, and Wilma. These types of years should be very uncommon if a Poisson Distribution were a good fit for the underlying process. The argument from climate modelling goes as follows: US windstorms occurring within a given season will all be generated by the same (or at least similar) climatic conditions occurring in that year (El Nino, etc.). Therefore, if conditions are conducive to extreme windstorms, you're quite likely to end up with a few of them in that year.

So US Windstorms - we should probably not use a Poisson Distribution.
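One quick diagnostic here: for a Poisson distribution the mean and variance are equal, so a variance-to-mean ratio well above 1 in the annual counts suggests clustering. A minimal sketch, using made-up annual windstorm counts rather than real data:

```python
import numpy as np

# Hypothetical annual counts of extreme US windstorms (illustrative only)
counts = np.array([0, 0, 1, 0, 3, 0, 0, 1, 0, 3, 0, 1, 0, 0, 2])

mean = counts.mean()
var = counts.var(ddof=1)  # sample variance

# For a Poisson process, mean ≈ variance; a ratio well above 1 points
# towards clustering / overdispersion
dispersion_ratio = var / mean
```

On clustered data like this, `dispersion_ratio` comes out noticeably above 1, which is a hint that a Negative Binomial (or similar) may be a better choice than the Poisson.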

**Dependent estimation**

We do have to be a little bit careful, when we are throwing around the term ‘independent’, that we are referring to the exact type of independence that the Poisson Distribution requires. There are certain dependency structures that exist in reality which would not disqualify the Poisson Distribution from our modelling. A conceptual confusion that can arise is that we are actually okay with dependency in the *estimation* of claims frequency for a given block of business, and this would not contradict the assumption underlying the Poisson Distribution.

Let’s give an example to make it more concrete. Suppose I am modelling the frequency of breakdowns for a subset of a motor book (perhaps all policyholders who own a Toyota Prius in a given postcode). Then our estimation of the expected number of breakdowns for a given policyholder within that group would depend on how many breakdowns had occurred in the wider group (assuming we are basing our estimate on the wider dataset), so our estimate of the number of breakdowns for this particular policyholder is clearly not independent of the number of breakdowns of the other policyholders.

In fact, it's not that our estimate is just 'weakly correlated' with the other policyholders' - it is directly derived from their experience. But that does not mean we can't use a Poisson Distribution! This particular dependency structure does not contradict any of the assumptions of the Poisson Distribution. So the point to bear in mind is that in order to use the Poisson Distribution we don’t require that every single thing be independent of every other possible thing; we require only the *specific type of independence* given in the opening section above.
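A minimal sketch of the distinction, with made-up numbers: every policyholder's estimated rate below is derived from the whole group's experience, yet the claim counts themselves remain independent Poisson draws, which is the only independence the model requires:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(42)

# 1000 policyholders sharing a common (unknown) true breakdown rate
lam_true = 0.3
claims = rng.poisson(lam_true, size=1000)

# Each individual's estimated rate is derived from the whole group...
lam_hat = claims.mean()

# ...so the ESTIMATES are maximally dependent on one another, but the claim
# COUNTS are still independent Poisson draws - which is all the model assumes
prob_no_claims = poisson.pmf(0, lam_hat)  # e.g. chance a policyholder has no breakdowns
```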

**Conclusion**

I hope that helps clarify a little why the Poisson Distribution is used in actuarial modelling, and some of the limitations we should be aware of. If you’ve got any further questions, then please feel free to drop me an email using the address on the right.


We’ve had comments like the following,

> Obviously CEOs are going to do what CEOs naturally do - talk up their company, focus on the positives - but is there any merit in looking at an ex-Covid position, or is this a red herring and should we instead be focusing strictly on the incl-Covid results?

I actually think there is a middle ground we can take which tries to balance both perspectives, and I’ll elaborate that method below.

First let's collate the FY 2020 results. The table below contains the high-level figures which I’ve pulled from individual annual reports:

I’m going to suggest that a natural and useful way of looking at a reinsurer’s combined ratio is to strip out the effects of large one-off events, but then add them back on as a loading, spread across multiple years. This applies not just to Covid, but to any super-cat event (let’s say anything in excess of 50bn to the market). To see how this would work, let’s add another column to our table representing the ex-Covid combined ratio, and also a column with the amount that Covid added to the combined ratio.

So interestingly, Munich Re had the best CR ex-Covid, but were also the worst affected by Covid. SCOR on the other hand were not as heavily hit by Covid, but then had an ex-Covid CR which was not as strong.

The step we are going to take now is to add back on the effect of Covid, but on an amortised basis. The idea being that a Covid style event will not happen every year, but pandemics have happened periodically through human history, and will undoubtedly happen again. What’s our estimate of how frequently they will occur? Well, we’ve had two serious pandemics in the last 100 years – Covid and Spanish Flu, so let’s just go with 1 in 50. It’s an interesting thought experiment trying to pick this number, and you can easily make arguments that it should be higher or lower than this pick, but we’ll use 1 in 50 for the time being.

What does this do to our table? Let’s create a normalised CR, which is equal to the ex. Covid CR but with a 1-in-50 loading of the ‘Covid effect’.
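The arithmetic of the normalisation is simple enough to sketch. The figures below are illustrative placeholders, not the reported numbers from the table:

```python
# Illustrative figures only, not actual reported results
cr_incl_covid = 105.6   # reported combined ratio, %
covid_effect = 10.0     # points of combined ratio attributed to Covid

# Strip Covid out, then add it back as a 1-in-50 annualised loading
cr_ex_covid = cr_incl_covid - covid_effect
return_period = 50
cr_normalised = cr_ex_covid + covid_effect / return_period
```

With these placeholder numbers a 10-point Covid hit becomes a loading of only 0.2 points per year, which is the "doesn't add much" effect discussed below.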


So we see that putting Covid in as a 1-in-50 really doesn’t add much to a normalised CR. If you’re going to take a 10% hit from Covid once every 50 years, that’s actually fairly manageable. Is this the end of the story then, case closed? Not quite…

If we are going to amortise Covid down to its proper return period, we should really add back other major events which did not occur this year as a normalised loading. For example, HIM in 2017 hit the major Reinsurers for something like a 15%-30% additional loss ratio. If we call HIM - or a US WS of a similar magnitude - a 1 in 10 (another pick that could be debated further), then this would add between 1.5% and 3% to our normalised CR.

Another loading we should consider is the effect of a potential 400bn super-cat (from cyber/ US Windstorm/any other source), let’s call this a 1-in-100, and assume that it would be 4 times more costly than HIM (which we’ll say was a 100bn to the market). Then this would add between 60%-120% to loss ratios approx. once every 100 years, i.e. 0.6%-1.2% annualised.
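Putting the three loadings together, each annualised as (effect on the combined ratio) / (return period), the calculation runs along these lines, again with illustrative placeholder figures:

```python
# Illustrative ex-Covid combined ratio (%), not an actual reported figure
cr_ex_covid = 95.6

# Annualised loadings: effect on the CR divided by the return period
covid_loading = 10.0 / 50       # 1-in-50 pandemic adding ~10 points
him_loading = 20.0 / 10         # 1-in-10 HIM-style event, mid-point of the 15%-30% range
supercat_loading = 80.0 / 100   # 1-in-100 400bn event, mid-point of the 60%-120% range

cr_normalised = cr_ex_covid + covid_loading + him_loading + supercat_loading
```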

Here is how our normalised CR looks adding these two styles of event on top of our Covid effect.


Now we are seeing normalised CRs which, for everyone except Munich, are probably above long-term targets. Obviously, this is just a sketch of a method; there are a large number of tweaks that would need to be made in practice. For example, each reinsurer is going to react differently to a 100bn cat event, whereas we’ve just used a flat 2% for all reinsurers. One method would be to derive loss ratio ‘additions’ based on individual company losses from past events - so, for example, try to infer information about Hannover’s US WS exposure from how they were affected by HIM, and use this when making a prospective pick for Hannover. Reinsurers are also going to react differently depending on where geographically the loss occurs (US WS vs JP WS, etc.). And moreover, none of these losses are operating within a static system either - if a HIM-style loss occurred again tomorrow, I’m sure the losses for each reinsurer would look different to their 2017 results, depending on the action they took post-loss to tweak their underwriting focus and controls.

**What’s your surplus?**

The final concept I want to touch on is that since these are financial year results, we should be careful in how we treat the ex-cat CR. Each reinsurer is going to be sitting on a different surplus (or deficit) in their reserves, and they will each be in a different stage of addressing or releasing this.

Let’s say one of the reinsurers has a lower ex-cat CR than another; here are four scenarios which are all plausible on the face of it:

- The company with the lower CR is releasing surplus reserves from prior years, and the other company does not have a similar surplus in their reserves to draw upon.
- As above, except the other company does have a surplus in reserve, but they are choosing not to draw upon it yet.
- The company with the lower CR is genuinely writing more profitable business.
- The company with the higher CR has written a large amount of new business, which has been booked to a conservative position in respect of IBNR, and which is expected to be revised down in subsequent years.

And what if the correct answer is some combination of the above?

**Conclusion**

The thought I’d like to conclude on is that when analysing these results, the important point to remember is that super-cat events will occur. In the years they occur, reinsurers will put up bad results, and this is not necessarily the end of the world. However, if we strip out the effect of these styles of event, then we need to remember that the ex-supercat CR also needs to contain enough margin to pay for the amortised cost of the super-cats and still leave us some profit.

So as a public service announcement, for all those people Googling the term in the future, here are my thoughts on two types of exposure inflation:

**Version 1 - Exposure inflation in the context of rate change and premium.**

This context arises when we are thinking about the effects of rate change from one year to the next.

For a fuller explanation, here is a link to the post which my friend found (which was not actually the type of exposure inflation he was looking for):

https://www.lewiswalsh.net/blog/should-i-inflate-my-loss-ratios

The short version is that if your measurement of rate change is the year-on-year change in the metric (premium / exposure), then when you apply this rate change % to a loss ratio from a prior year, you do not then need to apply claims inflation. If you do apply claims inflation to a loss ratio, then you need to apply a corresponding exposure inflation to the premium figure. A full explanation of why can be found through the link above.

As an aside, the metric (premium / exposure) is called the ‘rate’, so I guess this is the origin of the term ‘rate change’ - we are measuring the % change in ‘rate’ from one year to the next. Honestly, this only just clicked for me…
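To make the ‘rate’ terminology concrete, here is a minimal sketch with made-up premium and exposure figures:

```python
# Made-up figures for two consecutive years
premium_prior, exposure_prior = 1000.0, 100.0
premium_current, exposure_current = 1155.0, 105.0

# 'Rate' = premium per unit of exposure
rate_prior = premium_prior / exposure_prior        # 10.0
rate_current = premium_current / exposure_current  # 11.0

# 'Rate change' = the % change in rate year on year
rate_change = rate_current / rate_prior - 1        # 0.10, i.e. +10%
```

Note that premium grew 15.5% here, but because exposure also grew 5%, the rate change is only 10% - which is exactly the distinction the post linked above is about.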

**Version 2 – Exposure inflation in the context of aggregate reinsurance.**

This context arises when you are pricing an aggregate (re)insurance contract or ILS layer, and exposure has increased over time.

To make it concrete, I’m going to work through an example. Let’s say you’re looking at a property cat non-proportional reinsurance layer, which attaches at 100, and has a limit of 100 (in whatever units you like) and you are given the following loss history (which I just made up but is realistic enough to be useful).


We’ll suppose that ‘Exposure’ is measured in something like ‘total house years on risk’ or ‘annual number of policies’, or some other measure which is not itself subject to inflation. If your exposure metric is measured in dollars and is itself subject to inflation, then you would need to on-level that before using it, but for the sake of not over-complicating this example, I’m going to assume we don’t need to make this adjustment.

We can see that for this account, exposure has increased substantially in the last 10 years, and is set to increase by another 10% in 2021. We also see that the Agg Loss by year appears to be increasing roughly in line with exposure, which is what we would expect if we have selected a sensible exposure metric, and when looking at an aggregate contract.

If we wanted to calculate the burning cost for this contract, then we can do so fairly easily in a Spreadsheet:


But we’re not done yet, now we need to think about making a few adjustments – firstly we normally expect some form of claims inflation. Since we’re talking property let’s chuck in 3% compound:

And now we need to inflate the losses for the increase in exposure. We noted earlier that losses seem to be increasing over time as exposure increases, and since 2021 has a higher exposure than any prior year, we’re going to have to increase our historic losses in line with this before using them to price the prospective policy period. Let’s apply a proportionate increase to all losses in line with the increase in exposure between the year the loss occurred and 2021.
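Since the loss table itself isn't reproduced here, the sketch below uses hypothetical losses and exposures, but it follows the same three steps: trend each year's aggregate loss for claims inflation, scale it for the growth in exposure (this being an aggregate contract), then apply the layer terms:

```python
# Hypothetical aggregate losses and exposures by year (illustrative only)
years    = [2016, 2017, 2018, 2019, 2020]
agg_loss = [60.0, 150.0, 90.0, 120.0, 140.0]
exposure = [60.0, 70.0, 80.0, 90.0, 100.0]
exposure_2021 = 110.0   # prospective year's exposure

attach, limit = 100.0, 100.0
inflation = 1.03        # 3% compound claims inflation

layer_losses = []
for yr, loss, expo in zip(years, agg_loss, exposure):
    trended = loss * inflation ** (2021 - yr)  # claims inflation
    scaled = trended * exposure_2021 / expo    # exposure inflation (agg contract)
    layer_losses.append(min(max(scaled - attach, 0.0), limit))

burning_cost = sum(layer_losses) / len(layer_losses)
```

As in the post's own table, the adjusted burning cost comes out far higher than the naive average of unadjusted layer losses, because every historic year is brought up to the 2021 exposure and price level before the attachment is applied.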

The rightmost column represents our final answer for the burning cost - and it makes quite a difference! Our average loss to the layer has gone up from just 10 in the version with no adjustment, to 39 in the version with both adjustments.

So what does all this have to do with Exposure inflation? ‘Exposure inflation’ refers to the increase in the size of loss when we scale up for the increase in exposure. Just as we scaled the individual losses in line with claims inflation, we also scaled the individual (aggregate) losses in line with the exposure increase.

Note in the above example we were pricing an *aggregate* contract, i.e. we applied the limit and attachment to the aggregate losses for the year. __If this had been a per risk or per event XoL layer, then we would not inflate losses in line with an increase in exposure__. For per risk or per event contracts, an increase in exposure would generally increase the *frequency* of losses to the layer, rather than the size of the loss to the layer. The correct adjustment in that case is to calculate the loss to layer on a non-exposure-adjusted (but inflated) basis, and then scale up the final loss to layer in line with the increase in exposure over the period. Note that in that context I would not normally expect that adjustment to be called ‘exposure inflation’; I’d simply call it ‘adjusting for an increase in exposure’. You might think exposure inflation is a sensible thing to call it, but trust me, the people you are talking to might get confused.

I like to think about this distinction - applying the exposure increase to the losses and then calculating recoveries for aggregate contracts, vs. calculating recoveries first and then applying the exposure increase - through the lens of the collective risk model (also known as frequency-severity modelling). Unlike the burning cost method, the collective risk model has solid theoretical foundations, and we can think of a burning cost as a simplified approximation to the CRM. When setting up a CRM for a per risk or per event contract, we would account for an increase in exposure through an adjustment to the frequency parameter rather than the severity parameter. The increase to frequency would then have a proportionate effect on the modelled loss to layer calculated from the severity component. Whereas for an agg layer, under the CRM we would apply the exposure increase to the severity component. This treatment provides us with our theoretical justification for how to approach the same problem in a burning cost model.
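The order-of-operations point can be shown in a few lines. With hypothetical numbers (50% exposure growth, a single trended loss of 130 against a 100 xs 100 layer), the aggregate treatment scales the loss before applying the layer, while the per-event treatment applies the layer first and scales the result:

```python
attach, limit = 100.0, 100.0
exposure_factor = 1.5   # 50% growth in exposure (hypothetical)
loss = 130.0            # a single historic, already-trended loss (hypothetical)

def to_layer(x):
    # Recovery from a non-proportional layer: limit xs attach
    return min(max(x - attach, 0.0), limit)

# Aggregate contract: exposure scales the loss itself, then apply the layer
agg_recovery = to_layer(loss * exposure_factor)        # to_layer(195) = 95

# Per risk / per event contract: apply the layer first, then scale for the
# increased frequency of such losses
per_event_recovery = to_layer(loss) * exposure_factor  # 30 * 1.5 = 45
```

The two treatments give very different answers for the same inputs, which is why it matters which kind of contract you are pricing.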


This is going to be our final post on the Kaggle House Price challenge, so with that in mind let's quickly recap what we've done in the previous 4 posts. This challenge was our first attempt at a regression problem as opposed to a classification problem, but to be honest the workflow has felt very similar.

We defaulted to using a Random Forest algorithm as it's known to produce pretty good results out of the box, and we'd had experience of it before. Interestingly our best model was simply the version in which we fed all the features into the algorithm rather than trying to select which features to use. This has been a bit of a revelation over the last few weeks. If we are choosing the right algorithms, feature selection can easily be handled by the models themselves, and it appears they can do a better job than I can.

I looked into whether we could expect an improvement in predictive performance by removing outliers, removing collinearity, or normalising the data, but largely the info I read online suggested Random Forests don't gain much predictive performance from any of the above - and any time you process the data, there's a chance information is lost, which can outweigh the benefits. There would be other ancillary benefits besides predictive increases - run speed, interpretability, etc. - but in terms of raw predictive power, not so much.

So what's in store for the final post? Firstly, I want to try to correct an issue I think we introduced last time. When we carried out our label encoding, we did not order the values properly, and I think this could have removed some useful structure from the model. For example, in the training data the feature 'Basement Quality' originally had strings including 'Good', 'Average', etc., which we label encoded to the integers 1-5, but not necessarily in the right order. So the first order of business is to fix this and encode in a sensible order (Excellent = 1, Good = 2, etc.).
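A minimal sketch of ordered label encoding with pandas, following the Excellent = 1, Good = 2 ordering described above. The column name and the abbreviated quality codes are illustrative stand-ins for the dataset's actual values:

```python
import pandas as pd

# Map quality strings to an ordered integer scale
# (column name and codes are illustrative)
quality_order = {'Ex': 1, 'Gd': 2, 'TA': 3, 'Fa': 4, 'Po': 5}

df = pd.DataFrame({'BsmtQual': ['Gd', 'TA', 'Ex', 'Fa', 'Gd']})
df['BsmtQual'] = df['BsmtQual'].map(quality_order)
```

The direction of the ordering doesn't matter much to a tree-based model; what matters is that adjacent integers correspond to adjacent quality levels, so the ordinal structure survives the encoding.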

The second and final task for today is to try out a new algorithm. I thought about building a linear model of some description, but I decided to stick with another tree-based model this time. The model we are going to use is Gradient Boosted Regression from the sklearn package. It turns out this model immediately gives us a performance boost, and we end up climbing quite a few places on the leaderboard.
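For anyone who wants to replicate the switch of algorithm, it's a drop-in replacement of the estimator. Here is a sketch on synthetic data (the real notebook would use the Kaggle training set instead):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the house price data
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Gradient boosted trees, used exactly like the Random Forest estimator
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on held-out data
```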