This post is a follow-up to a previous post, which I would recommend reading first if you haven't already:
In our previous modelling, in order to assess how extreme the 2021 German floods were, we compared the consensus estimate for the floods at the time ($6bn insured loss) against a distribution parameterised using historic flood losses in Germany from 1994-2020. Since I posted that modelling, however, as often happens in these cases, the consensus estimate has changed: the insurance press is now reporting a value of around $8.3bn. So what does that do to our modelling and our conclusions from last time?
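For anyone who wants to redo the comparison with the revised figure, the calculation is along these lines. This is a minimal sketch only: the file name is a placeholder, and the lognormal fit is an illustrative assumption of mine rather than necessarily the distribution used in the original modelling.

```python
import numpy as np
from scipy import stats

# Annual insured flood losses for Germany, 1994-2020, in $bn.
# (Hypothetical file name -- the post's own figures aren't reproduced here.)
historic_losses = np.loadtxt("germany_flood_losses_1994_2020.csv", delimiter=",")

# Illustrative choice: fit a lognormal by matching the mean and standard
# deviation of the log-losses.
mu, sigma = np.log(historic_losses).mean(), np.log(historic_losses).std(ddof=1)

# Exceedance probability of the updated consensus estimate.
new_estimate = 8.3
prob_exceed = stats.lognorm.sf(new_estimate, s=sigma, scale=np.exp(mu))
print(f"P(annual loss > ${new_estimate}bn) = {prob_exceed:.2%}")
print(f"Implied return period ~ {1 / prob_exceed:.0f} years")
```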
As I’m sure you are aware, July 2021 saw some of the worst flooding in Germany in living memory. Die Welt currently puts the death toll for Germany at 166.
Obviously this is a very sad time for Germany, but one aspect of the coverage that caught my attention was how much emphasis was placed on climate change when reporting on the floods. For example, the BBC, the Guardian, and even the Telegraph all bring up the role that climate change played in contributing to the severity of the flooding.
The question that came to my mind is: can we really infer the presence of climate change from this one event? The flooding has been described as a ‘1-in-100 year event’, but does this bear out when we analyse the data, and how strong is this as evidence of climate change?
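As a rough sense-check of how much a single such observation can tell us (my own back-of-envelope arithmetic, not a figure from any of the articles): even in a completely stationary climate, the chance of seeing at least one genuine 1-in-100 year flood somewhere in a 27-year record such as 1994-2020 is already around a quarter.

```python
# If each year independently has a 1% chance of a '1-in-100 year' flood,
# the probability of at least one such year in a 27-year record is:
p_at_least_one = 1 - 0.99 ** 27
print(f"{p_at_least_one:.0%}")  # roughly 24%
```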
Image - https://unsplash.com/@kurokami04
David MacKay includes an interesting Bayesian exercise in one of his books. It’s introduced as a situation where a Bayesian approach is much easier and more natural than the equivalent frequentist methods. After mulling it over for a while, I thought it was interesting that MacKay only gives a passing reference to what I would consider the obvious ‘actuarial’ approach to this problem, which doesn’t really fit into either category: curve fitting via maximum likelihood estimation.
On reflection, I think the Bayesian method is still superior to the actuarial method, but it’s interesting that we can still get a decent answer out of the curve fitting approach.
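To show what I mean by that approach, here is a minimal sketch of curve fitting via maximum likelihood in the abstract. The exponential model and the handful of observations are purely illustrative stand-ins of mine, not MacKay’s exercise.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

# Generic shape of the 'actuarial' approach: choose a parametric model, write
# down the negative log-likelihood of the observations, and minimise it.
def neg_log_likelihood(theta, x):
    # Illustrative one-parameter choice: an exponential model with mean theta.
    return -np.sum(stats.expon.logpdf(x, scale=theta))

x = np.array([1.2, 0.7, 3.4, 2.1, 0.4])  # placeholder observations
result = minimize_scalar(lambda t: neg_log_likelihood(t, x),
                         bounds=(1e-6, 100.0), method="bounded")
print(f"Maximum likelihood estimate: {result.x:.3f}")
# For an exponential model this is just the sample mean, 1.56 here.
```

The appeal of the approach is that it needs nothing beyond a likelihood function and an optimiser; the drawback is that it only gives a point estimate rather than a full posterior.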
The book is available free online (link at the end of the post), so I’m just going to paste the full text of the question below rather than rehashing MacKay’s writing:
I received an email from a reader recently asking the following (which, for the sake of brevity and anonymity, I’ve paraphrased quite liberally):
I’ve been reading about the Poisson distribution recently and I understand that it is often used to model claims frequency. I’ve also read that the Poisson distribution assumes that events occur independently. However, isn’t this a bit of a contradiction, given that the policyholders within a given risk profile are clearly dependent on each other?
It’s a good question; our intrepid reader is definitely on to something here. Let’s talk through the issue and see if we can gain some clarity.
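To make the tension concrete, here is a toy simulation (my own illustration, not taken from the email or any particular portfolio). In the first case every policyholder’s claim count is an independent Poisson draw; in the second, a common shock, a bad-weather year say, scales everyone’s frequency together. The two portfolios have the same expected number of claims, but the dependent one is far more volatile than a plain Poisson model would suggest.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy portfolio: 1,000 policyholders, each expected to make 0.1 claims a year.
n_policies, base_rate, n_sims = 1_000, 0.1, 5_000

# Independent world: each policyholder's count is its own Poisson draw,
# so the portfolio total is Poisson(100) and its variance ~= its mean.
independent_totals = rng.poisson(base_rate, size=(n_sims, n_policies)).sum(axis=1)

# Dependent world: a common shock multiplies everyone's frequency in the same
# year (a gamma-mixed Poisson), so the totals are overdispersed.
shock = rng.gamma(shape=2.0, scale=0.5, size=n_sims)  # multiplier with mean 1
lam = base_rate * shock[:, None] * np.ones(n_policies)
dependent_totals = rng.poisson(lam).sum(axis=1)

print(f"independent: mean {independent_totals.mean():.1f}, variance {independent_totals.var():.1f}")
print(f"dependent:   mean {dependent_totals.mean():.1f}, variance {dependent_totals.var():.1f}")
```

The dependent portfolio’s variance comes out many times larger than its mean, which is exactly the kind of behaviour a plain Poisson model cannot capture.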
Financial Year 2020 results have now been released for the top 5 reinsurers and, on the face of it, they don’t make pretty reading. All five reported combined ratios above 100%, i.e. they lost money this year on an underwriting basis. Yet much of the commentary has been fairly upbeat. Commentators have downplayed the headline result and have instead focused on an ‘as-if’ position: how companies performed ex-Covid.
We’ve had comments like the following (anonymised because I don’t want to look like I’m picking on particular companies):
"Excluding the impact of Covid-19, [Company X] delivers a very strong operating capital generation"
“In the pandemic year 2020 [Company Y] achieved a very good result, thereby again demonstrating its superb risk-carrying capacity and its broad diversification.”
Obviously CEOs are going to do what CEOs naturally do - talk up their company and focus on the positives - but is there any merit in looking at an ex-Covid position, or is it a red herring, and should we instead focus strictly on the incl-Covid results?
I actually think there is a middle ground that tries to balance both perspectives, and I’ll elaborate on that method below.
The term ‘exposure inflation’ can refer to a couple of different phenomena within insurance. A friend mentioned a couple of weeks ago that he had been looking up the term in the context of pricing a property cat layer and stumbled on one of my blog posts where I use it. Apparently my post was one of the top search results, and there wasn’t much other useful information out there, but I was talking about a different type of exposure inflation, so it wasn’t much help to him.
So as a public service announcement, for all those people Googling the term in the future, here are my thoughts on two types of exposure inflation:
In which we correct our label encoding method from last time, try out a new algorithm - Gradient Boosted Regression - and finally manage to improve our score (by quite a lot, it turns out).
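For anyone curious what that combination looks like in code, here is a rough sketch; the file name, column names and model parameters are placeholders of mine, not the ones used in the post.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Illustrative sketch only -- "train.csv" and "target" are placeholder names.
train = pd.read_csv("train.csv")
y = train["target"]
X = train.drop(columns=["target"]).fillna(-1)

# Encode categorical columns as integers so the tree-based model can use them.
cat_cols = X.select_dtypes(include="object").columns
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols].astype(str))

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3)
model.fit(X_train, y_train)
print(mean_absolute_error(y_val, model.predict(X_val)))
```

Ordinal codes are a blunt instrument, but tree-based models can usually cope with them, since a split only ever carves the encoded values into groups.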
An Actuary learns Machine Learning - Part 9 - Cross Validation / Label Encoding / Feature Engineering
In which we set up K-fold cross validation to assess model performance, spend quite a while tweaking the model and tuning hyper-parameters, but end up not actually improving it.
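Sketching what that workflow looks like (the file and column names, the choice of a random forest, and the parameter grid are all placeholders rather than the post’s actual setup): each candidate set of hyper-parameters is scored by 5-fold cross validation and the best is kept, so we aren’t fooled by one lucky train/test split.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, GridSearchCV

# Illustrative sketch only -- "train.csv", "target" and the grid are placeholders.
train = pd.read_csv("train.csv").fillna(-1)
y = train["target"]
X = pd.get_dummies(train.drop(columns=["target"]))

# Score every combination in the grid by 5-fold cross validation.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=cv, scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```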
An Actuary learns Machine Learning - Part 8 - Data Cleaning / more Null Values / more Random Forests
In which we deal with those pesky null values, add additional variables to our Random Forest model, but only actually improve our score by a marginal amount.
In which we plot an excessive number of graphs, fix our problems with null values, re-run our algorithm, and significantly improve our accuracy.
In which we start a new Kaggle challenge, try out a new Python IDE, build our first regression model, but most importantly - make these blog posts look much cleaner.
In which we take our final stab at the Titanic challenge by ‘throwing the kitchen sink’ at the problem, setting up another 5 different machine learning models and seeing if they improve our performance (hint: they do not, but hopefully it's still interesting).
In which we do more data exploration, find and then fix a mistake in our previous model, spend some time on feature engineering, and manage to set a new high score.
An Actuary learns Machine Learning - Part 3 - Automatic Testing / Feature Importance / K-fold Cross Validation
In which we don’t actually improve our model, but we do improve our workflow: checking our test score ourselves, analysing the importance of each variable using an algorithm, and then using an algorithm to select the best hyper-parameters.
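A minimal sketch of the variable-importance step (again, the file name, columns and model settings are placeholder assumptions of mine, not the post’s exact setup):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative sketch only -- "train.csv" and "target" are placeholder names.
train = pd.read_csv("train.csv").fillna(-1)
y = train["target"]
X = pd.get_dummies(train.drop(columns=["target"]))

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank the variables by how much the forest relied on each of them.
importances = sorted(zip(X.columns, model.feature_importances_),
                     key=lambda pair: -pair[1])
for name, importance in importances:
    print(f"{name}: {importance:.3f}")
```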
In which we build our first machine learning model in Python, beat our previous Excel model on our first attempt, and then fail multiple times to improve this new model…
I work as an actuary and underwriter at a global reinsurer in London.