In which we back-test our previous method, almost accidentally p-hack, and then run a test on how unusual the last 5 years have been.
In our last two posts, we analysed the Swiss Re nat cat Sigma data [1]. We built a simple model to validate a claim that a $300bn year is a 1 in 10 year event (which didn't look like an unreasonable claim), and we looked at how the volatility in annual natural cat claims had changed over time (it had actually reduced). We said last time that we'd do one final post on this data set before moving on: first, checking that our volatility modelling approach was sound; and second, looking at how unusual the last 5 years of losses have been, and whether this is more consistent with noise or with some sort of underlying change.
Since I've been rinsing the Swiss Re data, I thought I'd find a nice photo of Switzerland, and stumbled across this great one of the Weisshorn. Photo: @samferrara
The models we are going to build are quite straightforward, and we'll put both in the same Jupyter notebook (mainly because I still can't figure out how to properly embed two notebooks into the same blog post…).

The modelling part 1 - sanity check

The first model validates the methodology we used in the previous post by running it against synthetic data. To briefly recap, we analysed the change in volatility over time by tracking 5 different metrics: 5 and 10 year CoV, 5 and 10 year MAD, and CDF percentile over time. Intuitively, it certainly felt like if the volatility had reduced, our methodology would have captured that change. But I wasn't 100% comfortable until I ran the same tests on synthetic data in which a change in volatility had been baked in.

What does this look like in practice? We simulate a 30 year window of losses from a loss distribution (which I've picked to have a sensible shape, though the exact parameterisation doesn't really matter) with a constant, known increase in volatility across the 30 years. This time series is simulated 5k times; we then apply the methods from the previous post, tracking the 5 metrics on each time series, and check whether, on average, we detect the volatility increase.

I sometimes run sanity checks like these. It's not really a taught actuarial technique, and you see it much more in machine learning, but when you are using methods that are new to you, or developing your own, it's rarely a bad idea.

The modelling part 2 - are the last 5 years unusual?

The next model we will build attempts to answer the question of whether the last 5 years have been statistically unusual or not. I should probably say that I started to go down a route of effectively p-hacking myself before I had to reassess what I was doing.
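The part 1 sanity check can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook itself: the lognormal shape, the parameter values, and the linear ramp in sigma are all hypothetical stand-ins for a "sensible" loss distribution with a baked-in volatility increase, and only one of the five metrics (5 year CoV) is shown.

```python
import numpy as np

rng = np.random.default_rng(42)

N_SIMS = 5_000   # number of simulated 30-year histories
N_YEARS = 30

# Hypothetical parameterisation: lognormal annual losses whose sigma
# grows linearly over the 30 years, so volatility is known to increase.
mu = np.log(100.0)                        # median loss ~100 (arbitrary units)
sigmas = np.linspace(0.3, 0.6, N_YEARS)   # the baked-in volatility increase

losses = rng.lognormal(mean=mu, sigma=sigmas, size=(N_SIMS, N_YEARS))

def rolling_cov(x, window):
    """Rolling coefficient of variation (std / mean) along the last axis."""
    out = []
    for start in range(x.shape[-1] - window + 1):
        w = x[..., start:start + window]
        out.append(w.std(axis=-1) / w.mean(axis=-1))
    return np.stack(out, axis=-1)

cov5 = rolling_cov(losses, 5)     # shape (N_SIMS, 26)
avg_cov5 = cov5.mean(axis=0)      # average metric path across simulations

# If the method works, the average 5-year CoV path drifts upwards,
# i.e. we recover the volatility increase we baked in.
assert avg_cov5[-1] > avg_cov5[0]
print(f"avg 5yr CoV: start {avg_cov5[0]:.2f}, end {avg_cov5[-1]:.2f}")
```

The same `rolling_cov` scaffold extends naturally to the other metrics (MAD, 10 year windows, CDF percentiles) by swapping the statistic computed on each window.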
My reasoning was: I've got this distribution of 30 years of on-levelled losses, so why not run 5k simulations from this time series and test the probability of having such a low 5 year rolling CoV? I ran this, and it came out at around a 0.2% chance. And I thought, wow, that's interesting, it really is very unusual. But then I realised I'd broken one of the cardinal rules of statistics: you can't test a hypothesis against the same data you used to generate it. Of course my model told me that the CoV in the last 5 years was unusually low, because I only ran the model because I thought it was in the first place. It's the equivalent of being surprised to see 5 red cars in a row, and then building a model to test the probability of seeing exactly 5 red cars in a row. Really, to test whether what I saw was unusual, I would need to model the probability of all unusual things involving multiple cars, something more like seeing 4 or more cars of the same colour in a row, which is probably much more likely. And having seen the nat cat data, I can't then forget it.

So what can we do? At a push I am comfortable testing very generic hypotheses, which is where I settled. With that in mind, in the model below we look at the probability of not having any 'large' years (defined as a year exceeding $300bn on a revalued basis) in the last 5. I tested a few different thresholds, which I don't show below, and the answer was not very sensitive. I chose $300bn as that is roughly a revalued 2017 (HIM) year, which to me feels 'large'.

It turns out that when we run this test, the answer is that it is actually not very unusual. While not definitive, this is one piece of evidence that the low volatility of the last 5 years is just not that remarkable. By inference, had we had a large loss in the last 5 years, the CoV would be much higher. It's partly the absence of large years which has caused the low CoV, and that absence is unremarkable.
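The generic test can be sketched as a simple bootstrap. To keep this self-contained I generate an illustrative stand-in for the 30 years of on-levelled losses from an assumed lognormal; the actual notebook would use the Sigma data itself, and the resulting probability here is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for 30 years of on-levelled annual nat cat
# losses in USD bn -- NOT the actual Sigma data, just a plausible shape.
annual_losses = rng.lognormal(mean=np.log(120.0), sigma=0.5, size=30)

THRESHOLD = 300.0   # a 'large' year: roughly a revalued 2017 (HIM)
N_SIMS = 5_000

# Resample 5-year windows from the empirical 30-year distribution and
# ask: how often does a 5-year stretch contain no large year at all?
samples = rng.choice(annual_losses, size=(N_SIMS, 5), replace=True)
p_no_large = (samples.max(axis=1) < THRESHOLD).mean()

print(f"P(no ${THRESHOLD:.0f}bn+ year in a 5-year window) ~ {p_no_large:.0%}")
```

Because the hypothesis ("no year above a fixed threshold") is generic rather than tailored to the pattern that prompted the question, it is less exposed to the p-hacking trap described above.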
Due to concerns over p-hacking, it's really hard to say anything more concrete than this, but I would note that if we just eyeball the bottom graph, it does seem to support the same conclusion.

Conclusions

We saw from part 1 of the Jupyter notebook that our method of tracking CoV, MAD, and percentiles did work. So that's good: sanity check passed. And then in part 2, we saw that the probability of not having any $300bn+ years in the last 5 was 41%. Lower numbers are more surprising, so in this context 41% is not that remarkable. If we look at the final graph, it's hard not to find ourselves making judgements about the last 5 years, but I have to remind myself that it's a bit weak to draw strong conclusions just from eyeballing a graph.

One of the takeaways I want to leave you with is that it's actually really hard to say anything with conviction about what has happened yet. The real test now, and this would be in line with best statistical practice, is to note the phenomenon as per our current data, and then see if it continues to occur in 2026 and 2027.

[1] sigma 1/2025: Natural catastrophes: insured losses on trend to USD 145 billion in 2025 | Swiss Re
Author: I work as an actuary and underwriter at a global reinsurer in London.
Archives: February 2026