Aggregating probability forecasts
18/3/2022

There's some interesting literature from the world of forecasting and the natural sciences on the best way to aggregate predictions from multiple models or sources. For a well-written, moderately technical introduction, see the following by Jaime Sevilla:

forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds

Jaime’s article suggests the geometric mean of odds as the preferred method of aggregating predictions. When it comes to actuarial pricing, however, I'm more of a fan of the arithmetic mean. I'll explain why below.

Problem statement
Let’s set up a mini problem so we’ve got some concrete numbers to discuss. Suppose we've been provided with two cat model outputs: the first gives a 0.1% chance of a 100m loss (1 in 1,000), the other a 2% chance of a 100m loss (1 in 50). How should we go about combining the models to come up with a single aggregate prediction?

The obvious first thing to try would be to just average them:

(0.1% + 2%) * 0.5 = 1.05%

This is known as the arithmetic average. If we knew one of the models was more accurate (perhaps it's outperformed in the past), we could extend this to a weighted arithmetic average, for example with a 60% weight on the first model:

0.1% * 0.6 + 2% * 0.4 = 0.86%

But Jaime argues (as supported by much literature) that a better method is to take the geometric average of the odds. [1]

Geometric average of odds

The geometric average of odds works as follows:

The odds of the first model are 0.1% / (1 - 0.1%) = 0.1%
The odds of the second model are 2% / (1 - 2%) = 2.04%
The geometric average of odds = sqrt(0.1% * 2.04%) = 0.45%

At values this small, converting the odds back to a probability (odds / (1 + odds)) makes almost no difference, so the aggregate probability is also about 0.45%. So this method gives a lower estimate than above (0.45% vs 1.05%). Another way of thinking of this is to say that it gives much more weight to the first model: it is the equivalent of a weighted arithmetic average which gives about an 80% weighting to the 0.1% pick. If we play around some more and change the first model to a 0.01% chance, the geometric average of odds drops to 0.143%, quite a big move.

The problem with the arithmetic mean

So why is the geometric mean of odds preferred? Firstly, it outperforms in some empirical studies (geopolitical forecasting when assessed against a Brier score [3]). Secondly, it also outperforms in simulation studies [2]. Thirdly, it has desirable theoretical properties, such as ‘external Bayesianity’ [2].

The ‘downside’ of the arithmetic mean can be made clear with a simple example. Consider the difference between the aggregate prediction for (0.1%, 2%) and for (0.01%, 2%).
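As a rough check on the numbers quoted so far, here is a minimal Python sketch that applies both pooling rules to the two pairs of model outputs. The function names (`prob_to_odds`, `odds_to_prob`, `arithmetic_mean`, `geometric_mean_of_odds`) are my own illustrative choices rather than anything taken from Jaime's article, and I've assumed the pooled odds are converted back to a probability at the end, which changes very little at values this small.

```python
import math


def prob_to_odds(p):
    """Convert a probability to odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(o):
    """Convert odds back to a probability, o / (1 + o)."""
    return o / (1.0 + o)


def arithmetic_mean(probs, weights=None):
    """Weighted arithmetic mean of probabilities (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)
    return sum(w * p for w, p in zip(weights, probs))


def geometric_mean_of_odds(probs):
    """Equal-weight geometric mean of the odds, returned as a probability."""
    # Averaging log-odds and exponentiating is the geometric mean of the odds.
    log_odds = [math.log(prob_to_odds(p)) for p in probs]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return odds_to_prob(pooled_odds)


# The two pairs of cat model outputs from the example: (0.1%, 2%) and (0.01%, 2%).
for pair in [(0.001, 0.02), (0.0001, 0.02)]:
    am = arithmetic_mean(pair)
    gm = geometric_mean_of_odds(pair)
    print(f"{pair}: arithmetic = {am:.3%}, geometric mean of odds = {gm:.3%}")
```

Run as written, this should show the arithmetic mean barely moving between the two pairs (about 1.05% against about 1.005%), while the geometric mean of odds falls from roughly 0.45% to roughly 0.14%.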
Both pairs average to basically 1% (1.05% and 1.005%), but the reasoning goes that in the second pair model one is signalling a significantly lower probability (by a factor of 10), so shouldn’t this move our aggregate prediction by more than the tiny amount given by the arithmetic average?

Why this doesn’t work in actuarial modelling

The issue with applying this reasoning to actuarial modelling is that often (but not always) predictions of 0.01% are essentially meaningless. In the context of a cat model, 0.01% is a once-in-ten-thousand-year event, something that has happened maybe once in recorded history. How would you ever know if this prediction was well calibrated? These are very complex models with many moving pieces. I’ve seen examples of cat models where the probability of a given level of loss moves by a factor of 10-20, just based on a few parameter selections and how the portfolio of risks is coded.

As well as tails being far from perfectly credible, in my experience they tend to err in the direction of being chronically underweight. An arithmetic average, precisely because of its lack of sensitivity to changes in the extreme tail, is going to be more robust against this type of tail underestimation.

Let’s go back to our toy example. Our two models give a 0.01% chance and a 2% chance of a 100m loss, so it’s a pretty brave move to select 0.143% as your aggregate pick. If we do so, and then sell a cat XOL layer at a 0.33% GROL, we might think we’ve written to a 43% GLR (0.143% / 0.33%). But what if the actual loss cost is 0.75%? I would argue this is perfectly consistent with the two modelled outputs we’ve received (0.01% and 2%). Then we’d actually be writing to roughly a 230% GLR (0.75% / 0.33%).

[1] https://forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds
[2] https://link.springer.com/article/10.1007/s11004-012-9396-3
[3] https://www.sciencedirect.com/science/article/abs/pii/S0169207013001635?subid1=2021083103475383a803d0425dbe2b3b&via%3Dihub
Author

I work as an actuary and underwriter at a global reinsurer in London.