Aggregating probability forecasts

18/3/2022

There's some interesting literature from the world of forecasting and natural sciences on the best way to aggregate predictions from multiple models/sources.

For a well-written, moderately technical introduction, see the following by Jaime Sevilla:
forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds

Jaime’s article suggests the geometric mean of odds as the preferred method of aggregating predictions. When it comes to actuarial pricing, however, I'm more of a fan of the arithmetic mean. I'll explain why below.

Problem statement

Let’s set up a mini problem so we’ve got some concrete numbers to discuss. Suppose we've been provided with two cat model outputs - the first gives a 0.1% chance of a 100m loss (1 in 1,000), the other a 2% chance of a 100m loss (1 in 50). How should we go about combining the models to come up with a single aggregate prediction?

The obvious first thing to try would be to just average them -> (0.1% + 2%) * 0.5 = 1.05%. This is known as the arithmetic average.

If we knew one of the models was more accurate (perhaps it's outperformed in the past), we could extend this to a weighted arithmetic average -> 0.1% * 0.6 + 2% * 0.4 = 0.86%
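As a quick check of the arithmetic, here is a minimal Python sketch of both averages. The function name `weighted_arithmetic_mean` and the 60/40 weights are purely illustrative; nothing here comes from a particular library.

```python
def weighted_arithmetic_mean(probs, weights=None):
    """Pool probabilities with a weighted arithmetic mean (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, probs))


model_probs = [0.001, 0.02]  # 0.1% and 2% from the two cat models

equal_pick = weighted_arithmetic_mean(model_probs)
weighted_pick = weighted_arithmetic_mean(model_probs, [0.6, 0.4])

print(f"equal weights: {equal_pick:.3%}")     # 1.050%
print(f"60/40 weights: {weighted_pick:.3%}")  # 0.860%
```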

But Jaime argues (as supported by much of the literature) that a better method is to take the geometric average of the odds. [1]

Geometric average of odds

The geometric average of odds works as follows:

The odds of the first model are -> 0.1%/(1-0.1%) = 0.1%
The odds of the second model are -> 2%/(1-2%) = 2.04%

The geometric average of odds = sqrt(0.1% * 2.04%) = 0.45%
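The same calculation as a small Python sketch, generalised to any number of models. The helper names are my own, and I've added a final step converting the pooled odds back to a probability, which at these small values barely changes the number.

```python
import math


def to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def geometric_mean_of_odds(probs):
    """Pool probabilities via the geometric mean of their odds; returns a probability."""
    log_odds = [math.log(to_odds(p)) for p in probs]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return to_prob(pooled_odds)


pooled = geometric_mean_of_odds([0.001, 0.02])
print(f"{pooled:.2%}")  # 0.45%
```

Working in log odds makes it clear what the method is doing: it is just an arithmetic average, but taken on the log-odds scale rather than the probability scale.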

So this method gives a lower estimate than above (0.45% vs 1.05%). Another way of thinking of this is to say that it gives much more weight to the first model, the equivalent of a weighted average which gives about an 80% weighting to the 0.1% pick.
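To see where the "about 80%" comes from, we can solve w * 0.1% + (1 - w) * 2% = pooled pick for w. A rough sketch (variable names are illustrative):

```python
import math

p_low, p_high = 0.001, 0.02  # the two model outputs

# Geometric mean of odds, converted back to a probability
odds_low = p_low / (1 - p_low)
odds_high = p_high / (1 - p_high)
pooled_odds = math.sqrt(odds_low * odds_high)
pooled_prob = pooled_odds / (1 + pooled_odds)

# Solve w * p_low + (1 - w) * p_high = pooled_prob for w
implied_weight = (p_high - pooled_prob) / (p_high - p_low)
print(f"{implied_weight:.0%}")  # 82%
```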

If we play around some more and change the first model to a 0.01% chance, the geometric average of odds drops to 0.143%, quite a big move.

The problem with the arithmetic mean

So why is the geometric mean of odds preferred? Firstly, it outperforms in some empirical studies (for example, geopolitical forecasting assessed against a Brier score) [3]. Secondly, it also outperforms in simulation studies [2]. Thirdly, it has desirable theoretical properties, such as ‘external Bayesianity’ [2].

The ‘downside’ of the arithmetic mean can be made clear with a simple example. Consider the difference between the agg prediction for (0.1%, 2%) and for (0.01%, 2%). Both pairs average to basically 1% (1.05% and 1.005%), but the reasoning goes that in the second pair model one is signalling a significantly lower probability (by a factor of 10), so shouldn’t this move our agg prediction by more than the tiny amount given by the arithmetic average?
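The difference in sensitivity is easy to quantify. The sketch below (function names are my own) pools each pair both ways and prints the relative change in the pick when model one drops by a factor of 10. The arithmetic pick should move by only a few percent, while the geometric-odds pick falls by roughly two thirds.

```python
import math


def arithmetic_pool(p1, p2):
    """Equal-weighted arithmetic mean of two probabilities."""
    return 0.5 * (p1 + p2)


def geometric_odds_pool(p1, p2):
    """Geometric mean of the two odds, converted back to a probability."""
    odds = math.sqrt(p1 / (1 - p1) * p2 / (1 - p2))
    return odds / (1 + odds)


changes = {}
for name, pool in [("arithmetic", arithmetic_pool),
                   ("geometric odds", geometric_odds_pool)]:
    before = pool(0.001, 0.02)   # (0.1%, 2%)
    after = pool(0.0001, 0.02)   # (0.01%, 2%)
    changes[name] = after / before - 1
    print(f"{name}: {before:.3%} -> {after:.3%} ({changes[name]:+.0%})")
```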

Why this doesn’t work in actuarial modelling

The issue with applying this reasoning to actuarial modelling is that often (but not always) predictions of 0.01% are essentially meaningless. In the context of a cat model, 0.01% is a once-in-ten-thousand-year event, something that has happened maybe once in recorded history. How would you ever know if this prediction was well calibrated? These are very complex models with many moving pieces. I’ve seen examples of cat models where the probability of a given level of loss moves by a factor of 10-20, just based on a few parameter selections and how the portfolio of risks is coded.

As well as tails being far from perfectly credible, in my experience, they tend to err in the direction of being chronically under-weight. An arithmetic average, precisely because of this lack of sensitivity to changes in the extreme tail, is going to be more robust against this type of tail under-estimation.

Let’s go back to our toy example. Our two models give a 0.01% chance of a 100m loss and a 2% chance, so it’s a pretty brave move to select 0.143% as your agg pick. If we do so, and then sell a cat XOL layer at 0.33% GROL, we might think we’ve written to a 43% GLR (0.143% / 0.33%). But what if the actual loss cost is 0.75%? I would argue this is perfectly consistent with the two modelled outputs we’ve received (0.01% and 2%). Then we’d actually be writing to a GLR of roughly 230% (0.75% / 0.33%).
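For completeness, the loss ratio arithmetic as a short sketch. I'm assuming here that the layer suffers a total loss whenever the event occurs, so the loss cost as a share of limit is just the event probability; `gross_loss_ratio` is an illustrative helper, not a standard function.

```python
def gross_loss_ratio(loss_cost, rate_on_line):
    """Expected loss divided by premium, both expressed as fractions of the layer limit."""
    return loss_cost / rate_on_line


grol = 0.0033  # layer sold at a 0.33% gross rate on line

planned = gross_loss_ratio(0.00143, grol)  # priced off the geometric-odds pick
stressed = gross_loss_ratio(0.0075, grol)  # if the true loss cost is 0.75%

print(f"planned GLR:  {planned:.0%}")   # 43%
print(f"stressed GLR: {stressed:.0%}")  # 227%
```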

[1] https://forum.effectivealtruism.org/posts/sMjcjnnpoAQCcedL2/when-pooling-forecasts-use-the-geometric-mean-of-odds
[2] https://link.springer.com/article/10.1007/s11004-012-9396-3
[3] https://www.sciencedirect.com/science/article/abs/pii/S0169207013001635
