Why do we use the Poisson Distribution as the default distribution for modelling Claims Frequency for an insurance portfolio?
Why do we even have a default? When we set up a frequency-severity model for claims from an insurance portfolio, we normally approach the fitting of the frequency distribution and the fitting of the severity distribution quite differently.

For the frequency distribution, the standard approach is to attempt to fit a Poisson distribution, and only look at other distributions if the Poisson is not a good fit (even then we normally limit our search to the Negative Binomial, and maybe the Binomial at a stretch). When we fit a severity model, however, we will often fit quite a large range of continuous probability distributions to the empirical claim severity CDF using some sort of curve fitting software, and then select the most appropriate curve. Some distributions are used more often than others (LogNormal, Pareto and Weibull are all common choices), but there is no single curve that we would assume the severity distribution conforms to by default.

So why is the Poisson distribution a natural distribution to use to model claims frequency? And why is there no 'natural' distribution for claim severity? In this post I thought I would write up an interesting result which shows that a counting process with a few basic properties must follow a Poisson distribution. We will then be able to see that these properties are reasonable assumptions to make about claims frequency for an insurance portfolio.
Poisson Distribution
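Before working through this result, here are some properties of the Poisson Distribution which make it easy to work with. Suppose
$$ N \sim Poi( \lambda )$$
Then $N$ is described by the single parameter $\lambda$, and its mean and variance are both equal to that parameter: $$E[N] = Var(N) = \lambda$$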
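As a quick numerical illustration of why the Poisson Distribution is easy to work with, the sketch below computes the mean and variance directly from the probability mass function $P(N = n) = e^{-\lambda} \lambda^n / n!$ and compares them with $\lambda$. The value of `lam` and the truncation point of the sums are illustrative assumptions, not figures from any real portfolio.

```python
import math


def poisson_pmf(n: int, lam: float) -> float:
    """Return P(N = n) for N ~ Poi(lam)."""
    return math.exp(-lam) * lam**n / math.factorial(n)


lam = 3.5  # illustrative claim frequency (an assumption for this example)

# The sums over n = 0, 1, 2, ... are truncated where the tail is negligible.
support = range(100)
total = sum(poisson_pmf(n, lam) for n in support)
mean = sum(n * poisson_pmf(n, lam) for n in support)
variance = sum((n - mean) ** 2 * poisson_pmf(n, lam) for n in support)

print(f"total probability: {total:.6f}")
print(f"mean: {mean:.6f}, variance: {variance:.6f}, lambda: {lam}")
```

Both the mean and the variance should come out equal to `lam` up to floating point error, which is the single-parameter property in action.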
The Result
Let $A(t)$, for $t > 0$, denote the number of claims in the interval $[0,t]$, with $A(0) = 0$.
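Suppose that the numbers of claims in non-overlapping time intervals are independent, that the probability of exactly one claim in a short interval is proportional to its length,
$$P( A(t + \delta t) - A(t) = 1 ) = \lambda \delta t + o( \delta t)$$
and that claims only occur one at a time:
$$P( A(t + \delta t) - A(t) \geq 2 ) = o( \delta t)$$
Then $A(t)$ has a Poisson distribution with mean $\lambda t$.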
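The result can be illustrated with a small simulation before working through the proof. This is only a sketch under stated assumptions: the interval $[0, t]$ is split into many short steps of length `dt`, each step independently contains one claim with probability `lam * dt` and never more than one, and the values of `lam`, `t`, `steps` and `n_sims` are arbitrary choices for the example. If the result holds, the simulated counts should be close to a Poisson distribution with mean $\lambda t$.

```python
import math
import random


def simulate_claim_count(lam: float, t: float, steps: int, rng: random.Random) -> int:
    """Count claims in [0, t] using `steps` short, independent intervals.

    Each interval of length dt = t / steps holds at most one claim,
    which occurs with probability lam * dt.
    """
    dt = t / steps
    p_claim = lam * dt
    return sum(1 for _ in range(steps) if rng.random() < p_claim)


lam, t = 2.0, 1.5           # illustrative claim rate and time horizon
steps, n_sims = 400, 4000   # discretisation and number of simulated periods
rng = random.Random(42)     # fixed seed so the run is reproducible

counts = [simulate_claim_count(lam, t, steps, rng) for _ in range(n_sims)]
sim_mean = sum(counts) / n_sims
sim_var = sum((c - sim_mean) ** 2 for c in counts) / (n_sims - 1)
print(f"simulated mean {sim_mean:.3f}, variance {sim_var:.3f}, lambda * t = {lam * t}")

# Compare the empirical frequencies with the Poisson(lam * t) probabilities.
for n in range(8):
    empirical = counts.count(n) / n_sims
    theoretical = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    print(f"n={n}: simulated {empirical:.4f}, Poisson {theoretical:.4f}")
```

With a finite number of steps the simulated count is strictly Binomial, so the agreement with the Poisson probabilities is approximate and improves as `steps` increases.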
The Proof
Define $P_n (t) = P( A(t) = n ) $
We then examine the change in $P_n(t)$ over a time period $\delta t$ and then take the limit as $\delta t$ approaches $0$.
For $n > 0$: $$P_n(t + \delta t) = P_n(t)(1 - \lambda \delta t) + P_{n-1} (t) \lambda \delta t + o( \delta t) $$
For $n = 0$: $$P_0(t + \delta t) = P_0(t)(1 - \lambda \delta t) + o( \delta t)$$
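This follows from the fact that, up to terms of order $o(\delta t)$, there are two distinct ways for $n$ claims to happen in a time period $t + \delta t$. Either we get $n$ claims in time $t$ and no claims in $\delta t$, or $n-1$ claims in time $t$ and one claim in $\delta t$. Any route involving two or more claims in $\delta t$ has probability $o(\delta t)$.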
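If we drop the $o(\delta t)$ terms, these difference equations can be iterated directly on a computer, which gives a numerical check on where the argument is heading. The sketch below starts from $P_0(0) = 1$ and $P_n(0) = 0$ for $n > 0$ (which follows from $A(0) = 0$), steps forward in time, and compares the result with the Poisson probabilities $e^{-\lambda t} (\lambda t)^n / n!$. The function name `iterate_recursion` and all parameter values are assumptions made for this example.

```python
import math


def iterate_recursion(lam: float, t: float, steps: int, n_max: int) -> list[float]:
    """Iterate P_n(t + dt) = P_n(t)(1 - lam*dt) + P_{n-1}(t)*lam*dt from time 0 to t."""
    dt = t / steps
    # At time 0 no claims have occurred: P_0(0) = 1 and P_n(0) = 0 for n > 0.
    probs = [1.0] + [0.0] * n_max
    for _ in range(steps):
        new_probs = [probs[0] * (1 - lam * dt)]
        for n in range(1, n_max + 1):
            new_probs.append(probs[n] * (1 - lam * dt) + probs[n - 1] * lam * dt)
        probs = new_probs
    return probs


lam, t, n_max = 2.0, 1.5, 10  # illustrative values only
approx = iterate_recursion(lam, t, steps=20_000, n_max=n_max)
exact = [math.exp(-lam * t) * (lam * t) ** n / math.factorial(n) for n in range(n_max + 1)]

max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(f"largest absolute difference from Poisson: {max_error:.2e}")
```

Because $P_n$ depends only on $P_n$ and $P_{n-1}$, truncating at `n_max` does not affect the values that are computed. The difference from the Poisson probabilities shrinks as `steps` grows, mirroring the limit $\delta t \to 0$.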
We can rewrite these equations as:
for $n > 0$: $$\frac { P_n(t + \delta t) - P_n(t)}{\delta t} = \frac {-P_n(t) \lambda \delta t + P_{n-1}(t) \lambda \delta t + o(\delta t)} { \delta t}$$
for $n = 0$: $$\frac {P_0 ( t + \delta t) - P_0(t)} {\delta t} = \frac {-P_0(t) \lambda \delta t + o(\delta t) } {\delta t}$$
Now take the limit as $\delta t \to 0$, noting that $o(\delta t) / \delta t \to 0$, which gives:
for $n>0$: $$\frac { d P_n (t)} {dt} = - \lambda P_n(t) + \lambda P_{n-1}(t)$$
for $n=0$: $$\frac {d P_0 (t)} {dt} = - \lambda P_0 (t)$$
By inspection, and using $P_0(0) = 1$ (since $A(0) = 0$), we can see that $P_0 (t) = e^{ - \lambda t }$. The general case, $$P_n(t) = \frac{e^{-\lambda t} (\lambda t)^n}{n!}$$ can easily be shown by induction. We therefore see that $A(t)$ has a Poisson distribution with mean $\lambda t$.
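For completeness, here is one way the induction step could be written out. It is a sketch using an integrating factor, and it relies on the initial conditions $P_0(0) = 1$ and $P_n(0) = 0$ for $n > 0$, which follow from $A(0) = 0$. Assume that for some $n \geq 1$ we already have $$P_{n-1}(t) = \frac{e^{-\lambda t} (\lambda t)^{n-1}}{(n-1)!}$$ which holds for $n - 1 = 0$ because $P_0(t) = e^{-\lambda t}$. Multiplying the equation $\frac{d P_n(t)}{dt} = -\lambda P_n(t) + \lambda P_{n-1}(t)$ by $e^{\lambda t}$ and rearranging gives $$\frac{d}{dt} \left( e^{\lambda t} P_n(t) \right) = \lambda e^{\lambda t} P_{n-1}(t) = \frac{\lambda (\lambda t)^{n-1}}{(n-1)!}$$ Integrating from $0$ to $t$ and using $P_n(0) = 0$: $$e^{\lambda t} P_n(t) = \int_0^t \frac{\lambda (\lambda s)^{n-1}}{(n-1)!} ds = \frac{(\lambda t)^n}{n!}$$ so that $$P_n(t) = \frac{e^{-\lambda t} (\lambda t)^n}{n!}$$ which is the Poisson probability of $n$ claims with mean $\lambda t$.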
Poisson as the natural distribution
So we see that, insofar as insurance claims occur in line with the assumptions (independently over the time interval, and only one at a time), we can expect the claims frequency to have a Poisson Distribution. In addition, the Poisson Distribution has a number of properties which make it easy to work with: it has a single parameter, simple formulas for the mean and variance, and so on. Therefore, whenever we are fitting a claim frequency model, we will almost always try the Poisson Distribution first.
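In practice, "trying the Poisson first" often starts with a simple check of whether the sample variance of the claim counts is close to the sample mean, since the two are equal for a Poisson distribution. The sketch below is one possible version of that check, not a standard procedure: the function names, the `tolerance` threshold and the claim counts are all hypothetical and chosen only for illustration. It uses the facts that a Negative Binomial has variance greater than its mean and a Binomial has variance less than its mean.

```python
from statistics import mean, variance


def dispersion_index(counts: list[int]) -> float:
    """Ratio of sample variance to sample mean; close to 1 for Poisson-like data."""
    return variance(counts) / mean(counts)


def suggest_frequency_model(counts: list[int], tolerance: float = 0.2) -> str:
    """Suggest a first frequency distribution to try (illustrative rule of thumb).

    The tolerance is an arbitrary assumption; a formal goodness-of-fit test
    would be needed to make a real decision.
    """
    d = dispersion_index(counts)
    if d > 1 + tolerance:
        return "Negative Binomial"  # variance noticeably above the mean
    if d < 1 - tolerance:
        return "Binomial"           # variance noticeably below the mean
    return "Poisson"


# Hypothetical annual claim counts, not real portfolio data.
claim_counts = [1, 2, 3, 3, 4, 5, 1, 5]

lam_hat = mean(claim_counts)  # the maximum likelihood estimate of lambda is the sample mean
print(f"fitted lambda: {lam_hat}")
print(f"dispersion index: {dispersion_index(claim_counts):.3f}")
print(f"suggested model: {suggest_frequency_model(claim_counts)}")
```

With only a handful of observations the dispersion index is very noisy, so a check like this is a prompt for further investigation and not a conclusion.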
Author: I work as a pricing actuary at a reinsurer in London.
