This post is about two pieces of writing released this week, and how easy it is for even smart people to be wrong.
The story starts with an open letter written to the UK Government signed by 200+ scientists, condemning the government’s response to the Coronavirus epidemic; that the response was not forceful enough, and that the government was risking lives by their current course of action. The letter was widely reported and even made it to the BBC frontpage, pretty compelling stuff. Link [1] The issue is that as soon as you start scratching beneath the surface, all is not quite what it seems. Of the 200+ scientists, about 1/3 are PhD students, not an issue in and of itself, but picking out some of the subjects we’ve got:
Negative Binomial in VBA21/9/2019
Have you ever tried to simulate a negative binomial random variable in a Spreadsheet?
If the answer to that is ‘nope – I’d just use Igloo/Metarisk/Remetrica’ then consider yourself lucky! Unfortunately not every actuary has access to a decent software package, and for those muddling through in Excel, this is not a particularly easy task. If on the other hand your answer is ‘nope – I’d use Python/R, welcome to the 21st century’. I’d say great, I like using those programs as well, but sometimes for reasons out of your control, things just have to be done in Excel. This is the situation I found myself in recently, and here is my attempt to solve it: Attempt 0 The first step I took in attempting to solve the problem was of course to Google it, then cross my fingers and hope that someone else has already solved it and this is just going to be a simple copy and paste. Unfortunately when I did search for VBA code to generate a negative binomial random variable, nothing comes up. In fact, nothing comes up when searching for code to simulate a Poisson random variable in VBA. Hopefully if you've found your way here, looking for this exact thing, then you're in luck, just scroll to the bottom and copy and paste my code. When I Googled it, there were a few solutions that almost solved the problem; there is a really useful Excel addin called ‘Real statistics’ which I’ve used a few times: http://www.realstatistics.com/ It's a free excel addin, and it does have functionality to simulate negative bimonials. If however you need someone to be able to rerun the Spreadsheet, they also will need to have it installed. In that case, you might as well use Python, and then hard code the frequency numbers. Also, I have had issues with it slowing Excel down considerably, so I decided not to use this in this case. I realised I’d have to come up with something myself, which ideally would meet the following criteria
How hard can that be? Attempt 1 I’d seen a trick before (from Pietro Parodi’s excellent book ‘Pricing in General Insurance’) that a negative binomial can be thought of as a Poisson distribution with a Gamma distribution as the conjugate prior. See the link below for more details: https://en.wikipedia.org/wiki/Conjugate_prior#Table_of_conjugate_distributions Since Excel has a built in Gamma inverse, we have simplified the problem to needing to write our own Poisson inverse. We can then easily generate negative binomials using a two step process:
Great, so we’ve reduced our problem to just being able to simulate a Poisson in VBA. Unfortunately there’s still no built in Poisson inverse in Excel (or at least the version I have), so we now need a VBA based method to generate this. There is another trick we can use for this  which is also taken from Pietro Parodi  the waiting time for a Poisson dist is an Exponential Dist. And the CDF of an Exponential dist is simple enough that we can just invert it and come up with a formula for generating an Exponential sample. We then set up a loop and add together exponential values, to arrive at Poisson sample. The code for this is give below: Function Poisson_Inv(Lambda As Double) s = 0 N = 0 Do While s < 1 u = Rnd() s = s  (Application.WorksheetFunction.Ln(u) / Lambda) k = k + 1 Loop BH_Poisson_Inv = (k  1) End Function The VBA code for our negative binomial is therefore: Function NegBinOld2(b, r) Dim Lambda As Double Dim N As Long u = Rnd() Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBinOld2 = N End Function Does this do everything we want?
There are a couple of downside of though:
This leads us on to Attempt 2 Attempt 2 If we pass the VBA a random uniform sample, then whenever we hit refresh in the Spreadsheet the random sample will refresh, which will force the Negative Binomial to resample. Without this, sometimes the VBA will function will not reload. i.e. we can use the sample to force a refresh whenever we like. Adapting the code gives the following: Function NegBinOld(b, r, Rnd1 As Double) Dim Lambda As Double Dim N As Long u = Rnd1 Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBinOld = N End Function So this solves the refresh problem. What about the random seed problem? Even though we now always get the same lambda for a given rand – and personally I quite like to hardcode these in the Spreadsheet once I’m happy with the model, just to speed things up. We still use the VBA rand function to generate the Poisson, this means everytime we refresh, even when passing it the same rand, we will get a different answer and this answer will be nonreplicable. This suggests we should somehow use the first random uniform sample to generate all the others in a deterministic (but still pseudorandom) way. Attempt 3 The way I implemented this was to the set the seed in VBA to be equal to the uniform random we are passing the function, and then using the VBA random number generator (which works deterministically for a given seed) after that. This gives the following code: Function NegBin(b, r, Rnd1 As Double) Rnd (1) Randomize (Rnd1) Dim Lambda As Double Dim N As Long u = Rnd() Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBin = N End Function So we seem to have everything we want – a free, quick, solution that can be bundled in a Spreadsheet, which allows other people to rerun without installing any software, and we’ve also eliminated the forced refresh issue. What more could we want? The only slight issue with the last version of the negative binomial is that our parameters are still specified in terms of ‘b’ and ‘r’. Now what exactly are ‘b’ and ‘r’ and how do we relate them to our sample data? I’m not quite sure.... The next trick is shamelessly taken from a conversation I had with Guy Carp’s chief Actuary about their implementation of severity distributions in MetaRisk. Attempt 4 Why can't we reparameterise the distribution using parameters that we find useful, instead of feeling bound by using the standard statistics textbook definition (or even more specifically the list given in the appendix to ‘Loss Models – from data to decisions’, which seems to be somewhat of an industry standard), why can't we redefine all the parameters from all common actuarial distributions using a systematic approach for parameters? Let's imagine a framework where no matter which specific severity distribution you are looking at, the first parameter contains information about the mean (even better if it is literally scaled to the mean in some way), the second contains information about the shape or volatility, the third contains information about the tail weight, and so on. This makes fitting distributions easier, it makes comparing the goodness of fit of different distributions easier, and it make sense checking our fit much easier. I took this idea, and tied this in neatly to a method of moments parameterisation, whereby the first value is simply the mean of the distribution, and the second is the variance over the mean. This gives us our final version: Function NegBin(Mean, VarOMean, Rnd1 As Double) Rnd (1) Randomize (Rnd1) Dim Lambda As Double Dim N As Long b = VarOMean  1 r = Mean / b u = Rnd() Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBin = N End Function Function Poisson_Inv(Lambda As Double) s = 0 N = 0 Do While s < 1 u = Rnd() s = s  (Application.WorksheetFunction.Ln(u) / Lambda) k = k + 1 Loop BH_Poisson_Inv = (k  1) End Function Poisson Distribution for small Lambda23/4/2019
I was asked an interesting question a couple of weeks ago when talking through some modelling with a client.
We were modelling an airline account, and for various reasons we had decided to base our large loss modelling on a very basic topdown allocation method. We would take a view of the market losses at a few different return periods, and then using a scenario approach, would allocate losses to our client proportionately. Using this method, the frequency of losses is then scaled down by the % of major policies written, and the severity of losses is scaled down by the average line size. To give some concrete numbers (which I’ve made up as I probably shouldn’t go into exactly what the client’s numbers were), let's say the company was planning on taking a line on around 10% of the Major Airline Risks, and their average line was around 1%. We came up with a table of return periods for market level losses. The table looked something like following (the actual one was also different to the table below, but not miles off):
Then applying the 10% hit factor if there is a loss, and the 1% line written, we get the following table of return periods for our client:
Hopefully all quite straightforward so far. As an aside, it is quite interesting to sometimes pare back all the assumptions to come up with something transparent and simple like the above. For airline risks, the largest single policy limit is around USD 2.5bn, so we are saying our worst case scenario is a single full limit loss, and that each year this has around a 1 in 50 chance of occurring. We can then directly translate that into an expected loss, in this case it equates to 50m (i.e. 2.5bn *0.02) of pure loss cost. If we don't think the market is paying this level of premium for this type of risk, then we better have a good reason for why we are writing the policy!
So all of this is interesting (I hope), but what was the original question the client asked me? We can see from the chart that for the market level the highest return period we have listed is 1 in 50. Clearly this does translate to a much longer return period at the client level, but in the meeting where I was asked the original question, we were just talking about the market level. The client was interested in what the 1 in 200 at the market level was and what was driving this in the modelling. The way I had structured the model was to use four separate risk sources, each with a Poisson frequency (lambda set to be equal to the relevant return period), and a fixed severity. So what this question translates to is, for small Lambdas $(<<1)$, what is the probability that $n=2$, $n=3$, etc.? And at what return period is the $n=2$ driving the $1$ in $200$? Let’s start with the definition of the Poisson distribution: Let $N \sim Poi(\lambda)$, then: $$P(N=n) = e^{\lambda} \frac{ \lambda ^ n}{ n !} $$ We are interested in small $\lambda$ – note that for large $\lambda$ we can use a different approach and apply sterling’s approximation instead. Which if you are interested, I’ve written about here: www.lewiswalsh.net/blog/poissondistributionwhatistheprobabilitythedistributionisequaltothemean
For small lambda, the insight is to use a Taylor expansion of the $e^{\lambda}$ term. The Taylor expansion of $e^{\lambda}$ is:
$$ e^{\lambda} = \sum_{i=0}^{\infty} \frac{\lambda^i}{ i!} = 1  \lambda + \frac{\lambda^2}{2} + o(\lambda^2) $$
We can then examine the pdf of the Poisson distribution using this approximation: $$P(N=1) =\lambda e^{\lambda} = \lambda ( 1 – \lambda + \frac{\lambda^2}{2} + o(\lambda^2) ) = \lambda  \lambda^2 +o(\lambda^2)$$
as in our example above, we have:
$$ P(N=1) ≈ \frac{1}{50} – {\frac{1}{50}}^2$$
This means that, for small lambda, the probability that $N$ is equal to $1$ is always slightly less than lambda. Now taking the case $N=2$: $$P(N=2) = \frac{\lambda^2}{2} e^{\lambda} = \frac{\lambda^2}{2} (1 – \lambda +\frac{\lambda^2}{2} + o(\lambda^2)) = \frac{\lambda^2}{2} \frac{\lambda^3}{2} +\frac{\lambda^4}{2} + o(\lambda^2) = \frac{\lambda^2}{2} + o(\lambda^2)$$
So once again, for $\lambda =\frac{ 1}{50}$ we have:
$$P(N=2) ≈ 1/50 ^ 2 /2 = P(N=1) * \lambda / 2$$
In this case, for our ‘1 in 50’ sized loss, we would expect to have two such losses in a year once every 5000 years! So this is definitely not driving our 1 in 200 result.
We can add some extra columns to our market level return periods as follows:
So we see for the assumptions we made, around the 1 in 200 level our losses are still primarily being driven by the P(N=1) of the 2.5bn loss, but then in addition we will have some losses coming through corresponding to P(N=2) and P(N=3) of the 250m and 500m level, and also combinations of the other return periods.
So is this the answer I gave to the client in the meeting? …. Kinda, I waffled on a bit about this kind of thing, but then it was only after getting back to the office that I thought about trying to breakdown analytically which loss levels we can expect to kick in at various return periods. Of course all of the above is nice but there is an easier way to see the answer, since we’d already stochastically generated a YLT based on these assumptions, we could have just looked at our YLT, sorted by loss size and then gone to the 99.5 percentile and see what sort of losses make up that level. The above analysis would have been more complicated if we have also varied the loss size stochastically. You would normally do this for all but the most basic analysis. The reason we didn’t in this case was so as to keep the model as simple and transparent as possible. If we had varied the loss size stochastically then the 1 in 200 would have been made up of frequency picks of various return periods, combined with severity picks of various return periods. We would have had to arbitrarily fix one in order to say anything interesting about the other one, which would not have been as interesting. Gaming my office's Cheltenham Sweepstake31/3/2019 Every year we have an office sweepstake for the Cheltenham horse racing festival. Like most sweepstakes, this one attempts to remove the skill, allowing everyone to take part without necessarily needing to know much about horse racing. In case you’re not familiar with a Sweepstake, here’s a simple example of how one based on the World Cup might work:
Note that in order for this to work properly, you need to ensure that the team draw is not carried out until everyone who wants to play has put their money in – otherwise you introduce some optionality and people can then decide whether to enter based on the teams that are still left. i.e. if you know that Germany and Spain have already been picked then there is less value in entering the competition. The rules for our Cheltenham sweepstake were as follows: The Rules The festival comprises 7 races a day for 4 days, for a total of 28 races. The sweepstake costs £20 to enter, and the winnings are calculated as follows: 3rd place in competition = 10% of pot 2nd place in competition = 20% of pot 1st place in competition = 70% of pot Each participant picks one horse per race each morning. Points are then calculated using the following scoring system:
A running total is kept throughout the competition, and the winner is determined after the final race. The odds the scoring are based on are set using the odds printed in the Metro Newspaper on the morning of the races. (as an example, for a horse which has odds of 11/2 in the Metro  if the horse then places 1st, if we selected this horse, we would win (1+11/2)*5 = 32.5 points) Initial thoughts Any set of betting odds can be converted to an implied probability of winning, these would be the odds which over the long run would cause you to breakeven if the race were repeated multiple times with each horse winning a proportion equal to its probability of winning. Because the scoring in our sweepstake is based on the betting odds, using implied probabilities derived from the odds to help select our horses ends up cancelling itself out (which was the intention when designing the rules). The implied probability can be calculated as one over the odds As an aside, the bookie is indifferent to whether this is the correct probability of winning, they structures the odds purely on the ratio of how their customers are betting. They then structure the odds so that they make money on each race, regardless of which horse wins, for an explanation of this, see my post on creating a Dutchbook: www.lewiswalsh.net/blog/archives/122017 Here is an example showing how we would calculate the implied probabilities using some made up odds: We can then use the implied probabilities we just calculated to see what would happen if each horse finished in the first three positions. Once we have done this, we can then calculate the Expected Value of betting on each horse: We see that the payout varies for each horse, but the EV is the same. This is by design, the intention is that it should not matter which horse you bet on, the sweepstake rules equalise everyone. So what can we  an actuary who knows nothing much about horse racing  do? I don’t have any special knowledge that would allow me to select a horse which would beat the odds listed in the Metro, we appear to be at an impasse. I could attempt to build a model of which horse will win, and then select my horse based on that, but unless it proves to be more accurate than the Metro odds, I might as well just pick at random. Furthermore, if I could build such a model, then I could just start betting actual money. This probably shows you what a difficult problem this is! There's no such thing as free money. It would be a cool project to try, and it’s something I’ve been meaning to attempt for a while, but that’s best saved for another day. Attempt 1  Metro vs prerace odds My first thought was that we can exploit the difference in odds between those published in the Metro in the morning, and the latest odds published closer to the start of the race. It seems reasonable to assume that the odds just before the race should be more accurate than the metro odds. There will have been time for additional information to be included in the more up to date odds, e.g. the weather is worse than expected therefore horse x is expected to be slower than usual. Since the payout will be based on the Metro, we will then be able to maximise our EV by exploiting this differential. Our table will end up looking something like this: We see that we have very small variations in the EVs for some of the horses. It looks like according to this analysis Horse 3 would be the best selection as it has the highest EVs for 1st, 2nd, and 3rd. Based on this strategy, we would then go through each race and select the horse with the highest EV. Is this what I did? No, for a couple of reasons. The biggest issue was that the Metro did not publish odds in the morning for all races, meaning we couldn’t use the Metro, and therefore the rules of the sweepstake were amended to use the official prerace odds to calculate the payout instead. This meant there was only one set of odds used, and our edge disappeared! Even if we had used this method, there was a more fundamental issue  the margins we ended up with were tiny anyway. The Metro vs prerace odds did not swing wildly, meaning that even selecting the horse with the highest EV were only marginally better than picking at random. So, was there an alternative? Attempt 2  2nd and 3rd place odds My next attempt at an exploitative strategy was based on the insight that the payout multiplier for 2nd and 3rd place was based on the odds of the horse coming 1st, rather than the odds of the horse coming 2nd or 3rd. The expected value of a horse was not quite as I calculated above, it was actually: $$EV = P(1)*p_1 + P(2)*p_2 + P(3)*p_3$$ Above, we were using the implied probability of the horse coming first as a proxy for the probability it would come second and third. This is not the same, and some betting websites do allow you to bet on whether a horse will come 2nd or 3rd. For websites that do not allow you to bet directly on this, then we may still be able to calculate it from the odds for whether a horse finishes in the top 2 or 3 places. We just need to subtract out the implied probability of coming 1st from the probability of coming in the top 2, and then subtracting this out from coming top 3 etc. I therefore added some more columns to my table above, corresponding to the probability of the horses coming 2nd and 3rd, and then used this to calculate the EV instead. We see that the final column, Total EV, now has quite different values for each horse. In this case Horse 15, Seddon has an EV of 11.72. The favourite horse on the other hand  number 7  only has an EV of 6.2. The intuitive explanation of this is that the probability of Seddon coming first is very low – this is the reflected in the long odds of 67, this then gives us a large multiplier, but the odds of the horse coming second or third are actually relatively less far out – the fact that it is not the favourite actually increases the odds of it coming in a position which is not 1st. But then if does come in 2nd or 3rd, we would still apply the same large multiplier for the odds of it coming 1st. This then gives us our 2nd edge – we can gain a positive EV by focusing. As a thought experiment, imagine we have a race with three horses – horse A is a clear favourite, horse B is an average horse, and horse C is clearly the weakest. By betting on horse C – the odds of it winning should be very low, so the multiple should be very high, but then this multiple will be applied even if it comes 2nd or 3rd, which is exactly where it is expected to finish. This therefore suggests our next potential strategy – select horses which maximise our EV using the implied probabilities of the horses coming 2nd, 3rd etc. So is this what I did? Well kind of.... The issue with this approach is that typically the horses that provide the best EV also have very long odds. In the race analysed above, our horse has an EV of 11.7, but it only has a 7% chance overall of coming in the top 3. In race two for example, the horse with the best EV actually only had a 2.7% chance of coming in the top 3. Since there are only 28 races in total, if each horse we selected only had a 2.7% chance of coming in, then the probability of us getting 0 points overall in the entire competition would then be: $(12.7 \%)^{28} = 48 \%$ So there is roughly a 50% chance we will get 0 points! Alternatively, if we selected the favourite every time, we could expect it to come top 3 almost every time, and thereby guarantee ourselves points most races, but it also has the lowest EV. So we have what appears to be a risk vs reward trade off. Pick outsiders and give ourselves the highest EV overall, or pick the favourites thereby reducing our overall EV but also reducing our volatility. This leads us neatly to attempt 3  trying to think about how to maximise our probability of winning the competition rather than simply maximising EV for each race. Attempt 3  EP curves From the work above, we now have our model of the position each horse will finish in each race – using the 1st, 2nd, and 3rd implied probabilities  and we have the payout for each horse – using the odds of the horse coming 1st. We can then bring our selection of horse and these probabilities together in a summary tab and simulate our daily score stochastically using a Monte Carlo method. To do this we just need to turn the implied probabilities into a CDF, and lookup the value of each position and repeat 10,000 times. The output for this then ends up looking like the following, where the value is the number of points we will for a given race. So we see that across this sample of 20 simulations, most days we to end up with 0 points overall, but a few days have very high scores. So far so good! The next step is to set up an EP table of the following, which looks like the following: The EP table gives us the probability of exceeding various scores in the competition based on our horse selections. In this case, we see that there is a 1 in 20 chance of getting 453 points or greater in the day. This is useful even on its own – when I was deciding which horses to bet on, I simply played around with the selections until I got an EP table I was comfortable with. My reasoning was quite basic – I decided I wanted to maximise the 1 in 20 value. I wanted to give myself something like a 1/4 chance of winning the whole competition and a 3/4 chance of getting very few points. Since there were four days of races, dividing the 1/4 by another 4 suggests we should be looking at maximising the 1 in 20 level (I admit this reasoning was a bit rough, but it seemed to serve its purpose) The insight here is that the payout structure of the sweepstake is such that coming in the top 3 is all that matters, and in particular coming 1st is disproportionately rewarded. To see this, we can think of the daily selection of horses as attempting to maximise the EV of the overall prize rather than the EV of our overall score  maximising the EV of each race is only a means to this end. So we are actually interested in maximising the following: $0.7 * P(1st) + 0.2 + P(2nd) + 0.1 * P(3rd)$ Which will largely be dominated by P(1st), given the $0.7$ factor. This is largely the strategy I went for in the end. Attempt 4  Game theory? I’ve brushed over one difficulty above; in order to maximise our prize EV we need to consider not just which strategy we should take, but how this strategy will fare against the strategies that other people will take. If everyone is maximising their 1 in 20 return period then there’s little point us doing exactly the same. Luckily for me, most people were doing little more than picking horses randomly. We could then formalise this assumption, and come up with a numerical solution to the problem above. To do this, we would simulate our returns for each day across 10,000 simulations as above, but this time we would compare ourselves against a ‘base strategy’ of random selection of horses and we would simulate this base strategy for the approximately 40 people who entered. Each simulation would then give us a ranking we would finish in the competition, Here is an example of what that would look like: And we could then convert this into an EP table which would look like the following: So we see that if we select these horses, we end up having somewhere between a 1 in 10 and a 1 in 5 chance of winning the competition. Now that we have all of this set up, we can then optimise our horse selection to target a particular return period. I didn’t actually end up setting up the above for the sweepstake, but I suspect it would have been an improvement on my approach Attempt 5  Multiple day? There is a further refinement we can make to the above. We have so far only really been focusing on maximising our chance of winning by using a fixed strategy throughout the competition. But there is no reason we have to do this. After the first day, we should really be including the current scores of each competitor as part of our calculation of our ranking. i.e. if person 1 had a great day and now has 200 points but we had a bad day and still have 0 points, by accounting for this, the model should automatically increase our volatility i.e. start picking horses with longer odds so as to increase the chance of us catching up. If on the other hand, we had a really good first day and are now in the lead, the model should then automatically reduce our volatility and start selecting the favourites more often to help safely maintain our lead. How did it work in practice? I ended up placing 2nd, and taking 20% of the prize pot which was great! I was behind for most of the competition but then pulled back on the last day when a 66/1 came in 1st place, and I picked up 330 points off a single race. This may look miraculous, but is exactly how the model is supposed to work. Does that have any applicability to gambling generally? Unfortunately not, basically all of the work above is based on exploiting the specific scoring system of the sweepstake. There's no real way of apply this to gambling generally. Extending the Copula Method26/8/2018 If you have ever generated Random Variables stochastically using a Gaussian Copula, you may have noticed that the correlation of the generated sample ends up being lower than the value of the Covariance matrix of the underlying multivariate Gaussian Distribution. For an explanation of why this happens you can check out a previous post of mine: www.lewiswalsh.net/blog/correlationsfriedrichgaussandcopula. It would be nice if we could amend our method to compensate for this drop. As a quick fix, we can simply run the model a few times and fudge the Covariance input until we get the desired Correlation value. If the model runs quickly, this is quite easy to do, but as soon as the model starts to get bigger and slower, it quickly becomes impractical to run it three of four times just to get the output Correlation we desire. We can do better than this. The insight we rely on is that for a Gaussian Copula, the Pearson Correlation in the generated sample just depends on the Covariance Value. We can therefore create a precomputed table of Input and Output values, and use this to select the correct input value for the desired output. I wrote some R code to do just that, we compute a table of Pearson's Correlations obtained for various Input Covariance values when using the Gaussian Copula. a < library(MASS) library(psych) set.seed(100) m < 2 n < 10^6 OutputCor < 0 InputCor < 0 for (i in 1:100) { sigma < matrix(c(1, i/100, i/100, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) u < pnorm(z) OutputCor[i] < cor(u,method='pearson')[1,2] InputCor[i] < i/10 } OutputCor InputCor Here is a sample from the table of results. You can see that the drop is relatively modest, but it does apply consistent across the whole table. Here is a graph showing the drop in values:
Updated Algorithm
We can then use the precomputed table, interpolating where necessary, to give us a Covariance value for our Multivariate Gaussian Distribution which will generate the desired Pearson Product Moment Correlation Value. So for example, if we would like to generate a sample with a Pearson Product Moment value of $0.5$, according to our table, we would need to use $0.517602$ as an input Covariance. We can test these values using the following code: a < library(MASS) library(psych) set.seed(100) m < 2 n < 5000000 sigma < matrix(c(1, 0.517602, 0.517602, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) u < pnorm(z) cor(u,method='pearson') Analytic Formulas I tried to find an analytic formula for the Product Moment values obtained in this manner, but I couldn't find anything online, and I also wasn't able to derive one myself. If we could find one, then instead of using the precompued table, we would be able to simply calculate the correct value. While searching, I did come across a number of interesting analytic formulas linking the values of Kendall's Tau, Spearman's Rank, and the input Covariance.. All the formulas below are from Fang, Fang, Kotz (2002) Link to paper: www.sciencedirect.com/science/article/pii/S0047259X01920172 The paper gives the following two results, where $\rho$ is the Pearson's Product Moment
$$\tau = \frac{2}{\pi} arcsin ( \rho ) $$ $$ {\rho}_s = \frac{6}{\pi} arcsin ( \frac{\rho}{2} ) $$
We can then use these formulas to extend our method above further to calculate an input Covariance to give any desired Kendall Tau, or Spearman's Rank. I initially thought that they would link the Pearson Product Moment value with Kendall or Spearman's measure, in which case we would still have to use the precomputed table. After testing it I realised that it is actually linking the Covariance to Kendall and Spearman's measures. Thinking about it, Kendall's Tau, and Spearman's Rank are both invariant to the reverse Guassian transformation when moving from $z$ to $u$ in the algorithm. Therefore the problem of deriving an analytic formula for them is much simpler as one only has to link their values for a multivariate Guassian Distribution. Pearson's however does change, therefore it is a completely different problem and may not even have a closed form solution. As an example of how to use the above formula, suppose we'd like our generated data to have a Kendall's Tau of $0.4$. First we need to invert the Kendall's Tau formula: $$ \rho = sin ( \frac{ \tau \pi }{2} ) $$ We then plug in $\rho = 0.4 $ giving:
$$ \rho = sin ( \frac{ o.4 \pi }{2} ) = 0.587785 $$
Giving usan input Covariance value of $0.587785$
We can then test this value with the following R code:
a < library(MASS) library(psych) set.seed(100) m < 2 n < 50000 sigma < matrix(c(1, 0.587785, 0.587785, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) u < pnorm(z) cor(z,method='kendall') Which we see gives us the value of $\tau$ we want. In this case the difference between the input Covariance $0.587785$, and the value of Kendall's Tau $0.4$ is actually quite significant. It's the second week of your new job Capital Modelling job. After days spent sorting IT issues, getting lost coming back from the toilets, and perfecting your new commute to work (probability of getting a seat + probability of delay * average journey temperature.) your boss has finally given you your first real project to work on. You've been asked to carry out an annual update of the Underwriting Risk Capital Charge for a minor part of the company's Motor book. Not the grandest of analysis you'll admit, this particular class only makes up about 0.2% of the company's Gross Written Premium, and the Actuaries who reserve the company's bigger classes would probably consider the number of decimal places used in the annual report more material than your entire analysis. But you know in your heart of hearts that this is just another stepping stone on your inevitable meteoric rise to Chief Actuary in the Merger and Acquisition department, where one day you will pass judgement on billion dollar deals inbetween expensive lunches with CFOs, and drinks with journalists on glamorous rooftop bars. The company uses inhouse reserving software, but since you're not that familiar with it, and because you want to make a good impression, you decide to carry out extensive checking of the results in Excel. You fire up the Capital Modelling Software (which may or may not have a name that means a house made out of ice), put in your headphones and grind it out. Hours later you emerge triumphant, and you've really nailed it, your choice of correlation (0.4), and correlation method (Gaussian Copula) is perfect. As planned you run extracts of all the outputs, and go about checking them in Excel. But what's this? You set the correlation to be 0.4 in the software, but when you check the correlation yourself in Excel, it's only coming out at 0.384?! What's going on? Simulating using Copulas The above is basically what happened to me (minus most of the actual details. but I did set up some modelling with correlated random variables and then checked it myself in Excel and was surprised to find that the actual correlation in the generated output was always lower than the input.) I looked online but couldn't find anything explaining this phenomenon, so I did some investigating myself. So just to restate the problem, when using Monte Carlo simulation, and generating correlated random variables using the Copula method. When we actually check the correlation of the generated sample, it always has a lower correlation than the correlation we specified when setting up the modelling. My first thought for why this was happening was that were we not running enough simulations and that the correlations would eventually converge if we just jacked up the number of simulations. This is the kind of behaviour you see when using Monte Carlo simulation and not getting the mean or standard deviation expected from the sample. If you just churn through more simulations, your output will eventually converge. When creating Copulas using the Gaussian Method, this is not the case though, and we can test this. I generated the graph below in R to show the actual correlation we get when generating correlated random variables using the Copula method for a range of different numbers of simulations. There does seem to be some sort of loose limiting behaviour, as the number of simulations increases, but the limit appears to be around 0.384 rather than 0.4. The actual explanation First, we need to briefly review the algorithm for generating random variables with a given correlation using the normal copula. Step 1  Simulate from a multivariate normal distribution with the given covariance matrix. Step 2  Apply an inverse gaussian transformation to generate random variables with marginal uniform distribution, but which still maintain a dependency structure Step 3  Apply the marginal distributions we want to the random variables generated in step 2 We can work through these three steps ourselves, and check at each step what the correlation is. The first step is to generate a sample from the multivariate normal. I'll use a correlation of 0.4 though out this example. Here is the R code to generate the sample: a < library(MASS) library(psych) set.seed(100) m < 2 n < 1000 sigma < matrix(c(1, 0.4, 0.4, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) And here is a Scatterplot of the generated sample from the multivariate normal distribution: We now want to check the product moment correlation of our sample, which we can do using the following code: cor(z,method='pearson') Which gives us the following result: > cor(z,method='pearson') [,1] [,2] [1,] 1.0 0.4 [2,] 0.4 1.0 So we see that the correlation is 0.4 as expected. The Psych package has a useful function which produces a summary showing a Scatterplot, the two marginal distribution, and the correlation: Let us also check Kendall's Tau and Spearman's rank at this point. This will be instructive later on. We can do this using the following code: cor(z,method='spearman') cor(z,method='Kendall') Which gives us the following results: > cor(z,method='spearman') [,1] [,2] [1,] 1.0000000 0.3787886 [2,] 0.3787886 1.0000000 > cor(z,method='kendall') [,1] [,2] [1,] 1.0000000 0.2588952 [2,] 0.2588952 1.0000000 Note that this is less than 0.4 as well, but we will discuss this further later on.
We now need to apply step 2 of the algorithm, which is applying the inverse Gaussian transformation to our multivariate normal distribution. We can do this using the following code:
u < pnorm(z) We now want to check the correlation again, which we can do using the following code: cor(z,method='spearman') Which gives the following result: > cor(z,method='spearman') [,1] [,2] [1,] 1.0000000 0.3787886 [2,] 0.3787886 1.0000000 Here is the Psych summary again: u is now marginally uniform (hence the name). We can see this by looking at the Scatterplot and marginal pdfs above. We also see that the correlation has dropped to 0.379, down from 0.4 at step 1. The Pearson correlation measures the linear correlation between two random variables. We generated normal random variables, which had the required correlation, but then we applied a nonlinear (inverse Gaussian) transformation. This nonlinear step is the source of the dropped correlation in our algorithm. We can also retest Kendall's Tau, and Spearman's at this point using the following code: cor(z,method='spearman') cor(z,method='Kendall') This gives us the following result: > cor(u,method='spearman') [,1] [,2] [1,] 1.0000000 0.3781471 [2,] 0.3781471 1.0000000 > cor(u,method='kendall') [,1] [,2] [1,] 1.0000000 0.2587187 [2,] 0.2587187 1.0000000 Interestingly, these values have not changed from above! i.e. we have preserved these measures of correlation between step 1 and step 2. It's only the Pearson correlation measure (which is a measure of linear correlation) which has not been preserved. Let's now apply the step 3, and once again retest our three correlations. The code to carry out step 3 is below: x1 < qgamma(u[,1],shape=2,scale=1) x2 < qbeta(u[,2],2,2) df < cbind(x1,x2) pairs.panels(df) The summary for step 3 looks like the following. This is the end goal of our method. We see that our two marginal distributions have the required distribution, and we have a correlation between them of 0.37. Let's recheck our three measures of correlation. cor(df,method='pearson') cor(df,meth='spearman') cor(df,method='kendall') > cor(df,method='pearson') x1 x2 x1 1.0000000 0.3666192 x2 0.3666192 1.0000000 > cor(df,meth='spearman') x1 x2 x1 1.0000000 0.3781471 x2 0.3781471 1.0000000 > cor(df,method='kendall') x1 x2 x1 1.0000000 0.2587187 x2 0.2587187 1.0000000 So the Pearson has reduced again at this step, but the Spearman and Kendall's Tau are once again the same.
Does this matter?
This does matter, let's suppose you are carrying out capital modelling and using this method to correlate your risk sources. Then you would be underestimating the correlation between random variables, and therefore potentially underestimating the risk you are modelling. Is this just because we are using a Gaussian Copula? No, this is the case for all Copulas. Is there anything you can do about it? Yes, one solution is to just increase the input correlation by a small amount, until we get the output we want. A more elegant solution would be to build this scaling into the method. The amount of correlation lost at the second step is dependent just on the input value selected, so we could precompute a table of input and output correlations, and then based on the desired output, we would be able to look up the exact input value to use. Recently I've been reading The Mathematics of Poker (2009, Bill Chen and Jerrod Ankenman) and I came across an interesting idea that I thought I'd write about. For me, understanding how to analyse this situation really gets to the heart of how I think about poker. I'd love to spend more time playing and studying poker but it's such a timesink and I don't really have the oodles of free time it would require, but every now and again I'll still open up a poker book and read about something, this is one of those interesting topics I was reading about hopefully you find it as interesting as I do. Calling a shove preflop in heads up The scenario being analysed in the book is a relatively common situation, particularly in online poker where people are more inclined to shove than in real life games. The question is: How should we analyse whether to call when we have a moderately strong hand against an opponent who has gone allin pre flop. Let's set up an example so that we have something concrete to talk about. Say we have pocket Kings pre flop, and our opponent goes allin, how should we decide whether to call? Obviously without knowing our opponent's hand there is no 100% correct answer. There is however one very useful way of analysing the situation. Equity against a range We need to ask ourselves  what cards would are opponent go allin with, and how does my current hand fare against that range? i.e. we need to calculate our equity against our opponent's range. We are adding another layer of stochastic uncertainty on the event, instead of trying to guess what hand our opponent has (which is almost impossible) we are trying to guess what kind of hands he might go allin with (which is very much possible). We then take this extra level of uncertainty and calculate the correct mathematical way to proceed. On the one extreme, let's suppose that based on our read of how our opponent is playing, we might think that they would only go allin with very strong hands, in this case just pocket Aces. We would then be a 4:1 underdog if we call with Ks, and we should definitely fold. (In order to calculate this we could use any standard poker calculator like the following) www.cardplayer.com/pokertools/oddscalculator/texasholdem One the other hand, suppose we know for a fact that our opponent has not looked at their cards at all but has still decided to go all in. In this case we should definitely call. The only cards we will be behind are pocket Aces, which make up a small fraction of the possible hands that our opponent could shove with, and we will be ahead or equal against all other possible hands. Therefore we would have a positive EV when calling. What if our read on our opponent's range is somewhere in between though? What we need to do is calculate our equity against each individual hand in our opponent's range, and then calculate the probability of our opponent having a given hand from that range. That is to say, in order to calculate the conditional expectation, we need to calculate the unconditional expectations against each hand and then multiply by the conditional probability of our opponent having that hand, given our belief about our opponent's range. Numerical Example Let's go back to our numerical example, and suppose that we have pocket Kings, and we put our opponent on either Pocket Aces, Kings, Queens, or Jacks. All of these hands are equally likely, so there is a 25% chance of our opponent having each hand. We can look up our equity against each hand (after you've been playing for a while, you naturally start to memorise hand equities anyway) Our probability of winning is then: $$P(A) * P(beating A) + P(K)*P(beating K) + P(Q)*P(beating Q) + P(J) * P(beating J)$$ Putting in our values: $$ 0.25*0.2 + 0.25*0.5 + 0.25*0.8 + 0.25*0.8 = 0.575.$$ We therefore see we have a positive expectation against this range, and should call. No one actually thinks like this in real games? It is a general misconception that professional poker is a game where players are trying to guess exactly what hand their opponent has, are constantly trying to bluff each other, or trying to pick up on subtle tells or signs that their opponent is or isn't bluffing. The more mundane truth is that poker is ultimately a game of imperfect information, where the better player is the one who can correctly interpret the betting information their opponent is giving them, and can then quickly and accurately make the kind of judgements described above during a game. Obviously poker players are not carrying out these calculations in their head to multiple decimal places in real time, what they will do though is review their hands after a game, calculate exactly what they should have done, and try to build up an intuition as to what the correct answer is, so that in the middle of a game they can quickly make decisions. Software to Analyse this situation Is there an easy software based method way of calculating our equity against a range? After I did a quick google there are a few programs that offer this type of analysis. For example: combonator.com/ www.powerequilab.com/ More interestingly though, I also found the following open source software, that can be adapted to carry out this type of analysis: github.com/zekyll/OMPEval At some point, I might try to use this code to set up a page on this website to let people analyse this situation. The Reinstatement Premium payable following a loss to an Excess of Loss contract, is related to the ceded loss to the contract by a simple formula. Therefore, it seems reasonable that we should be able to come up with a simple formula relating the price charged for the Excess of Loss contract to the price charged for a Reinstatement Premium Protection (RPP) cover. I was in a meeting last week with two brokers who were trying to do just this. They had come up with an indicative price for an Excess of Loss programme and were trying to use this to price the equivalent RPP cover. At the time I didn't have an answer for them, and when I did a quick Google, nothing came up. When thinking about it subsequently though, there are a couple of easy approximate methods we can use. Below I discuss three different methods for how you can price an RPP cover, two of which do not require any stochastic modelling assuming you already know the price of the Excess of Loss layers. Let's quickly review what we mean by all these terms so that we are starting from the same point. What is a Reinstatement Premium? If you already understand how a Reinstatement Premium works, then feel free to skip this section. Most Excess of Loss contracts will have some form of reinstatement premium. This is a payment from the Insurer to the Reinsurer to reinstate the protection in the event of a loss. In the London market, most contracts willl have either $1$, $2$, or $3$ reinstatements and generally these will be payable at $100 \%$. What is a Reinstatement Premium Protection Cover? The reinstatement premiums can be quite a large proportion of the overall premium paid for the Excess of Loss reinsurance. This is especially the case for lower level, working layers. Furthermore, the Reinstatement Premium will be payable when the insurer has just suffered a loss. From the point of view of the insurer, this additional payment comes at the worst possible time. The Insurer is being asked to fork over another large premium to the Reinsurer, just after having suffered a loss. To address some of these concerns, Reinsurers developed a product called a Reinstatement Premium Protection cover (RPP cover). This cover pays the Insurer's Reinstatement Premium for them, which gives the insurer further indemnification in the event of a loss. Here's an example of how it works in practice: Let's suppose we are considering a $5m$ xs $5m$ Excess of Loss contract, there is one reinstatement at $100 \%$ (written $1$ @ $100 \%$), and the Rate on Line is $25 \%$. The Rate on Line is just the Premium divided by the Limit. So here, the Premium can be found by multiplying the Limit and the RoL: $$5m* 25 \% = 1.25m$$ So we see that the Insurer will have to pay the Reinsurer $1.25m$ at the start of the contract. Now let's suppose there is a loss of $7m$. The Insurer will recover $2m$ from the Resinsurer, but they will also have to make a payment to cover the reinstatement premium of: $\frac {2m} {5m} * (5m * 25 \% ) = 2m * 25 \% = 0.5m$ to reinstate the cover. So the Insurer will actually have to pay out $5.5m$. The RPP cover, if purchased by the insurer, would pay the additional $0.5m$ on behalf of the insurer Now that we know how it works, how would we price the RPP cover? Three methods for pricing an RPP cover Method 1  Full stochastic model If we have priced the original Excess of Loss layer ourselves using a Monte Carlo model, then it should be relatively straight forward to price the RPP cover. We can just look at the expected Reinstatements, and apply a suitable loading for profit and expenses. This loading will probably be broadly in line with the loading that is applied to the expected losses to the Excess of Loss layer, but accounting for the fact that the writer of the RPP cover will not receive any form of Reinstatement for their Reinsurance. What if we do not have a stochastic model set up to price the Excess of Loss layer? What if all we know is the price being charged for the Excess of Loss layer? Method 2  Simple formula This was the situation I was in when I was asked to price the RPP cover last week. The broker had come up with some very approximate pricing for a programme of Excess of Loss layers. This pricing was driven by the burning cost, and other commercial factors rather than any actuarial modelling. The brokers wanted to come up with an approximate price for the RPP cover, just based on the price of the Excess of Loss layer. The two should be related, as they pay out dependant on the same underlying losses. So what can we say? If we denote the Expected Losses to the layer by $EL$, then the Expected Reinstatement Premium should be: $$EL * ROL $$ To see this is the case, I used the following reasoning; if we had losses in one year equal to the $EL$ (I'm talking about actual losses, not expected losses here), then the Reinstatement Premium for that year would be the proportion of the layer which had been exhausted $\frac {EL} {Limit} $ multiplied by the Deposit Premium $Limit * ROL$ i.e.: $$ RPP = \frac{EL} {Limit} * Limit * ROL = EL * ROL$$ Great! So we have our formula right? The issue now is that we don't know what the $EL$ is. We do however know the $ROL$, does this help? If we let $DP$ denote the deposit premium, which is the amount we initially pay for the Excess of Loss layer and we assume that we are dealing with a working layer, then we can assume that: $$DP = EL * (1 + \text{ Profit and Expense Loading } ) $$ Plugging this into our formula above, we can then conclude that the expected Reinstatement Premiums will be: $$\frac {DP} { \text{ Profit and Expense Loading } } * ROL $$ In order to turn this into a price (which we will denote $RPP$) rather than an expected loss, we then need to load our formula for profit and expenses i.e. $$RPP = \frac {DP} {\text{ Profit and Expense Loading }} * ROL * ( \text{ Profit and Expense Loading } ) $$Which with cancellation gives us: $$RPP = DP * ROL $$ Which is our first very simple formula for the price that should be charged for an RPP. Was there anything we missed out though in our analysis? Method 3  A more complicated formula: There is one subtlety we glossed over in order to get our simple formula. The writer of the Excess of Loss layer will also receive the Reinstatement Premiums during the course of the contract. The writer of the RPP cover on the other hand, will not receive any reinstatement premiums (or anything equivalent to a reinstatement premium). Therefore, when comparing the Premium charged for an Excess of Loss layer against the Premium charged for the equivalent RPP layer, we should actually consider the total expected Premium for the Excess of Loss Layer rather than just the Deposit Premium. What will the additional premium be? We already have a formula for the expected Reinstatement premium: $$EL * ROL $$ Therefore the total expected premium for the Excess of Loss Layer is the Deposit Premium plus the additional Premium: $$ DP + EL * ROL $$ This total expected premium is charged in exchange for an expected loss of $EL$. So at this point we know the Total Expected Premium for the Excess of Loss contract, and we can relate the expected loss to the Excess of Loss layer to the Expected Loss to the RPP contract. i.e. For an expected loss to the RPP of $EL * ROL$, we would actually expect an equivalent premium for the RPP to be: $$ RPP = (DP + EL * ROL) * ROL $$ This formula is already loaded for Profit and Expenses, as it is based on the total premium charged for the Excess of Loss contract. It does however still contain the $EL$ as one of its terms which we do not know. We have two choices at this point. We can either come up with an assumption for the profit and expense loading (which in this hard market might be as little as only be $5 \%  10 \%$ ). And then replace $EL$ with a scaled down $DP$: $$RPP = \frac{DP} {1.075} * ( 1 + ROL) * ROL $$ Or we could simply replace the $EL$ with the $DP$, which is partially justified by the fact that the $EL$ is only used to multiply the $ROL$, and will therefore have a relatively small impact on the result. Giving us the following formula: $$RPP = DP ( 1 + ROL) * ROL $$ Which of the three methods is the best? The full stochastic model is always going to be the best in my opinion. If we do not have access to one though, then out of the two formulas, the more complicated formula we derived should be more accurate (by which I mean more actuarially correct). If I was doing this in practice, I would probably calculate both, to generate some sort of range, but tend towards the second formula. That being said, when I compared the prices that the Brokers had come up with, which is based on what they thought they could actually place in the market, against my formulas, I found that the simple version of the formula was actually closer to the Broker's estimate of how much these contacts could be placed for in the market. Since the simple formula always comes out with a lower price than the more complicated formula, this suggests that there is a tendency for RPPs to be underpriced in the market. This systematic underpricing may be driven by commercial considerations rather than faulty reasoning on the part of market participants. According to the Broker I was discussing these contracts with, a common reason for placing an RPP is to give a Reinsurer who does not currently have a line on the underlying Excess of Loss layer, but who would like to start writing it, a chance to have an involvement in the same risk, without diminishing the signed lines for the existing markets. So let's say that Reinsurer A writes $100 \%$ of the Excess of Loss contract, and Reinsurer B would like to take a line on the contract. The only way to give them a line on the Excess of Loss contract is to reduce the line that Reinsurer A has. The insurer may not wish to do this though if Reinsurer A is keen to maintain their line. So the Insurer may allow Reinsurer B to write the RPP cover instead, and leave Reinsurer A with $100 \%$ of the Excess of Loss contract. This commercial factor may be one of the reasons that traditionally writers of an RPP would be inclined to give favourable terms relative to the Excess of Loss layer so as to encourage the insurer to allow them on to the main programme and to encourage them to allow them to wrte the RPP cover at all. Moral Hazard One point that is quite interesting to note about how these deals are structured is that RPP covers can have quite a significant moral hazard effect on the Insurer. The existence of Reinstatement Premiums is at least partially a mechanism to prevent moral hazard on the part of the Insurer. To see why this is the case, let's go back to our example of the $5m$ xs $5m$ layer. An insurer who purchases this layer is now exposed to the first $5m$ of any loss. But they are indemnified for the portion of the loss above $5m$, up to a limit of $5m$. If the insurer is presented with two risks which are seeking insurance  one with a total sum insured of $10m$, and another with a total sum insured of $6m$, the net retained exposure is the same for both risks from the point of view of the insurer. By including a reinstatement premium as part of the Excess of Loss layer, an therefore ensuring that the insurer has to make a payment any time a loss ceded to the layer, the reinsurer is ensuring that the insurer keeps their financial incentive to not have losses in this range. By purchasing an RPP cover, the insurer is removing their financial interest in losses which are ceded to the layer. There is an interesting conflict of interest in that the RPP cover will almost always be written by a different reinsurer to the Excess of Loss layer. The Reinsurer that is writing the RPP cover is therefore increasing the moral hazard risk whichever Reinsurer has written the Excess of Loss layer. Which will almost always be business written by one of the Reinsurer's competitors! Working Layers and unlimited Reinstatements Another point to note is that this pricing analysis makes a couple of implicit assumptions. The first is that there is a sensible relationship between the expected loss to the layer and the premium charged for the layer. This will normally only be the case for 'working layers'. These are layers to which a reasonable amount of loss activity is expected. If we are dealing with clash or other higher layers, then the pricing of these layers will be more heavily driven by considerations beyond the expected loss to the layer. These might be capital considerations on the part of the Reinsurer, commercial considerations such as Another implicit assumption in this analysis is that the reinstatements offered are unlimited,. If this is not the case, then the statement that the expected reinstatement is $EL * ROL$ no longer holds. If we have limited reinstatements (which is the case in practice most of the time) then we would expect the expected reinstatement to be less than or equal to this. Compound Poisson Loss Model in VBA13/12/2017 I was attempting to set up a Loss Model in VBA at work yesterday. The model was a CompoundPoisson FrequencySeverity model, where the number of events is simulated from a Poisson distribution, and the Severity of events is simulated from a Severity curve. There are a couple of issues you naturally come across when writing this kind of model in VBA. Firstly, the inbuilt array methods are pretty useless, in particular dynamically resizing an array is not easy, and therefore when initialising each array it's easier to come up with an upper bound on the size of the array at the beginning of the program and then not have to amend the array size later on. Secondly, Excel has quite a low memory limit compared to the total available memory. This is made worse by the fact that we are still using 32bit Office on most of our computers (for compatibility reasons) which has even lower limits. This memory limit is the reason we've all seen the annoying 'Out of Memory' error, forcing you to close Excel completely and reopen it in order to run a macro. The output of the VBA model was going to be a YLT (Yearly Loss Table), which could then easily be pasted into another model. Here is an example of a YLT with some made up numbers to give you an idea: It is much quicker in VBA to create the entire YLT in VBA and then paste it to Excel at the end, rather than pasting one row at a time to Excel. Especially since we would normally run between 10,000 and 50,000 simulations when carrying out a Monte Carlo Simulation. We therefore need to create and store an array when running the program with enough rows for the total number of losses across all simulations, but we won't know how many losses we will have until we actually simulate them. And this is where we come across our main problem. We need to come up with an upper bound for the size of this array due to the issues with dynamically resizing arrays, but since this is going to be a massive array, we want the upper bound to be as small as possible so as to reduce the chance of a memory overflow error. Upper Bound What we need then is an upper bound on the total number of losses across all the simulations years. Let us denote our Frequency Distribution by $N_i$, and the number of Simulations by $n$. We know that $N_i$ ~ $ Poi( \lambda ) \: \forall i$. Lets denote the total size of the YLT array by $T$. We know that $T$ is going to be: $$T = \sum_{1}^{n} N_i$$ We now use the result that the sum of two independent Poisson distributions is also a Poisson distribution with parameter equal to the sum of the two parameters. That is, if $X$ ~ $Poi( \lambda)$ , and $Y$ ~ $Poi( \mu)$, then $X + Y$ ~ $Poi( \lambda + \mu)$. By induction this result can then be extended to any finite sum of independent Poisson Distributions. Allowing us to rewrite $T$ as: $$ T \sim Poi( n \lambda ) $$ We now use another result, a Poisson Distribution approaches a Normally Distribution as $ \lambda \to \infty $. In this case, $ n \lambda $ is certainly large, as $n$ is going to be set to be at least $10,000$. We can therefore say that: $$ T \sim N ( n \lambda , n \lambda ) $$ Remember that $T$ is the distribution of the total number of losses in the YLT, and that we are interested in coming up with an upper bound for $T$. Let's say we are willing to accept a probabilistic upper bound. If our upper bound works 1 in 1,000,000 times, then we are happy to base our program on it. If this were the case, even if we had a team of 20 people, running the program 10 times a day each, the probability of the program failing even once in an entire year is only 4%. I then calculated the $Z$ values for a range of probabilities, where $Z$ is the unit Normal Distribution, in particular, I included the 1 in 1,000,000 Z value. We then need to convert our requirement on $T$ to an equivalent requirement on $Z$. $$ P ( T \leq x ) = p $$ If we now adjust $T$ so that it can be replaced with a standard Normal Distribution, we get: $$P \left( \frac {T  n \lambda} { \sqrt{ n \lambda } } \leq \frac {x  n \lambda} { \sqrt{ n \lambda } } \right) = p $$
Now replacing the left hand side with $Z$ gives:
$$P \left( Z \leq \frac {x  n \lambda} { \sqrt{ n \lambda } } \right) = p $$
Hence, our upper bound is given by:
$$T \lessapprox Z \sqrt{n \lambda} + n \lambda $$
Dividing through by $n \lambda $ converts this to an upper bound on the factor above the mean of the distribution. Giving us the following:
$$ T \lessapprox Z \frac {1} { \sqrt{n \lambda}} + 1 $$
We can see that given $n \lambda$ is expected to be very large and the $Z$ values relatively modest, this bound is actually very tight.
For example, if we assume that $n = 50,000$, and $\lambda = 3$, then we have the following bounds: So we see that even at the 1 in 1,000,000 level, we only need to set the YLT array size to be 1.2% above the mean in order to not have any overflow errors on our array. References (1) Proof that the sum of two independent Poisson Distributions is another Poisson Distribution math.stackexchange.com/questions/221078/poissondistributionofsumoftworandomindependentvariablesxy
(2) Normal Approximation to the Poisson Distribution.
stats.stackexchange.com/questions/83283/normalapproximationtothepoissondistribution I think the first time I read about the Difference Engine was actually in the novel of the same name by William Gibson and Bruce Sterling. The book is an alternative history, set in the 19th centuary where Charles Babbage, actually finished building the Difference Engine, a Mechanical calculator he designed. This in turn lead to him getting funding for the Analytical Engine, a Turingcomplete mechanical computer which Babbage also designed in real life, but also didn't actually finish building. I really enjoyed the book, but how plausible was this chain of events? How did the Difference Engine work? And what problem was Babbage trying to solve when he came up with the idea for the Difference Engine? Computers before Computers Before electronic computers were invented, scientist and engineers were forced to use various tricks and short cuts to enable them to carry out difficult calculations by hand. One short cut that was used extensively, and one which Babbage would have been very familiar with, was the use of log tables to speed up multiplication. A log tables is simply a table which lists the values of the logarithmic function. If like me you've never used a log table, then how are they useful? Log Tables The property of logarithms that makes them useful in simplifying certain calculations is that: $ log(AB) = log(A) + log(B) $ We use this property often in number theory when we wish to turn a multiplicative problem in to an arithmetic problem and vice versa. In this case it's more straightforward though. If we want to calculate $A*B$ we can convert the figures into logs, and then we just need to add the logs together and convert back to obtain the value of $A*B$. Lets say we want to calculate $ 134.7 * 253.9 $ . What we can do instead is calculate: $ log( 134.7) = 2.1294 $ and $ log (253.9) = 2.4047 $ then we just need to add together $ 2.1294 + 2.4047 = 4.5341 $ and convert back from logs
$ 10^{4.5340} = 34200 $ which we can easily verify as the number we require. Haven't we just made our problem even harder though? Before we needed to multiply two numbers, and now instead we need to do two conversions and an addition, admittedly it's easier to add two large numbers together than multiply them, but what about the conversion? The way around this is to have a book of the values of the log function so that we can just look up the log of any number we are interested in, allowing us to easily convert to and from logs. This is probably a good point to introduce Charles Babbage properly, Babbage was an English Mathematician, Inventor, and Philosopher born in 1791. He was a rather strange guy, as well as making important contributions to Computer Science, Mathematics and Economics, Babbage founded multiple societies. One society was founded to investigate paranormal activity, one was founded to promote the use of Leibniz notation in calculus, and another was founded in an attempt to foster support for the banning of organ grinders. When he wasn't keeping himself busy investigating the supernatural, Babbage was also a keen astronomer. Since astronomy is computation heavy, this meant that Babbage was forced to make extensive use of the log tables that were available at the time. These tables had all been calculated by hand by people called computers. Being a computer was a legitimate job at one point, they would sit and carry out calculations by hand all day every day, not the most exciting job if you ask me. Because the log tables had all been created by teams of human calculators, errors had naturally crept in. The tables were also very expensive to produce. This lead Babbage to conceive of a mechanical machine for calculating log tables, he called this invention the Difference Engine. Method of differences A difference engine uses the method of finite differences to calculate the integer values of polynomial functions. I remember noticing something similar to the method of finite differences when I was playing around trying to guess the formulas for sequences of integers at school. If someone gives you an integer sequence and they ask you to find the formula, say we are given, $1,4,7,10,13,...$ Then, in this case, the answer is easy, we can see that we are just adding 3 each time. To make this completely obvious, we can write in the differences between the numbers.. 1  4  7  10  13  ... 3 3 3 3 What if we are given a slightly more complex sequence though? For example: $1,3,6,10,15,...$ This one is a bit more complicated, let's see what happens when we add in the differences again: 1  3  6  10  15 2 3 4 5 Now we see that there is an obvious pattern in the number we are adding on each time. Looking at the differences between these numbers we see: 1  3  6  10  15 2 3 4 5 1 1 1 So what's happened here? We now have stability on the second level of the differences. It turns out that this is equivalent to the underlying formula being a quadratic. In this case the formula is $0.5*x^2+1.5x+1$. If we assume the first number in the sequence is equivalent to $x=0$ we can now easily recreate the sequence, and easily calculate the next value. Let's try a difficult example, if we are given the following sequence and told to guess the next value, we can use a similar method to get the answer. $2,5,18,47,98,177$ Setting up the method: 2  5  18  47  98  177 3 13 29 51 79 10 16 22 28 6 6 6 Since we get a constant at the third level, this sequence must be a cubic, once we know this, it's much easier to guess what the actual formula is. In this case it is x^3x^2x+3. Babbage's insight was that we can calculate the next value in this sequence just by adding on another diagonal to this table of differences. Adding $6$ to $28$ gives $34$, then adding $34$ to $79$ gives $113$, and then adding $113$ to $177$ gives us $290$. Which means that the next value in the sequence is $290$ So we get: 2  5  18  47  98  177  290 3 13 29 51 79 113 10 16 22 28 34 6 6 6 6 As you might guess, this process generalises to higher order polynomials. For a given sequence, if we keep trying the differences of the differences and eventually get to a constant then we will know the sequence is formed by a polynomial, and we will also know the degree of the polynomial. So if you are ever a given an integer sequence and asked to find the pattern, always check the differences and see if it eventually becomes a constant, if it does, then you will know the order of the polynomial which defines the sequence and you will also be able to easily compute the next value directly. So how does this apply to Babbage's difference engine? The insight is that we have here a method of enumerating the integer values of a polynomial just using addition. Also at each stage we only need to store the values of the leading diagonal. And each polynomial is uniquely determined by specifying its differences. The underlying message is that multiplication is difficult. In order to come up with a shortcut for multiplication, we use log tables to make the problem additive. And even further, we now have a method for calculating the log function which also avoids multiplication. So given we have this method of calculating the integer values of a polynomial, how can we use this to calculate values of the log function? Approximating Functions The obvious way to approximate a log function with a polynomial would be to just take its Taylor expansion. For example, the Taylor expansion of $ log ( 1  x ) $ is: $ log ( 1  x ) =  \sum^{\infty}_{n=1} \frac{x^n}n $ There is a downside to using the Taylor expansion though. Given the mechanical constraints at the time, Babbagge's Difference Engine could only simulate a 7th degree polynomial. So how close can we get with a Taylor expansion? We can use Taylor's theorem to calculate the convergence of the approximation, but this will be quite a bit of work, and since we can easily calculate the actual value of the log function it's easier to just test the approximation with a computer. So taking $log(0.5)$ as an example, when I use a calculator, I am told that it equals $0.6931$, but when I check the 7th order polynomial I get $0.6923$, and it's not until I get to the 10th polynomial that we are accurate even to 4 digits. If we require a more accurate approximation, we will have to use numerical methods in conjunction with a restriction on the range of convergence. This would mean that if we wished to compute $log(x)$ on the interval [0,1], for 100 different points on the interval, we would break [0,1] into subintervals and then use a different polynomial fit for each subinterval. If you'd like to read more about how the actual machine worked then the Wikipedia is really useful. en.wikipedia.org/wiki/Difference_engine And if you are interested in reading more about the mathematics behind using a Difference Engine then the following website is really good: edthelen.org/bab/babintro.html Driverless, Autonomous, Selfdriving, robotic, drone cars, whatever you want to call them, I think selfdriving cars are going to be awesome.. The potential benefits include:
But how far away are we from this being a reality? It seems like we are constantly being told that selfdriving cars are just on the horizon, and that wide spread use of selfdriving cars will arrive sooner than we think. It got me thinking though, surely even when manufacturers start churning out driverless cars, isn't it still going to take a considerable amount of time before they begin to replace all the cars currently being driven? Most people will not suddenly go out and replace their current car the moment selfdriving cars are available on the market. Replacing all the old cars Almost all cars that are currently being driven today will never be selfdriving, it will only be new cars, after a certain point, that will start to be selfdriving. So even if all new cars from now on were to be self driving, there would still be a delay as old driven cars were slowly replaced by self driving cars. I thought I'd try to do some modelling to see how quickly would this process might take place. As a starting point, let's assume that we will start to see selfdriving cars being produced by 2019. I found the following report from the Department of Transport which details the number of cars on the road today, and the number of new cars registered every year. www.gov.uk/government/uploads/system/uploads/attachment_data/file/608374/vehiclelicensingstatistics2016.pdf The important statistics are: There are currently $37.3$m cars on the road Each year the number of registered cars increases by approximately $600,000$ Around $3.3$m new cars are registered every year. I then extrapolated these statistics based on three different scenarios: Scenario 1  All new cars from 2019 onward are driverless We can see that even under this very optimistic scenario, it's not until 2025 that we will see a majority of cars on the road being driverless. It's probably not reasonable though to assume that all new cars produced after 2019 will be driverless, so let's look at the effect of slowing increasing the proportion of new cars that are driverless. Scenario 2  Assuming linear increase in % of new cars produced which are Driverless between 2019 and 2030 In this scenario we assume that in 2018 all new cars are driven, and that by 2030 all new cars are driverless, and we assume a linear increase in the % of new cars which are driverless between these two years. We see that under this scenario, it's not until 2030 that we start to see a majority of driverless cars on the road. To get an alternative view, let's look at a quicker rate of adoption, let's suppose instead that by 2025, all new cars will be driverless. Now we see that a majority of cars are driverless by around 2027, with a strong majority emerging by 2030. Conclusion Even when we assume that driverless cars will start to be produced by 2019, based on current trends of car replacement, and depending on the speed at which self driving cars are produced, we shouldn't expect a majority of cars on the road to be driverless until at least the late 20s or maybe even early 30s. So when analysts say that driverless cars will be common much sooner than people expect, they need to be careful about how they define common. Fact 1  If Airline Companies were ran like Rocket Companies, a trip from London to New York would cost about $300,000. Prior to SpaceX, launching a a Satellite into orbit was expensive. Really, really expensive. The most commonly used launch vehicle was the Ariane 5, manufactured by the French Company Arianespace, and costing somewhere in the region of USD75 million per launch. SpaceX have stated that their eventual hope is target a cost of USD6 million per launch. How do they hope to achieve a more than 10 times cost reduction? Better Economies of Scale? Improved Manufacturing Processes? Cheap Labour? 3d Printing? Actually none of the above. The answer is actually kind of obvious once you hear it though. They want to make rockets that can be used more than once! As a though experiment, let's apply the concept of reusability to a familiar mode of transport. If we take the most common airliner in the world  the Boeing 737  then we can compare the cost of a flight with full reusability against the cost of the flight without reusability. A new Boeing 737 costs in the region of USD75 million (depending on the exact model) whereas a return ticket to New York on a Boeing 737 only costs about USD600 per seat. Multiplying the number of seats (~250) by the cost of a one way ticket (~USD300) we see that the total cost of the flight is somewhere around the USD75,000. So each flight costs about 0.1% of the total cost of the airliner. Suppose though, that like a rocket, a 737 could only be flown once. Then all of a sudden, instead of costing less than USD75,000 per flight, the flight would include the full cost of the 737, meaning each ticket would cost around USD300,000! By creating rockets that are reusable, SpaceX hopes to get similar cost savings in the arena of launch vehicles. SpaceX is actually quite close to achieving full reusability, here is a cool video of a Falcon 9 rocket making a vertical landing on a drone ship in April 2016. Fact 2  It is easier to land the rocket on a Drone Ships than the original launch site In the video above, the Falcon 9 lands on a floating platform in the ocean. Given landing on a ship in the ocean seems to add another layer of complexity to an already difficult task why did SpaceX decide to do this? A few ideas spring to mind, you might think that this is because the rocket is not sufficiently accurate and the drone ship has to be able to move around to get into the right place, or possibly that the ocean is a safer environment to land on given the lack of nearby people or property to damage. Or perhaps SpaceX are doing this as a form of marketing given the fact that it looks pretty cool. However, the actual reason is surprisingly prosaic, and also, once you think about it, kind of obvious again. It'all has to do with how you get a satellite into orbit in the first place. When a rocket attempts to get a satellite into orbit it does not just go vertically up in the air, in order for a satellite to be in a stable orbit, it actually needs to be moving perpendicular to the earth's surface at a very high speed (which can be calculated based on the height of the orbit) in order to not fall back to earth. I wrote a model in Python of a two stage rocket launching a satellite into orbit around earth. I then ran the model multiple times keeping the overall thrust of the rocket exactly the same every time, but varying the initial direction of acceleration. Using this model we can see the large effect that varying the path the rocket takes has on the ease in which a satellite can be put into orbit. In the first video, the launch rocket is sent almost vertically up in the air, and the satellite (the blue part) makes very little progress before it crashes back down to earth. In the second attempt, we angle the rocket towards the horizon more, in this case, by 55 degrees. And we can see that whilst the satellite does not make it into orbit, it does come closer than the first attempt. In the final attempt, we angle the rocket at a 45 degree angle to the horizon. Now we see that the satellite does in fact make it into orbit. Note that the only thing we have changed in all these attempts is the angle of launch, the acceleration of the rocket has not been changed at all. The point to note from all this is that in order to get a satellite into orbit, the rocket that was used to put it into orbit also needs to have a lot of horizontal velocity. There are quite a few simplifications in our model. For example, we have not included air resistance which will have an impact as in practice there will be a benefit in gaining as much vertical height as possible initially so as to reduce air resistance. The model also does not take account of the fact that the mass of the rocket will reduce as it gets higher due to the fuel expended. However neither of these simplifications effects the the general principle of needing horizontal acceleration. To tie it back to our original point, from watching the videos you might be able to tell the problem that SpaceX found themselves dealing with when using this approach. The first stage of the rocket ends up miles away from the launch site! It turns out that for SpaceX, given the location of their launch site, the landing site for these rockets tends to be in the ocean, and this is why that they land the rockets on barges in the ocean.
Fact 3  The floating ships are named after ships in Iain M. Banks novels
SpaceX have two landing barges (also known as Autonomous Spaceport Drone Ships) that they use to land the rockets on. The barges are called 'Just Read The Instructions' and 'Of Course I Still Love You'. These may seem like very unusual names for SpaceX to pick, unless you spot that they are in fact names of spaceships in Iain M. Banks' Culture series, a series of scifi books set in a Utopian future with an interstellar human race. Python Code In case anyone is interested, I have pasted the Python code for the two stage rocket model below. My code was partially based on the following model, which simulates the nbody problem in Python. fiftyexamples.readthedocs.io/en/latest/gravity.html import turtle import math turtle.color("White") turtle.sety(100) turtle.color("Black") turtle.circle(100) turtle.seth(90) turtle.color("Red") Vx = 0.3 Vy = 0.3 SecondStage = False for sec in range(3500): CurrentX = turtle.xcor() CurrentY = turtle.ycor() d = math.sqrt(CurrentX ** 2 + CurrentY ** 2) f = 17 / (d ** 2) theta = math.atan2(CurrentY, CurrentX) fx = math.cos(theta) * f fy = math.sin(theta) * f Vx += fx Vy += fy if turtle.pencolor() == "Red": turtle.goto(turtle.xcor() + Vx, turtle.ycor() + Vy) if d < 100: turtle.color("white") if sec == 250: SecondStage = True New = turtle.Turtle() New.color("white") NewCurrentX = CurrentX NewCurrentY = CurrentY New.goto(NewCurrentX,NewCurrentY) New.color("blue") VNx = Vx + 0.05 VNy = Vy  0.2 if SecondStage == True: NewCurrentX = New.xcor() NewCurrentY = New.ycor() d = math.sqrt(NewCurrentX ** 2 + NewCurrentY ** 2) f = 17 / (d ** 2) theta = math.atan2(NewCurrentY, NewCurrentX) fx = math.cos(theta) * f fy = math.sin(theta) * f VNx += fx VNy += fy New.goto(New.xcor() + VNx, New.ycor() + VNy) if d < 100: New.color("white") turtle.exitonclick() "I don't know what you mean by 'glory,' " Alice said. Humpty Dumpty smiled contemptuously. "Of course you don't—till I tell you. I meant 'there's a nice knockdown argument for you!' " "But 'glory' doesn't mean 'a nice knockdown argument'," Alice objected. "When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less." "The question is," said Alice, "whether you can make words mean so many different things." "The question is," said Humpty Dumpty, "which is to be master—that's all." I don't think Lewis Carroll had 'Big Data' or 'Machine Learning' in mind when he penned these words, however I think the quote is quite apt in this context. All to often these buzzwords seem to fall foul to the Humpty Dumpty principle, they mean just what the speaker chooses them to mean  regardless of what the words actually mean to anyone else. So what do these terms actually mean? Machine Learning The field of study which investigates algorithms that give computers the ability to learn without being explicitly programmed. What do we mean by ‘learn’ in this context? The definition used by Machine Learning practitioners, originally stated by Arthur Samuel is: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." So what problems can Machine Learning algorithms be applied to? The main advances in machine learning have been in the following areas:
A trait shared by all these problems is that previously computers were thought to be incapable of tackling them. This is one reason why Machine Learning is such an exciting and growing field of study. If you'd like to know more about Machine Learning then Andrew Ng at Stanford University has released a really good free online course through Coursera which can be accessed through the following link: Big Data Big Data can be defined as data which conforms to the 3Vs. Big Data is available at a higher volume, higher velocity (rate at which data is generated) and/or greater variety than normal data sources. So for example, looking at an insurance company, claims data would not count as Big Data, the volume will be fairly low, velocity will be slow, and variety will be fairly uniform. The browsing patterns of an aggregator website on the other hand would count as Big Data. For example, the amount of time someone spends on Comparethemarket.com, their clicks, what they search for, how many searches they make, how often they return to the website before making a purchase, etc. would count as Big Data. There would be a massive volume of data to analyse and the data would be available in real time. (It wouldn’t meet the variety criteria, but that’s not a necessary condition) Due to the need to extract useful information from Big Data, and the difficulties created by the 3Vs, we cannot rely on traditional methods of data analysis. Given the volume and velocity of Big Data, we require methods of analysis that does not need to be programmed explicitly, this is where Machine Learning fits in. Machine Learning in the guise of speech and handwriting recognition can also be important if the data generated is in audio form but needs to be combined with other data. Data Mining Data Mining is a catch all term for the process of analysing and summarising data into useful information. Data may be in the form of Big Data, and methods used may be based on Machine Learning (where the algorithm learns from the data) or may be more traditional. Data Visualisation Data Visualisation is the process of creating visual graphics that aid in understanding and exploring data. It has become increasingly important for two reasons, firstly, the rise in the volume of data sets means that new methods are required to understand data, secondly, an increase in computing power means that more advanced visualisation techniques are now possible. Data Science Data Science is a broad term which encompasses processes which aim to extract knowledge or insight from Data. Data science therefore includes all the previous fields. For example, in carrying at an analysis, we will first collect our data, which may or may not be in the form of Big Data, we will then mine our data, possibly using machine learning, and then present our results through Data Visualisation. Retirement at 120?14/8/2016 If people could live forever, then would they still be able to retire or would pension schemes become bankrupt from having to pay out never ending streams of money to eternally youthful pensioners? This might sound like an overly academic question, however I've noticed a glut of articles recently about the prospects of substantially extending the human lifespan, for example: http://gizmodo.com/peterthielisrightaboutonething1785104345 And a feature in this week's economist: http://www.economist.com/news/leaders/21704791sciencegettinggripswaysslowageingrejoicelongsideeffectscanbe Many of these articles raise the point that extending the human lifespan would have an effect on the affordability of retirement. Claiming that retirement as we know it would come to an end. Is this correct though? Most of the articles that I found simply state that retirement would now be impossible without explaining precisely why. I thought I'd do some calculations and see how much of an effect increased lifespans would have on pension scheme funding, It turns out that the situation is not as bad as it seems. First we need to introduce an actuarial technique that allows us to calculate the cost of paying someone a pension into the future. Perpetuity Suppose that we have a payment stream of amount $£1$, payable annually, and paid forever. Graphically we are talking about the following situation:: Can we place a value on this series of payments? You might suppose that since we are paying out an infinite amount of money that the value of the perpetuity (which is the name of a payment stream that continues indefinitely) should be infinite. However finance uses a concept called ‘net present value’ or NPV, to assign a value to this stream, and this value turns out to be finite. Time value of money First we need some extra information, let's assume that we have access to a riskfree bank account that pays out an interest rate of $i$% per annum. (For example, $i$ might be $5$). So that if we invest $1$ at time $t=0$, it will be worth $1*(1+i)$ at time $t=1$, and worth $1*(1+i)*(1+i)$ at time $t=2$. Let's first solve a simplified problem and just consider the value of the first payment of $1$ at time $1$. If we wish to invest an amount of money now, so that we can pay the $1$ due at time $t = 1$ with the money we have invested, then if we put $1*(1+i)^{1}$ in the bank account now, this will be worth $1$ at time $1$. Similarly to invest an amount now so that we will be able to pay the amount $1$ at time $n$, we need to invest ${1/(1+i)}^{n}.$ Going back to the original problem, we can use this result to calculate the amount we should invest now so that we can pay all the future payments. It will be: $\sum_{k=1}^{\infty} {1/(1+i)}^{k} $ This is just a geometric series, which sums to $1/ (1 (1+i)) = 1/i.$ Therefore the amount we need to invest is $1/i$. So, if $i=5%$ as we had earlier, then we actually only need to invest $20$ now in order to pay able to pay someone $1$ every year, for ever! Checking the Result Let's check that this makes sense. Suppose that we do invest this amount at $t=0$, then at $t=1$ we will have $(20)(1.05) = 1 + 20$. Which gives us $1$ to pay the perpetuity, and leaves enough to invest the original amount again. We can see that if nothing else changes, this system can continue indefinitely. Returning to our previous question, would it be possible to pay a pension to someone who will live forever? The answer is yes. We can even calculate the amount that we would need to invest now to pay the pension. Increasing pensions This system we have derived is not very realistic though. Most pensions increase over time, for example the state pension in the UK increases by the minimum of CPI, $2.5%$ and the average annual wage growth. Given that a pension that increases over time will not just pay out an infinite amount, but will also grow to be infinitely big. Would such a pension still work in a scenario where pensioners live forever? It turns out in fact that yes, such a system is still sustainable under certain conditions. Let’s suppose that our perpetuity increases at a rate of $g%$ per year, so here we might assume that $g=2.5$. Then if we derive the amount that we will need to invest now (at an investment rate of $i%$) we find that we will need $((1+g)/(1+i))^n$. Summing all of these values gives the following initial value: $\sum_{k=1}^{\infty} {(1+g)/(1+i)}^{k} $this is another geometric sum, however now we need to consider convergence, this sum will converge iff $((1+g)*(1+i)) <1$. That is, the sum will converge iff we can find an investment that grows faster than the perpetuity we have promised to pay. If the sum converges then it will converge to $(1/(ig))$. So using the example we had earlier, where $i=5%$, $g=2.5%$ we would need to invest $ 1/0.025 = 40$. Which is substantially more than the nonincreasing annuity but still finite. So we see that even when the pension is increasing, we can still afford to set it up. Mortality Premium Can’t we still make the argument that pensions will become unaffordable due to the fact that these perpetuities will still cost a lot more. Let's compare the value of an annuity assuming the pensioner will live to the average life expectancy with one where the pensioner is assumed to live forever. Looking in my orange tables (the name for the Formula and Tables for Actuarial Exams) I see that the cost of a nonincreasing pension of $1$ per year, paid to a $65$ year old male at a discount rate of $4%$ is $12.66$. Compare this to the value of $25$ to pay the pension forever, we see that the cost of the pension is roughly double. This amount is much higher, but it's interesting to note that the increase in cost is similar as that between increasing and nonincreasing pensions! Can we actually live forever We should also consider the fact that mortality will never actually be $0$ even if ageing is eliminated. Presumably accidents, and possibly even illness unrelated to age would still exist and for this reason we would not expect to have to pay these pensions forever. If you live long enough, then the probability of all nonimpossible events occurring at some point should eventually reach $1$. This means that eventually even the most unlikely accidents would eventually happen if you lived long enough.
There was an interesting article in the Economist last week (July 23rd edition) about the growing dependency ratio in the developed world and whether this is sustainable. (the dependency ratio is the ratio of the number of working aged people to number of nonworking age people).
It was definitely an interesting read, and I will post a link below, it raises interesting questions about the transfer of wealth from one generation to the next and the fairness of this transfer in a system where there are fewer working age people paying in. One thing about the article that struck me though was that the UN Population Division publishes estimates of the working age population all the way out to 2100! I can't even predict what I will have for lunch tomorrow. Is this a case of the UN being wildly optimistic about their modelling skills? Or perhaps over such a large data sample (the entire human population) the law of large numbers prevails and we can pick out some clear trends which we expect to continue in the future. Let's try to look into how good these population models are. My first step was to try to find estimates of population growth published by the UN Population Division in the past and see how accurate they had been. Every couple of years the UN Population Division publishes a report on World Population Prospects, I managed to find report going back to 2000, link below for the 2000 addition: Previous Estimates I found as many of these reports as I could and chucked the data into a table. Luckily the reports are fairly standardised and it is possible to look at the sequence of estimates.
We see that the best estimate of the World Population in 2050 has actually been relatively stable since the year 2000. The standard deviation of the medium estimate is only in the range of 200m, which to me seems like a reasonable volatility for such a far reaching statistic.
An interesting aspect of this data is the drop in the last column between 2000 and 2004. The final column represents the estimate of the population in 2050 assuming that birth rates do not decline. Since we do not have a large shift in the medium estimate between these two dates, this drop in birth rates must have been factored in for the 2000 analysis. So to answer the original question  how accurate can we be about Population Growth  so far pretty accurate. If the UN is able to accurately model the change in birth rates, then perhaps there is hope that they will be able to model population as a whole accurately.
Link to the original article from the economist:

AuthorI work as a pricing actuary at a reinsurer in London. Categories
All
Archives
May 2020
