Negative Binomial in VBA21/9/2019
Have you ever tried to simulate a negative binomial random variable in a Spreadsheet?
If the answer to that is ‘nope – I’d just use Igloo/Metarisk/Remetrica’ then consider yourself lucky! Unfortunately not every actuary has access to a decent software package, and for those muddling through in Excel, this is not a particularly easy task. If on the other hand your answer is ‘nope – I’d use Python/R, welcome to the 21st century’. I’d say great, I like using those programs as well, but sometimes for reasons out of your control, things just have to be done in Excel. This is the situation I found myself in recently, and here is my attempt to solve it: Attempt 0 The first step I took in attempting to solve the problem was of course to Google it, then cross my fingers and hope that someone else has already solved it and this is just going to be a simple copy and paste. Unfortunately when I did search for VBA code to generate a negative binomial random variable, nothing comes up. In fact, nothing comes up when searching for code to simulate a Poisson random variable in VBA. Hopefully if you've found your way here, looking for this exact thing, then you're in luck, just scroll to the bottom and copy and paste my code. When I Googled it, there were a few solutions that almost solved the problem; there is a really useful Excel addin called ‘Real statistics’ which I’ve used a few times: http://www.realstatistics.com/ It's a free excel addin, and it does have functionality to simulate negative bimonials. If however you need someone to be able to rerun the Spreadsheet, they also will need to have it installed. In that case, you might as well use Python, and then hard code the frequency numbers. Also, I have had issues with it slowing Excel down considerably, so I decided not to use this in this case. I realised I’d have to come up with something myself, which ideally would meet the following criteria
How hard can that be? Attempt 1 I’d seen a trick before (from Pietro Parodi’s excellent book ‘Pricing in General Insurance’) that a negative binomial can be thought of as a Poisson distribution with a Gamma distribution as the conjugate prior. See the link below for more details: https://en.wikipedia.org/wiki/Conjugate_prior#Table_of_conjugate_distributions Since Excel has a built in Gamma inverse, we have simplified the problem to needing to write our own Poisson inverse. We can then easily generate negative binomials using a two step process:
Great, so we’ve reduced our problem to just being able to simulate a Poisson in VBA. Unfortunately there’s still no built in Poisson inverse in Excel (or at least the version I have), so we now need a VBA based method to generate this. There is another trick we can use for this  which is also taken from Pietro Parodi  the waiting time for a Poisson dist is an Exponential Dist. And the CDF of an Exponential dist is simple enough that we can just invert it and come up with a formula for generating an Exponential sample. We then set up a loop and add together exponential values, to arrive at Poisson sample. The code for this is give below: Function Poisson_Inv(Lambda As Double) s = 0 N = 0 Do While s < 1 u = Rnd() s = s  (Application.WorksheetFunction.Ln(u) / Lambda) k = k + 1 Loop BH_Poisson_Inv = (k  1) End Function The VBA code for our negative binomial is therefore: Function NegBinOld2(b, r) Dim Lambda As Double Dim N As Long u = Rnd() Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBinOld2 = N End Function Does this do everything we want?
There are a couple of downside of though:
This leads us on to Attempt 2 Attempt 2 If we pass the VBA a random uniform sample, then whenever we hit refresh in the Spreadsheet the random sample will refresh, which will force the Negative Binomial to resample. Without this, sometimes the VBA will function will not reload. i.e. we can use the sample to force a refresh whenever we like. Adapting the code gives the following: Function NegBinOld(b, r, Rnd1 As Double) Dim Lambda As Double Dim N As Long u = Rnd1 Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBinOld = N End Function So this solves the refresh problem. What about the random seed problem? Even though we now always get the same lambda for a given rand – and personally I quite like to hardcode these in the Spreadsheet once I’m happy with the model, just to speed things up. We still use the VBA rand function to generate the Poisson, this means everytime we refresh, even when passing it the same rand, we will get a different answer and this answer will be nonreplicable. This suggests we should somehow use the first random uniform sample to generate all the others in a deterministic (but still pseudorandom) way. Attempt 3 The way I implemented this was to the set the seed in VBA to be equal to the uniform random we are passing the function, and then using the VBA random number generator (which works deterministically for a given seed) after that. This gives the following code: Function NegBin(b, r, Rnd1 As Double) Rnd (1) Randomize (Rnd1) Dim Lambda As Double Dim N As Long u = Rnd() Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBin = N End Function So we seem to have everything we want – a free, quick, solution that can be bundled in a Spreadsheet, which allows other people to rerun without installing any software, and we’ve also eliminated the forced refresh issue. What more could we want? The only slight issue with the last version of the negative binomial is that our parameters are still specified in terms of ‘b’ and ‘r’. Now what exactly are ‘b’ and ‘r’ and how do we relate them to our sample data? I’m not quite sure.... The next trick is shamelessly taken from a conversation I had with Guy Carp’s chief Actuary about their implementation of severity distributions in MetaRisk. Attempt 4 Why can't we reparameterise the distribution using parameters that we find useful, instead of feeling bound by using the standard statistics textbook definition (or even more specifically the list given in the appendix to ‘Loss Models – from data to decisions’, which seems to be somewhat of an industry standard), why can't we redefine all the parameters from all common actuarial distributions using a systematic approach for parameters? Let's imagine a framework where no matter which specific severity distribution you are looking at, the first parameter contains information about the mean (even better if it is literally scaled to the mean in some way), the second contains information about the shape or volatility, the third contains information about the tail weight, and so on. This makes fitting distributions easier, it makes comparing the goodness of fit of different distributions easier, and it make sense checking our fit much easier. I took this idea, and tied this in neatly to a method of moments parameterisation, whereby the first value is simply the mean of the distribution, and the second is the variance over the mean. This gives us our final version: Function NegBin(Mean, VarOMean, Rnd1 As Double) Rnd (1) Randomize (Rnd1) Dim Lambda As Double Dim N As Long b = VarOMean  1 r = Mean / b u = Rnd() Lambda = Application.WorksheetFunction.Gamma_Inv(u, r, b) N = Poisson_Inv(Lambda) NegBin = N End Function Function Poisson_Inv(Lambda As Double) s = 0 N = 0 Do While s < 1 u = Rnd() s = s  (Application.WorksheetFunction.Ln(u) / Lambda) k = k + 1 Loop BH_Poisson_Inv = (k  1) End Function Poisson Distribution for small Lambda23/4/2019
I was asked an interesting question a couple of weeks ago when talking through some modelling with a client.
We were modelling an airline account, and for various reasons we had decided to base our large loss modelling on a very basic topdown allocation method. We would take a view of the market losses at a few different return periods, and then using a scenario approach, would allocate losses to our client proportionately. Using this method, the frequency of losses is then scaled down by the % of major policies written, and the severity of losses is scaled down by the average line size. To give some concrete numbers (which I’ve made up as I probably shouldn’t go into exactly what the client’s numbers were), let's say the company was planning on taking a line on around 10% of the Major Airline Risks, and their average line was around 1%. We came up with a table of return periods for market level losses. The table looked something like following (the actual one was also different to the table below, but not miles off):
Then applying the 10% hit factor if there is a loss, and the 1% line written, we get the following table of return periods for our client:
Hopefully all quite straightforward so far. As an aside, it is quite interesting to sometimes pare back all the assumptions to come up with something transparent and simple like the above. For airline risks, the largest single policy limit is around USD 2.5bn, so we are saying our worst case scenario is a single full limit loss, and that each year this has around a 1 in 50 chance of occurring. We can then directly translate that into an expected loss, in this case it equates to 50m (i.e. 2.5bn *0.02) of pure loss cost. If we don't think the market is paying this level of premium for this type of risk, then we better have a good reason for why we are writing the policy!
So all of this is interesting (I hope), but what was the original question the client asked me? We can see from the chart that for the market level the highest return period we have listed is 1 in 50. Clearly this does translate to a much longer return period at the client level, but in the meeting where I was asked the original question, we were just talking about the market level. The client was interested in what the 1 in 200 at the market level was and what was driving this in the modelling. The way I had structured the model was to use four separate risk sources, each with a Poisson frequency (lambda set to be equal to the relevant return period), and a fixed severity. So what this question translates to is, for small Lambdas $(<<1)$, what is the probability that $n=2$, $n=3$, etc.? And at what return period is the $n=2$ driving the $1$ in $200$? Let’s start with the definition of the Poisson distribution: Let $N \sim Poi(\lambda)$, then: $$P(N=n) = e^{\lambda} \frac{ \lambda ^ n}{ n !} $$ We are interested in small $\lambda$ – note that for large $\lambda$ we can use a different approach and apply sterling’s approximation instead. Which if you are interested, I’ve written about here: www.lewiswalsh.net/blog/poissondistributionwhatistheprobabilitythedistributionisequaltothemean
For small lambda, the insight is to use a Taylor expansion of the $e^{\lambda}$ term. The Taylor expansion of $e^{\lambda}$ is:
$$ e^{\lambda} = \sum_{i=0}^{\infty} \frac{\lambda^i}{ i!} = 1  \lambda + \frac{\lambda^2}{2} + o(\lambda^2) $$
We can then examine the pdf of the Poisson distribution using this approximation: $$P(N=1) =\lambda e^{\lambda} = \lambda ( 1 – \lambda + \frac{\lambda^2}{2} + o(\lambda^2) ) = \lambda  \lambda^2 +o(\lambda^2)$$
as in our example above, we have:
$$ P(N=1) ≈ \frac{1}{50} – {\frac{1}{50}}^2$$
This means that, for small lambda, the probability that $N$ is equal to $1$ is always slightly less than lambda. Now taking the case $N=2$: $$P(N=2) = \frac{\lambda^2}{2} e^{\lambda} = \frac{\lambda^2}{2} (1 – \lambda +\frac{\lambda^2}{2} + o(\lambda^2)) = \frac{\lambda^2}{2} \frac{\lambda^3}{2} +\frac{\lambda^4}{2} + o(\lambda^2) = \frac{\lambda^2}{2} + o(\lambda^2)$$
So once again, for $\lambda =\frac{ 1}{50}$ we have:
$$P(N=2) ≈ 1/50 ^ 2 /2 = P(N=1) * \lambda / 2$$
In this case, for our ‘1 in 50’ sized loss, we would expect to have two such losses in a year once every 5000 years! So this is definitely not driving our 1 in 200 result.
We can add some extra columns to our market level return periods as follows:
So we see for the assumptions we made, around the 1 in 200 level our losses are still primarily being driven by the P(N=1) of the 2.5bn loss, but then in addition we will have some losses coming through corresponding to P(N=2) and P(N=3) of the 250m and 500m level, and also combinations of the other return periods.
So is this the answer I gave to the client in the meeting? …. Kinda, I waffled on a bit about this kind of thing, but then it was only after getting back to the office that I thought about trying to breakdown analytically which loss levels we can expect to kick in at various return periods. Of course all of the above is nice but there is an easier way to see the answer, since we’d already stochastically generated a YLT based on these assumptions, we could have just looked at our YLT, sorted by loss size and then gone to the 99.5 percentile and see what sort of losses make up that level. The above analysis would have been more complicated if we have also varied the loss size stochastically. You would normally do this for all but the most basic analysis. The reason we didn’t in this case was so as to keep the model as simple and transparent as possible. If we had varied the loss size stochastically then the 1 in 200 would have been made up of frequency picks of various return periods, combined with severity picks of various return periods. We would have had to arbitrarily fix one in order to say anything interesting about the other one, which would not have been as interesting. Gaming my office's Cheltenham Sweepstake31/3/2019 Every year we have an office sweepstake for the Cheltenham horse racing festival. Like most sweepstakes, this one attempts to remove the skill, allowing everyone to take part without necessarily needing to know much about horse racing. In case you’re not familiar with a Sweepstake, here’s a simple example of how one based on the World Cup might work:
Note that in order for this to work properly, you need to ensure that the team draw is not carried out until everyone who wants to play has put their money in – otherwise you introduce some optionality and people can then decide whether to enter based on the teams that are still left. i.e. if you know that Germany and Spain have already been picked then there is less value in entering the competition. The rules for our Cheltenham sweepstake were as follows: The Rules The festival comprises 7 races a day for 4 days, for a total of 28 races. The sweepstake costs £20 to enter, and the winnings are calculated as follows: 3rd place in competition = 10% of pot 2nd place in competition = 20% of pot 1st place in competition = 70% of pot Each participant picks one horse per race each morning. Points are then calculated using the following scoring system:
A running total is kept throughout the competition, and the winner is determined after the final race. The odds the scoring are based on are set using the odds printed in the Metro Newspaper on the morning of the races. (as an example, for a horse which has odds of 11/2 in the Metro  if the horse then places 1st, if we selected this horse, we would win (1+11/2)*5 = 32.5 points) Initial thoughts Any set of betting odds can be converted to an implied probability of winning, these would be the odds which over the long run would cause you to breakeven if the race were repeated multiple times with each horse winning a proportion equal to its probability of winning. Because the scoring in our sweepstake is based on the betting odds, using implied probabilities derived from the odds to help select our horses ends up cancelling itself out (which was the intention when designing the rules). The implied probability can be calculated as one over the odds As an aside, the bookie is indifferent to whether this is the correct probability of winning, they structures the odds purely on the ratio of how their customers are betting. They then structure the odds so that they make money on each race, regardless of which horse wins, for an explanation of this, see my post on creating a Dutchbook: www.lewiswalsh.net/blog/archives/122017 Here is an example showing how we would calculate the implied probabilities using some made up odds: We can then use the implied probabilities we just calculated to see what would happen if each horse finished in the first three positions. Once we have done this, we can then calculate the Expected Value of betting on each horse: We see that the payout varies for each horse, but the EV is the same. This is by design, the intention is that it should not matter which horse you bet on, the sweepstake rules equalise everyone. So what can we  an actuary who knows nothing much about horse racing  do? I don’t have any special knowledge that would allow me to select a horse which would beat the odds listed in the Metro, we appear to be at an impasse. I could attempt to build a model of which horse will win, and then select my horse based on that, but unless it proves to be more accurate than the Metro odds, I might as well just pick at random. Furthermore, if I could build such a model, then I could just start betting actual money. This probably shows you what a difficult problem this is! There's no such thing as free money. It would be a cool project to try, and it’s something I’ve been meaning to attempt for a while, but that’s best saved for another day. Attempt 1  Metro vs prerace odds My first thought was that we can exploit the difference in odds between those published in the Metro in the morning, and the latest odds published closer to the start of the race. It seems reasonable to assume that the odds just before the race should be more accurate than the metro odds. There will have been time for additional information to be included in the more up to date odds, e.g. the weather is worse than expected therefore horse x is expected to be slower than usual. Since the payout will be based on the Metro, we will then be able to maximise our EV by exploiting this differential. Our table will end up looking something like this: We see that we have very small variations in the EVs for some of the horses. It looks like according to this analysis Horse 3 would be the best selection as it has the highest EVs for 1st, 2nd, and 3rd. Based on this strategy, we would then go through each race and select the horse with the highest EV. Is this what I did? No, for a couple of reasons. The biggest issue was that the Metro did not publish odds in the morning for all races, meaning we couldn’t use the Metro, and therefore the rules of the sweepstake were amended to use the official prerace odds to calculate the payout instead. This meant there was only one set of odds used, and our edge disappeared! Even if we had used this method, there was a more fundamental issue  the margins we ended up with were tiny anyway. The Metro vs prerace odds did not swing wildly, meaning that even selecting the horse with the highest EV were only marginally better than picking at random. So, was there an alternative? Attempt 2  2nd and 3rd place odds My next attempt at an exploitative strategy was based on the insight that the payout multiplier for 2nd and 3rd place was based on the odds of the horse coming 1st, rather than the odds of the horse coming 2nd or 3rd. The expected value of a horse was not quite as I calculated above, it was actually: $$EV = P(1)*p_1 + P(2)*p_2 + P(3)*p_3$$ Above, we were using the implied probability of the horse coming first as a proxy for the probability it would come second and third. This is not the same, and some betting websites do allow you to bet on whether a horse will come 2nd or 3rd. For websites that do not allow you to bet directly on this, then we may still be able to calculate it from the odds for whether a horse finishes in the top 2 or 3 places. We just need to subtract out the implied probability of coming 1st from the probability of coming in the top 2, and then subtracting this out from coming top 3 etc. I therefore added some more columns to my table above, corresponding to the probability of the horses coming 2nd and 3rd, and then used this to calculate the EV instead. We see that the final column, Total EV, now has quite different values for each horse. In this case Horse 15, Seddon has an EV of 11.72. The favourite horse on the other hand  number 7  only has an EV of 6.2. The intuitive explanation of this is that the probability of Seddon coming first is very low – this is the reflected in the long odds of 67, this then gives us a large multiplier, but the odds of the horse coming second or third are actually relatively less far out – the fact that it is not the favourite actually increases the odds of it coming in a position which is not 1st. But then if does come in 2nd or 3rd, we would still apply the same large multiplier for the odds of it coming 1st. This then gives us our 2nd edge – we can gain a positive EV by focusing. As a thought experiment, imagine we have a race with three horses – horse A is a clear favourite, horse B is an average horse, and horse C is clearly the weakest. By betting on horse C – the odds of it winning should be very low, so the multiple should be very high, but then this multiple will be applied even if it comes 2nd or 3rd, which is exactly where it is expected to finish. This therefore suggests our next potential strategy – select horses which maximise our EV using the implied probabilities of the horses coming 2nd, 3rd etc. So is this what I did? Well kind of.... The issue with this approach is that typically the horses that provide the best EV also have very long odds. In the race analysed above, our horse has an EV of 11.7, but it only has a 7% chance overall of coming in the top 3. In race two for example, the horse with the best EV actually only had a 2.7% chance of coming in the top 3. Since there are only 28 races in total, if each horse we selected only had a 2.7% chance of coming in, then the probability of us getting 0 points overall in the entire competition would then be: $(12.7 \%)^{28} = 48 \%$ So there is roughly a 50% chance we will get 0 points! Alternatively, if we selected the favourite every time, we could expect it to come top 3 almost every time, and thereby guarantee ourselves points most races, but it also has the lowest EV. So we have what appears to be a risk vs reward trade off. Pick outsiders and give ourselves the highest EV overall, or pick the favourites thereby reducing our overall EV but also reducing our volatility. This leads us neatly to attempt 3  trying to think about how to maximise our probability of winning the competition rather than simply maximising EV for each race. Attempt 3  EP curves From the work above, we now have our model of the position each horse will finish in each race – using the 1st, 2nd, and 3rd implied probabilities  and we have the payout for each horse – using the odds of the horse coming 1st. We can then bring our selection of horse and these probabilities together in a summary tab and simulate our daily score stochastically using a Monte Carlo method. To do this we just need to turn the implied probabilities into a CDF, and lookup the value of each position and repeat 10,000 times. The output for this then ends up looking like the following, where the value is the number of points we will for a given race. So we see that across this sample of 20 simulations, most days we to end up with 0 points overall, but a few days have very high scores. So far so good! The next step is to set up an EP table of the following, which looks like the following: The EP table gives us the probability of exceeding various scores in the competition based on our horse selections. In this case, we see that there is a 1 in 20 chance of getting 453 points or greater in the day. This is useful even on its own – when I was deciding which horses to bet on, I simply played around with the selections until I got an EP table I was comfortable with. My reasoning was quite basic – I decided I wanted to maximise the 1 in 20 value. I wanted to give myself something like a 1/4 chance of winning the whole competition and a 3/4 chance of getting very few points. Since there were four days of races, dividing the 1/4 by another 4 suggests we should be looking at maximising the 1 in 20 level (I admit this reasoning was a bit rough, but it seemed to serve its purpose) The insight here is that the payout structure of the sweepstake is such that coming in the top 3 is all that matters, and in particular coming 1st is disproportionately rewarded. To see this, we can think of the daily selection of horses as attempting to maximise the EV of the overall prize rather than the EV of our overall score  maximising the EV of each race is only a means to this end. So we are actually interested in maximising the following: $0.7 * P(1st) + 0.2 + P(2nd) + 0.1 * P(3rd)$ Which will largely be dominated by P(1st), given the $0.7$ factor. This is largely the strategy I went for in the end. Attempt 4  Game theory? I’ve brushed over one difficulty above; in order to maximise our prize EV we need to consider not just which strategy we should take, but how this strategy will fare against the strategies that other people will take. If everyone is maximising their 1 in 20 return period then there’s little point us doing exactly the same. Luckily for me, most people were doing little more than picking horses randomly. We could then formalise this assumption, and come up with a numerical solution to the problem above. To do this, we would simulate our returns for each day across 10,000 simulations as above, but this time we would compare ourselves against a ‘base strategy’ of random selection of horses and we would simulate this base strategy for the approximately 40 people who entered. Each simulation would then give us a ranking we would finish in the competition, Here is an example of what that would look like: And we could then convert this into an EP table which would look like the following: So we see that if we select these horses, we end up having somewhere between a 1 in 10 and a 1 in 5 chance of winning the competition. Now that we have all of this set up, we can then optimise our horse selection to target a particular return period. I didn’t actually end up setting up the above for the sweepstake, but I suspect it would have been an improvement on my approach Attempt 5  Multiple day? There is a further refinement we can make to the above. We have so far only really been focusing on maximising our chance of winning by using a fixed strategy throughout the competition. But there is no reason we have to do this. After the first day, we should really be including the current scores of each competitor as part of our calculation of our ranking. i.e. if person 1 had a great day and now has 200 points but we had a bad day and still have 0 points, by accounting for this, the model should automatically increase our volatility i.e. start picking horses with longer odds so as to increase the chance of us catching up. If on the other hand, we had a really good first day and are now in the lead, the model should then automatically reduce our volatility and start selecting the favourites more often to help safely maintain our lead. How did it work in practice? I ended up placing 2nd, and taking 20% of the prize pot which was great! I was behind for most of the competition but then pulled back on the last day when a 66/1 came in 1st place, and I picked up 330 points off a single race. This may look miraculous, but is exactly how the model is supposed to work. Does that have any applicability to gambling generally? Unfortunately not, basically all of the work above is based on exploiting the specific scoring system of the sweepstake. There's no real way of apply this to gambling generally. Converting a Return Period to a RoL15/3/2019 I came across a useful way of looking at Rate on Lines last week, I was talking to a broker about what return periods to use in a model for various levels of airline market loss (USD250m, USD500m, etc.). The model was intended to be just a very high level, transparent market level model which we could use as a framework to discuss with an underwriter. We were talking through the reasonableness of the assumptions when the broker came out with the following: 'Well, you’d pay about 12.5 on line in the retro market at that attachment level, so that’s a 1 in 7 breakeven right?' My response was: 'ummmm, come again?' His reasoning was as follows: Suppose the ILW pays $1$ @ $100$% reinstatements, and that it costs $12.5$% on line. Then if the layer suffers a loss, the insured will have a net position on the contract of $75$%. This is the $100$% limit which they receive due to the loss, minus the original $12.5$% Premium, minus an additional $12.5$% reinstatement Premium. The reinsurer will now need another $6$ years at $12.5$% RoL $(0.0125 * 6 = 0.75)$ to recover the limit and be at breakeven. Here is a breakdown of the cashflow over the seven years for a $10m$ stretch at $12.5$% RoL: So the loss year plus the six clean years, tells us that if a loss occurs once every 7 years, then the contract is at breakeven for this level of RoL.
So this is kind of cool  any time we have a RoL for a retro layer, we can immediately convert it to a Return Period for a loss which would trigger the layer. Generalisation 1 – various rates on line We can then generalise this reasoning to apply to a layer with an arbitrary RoL. Using the same reasoning as above, the breakeven return period ends up being: $RP= 1 + \frac{(12*RoL)}{RoL}$ Inverting this gives: $RoL = \frac{1}{(1 + RP)}$ So let's say we have an ILW costing $7.5$% on line, the breakeven return period is: $1 + \frac{(10.15)}{0.075} = 11.3$ Or let’s suppose we have a $1$ in $19$ return period, the RoL will be: $0.05 = \frac{1}{(1 + 19)}$ Generalisation 2 – other nonproportional layers The formula we derived above was originally intended to apply to ILWs, but it also holds any time we think the loss to the layer, if it occurs, will be a total loss. This might be the case for a cat layer, or a clash layers (layers which have an attachment above the underwriting limit for a single risk), or any layer with a relatively high attachment point compared to the underwriting limit. Adjustments to the formulas There are a few of adjustments we might need to make to these formulas before using them in practice. Firstly, the RoL above has no allowance for profit or expense loading, we can account for this by converting the market RoL to a technical RoL, this is done by simply dividing the RoL by $120130$% (or any other appropriate profit/expense loading). This has the effect of increasing the number of years before the loss is expected to occur. Alternately, if layer does not have a paid reinstatement, or has a different factor than $100$%, then we would need to amend the multiple we are multiplying the RoL by in the formula above. For example, with nil paid reinstatements, the formula would be: $RP = 1 + \frac{(1RoL)}{RoL}$ Another refinement we might wish to make would be to weaken the total loss assumption. We would then need to reduce the RoL by an appropriate amount to account for the possibility of partial losses. It’s going to be quite hard to say how much this should be adjusted for – the lower the layer the more it would need to be. Extending the Copula Method26/8/2018 If you have ever generated Random Variables stochastically using a Gaussian Copula, you may have noticed that the correlation of the generated sample ends up being lower than the value of the Covariance matrix of the underlying multivariate Gaussian Distribution. For an explanation of why this happens you can check out a previous post of mine: www.lewiswalsh.net/blog/correlationsfriedrichgaussandcopula. It would be nice if we could amend our method to compensate for this drop. As a quick fix, we can simply run the model a few times and fudge the Covariance input until we get the desired Correlation value. If the model runs quickly, this is quite easy to do, but as soon as the model starts to get bigger and slower, it quickly becomes impractical to run it three of four times just to get the output Correlation we desire. We can do better than this. The insight we rely on is that for a Gaussian Copula, the Pearson Correlation in the generated sample just depends on the Covariance Value. We can therefore create a precomputed table of Input and Output values, and use this to select the correct input value for the desired output. I wrote some R code to do just that, we compute a table of Pearson's Correlations obtained for various Input Covariance values when using the Gaussian Copula. a < library(MASS) library(psych) set.seed(100) m < 2 n < 10^6 OutputCor < 0 InputCor < 0 for (i in 1:100) { sigma < matrix(c(1, i/100, i/100, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) u < pnorm(z) OutputCor[i] < cor(u,method='pearson')[1,2] InputCor[i] < i/10 } OutputCor InputCor Here is a sample from the table of results. You can see that the drop is relatively modest, but it does apply consistent across the whole table. Here is a graph showing the drop in values:
Updated Algorithm
We can then use the precomputed table, interpolating where necessary, to give us a Covariance value for our Multivariate Gaussian Distribution which will generate the desired Pearson Product Moment Correlation Value. So for example, if we would like to generate a sample with a Pearson Product Moment value of $0.5$, according to our table, we would need to use $0.517602$ as an input Covariance. We can test these values using the following code: a < library(MASS) library(psych) set.seed(100) m < 2 n < 5000000 sigma < matrix(c(1, 0.517602, 0.517602, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) u < pnorm(z) cor(u,method='pearson') Analytic Formulas I tried to find an analytic formula for the Product Moment values obtained in this manner, but I couldn't find anything online, and I also wasn't able to derive one myself. If we could find one, then instead of using the precompued table, we would be able to simply calculate the correct value. While searching, I did come across a number of interesting analytic formulas linking the values of Kendall's Tau, Spearman's Rank, and the input Covariance.. All the formulas below are from Fang, Fang, Kotz (2002) Link to paper: www.sciencedirect.com/science/article/pii/S0047259X01920172 The paper gives the following two results, where $\rho$ is the Pearson's Product Moment
$$\tau = \frac{2}{\pi} arcsin ( \rho ) $$ $$ {\rho}_s = \frac{6}{\pi} arcsin ( \frac{\rho}{2} ) $$
We can then use these formulas to extend our method above further to calculate an input Covariance to give any desired Kendall Tau, or Spearman's Rank. I initially thought that they would link the Pearson Product Moment value with Kendall or Spearman's measure, in which case we would still have to use the precomputed table. After testing it I realised that it is actually linking the Covariance to Kendall and Spearman's measures. Thinking about it, Kendall's Tau, and Spearman's Rank are both invariant to the reverse Guassian transformation when moving from $z$ to $u$ in the algorithm. Therefore the problem of deriving an analytic formula for them is much simpler as one only has to link their values for a multivariate Guassian Distribution. Pearson's however does change, therefore it is a completely different problem and may not even have a closed form solution. As an example of how to use the above formula, suppose we'd like our generated data to have a Kendall's Tau of $0.4$. First we need to invert the Kendall's Tau formula: $$ \rho = sin ( \frac{ \tau \pi }{2} ) $$ We then plug in $\rho = 0.4 $ giving:
$$ \rho = sin ( \frac{ o.4 \pi }{2} ) = 0.587785 $$
Giving usan input Covariance value of $0.587785$
We can then test this value with the following R code:
a < library(MASS) library(psych) set.seed(100) m < 2 n < 50000 sigma < matrix(c(1, 0.587785, 0.587785, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) u < pnorm(z) cor(z,method='kendall') Which we see gives us the value of $\tau$ we want. In this case the difference between the input Covariance $0.587785$, and the value of Kendall's Tau $0.4$ is actually quite significant. It's the second week of your new job Capital Modelling job. After days spent sorting IT issues, getting lost coming back from the toilets, and perfecting your new commute to work (probability of getting a seat + probability of delay * average journey temperature.) your boss has finally given you your first real project to work on. You've been asked to carry out an annual update of the Underwriting Risk Capital Charge for a minor part of the company's Motor book. Not the grandest of analysis you'll admit, this particular class only makes up about 0.2% of the company's Gross Written Premium, and the Actuaries who reserve the company's bigger classes would probably consider the number of decimal places used in the annual report more material than your entire analysis. But you know in your heart of hearts that this is just another stepping stone on your inevitable meteoric rise to Chief Actuary in the Merger and Acquisition department, where one day you will pass judgement on billion dollar deals inbetween expensive lunches with CFOs, and drinks with journalists on glamorous rooftop bars. The company uses inhouse reserving software, but since you're not that familiar with it, and because you want to make a good impression, you decide to carry out extensive checking of the results in Excel. You fire up the Capital Modelling Software (which may or may not have a name that means a house made out of ice), put in your headphones and grind it out. Hours later you emerge triumphant, and you've really nailed it, your choice of correlation (0.4), and correlation method (Gaussian Copula) is perfect. As planned you run extracts of all the outputs, and go about checking them in Excel. But what's this? You set the correlation to be 0.4 in the software, but when you check the correlation yourself in Excel, it's only coming out at 0.384?! What's going on? Simulating using Copulas The above is basically what happened to me (minus most of the actual details. but I did set up some modelling with correlated random variables and then checked it myself in Excel and was surprised to find that the actual correlation in the generated output was always lower than the input.) I looked online but couldn't find anything explaining this phenomenon, so I did some investigating myself. So just to restate the problem, when using Monte Carlo simulation, and generating correlated random variables using the Copula method. When we actually check the correlation of the generated sample, it always has a lower correlation than the correlation we specified when setting up the modelling. My first thought for why this was happening was that were we not running enough simulations and that the correlations would eventually converge if we just jacked up the number of simulations. This is the kind of behaviour you see when using Monte Carlo simulation and not getting the mean or standard deviation expected from the sample. If you just churn through more simulations, your output will eventually converge. When creating Copulas using the Gaussian Method, this is not the case though, and we can test this. I generated the graph below in R to show the actual correlation we get when generating correlated random variables using the Copula method for a range of different numbers of simulations. There does seem to be some sort of loose limiting behaviour, as the number of simulations increases, but the limit appears to be around 0.384 rather than 0.4. The actual explanation First, we need to briefly review the algorithm for generating random variables with a given correlation using the normal copula. Step 1  Simulate from a multivariate normal distribution with the given covariance matrix. Step 2  Apply an inverse gaussian transformation to generate random variables with marginal uniform distribution, but which still maintain a dependency structure Step 3  Apply the marginal distributions we want to the random variables generated in step 2 We can work through these three steps ourselves, and check at each step what the correlation is. The first step is to generate a sample from the multivariate normal. I'll use a correlation of 0.4 though out this example. Here is the R code to generate the sample: a < library(MASS) library(psych) set.seed(100) m < 2 n < 1000 sigma < matrix(c(1, 0.4, 0.4, 1), nrow=2) z < mvrnorm(n,mu=rep(0, m),Sigma=sigma,empirical=T) And here is a Scatterplot of the generated sample from the multivariate normal distribution: We now want to check the product moment correlation of our sample, which we can do using the following code: cor(z,method='pearson') Which gives us the following result: > cor(z,method='pearson') [,1] [,2] [1,] 1.0 0.4 [2,] 0.4 1.0 So we see that the correlation is 0.4 as expected. The Psych package has a useful function which produces a summary showing a Scatterplot, the two marginal distribution, and the correlation: Let us also check Kendall's Tau and Spearman's rank at this point. This will be instructive later on. We can do this using the following code: cor(z,method='spearman') cor(z,method='Kendall') Which gives us the following results: > cor(z,method='spearman') [,1] [,2] [1,] 1.0000000 0.3787886 [2,] 0.3787886 1.0000000 > cor(z,method='kendall') [,1] [,2] [1,] 1.0000000 0.2588952 [2,] 0.2588952 1.0000000 Note that this is less than 0.4 as well, but we will discuss this further later on.
We now need to apply step 2 of the algorithm, which is applying the inverse Gaussian transformation to our multivariate normal distribution. We can do this using the following code:
u < pnorm(z) We now want to check the correlation again, which we can do using the following code: cor(z,method='spearman') Which gives the following result: > cor(z,method='spearman') [,1] [,2] [1,] 1.0000000 0.3787886 [2,] 0.3787886 1.0000000 Here is the Psych summary again: u is now marginally uniform (hence the name). We can see this by looking at the Scatterplot and marginal pdfs above. We also see that the correlation has dropped to 0.379, down from 0.4 at step 1. The Pearson correlation measures the linear correlation between two random variables. We generated normal random variables, which had the required correlation, but then we applied a nonlinear (inverse Gaussian) transformation. This nonlinear step is the source of the dropped correlation in our algorithm. We can also retest Kendall's Tau, and Spearman's at this point using the following code: cor(z,method='spearman') cor(z,method='Kendall') This gives us the following result: > cor(u,method='spearman') [,1] [,2] [1,] 1.0000000 0.3781471 [2,] 0.3781471 1.0000000 > cor(u,method='kendall') [,1] [,2] [1,] 1.0000000 0.2587187 [2,] 0.2587187 1.0000000 Interestingly, these values have not changed from above! i.e. we have preserved these measures of correlation between step 1 and step 2. It's only the Pearson correlation measure (which is a measure of linear correlation) which has not been preserved. Let's now apply the step 3, and once again retest our three correlations. The code to carry out step 3 is below: x1 < qgamma(u[,1],shape=2,scale=1) x2 < qbeta(u[,2],2,2) df < cbind(x1,x2) pairs.panels(df) The summary for step 3 looks like the following. This is the end goal of our method. We see that our two marginal distributions have the required distribution, and we have a correlation between them of 0.37. Let's recheck our three measures of correlation. cor(df,method='pearson') cor(df,meth='spearman') cor(df,method='kendall') > cor(df,method='pearson') x1 x2 x1 1.0000000 0.3666192 x2 0.3666192 1.0000000 > cor(df,meth='spearman') x1 x2 x1 1.0000000 0.3781471 x2 0.3781471 1.0000000 > cor(df,method='kendall') x1 x2 x1 1.0000000 0.2587187 x2 0.2587187 1.0000000 So the Pearson has reduced again at this step, but the Spearman and Kendall's Tau are once again the same.
Does this matter?
This does matter, let's suppose you are carrying out capital modelling and using this method to correlate your risk sources. Then you would be underestimating the correlation between random variables, and therefore potentially underestimating the risk you are modelling. Is this just because we are using a Gaussian Copula? No, this is the case for all Copulas. Is there anything you can do about it? Yes, one solution is to just increase the input correlation by a small amount, until we get the output we want. A more elegant solution would be to build this scaling into the method. The amount of correlation lost at the second step is dependent just on the input value selected, so we could precompute a table of input and output correlations, and then based on the desired output, we would be able to look up the exact input value to use. If you have played around with Correlating Random Variables using a Correlation Matrix in [insert your favourite financial modelling software] then you may have noticed the requirement that the Correlation Matrix be positive semidefinite. But what exactly does this mean? And how would we check this ourselves in VBA or R? Mathematical Definition Let's start with the Mathematical definition. To be honest, it didn't really help me much in understanding what's going on, but it's still useful to know.
A symmetric $n$ x $n$ matrix $M$ is said to be positive semidefinite if the scalar $z^T M z $ is positive for every nonzero column vector $z$ of $n$ real numbers.
If I am remembering my first year Linear Algebra course correctly, then Matrices can be thought of as transformations on Vector Spaces. Here the Vector Space would be a collection of Random Variables. I'm sure there's some clever way in which this gives us some kind of nondegenerate behaviour. After a bit of research online I couldn't really find much. Intuitive Definition The intuitive explanation is much easier to understand. The requirement comes down to the need for internal consistency between the correlations of the Random Variables. For example, suppose we have three Random Variables, A, B, C. Let's suppose that A and B are highly correlated, that is to say, when A is a high value, B is also likely to be a high value. Let's also suppose that A and C are highly correlated, so that if A is a high value, then C is also likely to be a high value. We have now implicitly defined a constraint on the correlation between B and C. If A is high both B and C are also high, so it can't be the case that B and C are negatively correlated, i.e. that when B is high, C is low. Therefore some correlation matrices will give relations which are impossible to model. Alternative characterisations You can find a number of necessary and sufficient conditions for a matrix to be positive definite, I've included some of them below. I used number 2 in the VBA code for a real model I set up to check for positive definiteness. 1. All Eigenvalues are positive. If you have studied some Linear Algebra, then you may not be surprised to learn that there is a characterization using Eigenvalues. It seems like just about anything to do with Matrices can be restated in terms of Eigenvalues. I'm not really sure how to interpret this condition though in an intuitive way. 2. All leading principal minors are all positive This is the method I used to code the VBA algorithm below. The principal minors are just another name for the determinant of the upperleft $k$ by $k$ submatrix. Since VBA has a built in method for returning the determinant of a matrix this was quite an easy method to code. 3. It has a unique Cholesky decomposition I don't really understand this one properly, however I remember reading that Cholesky decomposition is used in the Copula Method when sampling Random Variables, therefore I suspect that this characterisation may be important! Since I couldn't really write much about Cholesky decomposition here is a picture of Cholesky instead, looking quite dapper. All 2x2 matrices are positive semidefinite Since we are dealing with Correlation Matrices, rather than arbitrary Matrices, we can actually show apriori that all 2 x 2 Matrices are positive semidefinite. Proof
Let M be a $2$ x $2$ correlation matrix.
$$M = \begin{bmatrix} 1&a\\ a&1 \end{bmatrix}$$ And let $z$ be the column vector $M = \begin{bmatrix} z_1\\ z_2 \end{bmatrix}$ Then we can calculate $z^T M z$ $$z^T M z = {\begin{bmatrix} z_1\\ z_2 \end{bmatrix}}^T \begin{bmatrix} 1&a\\ a&1 \end{bmatrix} \begin{bmatrix} z_1\\ z_2 \end{bmatrix} $$ Multiplying this out gives us: $$ = {\begin{bmatrix} z_1\\ z_2 \end{bmatrix}}^T \begin{bmatrix} z_1 & a z_2 \\ a z_1 & z_2 \end{bmatrix} = z_1 (z_1 + a z_2) + z_2 (a z_1 + z_2)$$ We can then simplify this to get: $$ = {z_1}^2 + a z_1 z_2 + a z_1 z_2 + {z_2}^2 = (z_1 + a z_2)^2 \geq 0$$ Which gives us the required result. This result is consistent with our intuitive explanation above, we need our Correlation Matrix to be positive semidefinite so that the correlations between any three random variables are internally consistent. Obviously, if we only have two random variables, then this is trivially true, so we can define any correlation between two random variables that we like. Not all 3x3 matrices are positive semidefinite The 3x3 case, is simple enough that we can derive explicit conditions. We do this using the second characterisation, that all principal minors must be greater than or equal to 0.
Demonstration
Let M be a $3$ x $3$ correlation matrix: $$M = \begin{bmatrix} 1&a&b\\ a&1&c \\ b&c&1 \end{bmatrix}$$ We first check the determinant of the $2$ x $2$ sub matrix. We need that: $ \begin{vmatrix} 1 & a \\ a & 1 \end{vmatrix} \geq 0 $ By definition: $ \begin{vmatrix} 1 & a \\ a & 1 \end{vmatrix} = 1  a^2$ We have that $  a  \leq 1 $, hence $  a^2  \leq 1 $, and therefore: $  1 a^2  \geq 0 $ Therefore the determinant of the $2$ x $2$ principal submatrix is always positive. Now to check the full $3$ x $3$. We require: $ \begin{vmatrix} 1 & a & b \\ a & 1 & c \\ b & c & 1 \end{vmatrix} \geq 0 $ By definition: $ \begin{vmatrix} 1 & a & b \\ a & 1 & c \\ b & c & 1 \end{vmatrix} = 1 ( 1  c^2)  a (a  bc) + b(ac  b) = 1 + 2abc  a^2  b^2  c^2 $ Therefore in order for a $3$ x $3$ matrix to be positive demidefinite we require: $a^2 + b^2 + c^2  2abc = 1 $ I created a 3d plot in R of this condition over the range [0,1].
It's a little hard to see, but the way to read this graph is that the YZ Correlation can take any value below the surface. So for example, when the XY Corr is 1, and the XZ Corr is 0, the YZ Corr has to be 0. When the XY Corr is 0 on the other hand, and XZ Corr is also 0, then the YZ Corr can be any value between 0 and 1. Checking that a Matrix is positive semidefinite using VBA When I needed to code a check for positivedefiniteness in VBA I couldn't find anything online, so I had to write my own code. It makes use of the excel determinant function, and the second characterization mentioned above. Note that we only need to start with the 3x3 sub matrix as we know from above that all 1x1 and all 2x2 determinants are positive. This is not a very efficient algorithm, but it works and it's quite easy to follow. Function CheckCorrMatrixPositiveDefinite() Dim vMatrixRange As Variant Dim vSubMatrix As Variant Dim iSubMatrixSize As Integer Dim iRow As Integer Dim iCol As Integer Dim bIsPositiveDefinite As Boolean bIsPositiveDefinite = True vMatrixRange = Range(Range("StartCorr"), Range("StartCorr").Offset(inumberofrisksources  1, inumberofrisksources  1)) ' Only need to check matrices greater than size 2 as determinant always greater than 0 when less than or equal to size 2' If iNumberOfRiskSources > 2 Then For iSubMatrixSize = iNumberOfRiskSources To 3 Step 1 ReDim vSubMatrix(iSubMatrixSize  1, iSubMatrixSize  1) For iRow = 1 To iSubMatrixSize For iCol = 1 To iSubMatrixSize vSubMatrix(iRow  1, iCol  1) = vMatrixRange(iRow, iCol) Next Next 'If the determinant of the matrix is 0, then the matrix is semipositive definite' If Application.WorksheetFunction.MDeterm(vSubMatrix) < 0 Then CheckCorrMatrixisPositiveDefinite = False bIsPositiveDefinite = False End If Next End If If bIsPositiveDefinite = True Then CheckCorrMatrixPositiveDefinite = True Else CheckCorrMatrixPositiveDefinite = False End If End Function Checking that a Matrix is positive semidefinite in R Let's suppose that instead of VBA you were using an actually user friendly language like R. What does the code look like then to check that a matrix is positive semidefinite? All we need to do is install a package called 'Matrixcalc', and then we can use the following code: is.positive.definite( Matrix ) That's right, we needed to code up our own algorithm in VBA, whereas with R we can do the whole thing in one line using a built in function! It goes to show that the choice of language can massively effect how easy a task is. Song Lyrics in China Mieville Novels8/3/2018 I just finished reading China Mieville's novel Kraken. It was really cool, it did take me a while to get into Mieville's voice, but once I got into the swing of it I really enjoyed it. Here is the blurb in case you are interested: "In the Darwin Centre at London’s Natural History Museum, Billy Harrow, a cephalopod specialist, is conducting a tour whose climax is meant to be the Centre’s prize specimen of a rare Architeuthis duxbetter known as the Giant Squid. But Billy’s tour takes an unexpected turn when the squid suddenly and impossibly vanishes into thin air." One of the lines in the book struck me as surprisingly familiar. Here is the quote from the book: "She flicked through a pad by her bed, where she made notes of various summonings. A spaceape, all writhing tentacles, to stimulate her audio nerve directly? Too much attitude." After thinking about this for a moment, it clicked that this is a reference to a Burial song called Spaceape (from his self titled album). The line goes: "Living spaceapes, creatures, covered, smothered in writhing tentacles Stimulating the audio nerve directly" I couldn't find any reference online to the inclusion of Burial lyrics in Mieville's novels. Okay I thought, that's a cool a easter egg, but it got my thinking, are there any other song lyrics buried in Mieville's books? And if so, is there any way we can scan them automatically? Collecting Lyrics The first step was to get a database of song lyrics which we can use to scan the novels for, Unfortunately there is no easy place to find a database of song lyrics, so I was forced to scrape them from a lyrics site. I used the following free chrome extension web scraper which is very easy to use, and in my experience very reliable: webscraper.io/ After about 10 minutes of setting it up, and about an hour of leaving it to run. I had managed to scrape most of Burial's lyrics in to csv files. I also scraped lyrics by Kode9 and Spaceape so I could see if they were referenced anywhere. It's hard to know which artist I should look for, but both of these have been mentioned by Mieville in interviews. The web scraping addin has an easy to use GUI. Here is a screenshot of what it looks like to set it up: Ebooks in text format The next step was then to get his ebooks into a format that I could easily analyse. I assumed that I would need them in a csv format, but I actually got away with using .txt in the end. In order to get them into .txt. I used the builtin bulk converter in the following free ebook management program: calibreebook.com/ Here is a screenshot of Calibre. It is also very easy to use, and freely available online. Analysing the text This is now the hardest part of the problem. We have electronic copies of China Mieville's novels in .txt format, and we have a collection of lyrics in .txt format which we would like to compare them against, how can we programmatically analyse whether Mieville references other Burial lyrics in one of his books? If we simply attempt to match entire strings, then we have the issue that we might miss out on some references due to small changes in word ordering or punctuation. For example, in the example above using Burial's Spaceape, the wording is slightly different and the tenses of some of the words have been changed, therefore looking for an exact match between lyrics and text will probably not work. If on the other hand we don't match complete strings, but just try to match words, then we will be overwhelmed by small words like 'the' and 'a' which will be used multiple times in both Burial's song lyrics, and in China Mieville's novels. There are two main approaches I came up with.to solve this problem. My first thought was to match individual words, generating a huge list of matches, and then to count the number of uses of each word in Mieville's novels, and then sorting by the words that match but which are also the most uncommon. For example I would imagine that Spaceape is only ever used once in all of Mieville's novels, giving us information about how unusual this word is. Combined with the fact that this word is also used in a Burial lyric, gives us enough information to assume that there is a high probability of a match, at this point we could investigate manually. I ultimately didn't go down this road. Instead, I had the idea to try to adapt plagiarism detection software to this problem. When you think about it, the two problems are actually quite similar. Plagiarism detection is about trying to automatically check two documents for similar phrases, without relying on complete matches. Open Source Plagiarism Detection I found the following freetouse program created by Lou Bloomfield of the University of Virginia which is perfect for what I was trying to do. plagiarism.bloomfieldmedia.com/wordpress/ It compares two sets of files and then creates a side by side hyperlinked comparison, which can be viewed in chrome, highlighting the parts of the documents where a possible match has been detected. There are various settings you can tweak to specify how close of a match you are interested in. I have included a screenshot below of the section where the Spaceape line is detected. There were about 500 matches detected when I ran this, but it only took about a minute to scroll through and check up on the ones that looked significant. Results Ultimately, this analysis felt like a bit of a failure. These were the only lyrics I could find in all of his novels and while there is always the chance that I need to expand the number of artists I'm looking at, or refine my detection methods I imagine this is all there is. I still thought the process was quite cool so I thought I'd write up what I had done anyway. If you have any thoughts, let me know by leaving a comment. Drinking Game Analysis3/3/2018
When you are playing a drinking game, do you ever catch yourself calculating the probabilities of various events happening? If you are like most people the answer to that is probably "..... um..... no....... why would I ever do that...."
Okay, maybe that's not the kind of thing you think about when playing a drinking game. But I bet you think it would be interesting to know what the odds are anyway? No? Really? Still not that interested? .... Okay... well I find it interesting, so I'm going to write about it anyway. Eyes up/Eyes down I've only played this game a couple of times but it's pretty fun, it requires no equipment, and has quite a lot of drinking. All players sit in a circle, one player calls out "Eye's down" at which point everyone looks down, the same player then says "Eye's up" at which point everyone looks up at a random person in the circle. If the person you are looking at is also looking at you, then you both drink. Pretty simple. What is the probability that you will drink? Let's denote the event that you have matched someone with $M$. Then: $$P(M) = \frac{1}{n1} $$ Where $n$ is the number of players.
To see that this is true, assume that everyone is picking a random person to look at, then it shouldn't matter who you are looking at, that person will always have $n1$ people to pick from, and therefore a $ \frac{1}{n1} $ chance of looking at you.
Of course people in real life tend not to pick a random person to look at, and even if they attempt to pick a random person, people have been shown to be terrible at picking randomly. But for the purposes of this analysis, unless we make this assumption, the only alternative would be to play the game hundreds of times, record how often matches are made, and then make an empirical claim. As fun as it sounds to play this game hundreds of times in a row, it would be better for your health to just assume a uniform spread of guesses. The fact that you have a $ \frac{1}{n1} $ chance of getting a match means that the more people are playing the game, the less likely each person is to drink. Does this apply to the group as a whole though? What is the probability that someone in the group drinks? If we had a hundred people playing, then each individual would hardly ever drink. In fact you would expect them to drink once every $99$ goes. But would you expect it to also be unlikely that anyone at all drinks? I spent about an hour trying to calculate this and didn't get very far. Calculating through conditional probabilities doesn't seem to help, and I couldn't come up with a decent approach to count the permutations of all the possible ways of selecting pairings for elements in a set, such that there are no 2cycles. In the end I gave up and just wrote a program to calculate the answer stochastically. Monte Carlo analysis really is such a versatile tool for problems like these. Anytime you can formulate the problem easily, but struggle to come up with an analytical solution, Monte Carlo analysis will probably work. Here is the table I generated of the probability of someone in the group drinking for a group of size n:
There is a definite trend here, if we plot the values on the chart it looks pretty clear that out solution converges to $0.6$. No idea why though! I might come back to this problem another time, but if anyone has any thoughts then please let me know.
Odds So this is not technically a drinking game, but is played quite often by people who are out drinking. The rules for this game are pretty simple, anytime you would like to dare someone to do something you say "odds of you doing this". The dare might be anything, from 'eating a spoon full of chilli powder, downing your drink, or even getting a tattoo (that last one is a real one I heard about from a friend who playing the game while travelling in Eastern Europe). The person you have challenged then gives you a number. They might say $20$, then each of you count down from $3$ and say an integer from $1$ to $20$ (or whatever number they selected). If you both said the same number, then they have to do the dare. Therefore the higher number you pick, the less likely it is that you will have to do the dare. What are the odds of you having to do whatever it is you are betting on, based on you selecting the number $n$? This is quite a straightforward problem, the answer is just $\frac{1} {n} $. I was playing this game last week with someone who thought the answer might be $\frac{1} {n^2} $. This would give a very different likelihood of having to do the dare. For example if you selected $20$, then you would have a $5 \%$ chance of having to do the dare, but according to my friend's calculation he thought you would have $0.25 \%$ chance of doing the dare. Here is a table showing the correct probabilities for various values of $n$. Dirty Pint Flip I wouldn't recommend playing this game to be honest. I played it in uni once, but it's a bit grim. All the players sit in a circle with a central pint glass in the middle, the players take it in turn to pour any amount of their own drink into the central pint glass as they would like, they then have to flip a coin and guess whether it will be heads or tails. If they get it right, then the game moves on to the person to their left, if they get it wrong, then they have to down the dirty pint. When I played it some people were drinking wine, some people were drinking beer... it was not a good combination. What is the probability that you will have to drink? This is quite straight forward, it is simply the probability that you will guess the coin flip correctly. Which is just $\frac{1}{2} $. What about the question of how much you will drink on average? That is, on average if you have to drink, how many people will have poured their drink into the glass before you drink? We can model this as a geometric distribution and calculate the probabilities of each possible number of people. Let's denote the number of people who have poured in before you, given you are about to drink as $N$, then:
$$N \sim Geo(0.5) $$
Giving us the following table: The expected value of the number of drinks is then the sumproduct of these two columns . This gives us a value of $2$. Therefore, if you have to drink, the average number of drinks that will have been poured into the pint is $2$. Two Dice Nominate This is another game I've only played a couple of times. And I'm not sure if it has a better name than the one I've given it, if you know of one then let me know. In this game, you sit in a circle with a range of drinks on the table, you take it in turns to nominate a person, a drink, and a number between $2$ and $12$. You then roll two dice and add them together. If you have rolled the correct number, the person you nominated drinks the drink you selected, if however you roll a double, then you drink that drink. If it is both a double and you get it correct, then you both have to drink. The reason that this game does not have much depth to it is that you can always just pick the most likely number to come up when rolling two dice. Here is a table showing the probability of each value coming up: Therefore you might as well pick $7$ every single time! Is there anything else we can say about this game? We know that if we roll a double, then we have to drink the drink. What is the probability of you rolling a double in this game? The probability of this is always just $\frac{1}{6} $. Interesting this is the same probability as rolling the dice such that the sum is $7$. Therefore, if you do pick $7$, there is an equal chance of you and the person you are nominating drinking the drink. Recently I've been reading The Mathematics of Poker (2009, Bill Chen and Jerrod Ankenman) and I came across an interesting idea that I thought I'd write about. For me, understanding how to analyse this situation really gets to the heart of how I think about poker. I'd love to spend more time playing and studying poker but it's such a timesink and I don't really have the oodles of free time it would require, but every now and again I'll still open up a poker book and read about something, this is one of those interesting topics I was reading about hopefully you find it as interesting as I do. Calling a shove preflop in heads up The scenario being analysed in the book is a relatively common situation, particularly in online poker where people are more inclined to shove than in real life games. The question is: How should we analyse whether to call when we have a moderately strong hand against an opponent who has gone allin pre flop. Let's set up an example so that we have something concrete to talk about. Say we have pocket Kings pre flop, and our opponent goes allin, how should we decide whether to call? Obviously without knowing our opponent's hand there is no 100% correct answer. There is however one very useful way of analysing the situation. Equity against a range We need to ask ourselves  what cards would are opponent go allin with, and how does my current hand fare against that range? i.e. we need to calculate our equity against our opponent's range. We are adding another layer of stochastic uncertainty on the event, instead of trying to guess what hand our opponent has (which is almost impossible) we are trying to guess what kind of hands he might go allin with (which is very much possible). We then take this extra level of uncertainty and calculate the correct mathematical way to proceed. On the one extreme, let's suppose that based on our read of how our opponent is playing, we might think that they would only go allin with very strong hands, in this case just pocket Aces. We would then be a 4:1 underdog if we call with Ks, and we should definitely fold. (In order to calculate this we could use any standard poker calculator like the following) www.cardplayer.com/pokertools/oddscalculator/texasholdem One the other hand, suppose we know for a fact that our opponent has not looked at their cards at all but has still decided to go all in. In this case we should definitely call. The only cards we will be behind are pocket Aces, which make up a small fraction of the possible hands that our opponent could shove with, and we will be ahead or equal against all other possible hands. Therefore we would have a positive EV when calling. What if our read on our opponent's range is somewhere in between though? What we need to do is calculate our equity against each individual hand in our opponent's range, and then calculate the probability of our opponent having a given hand from that range. That is to say, in order to calculate the conditional expectation, we need to calculate the unconditional expectations against each hand and then multiply by the conditional probability of our opponent having that hand, given our belief about our opponent's range. Numerical Example Let's go back to our numerical example, and suppose that we have pocket Kings, and we put our opponent on either Pocket Aces, Kings, Queens, or Jacks. All of these hands are equally likely, so there is a 25% chance of our opponent having each hand. We can look up our equity against each hand (after you've been playing for a while, you naturally start to memorise hand equities anyway) Our probability of winning is then: $$P(A) * P(beating A) + P(K)*P(beating K) + P(Q)*P(beating Q) + P(J) * P(beating J)$$ Putting in our values: $$ 0.25*0.2 + 0.25*0.5 + 0.25*0.8 + 0.25*0.8 = 0.575.$$ We therefore see we have a positive expectation against this range, and should call. No one actually thinks like this in real games? It is a general misconception that professional poker is a game where players are trying to guess exactly what hand their opponent has, are constantly trying to bluff each other, or trying to pick up on subtle tells or signs that their opponent is or isn't bluffing. The more mundane truth is that poker is ultimately a game of imperfect information, where the better player is the one who can correctly interpret the betting information their opponent is giving them, and can then quickly and accurately make the kind of judgements described above during a game. Obviously poker players are not carrying out these calculations in their head to multiple decimal places in real time, what they will do though is review their hands after a game, calculate exactly what they should have done, and try to build up an intuition as to what the correct answer is, so that in the middle of a game they can quickly make decisions. Software to Analyse this situation Is there an easy software based method way of calculating our equity against a range? After I did a quick google there are a few programs that offer this type of analysis. For example: combonator.com/ www.powerequilab.com/ More interestingly though, I also found the following open source software, that can be adapted to carry out this type of analysis: github.com/zekyll/OMPEval At some point, I might try to use this code to set up a page on this website to let people analyse this situation. Was the Lognormal Distribution misnamed?20/2/2018 I was thinking about this last week at work when I was coding part of a model involving the parameters of a truncated lognormal distribution. The lognormal distribution definitely feels like it was named the wrong way round. What is a Lognormal Distribution?
We say that a Random Variable $X$ has a LogNormal Distribution that is:
$$ X \sim LogN( \mu , { \sigma }^2 ) $$ if: $$ Log (X) \sim N( \mu , { \sigma }^2 ) $$In other words, a Lognormal distribution is a distribution such that the log of the distribution is a normal distribution. It is not, as you might think, a distribution which is the log of the normal distribution. So if $Y \sim N( \mu , {\sigma}^2 ) $ then $Log ( Y ) $ is not a lognormal distribution, instead $ e ^ Y $ is a lognormal distribution. So to create a lognormal distribution, we don't take the log of the normal distribution, we take the exponential! Why does this matter?
Definitions are just definitions after all, and as long as everyone knows how something is defined and there is no ambiguity one definition is usually as good as another. In this case though, defining it in this way does have some ugly and unnatural consequences. For example, if we take the result that the sum of two independent normal distributions is also a normal distribution, i.e.
If: $$ X \sim N( {\mu}_1 , {{\sigma}_1}^2 ) , Y \sim N( {\mu}_2 , {{\sigma}_2}^2 ) $$ Then: $$ X + Y \sim N( {\mu}_1 + {\mu}_2 , {{\sigma}_1}^2 + {{\sigma}_2}^2 ) $$ Then applying this result to the lognormal distribution, we get: If $ X \sim LogN( {\mu}_1 , {{\sigma}_1}^2 ) $ and $ Y \sim LogN( {\mu}_2 , {{\sigma}_2}^2 ) $ assuming independence, Then:$$ XY \sim LogN( {\mu}_1 + {\mu}_2 , {{\sigma}_1}^2 + {{\sigma}_2}^2 ) $$ Maybe this doesn't look too bad to you. But what if I replace $X$ and $Y$ with ${LogN}_1$ and ${LogN}_2$? Then we get: $$ {LogN}_1 * {LogN}_2 \sim LogN( {\mu}_1 + {\mu}_2 , {{\sigma}_1}^2 + {{\sigma}_2}^2 ) $$ This should definitely look wrong to you! Remember that for a standard logarithm: $$ Log (AB) = Log(A) + Log(B) $$ Instead we have an identity that looks much more like an exponential:$$ e^A * e^B = e^{ (A + B ) } $$ And that's precisely because we are dealing with an exponential! The lognormal distribution is simply the exponential of the normal, which is a much more natural way of phrasing it than to say that the lognormal distribution is a distribution such that the logarithm of the distribution is a normal distribution. So we have two reasons why the Lognormal Distbribution should have been called the Exponential Normal Distribution (Or possibly the XNormal Distribution for short). The identity above makes perfect sense when using exponentials, and we would have a naming convention that is much more natural. The Reinstatement Premium payable following a loss to an Excess of Loss contract, is related to the ceded loss to the contract by a simple formula. Therefore, it seems reasonable that we should be able to come up with a simple formula relating the price charged for the Excess of Loss contract to the price charged for a Reinstatement Premium Protection (RPP) cover. I was in a meeting last week with two brokers who were trying to do just this. They had come up with an indicative price for an Excess of Loss programme and were trying to use this to price the equivalent RPP cover. At the time I didn't have an answer for them, and when I did a quick Google, nothing came up. When thinking about it subsequently though, there are a couple of easy approximate methods we can use. Below I discuss three different methods for how you can price an RPP cover, two of which do not require any stochastic modelling assuming you already know the price of the Excess of Loss layers. Let's quickly review what we mean by all these terms so that we are starting from the same point. What is a Reinstatement Premium? If you already understand how a Reinstatement Premium works, then feel free to skip this section. Most Excess of Loss contracts will have some form of reinstatement premium. This is a payment from the Insurer to the Reinsurer to reinstate the protection in the event of a loss. In the London market, most contracts willl have either $1$, $2$, or $3$ reinstatements and generally these will be payable at $100 \%$. What is a Reinstatement Premium Protection Cover? The reinstatement premiums can be quite a large proportion of the overall premium paid for the Excess of Loss reinsurance. This is especially the case for lower level, working layers. Furthermore, the Reinstatement Premium will be payable when the insurer has just suffered a loss. From the point of view of the insurer, this additional payment comes at the worst possible time. The Insurer is being asked to fork over another large premium to the Reinsurer, just after having suffered a loss. To address some of these concerns, Reinsurers developed a product called a Reinstatement Premium Protection cover (RPP cover). This cover pays the Insurer's Reinstatement Premium for them, which gives the insurer further indemnification in the event of a loss. Here's an example of how it works in practice: Let's suppose we are considering a $5m$ xs $5m$ Excess of Loss contract, there is one reinstatement at $100 \%$ (written $1$ @ $100 \%$), and the Rate on Line is $25 \%$. The Rate on Line is just the Premium divided by the Limit. So here, the Premium can be found by multiplying the Limit and the RoL: $$5m* 25 \% = 1.25m$$ So we see that the Insurer will have to pay the Reinsurer $1.25m$ at the start of the contract. Now let's suppose there is a loss of $7m$. The Insurer will recover $2m$ from the Resinsurer, but they will also have to make a payment to cover the reinstatement premium of: $\frac {2m} {5m} * (5m * 25 \% ) = 2m * 25 \% = 0.5m$ to reinstate the cover. So the Insurer will actually have to pay out $5.5m$. The RPP cover, if purchased by the insurer, would pay the additional $0.5m$ on behalf of the insurer Now that we know how it works, how would we price the RPP cover? Three methods for pricing an RPP cover Method 1  Full stochastic model If we have priced the original Excess of Loss layer ourselves using a Monte Carlo model, then it should be relatively straight forward to price the RPP cover. We can just look at the expected Reinstatements, and apply a suitable loading for profit and expenses. This loading will probably be broadly in line with the loading that is applied to the expected losses to the Excess of Loss layer, but accounting for the fact that the writer of the RPP cover will not receive any form of Reinstatement for their Reinsurance. What if we do not have a stochastic model set up to price the Excess of Loss layer? What if all we know is the price being charged for the Excess of Loss layer? Method 2  Simple formula This was the situation I was in when I was asked to price the RPP cover last week. The broker had come up with some very approximate pricing for a programme of Excess of Loss layers. This pricing was driven by the burning cost, and other commercial factors rather than any actuarial modelling. The brokers wanted to come up with an approximate price for the RPP cover, just based on the price of the Excess of Loss layer. The two should be related, as they pay out dependant on the same underlying losses. So what can we say? If we denote the Expected Losses to the layer by $EL$, then the Expected Reinstatement Premium should be: $$EL * ROL $$ To see this is the case, I used the following reasoning; if we had losses in one year equal to the $EL$ (I'm talking about actual losses, not expected losses here), then the Reinstatement Premium for that year would be the proportion of the layer which had been exhausted $\frac {EL} {Limit} $ multiplied by the Deposit Premium $Limit * ROL$ i.e.: $$ RPP = \frac{EL} {Limit} * Limit * ROL = EL * ROL$$ Great! So we have our formula right? The issue now is that we don't know what the $EL$ is. We do however know the $ROL$, does this help? If we let $DP$ denote the deposit premium, which is the amount we initially pay for the Excess of Loss layer and we assume that we are dealing with a working layer, then we can assume that: $$DP = EL * (1 + \text{ Profit and Expense Loading } ) $$ Plugging this into our formula above, we can then conclude that the expected Reinstatement Premiums will be: $$\frac {DP} { \text{ Profit and Expense Loading } } * ROL $$ In order to turn this into a price (which we will denote $RPP$) rather than an expected loss, we then need to load our formula for profit and expenses i.e. $$RPP = \frac {DP} {\text{ Profit and Expense Loading }} * ROL * ( \text{ Profit and Expense Loading } ) $$Which with cancellation gives us: $$RPP = DP * ROL $$ Which is our first very simple formula for the price that should be charged for an RPP. Was there anything we missed out though in our analysis? Method 3  A more complicated formula: There is one subtlety we glossed over in order to get our simple formula. The writer of the Excess of Loss layer will also receive the Reinstatement Premiums during the course of the contract. The writer of the RPP cover on the other hand, will not receive any reinstatement premiums (or anything equivalent to a reinstatement premium). Therefore, when comparing the Premium charged for an Excess of Loss layer against the Premium charged for the equivalent RPP layer, we should actually consider the total expected Premium for the Excess of Loss Layer rather than just the Deposit Premium. What will the additional premium be? We already have a formula for the expected Reinstatement premium: $$EL * ROL $$ Therefore the total expected premium for the Excess of Loss Layer is the Deposit Premium plus the additional Premium: $$ DP + EL * ROL $$ This total expected premium is charged in exchange for an expected loss of $EL$. So at this point we know the Total Expected Premium for the Excess of Loss contract, and we can relate the expected loss to the Excess of Loss layer to the Expected Loss to the RPP contract. i.e. For an expected loss to the RPP of $EL * ROL$, we would actually expect an equivalent premium for the RPP to be: $$ RPP = (DP + EL * ROL) * ROL $$ This formula is already loaded for Profit and Expenses, as it is based on the total premium charged for the Excess of Loss contract. It does however still contain the $EL$ as one of its terms which we do not know. We have two choices at this point. We can either come up with an assumption for the profit and expense loading (which in this hard market might be as little as only be $5 \%  10 \%$ ). And then replace $EL$ with a scaled down $DP$: $$RPP = \frac{DP} {1.075} * ( 1 + ROL) * ROL $$ Or we could simply replace the $EL$ with the $DP$, which is partially justified by the fact that the $EL$ is only used to multiply the $ROL$, and will therefore have a relatively small impact on the result. Giving us the following formula: $$RPP = DP ( 1 + ROL) * ROL $$ Which of the three methods is the best? The full stochastic model is always going to be the best in my opinion. If we do not have access to one though, then out of the two formulas, the more complicated formula we derived should be more accurate (by which I mean more actuarially correct). If I was doing this in practice, I would probably calculate both, to generate some sort of range, but tend towards the second formula. That being said, when I compared the prices that the Brokers had come up with, which is based on what they thought they could actually place in the market, against my formulas, I found that the simple version of the formula was actually closer to the Broker's estimate of how much these contacts could be placed for in the market. Since the simple formula always comes out with a lower price than the more complicated formula, this suggests that there is a tendency for RPPs to be underpriced in the market. This systematic underpricing may be driven by commercial considerations rather than faulty reasoning on the part of market participants. According to the Broker I was discussing these contracts with, a common reason for placing an RPP is to give a Reinsurer who does not currently have a line on the underlying Excess of Loss layer, but who would like to start writing it, a chance to have an involvement in the same risk, without diminishing the signed lines for the existing markets. So let's say that Reinsurer A writes $100 \%$ of the Excess of Loss contract, and Reinsurer B would like to take a line on the contract. The only way to give them a line on the Excess of Loss contract is to reduce the line that Reinsurer A has. The insurer may not wish to do this though if Reinsurer A is keen to maintain their line. So the Insurer may allow Reinsurer B to write the RPP cover instead, and leave Reinsurer A with $100 \%$ of the Excess of Loss contract. This commercial factor may be one of the reasons that traditionally writers of an RPP would be inclined to give favourable terms relative to the Excess of Loss layer so as to encourage the insurer to allow them on to the main programme and to encourage them to allow them to wrte the RPP cover at all. Moral Hazard One point that is quite interesting to note about how these deals are structured is that RPP covers can have quite a significant moral hazard effect on the Insurer. The existence of Reinstatement Premiums is at least partially a mechanism to prevent moral hazard on the part of the Insurer. To see why this is the case, let's go back to our example of the $5m$ xs $5m$ layer. An insurer who purchases this layer is now exposed to the first $5m$ of any loss. But they are indemnified for the portion of the loss above $5m$, up to a limit of $5m$. If the insurer is presented with two risks which are seeking insurance  one with a total sum insured of $10m$, and another with a total sum insured of $6m$, the net retained exposure is the same for both risks from the point of view of the insurer. By including a reinstatement premium as part of the Excess of Loss layer, an therefore ensuring that the insurer has to make a payment any time a loss ceded to the layer, the reinsurer is ensuring that the insurer keeps their financial incentive to not have losses in this range. By purchasing an RPP cover, the insurer is removing their financial interest in losses which are ceded to the layer. There is an interesting conflict of interest in that the RPP cover will almost always be written by a different reinsurer to the Excess of Loss layer. The Reinsurer that is writing the RPP cover is therefore increasing the moral hazard risk whichever Reinsurer has written the Excess of Loss layer. Which will almost always be business written by one of the Reinsurer's competitors! Working Layers and unlimited Reinstatements Another point to note is that this pricing analysis makes a couple of implicit assumptions. The first is that there is a sensible relationship between the expected loss to the layer and the premium charged for the layer. This will normally only be the case for 'working layers'. These are layers to which a reasonable amount of loss activity is expected. If we are dealing with clash or other higher layers, then the pricing of these layers will be more heavily driven by considerations beyond the expected loss to the layer. These might be capital considerations on the part of the Reinsurer, commercial considerations such as Another implicit assumption in this analysis is that the reinstatements offered are unlimited,. If this is not the case, then the statement that the expected reinstatement is $EL * ROL$ no longer holds. If we have limited reinstatements (which is the case in practice most of the time) then we would expect the expected reinstatement to be less than or equal to this. Myst  Solving the Gear Puzzle26/1/2018 Over the Christmas break, I finally got around to playing some Computer Games, which I unfortunately haven't done in ages. One really fun game (if you're like me and love difficult puzzles) is a game called Myst. It was originally released in 1993 as a point and click puzzle adventure game, but it has subsequently been remade, using the Unity engine, as a full 3d interactive game. I really liked the fact that there is no time limit, no danger of dying, just a cool story, interesting puzzles and relaxing music. One puzzle that I thought was cool involved aligning gear wheels. If you haven't played the game and don't want spoilers, then stop reading now. The Gear Puzzle We are given the contraption in the image below, and are told that we need to end up with the numbers 2,2,1 (reading from top to bottom). We have three levers we can pull, the left lever moves the top two gears left once, the right lever moves the bottom two gears right once. The lever on the back wall resets the machine. The starting values of the gears are 3,3,3. So how many times do we need to pull each lever in order to end up at 2,2,1? After playing around with it for a while I began to think that it might not even be possible to get to 2,2,1 using the two levers. Then I thought if that's true, can I prove it as well? Proof that the puzzle is impossible Instead of thinking of the value as $3$, let's call it $0$ instead. We can then think of the starting state as a vector:
$$ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} $$ And we can think of the left lever as adding on the following vector: $$ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} $$ And the right lever as adding the following vector: $$ \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} $$ We need to work in $mod$ $3$ in order to properly account for the fact that after we go past $3$, we come back to $1$. Then the goal is to produce the state: $$ \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} $$
To see that we are not able to do this with just the basic operations from the starting point we are given, note that the the sum of the first and third elements of our starting vector is equal to the second vector mod $3$, that is, $0 + 0 \equiv 0$ $(mod$ $3)$. For the vector we trying to create, the first and third is not congruent to the second element mod $3$. Instead we have $ 2 + 1 \equiv 0$ $(mod$ $3) $ rather than $2$.
The two operations we are applying to our starting vector preserve this congruence property, as it applies to both the vectors we are adding on as well. Therefore, we can conclude that this property is an invariant of the system, and because our goal does not satisfy this condition, we will never be able to reach:
$$ \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} $$
by just using simple pulls of the lever. We need to realise this in order to force us to look for an alternative way to solve the puzzle. Otherwise we would probably just keep yanking on the levers in the hope that we will eventually get there. Once we realise this and start looking around, we are able to find another way to manipulate the machine, allowing us to make the numbers we need. How do Bookmakers make money?23/12/2017 "The safest way to double you money is to fold it in half" Kin Hubbard "I like to play Blackjack. I'm not addicted to gambling. I'm addicted to sitting in a semicircle." Mitch Hedberg Gambling is a risky business, in the long run everyone loses. Well not everyone. Bookmakers always seem to do alright. How is it that bookmakers make so much money out of gambling then? Is it through their superior wit and street smarts, or is there something else going on? It turns out that the method used by bookmakers involves a lot less insight and risk than you might think. The process is mathematical, and is guaranteed to turn a profit, and actually quite interesting. Making a Dutch Book Let's start with an example, suppose I am a bookmaker offering odds on the Super Bowl coin toss. We all know that for a fair coin, the probability of getting a heads is 50%, and the probability of getting tails is 50%. We might suppose therefore that we would set our odds at 1/1, meaning for a bet of £1 if you win, you get £1 back plus your original £1. This means that you would double your money if you win (which happens about half the time) and lose all your money if you lose (which happens about half the time). Over time, if you made lots of bets of £1 at these odds, you would expect to break even. Let's suppose though, that for reasons unknown to us, far more money is being bet on heads than tails. For example, suppose £1,000 is put on heads, and only £500 is put on tails. We can then examine our position under two scenarios, one where heads wins, and one where tails wins. We have taken in £1,500 in total, and our outgoings are either £2,000 or £1,000. Making us a profit of £500 half the time, and £500 half the time. Given each outcome is equally likely, over the long run if we keep offering odds like these we will expect to break even. We can do better than this though. We can actually set the odds so that we break even every single time, not just in the long run. In order to see this, let's suppose that we change the odds so that they are 1/2 for heads, and 2/1 for tails. So that means, for every £2 bet on heads, we will pay out £1, plus the original £2 bet, and for every £1 bet on tails, we will pay out £2, plus the original £1 bet. Under these new odds, we would then end up with the following position: We see now that because of how we have amended the odds, we don't actually care who wins. We will always pay out the same amount. The Bookmaker can then adjust the odds slightly, so that they will always pays out less than they have taken in. For example, they might offer 5/11 for heads, and 9/5 for tails. Under these odds, not only will the Bookmaker not care who wins, but they will always make a small margin on all bets placed. The exact figures in this case would be: So we see that the Bookmaker has a foolproof system of always making money from gambling, and plus, they don't even need to be good at predicting who will win. Irrational odds? Isn't it crazy to offer odds of 2/1 on a coin toss though? We all know that the actual odds for a fair coin should be 1/1! The answer is that as soon as people see these odds, they should start betting on tails, and by supply and demand the odds for tails will start to move down, and the odds for heads will start to move up. The bookmaker themselves might also decide to take a position at these odds and effectively bet money themselves by allowing the payout to be skewed towards a 50/50 split, giving their payout a positive Expected Value. The first time I read about this system of bookmaking was in the book "Financial Calculus: an introduction to derivative pricing" by Baxter and Rennie, which I was reading in preparation for the IFoA ST6 exam. Baxter and Rennie brought it up because Bookmakers are actually undertaking a form of arbitrage, similar to the series of notional trades used when deriving the price of futures contracts. You can see the similarities by simply noting that both the bookmaker and the derivatives trader are acting so as to not have any exposure to the actual value of the underlying event. By doing this, they don't actually need to take a view at all on what the expected outcome is, but they can instead exploit the relative values of different parts of the market. What margin to online bookmakers charge? I had a quick look at some odds offered by online bookmakers on a couple of events. One of the events I looked at was UFC 218. BestFightOdds.com gives a comparison of the odds offered by various betting websites. This is mainly to help people pick which website to bet on in order to get the best odds, but we can use it to compare the margins that the different websites charge. Assuming an equal bet is placed on each fight on the card, I calculated the following margins for the websites listed by BestFightOdds: They range from a low of 2.62%, up to about 6%. The individual margin on each fight varies quite a bit, and also I suspect each website will run different margins on different sports and different events. The reason for this is that some of the higher margin websites might offer more incentives, and more free bets than the lower margin websites., plus they might be more established and spend more on advertising allowing them to charge more and still retain sufficient numbers of customers. The fact that websites run margins does mean that if you are going to make money gambling you need to first beat the margin before you can even break even. Let's say you're really good at gambling and you can outpredict the market by 2%, you would still lose money overall gambling on any of these websites, because you also need to factor in the margin (also called the vig) that the websites charge. Sources: (1) BestfightOdds.com: www.bestfightodds.com/events/ufc218hollowayvsaldo21368 The great RPI swindle?14/12/2017 Last week it was announced that UK Rail Fares were to increase once again at the maximum allowed rate  3.4%, corresponding to the RPI increase in July 2017. News article: www.bbc.co.uk/news/business42234488 When reading this it got me thinking  why is RPI even being used any more? Aren't we supposed to be using CPI now? In 2013 the ONS stated that: "Following a consultation on options for improving the Retail Prices Index (RPI), the National Statistician, Jil Matheson, has concluded that the formula used to produce the RPI does not meet international standards and recommended that a new index be published." So basically the ONS no longer endorses RPI as the best indicator of the level of inflation in the economy. The ONS instead supports the use of CPI. So why does it matter that some organisations are still using RPI? To see why, let's take a look at a chart showing the historic RPI and CPI increases in the UK: Source: www.ons.gov.uk/economy/inflationandpriceindices RPI has been greater than CPI in every single month since 2010. In fact, in this time period, RPI has been an average of 0.8% higher than CPI. This fact might go some way to explain why the Government is so slow to move rail increases from RPI to CPI. This way the Government and Rail Companies can claim that they are only increasing their costs in line with inflation, which seems fair, yet the index they are using is actually higher than the usual inflation index used in the UK. The Government also indexes some of it's outgoings by an inflation index, for example the State Pension, so at least this is also being consistently overstated right? Well actually no! Wherever the government is using an inflation index to increase payments, it seems to have already transitioned to a CPI increase. Let's look at the list of items which use the inflation index which is more beneficial to the Government: (remember that CPI is almost always lower than RPI): There are some pretty hefty items on the list, including, the State Pension, Benefit Payments, Rail Fares, Utility bills, Student Loans. The ones that decrease government spending seem to have already transitioned over, and the ones that increase the amount of tax collected, or the cost of regulated items all seem to still be using the higher RPI index. Now let's look at the list of items which use the inflation index which is to the disadvantage of the Government. The list is certainly a lot shorter, and the items on it are less substantial. Indexed linked Government Bonds are however quite substantial. The reason that the Government is not able to move these to a CPI index is that it would be considered a default to downgrade the index once the bonds have been sold. The Government has no choice but to continue paying the bonds at RPI. Also, the yield on the bonds will be set with an eye towards the yield on a fixed bond, and the expected level of inflation. Therefore the actual index used is not necessarily that important. It's nice to see though that at least stamps are increased at a CPI rate! Source: www.ons.gov.uk/economy/inflationandpriceindices/methodologies/usersandusesofconsumerpriceinflationstatistics 
AuthorI work as a pricing actuary at a reinsurer in London. Categories
All
Archives
May 2020
