FAQs about Lloyd’s of London
16/11/2020

Are Lloyd’s of London and Lloyds Bank related at all?
They are not; they just happen to have similar names. Lloyd’s of London is an insurance market, whereas Lloyds Bank is a bank. They were both set up by people with the surname Lloyd: Lloyds Bank was formed by John Taylor and Sampson Lloyd, Lloyd’s of London by Edward Lloyd. Perhaps in the mists of time those two were distantly related, but that’s about it for a link.

Do people drink a lot at Lloyd’s? Do I need to drink to get ahead at Lloyd’s?

Go back a couple of decades and the culture (I’m told, I wasn’t actually there) was that casual daytime drinking was not just acceptable, but an indispensable part of how business was done and relationships maintained. The London market is certainly changing though, and drinking is a lot less integral to how the market functions than it used to be. Here is one writer’s experience of what it was like being a broker in the 70s: https://triptide.london/articles/diaryofaLloydsBroker

To give Lloyd’s some credit, it is attempting to clean up its act; for example, it recently banned anyone under the influence of drink or drugs from entering Lloyd’s: https://www.bbc.co.uk/news/business-47858013

I’d be surprised if the market is 100% compliant (just go to the Lamb at lunchtime), but it’s at least a step in the right direction.

Am ‘I’ really a Strange Loop?
28/10/2020

I just finished reading ‘I Am a Strange Loop’ by Douglas Hofstadter, and before I say anything else about the book, I’ll say that I really did want to like it. I’m a huge fan of his better-known book ‘Gödel, Escher, Bach’, for which Hofstadter won a Pulitzer Prize, and I’m also very interested in the subject area: maths, logic, self-reference, cognitive science.
However, there were just too many things that rubbed me up the wrong way. In no particular order, here are all the things I didn’t like about the book:

Excess layer pricing
16/9/2020

I had to solve an interesting problem yesterday relating to pricing an excess layer which was contained in another layer whose price we knew. I hadn’t priced the initial layer, and I did not have a gross loss model; all I had to go on was the overall price and a severity curve which I thought was reasonably accurate. The specific layers in this case were a 9m xs 1m, and I was interested in what we would charge for a 6m xs 4m.
Just to put some concrete numbers to this, let’s say the 9m xs 1m cost \$10m. The xs 1m severity curve was as follows:

The St Petersburg Paradox
21/8/2020
Let me introduce a game: I keep flipping a coin and you have to guess whether it will come up heads or tails. The prize pot starts at \$2, and each time you guess correctly the prize pot doubles. We keep playing until you eventually guess incorrectly, at which point you get whatever has accumulated in the prize pot.
So if you guess wrong on the first flip, you just get the \$2. If you guess wrong on the second flip you get \$4, and if you get it wrong on the 10th flip you get \$1024. Knowing this, how much would you pay to enter this game?

You're guaranteed to win at least \$2, so you'd obviously pay at least \$2. There is a 50% chance you'll win \$4, a 25% chance you'll win \$8, a 12.5% chance you'll win \$16, and so on. Knowing this, maybe you'd pay \$5 to play: you'll probably lose money, but there's a decent chance you'll make quite a bit more than \$5.

Perhaps you take a more mathematical approach than this. You might reason as follows: ‘I’m a rational person, therefore, as any good rational person should, I will calculate the expected value of playing the game; this is the maximum I should be willing to pay to play the game.’ This however is the crux of the problem and the source of the paradox: most people do not really value the game that highly (when asked, they’d pay somewhere between \$2 and \$10 to play it), and yet the expected value of the game is infinite....
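The divergence is easy to verify numerically: guessing wrong on flip $k$ has probability $2^{-k}$ and pays $2^k$, so every flip contributes exactly \$1 of expected value, and the truncated expectation grows without bound. Here is a quick sketch of that calculation (my own illustration, not from the original article):

```python
from fractions import Fraction

def truncated_ev(max_flips):
    """Expected prize if the game is cut off after max_flips coin flips.

    Guessing wrong on flip k has probability 2**-k and pays 2**k,
    so each possible flip adds exactly $1 of expected value.
    """
    return sum(Fraction(1, 2 ** k) * 2 ** k for k in range(1, max_flips + 1))

print(truncated_ev(10))   # first 10 flips contribute $10 of expected value
print(truncated_ev(100))  # ...and the sum just keeps growing linearly
```

However far out you truncate, each extra flip adds another dollar, so the full expectation diverges.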
Source: https://unsplash.com/@pujalin
The above is a lovely photo I found of St Petersburg. The reason the paradox is named after St Petersburg actually has nothing to do with the game itself, but is due to an early article published by Daniel Bernoulli in a St Petersburg journal. As an aside, having just finished the book A Gentleman in Moscow by Amor Towles (which I loved and would thoroughly recommend) I'm curious to visit Moscow and St Petersburg one day.
Two black swans? Or no black swans?
24/5/2020

Dan Glaser, CEO of Guy Carpenter, stated last week that he believes that the current fallout from Coronavirus represents two simultaneous black swans.
Nassim Taleb meanwhile, the very guy who brought the term ‘black swan’ into popular consciousness, has stated that what we are dealing with at the moment isn’t even a black swan! So what’s going on here? And who is right?

Justice Thomas speaks
17/5/2020

I saw this story today [1], and I've got to say I absolutely love it. Here is the tag line:
“Clarence Thomas Just Asked His First Question in a Supreme Court Argument in 3 Years”

For those like me not familiar with Justice Thomas, he is one of the nine Supreme Court Justices and he is famously terse:

“He once went 10 years without speaking up at an argument.”

“His last questions came on Feb. 29, 2016”

You could say I was speechless upon reading this (see what I did there?); for a judge who sits over some of the most important trials in the US to basically never speak during oral arguments seems pretty incredible.

If you are an actuary, you'll probably have done a fair bit of triangle analysis, and you'll know that triangle analysis tends to work pretty well if you have what I'd call 'nice, smooth, consistent' data, that is, data without sharp corners, no large one-off events, and without substantial growth. Unfortunately, over the last few years, motor triangles have been anything but nice, smooth or consistent. These days, using them often seems to require more assumptions than there are data points in the entire triangle.
In case you missed it, Aon announced [1] last week that in response to the Covid-19 outbreak, and the subsequent expected loss of revenue stemming from the fallout, they would be taking a series of pre-emptive actions. The message was that no one would lose their job, but that a majority of staff would be asked to accept a 20% salary cut.
The cuts would be made to:
So how significant will the cost savings be here? And is it fair that Aon is continuing with its dividend? I did a couple of back-of-the-envelope calcs to investigate.

This post is about two pieces of writing released this week, and how easy it is for even smart people to be wrong.
The story starts with an open letter written to the UK Government signed by 200+ scientists, condemning the government’s response to the Coronavirus epidemic: arguing that the response was not forceful enough, and that the government was risking lives by its current course of action. The letter was widely reported and even made it to the BBC front page; pretty compelling stuff. Link [1]

The issue is that as soon as you start scratching beneath the surface, all is not quite what it seems. Of the 200+ scientists, about 1/3 are PhD students; not an issue in and of itself, but picking out some of the subjects we’ve got:
So I finally finished Thomas Piketty's book  Capital in the 21st Century, and I thought I'd write up an interesting result that Piketty mentions, but does not elaborate on. Given the book is already 700 pages, it's probably for the best that he drew the line somewhere.
The result is, specifically, that under basic models of the development of the distribution of wealth in a society, it can be shown that when growth is equal to $g$, and return on capital is equal to $r$, then the distribution of wealth tends towards a Pareto distribution with parameter driven by $r-g$. That sounds pretty interesting right? My notes below are largely based on the following paper by Charles I. Jones of Stanford Business School; my addition is to derive the assumption of an exponential distribution of income from more basic assumptions about labour and capital income. Link to Jones's paper [1]

Stirling's Approximation
23/2/2020

I’m reading ‘Information Theory, Inference and Learning Algorithms' by David MacKay at the moment and I'm really enjoying it so far. One cool trick that he introduces early in the book is a method of deriving Stirling’s approximation through the use of the Gaussian approximation to the Poisson distribution, which I thought I'd write up here.
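The idea behind that trick can be sanity-checked numerically: a Poisson($\lambda$) distribution peaks near $k=\lambda$, and approximating that peak with a Gaussian density gives $\lambda^\lambda e^{-\lambda}/\lambda! \approx 1/\sqrt{2\pi\lambda}$, which rearranges to Stirling's approximation $\lambda! \approx \sqrt{2\pi\lambda}\,\lambda^\lambda e^{-\lambda}$. A quick check of how good the approximation is (my own sketch, not from the book):

```python
import math

def stirling(n):
    # Stirling's approximation: n! ~ sqrt(2*pi*n) * n^n * e^-n
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

n = 50
ratio = math.factorial(n) / stirling(n)
print(ratio)  # close to 1; the relative error behaves like 1/(12n)
```

For $n=50$ the approximation is already within about 0.2% of the true factorial.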
Piketty also adds colour by tying his observations to the literature written at the time (Austen, Dumas, Balzac), and how the assumptions made by the authors around how money, income and capital work are also reflected in the economic data that Piketty obtained.
Hopefully I've convinced you Piketty's programme is a worthwhile one, but that still leaves the fundamental question: is his analysis correct? That's a much harder question to answer, and to be honest I don't really feel qualified to pass judgement on the entirety of the book, other than to say it strikes me as pretty convincing from the limited amount of time I've spent on it. In an attempt to contribute in some small way to the larger conversation around Piketty's work, I thought I'd write about one specific argument that Piketty makes that I found less convincing than other parts of the book. Around 120 pages in, Piketty introduces what he calls the ‘Second Fundamental Law of Capitalism’, and this is where I started having difficulties in following his argument.

The Second Fundamental Law of Capitalism

The rule is defined as follows:

$$B = \frac{s}{g}$$

Where $B$, as in Piketty’s first fundamental rule, is defined as the ratio of Capital (the total stock of public and private wealth in the economy) to Income (NNP):

$$B = \frac{\text{Capital}}{\text{Income}}$$

And where $g$ is the growth rate, and $s$ is the saving rate. Unlike the first rule, which is an accounting identity and therefore true by definition, the second rule is only true ‘in the long run’. It is an equilibrium that the market will move towards over time, and the following argument is given by Piketty:

“The argument is elementary. Let me illustrate it with an example. In concrete terms: if a country is saving 12 percent of its income every year, and if its initial capital stock is equal to six years of income, then the capital stock will grow at 2 percent a year, thus at exactly the same rate as national income, so that the capital/income ratio will remain stable.
By contrast, if the capital stock is less than six years of income, then a savings rate of 12 percent will cause the capital stock to grow at a rate greater than 2 percent a year and therefore faster than income, so that the capital/income ratio will increase until it attains its equilibrium level. Conversely, if the capital stock is greater than six years of annual income, then a savings rate of 12 percent implies that capital is growing at less than 2 percent a year, so that the capital/income ratio cannot be maintained at that level and will therefore decrease until it reaches equilibrium.”

I’ve got to admit that this was the first part in the book where I really struggled to follow Piketty’s reasoning. Possibly this was obvious to other people, but it wasn’t to me!

Analysis – what does he mean?

Before we get any further, let’s unpick exactly what Piketty means by all the terms in his formulation of the law:

Income = net national product = gross national product * 0.9 (where the factor of 0.9 is to account for depreciation of capital)

$g$ = growth rate, but growth of what? Here it is specifically growth in income, so while this is not exactly the same as GDP growth it’s pretty close. If we assume net exports do not change, and the depreciation factor (0.9) is fixed, then the two will be equal.

$s$ = saving rate. By definition this is the ratio of additional capital to income. Since income here is net of depreciation, we are already subtracting capital depreciation from income and not including this in our saving rate.

Let’s play around with a few values, splitting growth $g$ into per capita growth and demographic growth, we get the following. Note that total growth is simply the sum of demographic and per capita growth, and Beta is calculated from the other values using the law.
So why does Piketty introduce this law?
The argument that Piketty is intending to tease out from this equality is the following:
In fact, using $g=1.5 \%$ as a long-term average, we can expect Beta to crystallise around $8$! Much higher than it has been for the past 100 years.

Analysis – convergence

As Piketty is quick to point out, this is a long-run equilibrium towards which an economy will move. Moreover, it should be noted that the convergence of this process is incredibly slow. Here is a graph plotting the evolution of Beta, from a starting point of 5, under the assumption of $g=1.5 \%$, $s = 12 \%$:
So we see that after 30 years (i.e. approximately one generation), Beta has only increased from its starting point of $5$ to around $6$; it then takes another generation and a half to get to $7$, which is still short of its long-run equilibrium of $8$.
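The dynamics behind that graph can be reproduced in a few lines. Writing $K_{t+1} = K_t + s Y_t$ and $Y_{t+1} = (1+g) Y_t$, the capital/income ratio evolves as $B_{t+1} = (B_t + s)/(1+g)$, whose fixed point is exactly $B = s/g$. A minimal simulation (my own sketch of the mechanism, not Piketty's code):

```python
def beta_path(beta0, s, g, years):
    """Evolution of Beta = Capital/Income with net saving rate s and growth g.

    K[t+1] = K[t] + s*Y[t] and Y[t+1] = (1+g)*Y[t] imply
    B[t+1] = (B[t] + s) / (1 + g), which converges (slowly) to B = s/g.
    """
    path = [beta0]
    for _ in range(years):
        path.append((path[-1] + s) / (1 + g))
    return path

path = beta_path(5.0, s=0.12, g=0.015, years=150)
print(round(path[30], 2))   # after one generation: only ~6.1
print(round(path[75], 2))   # a generation and a half later: ~7.0
print(0.12 / 0.015)         # the long-run equilibrium: 8.0
```

The half-life of the gap to equilibrium is roughly $\ln(2)/g \approx 46$ years, which is why the convergence takes generations.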
Analysis – is this rule true?

Piketty is of course going to want to use his formula to say interesting things about the historic evolution of the Capital/Income ratio, and also to use it to help predict future movements in Beta. I think this is where we start to push the boundaries of what we can easily reason about, without first slowing down and methodically examining our implicit assumptions.

For example: is a fixed saving rate (independent of changes in both Beta and growth) reasonable? Remember that the saving rate here is a saving rate on net income, so as Beta increases, we are already having to put more money into the upkeep of our current level of capital. A fixed net saving rate is therefore actually consistent with an increasing gross saving rate, not a fixed gross saving rate. An increasing gross saving rate might be a reasonable assumption or it might not; this then becomes an empirical question rather than something we can reason about a priori.

Another question is how the law performs for very low rates of $g$, which is in fact how Piketty is intending to use the equation. By inspection, we can see that:
As $g \rightarrow 0$, $B \rightarrow \infty $.
What is the mechanism by which this occurs in practice? It’s simply that if GDP does not grow from one year to the next, but the net saving rate is still positive, then the stock of capital will still increase, however income has not increased. This does however mean that an ever increasing share of the economy is going towards paying for capital depreciation.
Conclusion
Piketty’s law is still useful, and I do find it convincing to a first order of approximation. But I do think this section of the book could have benefited from more time spent highlighting some of the distortions potentially caused by using net income as our primary measure of income. There are multiple theoretical models used in macroeconomics, and it would have been useful for Piketty to help frame his law within the established paradigm.

The Rule of 72
12/1/2020

I'm always begrudgingly impressed by brokers and underwriters who can do most of their job without resorting to computers or a calculator. If you give them a gross premium for a layer, they can reel off gross and net rates on line, the implied loss cost, and give you an estimate of the price for a higher layer using an ILF, all in their head. When I'm working, so much actuarial modelling requires a computer (sampling from probability distributions, Monte Carlo methods, etc.) that just to give any answer at all I need to fire up Excel and build a spreadsheet. So any time there's a chance to use a shortcut, I'm all for it!

One mental calculation trick which is quite useful when working with compound interest is called the Rule of 72. It states that under growth from annual compound interest at interest rate $i$, it takes approximately $\frac{72}{i}$ years for a given value to double in size.

Why does it work?

Here is a quick derivation showing why this works; all we need is to manipulate the exact solution with logarithms and then play around with the Taylor expansion.
We are interested in the following identity, which gives the exact value of $n$ for which an investment doubles under compound interest:

$$\left( 1 + \frac{i}{100} \right)^n = 2$$

Taking logs of both sides gives the following:

$$\ln \left( 1 + \frac{i}{100} \right)^n = \ln(2)$$

And then bringing down the $n$:

$$n \ln \left( 1 + \frac{i}{100} \right) = \ln(2)$$

And finally solving for $n$:

$$n = \frac{\ln(2)}{\ln \left( 1 + \frac{i}{100} \right)}$$

So the above gives us a formula for $n$, the number of years. We now need to come up with a simple approximation to this function, and we do so by examining the Taylor expansion of the denominator of the right-hand side. We can compute the value of $\ln(2)$:
$$\ln(2) \approx 0.693 = 69.3\%$$
The Taylor expansion of the denominator is:
$$\ln \left( 1 + \frac{i}{100} \right) = \frac{i}{100} - \frac{i^2}{20000} + \ldots$$

In our case, it is more convenient to write this as:

$$\ln \left( 1 + \frac{i}{100} \right) = \frac{1}{100} \left( i - \frac{i^2}{200} + \ldots \right)$$

For $i<10$, the second term is less than $\frac{100}{200} = 0.5$. Given the first term is of the order $10$, this means we are only throwing out an adjustment of less than $5 \%$ to our final answer. Taking just the first term of the Taylor expansion, we end up with:

$$n \approx \frac{0.693}{\frac{i}{100}}$$

And rearranging gives:

$$n \approx \frac{69.3}{i}$$

So we see we are pretty close to $n \approx \frac{72}{i}$.

Why 72?

We saw above that using just the first term of the Taylor expansion suggests we should really be using a ‘rule of 69.3’ instead. Why then is this the rule of 72? There are two main reasons. The first is that for most of the interest rates we are interested in, the Rule of 72 actually gives a better approximation to the exact solution; the following table compares the exact solution, the approximation given by the ‘rule of 69.3’, and the approximation given by the Rule of 72:
The reason for this is that for interest rates in the 4%-10% range, the second term of the Taylor expansion is not completely negligible, and acts to make the denominator slightly smaller and hence the fraction slightly bigger. It turns out 72 is quite a good fudge factor to account for this.
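The comparison is easy to reproduce. A quick calculation (my own check; note that at very low rates, such as 2%, the rule of 69.3 is actually the closer of the two):

```python
import math

def exact_doubling_years(r):
    # exact n solving (1 + r/100)^n = 2
    return math.log(2) / math.log(1 + r / 100)

for r in (2, 4, 6, 8, 10):
    exact = exact_doubling_years(r)
    print(f"{r}%: exact {exact:.2f}, "
          f"rule-of-69.3 {69.3 / r:.2f}, rule-of-72 {72 / r:.2f}")
```

At 8%, for instance, the exact answer is about 9.01 years; the Rule of 72 gives 9.00 while the rule of 69.3 gives 8.66.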
Another reason for using 72 over other nearby numbers is that 72 has a lot of divisors; in particular, out of all the integers within 10 of 72, 72 has the most divisors. The following table displays the divisor function $d(n)$ for values of $n$ between 60 and 80; 72 clearly stands out as a good candidate.
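That claim is simple to verify; a couple of lines (my own check) confirms 72 is the unique maximum of $d(n)$ over the surrounding integers:

```python
def d(n):
    """Number of divisors of n (naive count, fine for small n)."""
    return sum(1 for k in range(1, n + 1) if n % k == 0)

counts = {n: d(n) for n in range(62, 83)}         # integers within 10 of 72
print(max(counts, key=counts.get), counts[72])    # 72 wins, with 12 divisors
```

So 72 divides evenly by 2, 3, 4, 6, 8, 9 and 12, which is exactly what you want for mental arithmetic.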
The rule of 72 in Actuarial Modelling
The main use I find for this trick is in mentally adjusting historic claims for claims inflation. I know that if I put in 6% claims inflation, my trended losses will double in size from their original level approximately every 12 years. Other uses include analysing investment returns, thinking about the effects of monetary inflation, or even thinking about the effects of discounting.

Can we apply the Rule of 72 anywhere else?

As an aside, we should be careful when attempting to apply the Rule of 72 over too long a time period. Say we are watching a movie set in 1940; can we use the Rule of 72 to estimate what values in the movie are equivalent to now? Let's set up an example and see why it doesn't really work in practice. Let's suppose an item in our movie costs 10 dollars. First we need to pick an average inflation rate for the intervening period (something in the range of 3-4% is probably reasonable). We can then reason as follows: 1940 was 80 years ago; at 4% inflation, $\frac{72}{4} = 18$, and we’ve had approximately 4 lots of 18 years in that time. Therefore the price would have doubled 4 times, i.e. prices will now be higher by a factor of 16, suggesting that 10 dollars in 1940 is worth around 160 dollars in today's terms.

It turns out that this doesn’t really work though; let’s check it against another calculation. The average price of a new car in 1940 was around 800 dollars and the average price now is around 35k, which is a factor of 43.75, quite a bit higher than 16. The issue with using inflation figures like these over very long time periods is that for a given year the difference in the underlying goods is fairly small, so a simple percentage change in price is an appropriate measure; but when we chain together a large number of annual changes, after a certain number of years the underlying goods have almost completely changed from the first year to the last.
For this reason, simply multiplying an inflation rate across decades completely ignores both improvements in the quality of goods over time and changes in standards of living, so it doesn't really convey the information that we are actually interested in.

Photo by David Preston

What is a floating deductible?

Excess of Loss contracts for Aviation books, specifically those covering airline risks (planes with more than 50 seats), often use a special type of deductible called a floating deductible. Instead of applying a fixed amount to the loss in order to calculate recoveries, the deductible varies based on the size of the market loss and the line written by the insurer. These types of deductibles are reasonably common; I’d estimate something like 25% of the airline accounts I’ve seen have had one.

As an aside, these policy features are almost always referred to as deductibles, but technically they are not actually deductibles from a legal perspective, and should probably be referred to as floating attachments instead. The definition of a deductible requires that it be deducted from the policy limit, rather than specifying the point above which the policy limit sits. That’s a discussion for another day though!

The idea is that the floating deductible should be lower for an airline on which an insurer takes a smaller line, and higher for an airline on which the insurer takes a bigger line. In this sense they operate somewhat like a surplus lines contract in property reinsurance. Before I get into my issues with them, let’s quickly review how they work in the first place.

An example

When binding an Excess of Loss contract with a floating deductible, we need to specify the following values upfront:
And we need to know the following additional information about a given loss in order to calculate recoveries from said loss:
A standard XoL recovery calculation with the fixed attachment given above would first calculate the UNL (200m * 0.75% = 1.5m), and then deduct the fixed attachment from this (1.5m - 1.5m = 0). Meaning in this case, for this loss and this line size, nothing would be recovered from the XoL.

To calculate the recovery from an XoL with a floating deductible, we would once again calculate the insured’s UNL: 1.5m. However, we now need to calculate the applicable deductible. This will be the lesser of 1.5m (the fixed attachment), and the insurer’s effective line (defined as their UNL divided by the market loss = 1.5m/200m) multiplied by the Original Market Loss as defined in the contract. In this case, the effective line would be 0.75%, and the Original Market Loss would be 150m, hence 0.75% * 150m = 1.125m. Since this is less than the 1.5m fixed attachment, the attachment we should use is 1.125m. Note that our limit is always just 18.5m, and doesn’t change if the attachment drops down. We would therefore calculate recoveries to this contract, for this loss size and risk, as if the layer were an 18.5m xs 1.125m. Meaning the ceded loss would be 0.375m, and the net position would be 1.125m. Here’s the same calculation in an easier-to-follow format:

So…. what’s the issue?
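The mechanics of that calculation can be written out as a short function. This is my own sketch of the recovery logic described above (the function and parameter names are mine, not contract wording, and I assume the limit does not drop down):

```python
def xol_recovery(unl, market_loss, fixed_priority, orig_market_loss, limit):
    """Recovery under an XoL layer with a floating deductible.

    The priority is the lesser of the fixed attachment and the insurer's
    effective line (UNL / market loss) applied to the contractually
    defined Original Market Loss. The limit is assumed fixed.
    """
    effective_line = unl / market_loss
    priority = min(fixed_priority, effective_line * orig_market_loss)
    ceded = min(limit, max(0.0, unl - priority))
    return priority, ceded, unl - ceded

# The example above: 1.5m UNL on a 200m market loss, an 18.5m xs 1.5m
# layer, and a contractual Original Market Loss of 150m.
priority, ceded, net = xol_recovery(1.5e6, 200e6, 1.5e6, 150e6, 18.5e6)
print(priority / 1e6, ceded / 1e6, net / 1e6)  # 1.125, 0.375, 1.125
```

With a plain fixed attachment of 1.5m the same loss would have ceded nothing, so the floating deductible is worth 0.375m to the insurer here.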
This may seem quite sensible so far; the issue, however, is with the wording. The following is an example of a fairly standard London Market wording, taken from an anonymised slip which I came across a few years ago.

Priority: USD 10,000,000 each and every loss or an amount equal to the “Reinsured’s Portion” of the total Original Insured Loss sustained by the original insured(s) of USD 200,000,000 each and every loss, whichever the lesser.

…

Reinsuring Clause

Reinsurers shall only be liable if and when the ultimate net loss paid by the Reinsured in respect of the interest as defined herein exceeds USD 10,000,000 each and every loss or an amount equal to the Reinsured’s Proportion of the total Original Insured Loss sustained by the original insured(s) of USD 200,000,000 or currency equivalent, each and every loss, whichever the lesser (herein referred to as the “Priority”)

For the purpose herein, the Reinsured’s Proportion shall be deemed to be a percentage calculated as follows, irrespective of the attachment dates of the policies giving rise to the Reinsured’s ultimate net loss and the Original Insured Loss:

Reinsured Ultimate Net Loss / Original Insured Loss

…

The Original Insured Loss shall be defined as the total amount incurred by the insurance industry including any proportional coinsurance and/or self-insurance of the original insured(s), net of any recovery from any other source

What’s going on here is that we’ve defined the effective line to be the Reinsured’s UNL divided by the 100% market loss.

First problem

From a legal perspective, how would an insurer (or reinsurer for that matter) prove what the 100% insured market loss is? The insurer obviously knows their share of the loss, but what if this is a split placement, with 70% placed in London on the same slip, 15% placed in a local market (let’s say Indonesia), and a shortfall cover (15%) placed in Bermuda?
Due to the different jurisdictions, let’s say the Bermudian cover has a number of exclusions and subjectivities, and the Indonesian cover operates under the Indonesian legal system, which does not publicly disclose private contract details. Even if the insurer is able to find out through a friendly broker what the other markets are paying, and therefore has a good sense of what the 100% market loss is, they may not have a legal right to this information. The airline does have a legal right to the information; however, the reinsurance contract is a contract between the reinsured and their reinsurers, and the airline is not a party to it. The point is whether the reinsured and their reinsurers have a legal right to the information.

The above issues may sound quite theoretical, and in practice there are normally no issues with collecting on these types of contracts. But to my mind, legal language should bear up to scrutiny even when stretched – that’s precisely when you are going to rely on it. My contention is that, as a general rule, it is a bad idea to rely on information in a contract which you do not have an automatic legal right to obtain.

The second problem

The intention with this wording, and with contracts of this form, is that the effective line should basically be the same as the insurer’s signed line. Assume everything is straightforward, and the insurer takes an x% line with a limit of Y bn. If the loss is anything less than Y bn, then the insurer’s effective line will simply be x% * (size of loss) / (size of loss), i.e. x%.

My guess as to why it is worded this way, rather than just taking the actual signed line, is that we don’t want to open ourselves up to issues around what exactly we mean by ‘the signed line’: what if the insurer has exposure through two contracts, both of which have different signed lines? What if there is an inuring Risk Excess which effectively nets down the gross signed line – should we then use the gross or net line?
By couching the contract in terms of UNLs and market losses we attempt to avoid these ambiguities. Let me give you a scenario though where this wording does fall down:

Scenario 1 – clash loss

Let’s suppose there is a mid-air collision between two planes, and each results in an insured market loss of USD 1bn; then the Original Insured Loss is USD 2bn. If our insurer takes a 10% line on the first airline, but does not write the second airline, then their effective line is 10% * 1bn / 2bn = 5%... hmmm, this is definitely not equal to their signed line of 10%. You may think this is a pretty remote possibility; after all, in the history of modern commercial aviation such an event has not occurred. What about the following scenario, which does occur fairly regularly?

Scenario 2 – airline/manufacturer split liability

Suppose now there is a loss involving a single plane, and the size of the loss is once again USD 1bn, and that our insurer once again has a 10% line. In this case though, what if the manufacturer is found 50% responsible? Now the insurer only has a UNL of USD 500m, and yet once again, in the calculation of their floating deductible, we do the following: 10% * 500m / 1bn = 5%. Hmmm, once again our effective line is below our signed line, and the floating deductible will drop down even further than intended.

Suggested alternative wording

My suggested wording (and I’m not a lawyer, so this is categorically not legal advice) is to retain the basic definition of effective line as UNL divided by some version of the 100% market loss – by doing so we still neatly sidestep the issues mentioned above around gross vs net lines, or exposure from multiple slips – but to replace the definition of Original Insured Loss with some variation of the following: ‘the proportion of the Original Insured Loss for which the insured derives a UNL through their involvement in some contract of insurance, or otherwise’.
Basically, the intention is to restrict the market loss to only those contracts in which the insurer has an involvement. This deals with both issues: the insurer would not be able to net down their line further through references to insured losses which have nothing to do with them, as in scenarios 1 and 2 above; and secondly, it restricts the information requirements to contracts which the insurer has an automatic legal right to have knowledge of, since by definition they will be a party to the contract.

I did run this idea past a few reinsurance brokers a couple of years ago, and they thought it made sense. The only downside from their perspective is that it makes the client's reinsurance slightly less responsive; i.e. they knew about the strange quirk whereby the floating deductible dropped in the event of a manufacturer involvement, and saw it as a bonus for their client, which was often not fully priced in by the reinsurer. They therefore had little incentive to attempt to drive through such a change. The only people who would have an incentive to push through this change would be the larger reinsurers, though I suspect they will not do so until they've been burnt attempting to rely on the wording in a court case, at which point they may find it does not quite operate in the way they intended.

Excel Protect Sheet Encryption
27/12/2019
The original idea for this post came from a slight quirk I noticed in some VBA code I was running (code pasted below). If you've ever needed to remove sheet protection from a spreadsheet without knowing the password, then you'll probably recognise it.
Sub RemovePassword()
    Dim i As Integer, j As Integer, k As Integer
    Dim l As Integer, m As Integer, n As Integer
    Dim i1 As Integer, i2 As Integer, i3 As Integer
    Dim i4 As Integer, i5 As Integer, i6 As Integer
    On Error Resume Next
    For i = 65 To 66: For j = 65 To 66: For k = 65 To 66
    For l = 65 To 66: For m = 65 To 66: For i1 = 65 To 66
    For i2 = 65 To 66: For i3 = 65 To 66: For i4 = 65 To 66
    For i5 = 65 To 66: For i6 = 65 To 66: For n = 32 To 126
        ActiveSheet.Unprotect Chr(i) & Chr(j) & Chr(k) & _
            Chr(l) & Chr(m) & Chr(i1) & Chr(i2) & Chr(i3) & _
            Chr(i4) & Chr(i5) & Chr(i6) & Chr(n)
        If ActiveSheet.ProtectContents = False Then
            MsgBox "Password is " & Chr(i) & Chr(j) & _
                Chr(k) & Chr(l) & Chr(m) & Chr(i1) & Chr(i2) & _
                Chr(i3) & Chr(i4) & Chr(i5) & Chr(i6) & Chr(n)
            Exit Sub
        End If
    Next: Next: Next: Next: Next: Next
    Next: Next: Next: Next: Next: Next
End Sub

Nothing too interesting so far; the code looks quite straightforward. We've got a big set of nested loops which appear to test all possible passwords and will eventually brute-force the password; if you've ever tried it, you'll know it works pretty well. The interesting part is not so much the code itself as the answer the code gives: the password which unlocks the sheet is normally something like ‘AAABAABA@1’. I’ve used this code quite a few times over the years, and always with similar results; the password always looks like some variation of this string. This got me thinking: surely it is unlikely that all the spreadsheets I’ve been unlocking have had passwords of this form? So what’s going on?

After a bit of research, it turns out Excel doesn’t actually store the original password; instead it stores a 4-digit (hexadecimal) hash of the password. Then to unlock the spreadsheet, Excel hashes the password attempt and compares it to the stored hashed password.
Since the set of all possible passwords is huge (full calculations below), and the set of all possible hashes is much smaller, we end up with a high probability of collisions between password attempts, meaning multiple passwords can open a given spreadsheet.

I think the main reason Microsoft uses a hash function in this way, rather than just storing the unhashed password, is that the hash is stored by Excel as an unencrypted string within an XML file. In fact, an .xlsx file is basically just a zip containing a number of XML files. If Excel didn't first hash the password, then you could simply unzip the Excel file, find the relevant XML file and read the password in any text editor. With the scheme Excel actually uses, the best you can do is open the XML file and read the hash of the password, which doesn't help you recover the original password, due to the one-way nature of the hash function.

What hash function is used?

I couldn't find the name of the hash anywhere, but the following website has the fullest description I could find of the actual algorithm. As an aside, I miss the days when the internet was made up of websites like this – weird, individually curated, static HTML, obviously written by someone with deep expertise, and no ads as well! Here’s the link: http://chicago.sourceforge.net/devel/docs/excel/encrypt.html And the process is as follows:
Here is the algorithm to create the hash value, illustrated for the password ‘abcdefghij’: each character code is shifted left by its (1-based) position in the password, any bits pushed beyond the low 15 bits are rotated back round to the bottom, and the results are all XORed together along with the character count and the constant 0xCE4B:
a -> 0x61 << 1  == 0x00C2
b -> 0x62 << 2  == 0x0188
c -> 0x63 << 3  == 0x0318
d -> 0x64 << 4  == 0x0640
e -> 0x65 << 5  == 0x0CA0
f -> 0x66 << 6  == 0x1980
g -> 0x67 << 7  == 0x3380
h -> 0x68 << 8  == 0x6800
i -> 0x69 << 9  == 0x5201 (unrotated: 0xD200)
j -> 0x6A << 10 == 0x2803 (unrotated: 0x1A800)
count:    0x000A
constant: 0xCE4B
result:   0xFEF1

This value occurs in the PASSWORD record.
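Putting that into code, here is a short Python sketch of the algorithm as described on that page – my own translation, so the function name is mine, not Microsoft's:

```python
def excel_hash(password: str) -> int:
    """Legacy Excel sheet-protection hash, per the description above."""
    h = 0
    for pos, ch in enumerate(password, start=1):
        val = ord(ch) << pos  # shift the char code left by its 1-based position
        # fold any bits pushed beyond the low 15 bits back round to the bottom
        while val > 0x7FFF:
            val = (val & 0x7FFF) | (val >> 15)
        h ^= val
    # finally XOR in the character count and the magic constant
    return h ^ len(password) ^ 0xCE4B

print(hex(excel_hash("abcdefghij")))  # 0xfef1, matching the worked example
```

Running it on ‘abcdefghij’ reproduces the 0xFEF1 value from the worked example, which is a reassuring sanity check.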
How many trials will we need to decrypt?
Now we know how the algorithm works, can we come up with a probabilistic bound on the number of trials we would need to check in order to be almost certain to get a collision when carrying out a brute force attack (as per the VBA code above)?
This is a fairly straightforward calculation – the probability of a single random attempt matching the stored hash is $\frac{1}{65536}$. To keep the maths simple, if we assume independence of attempts, the probability of not having found a match after $n$ attempts is simply: $$ \left( 1 - \frac{1}{65536} \right)^n$$ The following table then displays these probabilities for a range of values of $n$:

n = 10,000:   86%
n = 50,000:   47%
n = 100,000:  22%
n = 200,000:  4.7%
So we see that with 200,000 trials, there is a less than 5% chance of not having found a match.
We can also derive the answer directly. We are interested in the following probabilistic inequality: $$ \left( 1 - \frac{1}{65536} \right)^k < 0.05$$ Taking logs of both sides gives us: $$\ln \left( 1 - \frac{1}{65536}\right)^k < \ln( 0.05)$$ And then bringing down the $k$: $$k \ln \left( 1 - \frac{1}{65536} \right) < \ln(0.05)$$ And then solving for $k$ (the log is negative, so the inequality flips when we divide through): $$k > \frac{ \ln(0.05)}{\ln \left(1 - \frac{1}{65536}\right)} \approx 196{,}327$$
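As a quick numerical check of that threshold (a throwaway sketch, not part of the original workings):

```python
import math

# Number of random attempts needed for the probability of *no* match
# against a 16-bit hash space (65,536 values) to drop below 5%
k = math.log(0.05) / math.log(1 - 1 / 65536)
print(f"{k:,.0f} attempts")  # just over 196,000

# After 200,000 attempts we are comfortably past the threshold
assert (1 - 1 / 65536) ** 200_000 < 0.05
```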
Can I work backwards to find the password from a given hash?
As we explained above, in order to unlock the sheet you don’t need to find the password, you only need to find a password. Let’s suppose however, for some reason, we particularly wanted to find the password which was originally used – is there any other method to work backwards? I can only think of two basic approaches:

Option 1 – find an inversion of the hashing algorithm. Since this algorithm has been around for decades, is designed to be difficult to reverse, and so far has not been broken, this is a bit of a non-starter. Let me know if you manage it though!

Option 2 – brute force it. This is basically your only chance, but let’s run some maths on how difficult this problem is. There are roughly $94$ possible printable characters (A-Z, a-z, 0-9, plus punctuation and symbols), and in Excel 2010 the maximum password length is $255$, so in all there are around $94^{255}$ possible passwords. Unfortunately for us, that is vastly more than the total number of atoms in the universe $(10^{78})$. If we could check $1000$ passwords per second, it would take far longer than the current age of the universe to find the correct one.

Okay, so that’s not going to work, but can we make the process more efficient? Let’s restrict ourselves to looking at passwords of a known length. Suppose we know the password is only a single character; in that case we simply need to check the $94$ possible passwords, one of which should unlock the sheet, hence giving us our password. In order to extend this reasoning to passwords of arbitrary but known length, let’s think of the hashing algorithm as a function and consider its domain and range.

Let’s call our hashing algorithm $F$, the set of all passwords of length $i$, $A_i$, and the set of all possible password hashes $B$. Then we have a function: $$ F: A_i \to B$$ Now if we assume the hashes are spread approximately evenly over all the possible values of $B$, then we can use the size of $B$ to estimate the number of passwords in $A_i$ mapping to any given hash – the preimage $F^{-1}(b)$ of a hash $b$. The size of $B$ doesn’t change.
Since the hash is a $4$-digit hexadecimal number, $B$ is of size $16^4 = 65{,}536$, and taking the alphabet as $96$ characters (rounding up slightly to keep the numbers tidy), the size of $A_i$ is $96^i$, so we can then estimate the size of the preimage. Let’s take $i=4$, and work it through:
$A_4$ is of size $96^4 \approx 85\text{m}$, and $B$ is of size $65{,}536$, hence the average preimage size is $\frac{96^4}{65536} = 1{,}296$.
Which means for every hash, there are roughly $1{,}296$ possible $4$-character passwords which could have created it, and therefore may have been the original password. Here is a table of the values for $i = 1$ to $6$:

i = 1:  0.0015
i = 2:  0.14
i = 3:  13.5
i = 4:  1,296
i = 5:  124,416
i = 6:  11,943,936
In fact we can come up with a general formula for the size of the preimage: $$\frac{96^i}{16^4} = 13.5 \times 96^{i-3}$$
Which we can see grows exponentially as $i$ increases. So for $i$ above $5$ or so, the problem is basically intractable without further improvement. How would we progress if we had to? The only other idea I can come up with is to generate a huge array of all possible matching passwords (based on brute forcing like above and recording all matches), and then start searching within this array for keywords. We could possibly use some sort of fuzzy-lookup against a dictionary of keywords. If the original password did not contain any words, but was instead just a fairly random collection of characters, then we really would be stumped. I imagine that this problem is basically impossible (and could probably be proved to be so using information theory and entropy).

Who invented this algorithm?

No idea… but I thought it might be fun to do a little bit of online detective work. You see this code all over the internet, but can we find the original source? This site has quite an informative page on the algorithm: https://www.spreadsheet1.com/sheetprotection.html The author of the above page is good enough to credit his source, which is the following Stack Exchange page: https://superuser.com/questions/45868/recoversheetprotectionpasswordinexcel Which in turn states that the code was ‘Author unknown but submitted by brettdj of www.expertsexchange.com’. I had a quick look on Experts Exchange, but that's as far as I could get – at least we found the guy's username.

Can we speed the algorithm up or improve it in any way?

I think the current VBA code is basically as quick as it is going to get – the hashing algorithm should work just as fast with a short input as with a 12-character input, so starting with a smaller value in the loop doesn’t really gain us anything.
The only real improvement I can suggest is that if Excel is running too slowly to be able to test a sufficient number of hashes per second, then the hashing algorithm could be implemented in Python (which I started to do just out of interest, but it was a bit too fiddly to be fun). Once the algorithm is set up, the password could then be brute forced from there in much quicker time, and once a valid password has been found, it can simply be typed into Excel.
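For what it's worth, here is a rough sketch of that idea in Python. The function names are mine, and the search space simply mirrors the VBA loop above (eleven characters drawn from 'A'/'B' plus one final character from codes 32-126); the hash function is an equivalent rotate-and-XOR formulation of the algorithm described earlier:

```python
from itertools import product

def excel_hash(pw: str) -> int:
    # The legacy sheet-protection hash: rotate-and-XOR each character
    # (last to first), then XOR in the length and the constant 0xCE4B
    h = 0
    for ch in reversed(pw):
        h = ((h << 1) & 0x7FFF) | (h >> 14)  # rotate left within 15 bits
        h ^= ord(ch)
    h = ((h << 1) & 0x7FFF) | (h >> 14)
    return h ^ len(pw) ^ 0xCE4B

def find_collision(target: int) -> str:
    # Same search space as the VBA code: 2^11 * 95 = 194,560 candidates
    for combo in product("AB", repeat=11):
        for last in range(32, 127):
            pw = "".join(combo) + chr(last)
            if excel_hash(pw) == target:
                return pw

# Find a 12-character password colliding with the hash of 'abcdefghij'
pw = find_collision(0xFEF1)
print(pw)
```

Testing all ~195,000 candidates takes well under a second, and the collision it prints looks exactly like the ‘AAABAABA@1’-style passwords the VBA code produces.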
Author: I work as a pricing actuary at a reinsurer in London.