Compound Poisson Loss Model in VBA
13/12/2017

I was attempting to set up a loss model in VBA at work yesterday. The model was a Compound Poisson Frequency-Severity model, in which the number of events is simulated from a Poisson distribution and the severity of each event is simulated from a severity curve.

There are a couple of issues you naturally come across when writing this kind of model in VBA. Firstly, the built-in array methods are pretty limited; in particular, dynamically resizing an array is not easy, so when initialising each array it's easier to come up with an upper bound on its size at the beginning of the program and then never have to amend the array size later on. Secondly, Excel has quite a low memory limit compared to the total available memory. This is made worse by the fact that we are still using 32-bit Office on most of our computers (for compatibility reasons), which has even lower limits. This memory limit is the reason we've all seen the annoying 'Out of Memory' error, forcing you to close Excel completely and reopen it in order to run a macro.

The output of the VBA model was going to be a YLT (Yearly Loss Table), which could then easily be pasted into another model. Here is an example of a YLT with some made-up numbers to give you an idea:

It is much quicker in VBA to create the entire YLT in memory and then paste it to Excel at the end, rather than pasting one row at a time, especially since we would normally run between 10,000 and 50,000 simulations when carrying out a Monte Carlo simulation. We therefore need to create and store an array with enough rows for the total number of losses across all simulations, but we won't know how many losses we will have until we actually simulate them. And this is where we come across our main problem.
We need to come up with an upper bound for the size of this array because of the issues with dynamically resizing arrays, but since this is going to be a massive array, we want the upper bound to be as small as possible, to reduce the chance of a memory overflow error.

Upper Bound

What we need, then, is an upper bound on the total number of losses across all the simulated years. Let us denote our frequency distribution by $N_i$, and the number of simulations by $n$. We know that $N_i \sim Poi( \lambda ) \: \forall i$. Let's denote the total size of the YLT array by $T$, so that:
$$T = \sum_{i=1}^{n} N_i$$
We now use the result that the sum of two independent Poisson distributions is also a Poisson distribution, with parameter equal to the sum of the two parameters. That is, if $X \sim Poi( \lambda)$ and $Y \sim Poi( \mu)$ are independent, then $X + Y \sim Poi( \lambda + \mu)$. By induction this result extends to any finite sum of independent Poisson distributions, allowing us to write:
$$ T \sim Poi( n \lambda ) $$
We now use another result: a Poisson distribution approaches a Normal distribution as $\lambda \to \infty$. In this case $n \lambda$ is certainly large, as $n$ is going to be at least $10,000$. We can therefore say that, approximately:
$$ T \sim N ( n \lambda , n \lambda ) $$
Remember that $T$ is the distribution of the total number of losses in the YLT, and that we are interested in coming up with an upper bound for $T$. Let's say we are willing to accept a probabilistic upper bound. If our upper bound is only exceeded 1 time in 1,000,000, then we are happy to base our program on it. If this were the case, even with a team of 20 people each running the program 10 times a day, the probability of the program failing even once in an entire year is only about 4%. I then calculated the $Z$ values for a range of probabilities, where $Z$ is the standard Normal distribution; in particular, I included the 1 in 1,000,000 $Z$ value.
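Since the table of $Z$ values isn't reproduced here, the calculation can be sketched in Python (illustrative only; the actual model was in VBA). `statistics.NormalDist` is the standard library's normal distribution:

```python
from statistics import NormalDist

# Z value such that P(Z <= z) = 1 - 1/N, for a range of "1 in N" levels
for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    z = NormalDist().inv_cdf(1 - 1 / n)
    print(f"1 in {n:>9,}: Z = {z:.4f}")
```

The 1 in 1,000,000 value comes out at roughly 4.75.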
We then need to convert our requirement on $T$ to an equivalent requirement on $Z$:
$$ P ( T \leq x ) = p $$
If we now adjust $T$ so that it can be replaced with a standard Normal distribution, we get:
$$P \left( \frac {T - n \lambda} { \sqrt{ n \lambda } } \leq \frac {x - n \lambda} { \sqrt{ n \lambda } } \right) = p $$
Now replacing the left hand side with $Z$ gives:
$$P \left( Z \leq \frac {x - n \lambda} { \sqrt{ n \lambda } } \right) = p $$
Hence, our upper bound is given by:
$$T \lessapprox Z \sqrt{n \lambda} + n \lambda $$
Dividing through by $n \lambda$ converts this to an upper bound on the factor above the mean of the distribution, giving us the following:
$$ \frac{T}{n \lambda} \lessapprox \frac {Z} { \sqrt{n \lambda}} + 1 $$
We can see that given $n \lambda$ is expected to be very large and the $Z$ values relatively modest, this bound is actually very tight.
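To make this concrete, here's a short Python check of the bound (standard library only), using the parameterisation from the example that follows:

```python
from math import sqrt
from statistics import NormalDist

n, lam = 50_000, 3                      # number of simulations and Poisson mean
z = NormalDist().inv_cdf(1 - 1e-6)      # 1 in 1,000,000 Z value
mean = n * lam                          # expected total loss count
bound = mean + z * sqrt(mean)           # probabilistic upper bound on T
print(f"factor above mean: {bound / mean - 1:.2%}")
```

This prints a factor of about 1.2%, so sizing the YLT array a little over 1% above the mean is enough at the 1 in 1,000,000 level.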
For example, if we assume that $n = 50,000$ and $\lambda = 3$, then we have the following bounds:

So we see that even at the 1 in 1,000,000 level, we only need to set the YLT array size to be 1.2% above the mean in order to avoid any overflow errors on our array.

References

(1) Proof that the sum of two independent Poisson distributions is another Poisson distribution: math.stackexchange.com/questions/221078/poisson-distribution-of-sum-of-two-random-independent-variables-x-y
(2) Normal Approximation to the Poisson Distribution.
stats.stackexchange.com/questions/83283/normal-approximation-to-the-poisson-distribution

Combining two Rate Change Indices
12/12/2017

I came across this problem at work last week, and I don't think there's anything in the notes on how to deal with it, so I have written up an explanation of how I solved it.

Suppose we are carrying out a rating exercise on a number of lines of business. For two of the lines of business, Marine Hull and Marine Cargo, we have very few claims, so we have decided to group these two lines together for the purpose of the experience rating exercise. We have been supplied with separate rate change indices for all the lines of business. How do we combine the Hull and Cargo rate change indices?

Firstly, let's review a few basics.

What is a rate change? It is a measure of the change in the price of insurance from one year to the next, when all else is held constant.

Why do we need a rate change? We will often use historic premium as an exposure measure when experience rating a book of business. If we do not adjust for the change in the price of insurance, then we may under- or over-estimate the rate of historic losses.

What do we mean when we say the rate change 'for 2018'? This is debatable, and if there is any uncertainty it is always better to check with whoever compiled the data, but generally the '2018' rate change means the 2017-to-2018 rate change.

How do we combine the rate changes from two lines of business? Let's work through an example to show how to do this. I am going to be using some data I made up. These figures were generated using the random number generator in Excel, so please don't use them for any actual work! Suppose we are given the following premium estimates:

And that we are also given the following rate changes:

Then first we need to adjust the rate changes so that they are in an index.
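The whole procedure described step by step below (build an index per class, on-level each class's premium, then recombine) can be sketched in Python; all the figures here are invented, just as in the post:

```python
# Invented premiums and rate changes for the two classes being grouped.
years = [2015, 2016, 2017, 2018]
premium = {"Hull": [1000, 1100, 1200, 1250], "Cargo": [800, 900, 950, 1000]}
# rate_change[c][i] is the change from years[i-1] to years[i]; first entry unused.
rate_change = {"Hull": [None, 0.05, -0.02, 0.04], "Cargo": [None, 0.01, 0.02, 0.03]}

def build_index(rc):
    """Latest year set to 100%, earlier years built backwards via
    Index_{n-1} = Index_n * (1 + RateChange_n)."""
    idx = [0.0] * len(rc)
    idx[-1] = 1.0
    for i in range(len(rc) - 1, 0, -1):
        idx[i - 1] = idx[i] * (1 + rc[i])
    return idx

index = {c: build_index(rc) for c, rc in rate_change.items()}
# On-levelled premium = premium * index, per class and combined.
olp = {c: [p * x for p, x in zip(premium[c], index[c])] for c in premium}
comb_prem = [sum(premium[c][i] for c in premium) for i in range(len(years))]
comb_olp = [sum(olp[c][i] for c in olp) for i in range(len(years))]
# Combined index and the implied combined rate changes.
comb_index = [o / p for o, p in zip(comb_olp, comb_prem)]
comb_rc = [None] + [comb_index[i - 1] / comb_index[i] - 1 for i in range(1, len(years))]
```

The combined rate change falls out as a weighted blend of the two classes' rate changes, with the weights driven by the on-levelled premiums.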
We do this by setting the 2018 index to be equal to 100%, and then recursively calculating the previous years using:
$${Index}_{n-1} = {Index}_{n} \left(1 + {(Rate \: Change)}_{n}\right)$$
We can then calculate our on-levelled premiums by simply multiplying our premium figures by the rate change index:
$${(On \: Levelled \: Premium)}_n = ({Premium}_n) \: ({Index}_n)$$
Using the combined on-levelled premiums, we can then calculate our combined rate index using the following formula:
$${Index}_n = \frac{ {(On \: Levelled \: Premium)}_n } { {Premium}_n }$$
And our combined rate change using the following formula:
$${(Rate \: Change)}_{n} = \frac{ {Index}_{n-1} } { {Index}_{n} } - 1$$
Most of the time we will only be interested in the combined on-levelled premium; the combined rate change is only a means to an end to obtain it. If we have a model, though, where we need to input a combined rate change in order to group the two classes of business, then the method above can be used to obtain the required rate changes.

We say that a random variable $B$ has a Binomial distribution, that is $B \sim Binomial(n,p)$, if for $k \in \{ 0 , 1, \ldots , n \}$:
$$P(B=k) = \binom{n}{k} p^k (1-p)^{n-k}$$
It is not immediately clear from this definition that this defines a discrete probability distribution; in order to show this we need to check that:
$$ \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1$$

Identity 1

In order to show this result, we make use of the binomial theorem, which states that:
$$ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$
If we take $x = p$ and $y = (1-p)$, then:
$$ (x+y)^n = (p + (1-p))^n = 1^n = 1 $$
Allowing us to show that:
$$ 1 = 1^n = (p + (1-p))^n = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} $$

Identity 2

What other results can we derive just by manipulating the binomial theorem?
If we consider the binomial expansion of $(1 + 1)^n$, then we get the following useful result:
$$ 2^n = (1 + 1)^n = \sum_{k=0}^{n} \binom{n}{k} 1^k 1^{n-k} = \sum_{k=0}^{n} \binom{n}{k} $$
I remember being pretty surprised the first time I saw this result, but there is actually another way of thinking about it which is more intuitive. We note that $\binom{n}{k}$ can also be thought of as the number of possible ways of making a subgroup of size $k$ from a collection of $n$ distinct objects. Now consider the power set $\mathcal{P} (S)$, which is the collection of all possible subsets of a set $S$ with $n$ elements. Clearly:
$$\sum_{k=0}^{n} \binom{n}{k} = | \mathcal{P} (S) | $$
Therefore, if we can derive a formula for the cardinality of the power set, we know that the left hand side is equal to this, and we will be able to prove our result. To do so, let us consider the ordered set of elements of $S$:
$$\{ S_1 , S_2 , \ldots , S_n \} $$
We can count the number of subsets of $S$ by noting that we can form a distinct subset of $S$ by specifying a string of length $n$ made up of $0$s and $1$s, where a $1$ in the $i$th position denotes that $S_i$ is included in the subset. Clearly this is another way of enumerating all the possible subsets of $S$, but when counted in this manner, we see that there are $2^n$ elements in the power set.

Use in General Insurance

We can use the binomial distribution to build a very simple loss model. Let's assume we have $n$ annual contracts, each of which either has a single loss or no losses; further, let's assume that the severity of each loss, if it occurs, is a fixed amount $1$. If the probability of a given contract having a loss is $p$, then the total loss amount for the year, $X$, follows a binomial distribution: $X \sim Bin(n,p)$. Once we have specified the gross loss distribution, we can very easily calculate the expected losses to various reinsurance contracts.
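As an illustration of how direct this calculation is, here's a small Python sketch for an aggregate XoL layer on this binomial model (the same figures are worked through in the example that follows):

```python
from math import comb

def agg_xol_expected_loss(n, p, limit, attach):
    """Expected ceded loss for a 'limit xs attach' aggregate layer when the
    annual loss count is Binomial(n, p) and every loss has severity 1."""
    expected = 0.0
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1 - p) ** (n - k)
        expected += pmf * min(max(k - attach, 0), limit)
    return expected

print(agg_xol_expected_loss(10, 0.1, limit=8, attach=2))  # about 0.085
```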
For example, let's suppose that $n = 10$ and $p = 0.1$, and we have an aggregate XoL contract of $8$ xs $2$; then the expected loss to the contract is:
$$ \sum_{k=3}^{n} \binom{n}{k} p^k (1-p)^{n-k} (k-2)$$
This formula can be directly calculated, and in this case is equal to $0.085$. It would be nice to always be able to directly calculate the expected loss to a reinsurance programme without Monte Carlo simulation, so why do we not use the binomial distribution more? To see why, let's look at the underlying gross loss distribution. Since we are dealing with a binomial distribution, we have simple closed forms for the central moments; let's examine those:
$$\mathbf{E} [ X ] = np$$
$$Var [ X ] = np(1-p)$$
Given $(1-p) < 1$, we can see that $Var [ X ] < \mathbf{E} [ X ]$. The reason this is an issue is that, empirically, we know this relationship does not usually hold for a portfolio of insurance losses. In addition, we also have the artificial restriction that the severity has to be the same for each loss. This is why we are much more likely to use a Compound Poisson distribution instead.

I saw a cool trick the other day involving the Poisson distribution and Stirling's approximation. Given a Poisson distribution $N \sim Poi( \lambda )$, the probability that $N$ is equal to a given $n$ is defined to be:
$$P ( N = n) = \frac { {\lambda}^n e^{-\lambda} } {n! } $$
What is the probability that $N$ is equal to its mean? In this case, let's use $n$ as the mean of the distribution, for reasons that will become clear later. Plugging $n$ into the definition of the Poisson distribution gives:
$$P ( N = n) = \frac { n^n e^{-n} } {n! } $$
At this point, we use Stirling's approximation.
Stirling's approximation states that for large $n$:
$$n! \sim {\left( \frac { n } {e } \right) }^n \sqrt{ 2 \pi n }$$
Plugging this into the expression above gives:
$$P ( N = n) \approx \frac { n^n e^{-n} } { {\left( \frac { n } {e } \right) }^n \sqrt{ 2 \pi n } } $$
Which simplifies to:
$$P ( N = n) \approx \frac { 1 } { \sqrt{ 2 \pi n } } $$
So for large $n$ we end up with a nice result for the probability that a Poisson distribution will be equal to its expected value.

Convergence

The convergence of the approximation is actually really quick. I checked it for $n$ between 1 and 50, and even by $n=5$ the approximation is very close; when I graphed it, the lines become indistinguishable very quickly.

Types of Excess of Loss Reinsurance
24/10/2017

One thing that I got slightly confused about when I started my current job was the difference between the various types of Excess of Loss reinsurance. The descriptions given in the IFoA notes, those given on Wikipedia, and the use of the terms in the London Market are all different. The underlying contracts are all the same, but different groups have different names for them. I thought I would make a post explaining the differences. Here are the names of the subtypes of Excess of Loss reinsurance that are used in the London Market:
(The descriptions given below just describe the basic functionality of the contracts. There will be a lot more detail in the actual contracts; it's always a good idea to read the slip if possible to properly understand the contract. Also, bear in mind that some people in the London Market might use these terms differently. This just represents what I would understand if someone said one of these terms to me in everyday work.)

Risk Excess (RXS)

The limit and attachment for this contract apply individually per risk rather than in aggregate per loss (hence why it is called a Risk Excess). So if our RXS is 5m xs 1m and we have a loss involving two risks, each of which is individually a 3m loss, the total recovery will be 4m = (3m - 1m) + (3m - 1m).

Excess of Loss (XoL)

The limit and attachment for this contract apply in aggregate per loss rather than individually per risk. So if our XoL is 5m xs 1m and we have a loss involving two risks, each of which is individually a 3m loss, the total recovery will be 5m = (6m - 1m).

Catastrophe XL (Cat XL)

The limit and attachment for this contract apply in aggregate for losses to all policies covered by the contract over the duration of a catastrophe. So if our Cat XL is 500m xs 100m and there is a hurricane which causes insured losses of 300m, then the total recovery will be 200m = (300m - 100m).

Aggregate XL (Agg XL)

The limit and attachment for this contract apply in aggregate for losses to all policies covered by the contract, normally all policies in a single class. So if our Agg XL is 50m xs 10m and covers an insurer's aviation account, and the total aviation losses for the year are 30m, then the total recovery will be 20m = (30m - 10m).

The IFoA notes

The IFoA notes distinguish between three types of Excess of Loss contract.
The definitions for Risk Excess of Loss and Catastrophe Excess of Loss are basically the same as those commonly used in the London Market. The IFoA definition of Aggregate Excess of Loss is different, though. The Institute defines an Aggregate Excess of Loss contract to be an Excess of Loss contract which aggregates losses across multiple risks in some way. This could be across all risks involved in one event, across all policies for a given year, across all losses in a given subclass, etc. So our standard Excess of Loss contract defined above, which aggregates across all risks involved in a single loss, would be considered an example of an aggregate contract according to the IFoA definition! Don't go around Lloyd's calling it an Aggregate Excess of Loss though; people will get very confused. The IFoA definitions are more logical than the ones used in the London Market, where there is an arbitrary distinction between two types of aggregation. Our standard XoL contract does aggregate losses, so why not call it an Agg XoL? The reason we call it simply an XoL is because everyone else does, which, when talking about definitions, is a pretty compelling reason, even if it is not the most logical name.

Wikipedia

The Wikipedia page for Reinsurance (link below) distinguishes between three types of Excess of Loss contract. en.wikipedia.org/wiki/Reinsurance
Per Risk Excess of Loss is once again defined consistently, and Aggregate Excess of Loss is also consistent with the common usage in the London Market. However, in this case, our standard Excess of Loss contract now falls under the definition of a Catastrophe Excess of Loss layer. The Wiki article defines a Cat Excess of Loss contract to be one that aggregates across an occurrence or event, where an event can either be a catastrophe, such as a hurricane, or a single loss involving multiple risks.

Summary

You shouldn't get caught up in who is right or wrong, as long as you are clear which definitions you are using. Fundamentally we are talking about the same underlying contracts; it's all just semantics. The definitions that are commonly used in the London Market are not written down anywhere online that I could see, and this caused me some confusion when I noticed the inconsistency, did some googling, and nothing came up. Hopefully this helps clarify the situation for anyone else who gets confused in the future.

I've got to admit, up until recently I never really understood why anyone would buy Premium Bonds. You get an expected return of 1.15%, which is well below other asset classes; there is significant volatility in returns; and the 1.15% is not even guaranteed, i.e. it's possible you'll never get any returns at all. The risk and return profile just seems completely out of sync with the alternatives. But after speaking to my Grandad about them yesterday, I started to understand why some of the soft factors make them so attractive to savers. Then, when I was mulling it over further, I realised that these soft factors would actually be seen as negatives when viewed through the lens of Modern Portfolio Theory.

What are Premium Bonds?

Premium Bonds were introduced by Harold Macmillan in 1956 in an attempt to encourage individuals to save more, and also to help raise money to finance Government spending.
The bonds pay out prizes to bondholders based on a lottery system and are administered by National Savings and Investments (NS&I). The Government uses the money to help finance the Public Sector Borrowing Requirement (the difference between the amount of tax the Government collects every year and the amount the Government spends, which has been negative for about 30 years now!)

Every month, NS&I distributes a prize pool randomly among bondholders. The pool is calculated so that the total annual amount paid out is equivalent to 1.15% of the total value of bonds held over the year. So, as there are approximately £71bn of bonds in total, the prize pool will be £71bn * 1.15% = £817m per year, or £68.1m per month. This £68.1m is then distributed randomly to bond owners in various sized prizes using the following table:

Both the interest rate used to calculate the overall prize pool and the distribution of prizes are updated from time to time to account for changes in overall market conditions. This table is correct as at the date of this post.

Advantages of Premium Bonds

In spite of their relatively low rate of return, Premium Bonds do have a number of features that make them attractive to individual savers:
For many individual investors, the first five bullet points are extremely important. I think many savers would even say that they are necessary requirements to get them to invest their money. It's the final bullet point which is particularly interesting, though: the possibility of winning £1 million is a big selling point of Premium Bonds. This should interest any analyst who uses conventional theories of investment, as most investment theories would consider it a downside of Premium Bonds. To understand why this is the case, we'll need to take a brief look at Modern Portfolio Theory.

Volatility of Investment Returns as a Risk Measure

The possible investment classes all have different risk and return profiles. Some provide high returns, but also carry high levels of risk. Others offer lower levels of return, but also come with less risk. In order to compare different investment classes in a consistent way, we need a single definition of risk and return which we can then apply to each class. The obvious measure to use for return is simply the mean expected return. The situation isn't so straightforward for our risk measure, though. There is no perfect measure of risk, and all the possible measures have advantages and disadvantages. In practice, the most commonly used measure is variance of return, and this is the approach used in Modern Portfolio Theory: en.wikipedia.org/wiki/Modern_portfolio_theory

Under Modern Portfolio Theory, we calculate the expected return of each investment class, along with the variance of return, and plot these on a graph. An investor is then assumed to want to maximise the expected return while minimising the variance of returns (because this corresponds to risk). This allows us to construct an 'efficient frontier' of those investments which, for a given return, minimise the risk. The following graph demonstrates this analysis.
By including the possibility of winning a £1 million prize, we are massively increasing the variance of the investment returns; but since the overall return is calibrated to a 1.15% expected return, the risk-reward profile of Premium Bonds (using variance as our measure of risk) looks terrible. So we end up with the strange situation that one of the main selling points of Premium Bonds, the possibility of winning a £1 million prize each month, actually makes the investment less attractive under most theories of investment.

I did a quick calculation of the volatility of returns provided by Premium Bonds, and it comes out at a whopping 35,000%. Most of this volatility comes from the fact that two of the prizes are for £1 million; if we take out all the prizes above £1,000 and reallocate the money to the smaller prizes, the volatility reduces by about 400 times, down to around 90%. But even this is still very volatile compared to most investment classes. For comparison, Burton Malkiel in A Random Walk Down Wall Street provides the following table of historic asset class returns with accompanying volatility (which I then copied from Wikipedia):

So we see that, according to this analysis, investing in small company stocks provided an average return of 12.6%, much higher than Premium Bonds, yet only had a standard deviation of return of 32.9%. Under the assumptions of Modern Portfolio Theory, using variance of investment returns as a risk measure, we would conclude that investors would prefer not to have the possibility of the £1 million prize, and would instead prefer more smaller prizes in the payout. This is clearly not the case.
Disadvantages of investing in Premium Bonds

So far we've talked about the advantages of investing in Premium Bonds, and we've also talked about how some of these selling points go against the conclusions of Modern Portfolio Theory (though I think this might say more about the state of Modern Portfolio Theory than about Premium Bonds). Are there, however, any clear disadvantages to investing in Premium Bonds? I think there are, and here are a few of them:
We often distinguish between real and nominal investment returns when analysing investment classes. Some investment classes provide returns which are expected to be above inflation; these are real returns. Other classes provide returns of a fixed amount, which may be greater or smaller than the level of inflation in the economy; these are called nominal returns. To see what we mean by this, let's say you invest £1,000 in Premium Bonds in a given year and you get £25 in prizes, leaving you with £1,025 at the end of the year. Let's assume also that prices (i.e. inflation) went up by 5% during the year, which means that in order to buy something which would have cost £1,000 at the start of the year, you would now need £1,050. So the £1,025 you got back from the Premium Bonds at the end of the year can now buy less than the £1,000 you put in originally. Even though you've now got more money, the real return on your investment was negative. Premium Bonds are currently paying out at 1.15%, whereas inflation in July 2017 was 2.6% according to the Office for National Statistics. This means that you are, in effect, losing money in real terms by holding Premium Bonds.
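The arithmetic in the example above is simple enough to check in a couple of lines of Python:

```python
# £1,000 invested, £25 of prizes, 5% inflation over the year
end_value, inflation = 1_000 + 25, 0.05
real_return = end_value / (1_000 * (1 + inflation)) - 1
print(f"real return: {real_return:.2%}")  # about -2.4%, despite a +2.5% nominal return
```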
Due to the lottery structure of Premium Bond returns, it's possible (and in fact highly likely unless you hold large quantities of bonds) that you will get no investment returns at all. For example, someone who holds £100 of Premium Bonds has roughly a 96% chance of not winning any prizes at all in a given year. This is in contrast to a fixed-interest bank account: an account guaranteeing 3% p.a. offers no possibility of winning £1 million, but you can be certain of getting your 3% return at the end of the year. If you are willing to go through the hassle of shopping around for a savings account which does guarantee 3%, then you'd probably be better off, from a financial perspective, investing in the savings account and buying yourself a lottery ticket every week, rather than investing in Premium Bonds.
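As a rough check on that 96% figure: if each £1 bond has roughly a 1 in 24,500 chance of winning a prize in each monthly draw (an assumed figure, in line with the approximate odds NS&I quoted around this time; the post's prize table isn't reproduced here), then the probability of £100 of bonds winning nothing in a year is:

```python
# P(no prizes in 12 monthly draws with 100 bonds), assuming independent draws
# and 1-in-24,500 monthly odds per £1 bond -- an assumed figure, see above.
p_win = 1 / 24_500
p_no_prize = (1 - p_win) ** (100 * 12)
print(f"{p_no_prize:.1%}")  # roughly 95%, in line with the figure quoted above
```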
The expected return from Premium Bonds is much lower than the returns we could expect from other investment classes. To see how much of a difference this can make, let's take two hypothetical savers, both of whom have been saving £100 per month for the last 30 years: one has been investing in Premium Bonds (and getting an average return of 3.5% pa) and the other has been investing in a property fund (and getting a 10% average return). Over the years, the difference between these two expected returns can have a significant impact on the overall amount that they save. I ran a quick simulation of this with some volatility, producing the following graph:

So we've talked about the positives and negatives of Premium Bonds, and also touched on some inconsistencies between the marketing of Premium Bonds and how they would be viewed using traditional finance techniques. Are there any other features of Premium Bonds which are interesting? One quirk of Premium Bonds which seems to have captured people's imaginations is ERNIE. So who is ERNIE?

E.R.N.I.E.

ERNIE stands for Electronic Random Number Indicator Equipment, and is the name of the machine that is used to randomly select the winning bonds. Because Premium Bonds were launched in the 1950s, creating a computer to generate random numbers was a non-trivial problem at the time. ERNIE was developed at the Post Office Research Station in North West London. Despite its quaint-sounding name, the Post Office Research Station was actually at the forefront of computing research during the 40s and 50s. Colossus, the world's first electronic programmable computer, which had been developed during the Second World War to help break one of the codes used by the Axis, was built at the Post Office Research Station.
The details of Colossus were classified until the 1970s, so it had limited impact on most other computers built in the 50s and 60s; however, the design of ERNIE was heavily influenced by the work that had been done on Colossus during the war, as the same design team worked on both. Unlike most computers today, which use pseudo-random number generators, ERNIE was a true random number generator. It contained neon tubes through which an electric current was passed. Due to the electrons bumping into the neon atoms as they passed through the tube, the current leaving the tube varied randomly. This randomness arose from millions of tiny interactions between the electrons and the neon atoms and was a source of true statistical randomness. It was then calibrated so that ERNIE would select a collection of random numbers between 1 and 100 million, which corresponded to the Premium Bond numbers that people had purchased. I struggled to follow a lot of the technical details of how ERNIE worked, due to an insufficient knowledge of electrical engineering, but if you are interested, the following link contains a technical document describing how ERNIE selected random numbers: www.tnmoc.org/sites/default/files/Ernietechnology.pdf

I might do some reading up on electrical engineering at some point and then write up exactly how ERNIE worked, because I couldn't find a decent non-technical description online.

Conclusion

Premium Bonds were a massive success when they were launched. They managed to get people excited about saving, and they gave the Government a cheap source of borrowing. I think it's a shame they have fallen by the wayside in recent years.
They are conceptually simple; they offer a lot of the guarantees that savers look for in an investment opportunity (protecting their capital, not locking them in for a fixed period); they exploit the popularity of lotteries in a positive way (by tricking people into saving more in order to play); and they don't require complicated decisions to be made by savers (who has time to trawl through the bewildering array of options currently available: Cash ISAs, Stocks and Shares ISAs, NS&I bonds, fixed-interest bank accounts...). All too often, it seems Government savings schemes are designed by individuals steeped in financial theory who are too far removed from the savers who will actually use the products. We should rethink how we design the kind of products offered by NS&I, taking more account of behavioural economics and marketing strategy, rather than relying on microeconomic models which assume rational investors, or on models of investment returns like Modern Portfolio Theory.

I finally got around to reading through the 'Curriculum 2019' announcement the Institute made last year. Over the last couple of years, the Institute has carried out a comprehensive review of the qualification process. Based on this, it has launched a new curriculum, called Curriculum 2019, which is due to be phased in over the next two years. Some of the changes seem positive, but there are still some fundamental issues that I have with the exams, so I thought this would be a good time to write about them. First let's look at the new exams:

CS1 - Actuarial Statistics

CS1 looks to be largely just CT3 with a little bit of CT6. The weighting of the subsections for CS1 will be:
My only comment here would be that I've seen regression models used in practice only once while I've been working. It does seem more common to use them in econometric models, and to a lesser extent they do seem to be used in banking/investment environments, but on the whole I'm not sure how relevant they are to the work carried out by most actuaries. It could be argued that regression models are a useful precursor to the much more powerful and widely used Generalised Linear Model, but I'm not sure that warrants a 30% weighting here. Other than that, this module looks sensible.

CS2 - Actuarial Statistics

CS2 is mainly based on the old CT4, and also covers the CT6 topics which were not included in CS1. The subsections are:
The big point to note here is the inclusion of Machine Learning for the first time in the actuarial exams. I think this is a long-overdue and much-needed addition. My only criticism would be that it has not been given a greater weighting in the curriculum. I think the advances in Machine Learning, and the fact that most actuaries are not keeping abreast of the subject, are the biggest threat to actuaries remaining relevant within the larger analytics space. The danger is that actuaries will be increasingly sidelined in favour of practitioners with backgrounds in Data Science and Artificial Intelligence. I would have argued for creating an entire module based on the advances in Data Science, which I would have made entirely computer-based with a focus on the practical application of the models. In order to make room for an increased focus on Data Science, I would have reduced the time spent on Time Series and Survival Models in this module. Time Series models, like Regression Models, are not that widely used in practice as far as I'm aware, with their main uses limited to econometric models, and to a lesser extent investment models. This part of the curriculum could, in my opinion, be moved to the Specialist exams where relevant without much loss. Likewise for Survival Models, which do not have wide application outside of life and pensions. CM1 - Actuarial Mathematics. CM1 looks like it is mainly based on CT1 and CT5. The subsections are:
The invention of Decrement Models and Life Tables in the 17th century marks the beginning of Actuarial Science as a separate field of study. For this historical reason alone, I don't think there's much chance of them being dropped from the curriculum. Personally I've never liked them, finding them messy, and the calculations fiddly and time-consuming. Given that they are not used outside of pensions and Life Insurance, I would definitely not complain if they were removed from the Core Principles part of the exams and moved into the Specialist Principles section instead. I would definitely recommend that new Actuarial Students start with the CS exams or the CB exams before tackling the CM exams. The CM exams, while containing some useful topics, seem on the whole less relevant and less interesting than the material covered in the other exams. If a student came from a STEM background I would probably recommend they start with the CS exams, and if they came from an Economics/Management background I would recommend they start with the CB exams. CM2 - Actuarial Mathematics. CM2 is largely based on CT8. The subsections and weightings are:
CT8 is a tricky exam to judge. I personally enjoyed studying it, yet I'm once again not sure how useful a lot of the material is. Actuaries have often been accused, rightly or wrongly, of being out of touch with developments in related fields, and I think this was true of the advances in Financial Mathematics 20 years ago. Historically, Financial Mathematics developed independently of Actuarial Science, and for a long time its advances, such as the Capital Asset Pricing Model (CAPM) and the Black-Scholes Option Pricing Model, were largely ignored by the Actuarial Profession. An attempt was made to correct this, and the CT8 module was created. The issue now, though, is that CT8 still has a relatively high weighting in the exams as a whole. Returning to my point about Machine Learning, the Institute has given Option Pricing (an obscure part of investment theory) twice the weighting of the entire subject of Machine Learning! The second worry I have about the CM2 syllabus is that since the Financial Crisis, Financial Mathematics has advanced considerably, with some topics falling out of favour (one example being the Black-Scholes Option Pricing Model) and others gaining favour, yet the syllabus does not seem to have changed in line with these developments. The other exams in Curriculum 2019 are basically one-to-one replacements of current exams, so I won't go through them in detail. I will mention a few broader changes that I would make to the Institute's education policy.
3 Recommendations for a new Curriculum. When I was thinking about what I would change about the course notes I had quite a few ideas: include a module on coding, specialise earlier so that GI actuaries don't need to learn about with-profit life insurance contracts and Life Actuaries don't need to learn about the CAPM, expand the communications section of the exams (this one in particular would not be popular...), but other than the three I've written about below, all my ideas had pros and cons and I think I could be talked out of them. The three below, though, I think have really compelling arguments. Recommendation 1 - Rewrite the Course Notes with the aim of making the subject as easy to understand as possible. The process for creating the course notes at the moment is quite convoluted. Three main problems stand out for me with this process. Firstly, when Act Ed receive the Core Reading, they are unable to change the ordering of the material. This has the unfortunate effect that often the material is not presented in the order which would make it easiest to understand. Material is instead introduced in the order in which it was included as an item in the syllabus. This quite often necessitates flicking back and forth between pages, not understanding one section until you have read and understood a later one. The second problem with the current process is that by providing an annotated version of the Core Reading, rather than writing the Core Reading in a manner which could be understood without annotation in the first place, we end up with a strange hybrid style which can be a significant barrier to understanding the material. The Core Reading produced by the Institute is often terse, can be inconsistent, and is not always friendly to the reader. The annotations are often well written and help to clarify the unclear sections, but why add the extra cognitive load?
Just get someone like Act Ed involved at an earlier stage and write something that is coherent and easy to understand in the first place. The third problem is that there is no single person taking responsibility for the finished product. For an Institution that prides itself on its approach to accountability, I think this is unfortunate. By giving someone with the relevant experience and skills overall creative and technical control over the finished product, I believe we would end up with a much more polished and accessible result. The syllabus appears to be written by committee, and the Core Reading is then expanded out with an awareness that clarifications will be made later. While I think Act Ed do a great job of turning this into something understandable, they are at that point constrained by the presentation of the Core Reading; I think a better product overall could be made by changing this process. Due to the above three issues, when studying for the CT exams I would normally not read the Institute Course Notes at all until just before the exam, instead relying on course notes (published online for free!) from university modules that covered the same material. Due to the exemption system, I found that I could often find modules which covered precisely the same material, but with notes written by a university lecturer with a deep understanding of the subject, and also experience of writing notes which are meant to be understood. Whenever I did this, I found the university notes much easier to work from. Recommendation 2 - Make all the material available online for free. The Course Notes for the actuarial exams are really expensive. This basically restricts the people who are able to study the notes to those who are employed in an actuarial role, and whose employer is willing to fund them in sitting the exams.
Given the profession's public charter, "in the public interest, to advance all matters relevant to actuarial science and its application and to regulate and promote the actuarial profession", I think this would be a great idea, as opening up the material to anyone interested in reading it would massively increase the number of people reading the notes. This would particularly lower the barrier for prospective students from poorer backgrounds or countries where the costs would otherwise be prohibitive. To me it's a no-brainer that you should try to reduce the barriers to education as much as possible. Providing free access to allow anyone to study actuarial science is a net public positive with very little downside. There would not be a significant cost in the Institute hosting the ebooks on its website for people to download, given the web infrastructure is already there. But there would be a significant loss of income. Looking at the Institute's financial accounts for 2016/2017, the total income relating to 'pre-qualification and learning development' was around £10 million. The proportion of this made up of income from selling the Course Notes is not specified, but my very rough guess would be that it might be somewhere around a quarter of the total, so £2.5 million. This estimate is just based on the relative cost of the Core Reading (in the range of £50-£100 for most exams) and the cost of entering the exams (in the range of £220-£300 for most exams). This would represent around 10% of the Institute's total income, which is significant, but would not be impossible to offset with increases in other areas. Recommendation 3 - Move away from written Course Notes to online videos. Imagine if, instead of having to read through the Course Notes in order to understand the material, you were given one-to-one private lectures with one of the top educators in the entire country for the six months leading up to the exam.
You are allowed to have the lectures at any time of the day: if you prefer mornings you can have them in the morning, and if you prefer learning at 2 AM whilst listening to Katy Perry on full blast then the lectures can be adjusted accordingly. You can repeat any particular lecture as many times as you like, and ask the lecturer to pause halfway through a lecture while you think about something. The lecturer is also on top form for every lecture; they are giving the best version of that lecture that they have ever given. When you put it like that it sounds pretty good! I think some version of Massive Open Online Courses (MOOCs) or Khan Academy style videos are the future of education. The best metaphor I heard someone use to explain them is to think back 200 years: if someone had wanted to write The Dark Knight then, Christopher Nolan would have still written the same amazing script, but rather than watching Christian Bale and Heath Ledger (two of the best actors in the world) star in a production which cost 185 million USD to make, you would have had to go to your local theatre and watch the best two actors in your village do their best take on Nolan's script. The budget would have been a fraction of 185 million, and if the actors were having a bad day, or forgot their lines, they would have had to muddle through. Bale and Ledger had the opportunity of doing take after take until they came up with the perfect version that they were happy with. The video was then post-processed and sharpened for months, before being shown to test audiences and further refined based on their feedback. The Institute could be producing 'Hollywood style' videos right now as a complement to the Course Notes. The Newton-Pepys Problem 17/6/2017. I always found it quite interesting that prior to the 19th century, Probability Theory was basically just a footnote to the study of gambling.
The first time that Probability Theory was formalised in any systematic way was through the correspondence of the 17th century mathematicians Pierre Fermat (famous for his last theorem) and Blaise Pascal (famous for his wager), building on earlier work by Gerolamo Cardano (not actually famous at all), when analysing a problem in gambling called the problem of points. The problem of points is the problem of how to come up with a fair way to divide the winnings when a bet is made on a game of chance which is interrupted before it can be finished. For example, let's say we are playing a game where we take it in turns to roll a die and we record how many 6s we get, and the first person to roll a total of ten 6s wins. What happens if we are unable to finish the game, but one player has already rolled eight 6s, whereas their opponent has only rolled two 6s? How should we divide the money in a fair way? Obviously it's unfair to just split the money 50-50, as the player with eight 6s has a much higher chance of winning, but at the same time, there is a chance that the player with only two 6s might get lucky and still win, so we can't just give all the money to the player who is currently winning. The solution to the problem involves calculating the probability of each player winning given the current state of the game, and then dividing the money proportionally. In order to answer this question in a systematic way, Fermat, Pascal, and Cardano formalised many of the basic principles of Probability Theory which we still use today. The Newton-Pepys Problem. The Newton-Pepys problem is another famous problem related to gambling and Probability Theory. It is named after a series of correspondence between Isaac Newton and Samuel Pepys, the famous diarist, in 1693. Pepys wrote to Newton asking for his opinion on a wager that he wanted to make. Which of the following three propositions has the greatest chance of success? A. Six fair dice are tossed independently and at least one “6” appears. B.
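The fair-division idea is easy to check numerically. Below is a minimal sketch of my own (not part of the original correspondence), which estimates by Monte Carlo the probability that the leading player wins, assuming the two players alternate single die rolls with the leader rolling first:

```python
import random

def win_prob(need_a: int, need_b: int, sims: int = 20_000) -> float:
    """Estimate P(player A reaches their remaining target of 6s before player B),
    with the players alternating single die rolls and A rolling first."""
    wins = 0
    for _ in range(sims):
        a, b = need_a, need_b
        while True:
            if random.randint(1, 6) == 6:       # A's roll
                a -= 1
                if a == 0:
                    wins += 1
                    break
            if random.randint(1, 6) == 6:       # B's roll
                b -= 1
                if b == 0:
                    break
    return wins / sims

random.seed(0)
# Player A needs 2 more 6s, player B needs 8 more: A should get nearly all the pot.
print(win_prob(2, 8))
```

Under this split, the player with eight 6s already banked should receive the overwhelming majority of the stake, which matches the intuition in the text.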
Twelve fair dice are tossed independently and at least two “6”s appear. C. Eighteen fair dice are tossed independently and at least three “6”s appear. Pepys initially believed that Option C had the highest chance of success, followed by Option B, then Option A. Newton correctly answered that it was in fact the opposite order: Option A is the most likely and Option C the least likely. Wikipedia has the analytical solution to the problem, which comes out at roughly 66.5% for Option A, 61.9% for Option B, and 59.7% for Option C. There are a few things I find really interesting about Newton and Pepys's exchange. The first is that it's cool to think of two very different historical figures such as Newton and Pepys being acquainted and corresponding with each other. For me, the fact that they were both living in London and moving in the same social circles at the same time makes them much more human and brings them to life. Another interesting point is that, once again, we see Probability Theory being advanced by the desire to make money from gambling. Finally, I think it's cool that Pepys was able to ask one of the greatest physicists of all time for a solution to the problem, yet the solution is trivial now. Luckily Newton was able to provide Pepys with an answer, though it might have taken Newton quite a while to calculate, especially for Option C. But you could give the problem to any student now who has access to a computer and they would be able to give you an answer in minutes by just simulating the problem stochastically. Stochastic modelling always seemed like a new form of empiricism to me, whereas calculating the answer with a computer analytically still seems like a priori reasoning. Newton probably did compute the answer analytically by hand, but he would not have been able to run 50,000 simulations of the game by hand. It's fundamentally a different kind of reasoning, and the closest he could have got would have been to play the game 50,000 times and record the average.
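Both kinds of reasoning fit in a few lines of code. Each option asks for at least k sixes in 6k throws, so the exact answer is one minus a binomial tail, and the stochastic answer is just a simulated count. A quick sketch (my own, not the author's original model):

```python
import math
import random

def exact(dice: int, sixes: int) -> float:
    """P(at least `sixes` 6s in `dice` throws), from the binomial distribution."""
    p_fewer = sum(
        math.comb(dice, k) * (1 / 6) ** k * (5 / 6) ** (dice - k)
        for k in range(sixes)
    )
    return 1 - p_fewer

def simulate(dice: int, sixes: int, sims: int = 50_000) -> float:
    """Monte Carlo estimate of the same probability."""
    hits = sum(
        1 for _ in range(sims)
        if sum(random.randint(1, 6) == 6 for _ in range(dice)) >= sixes
    )
    return hits / sims

random.seed(1)
for label, dice, sixes in [("A", 6, 1), ("B", 12, 2), ("C", 18, 3)]:
    print(label, round(exact(dice, sixes), 4), round(simulate(dice, sixes), 4))
```

The exact values come out at 0.6651, 0.6187 and 0.5973 for A, B and C respectively, confirming Newton's ordering, and the simulated values land within sampling error of them.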
Stochastic Model. To calculate this myself I set up a Monte Carlo model of the game and ran 50,000 simulations to estimate the probability of each of the three options. We can clearly see from this graph that Option A is the most likely option of the three, with Option C being the least likely. We can tell all of this by just setting up a model that takes 5 minutes to build and gives an answer in seconds. It makes you wonder what Newton would have been able to manage if he had access to the computing power that we take for granted now. Sources: Wikipedia: en.wikipedia.org/wiki/Newton%E2%80%93Pepys_problem An article by Stephen Stigler: arxiv.org/pdf/math/0701089.pdf Bitcoin Mining 10/6/2017. J.P. Morgan announced recently that they have developed their own Ethereum derivative called Quorum. It is designed to be a platform for smart contracts and a distributed ledger based on Blockchain technology. www.jpmorgan.com/country/US/EN/Quorum HSBC, Bank of America, and Merrill Lynch have also announced they are setting up a Blockchain ledger system for clearing interbank transactions: www.cityam.com/257426/blockchaintechnologycouldrevolutioniseglobaltrade And Microsoft and IBM are setting up Blockchain platforms that they can sell to other businesses, dubbed Blockchain-as-a-service (BaaS): www.coindesk.com/ibmvsmicrosofttwotechgiantstwoblockchainvisions/ The info that's been released by these companies about how the technologies will actually work is rather sparse though. There seems to be a lot of buzz, but still no clear consensus on exactly how these technologies will work in practice. In order to try to understand how Blockchains might be important, I did some more reading on how they work as part of the Bitcoin protocol, and I actually found myself getting really interested in some of the details of Bitcoin Mining. One of the books I read was the excellent 'Mastering Bitcoin' by Andreas Antonopoulos.
It works through all the nitty-gritty technical details of the Bitcoin protocol and it really helped crystallise my understanding. Metaphors about signatures, ledgers, or Alice sending Bob a box with two padlocks on it will only get you so far; at a certain point you need to read through the actual algorithms that are used, and review some source code. So what is Bitcoin Mining and why is it so interesting? What is Bitcoin Mining? Mining is the process by which new transactions are confirmed and sent over the Bitcoin network, and also the process by which new Bitcoins are created. The term Bitcoin Mining is actually a bit of a misnomer, as the creation of new Bitcoins is not a necessary part of mining. Even if no new Bitcoins were created, the process of mining would be the same, and just as important, as it is the mechanism by which transactions are processed within the Bitcoin network. The network is configured so that approximately every ten minutes one of the miners currently mining on the Bitcoin network will find a solution to the hashing problem, which will have the following effects:
In essence, assuming the network is not overloaded by transactions (which at the moment it is, due to something called the block size limit controversy, which I might blog about another time), every ten minutes all the new transactions which have been created in the last ten minutes will be processed and sent across the Bitcoin network. All these transactions will be included in the latest block, which will be added to the end of the Bitcoin Blockchain. The person who mined this latest block will receive a reward of 12.5 new Bitcoins, plus all the transaction fees from the last ten minutes. One thing that I didn't understand when first reading about Bitcoin is that there is only one Blockchain at any one point (barring something going wrong), and that all transactions across the entire Bitcoin network are processed by a single miner in a single block which is added to the end of the Blockchain. The problem which miners need to solve in order to create a new block in the Blockchain is to find a hash, generated by running the SHA256 algorithm twice on the new block, which is lower than the current difficulty target specified by the Bitcoin network; equivalently, the hash must have at least a certain number of leading zeros. SHA256 is basically just a complicated algorithm that produces outputs that are effectively random. They are random in the sense that it is impossible to predict what output you will get for a given input, but if you use precisely the same input you will get the same answer every time. If this sounds a bit complicated, don't worry, it took me ages to get my head around how it all works. Effectively, there is no way to shortcut the above process; SHA256 was designed so that there is no way to predict an input which will generate a given output. If I change the input by a tiny amount, the output will change completely, and there is no pattern to how the output is affected by a change in the input. The only way to find a valid output is to brute force the problem.
So essentially, the only way to mine a new block is to repeatedly attempt to create a new block using all the information about the transactions you would like to include, adding an arbitrary string (the nonce) to the end, which you vary every time you calculate the hash of this input, until you find a value that satisfies the conditions set by the Bitcoin network. If someone else finds a solution before you, then everyone starts again with the new set of transactions. Total Hashing Power. On average, every ten minutes somewhere in the world a miner will find a valid solution and mine a new block. Whenever the average time to find a solution gets too high or too low, the difficulty of the problem is decreased or increased automatically so as to bring the average time closer to ten minutes. The problem that needs to be solved by the miners was designed in a clever way so that it can be made arbitrarily hard or easy depending on how many miners are attempting to mine the network. What is the monetary value of successfully mining a Bitcoin block? We can easily check this by looking at the average transaction fees from the last few blocks that have been mined. For example, in the latest block: blockchain.info/block/0000000000000000010465d2dc60e5bf41911b98411ee6b04632a97af41a5df9 the miner received a reward of 12.5 Bitcoins, and also received 1.5 Bitcoins in transaction fees. At today's exchange rate, a Bitcoin is worth around 3,000 USD, which means each block is worth 42,000 USD to the miner. Given six blocks are mined per hour, 24 hours per day, the total value of mining the Bitcoin network is approximately 6 million USD per day, or 2 billion USD per year! Given these massive sums up for grabs, there has naturally been a huge arms race as miners attempt to capture this value.
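The brute-force search is easy to demonstrate in miniature. The toy sketch below is a simplification (real Bitcoin hashes an 80-byte block header and compares the result against a numeric target, not a hex prefix), but it shows the essential loop: double-SHA256 the block data with an incrementing nonce until the output starts with the required number of zero hex digits:

```python
import hashlib

def double_sha256(data: bytes) -> str:
    """Two rounds of SHA-256, as used in Bitcoin's proof-of-work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()

def mine(block_data: str, zeros: int = 4) -> tuple[int, str]:
    """Vary the nonce until the double hash starts with `zeros` hex zeros."""
    nonce = 0
    while True:
        digest = double_sha256(f"{block_data}|nonce={nonce}".encode())
        if digest.startswith("0" * zeros):
            return nonce, digest
        nonce += 1

# Hypothetical transaction data, purely for illustration.
nonce, digest = mine("alice pays bob 1 BTC; bob pays carol 2 BTC")
print(nonce, digest)
```

Each extra required zero makes the search roughly 16 times harder on average, which is how the network can tune the difficulty so that a block is found about every ten minutes regardless of total hashing power.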
Given the design of the SHA256 algorithm and the fact that the only way to mine Bitcoins is to brute force the problem, the only way to increase your share of the 2 billion USD pa is to increase the number of hashes you are checking per second. In fact we can track the total hashing power of the Bitcoin network and see how it has increased over the last 10 years; I took the following graph from Blockchain.info. We can see that the total hashing power has been increasing exponentially year on year. The total Bitcoin network is currently estimated to be running at around 5,000 PetaHashes per second, which in long form is 5,000,000,000,000,000,000 hashes per second. Bitcoin mining was originally carried out using the CPU of a normal desktop computer, but as the number of miners increased, miners started to adapt by using the GPU in their computer instead, which is much more efficient. Once everyone started to use GPUs, the next step was for miners to start using Field Programmable Gate Arrays (FPGAs). These are circuits which can be optimised to carry out specific operations very efficiently, so one can be set up to carry out the operations used in the SHA256 algorithm. The latest step in the arms race is the use of circuits called Application Specific Integrated Circuits (ASICs); these are circuits designed to do nothing but carry out the specific operations of the SHA256 algorithm extremely efficiently. While the Field Programmable Gate Arrays had been optimised by the people who bought them to carry out the SHA256 algorithm, the ASICs can do nothing but carry out the algorithm. So, because the Bitcoin network uses the SHA256 algorithm to validate blocks, we have the weird situation that manufacturers have mass produced ASICs whose sole function is to carry out the SHA256 algorithm millions of times a second.
Who would have guessed ten years ago that that would happen? The Mining Arms Race. The point to remember when thinking about mining is that once the processing power of miners gets above a very small initial threshold, there is no benefit to the network as a whole in increasing the total amount of processing power. The Bitcoin network automatically increases the complexity of the problem that miners need to solve if the total level of hashing increases, so that it always takes approximately ten minutes to mine a block. Bitcoin mining really is an arms race, in that if all miners agreed tomorrow to reduce their mining output by 90% there would be no negative effect on the network as a whole, and everyone would still receive the same share of the mining reward. Yet, as soon as one miner starts mining in a much more efficient way, all other miners need to do the same or risk losing out. What does it matter if all this effort is going into mining Bitcoins? The issue is that, due to the sums involved, we are now globally spending a huge amount of money and computational power on carrying out what are effectively pointless calculations. If aliens visited us tomorrow they'd probably ask why we have a network of computers set up to carry out quintillions of calculations per second of the same fairly uninteresting algorithm. Let's try to put the Bitcoin network into some context. For comparison, the largest supercomputer in the world is currently the Sunway TaihuLight system at the National Supercomputing Centre in Wuxi, China. It has over 10 million cores and a peak speed of 93 PetaFLOPS, which means it can perform approximately 93,000,000,000,000,000 floating point operations per second. How does this compare to the total Bitcoin network? It's impossible to compare the networks directly, given that so much of the current hashing power is dominated by ASICs which are unable to do anything other than calculate the hash function.
We can however attempt to make some comparisons by using other metrics as proxies. When I looked at the most common ASICs used by miners, the Antminer S7 looked to be one of the most widely used circuits among amateur miners. It has a hashpower of 4.73 TH/s and comes at a cost of 500 USD. If we divide the total hashing power of the network by the hashpower of the S7, we can derive a (very) rough estimate of the total cost of the hardware currently used in the Bitcoin network. This comes out as 500 USD * 5,000 Quadrillion / 4.73 Trillion = 528m USD. We'll use this number later on to estimate the size of the supercomputer we could have bought instead. Since the above estimate is so rough, let's think of another way to estimate the total cost of the computing power making up the Bitcoin network, to give ourselves a range of values. If we think instead about the average annualised mining reward from the Bitcoin network over the last year, and then about the kind of investment returns miners would be expecting from their investment in hardware, this will give us another estimate of the total amount spent on mining equipment across the Bitcoin network. The average Bitcoin price over the last year, according to CoinDesk, was 971 USD. I've put an image of the graph of the price over the last year below, but for the calculation I downloaded the daily mid-price and then averaged across the year. Assuming 14 Bitcoins received per block mined, as per our analysis above, and 6 blocks mined per hour over the year, we get a value of around 700 million USD as the average amount that the network as a whole received for mining Bitcoins in the last year. Assuming a rate of return commensurate with the risk (let's say a range of 50% to 100%), and assuming this rate of return includes the cost of electricity, we are looking at a capital value of between 700 million/1 and 700 million/0.5 currently invested in mining the Bitcoin network.
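Both back-of-the-envelope estimates fit in a few lines. The inputs below (5,000 PH/s network hashrate, the S7's 4.73 TH/s at 500 USD, 14 BTC per block, a 971 USD average price, and a required return of 50-100%) are the rough figures quoted above, not precise data:

```python
# Estimate 1: cost of replicating the network's hashrate with Antminer S7s.
network_hashrate = 5_000e15      # hashes per second (5,000 PH/s)
s7_hashrate = 4.73e12            # 4.73 TH/s
s7_cost_usd = 500
hardware_cost = network_hashrate / s7_hashrate * s7_cost_usd
print(f"S7-based estimate: {hardware_cost / 1e6:.0f}m USD")  # matches the ~528m figure

# Estimate 2: capitalise the annual mining reward at an assumed rate of return.
btc_per_block = 14               # 12.5 block subsidy plus ~1.5 in fees
blocks_per_year = 6 * 24 * 365
avg_price_usd = 971
annual_reward = btc_per_block * blocks_per_year * avg_price_usd  # ~700m USD
low, high = annual_reward / 1.0, annual_reward / 0.5             # 100% and 50% returns
print(f"Reward-based estimate: {low / 1e6:.0f}m to {high / 1e6:.0f}m USD")
```

The second estimate is the more generous one, since it implicitly treats every dollar of expected reward as justifying a dollar (or two) of hardware spend.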
This alternative estimate gives us a range of between 700 million USD and 1.4 billion USD spent on the hardware currently being used to mine Bitcoins. If we take this dollar value of the computing power being used to mine the Bitcoin network and compare it to the FLOPS per dollar of the largest supercomputers in the world, we can estimate the speed of the supercomputer we could have purchased instead. The Sunway TaihuLight system, currently the most powerful in the world, is estimated to have cost around 273 million USD. So by this metric, the Bitcoin network could be said to be twice as powerful, 3 times as powerful, or even 5 times as powerful as the world's largest supercomputer, depending on which estimate of the cost of the Bitcoin hardware we use. The frustrating conclusion is that we have collectively assembled a network with a total computing power multiple times that of the largest supercomputer in the world, and yet all the computation we are carrying out is effectively useless. The proof-of-work underlying Bitcoin is essentially an arbitrarily hard piece of computation whose only utility is to secure the Bitcoin network. Of course this in itself is a valid purpose, but it definitely does not warrant more computing power than the top 5 supercomputers in the world combined! Gridcoin. I'm not the first person to notice this problem, and there have been attempts to develop altcoins which harness this computing power to solve useful problems. One such altcoin is Gridcoin, which randomly assigns a reward to a miner who is mining Gridcoin in proportion to the amount of useful computation they have contributed in the last ten minutes. Users of Gridcoin can select which project they contribute computing power to from a centrally maintained whitelist.
The whitelist includes projects such as simulating protein folding (used in medical research), searching for prime numbers, running climate models, and analysing data from particle physics experiments. The current issue with Gridcoin though is that it relies on a centralised system to allocate the mining rewards. This undermines many of the benefits of the Bitcoin system, which was designed to be a decentralised, non-trust-based system. What we ultimately need is a system which combines the decentralised Bitcoin protocol with a system that rewards some sort of useful proof-of-work algorithm. I could be on my own here, but personally I always thought the definition of Sample Standard Deviation is pretty ugly. $$ \sqrt {\frac{1}{n - 1} \sum_{i=1}^{n} { ( x_i - \bar{x} )}^2 } $$ We've got a square root involved, which can be problematic, and what's up with the $\frac{1}{n-1}$? Especially the fact that it's inside the square root. Also, why do we even need a separate definition for a sample standard deviation rather than a population standard deviation? When I looked into why we do this, it turns out that the concept of sample standard deviation is actually a bit of a mess. Before we tear it apart too much though, let's start by looking at some of the properties of standard deviation which are good. Advantages of Standard Deviation
The last property is a really important one. The $\frac{1}{n-1}$ factor is a correction we make which, we are told, turns the sample standard deviation into an unbiased estimator of the population standard deviation. We can test this pretty easily: I sampled 50,000 simulations from a probability distribution and then measured the squared difference between the mean of the sample standard deviation and the actual value computed analytically. We see that the average error converges quite quickly, but for some reason it doesn't converge to 0 as expected! It turns out that the usual formula for the sample standard deviation is not actually an unbiased estimator of the population standard deviation after all. I'm pretty sure they never mentioned that in my stats lectures at uni. The $n-1$ correction makes the formula for sample variance an unbiased estimator, and the formula we use for the sample standard deviation is just the square root of the unbiased estimator for variance; since the square root is a concave function, taking it introduces a downward bias (by Jensen's inequality). If we do want an unbiased estimator for the sample standard deviation then we need to make an adjustment based not just on the sample size, but also on the underlying distribution, which in many cases we are not going to know at all. The wiki page has a good summary of the problem, and also has formulas for the unbiased estimator of the sample standard deviation: en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation Just to give you a sense of the complexity, here is the factor that we need to apply to the usual definition of sample standard deviation in order to have an unbiased estimator for a normal distribution. $$ \frac {1} { \sqrt{ \frac{2}{n-1} } \frac{ \Gamma \Big( \frac{ n } {2} \Big) } {\Gamma \Big( \frac{n-1}{2} \Big)} } $$ Where $\Gamma$ is the Gamma function. Alternatives to Standard Deviation. Are there any obvious alternatives to using standard deviation as our default measure of variability?
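For a normal population, the denominator of that factor (often written $c_4(n)$, with $E[s] = c_4(n)\,\sigma$) can be computed with log-gamma functions to avoid overflow, and the bias is easy to see by simulation. A small sketch, assuming standard normal samples with $\sigma = 1$:

```python
import math
import random

def c4(n: int) -> float:
    """E[s] = c4(n) * sigma for normal samples; dividing s by c4(n) removes the bias."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def sample_std(xs: list) -> float:
    """The usual sqrt(1/(n-1) * sum of squared deviations) estimator."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

random.seed(2)
n, sims = 5, 50_000
# Average the sample standard deviation over many small normal samples.
mean_s = sum(sample_std([random.gauss(0, 1) for _ in range(n)]) for _ in range(sims)) / sims
print(round(mean_s, 3), round(c4(n), 3))  # both come out around 0.94, well below sigma = 1
```

For n = 5 the simulated mean of $s$ sits around 0.94 rather than 1, exactly as the $c_4$ factor predicts, and the bias shrinks as n grows.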
Nassim Nicholas Taleb, author of Black Swan, is also not a fan of the widespread use of the standard deviation of a distribution as a measure of its volatility. Taleb has different issues with it, mainly around the fact that it was often overused in banking by analysts who thought it completely characterised volatility. So for example, when modelling investment returns, an analyst would look at the sample standard deviation and then assume the investment returns follow a Lognormal distribution with this standard deviation, when we should actually be modelling returns with a much fatter-tailed distribution. So his issue was that people believed they were fully characterising volatility in this way, when they should have also been considering kurtosis and higher moments, or considering fatter-tailed distributions. Here is a link to Taleb's rant, which is entertaining as always: www.edge.org/responsedetail/25401 Taleb's suggestion is a different statistic called the Mean Absolute Deviation, whose definition is: $$\frac{1}{n} \sum_{i=1}^n | x_i - \bar{x} | $$ We can see immediately why mathematicians prefer to deal with the standard deviation instead of the mean absolute deviation: working with sums of absolute values is normally much more difficult analytically than working with the square root of a sum of squares. In the age of ubiquitous computing though, this should probably be a smaller consideration. Designing a new number system I got interested in alternative number systems when I read Malcolm Gladwell's book Outliers back in uni. You might be surprised that Gladwell would write about this topic, but he actually uses it to attempt to explain why Asian students tend to do so well at maths in the US. I was flicking through an old notebook the other day and I came across my attempt at designing such a system. I thought it might be interesting to write up my system.
To set the scene, here is the relevant extract from the book: "Take a look at the following list of numbers: 4,8,5,3,9,7,6. Read them out loud to yourself. Now look away, and spend twenty seconds memorizing that sequence before saying them out loud again. If you speak English, you have about a 50 percent chance of remembering that sequence perfectly. If you’re Chinese, though, you’re almost certain to get it right every time. Why is that? Because as human beings we store digits in a memory loop that runs for about two seconds. We most easily memorize whatever we can say or read within that two second span. And Chinese speakers get that list of numbers—4,8,5,3,9,7,6—right every time because—unlike English speakers—their language allows them to fit all those seven numbers into two seconds. That example comes from Stanislas Dehaene’s book “The Number Sense,” and as Dehaene explains: Chinese number words are remarkably brief. Most of them can be uttered in less than one-quarter of a second (for instance, 4 is ‘si’ and 7 ‘qi’). Their English equivalents—”four,” “seven”—are longer: pronouncing them takes about one-third of a second. The memory gap between English and Chinese apparently is entirely due to this difference in length. In languages as diverse as Welsh, Arabic, Chinese, English and Hebrew, there is a reproducible correlation between the time required to pronounce numbers in a given language and the memory span of its speakers. In this domain, the prize for efficacy goes to the Cantonese dialect of Chinese, whose brevity grants residents of Hong Kong a rocketing memory span of about 10 digits. It turns out that there is also a big difference in how number-naming systems in Western and Asian languages are constructed. In English, we say fourteen, sixteen, seventeen, eighteen and nineteen, so one would think that we would also say oneteen, twoteen, and threeteen. But we don’t. We make up a different form: eleven, twelve, thirteen, and fifteen.
Similarly, we have forty and sixty, which sound like what they are. But we also say fifty and thirty and twenty, which sort of sound like what they are but not really. And, for that matter, for numbers above twenty, we put the “decade” first and the unit number second: twenty-one, twenty-two. For the teens, though, we do it the other way around. We put the decade second and the unit number first: fourteen, seventeen, eighteen. The number system in English is highly irregular. Not so in China, Japan and Korea. They have a logical counting system. Eleven is ten-one. Twelve is ten-two. Twenty-four is two-tens-four, and so on. That difference means that Asian children learn to count much faster. Four-year-old Chinese children can count, on average, up to forty. American children, at that age, can only count to fifteen, and don’t reach forty until they’re five: by the age of five, in other words, American children are already a year behind their Asian counterparts in the most fundamental of math skills. The regularity of their number systems also means that Asian children can perform basic functions—like addition—far more easily. Ask an English seven-year-old to add thirty-seven plus twenty-two, in her head, and she has to convert the words to numbers (37 + 22). Only then can she do the math: 2 plus 7 is nine and 30 and 20 is 50, which makes 59. Ask an Asian child to add three-tens-seven and two-tens-two, and then the necessary equation is right there, embedded in the sentence. No number translation is necessary: It’s five-tens-nine. “The Asian system is transparent,” says Karen Fuson, a Northwestern University psychologist, who has done much of the research on Asian-Western differences. “I think that it makes the whole attitude toward math different. Instead of being a rote learning thing, there’s a pattern I can figure out. There is an expectation that I can do this. There is an expectation that it’s sensible. For fractions, we say three-fifths.
The Chinese is literally, ‘out of five parts, take three.’ That’s telling you conceptually what a fraction is. It’s differentiating the denominator and the numerator.” The much-storied disenchantment with mathematics among Western children starts in the third and fourth grade, and Fuson argues that perhaps a part of that disenchantment is due to the fact that math doesn’t seem to make sense; its linguistic structure is clumsy; its basic rules seem arbitrary and complicated. Asian children, by contrast, don’t face nearly that same sense of bafflement. They can hold more numbers in their head, and do calculations faster, and the way fractions are expressed in their language corresponds exactly to the way a fraction actually is—and maybe that makes them a little more likely to enjoy math, and maybe because they enjoy math a little more they try a little harder and take more math classes and are more willing to do their homework, and on and on, in a kind of virtuous circle. When it comes to math, in other words, Asians have a built-in advantage..." Here's a link to Gladwell's website which contains the extract: gladwell.com/outliers/ricepaddiesandmathtests/ Base-10 System Gladwell mainly talks about the words that the Chinese use for numbers and the structure inherent in them, but there is actually another, more interesting, way we can embed structure in our number system. Our current number system is base-10, which means that we have different symbols for all the numbers up to 10 (0,1,2,3,4,...,9), and then when we get to the number 10, we use the first number again but we move it over one place, and put a zero in its place. When we are teaching a child how to write numbers, this is exactly how we explain it to them. For example, to write two hundred and seven, we would tell them that they need to put a 2 in the hundreds column, a 0 in the tens column and a 7 in the units column - 207. The fact that we use a base-10 number system is actually just a historical quirk.
The only reason humans started to use it is that we have 10 fingers; there's no particular mathematical benefit to a base-10 system. We could actually base our number system on any integer. The most commonly used alternative is the binary number system used extensively in computing. The binary number system is a base-2 number system where we only have 2 symbols - 0 and 1. The trick is that instead of reusing the first symbol when we get to ten, we do it when we get to two. So the sequence of numbers up to 10 in binary is: 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010. In fact we don't even need to restrict ourselves to integers as the base of our number system. For example, we could even use base $ \pi $! Under this system, to write $ \pi $ we would just write 10, and to write $ \pi^2 $ we would write 100. Writing ordinary numbers in base $ \pi $ would be quite difficult though - a number like 4, for example, has no terminating representation. Base 12 number system So what would be the ideal number on which to base our number system? I'm going to make the argument that a base 12 number system would be the best option. 12 has a large number of small factors, and this makes it ideal as a base.
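The positional trick described above works the same way in any integer base; here is a minimal sketch of the conversion, where the digit symbols A and B for ten and eleven in base 12 are just an illustrative choice:

```python
def to_base(n, base, digits="0123456789AB"):
    """Write a non-negative integer in an arbitrary integer base (up to 12 here)."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)     # peel off the least significant digit
        out.append(digits[r])
    return "".join(reversed(out))

# The first ten numbers in binary:
print([to_base(i, 2) for i in range(1, 11)])
# ['1', '10', '11', '100', '101', '110', '111', '1000', '1001', '1010']

# In base 12, a dozen is written 10 and a gross (144) is written 100:
print(to_base(12, 12), to_base(144, 12))  # 10 100
```

The same function shows why the symbols run out at the base: base 12 needs two extra digit characters beyond 9, which is exactly the design question the next section turns to.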
Returning to Gladwell's idea, another change we could make to the number system to help make it more structured is to change the actual symbols we use for the integers. Here was my attempt from 2011 at alternative symbols we could use. My approach was to embed as much structure into the numbers as possible, so the symbol for three is actually just a combination of the symbols for one and two. This applies for all the numbers other than one, two, four, and eight. I wonder if it would be possible to come up with some sort of rule about how the symbols rotate, to further reduce the total number of symbols and also to add some additional structure. Let me know if you can think of any additional improvements that could be made to our number system. I think the first time I read about the Difference Engine was actually in the novel of the same name by William Gibson and Bruce Sterling. The book is an alternative history, set in the 19th century, where Charles Babbage actually finished building the Difference Engine, a mechanical calculator he designed. This in turn led to him getting funding for the Analytical Engine, a Turing-complete mechanical computer which Babbage also designed in real life, but also never finished building. I really enjoyed the book, but how plausible was this chain of events? How did the Difference Engine work? And what problem was Babbage trying to solve when he came up with the idea for the Difference Engine? Computers before Computers Before electronic computers were invented, scientists and engineers were forced to use various tricks and shortcuts to enable them to carry out difficult calculations by hand. One shortcut that was used extensively, and one which Babbage would have been very familiar with, was the use of log tables to speed up multiplication. A log table is simply a table which lists the values of the logarithmic function. If, like me, you've never used a log table, then how are they useful?
Log Tables The property of logarithms that makes them useful in simplifying certain calculations is that: $ \log(AB) = \log(A) + \log(B) $ We use this property often in number theory when we wish to turn a multiplicative problem into an additive problem and vice versa. In this case it's more straightforward though: if we want to calculate $A \times B$ we can convert the figures into logs, add the logs together, and then convert back to obtain the value of $A \times B$. Let's say we want to calculate $ 134.7 \times 253.9 $. What we can do instead is calculate $ \log( 134.7) = 2.1294 $ and $ \log (253.9) = 2.4047 $, then we just need to add together $ 2.1294 + 2.4047 = 4.5341 $ and convert back from logs
$ 10^{4.5341} \approx 34200 $, which we can easily verify is the number we require. Haven't we just made our problem even harder though? Before, we needed to multiply two numbers, and now instead we need to do two conversions and an addition. Admittedly it's easier to add two large numbers together than to multiply them, but what about the conversions? The way around this is to have a book of the values of the log function, so that we can just look up the log of any number we are interested in, allowing us to easily convert to and from logs. This is probably a good point to introduce Charles Babbage properly. Babbage was an English mathematician, inventor, and philosopher born in 1791. He was a rather strange guy: as well as making important contributions to Computer Science, Mathematics and Economics, Babbage founded multiple societies. One society was founded to investigate paranormal activity, one was founded to promote the use of Leibniz notation in calculus, and another was founded in an attempt to foster support for the banning of organ grinders. When he wasn't keeping himself busy investigating the supernatural, Babbage was also a keen astronomer. Since astronomy is computation heavy, this meant that Babbage was forced to make extensive use of the log tables that were available at the time. These tables had all been calculated by hand by people called computers. Being a computer was a legitimate job at one point; they would sit and carry out calculations by hand all day, every day - not the most exciting job if you ask me. Because the log tables had all been created by teams of human calculators, errors had naturally crept in. The tables were also very expensive to produce. This led Babbage to conceive of a mechanical machine for calculating log tables; he called this invention the Difference Engine. Method of differences A difference engine uses the method of finite differences to calculate the integer values of polynomial functions.
I remember noticing something similar to the method of finite differences when I was playing around trying to guess the formulas for sequences of integers at school. If someone gives you an integer sequence and asks you to find the formula - say we are given $1,4,7,10,13,...$ - then, in this case, the answer is easy: we can see that we are just adding 3 each time. To make this completely obvious, we can write in the differences between the numbers:

1   4   7   10   13   ...
  3   3   3    3

What if we are given a slightly more complex sequence though? For example: $1,3,6,10,15,...$ This one is a bit more complicated; let's see what happens when we add in the differences again:

1   3   6   10   15
  2   3   4    5

Now we see that there is an obvious pattern in the number we are adding on each time. Looking at the differences between these numbers we see:

1   3   6   10   15
  2   3   4    5
    1   1   1

So what's happened here? We now have stability at the second level of the differences. It turns out that this is equivalent to the underlying formula being a quadratic. In this case the formula is $0.5x^2+1.5x+1$. If we assume the first number in the sequence corresponds to $x=0$, we can now easily recreate the sequence, and easily calculate the next value. Let's try a more difficult example. If we are given the following sequence and told to guess the next value, we can use a similar method to get the answer: $2,5,18,47,98,177$ Setting up the method:

2   5   18   47   98   177
  3   13   29   51   79
    10   16   22   28
       6    6    6

Since we get a constant at the third level, this sequence must be a cubic, and once we know this, it's much easier to guess the actual formula. In this case it is $x^3 - x^2 - x + 3$ (taking the first term as $x=1$). Babbage's insight was that we can calculate the next value in this sequence just by adding on another diagonal to this table of differences. Adding $6$ to $28$ gives $34$, then adding $34$ to $79$ gives $113$, and then adding $113$ to $177$ gives us $290$.
Which means that the next value in the sequence is $290$. So we get:

2   5   18   47   98   177   290
  3   13   29   51   79   113
    10   16   22   28   34
       6    6    6    6

As you might guess, this process generalises to higher order polynomials. For a given sequence, if we keep taking the differences of the differences and eventually get to a constant, then we know the sequence is formed by a polynomial, and we also know the degree of the polynomial. So if you are ever given an integer sequence and asked to find the pattern, always check the differences and see if they eventually become constant; if they do, then you will know the order of the polynomial which defines the sequence, and you will also be able to easily compute the next value directly. So how does this apply to Babbage's difference engine? The insight is that we have here a method of enumerating the integer values of a polynomial using only addition. Also, at each stage we only need to store the values of the leading diagonal, and each polynomial is uniquely determined by its differences. The underlying message is that multiplication is difficult. In order to come up with a shortcut for multiplication, we use log tables to make the problem additive. And now, going further, we have a method for calculating the log function which also avoids multiplication. So given we have this method of calculating the integer values of a polynomial, how can we use it to calculate values of the log function? Approximating Functions The obvious way to approximate the log function with a polynomial would be to just take its Taylor expansion. For example, the Taylor expansion of $ \log ( 1 - x ) $ is: $$ \log ( 1 - x ) = - \sum^{\infty}_{n=1} \frac{x^n}{n} $$ There is a downside to using the Taylor expansion though. Given the mechanical constraints at the time, Babbage's Difference Engine could only simulate a 7th degree polynomial. So how close can we get with a Taylor expansion?
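As an aside, the difference-table procedure above is straightforward to sketch in code. After the subtractions that set up the table, extending the sequence uses nothing but addition down the leading diagonal, which was exactly Babbage's point:

```python
def next_values(seq, count):
    """Extend a polynomial integer sequence the way a difference engine would."""
    # Build the difference rows until a row is constant (setup uses subtraction).
    diffs = [list(seq)]
    while any(d != diffs[-1][0] for d in diffs[-1]):
        row = diffs[-1]
        diffs.append([b - a for a, b in zip(row, row[1:])])

    # Keep only the leading diagonal: the last entry of each row.
    diagonal = [row[-1] for row in diffs]

    out = []
    for _ in range(count):
        # Add up the diagonal from the bottom: pure addition, no multiplication.
        for i in range(len(diagonal) - 2, -1, -1):
            diagonal[i] += diagonal[i + 1]
        out.append(diagonal[0])
    return out

print(next_values([2, 5, 18, 47, 98, 177], 2))  # [290, 443]
```

Note that only the diagonal is stored between steps, mirroring the small amount of state the mechanical engine had to hold.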
We can use Taylor's theorem to calculate the rate of convergence of the approximation, but this would be quite a bit of work, and since we can easily calculate the actual value of the log function, it's easier to just test the approximation with a computer. So taking $\log(0.5)$ as an example: when I use a calculator, I am told that it equals $-0.6931$, but when I check the 7th order polynomial I get $-0.6923$, and it's not until I get to the 10th order polynomial that we are accurate even to 4 decimal places. If we require a more accurate approximation, we will have to use numerical methods in conjunction with a restriction on the range of convergence. This would mean that if we wished to compute $\log(x)$ on the interval $[0,1]$, at 100 different points on the interval, we would break $[0,1]$ into subintervals and then use a different polynomial fit for each subinterval. If you'd like to read more about how the actual machine worked then the Wikipedia article is really useful: en.wikipedia.org/wiki/Difference_engine And if you are interested in reading more about the mathematics behind using a Difference Engine then the following website is really good: edthelen.org/bab/babintro.html Theresa May announced yesterday that the Government is considering plans to remove the triple lock on the Basic State Pension, the reason being that the triple lock is proving to be too generous to pensioners and that the money could be better spent elsewhere. www.theguardian.com/money/2017/apr/26/theresamayconsideringscrappingtriplelockonpensions I thought I would do some modelling to demonstrate the effect that the triple lock policy has on the total spending on the state pension. We will see that the triple lock is definitely unsustainable over the long term, but that the effect is relatively slow. Just to be clear, this demonstration is not an argument for or against removing the triple lock in the near future.
If we believe that pensioners are currently being underpaid overall, then we may wish to retain the triple lock for longer. If we think that pensioners are currently being overpaid overall, then we will probably want to remove the triple lock soon. This is just a demonstration of the effect of the triple lock. What is the triple lock? The triple lock on the basic state pension states that the yearly increase in the state pension shall be the greater of: the growth in average earnings, CPI inflation, and a floor of 2.5%.
So this means that the increase will always be at least 2.5%, but if either inflation or the increase in average earnings is greater than 2.5%, then the increase in the state pension will be more than 2.5%. A Stochastic Model of the UK Economy I set up a very basic stochastic model of the UK economy to examine the expected impact of the triple lock on the amount that the UK spends on the state pension as a proportion of the total UK budget. I first began by collecting data on the annual increases in AWE, CPI, and GDP. With this data, I then fitted correlated normal distributions to the annual increments. Once I had parameterised these variables, I projected the spending on the state pension, and also the overall UK spending, using these modelled increments, year by year, over 10,000 simulations. There are definitely more sophisticated and accurate ways of modelling the economy than this, but I think it should be good enough to give useful results. I had to make an assumption about the increase in total UK Government spending; I decided to model this in line with GDP growth, which I think should be a reasonable approximation. I have also used the fact that the UK Government currently spends 12% of the annual budget on the state pension as a starting point. This graph shows the proportion of the UK Budget projected to be spent on the state pension over the next 20 years. I have included a colour gradient on the graph to show the uncertainty in the estimate; each band covers 12.5 percentiles of the modelled distribution, and the black line shows the midpoint. We see that the proportion of the budget we expect to be spent on the state pension increases from 12% to over 20% over the course of the next 20 years. Note that this is just based on the increases from the triple lock, and does not consider any changes in respect of increasing life expectancy, the ageing population, or an increase in the state pension age.
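A stripped-down version of this projection can be sketched as follows; the growth rates, volatilities and correlations below are illustrative assumptions, not the values fitted from the ONS data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters only -- not the fitted means, volatilities
# and correlations from the actual model.
mean = np.array([0.020, 0.020, 0.020])      # AWE, CPI, GDP annual growth
cov = 0.015**2 * np.array([[1.0, 0.5, 0.5],
                           [0.5, 1.0, 0.5],
                           [0.5, 0.5, 1.0]])

sims, years = 10_000, 20
share = np.full(sims, 0.12)                 # pension as share of total spend

for _ in range(years):
    awe, cpi, gdp = rng.multivariate_normal(mean, cov, size=sims).T
    # Triple lock: the greater of earnings growth, CPI, and 2.5%
    pension_increase = np.maximum(np.maximum(awe, cpi), 0.025)
    # Total government spending is assumed to grow with GDP
    share *= (1 + pension_increase) / (1 + gdp)

print(share.mean())  # drifts upward from the 12% starting point
```

Because the triple lock takes a maximum each year, the pension uprating systematically outpaces GDP-linked spending growth, so the share of the budget drifts upward regardless of the exact parameters chosen.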
Therefore in practice, the effect is likely to be greater than this. An increase to 20% would be equivalent to an additional £62 billion per year in today's money spent on the state pension. An Alternative View From a more mathematical view, the fundamental reason the triple lock behaves like this can be seen from the following inequality: $$\sum_{i = 2018}^{2037} \max ( a_i , b_i ) \geq \max \Big( \sum_{i = 2018}^{2037} a_i , \sum_{i = 2018}^{2037} b_i \Big)$$ Since we expect average annual earnings to increase in line with GDP in the long term, and we also expect the total UK budget to increase in line with GDP, any series which grows at the maximum of the increase in average earnings and a fixed floor of 2.5% will eventually be expected to reach 100% of GDP, and therefore 100% of the total UK budget. Data Sources GDP data source: data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG Average Earnings source: www.ons.gov.uk/employmentandlabourmarket/peopleinwork/earningsandworkinghours/datasets/averageweeklyearnings CPI data source: www.ons.gov.uk/economy/inflationandpriceindices/timeseries/d7g7/mm23 Driverless, autonomous, self-driving, robotic, drone cars - whatever you want to call them, I think self-driving cars are going to be awesome. The potential benefits include:
But how far away are we from this being a reality? It seems like we are constantly being told that self-driving cars are just on the horizon, and that widespread use of self-driving cars will arrive sooner than we think. It got me thinking though: surely even when manufacturers start churning out driverless cars, isn't it still going to take a considerable amount of time before they begin to replace all the cars currently being driven? Most people will not suddenly go out and replace their current car the moment self-driving cars are available on the market. Replacing all the old cars Almost all cars that are currently being driven today will never be self-driving; it will only be new cars, after a certain point, that will start to be self-driving. So even if all new cars from now on were self-driving, there would still be a delay as old, driven cars were slowly replaced by self-driving cars. I thought I'd try to do some modelling to see how quickly this process might take place. As a starting point, let's assume that we will start to see self-driving cars being produced by 2019. I found the following report from the Department for Transport which details the number of cars on the road today, and the number of new cars registered every year: www.gov.uk/government/uploads/system/uploads/attachment_data/file/608374/vehiclelicensingstatistics2016.pdf The important statistics are: There are currently $37.3$m cars on the road. Each year the number of registered cars increases by approximately $600,000$. Around $3.3$m new cars are registered every year. I then extrapolated these statistics based on three different scenarios. Scenario 1 - All new cars from 2019 onward are driverless We can see that even under this very optimistic scenario, it's not until 2025 that we will see a majority of cars on the road being driverless.
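Scenario 1 can be sketched directly from the statistics above, under the additional assumption that the roughly 2.7m cars scrapped each year (3.3m new registrations less the 0.6m net growth) are all older, conventionally driven cars:

```python
# Scenario 1: every new car from 2019 onward is driverless.
# Figures from the DfT report: 37.3m cars on the road, 3.3m new
# registrations a year, fleet growing by 0.6m a year -- so roughly
# 2.7m cars are scrapped annually (assumed here to all be driven cars).
fleet_driven, fleet_driverless = 37.3, 0.0
new_per_year, scrapped_per_year = 3.3, 2.7

year = 2018
while fleet_driverless <= fleet_driven:
    year += 1
    fleet_driverless += new_per_year
    fleet_driven = max(0.0, fleet_driven - scrapped_per_year)

print(year)  # 2025
```

Running this forward, driverless cars first outnumber driven cars in 2025, consistent with the result quoted above.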
It's probably not reasonable though to assume that all new cars produced after 2019 will be driverless, so let's look at the effect of slowly increasing the proportion of new cars that are driverless. Scenario 2 - Linear increase in the % of new cars which are driverless between 2019 and 2030 In this scenario we assume that in 2018 all new cars are conventionally driven, that by 2030 all new cars are driverless, and that the % of new cars which are driverless increases linearly between these two years. We see that under this scenario, it's not until 2030 that we start to see a majority of driverless cars on the road. To get an alternative view, let's look at a quicker rate of adoption: let's suppose instead that by 2025 all new cars will be driverless. Now we see that a majority of cars are driverless by around 2027, with a strong majority emerging by 2030. Conclusion Even if we assume that driverless cars will start to be produced by 2019, based on current trends of car replacement, and depending on the speed at which self-driving cars are produced, we shouldn't expect a majority of cars on the road to be driverless until at least the late 2020s, or maybe even the early 2030s. So when analysts say that driverless cars will be common much sooner than people expect, they need to be careful about how they define common. Bitcoin - Who is Satoshi Nakamoto? 21/4/2017 Who is the mysterious Satoshi Nakamoto? Let me pitch you an idea for a movie: following the 2007 financial crisis, fed up with the corruption of the modern financial system, a lone genius creates a new virtual currency with which he aims to completely undermine the modern banking system. This new currency allows instantaneous online payments to be made with minimal transaction fees and with almost complete anonymity. Better yet, this system is completely decentralised, requiring no central bank or governing body.
To further add to the mystique, our hero decides to eschew fame, remaining completely anonymous while netting himself a cool USD 1 billion in bitcoins. But our hero decides to walk away and leave the USD 1 billion in bitcoins untouched on a public ledger on the internet, proving to the world that he was never in it for the money. All that he leaves behind is a name - ***Cue dramatic music*** - Satoshi Nakamoto. Chuck in some bad guys and a love interest and we've got the makings of a Hollywood blockbuster! This is of course the true story of the origins of bitcoin. Unsurprisingly, there have been many attempts to find the true identity of Satoshi Nakamoto; every six months or so a new candidate is found and the media jumps on the bandwagon, but none of the candidates so far have been really convincing. I thought I'd do a bit of digging myself and see what we have to work with, what we can know for certain, and what we can speculate about. So what info do we have to work with? Satoshi left behind the following: his forum posts, his emails to a cryptography mailing list, the bitcoin paper and source code, and the message in the genesis block.
The Forum Posts Almost all the forum posts are highly technical, and there is very little to be gleaned about Satoshi's identity from the content of the posts. I did look through most of them just in case. But based on an idea in Satoshi's Wikipedia article, I have graphed the timestamps of the forum posts. All the forum posts can be found on the following website, which I scraped using a web scraper and then chucked into Excel to extract the timestamps: satoshi.nakamotoinstitute.org/ We can see that there is a clear trend for most posts to be made between 4pm and 11pm, with almost none being made between 5am and 1pm, suggesting that this is when Satoshi is asleep. Based on most people I know who don't have a 9-5 job but are still involved in IT, this is a pretty reasonable sleeping pattern for someone living in a GMT time zone. If we instead assume that Satoshi has a conventional sleeping pattern, then we would expect him to be living somewhere on the US East Coast. Both of these seem plausible to me; it does get less plausible though once we consider someone living much further east than Europe. I then graphed the weekday of each forum post, which shows a fairly stable pattern of posts throughout the week. Nothing too surprising here. It has also been noted that the posts from Satoshi use British spellings rather than US spellings. Let's also test that. I collected a list of words that are spelt differently in UK and US English, and cross-referenced it against the posts we scraped earlier. The following words were all used by Satoshi with the UK spelling: amortised, colour, decentralised, favour, fulfil, gauge, grey, greyed, labelled, labelling, liberalising, modernised, optimised, prioritising, reorganised. This strongly suggests that the author is more familiar with British English than American English. It's also been noted (and I concur) that Satoshi's posts read like those of a native English speaker.
He uses common idioms well, and his grammar and structuring all give compelling reasons to think that he is a native English speaker. This has been put forward by some as definitive proof that he is British. I'm not so sure though, having met Europeans who, through a lot of exposure to English speakers growing up, now sound like native English speakers when writing or texting. Leaving the forum posts for the time being, let's look at the emails in the mailing list. Mailing List Emails The mailing list was a cryptography-focused mailing list, established in 2000, and can be found through the following link: www.metzdowd.com/mailman/listinfo/cryptography The website gives the following introduction to the mailing list: "Cryptography" is a low-noise moderated mailing list devoted to cryptographic technology and its political impact. Occasionally, the moderator allows the topic to veer more generally into security and privacy technology and its impact, but this is rare. WHAT TOPICS ARE APPROPRIATE: "On topic" discussion includes technical aspects of cryptosystems, social repercussions of cryptosystems, and the politics of cryptography such as export controls or laws restricting cryptography. Satoshi began posting to the mailing list in October 2008, and his first post was an introduction to his new bitcoin system: it gave a brief overview and then linked to the paper he had written containing the technical details. It therefore seems that the mailing list was a way of generating interest in his already fleshed-out system, rather than a community he had previously been contributing to. I was initially slightly suspicious of how well written the emails are compared to the forum posts. It's been suggested by a few people that Satoshi might actually be the name chosen by a group of collaborators rather than one single person.
On consideration though, the mailing list is said to be 'highly moderated', and therefore it should perhaps not be surprising that Satoshi polished his grammar and writing when sending emails to the mailing list. Plus, you'd expect quite a bit more care when replying to an email than when making a forum post. To be honest, I struggled to glean much more from the emails, other than a couple of interesting quotes which I've included at the end of this post. The Genesis Block Satoshi created the first block of the first blockchain. Since there were no preceding transactions, Satoshi was able to insert a message into the block. The message he selected was: "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks" This tells us a few things. Firstly, it's evidence that no bitcoins were mined prior to this date. Secondly, it could be seen as a comment on the financial bailouts that were ongoing at the time, and which may have caused Satoshi to develop Bitcoin in the first place. And finally, it's another link to the UK, given Satoshi selected a British newspaper to timestamp his first block. Other Random Thoughts:
Here are some additional thoughts on the Satoshi question, which I have included in the hope that someone else might find them useful.

- Since Satoshi stopped working on Bitcoin in 2011, perhaps we should be looking for someone who has made interesting contributions to a different project since then?
- Would a better programmer than me be able to spot idiosyncrasies in Satoshi's coding style which could be traced elsewhere? What if someone trawled GitHub and looked for these quirks?
- Some people have attempted a stylometric analysis. I haven't looked into this at all, but it's something I might look into at another point.
- Satoshi is the Japanese name of the main character (Ash Ketchum) in Pokemon, and also the name of the creator of Pokemon, Satoshi Tajiri. Are there any other famous Satoshis? Or famous Nakamotos? I did a quick Google search, but I couldn't find anyone who stood out to me.
- Satoshi was familiar with Mises' regression theorem, a fairly niche economic concept from Ludwig von Mises, an economist of the Austrian School. The Austrian School is famously associated with libertarian or right-wing anarchist views, and Satoshi seems pretty au fait with libertarian concepts generally.
- Prior to Bitcoin's rise, cryptocurrencies were a very niche interest; perhaps it would be worthwhile to look at who was going to conferences, writing papers, working in the industry, etc. prior to 2007. It should be a relatively small group of people, and you would imagine that Satoshi would have a footprint in there somewhere.

Some interesting quotes from Satoshi:

Yes, but we can win a major battle in the arms race and gain a new territory of freedom for several years. Governments are good at cutting off the heads of centrally controlled networks like Napster, but pure P2P networks like Gnutella and Tor seem to be holding their own.

...

I appreciate your questions. I actually did this kind of backwards.
I had to write all the code before I could convince myself that I could solve every problem, then I wrote the paper. I think I will be able to release the code sooner than I could write a detailed spec. You're already right about most of your assumptions where you filled in the blanks.

...

It's very attractive to the libertarian viewpoint if we can explain it properly. I'm better with code than with words though.

...

I believe I've worked through all those little details over the last year and a half while coding it, and there were a lot of them. The functional details are not covered in the paper, but the source code is coming soon. I sent you the main files.

...

Banks must be trusted to hold our money and transfer it electronically, but they lend it out in waves of credit bubbles with barely a fraction in reserve.

Conclusions?

To draw a few tentative conclusions, we seem to be looking at:

- A native English speaker.
- Who picked a Japanese pseudonym.
- Who favours British English over US English.
- Who selected a British newspaper to timestamp his genesis block.
- Whose background is primarily coding-based.
- Who seems to hold libertarian views and be motivated by libertarian beliefs.
- Who has an interest in cryptography and cryptocurrencies which stretches back to at least 2007.
- And who appears to be operating either on the US East Coast or in a Western European time zone.

Surely there can't be many people out there who meet all these criteria?
Author: I work as a pricing actuary at a reinsurer in London.