THE REINSURANCE ACTUARY
  • Blog
  • Project Euler
  • Category Theory
  • Disclaimer

An Actuary learns Machine Learning - Part 13 - Kaggle Tabular Playground Competition - June 22

1/7/2022

 




In which we recreate the previous analysis, but in Python this time, and then add a new submission using the mean rather than the median to impute missing values.





Source: https://somewan.design



An Actuary learns Machine Learning - Part 12 - Kaggle Tabular Playground Competition - June 22

24/6/2022

 


​

​In which we start a new Kaggle competition, submit a dummy attempt, and then build a very basic Excel model to establish a baseline for future progress.
​




​Source: https://somewan.design



An Actuary learns Machine Learning - Part 11 - Titanic revisited & Gradient Boosting Classifiers

8/10/2021

 




In which we try out the best performing algorithm from our house price prediction problem - Gradient Boosted Regression - on the Titanic problem, but don't actually manage to improve on our old score...




​
Source: https://somewan.design



An Actuary learns Machine Learning - Part 10 - More label encoding / Gradient Boosted Regression

15/2/2021

 




In which we correct our label encoding method from last time, try out a new algorithm - Gradient Boosted Regression - and finally manage to improve our score (by quite a lot, it turns out).





Source: https://somewan.design



An Actuary learns Machine Learning - Part 9 - Cross Validation / Label Encoding / Feature Engineering

10/2/2021

 




In which we set up K-fold Cross Validation to assess model performance, spend quite a while tweaking our model, use hyper-parameter tuning, but then end up not actually improving our model.



​

​Source: https://somewan.design



An Actuary learns Machine Learning - Part 8 - Data Cleaning / more Null Values / more Random Forests

6/2/2021

 




In which we deal with those pesky null values, add additional variables to our Random Forest model, but only actually improve our score by a marginal amount.





​Source: https://somewan.design



An Actuary learns Machine Learning - Part 7 - Sub-plots / Null Values / Random Forests

4/2/2021

 




In which we plot an excessive number of graphs, fix our problems with null values, re-run our algorithm, and significantly improve our accuracy.





Source: https://somewan.design



An Actuary learns Machine Learning - Part 6 - Jupyter/Regression/Kaggle house prices

2/2/2021

 



​In which we start a new Kaggle challenge, try out a new Python IDE, build our first regression model, but most importantly - make these blog posts look much cleaner.






​Source: https://somewan.design



An Actuary learns Machine Learning – Part 5 – lots of machine learning models

17/1/2021

 



In which we take our final stab at the Titanic challenge by ‘throwing the kitchen sink’ at the problem, setting up another 5 different machine learning models and seeing if they improve our performance (hint: they do not, but hopefully it's still interesting).
​


​
​Source: https://somewan.design



An Actuary learns Machine Learning – Part 4 – Error correction/data cleansing/Feature Engineering

10/1/2021

 



​
​In which we do more data exploration, find and then fix a mistake in our previous model, spend some time on feature engineering, and manage to set a new high-score. 
​



​
​Source: https://somewan.design



An Actuary learns Machine Learning – Part 3 – Automatic testing/feature importance/K-fold cross validation

23/12/2020

 


In which we don’t actually improve our model but we do improve our workflow - being able to check our test score ourselves, analysing the importance of each variable using an algorithm, and then using an algorithm to select the best hyper-parameters

Source: https://somewan.design


An Actuary learns Machine Learning – Part 2 – Spyder/Random Forest/Hyper-Parameters

13/12/2020

 



In which we build our first machine learning model in Python, beat our previous Excel model on our first attempt, and then fail multiple times to improve this new model…
​



Source: https://somewan.design


An Actuary learns Machine Learning – Part 1 – Kaggle/Titanic/Excel

5/12/2020

 


​
In which we enter a machine learning competition, predict who survived the Titanic, build an Excel model, and then realise it performs no better than Kaggle’s ‘test submission’...

Source: https://somewan.design


Data Science, Machine Learning, Data Mining... What do they mean exactly?

14/9/2016

 


"I don't know what you mean by 'glory,' " Alice said.
Humpty Dumpty smiled contemptuously. "Of course you don't—till I tell you. I meant 'there's a nice knock-down argument for you!' "
"But 'glory' doesn't mean 'a nice knock-down argument'," Alice objected.
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master—that's all."

I don't think Lewis Carroll had 'Big Data' or 'Machine Learning' in mind when he penned these words, but I think the quote is quite apt in this context. All too often these buzzwords seem to fall foul of the Humpty Dumpty principle: they mean just what the speaker chooses them to mean, regardless of what the words actually mean to anyone else. So what do these terms actually mean?

Machine Learning

The field of study which investigates algorithms that give computers the ability to learn without being explicitly programmed.
 
What do we mean by ‘learn’ in this context? The definition above is usually attributed to Arthur Samuel. The more precise definition used by Machine Learning practitioners, stated by Tom Mitchell, is:

 "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."

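To make the definition concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the task T (labelling a number as 'small' or 'large'), the hidden rule (large means 50 or more), the tiny training set, and the helper names `learn_threshold` and `accuracy`. The point is only that the performance measure P goes up as the program sees more experience E.

```python
# Task T: decide whether a number is "large" (1) or "small" (0).
# Hypothetical hidden rule, unknown to the learner: large means x >= 50.
def true_label(x):
    return 1 if x >= 50 else 0


def learn_threshold(examples):
    """Experience E: put the cut-off midway between the largest 'small'
    example and the smallest 'large' example seen so far."""
    largest_small = max(x for x, label in examples if label == 0)
    smallest_large = min(x for x, label in examples if label == 1)
    return (largest_small + smallest_large) / 2


def accuracy(threshold, test_points):
    """Performance measure P: share of test points labelled correctly."""
    correct = sum(
        (1 if x >= threshold else 0) == true_label(x) for x in test_points
    )
    return correct / len(test_points)


# Made-up labelled examples, fed to the learner a few at a time.
training_data = [(0, 0), (60, 1), (20, 0), (44, 0), (54, 1)]
test_points = range(100)

accuracies = []
for n in range(2, len(training_data) + 1):
    threshold = learn_threshold(training_data[:n])
    accuracies.append(accuracy(threshold, test_points))

print(accuracies)
```

With this particular training set the accuracy rises from 0.8 with two examples to 0.99 with all five. Nobody told the program where the cut-off is; it worked it out from the data, which is all 'learning' means here.
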
So what problems can Machine Learning algorithms be applied to? The main advances in machine learning have been in the following areas:
  • Classification - classifying data items into groups based on a training set. For example, a computer given a set of emails and told which are spam and which are not spam will be able to use machine learning to classify new emails as either spam or non-spam.
  • Cluster Analysis – identifying similarities in items in data sets without being explicitly told what to look for.
  • Computer vision – teaching computers to understand what they are seeing from visual inputs.
  • Object recognition – a combination of the above three areas, in which a computer is able to correctly recognise objects from visual inputs.
  • Natural Language Processing – being able to correctly interpret natural languages.
  • Search Engines – taking human input to a search engine and suggesting appropriate results.
  • Speech and handwriting recognition – translating speech and handwriting into written text.
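
The spam example from the Classification bullet can be sketched in a few lines of Python. This is a rough illustration rather than a production filter: it uses a simple Naive Bayes word-count model (one technique among many), the four training emails are invented, and the function names `train` and `classify` are our own.

```python
import math
from collections import Counter


def train(labelled_emails):
    """Count how often each word appears in spam and in non-spam ('ham')."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in labelled_emails:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return whichever class gives the new email the higher log-probability."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_emails = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_emails)
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training set: emails we have been told are spam or not spam.
training_set = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the claims report", "ham"),
]
model = train(training_set)

print(classify("claim your free prize", *model))
print(classify("agenda for the claims meeting", *model))
```

The first new email is labelled spam and the second is not, even though neither appears in the training set. The only 'rules' are the word counts learned from the labelled examples.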
​
A trait shared by all these problems is that previously computers were thought to be incapable of tackling them. This is one reason why Machine Learning is such an exciting and growing field of study.

If you'd like to know more about Machine Learning then Andrew Ng at Stanford University has released a really good free online course through Coursera which can be accessed through the following link:​
https://www.coursera.org/learn/machine-learning

Big Data

Big Data can be defined as data which conforms to the 3Vs. Big Data is available at a higher volume, higher velocity (rate at which data is generated) and/or greater variety than normal data sources.
 
So for example, looking at an insurance company, claims data would not count as Big Data: the volume will be fairly low, the velocity will be slow, and the data will be fairly uniform.
 
The browsing patterns of an aggregator website on the other hand would count as Big Data. For example, the amount of time someone spends on Comparethemarket.com, their clicks, what they search for, how many searches they make, how often they return to the website before making a purchase, etc. would count as Big Data. There would be a massive volume of data to analyse and the data would be available in real time. (It wouldn’t meet the variety criterion, but that’s not a necessary condition.)
 
Because of the need to extract useful information from Big Data, and the difficulties created by the 3Vs, we cannot rely on traditional methods of data analysis. Given the volume and velocity of Big Data, we require methods of analysis that do not need to be programmed explicitly; this is where Machine Learning fits in. Machine Learning in the guise of speech and handwriting recognition can also be important if the data generated is in audio form but needs to be combined with other data.

Data Mining

Data Mining is a catch-all term for the process of analysing and summarising data into useful information. Data may be in the form of Big Data, and methods used may be based on Machine Learning (where the algorithm learns from the data) or may be more traditional.

Data Visualisation

Data Visualisation is the process of creating visual graphics that aid in understanding and exploring data. It has become increasingly important for two reasons: first, the rise in the volume of data sets means that new methods are required to understand data; second, an increase in computing power means that more advanced visualisation techniques are now possible.
​
Data Science

Data Science is a broad term which encompasses processes which aim to extract knowledge or insight from Data. Data science therefore includes all the previous fields.
 
For example, in carrying out an analysis, we will first collect our data, which may or may not be in the form of Big Data. We will then mine our data, possibly using Machine Learning, and finally present our results through Data Visualisation.

    Author

    ​​I work as an actuary and underwriter at a global reinsurer in London.

    I mainly write about Maths, Finance, and Technology.
    ​
    If you would like to get in touch, then feel free to send me an email at:

    ​LewisWalshActuary@gmail.com

