Category: Machine Learning

An Actuary learns Machine Learning - Part 13 - Kaggle Tabular Playground Competition - June 22

1/7/2022

In which we recreate the previous analysis, but in Python this time, and then add a new submission using the mean rather than the median to impute missing values.

Source: https://somewan.design

An Actuary learns Machine Learning - Part 12 - Kaggle Tabular Playground Competition - June 22

24/6/2022

In which we start a new Kaggle competition, submit a dummy attempt, and then build a very basic Excel model to establish a baseline for future progress.

An Actuary learns Machine Learning - Part 11 - Titanic revisited & Gradient Boosting Classifiers

8/10/2021

In which we try out the best performing algorithm from our house price prediction problem - Gradient Boosted Regression - on the Titanic problem, but don't actually manage to improve on our old score...

An Actuary learns Machine Learning - Part 10 - More label encoding / Gradient Boosted Regression

15/2/2021

In which we correct our label encoding method from last time, try out a new algorithm - Gradient Boosted Regression - and finally manage to improve our score (by quite a lot, it turns out).

An Actuary learns Machine Learning - Part 9 - Cross Validation / Label Encoding / Feature Engineering

10/2/2021

In which we set up K-fold Cross Validation to assess model performance, spend quite a while tweaking our model, use hyper-parameter tuning, but then end up not actually improving our model.

An Actuary learns Machine Learning - Part 8 - Data Cleaning / more Null Values / more Random Forests

6/2/2021

In which we deal with those pesky null values, add additional variables to our Random Forest model, but only actually improve our score by a marginal amount.

An Actuary learns Machine Learning - Part 7 - Sub-plots /Null Values/ Random Forests

4/2/2021

In which we plot an excessive number of graphs, fix our problems with null values, re-run our algorithm, and significantly improve our accuracy.

An Actuary learns Machine Learning - Part 6 - Jupyter/Regression/Kaggle house prices

2/2/2021

In which we start a new Kaggle challenge, try out a new Python IDE, build our first regression model, but most importantly - make these blog posts look much cleaner.

An Actuary learns Machine Learning – Part 5 – lots of machine learning models

17/1/2021

In which we take our final stab at the Titanic challenge by ‘throwing the kitchen sink’ at the problem, setting up another 5 different machine learning models and seeing if they improve our performance (hint: they do not, but hopefully it's still interesting).

An Actuary learns Machine Learning – Part 4 – Error correction/data cleansing/Feature Engineering

10/1/2021

In which we do more data exploration, find and then fix a mistake in our previous model, spend some time on feature engineering, and manage to set a new high-score.

An Actuary learns Machine Learning – Part 3 – Automatic testing/feature importance/K-fold cross validation

23/12/2020

In which we don’t actually improve our model but we do improve our workflow - being able to check our test score ourselves, analysing the importance of each variable using an algorithm, and then using an algorithm to select the best hyper-parameters

An Actuary learns Machine Learning – Part 2 – Spyder/Random Forest/Hyper-Parameters

13/12/2020

In which we build our first machine learning model in Python, beat our previous Excel model on our first attempt, and then fail multiple times to improve this new model…

An Actuary learns Machine Learning – Part 1 – Kaggle/Titanic/Excel

5/12/2020

In which we enter a machine learning competition, predict who survived the Titanic, build an Excel model, and then realise it performs no better than Kaggle’s ‘test submission’...

Data Science, Machine Learning, Data Mining... What do they mean exactly?

14/9/2016

"I don't know what you mean by 'glory,' " Alice said.
Humpty Dumpty smiled contemptuously. "Of course you don't—till I tell you. I meant 'there's a nice knock-down argument for you!' "
"But 'glory' doesn't mean 'a nice knock-down argument'," Alice objected.
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master—that's all."

I don't think Lewis Carroll had 'Big Data' or 'Machine Learning' in mind when he penned these words, but I think the quote is quite apt in this context. All too often these buzzwords seem to fall foul of the Humpty Dumpty principle: they mean just what the speaker chooses them to mean, regardless of what the words actually mean to anyone else. So what do these terms actually mean?

Machine Learning

The field of study which investigates algorithms that give computers the ability to learn without being explicitly programmed.

What do we mean by ‘learn’ in this context? The definition most often used by Machine Learning practitioners, stated by Tom Mitchell (the looser definition above is usually attributed to Arthur Samuel), is:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
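
To make the three letters concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the task T is deciding whether a number sits at or above a hidden cut-off of 50, the experience E is a short hand-written list of labelled examples, and the performance measure P is accuracy on a fixed test set. The "learner" is deliberately crude (it just places the cut-off midway between the largest value labelled 0 and the smallest value labelled 1).

```python
# Task T: decide whether a number is at or above a hidden cut-off (50 here).
# Experience E: labelled examples of the form (value, label).
# Performance P: accuracy on a fixed test set.
# All data below is made up purely for illustration.

def learn_threshold(examples):
    """Guess the cut-off as the midpoint between the largest value
    labelled 0 and the smallest value labelled 1."""
    largest_zero = max(x for x, label in examples if label == 0)
    smallest_one = min(x for x, label in examples if label == 1)
    return (largest_zero + smallest_one) / 2

def accuracy(threshold, test_set):
    """Share of test items whose predicted label matches the true label."""
    correct = sum((x >= threshold) == bool(label) for x, label in test_set)
    return correct / len(test_set)

experience = [(0, 0), (60, 1), (20, 0), (90, 1),
              (40, 0), (58, 1), (48, 0), (52, 1)]
test_set = [(x, int(x >= 50)) for x in range(100)]

scores = []
for n in (2, 4, 6, 8):
    # Learn from the first n examples only, then measure performance.
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_set))
    print(n, "examples -> threshold", threshold, "accuracy", scores[-1])
```

With this particular data the accuracy climbs from 0.80 with two examples to 1.00 with all eight, which is exactly the "performance at T, as measured by P, improves with experience E" pattern in the definition.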

So what problems can Machine Learning algorithms be applied to? The main advances in machine learning have been in the following areas:

Classification – classifying data items into groups based on a training set. For example, a computer given a set of emails and told which are spam and which are not spam will be able to use machine learning to classify new emails as either spam or non-spam.
Cluster Analysis – identifying similarities in items in data sets without being explicitly told what to look for.
Computer vision – teaching computers to understand what they are seeing from visual inputs.
Object recognition – a combination of the above three areas in which a computer is able to correctly recognise objects from visual inputs.
Natural Language Processing – being able to correctly interpret natural languages.
Search Engines – taking human input to a search engine and suggesting appropriate results.
Speech and handwriting recognition – translating speech and handwriting into written text.
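
The spam example in the first item can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the six training emails are made up, and the method (a tiny naive Bayes word-count classifier with add-one smoothing) is one common choice rather than the only way to do it. Real spam filters are far more involved.

```python
import math
from collections import Counter

def train(labelled_emails):
    """Count how often each word appears under each label."""
    word_counts = {}
    email_counts = Counter()
    for text, label in labelled_emails:
        email_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, email_counts, vocabulary

def classify(text, model):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, email_counts, vocabulary = model
    total_emails = sum(email_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior: the share of training emails with this label.
        score = math.log(email_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                # Add-one smoothing avoids taking the log of zero.
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set: emails we have been told are spam or not spam.
training_emails = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win a free prize now", "spam"),
    ("meeting at noon", "not spam"),
    ("project report attached", "not spam"),
    ("lunch at noon tomorrow", "not spam"),
]
model = train(training_emails)

print(classify("free money", model))
print(classify("report for the meeting", model))
```

The first new email is labelled spam and the second not spam, even though neither appeared in the training set. Nobody wrote a rule saying "money means spam"; the program picked that up from the labelled examples.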

A trait shared by all these problems is that previously computers were thought to be incapable of tackling them. This is one reason why Machine Learning is such an exciting and growing field of study.

If you'd like to know more about Machine Learning then Andrew Ng at Stanford University has released a really good free online course through Coursera which can be accessed through the following link:

https://www.coursera.org/learn/machine-learning

Big Data

Big Data can be defined as data which conforms to the 3Vs: it is available at a higher volume, higher velocity (the rate at which data is generated) and/or greater variety than normal data sources.

So for example, looking at an insurance company, claims data would not count as Big Data: the volume will be fairly low, the velocity will be slow, and the variety will be limited, since claims records are fairly uniform.

The browsing patterns of visitors to an aggregator website, on the other hand, would count as Big Data. For example, the amount of time someone spends on Comparethemarket.com, their clicks, what they search for, how many searches they make, how often they return to the website before making a purchase, etc. would count as Big Data. There would be a massive volume of data to analyse and the data would be available in real time. (It wouldn’t meet the variety criterion, but that’s not a necessary condition.)

Due to the need to extract useful information from Big Data, and the difficulties created by the 3Vs, we cannot rely on traditional methods of data analysis. Given the volume and velocity of Big Data, we require methods of analysis that do not need to be programmed explicitly; this is where Machine Learning fits in. Machine Learning in the guise of speech and handwriting recognition can also be important if the data generated is in audio form but needs to be combined with other data.

Data Mining

Data Mining is a catch-all term for the process of analysing and summarising data into useful information. Data may be in the form of Big Data, and methods used may be based on Machine Learning (where the algorithm learns from the data) or may be more traditional.

Data Visualisation

Data Visualisation is the process of creating visual graphics that aid in understanding and exploring data. It has become increasingly important for two reasons. Firstly, the growth in the size of data sets means that new methods are required to understand data. Secondly, an increase in computing power means that more advanced visualisation techniques are now possible.

Data Science

Data Science is a broad term which encompasses processes which aim to extract knowledge or insight from data. Data Science therefore includes all the previous fields.

For example, in carrying out an analysis, we will first collect our data, which may or may not be in the form of Big Data. We will then mine our data, possibly using Machine Learning, and then present our results through Data Visualisation.
